diff --git a/BMPv2_README.md b/BMPv2_README.md new file mode 100644 index 0000000000000000000000000000000000000000..7d1e48393ad20abbc142756ce659bde7652e91de --- /dev/null +++ b/BMPv2_README.md @@ -0,0 +1,311 @@ +
+ +
+
+ +
+ +
+ BBoxMaskPose v2 loop + + [![Website](https://img.shields.io/badge/Website-BBoxMaskPose-green)](https://mirapurkrabek.github.io/BBox-Mask-Pose/)     + [![License](https://img.shields.io/badge/License-GPL%203.0-orange.svg)](LICENSE)     + [![Video](https://img.shields.io/badge/Video-YouTube-red?logo=youtube)](https://youtu.be/U05yUP4b2LQ) + + [![Paper](https://img.shields.io/badge/ProbPose-CVPR%202025-blue)](https://arxiv.org/abs/2412.02254)     + [![Paper](https://img.shields.io/badge/BMPv1-ICCV%202025-blue)](https://arxiv.org/abs/2412.01562)     + [![Paper](https://img.shields.io/badge/SAMpose2seg-CVWW%202026-blue)](https://arxiv.org/abs/2601.08982)     + [![Paper](https://img.shields.io/badge/BMPv2-arXiv-blue)](https://arxiv.org/abs/2601.15200)     + + + + + +
+
+> [!CAUTION]
+> This branch is a **work in progress**!
+>
+> Until merged with main, use it at your own discretion. For the stable version, please refer to the main branch with BMPv1.
+
+## 📢 News
+
+- **Feb 2026**: Version 2.0 released, with (1) improved pose estimation, (2) improved SAM, and (3) wiring to 3D prediction.
+- **Feb 2026**: SAM-pose2seg won a Best Paper Award at CVWW 2026 🎉
+- **Jan 2026**: [BMPv2 paper](https://arxiv.org/abs/2601.15200) is available on arXiv
+- **Aug 2025**: [HuggingFace Image Demo](https://huggingface.co/spaces/purkrmir/BBoxMaskPose-demo) is out! 🎮
+- **Jul 2025**: Version 1.1 with an easy-to-run image demo released
+- **Jun 2025**: BMPv1 paper accepted to ICCV 2025! 🎉
+- **Dec 2024**: BMPv1 code is available
+- **Nov 2024**: The [project website](https://MiraPurkrabek.github.io/BBox-Mask-Pose) is live
+
+## 📑 Table of Contents
+
+- [Installation](#-installation)
+- [Demo](#-demo)
+- [API Examples](#api-examples)
+- [Pre-trained Models](#-pre-trained-models)
+- [Acknowledgments](#-acknowledgments)
+- [Citation](#-citation)
+
+
+## 📋 Project Overview
+
+Bounding boxes, masks, and poses capture complementary aspects of the human body. BBoxMaskPose links detection, segmentation, and pose estimation iteratively, so that each prediction refines the others. PMPose combines probabilistic modeling with mask conditioning for robust pose estimation in crowds. Together, these components achieve state-of-the-art results on COCO and OCHuman, making BMP the first method to exceed 50 AP on OCHuman.
+
+
+### Repository Structure
+
+The repository is organized into two main packages with stable public APIs:
+
+```
+BBoxMaskPose/
+├── pmpose/                  # PMPose package (pose estimation)
+│   └── pmpose/
+│       ├── api.py           # PUBLIC API: PMPose class
+│       ├── mm_utils.py      # Internal utilities
+│       └── posevis_lite.py  # Visualization
+├── mmpose/                  # MMPose fork with our edits
+├── bboxmaskpose/            # BBoxMaskPose package (full pipeline)
+│   └── bboxmaskpose/
+│       ├── api.py           # PUBLIC API: BBoxMaskPose class
+│       ├── sam2/            # SAM2 implementation
+│       ├── configs/         # BMP configurations
+│       └── *_utils.py       # Internal utilities
+├── demos/                   # Public API demos
+│   ├── PMPose_demo.py       # PMPose usage example
+│   ├── BMP_demo.py          # BBoxMaskPose usage example
+│   └── quickstart.ipynb     # Interactive notebook
+└── demo/                    # Legacy demo (still functional)
+```
+
+Key contributions:
+1. **MaskPose**: a pose estimation model conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
+   - Download pre-trained weights below
+2. **BBox-MaskPose (BMP)**: a method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation, and pose estimation
+   - Try the demo!
+3. Fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
+   - Download pre-trained weights below
+4. Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent from MMPose.
+
+For more details, please visit our [project website](https://mirapurkrabek.github.io/BBox-Mask-Pose/).
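+
+To make the BMP loop concrete, here is a minimal sketch of inspecting its iterations through the public API (a sketch only: the `bmp_D3` config alias and the automatic weight download are described later in this README):
+
+```python
+from bboxmaskpose import BBoxMaskPose
+
+# Build the full pipeline; detector and pose weights are fetched automatically.
+bmp = BBoxMaskPose(config="bmp_D3", device="cuda")
+
+# return_intermediates exposes one entry per BMP iteration:
+# detect -> estimate poses -> pose-NMS -> refine masks with SAM -> mask out & re-detect.
+result = bmp.predict(image="demo/data/004806.jpg", return_intermediates=True)
+
+for step in result["intermediates"]:
+    refined = step["refined"]
+    n_new = 0 if refined is None else len(refined.bboxes)
+    print(f"BMP iteration {step['iteration']}: kept {n_new} new instances")
+```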
+ + + +## ๐Ÿš€ Installation + +### Docker Installation (Recommended) + +The fastest way to get started with GPU support: + +```bash +# Clone and build +git clone https://github.com/mirapurkrabek/BBoxMaskPose.git +cd BBoxMaskPose +docker-compose build + +# Run the demo +docker-compose up +``` + +Requires: Docker Engine 19.03+, [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html), NVIDIA GPU with CUDA 12.1 support. + +### Manual Installation + +This project is built on top of [MMPose](https://github.com/open-mmlab/mmpose) and [SAM 2.1](https://github.com/facebookresearch/sam2). +Please refer to the [MMPose installation guide](https://mmpose.readthedocs.io/en/latest/installation.html) or [SAM installation guide](https://github.com/facebookresearch/sam2/blob/main/INSTALL.md) for detailed setup instructions. + +Basic installation steps: +```bash +# Clone the repository +git clone https://github.com/mirapurkrabek/BBoxMaskPose.git BBoxMaskPose/ +cd BBoxMaskPose + +# Install your version of torch, torchvision, OpenCV and NumPy +pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 --extra-index-url https://download.pytorch.org/whl/cu121 +pip install numpy==1.25.1 opencv-python==4.9.0.80 + +# Install MMLibrary +pip install -U openmim +mim install mmengine "mmcv==2.1.0" "mmdet==3.3.0" "mmpretrain==1.2.0" + +# Install dependencies +pip install -r requirements.txt +pip install -e . +``` + +## ๐ŸŽฎ Demo + +#### PMPose Demo (Pose Estimation Only) +```bash +python demos/PMPose_demo.py --image data/004806.jpg --device cuda +``` + +#### BBoxMaskPose Demo (Full Pipeline) +```bash +python demos/BMP_demo.py --image data/004806.jpg --device cuda +``` + +After running the demo, outputs are in `outputs/004806/`. The expected output should look like this: +
+
+  Detection results &nbsp;&nbsp;&nbsp;&nbsp; Pose results
+
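+
+The two outputs above can also be produced programmatically. Below is a minimal sketch using the public `visualize()` API (the output file names are illustrative):
+
+```python
+from bboxmaskpose import BBoxMaskPose
+
+bmp = BBoxMaskPose(config="bmp_D3", device="cuda")
+result = bmp.predict(image="data/004806.jpg")
+
+# Save the mask and pose visualizations separately, matching the demo outputs above.
+bmp.visualize(image="data/004806.jpg", result=result, vis_type="mask", save_path="outputs/004806/masks.jpg")
+bmp.visualize(image="data/004806.jpg", result=result, vis_type="pose", save_path="outputs/004806/poses.jpg")
+```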
+
+#### BBoxMaskPose v2 Demo (Full Pipeline + 3D Mesh Recovery)
+This demo extends BMP with [SAM-3D-Body](https://github.com/facebookresearch/sam-3d-body) for 3D human mesh recovery:
+```bash
+# Basic usage (auto-downloads checkpoint from HuggingFace)
+python demos/BMPv2_demo.py --image data/004806.jpg --device cuda
+
+# With local checkpoint
+python demos/BMPv2_demo.py --image data/004806.jpg --device cuda \
+    --sam3d_checkpoint checkpoints/sam-3d-body-dinov3/model.ckpt \
+    --mhr_path checkpoints/sam-3d-body-dinov3/assets/mhr_model.pt
+```
+
+**SAM-3D-Body Installation (Optional):**
+The 3D mesh recovery stage of BMPv2 requires SAM-3D-Body; the rest of the pipeline works without it. Install it separately:
+```bash
+# 1. Install dependencies
+pip install -r requirements/sam3d.txt
+
+# 2. Install detectron2
+pip install 'git+https://github.com/facebookresearch/detectron2.git@a1ce2f9' --no-build-isolation --no-deps
+
+# 3. Install MoGe (optional, for FOV estimation)
+pip install git+https://github.com/microsoft/MoGe.git
+
+# 4. Install adapted SAM-3D-Body repository
+pip install git+https://github.com/MiraPurkrabek/sam-3d-body.git
+
+# 5. Request access to checkpoints at https://huggingface.co/facebook/sam-3d-body-dinov3
+```
+
+For more details, see the [SAM-3D-Body installation guide](https://github.com/facebookresearch/sam-3d-body/blob/main/INSTALL.md).
+
+#### Jupyter Notebook
+Interactive demo with both PMPose and BBoxMaskPose:
+```bash
+jupyter notebook demos/quickstart.ipynb
+```
+
+## API Examples
+
+**PMPose API** - Pose estimation with bounding boxes:
+```python
+from pmpose import PMPose
+
+# Initialize model
+pose_model = PMPose(device="cuda", from_pretrained=True)
+
+# Run inference
+keypoints, presence, visibility, heatmaps = pose_model.predict(
+    image="demo/data/004806.jpg",
+    bboxes=[[100, 100, 300, 400]],  # [x1, y1, x2, y2]
+)
+
+# Visualize
+vis_img = pose_model.visualize(image="demo/data/004806.jpg", keypoints=keypoints)
+```
+
+**BBoxMaskPose API** - Full detection + pose + segmentation:
+
+```python
+from pmpose import PMPose
+from bboxmaskpose import BBoxMaskPose
+
+# Create pose model
+pose_model = PMPose(device="cuda", from_pretrained=True)
+
+# Inject into BMP
+bmp_model = BBoxMaskPose(config="bmp_D3", device="cuda", pose_model=pose_model)
+result = bmp_model.predict(image="demo/data/004806.jpg")
+
+# Visualize
+vis_img = bmp_model.visualize(image="demo/data/004806.jpg", result=result)
+```
+
+
+## 📦 Pre-trained Models
+
+Pre-trained models are available on [VRG Hugging Face 🤗](https://huggingface.co/vrg-prague/BBoxMaskPose/).
+To run the demo, you only need to download the SAM weights with the [enclosed script](models/SAM/download_ckpts.sh).
+Our detector and pose estimator are downloaded automatically at runtime.
+
+If you want to download our weights yourself, here are direct links to our HuggingFace:
+- ViTPose-b trained on COCO+MPII+AIC -- [download weights](https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/ViTPose-b-multi_mmpose20.pth)
+- MaskPose-b -- [download weights](https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/MaskPose-b.pth)
+- Fine-tuned RTMDet-L -- [download weights](https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/rtmdet-ins-l-mask.pth)
+
+## 🙏 Acknowledgments
+
+The code combines [MMDetection](https://github.com/open-mmlab/mmdetection), [MMPose 2.0](https://github.com/open-mmlab/mmpose), [ViTPose](https://github.com/ViTAE-Transformer/ViTPose), [SAM 2.1](https://github.com/facebookresearch/sam2) and [SAM-3D-Body](https://github.com/facebookresearch/sam-3d-body).
+
+Our visualizations integrate [Distinctipy](https://github.com/alan-turing-institute/distinctipy) for automatic color selection.
+
+This repository combines our work on the BBoxMaskPose project with our previous work on [probabilistic 2D human pose estimation modelling](https://mirapurkrabek.github.io/ProbPose/).
+
+## 📝 Citation
+
+The code was implemented by [Miroslav Purkrábek](https://mirapurkrabek.github.io/) and Constantin Kolomiiets.
+If you use this work, kindly cite it using the references provided below.
+
+For questions, please use GitHub Issues or Discussions.
+
+```
+@InProceedings{Purkrabek2025BMPv1,
+  author    = {Purkrabek, Miroslav and Matas, Jiri},
+  title     = {Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle},
+  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
+  month     = {October},
+  year      = {2025},
+  pages     = {9004-9013}
+}
+```
+
+```
+@InProceedings{Purkrabek2026BMPv2,
+  author    = {Purkrabek, Miroslav and Kolomiiets, Constantin and Matas, Jiri},
+  title     = {BBoxMaskPose v2: Expanding Mutual Conditioning to 3D},
+  booktitle = {arXiv preprint arXiv:2601.15200},
+  year      = {2026}
+}
+```
+
+```
+@article{yang2025sam3dbody,
+  title={SAM 3D Body: Robust Full-Body Human Mesh Recovery},
+  author={Yang, Xitong and Kukreja, Devansh and Pinkus, Don and Sagar, Anushka and Fan, Taosha and Park, Jinhyung and Shin, Soyong and Cao, Jinkun and Liu, Jiawei and Ugrinovic, Nicolas and Feiszli, Matt and Malik, Jitendra and Dollar, Piotr and Kitani, Kris},
+  journal={arXiv preprint; identifier to be added},
+  year={2025}
+}
+```
+
+```
+@InProceedings{Kolomiiets2026CVWW,
+  author    = {Kolomiiets, Constantin and Purkrabek, Miroslav and Matas, Jiri},
+  title     = {SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds},
+  booktitle = {Computer Vision Winter Workshop (CVWW)},
+  year      = {2026}
+}
+```
diff --git a/SAM3D_INTEGRATION.md b/SAM3D_INTEGRATION.md
new file mode 100644
index 0000000000000000000000000000000000000000..dd78107fff6c7a78fb8eb653f5d3647632498ff4
--- /dev/null
+++ b/SAM3D_INTEGRATION.md
@@ -0,0 +1,302 @@
+# SAM-3D-Body Integration Guide
+
+This guide explains how to integrate and use SAM-3D-Body for 3D human mesh recovery within the BBoxMaskPose pipeline.
+
+## Overview
+
+BBoxMaskPose v2 extends the original BMP pipeline with [SAM-3D-Body](https://github.com/facebookresearch/sam-3d-body) from Meta AI, enabling full 3D human mesh recovery from single images. The integration leverages BMP's high-quality 2D pose estimates and segmentation masks as prompts to SAM-3D-Body, resulting in accurate 3D reconstructions even in crowded scenes.
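+
+In code, the 2D-to-3D handoff is only a few lines. The sketch below is a condensed version of the Programmatic API section later in this guide (the `SAM3DBodyWrapper` import path and the `bmp_D3` alias come from that section; checkpoint access is covered under Installation):
+
+```python
+from bboxmaskpose import BBoxMaskPose
+from bboxmaskpose.sam3d_utils import SAM3DBodyWrapper
+
+# 2D stage: detection + pose estimation + segmentation
+bmp = BBoxMaskPose(config="bmp_D3", device="cuda")
+result = bmp.predict(image="path/to/image.jpg")
+
+# 3D stage: BMP boxes and masks become prompts for SAM-3D-Body
+sam3d = SAM3DBodyWrapper(device="cuda")
+meshes_3d = sam3d.predict(
+    image="path/to/image.jpg",
+    bboxes=result["bboxes"],
+    masks=result["masks"],
+    use_mask=True,  # leverage BMP's masks as prompts
+)
+```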
+
+**Pipeline Flow:**
+```
+Input Image
+    ↓
+BBoxMaskPose (Detection + 2D Pose + Segmentation)
+    ↓
+2D Bboxes + Masks + Poses
+    ↓
+SAM-3D-Body (3D Mesh Recovery)
+    ↓
+3D Human Meshes (vertices, joints, faces)
+```
+
+## Installation
+
+### Prerequisites
+
+- BBoxMaskPose must already be installed and working
+- CUDA-capable GPU recommended (CPU inference is very slow)
+- Python 3.8+ (Python 3.11 recommended for SAM-3D-Body)
+
+### Step 1: Install SAM-3D-Body Dependencies
+
+```bash
+# Navigate to BBoxMaskPose root directory
+cd /path/to/BBoxMaskPose
+
+# Install SAM-3D-Body dependencies
+pip install -r requirements/sam3d.txt
+```
+
+### Step 2: Install Detectron2
+
+SAM-3D-Body requires a specific version of Detectron2:
+
+```bash
+pip install 'git+https://github.com/facebookresearch/detectron2.git@a1ce2f9' \
+    --no-build-isolation --no-deps
+```
+
+### Step 3: Install MoGe (Optional but Recommended)
+
+MoGe provides FOV (field-of-view) estimation for better camera calibration:
+
+```bash
+pip install git+https://github.com/microsoft/MoGe.git
+```
+
+### Step 4: Install SAM-3D-Body
+
+```bash
+# Install adapted SAM-3D-Body repository
+pip install git+https://github.com/MiraPurkrabek/sam-3d-body.git
+```
+
+### Step 5: Get Model Checkpoints
+
+SAM-3D-Body checkpoints are hosted on HuggingFace. You need to:
+
+1. **Request access** at [facebook/sam-3d-body-dinov3](https://huggingface.co/facebook/sam-3d-body-dinov3)
+2. **Wait for approval** (usually within 24 hours)
+3. **Authenticate** with HuggingFace:
+   ```bash
+   pip install huggingface_hub
+   huggingface-cli login
+   ```
+
+The BMPv2 demo will auto-download the checkpoint on first use, or you can download manually to the default location for auto-detection:
+
+```bash
+# Download checkpoint manually to default location (will be auto-detected)
+mkdir -p checkpoints
+huggingface-cli download facebook/sam-3d-body-dinov3 \
+    --local-dir checkpoints/sam-3d-body-dinov3
+```
+
+## Usage
+
+### Basic Usage
+
+Run the BMPv2 demo with automatic checkpoint handling:
+
+```bash
+python demos/BMPv2_demo.py --image data/004806.jpg --device cuda
+```
+
+**The demo will:**
+1. Auto-detect the checkpoint in `checkpoints/sam-3d-body-dinov3/` OR download it from HuggingFace (~3.5 GB)
+2. Run the BMP pipeline to get 2D detections, poses, and masks
+3. Run SAM-3D-Body to recover 3D meshes
+4. Save visualizations to `demos/outputs/bboxmaskpose_v2/`
+
+### Advanced Usage
+
+#### Use Local Checkpoint (Auto-Detection)
+
+Download the checkpoint to the default location for automatic detection:
+
+```bash
+# The demo automatically detects checkpoints in this location
+huggingface-cli download facebook/sam-3d-body-dinov3 \
+    --local-dir checkpoints/sam-3d-body-dinov3
+
+# Then just run the demo - no checkpoint arguments needed!
+python demos/BMPv2_demo.py --image data/004806.jpg --device cuda
+```
+
+#### Use Custom Checkpoint Path
+
+If your checkpoint is in a different location:
+
+```bash
+python demos/BMPv2_demo.py \
+    --image data/004806.jpg \
+    --device cuda \
+    --sam3d_checkpoint /path/to/model.ckpt \
+    --mhr_path /path/to/mhr_model.pt
+```
+
+#### Speed vs Quality Trade-offs
+
+```bash
+# Fastest: body-only inference without mask conditioning
+python demos/BMPv2_demo.py --image data/004806.jpg \
+    --inference_type body --no_mask_conditioning
+
+# Balanced: body-only with mask conditioning
+python demos/BMPv2_demo.py --image data/004806.jpg \
+    --inference_type body
+
+# Best quality: full inference with mask conditioning (default)
+python demos/BMPv2_demo.py --image data/004806.jpg
+```
+
+#### Disable Mask Conditioning
+
+Faster but less accurate (doesn't use segmentation masks as prompts):
+
+```bash
+python demos/BMPv2_demo.py \
+    --image data/004806.jpg \
+    --no_mask_conditioning
+```
+
+#### Skip 3D Recovery
+
+Run only the BMP pipeline (useful for testing BMP without SAM-3D-Body):
+
+```bash
+python demos/BMPv2_demo.py \
+    --image data/004806.jpg \
+    --skip_3d
+```
+
+### Output Files
+
+The demo saves the following visualizations:
+
+- `{image_name}_bmp_pose.jpg` - 2D pose estimation results
+- `{image_name}_bmp_mask.jpg` - Segmentation mask results
+- `{image_name}_3d_mesh.jpg` - 3D mesh overlay on image
+- `{image_name}_combined.jpg` - Side-by-side comparison of all results
+
+## Programmatic API
+
+You can also use SAM-3D-Body programmatically:
+
+```python
+from bboxmaskpose import BBoxMaskPose
+from bboxmaskpose.sam3d_utils import SAM3DBodyWrapper, visualize_3d_meshes
+
+# Step 1: Run BMP pipeline
+bmp = BBoxMaskPose(config="bmp_D3", device="cuda")
+result = bmp.predict(image="path/to/image.jpg")
+
+# Step 2: Initialize SAM-3D-Body
+sam3d = SAM3DBodyWrapper(device="cuda")
+
+# Step 3: Predict 3D meshes from BMP outputs
+outputs_3d = sam3d.predict(
+    image="path/to/image.jpg",
+    bboxes=result['bboxes'],
+    masks=result['masks'],
+    use_mask=True,
+    inference_type="full",  # Options: "full", "body", "hand"
+)
+
+# Step 4: Visualize results
+import cv2
+img = cv2.imread("path/to/image.jpg")
+vis = visualize_3d_meshes(img, outputs_3d, sam3d.faces)
+cv2.imwrite("output_3d.jpg", vis)
+```
+
+### Access 3D Mesh Data
+
+Each element in `outputs_3d` is a dictionary containing:
+
+```python
+outputs_3d[0].keys()
+# dict_keys(['vertices', 'joints', 'bbox', 'mask', ...])
+
+# 3D mesh vertices in camera coordinates (V, 3)
+vertices = outputs_3d[0]['vertices']
+
+# 3D joint locations (J, 3)
+joints_3d = outputs_3d[0]['joints']
+
+# Mesh faces (shared across all people)
+faces = sam3d.faces  # (F, 3)
+```
+
+## Integration Architecture
+
+### Wrapper Design
+
+The integration follows BBoxMaskPose's modular design pattern:
+
+```
+bboxmaskpose/
+├── sam3d_utils.py            # SAM-3D-Body wrapper (new)
+│   ├── SAM3DBodyWrapper      # Main wrapper class
+│   ├── visualize_3d_meshes   # Visualization helper
+│   └── check_sam3d_available
+│
+demos/
+├── BMP_demo.py               # Original BMP demo
+└── BMPv2_demo.py             # New demo with 3D (new)
+```
+
+### Why a Wrapper?
+
+The `SAM3DBodyWrapper` class:
+- **Simplifies** SAM-3D-Body's complex initialization
+- **Adapts** BMP outputs (bboxes, masks) to SAM-3D-Body inputs
+- **Handles** optional dependencies gracefully (no hard requirement)
+- **Follows** BMP's design patterns (similar to the PMPose wrapper)
+
+### Key Design Decisions
+
+1.
**Optional Dependency**: SAM-3D-Body is not required for core BMP functionality +2. **No Code Duplication**: Reuses SAM-3D-Body's existing code via wrapper +3. **Mask Conditioning**: Leverages BMP's high-quality masks as prompts +4. **No Internal Detector**: Disables SAM-3D-Body's detector (BMP already detects) + +## Troubleshooting + +### Import Error: `sam_3d_body` not found + +**Solution**: Install SAM-3D-Body following Step 4 above. + +### HuggingFace Authentication Error + +**Solution**: +1. Request access at https://huggingface.co/facebook/sam-3d-body-dinov3 +2. Login: `huggingface-cli login` + +### MoGe Import Error (FOV Estimator) + +**Solution**: Either: +- Install MoGe: `pip install git+https://github.com/microsoft/MoGe.git` +- Or disable FOV estimation (uses default FOV instead) + +### Detectron2 Build Errors + +**Solution**: Make sure you have: +- CUDA toolkit installed and matching PyTorch CUDA version +- GCC/G++ compiler available +- Use the exact commit hash: `@a1ce2f9` + +## References + +- **SAM-3D-Body**: [GitHub](https://github.com/facebookresearch/sam-3d-body) | [Paper](https://ai.meta.com/research/publications/sam-3d-body-robust-full-body-human-mesh-recovery/) +- **BBoxMaskPose**: [GitHub](https://github.com/MiraPurkrabek/BBoxMaskPose) | [Paper](https://arxiv.org/abs/2601.15200) + +## Citation + +If you use this integration, please cite both works: + +```bibtex +@article{yang2025sam3dbody, + title={SAM 3D Body: Robust Full-Body Human Mesh Recovery}, + author={Yang, Xitong and Kukreja, Devansh and Pinkus, Don and Sagar, Anushka and Fan, Taosha and Park, Jinhyung and Shin, Soyong and Cao, Jinkun and Liu, Jiawei and Ugrinovic, Nicolas and Feiszli, Matt and Malik, Jitendra and Dollar, Piotr and Kitani, Kris}, + journal={arXiv preprint; identifier to be added}, + year={2025} +} + +@InProceedings{Purkrabek2026BMPv2, + author = {Purkrabek, Miroslav and Kolomiiets, Constantin and Matas, Jiri}, + title = {BBoxMaskPose v2: Expanding Mutual Conditioning to 3D}, + booktitle = {arXiv preprint arXiv:2601.15200}, + year = {2026} +} +``` diff --git a/app.py b/app.py index c5b2252ce7591ca7bdcdf5fe71531ea425890272..d80065c8588707727251722995597fab46b3b9fe 100644 --- a/app.py +++ b/app.py @@ -1,188 +1,204 @@ -import gradio as gr -import spaces - -from pathlib import Path +from typing import Any +import cv2 +import gradio as gr import numpy as np -import yaml -from demo.demo_utils import DotDict, concat_instances, filter_instances, pose_nms, visualize_demo -from demo.mm_utils import run_MMDetector, run_MMPose -from mmdet.apis import init_detector -from demo.sam2_utils import prepare_model as prepare_sam2_model -from demo.sam2_utils import process_image_with_SAM - -from mmpose.apis import init_model as init_pose_estimator -from mmpose.utils import adapt_mmdet_pipeline - -# Default thresholds -DEFAULT_CAT_ID: int = 0 - -DEFAULT_BBOX_THR: float = 0.3 -DEFAULT_NMS_THR: float = 0.3 -DEFAULT_KPT_THR: float = 0.3 - -# Global models variable -det_model = None -pose_model = None -sam2_model = None - -def _parse_yaml_config(yaml_path: Path) -> DotDict: - """ - Load BMP configuration from a YAML file. - - Args: - yaml_path (Path): Path to YAML config. - Returns: - DotDict: Nested config dictionary. 
- """ - with open(yaml_path, "r") as f: - cfg = yaml.safe_load(f) - return DotDict(cfg) - -def load_models(bmp_config): - device = 'cuda:0' - - global det_model, pose_model, sam2_model - - # build detectors - det_model = init_detector(bmp_config.detector.det_config, bmp_config.detector.det_checkpoint, device='cpu') # Detect with CPU because of installation issues on HF - det_model.cfg = adapt_mmdet_pipeline(det_model.cfg) - - - # build pose estimator - pose_model = init_pose_estimator( - bmp_config.pose_estimator.pose_config, - bmp_config.pose_estimator.pose_checkpoint, - device=device, - cfg_options=dict(model=dict(test_cfg=dict(output_heatmaps=False))), - ) +import spaces - sam2_model = prepare_sam2_model( - model_cfg=bmp_config.sam2.sam2_config, - model_checkpoint=bmp_config.sam2.sam2_checkpoint, +from bboxmaskpose import BBoxMaskPose + +# Global BMP model singleton +bmp_model = None +bmp_model_config = "bmp_v2" +bmp_model_device = "cuda:0" + + +def _to_numpy(value: Any): + """Convert model outputs to numpy arrays when needed.""" + if value is None: + return None + if isinstance(value, np.ndarray): + return value + if hasattr(value, "detach"): + return value.detach().cpu().numpy() + if hasattr(value, "cpu") and hasattr(value, "numpy"): + return value.cpu().numpy() + return np.asarray(value) + + +def _empty_result(height: int, width: int) -> dict[str, np.ndarray]: + """Create an empty result dictionary compatible with BBoxMaskPose.visualize().""" + return { + "bboxes": np.zeros((0, 4), dtype=np.float32), + "masks": np.zeros((0, height, width), dtype=np.uint8), + "keypoints": np.zeros((0, 17, 3), dtype=np.float32), + "presence": np.zeros((0, 17), dtype=np.float32), + "visibility": np.zeros((0, 17), dtype=np.float32), + } + + +def _normalize_result(result: dict, height: int, width: int) -> dict[str, np.ndarray]: + """Normalize prediction dictionary into a robust shape for visualization.""" + bboxes = _to_numpy(result.get("bboxes")) + keypoints = _to_numpy(result.get("keypoints")) + masks = _to_numpy(result.get("masks")) + presence = _to_numpy(result.get("presence")) + visibility = _to_numpy(result.get("visibility")) + + if bboxes is None: + bboxes = np.zeros((0, 4), dtype=np.float32) + bboxes = np.asarray(bboxes, dtype=np.float32).reshape(-1, 4) + num_instances = bboxes.shape[0] + + if keypoints is None: + keypoints = np.zeros((num_instances, 17, 3), dtype=np.float32) + keypoints = np.asarray(keypoints, dtype=np.float32) + if keypoints.ndim == 2: + keypoints = keypoints[None, ...] + if keypoints.shape[0] != num_instances: + keypoints = np.zeros((num_instances, 17, 3), dtype=np.float32) + if keypoints.shape[1] > 17: + keypoints = keypoints[:, :17, :] + if keypoints.shape[1] < 17: + padded = np.zeros((num_instances, 17, 3), dtype=np.float32) + padded[:, : keypoints.shape[1], : min(keypoints.shape[2], 3)] = keypoints[:, :, :3] + keypoints = padded + if keypoints.shape[2] == 2: + scores = np.ones((num_instances, 17, 1), dtype=np.float32) + keypoints = np.concatenate([keypoints, scores], axis=-1) + elif keypoints.shape[2] > 3: + keypoints = keypoints[:, :, :3] + + if masks is None: + masks = np.zeros((num_instances, height, width), dtype=np.uint8) + masks = np.asarray(masks) + if masks.ndim == 2: + masks = masks[None, ...] 
+ if masks.ndim == 4 and masks.shape[-1] == 1: + masks = masks.squeeze(-1) + if masks.shape[0] != num_instances: + masks = np.zeros((num_instances, height, width), dtype=np.uint8) + masks = masks.astype(np.uint8) + + if presence is None: + presence = keypoints[:, :, 2] + presence = np.asarray(presence, dtype=np.float32).reshape(num_instances, -1) + if presence.shape[1] > 17: + presence = presence[:, :17] + if presence.shape[1] < 17: + padded_presence = np.zeros((num_instances, 17), dtype=np.float32) + padded_presence[:, : presence.shape[1]] = presence + presence = padded_presence + + if visibility is None: + visibility = keypoints[:, :, 2] + visibility = np.asarray(visibility, dtype=np.float32).reshape(num_instances, -1) + if visibility.shape[1] > 17: + visibility = visibility[:, :17] + if visibility.shape[1] < 17: + padded_visibility = np.zeros((num_instances, 17), dtype=np.float32) + padded_visibility[:, : visibility.shape[1]] = visibility + visibility = padded_visibility + + return { + "bboxes": bboxes, + "masks": masks, + "keypoints": keypoints, + "presence": presence, + "visibility": visibility, + } + + +def _extract_baseline_result(intermediates: Any, fallback_result: dict, height: int, width: int) -> dict[str, np.ndarray]: + """Build baseline result from first intermediate pose output.""" + if not intermediates: + return _normalize_result(fallback_result, height, width) + + first_intermediate = intermediates[0] if len(intermediates) > 0 else None + if first_intermediate is None: + return _normalize_result(fallback_result, height, width) + + pose_instances = first_intermediate.get("poses") + if pose_instances is None: + return _normalize_result(fallback_result, height, width) + + result = { + "bboxes": _to_numpy(getattr(pose_instances, "bboxes", None)), + "keypoints": _to_numpy(getattr(pose_instances, "keypoints", None)), + "masks": _to_numpy(getattr(pose_instances, "masks", None)), + "presence": _to_numpy(getattr(pose_instances, "keypoint_prob", None)), + "visibility": _to_numpy(getattr(pose_instances, "keypoint_vis", None)), + } + return _normalize_result(result, height, width) + + +def _blend_pose_and_mask(model: BBoxMaskPose, image_bgr: np.ndarray, result: dict[str, np.ndarray]) -> np.ndarray: + """Render pose and mask overlays, then blend them into one image.""" + pose_vis = model.visualize(image=image_bgr, result=result, vis_type="pose") + mask_vis = model.visualize(image=image_bgr, result=result, vis_type="mask") + return cv2.addWeighted(pose_vis, 0.5, mask_vis, 0.5, 0.0) + + +def _get_bmp_model(config_name: str = "bmp_v2", device: str = "cuda:0") -> BBoxMaskPose: + """Lazily initialize and reuse BBoxMaskPose model.""" + global bmp_model, bmp_model_config, bmp_model_device + + should_rebuild = ( + bmp_model is None + or bmp_model_config != config_name + or bmp_model_device != device ) - - return det_model, pose_model, sam2_model + if should_rebuild: + try: + bmp_model = BBoxMaskPose(config=config_name, device=device) + bmp_model_config = config_name + bmp_model_device = device + except Exception as exc: + raise RuntimeError( + f"Failed to initialize BBoxMaskPose with config='{config_name}', device='{device}'." + ) from exc + + return bmp_model @spaces.GPU(duration=60) def process_image_with_BMP( img: np.ndarray ) -> tuple[np.ndarray, np.ndarray]: """ - Run the full BMP pipeline on a single image: detection, pose, SAM mask refinement, and visualization. - - Args: - args (Namespace): Parsed CLI arguments. - bmp_config (DotDict): Configuration parameters. 
- img_path (Path): Path to the input image. - detector: Primary MMDetection model. - detector_prime: Secondary MMDetection model for iterations. - pose_estimator: MMPose model for keypoint estimation. - sam2_model: SAM model for mask refinement. - Returns: - InstanceData: Final merged detections and refined masks. + Run BMP inference using the public BBoxMaskPose API. + + The function keeps the original Gradio interface: + - output 1: baseline-style result from first intermediate pass + - output 2: final BMP-refined result """ - bmp_config = _parse_yaml_config(Path("configs/bmp_D3.yaml")) - load_models(bmp_config) - - # img: RGB -> BGR - img = img[..., ::-1] - - img_for_detection = img.copy() - rtmdet_result = None - all_detections = None - for iteration in range(bmp_config.num_bmp_iters): - - # Step 1: Detection - det_instances = run_MMDetector( - det_model, - img_for_detection, - det_cat_id=DEFAULT_CAT_ID, - bbox_thr=DEFAULT_BBOX_THR, - nms_thr=DEFAULT_NMS_THR, - ) - if len(det_instances.bboxes) == 0: - continue - - # Step 2: Pose estimation - pose_instances = run_MMPose( - pose_model, - img.copy(), - detections=det_instances, - kpt_thr=DEFAULT_KPT_THR, - ) - - # Restrict to first 17 COCO keypoints - pose_instances.keypoints = pose_instances.keypoints[:, :17, :] - pose_instances.keypoint_scores = pose_instances.keypoint_scores[:, :17] - pose_instances.keypoints = np.concatenate( - [pose_instances.keypoints, pose_instances.keypoint_scores[:, :, None]], axis=-1 - ) - - # Step 3: Pose-NMS and SAM refinement - all_keypoints = ( - pose_instances.keypoints - if all_detections is None - else np.concatenate([all_detections.keypoints, pose_instances.keypoints], axis=0) - ) - all_bboxes = ( - pose_instances.bboxes - if all_detections is None - else np.concatenate([all_detections.bboxes, pose_instances.bboxes], axis=0) - ) - num_valid_kpts = np.sum(all_keypoints[:, :, 2] > bmp_config.sam2.prompting.confidence_thr, axis=1) - keep_indices = pose_nms( - DotDict({"confidence_thr": bmp_config.sam2.prompting.confidence_thr, "oks_thr": bmp_config.oks_nms_thr}), - image_kpts=all_keypoints, - image_bboxes=all_bboxes, - num_valid_kpts=num_valid_kpts, - ) - keep_indices = sorted(keep_indices) # Sort by original index - num_old_detections = 0 if all_detections is None else len(all_detections.bboxes) - keep_new_indices = [i - num_old_detections for i in keep_indices if i >= num_old_detections] - keep_old_indices = [i for i in keep_indices if i < num_old_detections] - if len(keep_new_indices) == 0: - continue - # filter new detections and compute scores - new_dets = filter_instances(pose_instances, keep_new_indices) - new_dets.scores = pose_instances.keypoint_scores[keep_new_indices].mean(axis=-1) - old_dets = None - if len(keep_old_indices) > 0: - old_dets = filter_instances(all_detections, keep_old_indices) - - new_detections = process_image_with_SAM( - DotDict(bmp_config.sam2.prompting), - img.copy(), - sam2_model, - new_dets, - old_dets if old_dets is not None else None, - ) - - # Merge detections - if all_detections is None: - all_detections = new_detections - else: - all_detections = concat_instances(all_detections, new_dets) - - # Step 4: Visualization - img_for_detection, rtmdet_r, _ = visualize_demo( - img.copy(), - all_detections, - ) - - if iteration == 0: - rtmdet_result = rtmdet_r - - _, _, bmp_result = visualize_demo( - img.copy(), - all_detections, + if img is None: + raise ValueError("Input image is None.") + + # Gradio image is RGB; BMP API expects BGR. 
+ img_bgr = img[..., ::-1].copy() + height, width = img_bgr.shape[:2] + + model = _get_bmp_model(config_name=bmp_model_config, device=bmp_model_device) + final_result = model.predict( + image=img_bgr, + bboxes=None, + return_intermediates=True, ) + normalized_final = _normalize_result(final_result, height, width) + + # No-detection robustness: return original image for both outputs. + if normalized_final["bboxes"].shape[0] == 0: + original_rgb = img_bgr[..., ::-1] + return original_rgb, original_rgb + + intermediates = final_result.get("intermediates", []) + baseline_result = _extract_baseline_result(intermediates, normalized_final, height, width) - # img: BGR -> RGB - rtmdet_result = rtmdet_result[..., ::-1] - bmp_result = bmp_result[..., ::-1] + baseline_vis = _blend_pose_and_mask(model, img_bgr, baseline_result) + bmp_vis = _blend_pose_and_mask(model, img_bgr, normalized_final) - return rtmdet_result, bmp_result + # BGR -> RGB for Gradio + return baseline_vis[..., ::-1], bmp_vis[..., ::-1] with gr.Blocks() as app: @@ -281,4 +297,4 @@ with gr.Blocks() as app: ) # Launch the demo -app.launch() \ No newline at end of file +app.launch() diff --git a/bboxmaskpose/__init__.py b/bboxmaskpose/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..2740700360b9f49c9b9cab705aedcf2668db5583 --- /dev/null +++ b/bboxmaskpose/__init__.py @@ -0,0 +1,10 @@ +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. +""" +BBoxMaskPose package - Public API for detection, pose estimation, and segmentation. + +This package provides a stable wrapper for the full BBoxMaskPose pipeline. +""" + +from .api import BBoxMaskPose + +__all__ = ["BBoxMaskPose"] diff --git a/bboxmaskpose/api.py b/bboxmaskpose/api.py new file mode 100644 index 0000000000000000000000000000000000000000..a0e898c88950c4e8aff6c7dedf8ab4fc5eb88a0b --- /dev/null +++ b/bboxmaskpose/api.py @@ -0,0 +1,515 @@ +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. + +""" +Public API for BBoxMaskPose wrapper. + +This module provides a stable, user-friendly interface for the full +BBoxMaskPose pipeline: detection, pose estimation, and mask refinement. +""" + +import glob +import os +from pathlib import Path +from typing import Dict, List, Optional, Union + +import cv2 +import mmengine +import numpy as np +import yaml +from mmdet.apis import inference_detector, init_detector +from mmengine.structures import InstanceData + +from .demo_utils import DotDict, _visualize_predictions, concat_instances, filter_instances, pose_nms +from .posevis_lite import pose_visualization + +# Import from BBoxMaskPose package +from .sam2_utils import prepare_model as prepare_sam2_model, process_image_with_SAM + +BMP_ROOT = Path(__file__).parent.parent + +# Note: PMPose will be imported when needed to avoid circular imports +# from pmpose import PMPose + +# Default detector and pose config +DEFAULT_DET_CAT_ID: int = 0 +DEFAULT_BBOX_THR: float = 0.3 +DEFAULT_NMS_THR: float = 0.3 +DEFAULT_KPT_THR: float = 0.3 + +# Pretrained config URLs (for future use) +PRETRAINED_CONFIGS = { + "bmp-d3": "BMP_D3", + "bmp-j1": "BMP_J1", +} + + +class BBoxMaskPose: + """ + Public wrapper API for BBoxMaskPose pipeline. + + This class provides a complete pipeline for detection, pose estimation, + and mask refinement using SAM2. + + Example: + >>> bmp_model = BBoxMaskPose(config="BMP_D3", device="cuda") + >>> result = bmp_model.predict( + ... image="path/to/image.jpg", + ... return_intermediates=True + ... 
)
+        """
+
+    def __init__(
+        self,
+        config: str = "bmp_D3",
+        device: str = "cuda",
+        config_path: Optional[str] = None,
+        pose_model=None,  # Type hint removed to avoid import at module level
+        pretrained_id: Optional[str] = None,
+        n_kpts_to_work_with: Optional[int] = 17,
+    ):
+        """
+        Initialize BBoxMaskPose model.
+
+        Args:
+            config (str): Config alias ('bmp_D3', 'bmp_J1', 'bmp_v2'). Defaults to 'bmp_D3'.
+            device (str): Device for inference. Defaults to 'cuda'.
+            config_path (str, optional): Path to custom YAML config file.
+            pose_model (PMPose, optional): Pre-initialized PMPose instance.
+                If None, will create internal pose model.
+            pretrained_id (str, optional): Alias for pretrained config.
+            n_kpts_to_work_with (int, optional): Number of keypoints to work with.
+                Defaults to 17 (COCO keypoints).
+        """
+        self.device = device
+        self.config_name = config
+
+        # Hard-code 17 keypoints, as no experiments were done with other values.
+        # Keep the argument for future flexibility, but ignore it for now.
+        self.n_kpts_to_work_with = 17
+        if n_kpts_to_work_with != 17:
+            print(
+                f"Warning: n_kpts_to_work_with is set to {n_kpts_to_work_with}, but currently only 17 keypoints are supported. Ignoring this argument for now."
+            )
+
+        # Determine config path. The configs root is also needed for the error
+        # message below, so compute it regardless of whether a custom path is given.
+        bmp_configs_root = os.path.join(BMP_ROOT, "bboxmaskpose", "configs")
+        if config_path is not None:
+            self.config_path = config_path
+        else:
+            config_file = f"{config}.yaml"
+            self.config_path = os.path.join(bmp_configs_root, config_file)
+
+        if not os.path.exists(self.config_path):
+            available_configs = glob.glob(os.path.join(bmp_configs_root, "*.yaml"))
+            available_configs = [os.path.basename(f).replace(".yaml", "") for f in available_configs]
+            raise FileNotFoundError(f"Config file not found: {self.config_path}. " f"Available configs: {', '.join(available_configs)}")
+
+        # Load config
+        self.config = self._load_config(self.config_path)
+
+        # Initialize or use provided pose model
+        if pose_model is not None:
+            self.pose_model = pose_model
+            self._owns_pose_model = False
+        else:
+            # Create internal PMPose instance
+            self.pose_model = self._create_pose_model()
+            self._owns_pose_model = True
+
+        # Initialize detector and SAM2
+        self.detector = None
+        self.detector_prime = None
+        self.sam2_model = None
+        self._initialize_models()
+
+    def _load_config(self, config_path: str) -> DotDict:
+        """Load BMP configuration from YAML file."""
+        with open(config_path, "r") as f:
+            cfg = yaml.safe_load(f)
+        return DotDict(cfg)
+
+    def _create_pose_model(self):
+        """Create internal PMPose model from config."""
+        # Import PMPose here to avoid circular imports
+        from pmpose import PMPose
+
+        # Extract pose config from BMP config
+        pose_config = self.config.pose_estimator.pose_config
+        pose_checkpoint = self.config.pose_estimator.pose_checkpoint
+
+        # Create PMPose instance with custom config
+        full_pose_config = str(BMP_ROOT / pose_config)
+
+        pose_model = PMPose(
+            device=self.device,
+            config_path=full_pose_config,
+            from_pretrained=True,
+        )
+
+        # Load checkpoint if it's a local path
+        if not pose_checkpoint.startswith("http"):
+            pose_model.load_from_file(pose_checkpoint)
+
+        return pose_model
+
+    def _initialize_models(self):
+        """Initialize detector and SAM2 models."""
+        # Initialize detector
+        self.detector = init_detector(self.config.detector.det_config, self.config.detector.det_checkpoint, device=self.device)
+
+        # Adapt detector pipeline
+        from mmpose.utils import adapt_mmdet_pipeline
+
+        self.detector.cfg = adapt_mmdet_pipeline(self.detector.cfg)
+
+        # Initialize detector prime
(may be same as detector) + if ( + self.config.detector.det_config == self.config.detector.det_prime_config + and self.config.detector.det_checkpoint == self.config.detector.det_prime_checkpoint + ) or (self.config.detector.det_prime_config is None or self.config.detector.det_prime_checkpoint is None): + self.detector_prime = self.detector + else: + self.detector_prime = init_detector( + self.config.detector.det_prime_config, self.config.detector.det_prime_checkpoint, device=self.device + ) + self.detector_prime.cfg = adapt_mmdet_pipeline(self.detector_prime.cfg) + + # Initialize SAM2 + sam2_config_path = os.path.join(BMP_ROOT, "bboxmaskpose", "sam2", self.config.sam2.sam2_config) + self.sam2_model = prepare_sam2_model( + model_cfg=sam2_config_path, + model_checkpoint=self.config.sam2.sam2_checkpoint, + ) + + def predict( + self, + image: Union[str, np.ndarray], + bboxes: Optional[np.ndarray] = None, + return_intermediates: bool = False, + return_probmaps: bool = False, + ) -> Dict: + """ + Run full BBoxMaskPose pipeline on image. + + Args: + image: Image path (str) or BGR numpy array. + bboxes: Optional (N, 4) bboxes in [x1, y1, x2, y2] format. + If None, run detector. + return_intermediates: If True, return intermediate outputs. + return_probmaps: If True, request heatmaps from pose model. + + Returns: + Dict with keys: + - 'bboxes': (N, 4) final bounding boxes + - 'masks': (N, H, W) refined binary masks + - 'keypoints': (N, K, 3) keypoints with scores + - 'presence': (N, K) presence probabilities + - 'visibility': (N, K) visibility flags + - 'detector': (optional) raw detector outputs + - 'sam2': (optional) intermediate SAM outputs + """ + # Load image + if isinstance(image, str): + img = cv2.imread(image) + if img is None: + raise ValueError(f"Failed to load image from {image}") + else: + img = image.copy() + + # Run BMP iterations + all_detections = None + intermediate_results = [] if return_intermediates else None + + for iteration in range(self.config.num_bmp_iters): + # Step 1: Detection + if iteration == 0 and bboxes is not None: + # Use provided bboxes for first iteration + det_instances = InstanceData(bboxes=bboxes, bbox_scores=np.ones(len(bboxes)), masks=None) + else: + # Run detector + det_instances = self._run_detector( + self.detector if iteration == 0 else self.detector_prime, + img if all_detections is None else self._mask_out_image(img, all_detections), + ) + + if len(det_instances.bboxes) == 0: + continue + + # Step 2: Pose estimation using PMPose wrapper + pose_results = self._run_pose_estimation(img, det_instances, return_probmaps=return_probmaps) + + # Step 3: Pose NMS and SAM refinement + new_detections, old_detections = self._refine_with_sam( + img, + pose_results, + all_detections, + ) + + # Merge detections + if all_detections is None: + all_detections = new_detections + else: + all_detections = concat_instances(old_detections, new_detections) + + # Store intermediates if requested + if return_intermediates: + intermediate_results.append( + { + "iteration": iteration, + "detections": det_instances, + "poses": pose_results, + "refined": new_detections, + } + ) + + # Prepare final result + result = self._format_result(all_detections, img.shape[:2]) + + if return_intermediates: + result["intermediates"] = intermediate_results + + return result + + def _run_detector( + self, + detector, + img: np.ndarray, + ) -> InstanceData: + """Run MMDetection detector.""" + from mmpose.evaluation.functional import nms + + # Run detection + det_result = inference_detector(detector, 
img) + pred_instances = det_result.pred_instances.cpu().numpy() + + # Aggregate bboxes and scores + bboxes_all = np.concatenate((pred_instances.bboxes, pred_instances.scores[:, None]), axis=1) + + # Filter by category and score + keep_mask = np.logical_and(pred_instances.labels == DEFAULT_DET_CAT_ID, pred_instances.scores > DEFAULT_BBOX_THR) + + if not np.any(keep_mask): + return InstanceData(bboxes=np.zeros((0, 4)), bbox_scores=np.zeros((0,)), masks=np.zeros((0, 1, 1))) + + bboxes = bboxes_all[keep_mask] + masks = getattr(pred_instances, "masks", None) + if masks is not None: + masks = masks[keep_mask] + + # Sort by score + order = np.argsort(bboxes[:, 4])[::-1] + bboxes = bboxes[order] + if masks is not None: + masks = masks[order] + + # Apply NMS + keep_indices = nms(bboxes, DEFAULT_NMS_THR) + bboxes = bboxes[keep_indices] + if masks is not None: + masks = masks[keep_indices] + + return InstanceData(bboxes=bboxes[:, :4], bbox_scores=bboxes[:, 4], masks=masks) + + def _run_pose_estimation( + self, + img: np.ndarray, + det_instances: InstanceData, + return_probmaps: bool = False, + ) -> InstanceData: + """Run pose estimation using PMPose wrapper.""" + bboxes = det_instances.bboxes + masks = det_instances.masks + + if len(bboxes) == 0: + return InstanceData( + keypoints=np.zeros((0, self.n_kpts_to_work_with, 3)), + keypoint_scores=np.zeros((0, self.n_kpts_to_work_with)), + bboxes=bboxes, + bbox_scores=det_instances.bbox_scores, + masks=masks, + ) + + # Call PMPose public API + keypoints, probabilities, visibilities, heatmaps = self.pose_model.predict( + img, + bboxes, + masks=masks, + return_probmaps=return_probmaps, + ) + + # Restrict to first 17 COCO keypoints + keypoints = keypoints[:, : self.n_kpts_to_work_with, :] + probabilities = probabilities[:, : self.n_kpts_to_work_with] + visibilities = visibilities[:, : self.n_kpts_to_work_with] + + if heatmaps is not None: + heatmaps = heatmaps[:, : self.n_kpts_to_work_with, :, :] + + # Create InstanceData with results + result = InstanceData( + keypoints=keypoints, + keypoint_scores=keypoints[:, :, 2], + bboxes=bboxes, + bbox_scores=det_instances.bbox_scores, + masks=masks, + keypoint_vis=visibilities, + keypoint_prob=probabilities, + ) + + if return_probmaps and heatmaps is not None: + result.heatmaps = heatmaps + + return result + + def _refine_with_sam( + self, + img: np.ndarray, + pose_instances: InstanceData, + all_detections: Optional[InstanceData], + ) -> tuple: + """Perform Pose-NMS and SAM refinement.""" + # Combine keypoints with scores + keypoints_with_scores = pose_instances.keypoints + + # Perform Pose-NMS + all_keypoints = ( + keypoints_with_scores if all_detections is None else np.concatenate([all_detections.keypoints, keypoints_with_scores], axis=0) + ) + all_bboxes = ( + pose_instances.bboxes if all_detections is None else np.concatenate([all_detections.bboxes, pose_instances.bboxes], axis=0) + ) + + num_valid_kpts = np.sum(all_keypoints[:, :, 2] > self.config.sam2.prompting.confidence_thr, axis=1) + + keep_indices = pose_nms( + DotDict({"confidence_thr": self.config.sam2.prompting.confidence_thr, "oks_thr": self.config.oks_nms_thr}), + image_kpts=all_keypoints, + image_bboxes=all_bboxes, + num_valid_kpts=num_valid_kpts, + ) + + keep_indices = sorted(keep_indices) + num_old_detections = 0 if all_detections is None else len(all_detections.bboxes) + keep_new_indices = [i - num_old_detections for i in keep_indices if i >= num_old_detections] + keep_old_indices = [i for i in keep_indices if i < num_old_detections] + + if 
len(keep_new_indices) == 0: + return None, all_detections + + # Filter new detections + new_dets = filter_instances(pose_instances, keep_new_indices) + new_dets.scores = pose_instances.keypoint_scores[keep_new_indices].mean(axis=-1) + + old_dets = None + if len(keep_old_indices) > 0: + old_dets = filter_instances(all_detections, keep_old_indices) + + # Run SAM refinement + new_detections = process_image_with_SAM( + DotDict(self.config.sam2.prompting), + img.copy(), + self.sam2_model, + new_dets, + old_dets if old_dets is not None else None, + ) + + return new_detections, old_dets + + def _mask_out_image( + self, + img: np.ndarray, + detections: InstanceData, + ) -> np.ndarray: + """Mask out detected instances from image for next iteration.""" + masked_img = img.copy() + if hasattr(detections, "refined_masks") and detections.refined_masks is not None: + for mask in detections.refined_masks: + if mask is not None: + masked_img[mask.astype(bool)] = 0 + return masked_img + + def _format_result( + self, + detections: Optional[InstanceData], + img_shape: tuple, + ) -> Dict: + """Format detection results into standard output dict.""" + if detections is None or len(detections.bboxes) == 0: + return { + "bboxes": np.zeros((0, 4)), + "masks": np.zeros((0, img_shape[0], img_shape[1])), + "keypoints": np.zeros((0, 17, 3)), + "presence": np.zeros((0, 17)), + "visibility": np.zeros((0, 17)), + } + + # Extract refined masks if available + if hasattr(detections, "refined_masks") and detections.refined_masks is not None: + masks = detections.refined_masks + elif hasattr(detections, "pred_masks") and detections.pred_masks is not None: + masks = detections.pred_masks + elif hasattr(detections, "masks") and detections.masks is not None: + masks = detections.masks + else: + masks = np.zeros((len(detections.bboxes), img_shape[0], img_shape[1])) + + return { + "bboxes": detections.bboxes, + "masks": masks, + "keypoints": detections.keypoints, + "presence": detections.keypoint_prob, + "visibility": detections.keypoint_vis, + } + + def visualize( + self, + image: Union[str, np.ndarray], + result: Dict, + save_path: Optional[str] = None, + vis_type: str = "pose", + ) -> np.ndarray: + """ + Visualize BBoxMaskPose results on image. + + Args: + image: Image path (str) or BGR numpy array. + result: Result dict from predict(). + save_path: Optional path to save visualization. + vis_type: Type of visualization ("pose" or "mask"). + Returns: + np.ndarray: Visualization image (BGR). 
+        """
+        # Load image
+        if isinstance(image, str):
+            img = cv2.imread(image)
+            if img is None:
+                raise ValueError(f"Failed to load image from {image}")
+        else:
+            img = image.copy()
+
+        if vis_type == "mask":
+            vis_img, _ = _visualize_predictions(
+                img,
+                bboxes=result["bboxes"],
+                scores=np.ones(len(result["bboxes"])),
+                masks=result["masks"],
+                poses=result["keypoints"],
+                vis_type="mask",
+                mask_is_binary=True,
+            )
+            img = vis_img
+        else:
+            # Visualize using posevis_lite
+            keypoints = result["keypoints"]
+            keypoints = keypoints[:, :17, :]  # Use first 17 COCO keypoints
+            img = pose_visualization(
+                img,
+                keypoints,
+                width_multiplier=8,
+                differ_individuals=True,
+                keep_image_size=True,
+            )
+
+        # Save if requested
+        if save_path is not None:
+            cv2.imwrite(save_path, img)
+
+        return img
diff --git a/configs/README.md b/bboxmaskpose/configs/README.md
similarity index 100%
rename from configs/README.md
rename to bboxmaskpose/configs/README.md
diff --git a/configs/bmp_D3.yaml b/bboxmaskpose/configs/bmp_D3.yaml
similarity index 67%
rename from configs/bmp_D3.yaml
rename to bboxmaskpose/configs/bmp_D3.yaml
index a537f13ee67cbba1ba0b32e8b00d4a574a46fe19..9cf6f207b5cf157394232761724375851387e51b 100644
--- a/configs/bmp_D3.yaml
+++ b/bboxmaskpose/configs/bmp_D3.yaml
@@ -1,3 +1,8 @@
+######################################################################################
+### THIS CONFIG IS DEPRECATED AND KEPT ONLY FOR REPRODUCTION OF BMPv1 EXPERIMENTS. ###
+### FOR BMPv2 EXPERIMENTS, PLEASE USE THE bmp_v2.yaml CONFIG.                      ###
+######################################################################################
+
 # BBoxMaskPose Hyperparameters from Experiment D3.
 # For details, see the paper: https://arxiv.org/abs/2412.01562, Tab 8. in the supplementary.
 
@@ -11,8 +16,10 @@ detector:
   det_prime_checkpoint: null
 
 pose_estimator:
-  pose_config: 'mmpose/configs/MaskPose/ViTb-multi_mask.py'
-  pose_checkpoint: 'https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/MaskPose-b.pth'
+  pose_config: 'mmpose/configs/ProbMaskPose/PMPose-b-1.0.0.py'
+  pose_checkpoint: 'models/pose_estimators/PMPose-b-1.0.0.pth'
+  # pose_config: 'mmpose/configs/MaskPose/ViTb-multi_mask.py'
+  # pose_checkpoint: 'https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/MaskPose-b.pth'
 
 sam2:
   sam2_config: 'configs/samurai/sam2.1_hiera_b+.yaml' # Use SAMURAI as it has img_size 1024 (SAM-2.1 has 512)
diff --git a/configs/bmp_J1.yaml b/bboxmaskpose/configs/bmp_J1.yaml
similarity index 82%
rename from configs/bmp_J1.yaml
rename to bboxmaskpose/configs/bmp_J1.yaml
index 51a43715a877f084e531356d1a02d4413dc776ce..4516a1880c6e27ef07c373c1ba310470a2cb93c5 100644
--- a/configs/bmp_J1.yaml
+++ b/bboxmaskpose/configs/bmp_J1.yaml
@@ -1,3 +1,8 @@
+######################################################################################
+### THIS CONFIG IS DEPRECATED AND KEPT ONLY FOR REPRODUCTION OF BMPv1 EXPERIMENTS. ###
+### FOR BMPv2 EXPERIMENTS, PLEASE USE THE bmp_v2.yaml CONFIG.                      ###
+######################################################################################
+
 # BBoxMaskPose Hyperparameters from Experiment J1.
 # For details, see the paper: https://arxiv.org/abs/2412.01562, Tab 8. in the supplementary.
diff --git a/bboxmaskpose/configs/bmp_v2.yaml b/bboxmaskpose/configs/bmp_v2.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..f90b25ea951548e1bed320a19c378ba1c7c9f56e
--- /dev/null
+++ b/bboxmaskpose/configs/bmp_v2.yaml
@@ -0,0 +1,34 @@
+# This configuration works well for the BMP loop and was used for most of the experiments.
+detector:
+  det_config: 'mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_8xb32-300e_coco.py'
+  det_checkpoint: 'https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/rtmdet-ins-l-mask.pth'
+
+  # Detectors D and D' could be different.
+  det_prime_config: null
+  det_prime_checkpoint: null
+
+pose_estimator:
+  pose_config: 'mmpose/configs/ProbMaskPose/PMPose-b-1.0.0.py'
+  pose_checkpoint: 'https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/PMPose/PMPose-b-1.0.0.pth'
+
+sam2:
+  sam2_config: 'configs/sam-pose2seg/sam-pose2seg_hiera_b+.yaml'
+  sam2_checkpoint: 'https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/SAM-pose2seg_hiera_b%2B.pt'
+  prompting:
+    batch: False
+    use_bbox: False
+    num_pos_keypoints: 3
+    num_pos_keypoints_if_crowd: 3
+    num_neg_keypoints: 0
+    confidence_thr: 0.5 # not used
+    visibility_thr: 0.5 # not used
+    selection_method: 'k_most_visible'
+    extend_bbox: False
+    pose_mask_consistency: False
+    crowd_by_max_iou: False # Determines whether the instance is in a multi-body scenario. If yes, use a different number of keypoints and NO BBOX. If no, use the bbox according to the 'use_bbox' argument.
+    crop: False
+    exclusive_masks: True
+    ignore_small_bboxes: False
+
+num_bmp_iters: 2
+oks_nms_thr: 0.8
\ No newline at end of file
diff --git a/demo/demo_utils.py b/bboxmaskpose/demo_utils.py
similarity index 84%
rename from demo/demo_utils.py
rename to bboxmaskpose/demo_utils.py
index eb3285eac703bbdc56f1a090c6b568994210c679..4877ced3f14a414db86e87c4dff54ad535fe55df 100644
--- a/demo/demo_utils.py
+++ b/bboxmaskpose/demo_utils.py
@@ -1,3 +1,4 @@
+# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved.
""" Utilities for the BMP demo: - Visualization of detections, masks, and poses @@ -18,9 +19,10 @@ import numpy as np from mmengine.logging import print_log from mmengine.structures import InstanceData from pycocotools import mask as Mask -from sam2.distinctipy import get_colors from tqdm import tqdm +from bboxmaskpose.sam2.distinctipy import get_colors + ### Visualization hyperparameters MIN_CONTOUR_AREA: int = 50 BBOX_WEIGHT: float = 0.9 @@ -38,6 +40,21 @@ except ImportError: from .posevis_lite import pose_visualization +WHITELIST_ATTRIBUTES = [ + "bboxes", + "bbox_scores", + "keypoints", + "keypoint_scores", + "scores", + "pred_masks", + "refined_masks", + "sam_scores", + "sam_kpts", + "keypoint_vis", + "keypoint_prob", +] + + class DotDict(dict): """Dictionary with attribute access and nested dict wrapping.""" @@ -68,17 +85,7 @@ def filter_instances(instances: InstanceData, indices): return None data = {} # Attributes to filter - for attr in [ - "bboxes", - "bbox_scores", - "keypoints", - "keypoint_scores", - "scores", - "pred_masks", - "refined_masks", - "sam_scores", - "sam_kpts", - ]: + for attr in WHITELIST_ATTRIBUTES: if hasattr(instances, attr): arr = getattr(instances, attr) data[attr] = arr[indices] if arr is not None else None @@ -95,17 +102,7 @@ def concat_instances(instances1: InstanceData, instances2: InstanceData): if instances2 is None: return instances1 data = {} - for attr in [ - "bboxes", - "bbox_scores", - "keypoints", - "keypoint_scores", - "scores", - "pred_masks", - "refined_masks", - "sam_scores", - "sam_kpts", - ]: + for attr in WHITELIST_ATTRIBUTES: arr1 = getattr(instances1, attr, None) arr2 = getattr(instances2, attr, None) if arr1 is None and arr2 is None: @@ -145,43 +142,20 @@ def _visualize_predictions( """ vis_types = vis_type.split("+") - # # Filter-out small detections to make the visualization more clear - # new_bboxes = [] - # new_scores = [] - # new_masks = [] - # new_poses = [] - # size_thr = img.shape[0] * img.shape[1] * 0.01 - # for bbox, score, mask, pose in zip(bboxes, scores, masks, poses): - # area = mask.sum() # Assume binary mask. 
OK for demo purposes - # if area > size_thr: - # new_bboxes.append(bbox) - # new_scores.append(score) - # new_masks.append(mask) - # new_poses.append(pose) - # bboxes = np.array(new_bboxes) - # scores = np.array(new_scores) - # masks = new_masks - # poses = new_poses - + # Exclude white, black, and green colors from the palette as they are not distinctive + colors = (np.array(get_colors(len(bboxes), exclude_colors=[(0, 1, 0), (0, 0, 0), (1, 1, 1)], rng=0)) * 255).astype(int) + if mask_is_binary: poly_masks: List[Optional[List[np.ndarray]]] = [] for binary_mask in masks: if binary_mask is not None: - contours, _ = cv2.findContours( - (binary_mask * 255).astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE - ) + contours, _ = cv2.findContours((binary_mask * 255).astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) polys = [cnt.flatten() for cnt in contours if cv2.contourArea(cnt) >= MIN_CONTOUR_AREA] else: polys = None poly_masks.append(polys) masks = poly_masks # type: ignore - # Exclude white, black, and green colors from the palette as they are not distinctive - colors = (np.array(get_colors(len(bboxes), exclude_colors=[(0, 1, 0), (.5, .5, .5), (0, 0, 0), (1, 1, 1)], rng=0)) * 255).astype( - int - ) - - if "inv-mask" in vis_types: stencil = np.zeros_like(img) @@ -272,9 +246,7 @@ def visualize_itteration( label = "BMP {:d}x: {}".format(iteration_idx + 1, vis_def["label"]) cv2.putText(vis_img, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3) cv2.putText(vis_img, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 2) - out_path = os.path.join( - output_root, "{}_iter{}_{}.jpg".format(img_name, iteration_idx + 1, vis_def["label"].replace(" ", "_")) - ) + out_path = os.path.join(output_root, "{}_iter{}_{}.jpg".format(img_name, iteration_idx + 1, vis_def["label"].replace(" ", "_"))) cv2.imwrite(str(out_path), vis_img) # Show prompting keypoints @@ -311,43 +283,6 @@ def visualize_itteration( return masked_out -def visualize_demo( - img: np.ndarray, detections: Any, -) -> Optional[np.ndarray]: - """ - Generate and save visualization images for each BMP iteration. - - Args: - img (np.ndarray): Original input image. - detections: InstanceData containing bboxes, scores, masks, keypoints. - iteration_idx (int): Current iteration index (0-based). - output_root (Path): Directory to save output images. - img_name (str): Base name of the image without extension. - with_text (bool): Whether to overlay text labels. - - Returns: - Optional[np.ndarray]: The masked-out image if generated, else None. 
- """ - bboxes = detections.bboxes - scores = detections.scores - pred_masks = detections.pred_masks - refined_masks = detections.refined_masks - keypoints = detections.keypoints - - returns = [] - for vis_def in [ - {"type": "mask-out", "masks": refined_masks, "label": ""}, - {"type": "mask+pose", "masks": pred_masks, "label": "RTMDet-L"}, - {"type": "mask+pose", "masks": refined_masks, "label": "BMP"}, - ]: - vis_img, colors = _visualize_predictions( - img.copy(), bboxes, scores, vis_def["masks"], keypoints, vis_type=vis_def["type"], mask_is_binary=True - ) - returns.append(vis_img) - - return returns - - def create_GIF( img_path: Path, output_root: Path, @@ -419,7 +354,6 @@ def create_GIF( # Add 'before' and 'after' images after1_img = os.path.join(dirname, "{}_iter{}_Final_Poses.jpg".format(img_name_wo_ext, bmp_x)) after2_img = os.path.join(dirname, "{}_iter{}_SAM_Masks.jpg".format(img_name_wo_ext, bmp_x)) - # gif_images.append(os.path.join(dirname, "black_image.jpg")) # Add black image at the end gif_images.append(after1_img) gif_images.append(after2_img) gif_images.append(os.path.join(dirname, "black_image.jpg")) # Add black image at the end @@ -457,10 +391,7 @@ def create_GIF( right = "[{}:v]".format(i) out = "[v{}]".format(i) offset = (i - 1) * (display_dur + fade_dur) + display_dur - parts.append( - "{}{}xfade=transition=fade:".format(left, right) - + "duration={}:offset={:.3f}{}".format(fade_dur, offset, out) - ) + parts.append("{}{}xfade=transition=fade:".format(left, right) + "duration={}:offset={:.3f}{}".format(fade_dur, offset, out)) filter_complex = ";".join(parts) # 3. make MP4 slideshow @@ -544,9 +475,7 @@ def create_GIF( print_log(f"GIF saved as '{gif_output_path}'", logger="current") -def _update_bbox_by_mask( - bbox: List[int], mask_poly: Optional[List[List[int]]], image_shape: Tuple[int, int, int] -) -> List[int]: +def _update_bbox_by_mask(bbox: List[int], mask_poly: Optional[List[List[int]]], image_shape: Tuple[int, int, int]) -> List[int]: """ Adjust bounding box to tightly fit mask polygon. @@ -591,11 +520,6 @@ def pose_nms(config: Any, image_kpts: np.ndarray, image_bboxes: np.ndarray, num_ Returns: np.ndarray: Indices of kept instances. """ - # Sort image kpts by average score - lowest first - # scores = image_kpts[:, :, 2].mean(axis=1) - # sort_idx = np.argsort(scores) - # image_kpts = image_kpts[sort_idx, :, :] - # Compute OKS between all pairs of poses oks_matrix = np.zeros((image_kpts.shape[0], image_kpts.shape[0])) for i in range(image_kpts.shape[0]): @@ -611,8 +535,7 @@ def pose_nms(config: Any, image_kpts: np.ndarray, image_bboxes: np.ndarray, num_ dt = {"keypoints": image_kpts[j].copy(), "bbox": gt_bbox_xyxy} gt["keypoints"][:, 2] = (gt["keypoints"][:, 2] > config.confidence_thr) * 2 oks = compute_oks(gt, dt) - if oks > 1: - breakpoint() + assert oks <= 1.0, f"OKS value {oks} exceeds 1.0, which indicates a bug in compute_oks" oks_matrix[i, j] = oks np.fill_diagonal(oks_matrix, -1) @@ -653,13 +576,10 @@ def compute_oks(gt: Dict[str, Any], dt: Dict[str, Any], use_area: bool = True, p Returns: float: OKS score or mean OKS. 
""" - sigmas = ( - np.array([0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72, 0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89]) - / 10.0 - ) + sigmas = np.array([0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72, 0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89]) / 10.0 vars = (sigmas * 2) ** 2 k = len(sigmas) - visibility_condition = lambda x: x > 0 + visibility_condition = lambda x: x > 0.3 g = np.array(gt["keypoints"]).reshape(k, 3) xg = g[:, 0] yg = g[:, 1] diff --git a/demo/posevis_lite.py b/bboxmaskpose/posevis_lite.py similarity index 97% rename from demo/posevis_lite.py rename to bboxmaskpose/posevis_lite.py index 89de044228c0c3e82f36b704dd4b29926cdbd96d..25027e2cae771e070f0fe9f833f7bd417b110c62 100644 --- a/demo/posevis_lite.py +++ b/bboxmaskpose/posevis_lite.py @@ -1,9 +1,13 @@ +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. + import os from typing import Any, Dict, List, Optional, Tuple, Union import cv2 import numpy as np +from bboxmaskpose.sam2.distinctipy import get_colors + NEUTRAL_COLOR = (52, 235, 107) LEFT_ARM_COLOR = (216, 235, 52) @@ -85,14 +89,6 @@ def _draw_line( start = np.array(start)[:2] stop = np.array(stop)[:2] if line_type.lower() == "solid": - img = cv2.line( - img, - (int(start[0]), int(start[1])), - (int(stop[0]), int(stop[1])), - color=(0, 0, 0), - thickness=thickness+1, - lineType=cv2.LINE_AA, - ) img = cv2.line( img, (int(start[0]), int(start[1])), @@ -193,7 +189,14 @@ def pose_visualization( if not isinstance(color, (list, tuple)): color = [color for keypoint in keypoints] else: - color = [None for keypoint in keypoints] + if differ_individuals: + color = ( + (np.array(get_colors(len(keypoints), exclude_colors=[(0, 1, 0), (0, 0, 0), (1, 1, 1)], rng=0)) * 255) + .astype(int) + .tolist() + ) + else: + color = [None for keypoint in keypoints] max_padding = [0, 0, 0, 0] for keypoint, clr in zip(keypoints, color): @@ -243,12 +246,9 @@ def pose_visualization( # If conf >= confidence_thr: conf = 2 vis_is_float = np.any(np.logical_and(keypoints[:, -1] > 0, keypoints[:, -1] < 1)) if keypoints.shape[1] == 3 and vis_is_float: - # print("before", keypoints[:, -1]) lower_idx = keypoints[:, -1] < confidence_thr keypoints[lower_idx, -1] = 1 keypoints[~lower_idx, -1] = 2 - # print("after", keypoints[:, -1]) - # print("-"*20) # All visibility values should be ints keypoints[:, -1] = keypoints[:, -1].astype(int) diff --git a/sam2/__init__.py b/bboxmaskpose/sam2/__init__.py similarity index 82% rename from sam2/__init__.py rename to bboxmaskpose/sam2/__init__.py index 0712dd03cb280ab94ba04f8a32aa8ddc8aa3db4a..e7ce8d3449e5b73af6a7a03e7d034261e07fe2b2 100644 --- a/sam2/__init__.py +++ b/bboxmaskpose/sam2/__init__.py @@ -8,4 +8,4 @@ from hydra import initialize_config_module from hydra.core.global_hydra import GlobalHydra if not GlobalHydra.instance().is_initialized(): - initialize_config_module("sam2", version_base="1.2") + initialize_config_module("bboxmaskpose.sam2", version_base="1.2") diff --git a/sam2/automatic_mask_generator.py b/bboxmaskpose/sam2/automatic_mask_generator.py similarity index 89% rename from sam2/automatic_mask_generator.py rename to bboxmaskpose/sam2/automatic_mask_generator.py index 065e469e27c2d3af40d51d072031e828692c799b..6592ec4393b8e60492c70ed861c93bb71578c829 100644 --- a/sam2/automatic_mask_generator.py +++ b/bboxmaskpose/sam2/automatic_mask_generator.py @@ -11,9 +11,10 @@ import numpy as np import torch from torchvision.ops.boxes import batched_nms, box_area # type: ignore -from sam2.modeling.sam2_base import SAM2Base 
-from sam2.sam2_image_predictor import SAM2ImagePredictor -from sam2.utils.amg import ( +from bboxmaskpose.sam2.modeling.sam2_base import SAM2Base +from bboxmaskpose.sam2.sam2_image_predictor import SAM2ImagePredictor +from bboxmaskpose.sam2.utils.amg import ( + MaskData, area_from_rle, batch_iterator, batched_mask_to_box, @@ -24,7 +25,6 @@ from sam2.utils.amg import ( generate_crop_boxes, is_box_near_crop_edge, mask_to_rle_pytorch, - MaskData, remove_small_regions, rle_to_mask, uncrop_boxes_xyxy, @@ -103,9 +103,7 @@ class SAM2AutomaticMaskGenerator: multimask_output (bool): Whether to output multimask at each point of the grid. """ - assert (points_per_side is None) != ( - point_grids is None - ), "Exactly one of points_per_side or point_grid must be provided." + assert (points_per_side is None) != (point_grids is None), "Exactly one of points_per_side or point_grid must be provided." if points_per_side is not None: self.point_grids = build_all_layer_point_grids( points_per_side, @@ -161,7 +159,7 @@ class SAM2AutomaticMaskGenerator: Returns: (SAM2AutomaticMaskGenerator): The loaded model. """ - from sam2.build_sam import build_sam2_hf + from bboxmaskpose.sam2.build_sam import build_sam2_hf sam_model = build_sam2_hf(model_id, **kwargs) return cls(sam_model, **kwargs) @@ -197,9 +195,7 @@ class SAM2AutomaticMaskGenerator: # Encode masks if self.output_mode == "coco_rle": - mask_data["segmentations"] = [ - coco_encode_rle(rle) for rle in mask_data["rles"] - ] + mask_data["segmentations"] = [coco_encode_rle(rle) for rle in mask_data["rles"]] elif self.output_mode == "binary_mask": mask_data["segmentations"] = [rle_to_mask(rle) for rle in mask_data["rles"]] else: @@ -223,9 +219,7 @@ class SAM2AutomaticMaskGenerator: def _generate_masks(self, image: np.ndarray) -> MaskData: orig_size = image.shape[:2] - crop_boxes, layer_idxs = generate_crop_boxes( - orig_size, self.crop_n_layers, self.crop_overlap_ratio - ) + crop_boxes, layer_idxs = generate_crop_boxes(orig_size, self.crop_n_layers, self.crop_overlap_ratio) # Iterate over image crops data = MaskData() @@ -268,9 +262,7 @@ class SAM2AutomaticMaskGenerator: # Generate masks for this crop in batches data = MaskData() for (points,) in batch_iterator(self.points_per_batch, points_for_image): - batch_data = self._process_batch( - points, cropped_im_size, crop_box, orig_size, normalize=True - ) + batch_data = self._process_batch(points, cropped_im_size, crop_box, orig_size, normalize=True) data.cat(batch_data) del batch_data self.predictor.reset_predictor() @@ -302,15 +294,9 @@ class SAM2AutomaticMaskGenerator: orig_h, orig_w = orig_size # Run model on this batch - points = torch.as_tensor( - points, dtype=torch.float32, device=self.predictor.device - ) - in_points = self.predictor._transforms.transform_coords( - points, normalize=normalize, orig_hw=im_size - ) - in_labels = torch.ones( - in_points.shape[0], dtype=torch.int, device=in_points.device - ) + points = torch.as_tensor(points, dtype=torch.float32, device=self.predictor.device) + in_points = self.predictor._transforms.transform_coords(points, normalize=normalize, orig_hw=im_size) + in_labels = torch.ones(in_points.shape[0], dtype=torch.int, device=in_points.device) masks, iou_preds, low_res_masks = self.predictor._predict( in_points[:, None, :], in_labels[:, None], @@ -334,23 +320,15 @@ class SAM2AutomaticMaskGenerator: data.filter(keep_mask) # Calculate and filter by stability score - data["stability_score"] = calculate_stability_score( - data["masks"], self.mask_threshold, 
self.stability_score_offset - ) + data["stability_score"] = calculate_stability_score(data["masks"], self.mask_threshold, self.stability_score_offset) if self.stability_score_thresh > 0.0: keep_mask = data["stability_score"] >= self.stability_score_thresh data.filter(keep_mask) else: # One step refinement using previous mask predictions - in_points = self.predictor._transforms.transform_coords( - data["points"], normalize=normalize, orig_hw=im_size - ) - labels = torch.ones( - in_points.shape[0], dtype=torch.int, device=in_points.device - ) - masks, ious = self.refine_with_m2m( - in_points, labels, data["low_res_masks"], self.points_per_batch - ) + in_points = self.predictor._transforms.transform_coords(data["points"], normalize=normalize, orig_hw=im_size) + labels = torch.ones(in_points.shape[0], dtype=torch.int, device=in_points.device) + masks, ious = self.refine_with_m2m(in_points, labels, data["low_res_masks"], self.points_per_batch) data["masks"] = masks.squeeze(1) data["iou_preds"] = ious.squeeze(1) @@ -358,9 +336,7 @@ class SAM2AutomaticMaskGenerator: keep_mask = data["iou_preds"] > self.pred_iou_thresh data.filter(keep_mask) - data["stability_score"] = calculate_stability_score( - data["masks"], self.mask_threshold, self.stability_score_offset - ) + data["stability_score"] = calculate_stability_score(data["masks"], self.mask_threshold, self.stability_score_offset) if self.stability_score_thresh > 0.0: keep_mask = data["stability_score"] >= self.stability_score_thresh data.filter(keep_mask) @@ -370,9 +346,7 @@ class SAM2AutomaticMaskGenerator: data["boxes"] = batched_mask_to_box(data["masks"]) # Filter boxes that touch crop boundaries - keep_mask = ~is_box_near_crop_edge( - data["boxes"], crop_box, [0, 0, orig_w, orig_h] - ) + keep_mask = ~is_box_near_crop_edge(data["boxes"], crop_box, [0, 0, orig_w, orig_h]) if not torch.all(keep_mask): data.filter(keep_mask) @@ -384,9 +358,7 @@ class SAM2AutomaticMaskGenerator: return data @staticmethod - def postprocess_small_regions( - mask_data: MaskData, min_area: int, nms_thresh: float - ) -> MaskData: + def postprocess_small_regions(mask_data: MaskData, min_area: int, nms_thresh: float) -> MaskData: """ Removes small disconnected regions and holes in masks, then reruns box NMS to remove any new duplicates. 
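For reference, the relocated import path changes how the generator is constructed downstream. A minimal usage sketch (the model id and input image are illustrative; `from_pretrained` builds the model via `build_sam2_hf` as shown above):

```python
# Sketch: automatic mask generation with the relocated package (inputs illustrative).
import numpy as np

from bboxmaskpose.sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for an HxWx3 RGB image
generator = SAM2AutomaticMaskGenerator.from_pretrained("facebook/sam2.1-hiera-base-plus")
masks = generator.generate(image)  # list of dicts: "segmentation", "bbox", "stability_score", ...
print(len(masks))
```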
@@ -438,9 +410,7 @@ class SAM2AutomaticMaskGenerator: new_masks = [] new_iou_preds = [] - for cur_points, cur_point_labels, low_res_mask in batch_iterator( - points_per_batch, points, point_labels, low_res_masks - ): + for cur_points, cur_point_labels, low_res_mask in batch_iterator(points_per_batch, points, point_labels, low_res_masks): best_masks, best_iou_preds, _ = self.predictor._predict( cur_points[:, None, :], cur_point_labels[:, None], diff --git a/sam2/benchmark.py b/bboxmaskpose/sam2/benchmark.py similarity index 89% rename from sam2/benchmark.py rename to bboxmaskpose/sam2/benchmark.py index 6519534c8619e04b9a632859a5128ad2cee34c13..054b69c8f33dc7f3ca9a561cf0ecf0d19d3f3df9 100644 --- a/sam2/benchmark.py +++ b/bboxmaskpose/sam2/benchmark.py @@ -11,7 +11,7 @@ import numpy as np import torch from tqdm import tqdm -from sam2.build_sam import build_sam2_video_predictor +from bboxmaskpose.sam2.build_sam import build_sam2_video_predictor # Only cuda supported assert torch.cuda.is_available() @@ -28,19 +28,13 @@ sam2_checkpoint = "checkpoints/sam2.1_hiera_base_plus.pt" model_cfg = "configs/sam2.1/sam2.1_hiera_b+.yaml" # Build video predictor with vos_optimized=True setting -predictor = build_sam2_video_predictor( - model_cfg, sam2_checkpoint, device=device, vos_optimized=True -) +predictor = build_sam2_video_predictor(model_cfg, sam2_checkpoint, device=device, vos_optimized=True) # Initialize with video video_dir = "notebooks/videos/bedroom" # scan all the JPEG frame names in this directory -frame_names = [ - p - for p in os.listdir(video_dir) - if os.path.splitext(p)[-1] in [".jpg", ".jpeg", ".JPG", ".JPEG"] -] +frame_names = [p for p in os.listdir(video_dir) if os.path.splitext(p)[-1] in [".jpg", ".jpeg", ".JPG", ".JPEG"]] frame_names.sort(key=lambda p: int(os.path.splitext(p)[0])) inference_state = predictor.init_state(video_path=video_dir) diff --git a/sam2/build_sam.py b/bboxmaskpose/sam2/build_sam.py similarity index 82% rename from sam2/build_sam.py rename to bboxmaskpose/sam2/build_sam.py index f0c79b6462848a185770f60e343f8a23ab9489ea..d58abb28b098e0c4fd4ea21960e47599675d46ea 100644 --- a/sam2/build_sam.py +++ b/bboxmaskpose/sam2/build_sam.py @@ -6,14 +6,16 @@ import logging import os +import urllib.parse as urlparse import torch -from hydra import compose + +import bboxmaskpose.sam2 as sam2 +from hydra import compose, initialize_config_dir +from hydra.core.global_hydra import GlobalHydra from hydra.utils import instantiate from omegaconf import OmegaConf -import sam2 - # Check if the user is running Python from the parent directory of the sam2 repo # (i.e. the directory where this repo is cloned into) -- this is not supported since # it could shadow the sam2 package and cause issues. @@ -86,13 +88,26 @@ def build_sam2( "++model.sam_mask_decoder_extra_args.dynamic_multimask_stability_delta=0.05", "++model.sam_mask_decoder_extra_args.dynamic_multimask_stability_thresh=0.98", ] + + # IMPORTANT: compose() requires Hydra to be initialized with a config source. + # Also important if build_sam2() can be called multiple times in one process. 
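+    # NOTE: bboxmaskpose/sam2/__init__.py already initializes Hydra on the packaged
+    # config module at import time, and Hydra refuses to be initialized twice, so the
+    # global instance must be cleared before re-initializing on the config directory below.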
+    GlobalHydra.instance().clear()
+
+    # Point Hydra at the directory that contains the SAM2 yaml configs
+    config_dir = os.path.dirname(config_file)
+
+    # Hydra expects config_name WITHOUT .yaml
+    config_name = os.path.basename(config_file).replace(".yaml", "")
+
     # Read config and init model
     try:
-        cfg = compose(config_name=config_file)
+        with initialize_config_dir(version_base=None, config_dir=str(config_dir)):
+            cfg = compose(config_name=config_name, overrides=hydra_overrides_extra)
     except Exception as e:
         logging.error(f"Error loading config: {e}")
-        cfg = compose(config_name=config_file, overrides=hydra_overrides_extra)
+        # The global Hydra state was cleared above, so composing again outside the
+        # initialize_config_dir() context cannot succeed; re-raise instead.
+        raise
 
     OmegaConf.resolve(cfg)
     model = instantiate(cfg.model, _recursive_=True)
     _load_checkpoint(model, ckpt_path)
@@ -161,14 +176,23 @@ def build_sam2_hf(model_id, **kwargs):
 
 def build_sam2_video_predictor_hf(model_id, **kwargs):
     config_name, ckpt_path = _hf_download(model_id)
-    return build_sam2_video_predictor(
-        config_file=config_name, ckpt_path=ckpt_path, **kwargs
-    )
+    return build_sam2_video_predictor(config_file=config_name, ckpt_path=ckpt_path, **kwargs)
+
+
+def _is_url(path: str) -> bool:
+    return urlparse.urlparse(path).scheme != ""
 
 
 def _load_checkpoint(model, ckpt_path):
     if ckpt_path is not None:
-        sd = torch.load(ckpt_path, map_location="cpu", weights_only=True)["model"]
+
+        if _is_url(ckpt_path):
+            sd = torch.hub.load_state_dict_from_url(ckpt_path, map_location="cpu", weights_only=True)["model"]
+        elif os.path.exists(ckpt_path):
+            sd = torch.load(ckpt_path, map_location="cpu", weights_only=True)["model"]
+        else:
+            raise FileNotFoundError(f"Checkpoint not found: {ckpt_path}")
+
         missing_keys, unexpected_keys = model.load_state_dict(sd)
         if missing_keys:
             logging.error(missing_keys)
@@ -176,4 +200,5 @@ def _load_checkpoint(model, ckpt_path):
         if unexpected_keys:
             logging.error(unexpected_keys)
             raise RuntimeError()
+    logging.info("Loaded checkpoint successfully")
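With these changes, `build_sam2()` takes a filesystem path to a yaml config (Hydra is re-initialized on the config's directory) and accepts either a local or an HTTP(S) checkpoint. A minimal usage sketch; the paths below are illustrative, not shipped defaults:

```python
# Sketch only: config and checkpoint locations are illustrative.
import os

from bboxmaskpose.sam2.build_sam import build_sam2
from bboxmaskpose.sam2.sam2_image_predictor import SAM2ImagePredictor

# Hydra's initialize_config_dir() requires an absolute config directory,
# so resolve the yaml path before handing it to build_sam2().
config_file = os.path.abspath("bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_b+.yaml")

# Local checkpoint: loaded via torch.load(); a URL would instead be fetched
# (and cached) through torch.hub.load_state_dict_from_url().
model = build_sam2(config_file=config_file, ckpt_path="./checkpoints/sam2.1_hiera_base_plus.pt")

predictor = SAM2ImagePredictor(model)
```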
""" + import numpy as np rBlind = { @@ -261,16 +264,13 @@ def simulate_colors(colors, colorblind_type="Deuteranomaly", one_row=None, show= :return: """ import matplotlib.pyplot as plt - from distinctipy import distinctipy filtered_colors = [colorblind_filter(color, colorblind_type) for color in colors] fig, axes = plt.subplots(1, 2, figsize=(8, 4)) - distinctipy.color_swatch( - colors, ax=axes[0], one_row=one_row, title="Viewed with Normal Sight" - ) + distinctipy.color_swatch(colors, ax=axes[0], one_row=one_row, title="Viewed with Normal Sight") distinctipy.color_swatch( filtered_colors, @@ -324,30 +324,22 @@ def simulate_clusters( """ import matplotlib.pyplot as plt import pandas as pd - from distinctipy import distinctipy if dataset not in ("s1", "s2", "s3", "s4", "a1", "a2", "a3", "b1"): raise ValueError("dataset must be s1, s2, s3, s4, a1, a2, a3 or b1") - URL = ( - "https://raw.githubusercontent.com/alan-turing-institute/distinctipy/" - "main/distinctipy/datasets/" - ) + URL = "https://raw.githubusercontent.com/alan-turing-institute/distinctipy/" "main/distinctipy/datasets/" df = pd.read_csv(URL + dataset + ".csv") if colorblind_distinct: - orig_colors = distinctipy.get_colors( - df["cluster"].nunique(), colorblind_type=colorblind_type - ) + orig_colors = distinctipy.get_colors(df["cluster"].nunique(), colorblind_type=colorblind_type) else: orig_colors = distinctipy.get_colors(df["cluster"].nunique()) orig_cmap = distinctipy.get_colormap(orig_colors) - filtered_colors = [ - colorblind_filter(color, colorblind_type) for color in orig_colors - ] + filtered_colors = [colorblind_filter(color, colorblind_type) for color in orig_colors] filtered_cmap = distinctipy.get_colormap(filtered_colors) fig, axes = plt.subplots(1, 2, figsize=(10, 5)) @@ -376,4 +368,4 @@ def _main(): if __name__ == "__main__": - _main() \ No newline at end of file + _main() diff --git a/bboxmaskpose/sam2/configs/sam-pose2seg/sam-pose2seg_hiera_b+.yaml b/bboxmaskpose/sam2/configs/sam-pose2seg/sam-pose2seg_hiera_b+.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dfc882e045ff44bdec0c99fcf2f65976fb1f5da5 --- /dev/null +++ b/bboxmaskpose/sam2/configs/sam-pose2seg/sam-pose2seg_hiera_b+.yaml @@ -0,0 +1,118 @@ +# @package _global_ + +# Model +model: + _target_: bboxmaskpose.sam2.modeling.sam2_base.SAM2Base + image_encoder: + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder + scalp: 1 + trunk: + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera + embed_dim: 112 + num_heads: 2 + neck: + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck + position_encoding: + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine + num_pos_feats: 256 + normalize: true + scale: null + temperature: 10000 + d_model: 256 + backbone_channel_list: [896, 448, 224, 112] + fpn_top_down_levels: [2, 3] # output level 0 and 1 directly use the backbone features + fpn_interp_model: nearest + + memory_attention: + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention + d_model: 256 + pos_enc_at_input: true + layer: + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer + activation: relu + dim_feedforward: 2048 + dropout: 0.1 + pos_enc_at_attn: false + self_attention: + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention + rope_theta: 10000.0 + feat_sizes: [64, 64] + embedding_dim: 256 + num_heads: 1 + downsample_rate: 1 + dropout: 0.1 + d_model: 256 + pos_enc_at_cross_attn_keys: true + 
pos_enc_at_cross_attn_queries: false + cross_attention: + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention + rope_theta: 10000.0 + feat_sizes: [64, 64] + rope_k_repeat: True + embedding_dim: 256 + num_heads: 1 + downsample_rate: 1 + dropout: 0.1 + kv_in_dim: 64 + num_layers: 4 + + memory_encoder: + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder + out_dim: 64 + position_encoding: + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine + num_pos_feats: 64 + normalize: true + scale: null + temperature: 10000 + mask_downsampler: + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler + kernel_size: 3 + stride: 2 + padding: 1 + fuser: + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser + layer: + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock + dim: 256 + kernel_size: 7 + padding: 3 + layer_scale_init_value: 1e-6 + use_dwconv: True # depth-wise convs + num_layers: 2 + + num_maskmem: 7 + image_size: 1024 + # apply scaled sigmoid on mask logits for memory encoder, and directly feed input mask as output mask + sigmoid_scale_for_mem_enc: 20.0 + sigmoid_bias_for_mem_enc: -10.0 + use_mask_input_as_output_without_sam: true + # Memory + directly_add_no_mem_embed: true + no_obj_embed_spatial: true + # use high-resolution feature map in the SAM mask decoder + use_high_res_features_in_sam: true + # output 3 masks on the first click on initial conditioning frames + multimask_output_in_sam: true + # SAM heads + iou_prediction_use_sigmoid: True + # cross-attend to object pointers from other frames (based on SAM output tokens) in the encoder + use_obj_ptrs_in_encoder: true + add_tpos_enc_to_obj_ptrs: true + proj_tpos_enc_in_obj_ptrs: true + use_signed_tpos_enc_to_obj_ptrs: true + only_obj_ptrs_in_the_past_for_eval: true + # object occlusion prediction + pred_obj_scores: true + pred_obj_scores_mlp: true + fixed_no_obj_ptr: true + # multimask tracking settings + multimask_output_for_tracking: true + use_multimask_token_for_obj_ptr: true + multimask_min_pt_num: 0 + multimask_max_pt_num: 1 + use_mlp_for_obj_ptr_proj: true + + n_kpts_encoder: 8 + # Compilation flag + # compile_image_encoder: False diff --git a/sam2/configs/sam2.1/sam2.1_hiera_b+.yaml b/bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_b+.yaml similarity index 72% rename from sam2/configs/sam2.1/sam2.1_hiera_b+.yaml rename to bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_b+.yaml index 3c1bde042c39212ed2782e769f3036a03a967799..16146e76234b755c895a5d83c47e47774f3cb6c0 100644 --- a/sam2/configs/sam2.1/sam2.1_hiera_b+.yaml +++ b/bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_b+.yaml @@ -2,18 +2,18 @@ # Model model: - _target_: sam2.modeling.sam2_base.SAM2Base + _target_: bboxmaskpose.sam2.modeling.sam2_base.SAM2Base image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 112 num_heads: 2 neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -24,17 +24,17 @@ model: fpn_interp_model: nearest memory_attention: - _target_: 
sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] embedding_dim: 256 @@ -45,7 +45,7 @@ model: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] rope_k_repeat: True @@ -57,23 +57,23 @@ model: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser layer: - _target_: sam2.modeling.memory_encoder.CXBlock + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 diff --git a/sam2/configs/sam2.1/sam2.1_hiera_l.yaml b/bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_l.yaml similarity index 73% rename from sam2/configs/sam2.1/sam2.1_hiera_l.yaml rename to bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_l.yaml index 23073ea7a95901be656b3c6d1a66ce8736ab7ad3..c7ce324087f8a46833b8430d1f2381d8be8e17cb 100644 --- a/sam2/configs/sam2.1/sam2.1_hiera_l.yaml +++ b/bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_l.yaml @@ -2,12 +2,12 @@ # Model model: - _target_: sam2.modeling.sam2_base.SAM2Base + _target_: bboxmaskpose.sam2.modeling.sam2_base.SAM2Base image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 144 num_heads: 2 stages: [2, 6, 36, 4] @@ -15,9 +15,9 @@ model: window_pos_embed_bkg_spatial_size: [7, 7] window_spec: [8, 4, 16, 8] neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -28,17 +28,17 @@ model: fpn_interp_model: nearest memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 
pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] embedding_dim: 256 @@ -49,7 +49,7 @@ model: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] rope_k_repeat: True @@ -61,23 +61,23 @@ model: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser layer: - _target_: sam2.modeling.memory_encoder.CXBlock + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 diff --git a/sam2/configs/sam2.1/sam2.1_hiera_s.yaml b/bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_s.yaml similarity index 73% rename from sam2/configs/sam2.1/sam2.1_hiera_s.yaml rename to bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_s.yaml index fd8d40465b18b3de39b0a565aca712306306c4ed..e0f727c7d5a39fe58e9c5f48f84f3e667d58f844 100644 --- a/sam2/configs/sam2.1/sam2.1_hiera_s.yaml +++ b/bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_s.yaml @@ -2,21 +2,21 @@ # Model model: - _target_: sam2.modeling.sam2_base.SAM2Base + _target_: bboxmaskpose.sam2.modeling.sam2_base.SAM2Base image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 96 num_heads: 1 stages: [1, 2, 11, 2] global_att_blocks: [7, 10, 13] window_pos_embed_bkg_spatial_size: [7, 7] neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -27,17 +27,17 @@ model: fpn_interp_model: nearest memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] embedding_dim: 256 @@ -48,7 +48,7 @@ model: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention 
+ _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] rope_k_repeat: True @@ -60,23 +60,23 @@ model: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser layer: - _target_: sam2.modeling.memory_encoder.CXBlock + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 diff --git a/sam2/configs/sam2.1/sam2.1_hiera_t.yaml b/bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_t.yaml similarity index 74% rename from sam2/configs/sam2.1/sam2.1_hiera_t.yaml rename to bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_t.yaml index e762aec932f26436d13798f3feb3ec82c360a943..fe2f9527631aeb3393d74f848d05255a39de452c 100644 --- a/sam2/configs/sam2.1/sam2.1_hiera_t.yaml +++ b/bboxmaskpose/sam2/configs/sam2.1/sam2.1_hiera_t.yaml @@ -2,21 +2,21 @@ # Model model: - _target_: sam2.modeling.sam2_base.SAM2Base + _target_: bboxmaskpose.sam2.modeling.sam2_base.SAM2Base image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 96 num_heads: 1 stages: [1, 2, 7, 2] global_att_blocks: [5, 7, 9] window_pos_embed_bkg_spatial_size: [7, 7] neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -27,17 +27,17 @@ model: fpn_interp_model: nearest memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] embedding_dim: 256 @@ -48,7 +48,7 @@ model: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] rope_k_repeat: True @@ -60,23 +60,23 @@ model: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: 
sam2.modeling.position_encoding.PositionEmbeddingSine
+        _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine
         num_pos_feats: 64
         normalize: true
         scale: null
         temperature: 10000
       mask_downsampler:
-        _target_: sam2.modeling.memory_encoder.MaskDownSampler
+        _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler
         kernel_size: 3
         stride: 2
         padding: 1
       fuser:
-        _target_: sam2.modeling.memory_encoder.Fuser
+        _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser
         layer:
-          _target_: sam2.modeling.memory_encoder.CXBlock
+          _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock
           dim: 256
           kernel_size: 7
           padding: 3
diff --git a/bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO+CIHP_finetune_sam-pose2seg.yaml b/bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO+CIHP_finetune_sam-pose2seg.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..e3f221ee4fc18503a7cb680ee2d2f783635facea
--- /dev/null
+++ b/bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO+CIHP_finetune_sam-pose2seg.yaml
@@ -0,0 +1,343 @@
+# @package _global_
+
+scratch:
+  resolution: 1024
+  train_batch_size: 1
+  num_train_workers: 10
+  num_frames: 1
+  max_num_objects: 1
+  base_lr: 5.0e-6
+  vision_lr: 3.0e-06
+  phases_per_epoch: 1
+  num_epochs: 15
+
+dataset:
+  # PATHS to Dataset
+  img_folder: path/to/dataset
+  gt_folder: path/to/dataset
+  multiplier: 2
+
+# Video transforms
+vos:
+  train_transforms:
+    - _target_: training.dataset.transforms.ComposeAPI
+      transforms:
+        - _target_: training.dataset.transforms.RandomHorizontalFlip
+          consistent_transform: True
+        - _target_: training.dataset.transforms.RandomAffine
+          degrees: 25
+          shear: 20
+          image_interpolation: bilinear
+          consistent_transform: True
+        - _target_: training.dataset.transforms.RandomResizeAPI
+          sizes: ${scratch.resolution}
+          square: true
+          consistent_transform: True
+        - _target_: training.dataset.transforms.ColorJitter
+          consistent_transform: True
+          brightness: 0.1
+          contrast: 0.03
+          saturation: 0.03
+          hue: null
+        - _target_: training.dataset.transforms.RandomGrayscale
+          p: 0.05
+          consistent_transform: True
+        - _target_: training.dataset.transforms.ColorJitter
+          consistent_transform: False
+          brightness: 0.1
+          contrast: 0.05
+          saturation: 0.05
+          hue: null
+        - _target_: training.dataset.transforms.ToTensorAPI
+        - _target_: training.dataset.transforms.NormalizeAPI
+          mean: [0.485, 0.456, 0.406]
+          std: [0.229, 0.224, 0.225]
+
+trainer:
+  _target_: training.trainer.Trainer
+  mode: train_only # change to 'train' (i.e. 
train + val) + max_epochs: ${times:${scratch.num_epochs},${scratch.phases_per_epoch}} + accelerator: cuda + seed_value: 123 + + model: + _target_: training.model.sam2.SAM2Train + image_encoder: + _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + scalp: 1 + trunk: + _target_: sam2.modeling.backbones.hieradet.Hiera + embed_dim: 112 + num_heads: 2 + drop_path_rate: 0.1 + neck: + _target_: sam2.modeling.backbones.image_encoder.FpnNeck + position_encoding: + _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + num_pos_feats: 256 + normalize: true + scale: null + temperature: 10000 + d_model: 256 + backbone_channel_list: [896, 448, 224, 112] + fpn_top_down_levels: [2, 3] # output level 0 and 1 directly use the backbone features + fpn_interp_model: nearest + + memory_attention: + _target_: sam2.modeling.memory_attention.MemoryAttention + d_model: 256 + pos_enc_at_input: true + layer: + _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + activation: relu + dim_feedforward: 2048 + dropout: 0.1 + pos_enc_at_attn: false + self_attention: + _target_: sam2.modeling.sam.transformer.RoPEAttention + rope_theta: 10000.0 + feat_sizes: [64, 64] + embedding_dim: 256 + num_heads: 1 + downsample_rate: 1 + dropout: 0.1 + d_model: 256 + pos_enc_at_cross_attn_keys: true + pos_enc_at_cross_attn_queries: false + cross_attention: + _target_: sam2.modeling.sam.transformer.RoPEAttention + rope_theta: 10000.0 + feat_sizes: [64, 64] + rope_k_repeat: True + embedding_dim: 256 + num_heads: 1 + downsample_rate: 1 + dropout: 0.1 + kv_in_dim: 64 + num_layers: 4 + + memory_encoder: + _target_: sam2.modeling.memory_encoder.MemoryEncoder + out_dim: 64 + position_encoding: + _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + num_pos_feats: 64 + normalize: true + scale: null + temperature: 10000 + mask_downsampler: + _target_: sam2.modeling.memory_encoder.MaskDownSampler + kernel_size: 3 + stride: 2 + padding: 1 + fuser: + _target_: sam2.modeling.memory_encoder.Fuser + layer: + _target_: sam2.modeling.memory_encoder.CXBlock + dim: 256 + kernel_size: 7 + padding: 3 + layer_scale_init_value: 1e-6 + use_dwconv: True # depth-wise convs + num_layers: 2 + + num_maskmem: 7 + image_size: ${scratch.resolution} + # apply scaled sigmoid on mask logits for memory encoder, and directly feed input mask as output mask + sigmoid_scale_for_mem_enc: 20.0 + sigmoid_bias_for_mem_enc: -10.0 + use_mask_input_as_output_without_sam: true + # Memory + directly_add_no_mem_embed: true + no_obj_embed_spatial: true + # use high-resolution feature map in the SAM mask decoder + use_high_res_features_in_sam: true + # output 3 masks on the first click on initial conditioning frames + multimask_output_in_sam: true + # SAM heads + iou_prediction_use_sigmoid: True + # cross-attend to object pointers from other frames (based on SAM output tokens) in the encoder + use_obj_ptrs_in_encoder: true + add_tpos_enc_to_obj_ptrs: true + proj_tpos_enc_in_obj_ptrs: true + use_signed_tpos_enc_to_obj_ptrs: true + only_obj_ptrs_in_the_past_for_eval: true + # object occlusion prediction + pred_obj_scores: true + pred_obj_scores_mlp: true + fixed_no_obj_ptr: true + # multimask tracking settings + multimask_output_for_tracking: true + use_multimask_token_for_obj_ptr: false + multimask_min_pt_num: 0 + multimask_max_pt_num: 1 + use_mlp_for_obj_ptr_proj: true + + n_kpts_encoder: 8 + # Compilation flag + # compile_image_encoder: False + + ####### Training specific params ####### + # box/point input and corrections + 
prob_to_use_pt_input_for_train: 1.0 + prob_to_use_pt_input_for_eval: 0.0 + prob_to_use_box_input_for_train: 0.0 # 0.5*0.5 = 0.25 prob to use box instead of points + prob_to_use_box_input_for_eval: 0.0 + prob_to_sample_from_gt_for_train: 0.1 # with a small prob, sampling correction points from GT mask instead of prediction errors + num_frames_to_correct_for_train: 2 # iteratively sample on random 1~2 frames (always include the first frame) + num_frames_to_correct_for_eval: 1 # only iteratively sample on first frame + rand_frames_to_correct_for_train: True # random #init-cond-frame ~ 2 + add_all_frames_to_correct_as_cond: True # when a frame receives a correction click, it becomes a conditioning frame (even if it's not initially a conditioning frame) + # maximum 2 initial conditioning frames + num_init_cond_frames_for_train: 2 + rand_init_cond_frames_for_train: True # random 1~2 + num_correction_pt_per_frame: 7 ## CHANGED + use_act_ckpt_iterative_pt_sampling: false + + + + num_init_cond_frames_for_eval: 1 # only mask on the first frame + forward_backbone_per_frame_for_eval: True + + + data: + train: + _target_: training.dataset.sam2_datasets.TorchTrainMixedDataset + phases_per_epoch: ${scratch.phases_per_epoch} + batch_sizes: + - ${scratch.train_batch_size} + + datasets: + - _target_: training.dataset.utils.RepeatFactorWrapper + dataset: + _target_: training.dataset.utils.ConcatDataset + datasets: + - _target_: training.dataset.vos_dataset.VOSDataset + transforms: ${vos.train_transforms} + training: true + video_dataset: + _target_: training.dataset.vos_raw_dataset.SA1BRawDataset + img_folder: ${dataset.img_folder} + gt_folder: ${dataset.gt_folder} + # file_list_txt: ${dataset.file_list_txt} + sampler: + _target_: training.dataset.vos_sampler.RandomUniformSampler + num_frames: ${scratch.num_frames} + max_num_objects: ${scratch.max_num_objects} + multiplier: ${dataset.multiplier} + shuffle: True + num_workers: ${scratch.num_train_workers} + pin_memory: True + drop_last: True + collate_fn: + _target_: training.utils.data_utils.collate_fn + _partial_: true + dict_key: all + + # val: + + + optim: + amp: + enabled: True + amp_dtype: bfloat16 + + optimizer: + _target_: torch.optim.AdamW + + gradient_clip: + _target_: training.optimizer.GradientClipper + max_norm: 0.1 + norm_type: 2 + + param_group_modifiers: + - _target_: training.optimizer.layer_decay_param_modifier + _partial_: True + layer_decay_value: 0.9 + apply_to: 'image_encoder.trunk' + overrides: + - pattern: '*pos_embed*' + value: 1.0 + + options: + lr: + - scheduler: + _target_: fvcore.common.param_scheduler.CosineParamScheduler + start_value: ${scratch.base_lr} + end_value: ${divide:${scratch.base_lr},10} + - scheduler: + _target_: fvcore.common.param_scheduler.CosineParamScheduler + start_value: ${scratch.vision_lr} + end_value: ${divide:${scratch.vision_lr},10} + param_names: + - 'image_encoder.*' + weight_decay: + - scheduler: + _target_: fvcore.common.param_scheduler.ConstantParamScheduler + value: 0.1 + - scheduler: + _target_: fvcore.common.param_scheduler.ConstantParamScheduler + value: 0.0 + param_names: + - '*bias*' + module_cls_names: ['torch.nn.LayerNorm'] + + loss: + all: + _target_: training.loss_fns.MultiStepMultiMasksAndIous + weight_dict: + loss_mask: 20 + loss_dice: 1 + loss_iou: 1 + loss_class: 1 + supervise_all_iou: true + iou_use_l1_loss: true + pred_obj_scores: true + focal_gamma_obj_score: 0.0 + focal_alpha_obj_score: -1.0 + + distributed: + backend: nccl + find_unused_parameters: True + + logging: + 
tensorboard_writer:
+      _target_: training.utils.logger.make_tensorboard_logger
+      log_dir: ${launcher.experiment_log_dir}/tensorboard
+      flush_secs: 120
+      should_log: True
+    log_dir: ${launcher.experiment_log_dir}/logs
+    log_freq: 10
+
+  # initialize from a SAM 2 checkpoint
+  checkpoint:
+    save_dir: ${launcher.experiment_log_dir}/checkpoints
+    save_freq: 0 # with 0, only the last checkpoint is saved
+    model_weight_initializer:
+      _partial_: True
+      _target_: training.utils.checkpoint_utils.load_state_dict_into_model
+      strict: True
+      ignore_unexpected_keys: null
+      ignore_missing_keys: null
+
+      state_dict:
+        _target_: training.utils.checkpoint_utils.load_checkpoint_and_apply_kernels
+        checkpoint_path: ./checkpoints/sam2.1_hiera_base_plus.pt ## CHANGED - PATH to SAM 2.1 checkpoint
+        ckpt_state_dict_keys: ['model']
+
+launcher:
+  num_nodes: 1
+  gpus_per_node: 8
+  experiment_log_dir: null # Path to log directory, defaults to ./sam2_logs/${config_name}
+
+# SLURM args if running on a cluster
+submitit:
+  partition: null
+  account: null
+  qos: null
+  cpus_per_task: 10
+  use_cluster: false
+  timeout_hour: 24
+  name: null
+  port_range: [10000, 65000]
+
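The `${...}` values in these training yamls are OmegaConf interpolations: `${scratch.resolution}` resolves against the same config tree, while `${times:...}` and `${divide:...}` (used for `max_epochs` and the LR schedules above) need custom resolvers, which we assume the training entrypoint registers before composing. A minimal sketch of how they resolve:

```python
# Sketch: OmegaConf interpolation as used by the training configs above.
# ${times:...} / ${divide:...} are custom resolvers; stand-ins are registered here for illustration.
from omegaconf import OmegaConf

OmegaConf.register_new_resolver("times", lambda a, b: a * b, replace=True)
OmegaConf.register_new_resolver("divide", lambda a, b: a / b, replace=True)

cfg = OmegaConf.create(
    {
        "scratch": {"num_epochs": 15, "phases_per_epoch": 1, "base_lr": 5.0e-6},
        "trainer": {"max_epochs": "${times:${scratch.num_epochs},${scratch.phases_per_epoch}}"},
        "end_lr": "${divide:${scratch.base_lr},10}",
    }
)
print(cfg.trainer.max_epochs)  # 15
print(cfg.end_lr)              # 5e-07
```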
sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] embedding_dim: 256 @@ -113,7 +105,7 @@ trainer: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] rope_k_repeat: True @@ -125,23 +117,23 @@ trainer: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser layer: - _target_: sam2.modeling.memory_encoder.CXBlock + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 @@ -325,7 +317,7 @@ trainer: state_dict: _target_: training.utils.checkpoint_utils.load_checkpoint_and_apply_kernels - checkpoint_path: ./checkpoints/sam2.1_hiera_base_plus.pt # PATH to SAM 2.1 checkpoint + checkpoint_path: ./checkpoints/bboxmaskpose.sam2.1_hiera_base_plus.pt # PATH to SAM 2.1 checkpoint ckpt_state_dict_keys: ['model'] launcher: diff --git a/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune.yaml b/bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune.yaml similarity index 86% rename from sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune.yaml rename to bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune.yaml index 7e0c2d6a10703185254d4e8fe41e3d44fc8f495a..77125b897e3b533fe8516443522c7cf9be4b1e42 100644 --- a/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune.yaml +++ b/bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune.yaml @@ -11,15 +11,6 @@ scratch: phases_per_epoch: 1 num_epochs: 40 -dataset: - # PATHS to Dataset - img_folder: /mnt/personal/purkrmir/data/COCO/original/train2017/ # PATH to MOSE JPEGImages folder - gt_folder: /mnt/personal/purkrmir/data/COCO/original/annotations/ # PATH to MOSE Annotations folder - # img_folder: /datagrid/personal/purkrmir/data/COCO/original/val2017/ # PATH to MOSE JPEGImages folder - # gt_folder: /datagrid/personal/purkrmir/data/COCO/original/annotations/ # PATH to MOSE Annotations folder - file_list_txt: null # Optional PATH to filelist containing a subset of videos to be used for training - multiplier: 2 - # Video transforms vos: train_transforms: @@ -69,19 +60,19 @@ trainer: unfreeze_decoder: False model: - _target_: training.model.sam2.SAM2Train + _target_: training.model.bboxmaskpose.sam2.sam2Train image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: 
sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 112 num_heads: 2 drop_path_rate: 0.1 neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -92,17 +83,17 @@ trainer: fpn_interp_model: nearest memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] embedding_dim: 256 @@ -113,7 +104,7 @@ trainer: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] rope_k_repeat: True @@ -125,23 +116,23 @@ trainer: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser layer: - _target_: sam2.modeling.memory_encoder.CXBlock + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 @@ -325,7 +316,7 @@ trainer: state_dict: _target_: training.utils.checkpoint_utils.load_checkpoint_and_apply_kernels - checkpoint_path: ./checkpoints/sam2.1_hiera_base_plus.pt # PATH to SAM 2.1 checkpoint + checkpoint_path: ./checkpoints/bboxmaskpose.sam2.1_hiera_base_plus.pt # PATH to SAM 2.1 checkpoint ckpt_state_dict_keys: ['model'] launcher: diff --git a/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune_prompt+decoder.yaml b/bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune_prompt+decoder.yaml similarity index 86% rename from sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune_prompt+decoder.yaml rename to bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune_prompt+decoder.yaml index 48c020084162252c3af061592325dfbcd60de2b6..7145d48f71054ed4a4b92c1639595e7dd0817e47 100644 --- a/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune_prompt+decoder.yaml +++ b/bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_COCO_finetune_prompt+decoder.yaml @@ -11,15 +11,6 @@ scratch: phases_per_epoch: 1 num_epochs: 40 -dataset: - # PATHS to Dataset - img_folder: /mnt/personal/purkrmir/data/COCO/original/train2017/ # PATH to MOSE JPEGImages 
folder - gt_folder: /mnt/personal/purkrmir/data/COCO/original/annotations/ # PATH to MOSE Annotations folder - # img_folder: /datagrid/personal/purkrmir/data/COCO/original/train2017/ # PATH to MOSE JPEGImages folder - # gt_folder: /datagrid/personal/purkrmir/data/COCO/original/annotations/ # PATH to MOSE Annotations folder - file_list_txt: null # Optional PATH to filelist containing a subset of videos to be used for training - multiplier: 2 - # Video transforms vos: train_transforms: @@ -69,19 +60,19 @@ trainer: unfreeze_decoder: True model: - _target_: training.model.sam2.SAM2Train + _target_: training.model.bboxmaskpose.sam2.sam2Train image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 112 num_heads: 2 drop_path_rate: 0.1 neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -92,17 +83,17 @@ trainer: fpn_interp_model: nearest memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] embedding_dim: 256 @@ -113,7 +104,7 @@ trainer: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] rope_k_repeat: True @@ -125,23 +116,23 @@ trainer: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser layer: - _target_: sam2.modeling.memory_encoder.CXBlock + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 @@ -325,7 +316,7 @@ trainer: state_dict: _target_: training.utils.checkpoint_utils.load_checkpoint_and_apply_kernels - checkpoint_path: ./checkpoints/sam2.1_hiera_base_plus.pt # PATH to SAM 2.1 checkpoint + checkpoint_path: ./checkpoints/bboxmaskpose.sam2.1_hiera_base_plus.pt # PATH to SAM 2.1 checkpoint ckpt_state_dict_keys: ['model'] launcher: diff --git 
diff --git a/sam2/configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml b/bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml similarity index 87% rename from sam2/configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml rename to bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml index 6e4c5a947f8a42bdcbf1048b1a16de127b3bd0e1..6285692b8f48ec3bb831ab270a6185714c9dc375 100644 --- a/sam2/configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml +++ b/bboxmaskpose/sam2/configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml @@ -11,12 +11,6 @@ scratch: phases_per_epoch: 1 num_epochs: 40 -dataset: - # PATHS to Dataset - img_folder: /datagrid/personal/purkrmir/data/MOSE/train/JPEGImages/ # PATH to MOSE JPEGImages folder - gt_folder: /datagrid/personal/purkrmir/data/MOSE/train/Annotations/ # PATH to MOSE Annotations folder - file_list_txt: training/assets/MOSE_sample_train_list.txt # Optional PATH to filelist containing a subset of videos to be used for training - multiplier: 2 # Video transforms vos: @@ -62,19 +56,19 @@ trainer: seed_value: 123 model: _target_: training.model.sam2.SAM2Train image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 112 num_heads: 2 drop_path_rate: 0.1 neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -85,17 +79,17 @@ trainer: fpn_interp_model: nearest memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] embedding_dim: 256 @@ -106,7 +100,7 @@ trainer: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [64, 64] rope_k_repeat: True @@ -118,23 +112,23 @@ trainer: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser layer: - _target_: sam2.modeling.memory_encoder.CXBlock + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 @@ -318,7 +312,7 @@ trainer: state_dict: _target_: training.utils.checkpoint_utils.load_checkpoint_and_apply_kernels checkpoint_path: ./checkpoints/sam2.1_hiera_base_plus.pt # PATH to SAM 2.1 checkpoint ckpt_state_dict_keys: ['model'] launcher:
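The `configs/sam2/*.yaml` files that follow are the inference-time model definitions and move with the package as well. Only the Python import paths change; the checkpoint filename on disk remains `sam2.1_hiera_base_plus.pt`. A minimal loading sketch, assuming `build_sam.py` and `sam2_image_predictor.py` were relocated into `bboxmaskpose/sam2/` together with the rest of the package (neither is shown in this part of the diff):

```python
# Minimal sketch of building the relocated SAM 2 model for inference.
# Assumes build_sam.py and sam2_image_predictor.py moved with the package
# (not shown in this diff) and that the SAM 2.1 base-plus checkpoint has
# been downloaded to ./checkpoints/.
import torch

from bboxmaskpose.sam2.build_sam import build_sam2
from bboxmaskpose.sam2.sam2_image_predictor import SAM2ImagePredictor

model_cfg = "configs/sam2/sam2_hiera_b+.yaml"
checkpoint = "./checkpoints/sam2.1_hiera_base_plus.pt"  # filename on disk is unchanged

device = "cuda" if torch.cuda.is_available() else "cpu"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint, device=device))
```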
diff --git a/sam2/configs/sam2/sam2_hiera_b+.yaml b/bboxmaskpose/sam2/configs/sam2/sam2_hiera_b+.yaml similarity index 72% rename from sam2/configs/sam2/sam2_hiera_b+.yaml rename to bboxmaskpose/sam2/configs/sam2/sam2_hiera_b+.yaml index 58f3eb81554018e873f8515ecb98e36d16ac29e4..2bf8fb6acbdea4b208b18d9aad0dd8105bc4f11c 100644 --- a/sam2/configs/sam2/sam2_hiera_b+.yaml +++ b/bboxmaskpose/sam2/configs/sam2/sam2_hiera_b+.yaml @@ -2,18 +2,18 @@ # Model model: - _target_: sam2.modeling.sam2_base.SAM2Base + _target_: bboxmaskpose.sam2.modeling.sam2_base.SAM2Base image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 112 num_heads: 2 neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -24,17 +24,17 @@ model: fpn_interp_model: nearest memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [32, 32] embedding_dim: 256 @@ -45,7 +45,7 @@ model: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [32, 32] rope_k_repeat: True @@ -57,23 +57,23 @@ model: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_:
bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 diff --git a/sam2/configs/sam2/sam2_hiera_l.yaml b/bboxmaskpose/sam2/configs/sam2/sam2_hiera_l.yaml similarity index 73% rename from sam2/configs/sam2/sam2_hiera_l.yaml rename to bboxmaskpose/sam2/configs/sam2/sam2_hiera_l.yaml index 918667f50c3e1ad2dcf77c0c14cb4dd114cfd080..f89330c36f88265101a0ae503d2fea81a74be501 100644 --- a/sam2/configs/sam2/sam2_hiera_l.yaml +++ b/bboxmaskpose/sam2/configs/sam2/sam2_hiera_l.yaml @@ -2,12 +2,12 @@ # Model model: - _target_: sam2.modeling.sam2_base.SAM2Base + _target_: bboxmaskpose.sam2.modeling.sam2_base.SAM2Base image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 144 num_heads: 2 stages: [2, 6, 36, 4] @@ -15,9 +15,9 @@ model: window_pos_embed_bkg_spatial_size: [7, 7] window_spec: [8, 4, 16, 8] neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -28,17 +28,17 @@ model: fpn_interp_model: nearest memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [32, 32] embedding_dim: 256 @@ -49,7 +49,7 @@ model: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [32, 32] rope_k_repeat: True @@ -61,23 +61,23 @@ model: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser layer: - _target_: sam2.modeling.memory_encoder.CXBlock + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 diff --git a/sam2/configs/sam2/sam2_hiera_s.yaml b/bboxmaskpose/sam2/configs/sam2/sam2_hiera_s.yaml similarity index 72% rename from sam2/configs/sam2/sam2_hiera_s.yaml rename to bboxmaskpose/sam2/configs/sam2/sam2_hiera_s.yaml index 
26e5d4d39f7b2892396106005c37c7ffe6c83bc2..c5d370ae8334665a5647e5b9e2316031e6158a36 100644 --- a/sam2/configs/sam2/sam2_hiera_s.yaml +++ b/bboxmaskpose/sam2/configs/sam2/sam2_hiera_s.yaml @@ -2,21 +2,21 @@ # Model model: - _target_: sam2.modeling.sam2_base.SAM2Base + _target_: bboxmaskpose.sam2.modeling.sam2_base.SAM2Base image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 96 num_heads: 1 stages: [1, 2, 11, 2] global_att_blocks: [7, 10, 13] window_pos_embed_bkg_spatial_size: [7, 7] neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -27,17 +27,17 @@ model: fpn_interp_model: nearest memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [32, 32] embedding_dim: 256 @@ -48,7 +48,7 @@ model: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [32, 32] rope_k_repeat: True @@ -60,23 +60,23 @@ model: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser layer: - _target_: sam2.modeling.memory_encoder.CXBlock + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 diff --git a/sam2/configs/sam2/sam2_hiera_t.yaml b/bboxmaskpose/sam2/configs/sam2/sam2_hiera_t.yaml similarity index 73% rename from sam2/configs/sam2/sam2_hiera_t.yaml rename to bboxmaskpose/sam2/configs/sam2/sam2_hiera_t.yaml index a62c903aaa5f80828077c6e06a59626926570ed6..f0fb533e8222bb359804a1ae6025ae8bec5f7064 100644 --- a/sam2/configs/sam2/sam2_hiera_t.yaml +++ b/bboxmaskpose/sam2/configs/sam2/sam2_hiera_t.yaml @@ -2,21 +2,21 @@ # Model model: - _target_: sam2.modeling.sam2_base.SAM2Base + _target_: bboxmaskpose.sam2.modeling.sam2_base.SAM2Base image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder + _target_: 
bboxmaskpose.sam2.modeling.backbones.image_encoder.ImageEncoder scalp: 1 trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera + _target_: bboxmaskpose.sam2.modeling.backbones.hieradet.Hiera embed_dim: 96 num_heads: 1 stages: [1, 2, 7, 2] global_att_blocks: [5, 7, 9] window_pos_embed_bkg_spatial_size: [7, 7] neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck + _target_: bboxmaskpose.sam2.modeling.backbones.image_encoder.FpnNeck position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 256 normalize: true scale: null @@ -27,17 +27,17 @@ model: fpn_interp_model: nearest memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttention d_model: 256 pos_enc_at_input: true layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer + _target_: bboxmaskpose.sam2.modeling.memory_attention.MemoryAttentionLayer activation: relu dim_feedforward: 2048 dropout: 0.1 pos_enc_at_attn: false self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [32, 32] embedding_dim: 256 @@ -48,7 +48,7 @@ model: pos_enc_at_cross_attn_keys: true pos_enc_at_cross_attn_queries: false cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention + _target_: bboxmaskpose.sam2.modeling.sam.transformer.RoPEAttention rope_theta: 10000.0 feat_sizes: [32, 32] rope_k_repeat: True @@ -60,23 +60,23 @@ model: num_layers: 4 memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MemoryEncoder out_dim: 64 position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine + _target_: bboxmaskpose.sam2.modeling.position_encoding.PositionEmbeddingSine num_pos_feats: 64 normalize: true scale: null temperature: 10000 mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler + _target_: bboxmaskpose.sam2.modeling.memory_encoder.MaskDownSampler kernel_size: 3 stride: 2 padding: 1 fuser: - _target_: sam2.modeling.memory_encoder.Fuser + _target_: bboxmaskpose.sam2.modeling.memory_encoder.Fuser layer: - _target_: sam2.modeling.memory_encoder.CXBlock + _target_: bboxmaskpose.sam2.modeling.memory_encoder.CXBlock dim: 256 kernel_size: 7 padding: 3 diff --git a/sam2/csrc/connected_components.cu b/bboxmaskpose/sam2/csrc/connected_components.cu similarity index 100% rename from sam2/csrc/connected_components.cu rename to bboxmaskpose/sam2/csrc/connected_components.cu diff --git a/sam2/distinctipy.py b/bboxmaskpose/sam2/distinctipy.py similarity index 95% rename from sam2/distinctipy.py rename to bboxmaskpose/sam2/distinctipy.py index 2044b335e2bac2a24a4eab908b64b37be829bfa8..334c9ef5cc248f1e2358c430c680c840ee5897e4 100644 --- a/sam2/distinctipy.py +++ b/bboxmaskpose/sam2/distinctipy.py @@ -1,3 +1,5 @@ +# Adapted from the distinctipy repository (https://github.com/alan-turing-institute/distinctipy). +# Original authors: distinctipy contributors. Included with minor modifications. 
import math import random @@ -125,9 +127,7 @@ def color_distance(c1, c2): return distance -def distinct_color( - exclude_colors, pastel_factor=0.0, n_attempts=1000, colorblind_type=None, rng=None -): +def distinct_color(exclude_colors, pastel_factor=0.0, n_attempts=1000, colorblind_type=None, rng=None): """ Generate a colour as distinct as possible from the colours defined in exclude_colors Inspired by: https://gist.github.com/adewes/5884820 @@ -164,10 +164,7 @@ def distinct_color( return get_random_color(pastel_factor=pastel_factor, rng=rng) if colorblind_type: - exclude_colors = [ - colorblind.colorblind_filter(color, colorblind_type) - for color in exclude_colors - ] + exclude_colors = [colorblind.colorblind_filter(color, colorblind_type) for color in exclude_colors] max_distance = None best_color = None @@ -181,9 +178,7 @@ def distinct_color( else: compare_color = color - distance_to_nearest = min( - [color_distance(compare_color, c) for c in exclude_colors] - ) + distance_to_nearest = min([color_distance(compare_color, c) for c in exclude_colors]) if (not max_distance) or (distance_to_nearest > max_distance): max_distance = distance_to_nearest @@ -202,9 +197,7 @@ def distinct_color( else: compare_color = color - distance_to_nearest = min( - [color_distance(compare_color, c) for c in exclude_colors] - ) + distance_to_nearest = min([color_distance(compare_color, c) for c in exclude_colors]) if (not max_distance) or (distance_to_nearest > max_distance): max_distance = distance_to_nearest @@ -500,4 +493,4 @@ def get_colormap(list_of_colors, name="distinctipy"): cmap = matplotlib.colors.ListedColormap(list_of_colors, name=name) - return cmap \ No newline at end of file + return cmap diff --git a/sam2/modeling/__init__.py b/bboxmaskpose/sam2/modeling/__init__.py similarity index 100% rename from sam2/modeling/__init__.py rename to bboxmaskpose/sam2/modeling/__init__.py diff --git a/sam2/modeling/backbones/__init__.py b/bboxmaskpose/sam2/modeling/backbones/__init__.py similarity index 100% rename from sam2/modeling/backbones/__init__.py rename to bboxmaskpose/sam2/modeling/backbones/__init__.py diff --git a/sam2/modeling/backbones/hieradet.py b/bboxmaskpose/sam2/modeling/backbones/hieradet.py similarity index 88% rename from sam2/modeling/backbones/hieradet.py rename to bboxmaskpose/sam2/modeling/backbones/hieradet.py index 19ac77b61d8e1345a301686d39ef2ab6e4b035fb..8adc9ba14f7a5efe770e917c0a7dac103191dfe1 100644 --- a/sam2/modeling/backbones/hieradet.py +++ b/bboxmaskpose/sam2/modeling/backbones/hieradet.py @@ -11,15 +11,10 @@ from typing import List, Tuple, Union import torch import torch.nn as nn import torch.nn.functional as F -from iopath.common.file_io import g_pathmgr - -from sam2.modeling.backbones.utils import ( - PatchEmbed, - window_partition, - window_unpartition, -) -from sam2.modeling.sam2_utils import DropPath, MLP +from bboxmaskpose.sam2.modeling.backbones.utils import PatchEmbed, window_partition, window_unpartition +from bboxmaskpose.sam2.modeling.sam2_utils import MLP, DropPath +from iopath.common.file_io import g_pathmgr def do_pool(x: torch.Tensor, pool: nn.Module, norm: nn.Module = None) -> torch.Tensor: @@ -107,9 +102,7 @@ class MultiScaleBlock(nn.Module): self.pool, self.q_stride = None, q_stride if self.q_stride: - self.pool = nn.MaxPool2d( - kernel_size=q_stride, stride=q_stride, ceil_mode=False - ) + self.pool = nn.MaxPool2d(kernel_size=q_stride, stride=q_stride, ceil_mode=False) self.attn = MultiScaleAttention( dim, @@ -218,16 +211,10 @@ class Hiera(nn.Module): # 
Windowed positional embedding (https://arxiv.org/abs/2311.05613) self.window_pos_embed_bkg_spatial_size = window_pos_embed_bkg_spatial_size - self.pos_embed = nn.Parameter( - torch.zeros(1, embed_dim, *self.window_pos_embed_bkg_spatial_size) - ) - self.pos_embed_window = nn.Parameter( - torch.zeros(1, embed_dim, self.window_spec[0], self.window_spec[0]) - ) + self.pos_embed = nn.Parameter(torch.zeros(1, embed_dim, *self.window_pos_embed_bkg_spatial_size)) + self.pos_embed_window = nn.Parameter(torch.zeros(1, embed_dim, self.window_spec[0], self.window_spec[0])) - dpr = [ - x.item() for x in torch.linspace(0, drop_path_rate, depth) - ] # stochastic depth decay rule + dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)] # stochastic depth decay rule cur_stage = 1 self.blocks = nn.ModuleList() @@ -259,11 +246,7 @@ class Hiera(nn.Module): embed_dim = dim_out self.blocks.append(block) - self.channel_list = ( - [self.blocks[i].dim_out for i in self.stage_ends[::-1]] - if return_interm_layers - else [self.blocks[-1].dim_out] - ) + self.channel_list = [self.blocks[i].dim_out for i in self.stage_ends[::-1]] if return_interm_layers else [self.blocks[-1].dim_out] if weights_path is not None: with g_pathmgr.open(weights_path, "rb") as f: @@ -274,9 +257,7 @@ class Hiera(nn.Module): h, w = hw window_embed = self.pos_embed_window pos_embed = F.interpolate(self.pos_embed, size=(h, w), mode="bicubic") - pos_embed = pos_embed + window_embed.tile( - [x // y for x, y in zip(pos_embed.shape, window_embed.shape)] - ) + pos_embed = pos_embed + window_embed.tile([x // y for x, y in zip(pos_embed.shape, window_embed.shape)]) pos_embed = pos_embed.permute(0, 2, 3, 1) return pos_embed @@ -290,9 +271,7 @@ class Hiera(nn.Module): outputs = [] for i, blk in enumerate(self.blocks): x = blk(x) - if (i == self.stage_ends[-1]) or ( - i in self.stage_ends and self.return_interm_layers - ): + if (i == self.stage_ends[-1]) or (i in self.stage_ends and self.return_interm_layers): feats = x.permute(0, 3, 1, 2) outputs.append(feats) diff --git a/sam2/modeling/backbones/image_encoder.py b/bboxmaskpose/sam2/modeling/backbones/image_encoder.py similarity index 97% rename from sam2/modeling/backbones/image_encoder.py rename to bboxmaskpose/sam2/modeling/backbones/image_encoder.py index 37e9266bc98596e97ca303118c910ed24f6cee2c..ebb7537029826908b146b1117176320f8f81be9d 100644 --- a/sam2/modeling/backbones/image_encoder.py +++ b/bboxmaskpose/sam2/modeling/backbones/image_encoder.py @@ -117,9 +117,7 @@ class FpnNeck(nn.Module): prev_features.to(dtype=torch.float32), scale_factor=2.0, mode=self.fpn_interp_model, - align_corners=( - None if self.fpn_interp_model == "nearest" else False - ), + align_corners=(None if self.fpn_interp_model == "nearest" else False), antialias=False, ) prev_features = lateral_features + top_down_features diff --git a/sam2/modeling/backbones/utils.py b/bboxmaskpose/sam2/modeling/backbones/utils.py similarity index 92% rename from sam2/modeling/backbones/utils.py rename to bboxmaskpose/sam2/modeling/backbones/utils.py index 930b1b7622e7b0e7270120dcafccc242ef0f4f28..ff6707217e4c9b6f7da000ddd5512762056a1959 100644 --- a/sam2/modeling/backbones/utils.py +++ b/bboxmaskpose/sam2/modeling/backbones/utils.py @@ -50,9 +50,7 @@ def window_unpartition(windows, window_size, pad_hw, hw): Hp, Wp = pad_hw H, W = hw B = windows.shape[0] // (Hp * Wp // window_size // window_size) - x = windows.reshape( - B, Hp // window_size, Wp // window_size, window_size, window_size, -1 - ) + x = windows.reshape(B, Hp // 
window_size, Wp // window_size, window_size, window_size, -1) x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, Hp, Wp, -1) if Hp > H or Wp > W: @@ -82,9 +80,7 @@ class PatchEmbed(nn.Module): embed_dim (int): embed_dim (int): Patch embedding dimension. """ super().__init__() - self.proj = nn.Conv2d( - in_chans, embed_dim, kernel_size=kernel_size, stride=stride, padding=padding - ) + self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=kernel_size, stride=stride, padding=padding) def forward(self, x: torch.Tensor) -> torch.Tensor: x = self.proj(x) diff --git a/sam2/modeling/memory_attention.py b/bboxmaskpose/sam2/modeling/memory_attention.py similarity index 94% rename from sam2/modeling/memory_attention.py rename to bboxmaskpose/sam2/modeling/memory_attention.py index 0b07f9d87e3d8194ca5e11fc20f01604d591a59d..1efa31511bbecab2792386d0337150273843eb3d 100644 --- a/sam2/modeling/memory_attention.py +++ b/bboxmaskpose/sam2/modeling/memory_attention.py @@ -7,11 +7,10 @@ from typing import Optional import torch -from torch import nn, Tensor +from torch import Tensor, nn -from sam2.modeling.sam.transformer import RoPEAttention - -from sam2.modeling.sam2_utils import get_activation_fn, get_clones +from bboxmaskpose.sam2.modeling.sam2_utils import get_activation_fn, get_clones +from bboxmaskpose.sam2.modeling.sam.transformer import RoPEAttention class MemoryAttentionLayer(nn.Module): @@ -132,9 +131,7 @@ class MemoryAttention(nn.Module): curr_pos[0], ) - assert ( - curr.shape[1] == memory.shape[1] - ), "Batch size must be the same for curr and memory" + assert curr.shape[1] == memory.shape[1], "Batch size must be the same for curr and memory" output = curr if self.pos_enc_at_input and curr_pos is not None: diff --git a/sam2/modeling/memory_encoder.py b/bboxmaskpose/sam2/modeling/memory_encoder.py similarity index 93% rename from sam2/modeling/memory_encoder.py rename to bboxmaskpose/sam2/modeling/memory_encoder.py index f60202dfaba87232c3870fb2101b5322a119d985..ed43ac91eeb78e623ae3210345068a46bf8af2a5 100644 --- a/sam2/modeling/memory_encoder.py +++ b/bboxmaskpose/sam2/modeling/memory_encoder.py @@ -11,7 +11,7 @@ import torch import torch.nn as nn import torch.nn.functional as F -from sam2.modeling.sam2_utils import DropPath, get_clones, LayerNorm2d +from bboxmaskpose.sam2.modeling.sam2_utils import DropPath, LayerNorm2d, get_clones class MaskDownSampler(nn.Module): @@ -89,16 +89,10 @@ class CXBlock(nn.Module): groups=dim if use_dwconv else 1, ) # depthwise conv self.norm = LayerNorm2d(dim, eps=1e-6) - self.pwconv1 = nn.Linear( - dim, 4 * dim - ) # pointwise/1x1 convs, implemented with linear layers + self.pwconv1 = nn.Linear(dim, 4 * dim) # pointwise/1x1 convs, implemented with linear layers self.act = nn.GELU() self.pwconv2 = nn.Linear(4 * dim, dim) - self.gamma = ( - nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True) - if layer_scale_init_value > 0 - else None - ) + self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True) if layer_scale_init_value > 0 else None self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity() def forward(self, x): diff --git a/sam2/modeling/position_encoding.py b/bboxmaskpose/sam2/modeling/position_encoding.py similarity index 88% rename from sam2/modeling/position_encoding.py rename to bboxmaskpose/sam2/modeling/position_encoding.py index 2241d4cf1a4495b4c67dc35cbed1c606357b9b7a..16c4f1fd08592d92920ab264cff1f7cce2e08019 100644 --- a/sam2/modeling/position_encoding.py +++ 
b/bboxmaskpose/sam2/modeling/position_encoding.py @@ -8,7 +8,6 @@ import math from typing import Any, Optional, Tuple import numpy as np - import torch from torch import nn @@ -61,12 +60,8 @@ class PositionEmbeddingSine(nn.Module): pos_x = x_embed[:, None] / dim_t pos_y = y_embed[:, None] / dim_t - pos_x = torch.stack( - (pos_x[:, 0::2].sin(), pos_x[:, 1::2].cos()), dim=2 - ).flatten(1) - pos_y = torch.stack( - (pos_y[:, 0::2].sin(), pos_y[:, 1::2].cos()), dim=2 - ).flatten(1) + pos_x = torch.stack((pos_x[:, 0::2].sin(), pos_x[:, 1::2].cos()), dim=2).flatten(1) + pos_y = torch.stack((pos_y[:, 0::2].sin(), pos_y[:, 1::2].cos()), dim=2).flatten(1) return pos_x, pos_y @torch.no_grad() @@ -92,16 +87,8 @@ class PositionEmbeddingSine(nn.Module): if cache_key in self.cache: return self.cache[cache_key].to(device)[None].repeat(B, 1, 1, 1) - y_embed = ( - torch.arange(1, H + 1, dtype=torch.float32, device=device) - .view(1, -1, 1) - .repeat(B, 1, W) - ) - x_embed = ( - torch.arange(1, W + 1, dtype=torch.float32, device=device) - .view(1, 1, -1) - .repeat(B, H, 1) - ) + y_embed = torch.arange(1, H + 1, dtype=torch.float32, device=device).view(1, -1, 1).repeat(B, 1, W) + x_embed = torch.arange(1, W + 1, dtype=torch.float32, device=device).view(1, 1, -1).repeat(B, H, 1) if self.normalize: eps = 1e-6 @@ -113,12 +100,8 @@ class PositionEmbeddingSine(nn.Module): pos_x = x_embed[:, :, :, None] / dim_t pos_y = y_embed[:, :, :, None] / dim_t - pos_x = torch.stack( - (pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4 - ).flatten(3) - pos_y = torch.stack( - (pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4 - ).flatten(3) + pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3) + pos_y = torch.stack((pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4).flatten(3) pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2) self.cache[cache_key] = pos[0] return pos @@ -166,9 +149,7 @@ class PositionEmbeddingRandom(nn.Module): pe = self._pe_encoding(torch.stack([x_embed, y_embed], dim=-1)) return pe.permute(2, 0, 1) # C x H x W - def forward_with_coords( - self, coords_input: torch.Tensor, image_size: Tuple[int, int] - ) -> torch.Tensor: + def forward_with_coords(self, coords_input: torch.Tensor, image_size: Tuple[int, int]) -> torch.Tensor: """Positionally encode points that are not normalized to [0,1].""" coords = coords_input.clone() coords[:, :, 0] = coords[:, :, 0] / image_size[1] @@ -216,11 +197,7 @@ def apply_rotary_enc( repeat_freqs_k: bool = False, ): xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2)) - xk_ = ( - torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2)) - if xk.shape[-2] != 0 - else None - ) + xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2)) if xk.shape[-2] != 0 else None freqs_cis = reshape_for_broadcast(freqs_cis, xq_) xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(3) if xk_ is None: diff --git a/sam2/modeling/sam/__init__.py b/bboxmaskpose/sam2/modeling/sam/__init__.py similarity index 100% rename from sam2/modeling/sam/__init__.py rename to bboxmaskpose/sam2/modeling/sam/__init__.py diff --git a/sam2/modeling/sam/mask_decoder.py b/bboxmaskpose/sam2/modeling/sam/mask_decoder.py similarity index 89% rename from sam2/modeling/sam/mask_decoder.py rename to bboxmaskpose/sam2/modeling/sam/mask_decoder.py index 9bebc0366b2703ffcb80a44bfd19cce8339b4fed..3ccb9daf3ac96b7502ad83428ca7c182ff47c41d 100644 --- a/sam2/modeling/sam/mask_decoder.py +++ 
b/bboxmaskpose/sam2/modeling/sam/mask_decoder.py @@ -9,7 +9,7 @@ from typing import List, Optional, Tuple, Type import torch from torch import nn -from sam2.modeling.sam2_utils import LayerNorm2d, MLP +from bboxmaskpose.sam2.modeling.sam2_utils import MLP, LayerNorm2d class MaskDecoder(nn.Module): @@ -63,30 +63,19 @@ class MaskDecoder(nn.Module): self.use_multimask_token_for_obj_ptr = use_multimask_token_for_obj_ptr self.output_upscaling = nn.Sequential( - nn.ConvTranspose2d( - transformer_dim, transformer_dim // 4, kernel_size=2, stride=2 - ), + nn.ConvTranspose2d(transformer_dim, transformer_dim // 4, kernel_size=2, stride=2), LayerNorm2d(transformer_dim // 4), activation(), - nn.ConvTranspose2d( - transformer_dim // 4, transformer_dim // 8, kernel_size=2, stride=2 - ), + nn.ConvTranspose2d(transformer_dim // 4, transformer_dim // 8, kernel_size=2, stride=2), activation(), ) self.use_high_res_features = use_high_res_features if use_high_res_features: - self.conv_s0 = nn.Conv2d( - transformer_dim, transformer_dim // 8, kernel_size=1, stride=1 - ) - self.conv_s1 = nn.Conv2d( - transformer_dim, transformer_dim // 4, kernel_size=1, stride=1 - ) + self.conv_s0 = nn.Conv2d(transformer_dim, transformer_dim // 8, kernel_size=1, stride=1) + self.conv_s1 = nn.Conv2d(transformer_dim, transformer_dim // 4, kernel_size=1, stride=1) self.output_hypernetworks_mlps = nn.ModuleList( - [ - MLP(transformer_dim, transformer_dim, transformer_dim // 8, 3) - for i in range(self.num_mask_tokens) - ] + [MLP(transformer_dim, transformer_dim, transformer_dim // 8, 3) for i in range(self.num_mask_tokens)] ) self.iou_prediction_head = MLP( @@ -188,12 +177,8 @@ class MaskDecoder(nn.Module): ) s = 1 else: - output_tokens = torch.cat( - [self.iou_token.weight, self.mask_tokens.weight], dim=0 - ) - output_tokens = output_tokens.unsqueeze(0).expand( - sparse_prompt_embeddings.size(0), -1, -1 - ) + output_tokens = torch.cat([self.iou_token.weight, self.mask_tokens.weight], dim=0) + output_tokens = output_tokens.unsqueeze(0).expand(sparse_prompt_embeddings.size(0), -1, -1) tokens = torch.cat((output_tokens, sparse_prompt_embeddings), dim=1) # Expand per-image data in batch direction to be per-mask @@ -203,9 +188,7 @@ class MaskDecoder(nn.Module): assert image_embeddings.shape[0] == tokens.shape[0] src = image_embeddings src = src + dense_prompt_embeddings - assert ( - image_pe.size(0) == 1 - ), "image_pe should have size 1 in batch dim (from `get_dense_pe()`)" + assert image_pe.size(0) == 1, "image_pe should have size 1 in batch dim (from `get_dense_pe()`)" pos_src = torch.repeat_interleave(image_pe, tokens.shape[0], dim=0) b, c, h, w = src.shape @@ -226,9 +209,7 @@ class MaskDecoder(nn.Module): hyper_in_list: List[torch.Tensor] = [] for i in range(self.num_mask_tokens): - hyper_in_list.append( - self.output_hypernetworks_mlps[i](mask_tokens_out[:, i, :]) - ) + hyper_in_list.append(self.output_hypernetworks_mlps[i](mask_tokens_out[:, i, :])) hyper_in = torch.stack(hyper_in_list, dim=1) b, c, h, w = upscaled_embedding.shape masks = (hyper_in @ upscaled_embedding.view(b, c, h * w)).view(b, -1, h, w) @@ -267,9 +248,7 @@ class MaskDecoder(nn.Module): multimask_logits = all_mask_logits[:, 1:, :, :] multimask_iou_scores = all_iou_scores[:, 1:] best_scores_inds = torch.argmax(multimask_iou_scores, dim=-1) - batch_inds = torch.arange( - multimask_iou_scores.size(0), device=all_iou_scores.device - ) + batch_inds = torch.arange(multimask_iou_scores.size(0), device=all_iou_scores.device) best_multimask_logits = 
multimask_logits[batch_inds, best_scores_inds] best_multimask_logits = best_multimask_logits.unsqueeze(1) best_multimask_iou_scores = multimask_iou_scores[batch_inds, best_scores_inds] diff --git a/sam2/modeling/sam/pose_encoder.py b/bboxmaskpose/sam2/modeling/sam/pose_encoder.py similarity index 89% rename from sam2/modeling/sam/pose_encoder.py rename to bboxmaskpose/sam2/modeling/sam/pose_encoder.py index 6b48f57ebf6de91bae2e94c0307df8663724b1fc..340f4a5f77a458c21d68e833a752635411c1adde 100644 --- a/sam2/modeling/sam/pose_encoder.py +++ b/bboxmaskpose/sam2/modeling/sam/pose_encoder.py @@ -9,9 +9,8 @@ from typing import Optional, Tuple, Type import torch from torch import nn -from sam2.modeling.position_encoding import PositionEmbeddingRandom - -from sam2.modeling.sam2_utils import LayerNorm2d +from bboxmaskpose.sam2.modeling.position_encoding import PositionEmbeddingRandom +from bboxmaskpose.sam2.modeling.sam2_utils import LayerNorm2d class PoseEncoder(nn.Module): @@ -44,9 +43,7 @@ class PoseEncoder(nn.Module): self.pe_layer = PositionEmbeddingRandom(embed_dim // 2) self.num_point_embeddings: int = 17 # 17 COCO keypoints - point_embeddings = [ - nn.Embedding(1, embed_dim) for i in range(self.num_point_embeddings) - ] + point_embeddings = [nn.Embedding(1, embed_dim) for i in range(self.num_point_embeddings)] self.point_embeddings = nn.ModuleList(point_embeddings) self.not_a_point_embed = nn.Embedding(1, embed_dim) @@ -89,17 +86,12 @@ class PoseEncoder(nn.Module): padding_label = -torch.ones((labels.shape[0], 1), device=labels.device) points = torch.cat([points, padding_point], dim=1) labels = torch.cat([labels, padding_label], dim=1) - point_embedding = self.pe_layer.forward_with_coords( - points, self.input_image_size - ) + point_embedding = self.pe_layer.forward_with_coords(points, self.input_image_size) kpt_embeddings = torch.cat([self.point_embeddings[i].weight for i in range(self.num_point_embeddings)], dim=0) negative_embedding = torch.zeros_like(point_embedding) + self.not_a_point_embed.weight positive_embedding = point_embedding + kpt_embeddings - weighted_embedding = ( - positive_embedding * labels.unsqueeze(-1).float() + - negative_embedding * (1 - labels.unsqueeze(-1).float()) - ) + weighted_embedding = positive_embedding * labels.unsqueeze(-1).float() + negative_embedding * (1 - labels.unsqueeze(-1).float()) point_embedding = torch.where( (labels == 0).unsqueeze(-1), @@ -112,9 +104,7 @@ class PoseEncoder(nn.Module): """Embeds box prompts.""" boxes = boxes + 0.5 # Shift to center of pixel coords = boxes.reshape(-1, 2, 2) - corner_embedding = self.pe_layer.forward_with_coords( - coords, self.input_image_size - ) + corner_embedding = self.pe_layer.forward_with_coords(coords, self.input_image_size) corner_embedding[:, 0, :] += self.point_embeddings[2].weight corner_embedding[:, 1, :] += self.point_embeddings[3].weight return corner_embedding @@ -170,9 +160,7 @@ class PoseEncoder(nn.Module): Bx(embed_dim)x(embed_H)x(embed_W) """ bs = self._get_batch_size(points, boxes, masks) - sparse_embeddings = torch.empty( - (bs, 0, self.embed_dim), device=self._get_device() - ) + sparse_embeddings = torch.empty((bs, 0, self.embed_dim), device=self._get_device()) if points is not None: coords, labels = points point_embeddings = self._embed_points(coords, labels, pad=(boxes is None)) diff --git a/sam2/modeling/sam/prompt_encoder.py b/bboxmaskpose/sam2/modeling/sam/prompt_encoder.py similarity index 82% rename from sam2/modeling/sam/prompt_encoder.py rename to 
bboxmaskpose/sam2/modeling/sam/prompt_encoder.py index abdd9cb72da9eb86f46a03f4a6fc0f90c0cdd451..12386948b8e5cbc71d4f33170d1d7a29f32fe152 100644 --- a/sam2/modeling/sam/prompt_encoder.py +++ b/bboxmaskpose/sam2/modeling/sam/prompt_encoder.py @@ -6,12 +6,12 @@ from typing import Optional, Tuple, Type +import numpy as np import torch from torch import nn -from sam2.modeling.position_encoding import PositionEmbeddingRandom - -from sam2.modeling.sam2_utils import LayerNorm2d +from bboxmaskpose.sam2.modeling.position_encoding import PositionEmbeddingRandom +from bboxmaskpose.sam2.modeling.sam2_utils import LayerNorm2d class PromptEncoder(nn.Module): @@ -22,6 +22,7 @@ class PromptEncoder(nn.Module): input_image_size: Tuple[int, int], mask_in_chans: int, activation: Type[nn.Module] = nn.GELU, + n_kpts_encoder: int = -1, ) -> None: """ Encodes prompts for input to SAM's mask decoder. @@ -44,9 +45,7 @@ class PromptEncoder(nn.Module): self.pe_layer = PositionEmbeddingRandom(embed_dim // 2) self.num_point_embeddings: int = 4 # pos/neg point + 2 box corners - point_embeddings = [ - nn.Embedding(1, embed_dim) for i in range(self.num_point_embeddings) - ] + point_embeddings = [nn.Embedding(1, embed_dim) for i in range(self.num_point_embeddings)] self.point_embeddings = nn.ModuleList(point_embeddings) self.not_a_point_embed = nn.Embedding(1, embed_dim) @@ -63,6 +62,7 @@ class PromptEncoder(nn.Module): activation(), nn.Conv2d(mask_in_chans, embed_dim, kernel_size=1), ) + self.n_kpts_encoder = n_kpts_encoder self.no_mask_embed = nn.Embedding(1, embed_dim) def get_dense_pe(self) -> torch.Tensor: @@ -76,45 +76,41 @@ class PromptEncoder(nn.Module): """ return self.pe_layer(self.image_embedding_size).unsqueeze(0) - def _embed_points( - self, - points: torch.Tensor, - labels: torch.Tensor, - pad: bool, + def _embed_points( ## embeds the points into a high-dimensional space (e.g., 256-dim) using learned embeddings + self, points: torch.Tensor, labels: torch.Tensor, pad: bool, normalize: bool ) -> torch.Tensor: """Embeds point prompts.""" + # print("EMBED points ", points) # KPTS OUTPUT points = points + 0.5 # Shift to center of pixel if pad: padding_point = torch.zeros((points.shape[0], 1, 2), device=points.device) padding_label = -torch.ones((labels.shape[0], 1), device=labels.device) points = torch.cat([points, padding_point], dim=1) labels = torch.cat([labels, padding_label], dim=1) - point_embedding = self.pe_layer.forward_with_coords( - points, self.input_image_size - ) + point_embedding = self.pe_layer.forward_with_coords(points, self.input_image_size) point_embedding = torch.where( (labels == -1).unsqueeze(-1), torch.zeros_like(point_embedding) + self.not_a_point_embed.weight, point_embedding, ) - point_embedding = torch.where( + point_embedding = torch.where( ## negative pts (labels == 0).unsqueeze(-1), point_embedding + self.point_embeddings[0].weight, point_embedding, ) point_embedding = torch.where( - (labels == 1).unsqueeze(-1), + (labels == 1).unsqueeze(-1), ## positive pts point_embedding + self.point_embeddings[1].weight, point_embedding, ) point_embedding = torch.where( - (labels == 2).unsqueeze(-1), + (labels == 2).unsqueeze(-1), ## bbox top left point_embedding + self.point_embeddings[2].weight, point_embedding, ) point_embedding = torch.where( - (labels == 3).unsqueeze(-1), + (labels == 3).unsqueeze(-1), ## bbox bottom right point_embedding + self.point_embeddings[3].weight, point_embedding, ) @@ -124,9 +120,7 @@ class PromptEncoder(nn.Module): """Embeds box prompts.""" boxes = boxes + 0.5 # 
Shift to center of pixel coords = boxes.reshape(-1, 2, 2) - corner_embedding = self.pe_layer.forward_with_coords( - coords, self.input_image_size - ) + corner_embedding = self.pe_layer.forward_with_coords(coords, self.input_image_size) corner_embedding[:, 0, :] += self.point_embeddings[2].weight corner_embedding[:, 1, :] += self.point_embeddings[3].weight return corner_embedding @@ -160,9 +154,9 @@ class PromptEncoder(nn.Module): def forward( self, points: Optional[Tuple[torch.Tensor, torch.Tensor]], - # skeletons: Optional[Tuple[torch.Tensor, torch.Tensor]], boxes: Optional[torch.Tensor], masks: Optional[torch.Tensor], + normalize: bool = True, ) -> Tuple[torch.Tensor, torch.Tensor]: """ Embeds different types of prompts, returning both sparse and dense @@ -182,12 +176,13 @@ class PromptEncoder(nn.Module): Bx(embed_dim)x(embed_H)x(embed_W) """ bs = self._get_batch_size(points, boxes, masks) - sparse_embeddings = torch.empty( - (bs, 0, self.embed_dim), device=self._get_device() - ) + sparse_embeddings = torch.empty((bs, 0, self.embed_dim), device=self._get_device()) if points is not None: coords, labels = points - point_embeddings = self._embed_points(coords, labels, pad=(boxes is None)) + coords = coords.to(self._get_device()) + labels = labels.to(self._get_device()) + point_embeddings = self._embed_points(coords, labels, pad=(boxes is None), normalize=normalize) + # point_embeddings = self._embed_points(coords, labels, pad=(boxes is None)) sparse_embeddings = torch.cat([sparse_embeddings, point_embeddings], dim=1) if boxes is not None: box_embeddings = self._embed_boxes(boxes) diff --git a/sam2/modeling/sam/transformer.py b/bboxmaskpose/sam2/modeling/sam/transformer.py similarity index 87% rename from sam2/modeling/sam/transformer.py rename to bboxmaskpose/sam2/modeling/sam/transformer.py index f9fe9a3fbc5cce4f1abe8ee0ae3a8602bbe2ff1b..5aacae055b7e1046c5f756d75fd81021c2742e4e 100644 --- a/sam2/modeling/sam/transformer.py +++ b/bboxmaskpose/sam2/modeling/sam/transformer.py @@ -10,10 +10,10 @@ from typing import Tuple, Type import torch import torch.nn.functional as F -from torch import nn, Tensor +from torch import Tensor, nn -from sam2.modeling.position_encoding import apply_rotary_enc, compute_axial_cis -from sam2.modeling.sam2_utils import MLP +from bboxmaskpose.sam2.modeling.position_encoding import apply_rotary_enc, compute_axial_cis +from bboxmaskpose.sam2.modeling.sam2_utils import MLP class TwoWayTransformer(nn.Module): @@ -57,9 +57,7 @@ class TwoWayTransformer(nn.Module): ) ) - self.final_attn_token_to_image = Attention( - embedding_dim, num_heads, downsample_rate=attention_downsample_rate - ) + self.final_attn_token_to_image = Attention(embedding_dim, num_heads, downsample_rate=attention_downsample_rate) self.norm_final_attn = nn.LayerNorm(embedding_dim) def forward( @@ -136,26 +134,18 @@ class TwoWayAttentionBlock(nn.Module): self.self_attn = Attention(embedding_dim, num_heads) self.norm1 = nn.LayerNorm(embedding_dim) - self.cross_attn_token_to_image = Attention( - embedding_dim, num_heads, downsample_rate=attention_downsample_rate - ) + self.cross_attn_token_to_image = Attention(embedding_dim, num_heads, downsample_rate=attention_downsample_rate) self.norm2 = nn.LayerNorm(embedding_dim) - self.mlp = MLP( - embedding_dim, mlp_dim, embedding_dim, num_layers=2, activation=activation - ) + self.mlp = MLP(embedding_dim, mlp_dim, embedding_dim, num_layers=2, activation=activation) self.norm3 = nn.LayerNorm(embedding_dim) self.norm4 = nn.LayerNorm(embedding_dim) - 
self.cross_attn_image_to_token = Attention( - embedding_dim, num_heads, downsample_rate=attention_downsample_rate - ) + self.cross_attn_image_to_token = Attention(embedding_dim, num_heads, downsample_rate=attention_downsample_rate) self.skip_first_layer_pe = skip_first_layer_pe - def forward( - self, queries: Tensor, keys: Tensor, query_pe: Tensor, key_pe: Tensor - ) -> Tuple[Tensor, Tensor]: + def forward(self, queries: Tensor, keys: Tensor, query_pe: Tensor, key_pe: Tensor) -> Tuple[Tensor, Tensor]: # Self attention block if self.skip_first_layer_pe: queries = self.self_attn(q=queries, k=queries, v=queries) @@ -206,9 +196,7 @@ class Attention(nn.Module): self.kv_in_dim = kv_in_dim if kv_in_dim is not None else embedding_dim self.internal_dim = embedding_dim // downsample_rate self.num_heads = num_heads - assert ( - self.internal_dim % num_heads == 0 - ), "num_heads must divide embedding_dim." + assert self.internal_dim % num_heads == 0, "num_heads must divide embedding_dim." self.q_proj = nn.Linear(embedding_dim, self.internal_dim) self.k_proj = nn.Linear(self.kv_in_dim, self.internal_dim) @@ -263,18 +251,12 @@ class RoPEAttention(Attention): ): super().__init__(*args, **kwargs) - self.compute_cis = partial( - compute_axial_cis, dim=self.internal_dim // self.num_heads, theta=rope_theta - ) + self.compute_cis = partial(compute_axial_cis, dim=self.internal_dim // self.num_heads, theta=rope_theta) freqs_cis = self.compute_cis(end_x=feat_sizes[0], end_y=feat_sizes[1]) - self.freqs_cis = ( - freqs_cis.to("cuda") if torch.cuda.is_available() else freqs_cis - ) + self.freqs_cis = freqs_cis.to("cuda") if torch.cuda.is_available() else freqs_cis self.rope_k_repeat = rope_k_repeat - def forward( - self, q: Tensor, k: Tensor, v: Tensor, num_k_exclude_rope: int = 0 - ) -> Tensor: + def forward(self, q: Tensor, k: Tensor, v: Tensor, num_k_exclude_rope: int = 0) -> Tensor: # Input projections q = self.q_proj(q) k = self.k_proj(k) diff --git a/sam2/modeling/sam2_base.py b/bboxmaskpose/sam2/modeling/sam2_base.py similarity index 75% rename from sam2/modeling/sam2_base.py rename to bboxmaskpose/sam2/modeling/sam2_base.py index d3afd12e1e7f77e271acec11038f4d71d6c85a6c..0742f1bbfea4d84b8676dd4495f6073106c04d60 100644 --- a/sam2/modeling/sam2_base.py +++ b/bboxmaskpose/sam2/modeling/sam2_base.py @@ -4,20 +4,15 @@ # This source code is licensed under the license found in the # LICENSE file in the root directory of this source tree. -from loguru import logger - import torch import torch.distributed import torch.nn.functional as F - from torch.nn.init import trunc_normal_ -from sam2.modeling.sam.mask_decoder import MaskDecoder -from sam2.modeling.sam.prompt_encoder import PromptEncoder -from sam2.modeling.sam.transformer import TwoWayTransformer -from sam2.modeling.sam2_utils import get_1d_sine_pe, MLP, select_closest_cond_frames - -from sam2.utils.kalman_filter import KalmanFilter +from bboxmaskpose.sam2.modeling.sam2_utils import MLP, get_1d_sine_pe, select_closest_cond_frames +from bboxmaskpose.sam2.modeling.sam.mask_decoder import MaskDecoder +from bboxmaskpose.sam2.modeling.sam.prompt_encoder import PromptEncoder +from bboxmaskpose.sam2.modeling.sam.transformer import TwoWayTransformer # a large negative value as a placeholder score for missing objects NO_OBJ_SCORE = -1024.0 @@ -97,19 +92,10 @@ class SAM2Base(torch.nn.Module): # extra arguments used to construct the SAM mask decoder; if not None, it should be a dict of kwargs to be passed into `MaskDecoder` class. 
sam_mask_decoder_extra_args=None, compile_image_encoder: bool = False, - # Whether to use SAMURAI or original SAM 2 - samurai_mode: bool = False, - # Hyperparameters for SAMURAI - stable_frames_threshold: int = 15, - stable_ious_threshold: float = 0.3, - min_obj_score_logits: float = -1, - kf_score_weight: float = 0.15, - memory_bank_iou_threshold: float = 0.5, - memory_bank_obj_score_threshold: float = 0.0, - memory_bank_kf_score_threshold: float = 0.0, + n_kpts_encoder: int = -1, ): super().__init__() - + self.n_kpts_encoder = n_kpts_encoder # Part 1: the image backbone self.image_encoder = image_encoder # Use level 0, 1, 2 for high-res setting, or just level 2 for the default setting @@ -137,16 +123,12 @@ class SAM2Base(torch.nn.Module): # Part 3: memory encoder for the previous frame's outputs self.memory_encoder = memory_encoder self.mem_dim = self.hidden_dim - if hasattr(self.memory_encoder, "out_proj") and hasattr( - self.memory_encoder.out_proj, "weight" - ): + if hasattr(self.memory_encoder, "out_proj") and hasattr(self.memory_encoder.out_proj, "weight"): # if there is compression of memories along channel dim self.mem_dim = self.memory_encoder.out_proj.weight.shape[0] self.num_maskmem = num_maskmem # Number of memories accessible # Temporal encoding of the memories - self.maskmem_tpos_enc = torch.nn.Parameter( - torch.zeros(num_maskmem, 1, 1, self.mem_dim) - ) + self.maskmem_tpos_enc = torch.nn.Parameter(torch.zeros(num_maskmem, 1, 1, self.mem_dim)) trunc_normal_(self.maskmem_tpos_enc, std=0.02) # a single token to indicate no memory embedding from previous frames self.no_mem_embed = torch.nn.Parameter(torch.zeros(1, 1, self.hidden_dim)) @@ -194,37 +176,10 @@ class SAM2Base(torch.nn.Module): self._build_sam_heads() self.max_cond_frames_in_attn = max_cond_frames_in_attn - - # Whether to use SAMURAI or original SAM 2 - self.samurai_mode = samurai_mode - - # Init Kalman Filter - self.kf = KalmanFilter() - self.kf_mean = None - self.kf_covariance = None - self.stable_frames = 0 - - # Debug purpose - self.history = {} # debug - self.frame_cnt = 0 # debug - - # Hyperparameters for SAMURAI - self.stable_frames_threshold = stable_frames_threshold - self.stable_ious_threshold = stable_ious_threshold - self.min_obj_score_logits = min_obj_score_logits - self.kf_score_weight = kf_score_weight - self.memory_bank_iou_threshold = memory_bank_iou_threshold - self.memory_bank_obj_score_threshold = memory_bank_obj_score_threshold - self.memory_bank_kf_score_threshold = memory_bank_kf_score_threshold - - print(f"\033[93mSAMURAI mode: {self.samurai_mode}\033[0m") - # Model compilation if compile_image_encoder: # Compile the forward function (not the full module) to allow loading checkpoints. - print( - "Image encoder compilation is enabled. First forward pass will be slow." - ) + print("Image encoder compilation is enabled. 
First forward pass will be slow.") self.image_encoder.forward = torch.compile( self.image_encoder.forward, mode="max-autotune", @@ -232,6 +187,15 @@ class SAM2Base(torch.nn.Module): dynamic=False, ) + freeze_prompt_encoder = False + freeze_mask_decoder = False + if freeze_prompt_encoder: + for p in self.sam_prompt_encoder.parameters(): + p.requires_grad = False + if freeze_mask_decoder: + for p in self.sam_mask_decoder.parameters(): + p.requires_grad = False + @property def device(self): return next(self.parameters()).device @@ -257,7 +221,9 @@ class SAM2Base(torch.nn.Module): ), input_image_size=(self.image_size, self.image_size), mask_in_chans=16, + n_kpts_encoder=self.n_kpts_encoder, ) + self.sam_mask_decoder = MaskDecoder( num_multimask_outputs=3, transformer=TwoWayTransformer( @@ -276,13 +242,16 @@ class SAM2Base(torch.nn.Module): use_multimask_token_for_obj_ptr=self.use_multimask_token_for_obj_ptr, **(self.sam_mask_decoder_extra_args or {}), ) + for p in self.sam_prompt_encoder.parameters(): + p.requires_grad = True + for p in self.sam_mask_decoder.parameters(): + p.requires_grad = True + if self.use_obj_ptrs_in_encoder: # a linear projection on SAM output tokens to turn them into object pointers self.obj_ptr_proj = torch.nn.Linear(self.hidden_dim, self.hidden_dim) if self.use_mlp_for_obj_ptr_proj: - self.obj_ptr_proj = MLP( - self.hidden_dim, self.hidden_dim, self.hidden_dim, 3 - ) + self.obj_ptr_proj = MLP(self.hidden_dim, self.hidden_dim, self.hidden_dim, 3) else: self.obj_ptr_proj = torch.nn.Identity() if self.proj_tpos_enc_in_obj_ptrs: @@ -395,7 +364,7 @@ class SAM2Base(torch.nn.Module): high_res_features=high_res_features, ) if self.pred_obj_scores: - is_obj_appearing = object_score_logits > self.min_obj_score_logits + is_obj_appearing = object_score_logits > 0 # Mask used for spatial memories is always a *hard* choice between obj and no obj, # consistent with the actual mask prediction @@ -416,87 +385,7 @@ class SAM2Base(torch.nn.Module): ) sam_output_token = sam_output_tokens[:, 0] - kf_ious = None - if multimask_output and self.samurai_mode: - if self.kf_mean is None and self.kf_covariance is None or self.stable_frames == 0: - best_iou_inds = torch.argmax(ious, dim=-1) - batch_inds = torch.arange(B, device=device) - low_res_masks = low_res_multimasks[batch_inds, best_iou_inds].unsqueeze(1) - high_res_masks = high_res_multimasks[batch_inds, best_iou_inds].unsqueeze(1) - non_zero_indices = torch.argwhere(high_res_masks[0][0] > 0.0) - if len(non_zero_indices) == 0: - high_res_bbox = [0, 0, 0, 0] - else: - y_min, x_min = non_zero_indices.min(dim=0).values - y_max, x_max = non_zero_indices.max(dim=0).values - high_res_bbox = [x_min.item(), y_min.item(), x_max.item(), y_max.item()] - self.kf_mean, self.kf_covariance = self.kf.initiate(self.kf.xyxy_to_xyah(high_res_bbox)) - if sam_output_tokens.size(1) > 1: - sam_output_token = sam_output_tokens[batch_inds, best_iou_inds] - self.frame_cnt += 1 - self.stable_frames += 1 - elif self.stable_frames < self.stable_frames_threshold: - self.kf_mean, self.kf_covariance = self.kf.predict(self.kf_mean, self.kf_covariance) - best_iou_inds = torch.argmax(ious, dim=-1) - batch_inds = torch.arange(B, device=device) - low_res_masks = low_res_multimasks[batch_inds, best_iou_inds].unsqueeze(1) - high_res_masks = high_res_multimasks[batch_inds, best_iou_inds].unsqueeze(1) - non_zero_indices = torch.argwhere(high_res_masks[0][0] > 0.0) - if len(non_zero_indices) == 0: - high_res_bbox = [0, 0, 0, 0] - else: - y_min, x_min = 
non_zero_indices.min(dim=0).values - y_max, x_max = non_zero_indices.max(dim=0).values - high_res_bbox = [x_min.item(), y_min.item(), x_max.item(), y_max.item()] - if ious[0][best_iou_inds] > self.stable_ious_threshold: - self.kf_mean, self.kf_covariance = self.kf.update(self.kf_mean, self.kf_covariance, self.kf.xyxy_to_xyah(high_res_bbox)) - self.stable_frames += 1 - else: - self.stable_frames = 0 - if sam_output_tokens.size(1) > 1: - sam_output_token = sam_output_tokens[batch_inds, best_iou_inds] - self.frame_cnt += 1 - else: - self.kf_mean, self.kf_covariance = self.kf.predict(self.kf_mean, self.kf_covariance) - high_res_multibboxes = [] - batch_inds = torch.arange(B, device=device) - for i in range(ious.shape[1]): - non_zero_indices = torch.argwhere(high_res_multimasks[batch_inds, i].unsqueeze(1)[0][0] > 0.0) - if len(non_zero_indices) == 0: - high_res_multibboxes.append([0, 0, 0, 0]) - else: - y_min, x_min = non_zero_indices.min(dim=0).values - y_max, x_max = non_zero_indices.max(dim=0).values - high_res_multibboxes.append([x_min.item(), y_min.item(), x_max.item(), y_max.item()]) - # compute the IoU between the predicted bbox and the high_res_multibboxes - kf_ious = torch.tensor(self.kf.compute_iou(self.kf_mean[:4], high_res_multibboxes), device=device) - # weighted iou - weighted_ious = self.kf_score_weight * kf_ious + (1 - self.kf_score_weight) * ious - best_iou_inds = torch.argmax(weighted_ious, dim=-1) - batch_inds = torch.arange(B, device=device) - low_res_masks = low_res_multimasks[batch_inds, best_iou_inds].unsqueeze(1) - high_res_masks = high_res_multimasks[batch_inds, best_iou_inds].unsqueeze(1) - if sam_output_tokens.size(1) > 1: - sam_output_token = sam_output_tokens[batch_inds, best_iou_inds] - - if False: - # make all these on cpu - self.history[self.frame_cnt] = { - "kf_predicted_bbox": self.kf.xyah_to_xyxy(self.kf_mean[:4]), - # "multi_masks": high_res_multimasks.cpu(), - "ious": ious.cpu(), - "multi_bboxes": high_res_multibboxes, - "kf_ious": kf_ious, - "weighted_ious": weighted_ious.cpu(), - "final_selection": best_iou_inds.cpu(), - } - self.frame_cnt += 1 - - if ious[0][best_iou_inds] < self.stable_ious_threshold: - self.stable_frames = 0 - else: - self.kf_mean, self.kf_covariance = self.kf.update(self.kf_mean, self.kf_covariance, self.kf.xyxy_to_xyah(high_res_multibboxes[best_iou_inds])) - elif multimask_output and not self.samurai_mode: + if multimask_output: # take the best mask prediction (with the highest IoU estimation) best_iou_inds = torch.argmax(ious, dim=-1) batch_inds = torch.arange(B, device=device) @@ -505,7 +394,6 @@ class SAM2Base(torch.nn.Module): if sam_output_tokens.size(1) > 1: sam_output_token = sam_output_tokens[batch_inds, best_iou_inds] else: - best_iou_inds = 0 low_res_masks, high_res_masks = low_res_multimasks, high_res_multimasks # Extract object pointer from the SAM output token (with occlusion handling) @@ -529,8 +417,6 @@ class SAM2Base(torch.nn.Module): high_res_masks, obj_ptr, object_score_logits, - ious[0][best_iou_inds], - kf_ious[best_iou_inds] if kf_ious is not None else None, ) def _use_mask_as_output(self, backbone_features, high_res_features, mask_inputs): @@ -553,12 +439,10 @@ class SAM2Base(torch.nn.Module): ious = mask_inputs.new_ones(mask_inputs.size(0), 1).float() if not self.use_obj_ptrs_in_encoder: # all zeros as a dummy object pointer (of shape [B, C]) - obj_ptr = torch.zeros( - mask_inputs.size(0), self.hidden_dim, device=mask_inputs.device - ) + obj_ptr = torch.zeros(mask_inputs.size(0), self.hidden_dim, 
device=mask_inputs.device) else: # produce an object pointer using the SAM decoder from the mask input - _, _, _, _, _, obj_ptr, _, _, _ = self._forward_sam_heads( + _, _, _, _, _, obj_ptr, _ = self._forward_sam_heads( backbone_features=backbone_features, mask_inputs=self.mask_downsample(mask_inputs_float), high_res_features=high_res_features, @@ -591,12 +475,8 @@ class SAM2Base(torch.nn.Module): if self.use_high_res_features_in_sam: # precompute projected level 0 and level 1 features in SAM decoder # to avoid running it again on every SAM click - backbone_out["backbone_fpn"][0] = self.sam_mask_decoder.conv_s0( - backbone_out["backbone_fpn"][0] - ) - backbone_out["backbone_fpn"][1] = self.sam_mask_decoder.conv_s1( - backbone_out["backbone_fpn"][1] - ) + backbone_out["backbone_fpn"][0] = self.sam_mask_decoder.conv_s0(backbone_out["backbone_fpn"][0]) + backbone_out["backbone_fpn"][1] = self.sam_mask_decoder.conv_s1(backbone_out["backbone_fpn"][1]) return backbone_out def _prepare_backbone_features(self, backbone_out): @@ -657,63 +537,36 @@ class SAM2Base(torch.nn.Module): # We also allow taking the memory frame non-consecutively (with stride>1), in which case # we take (self.num_maskmem - 2) frames among every stride-th frames plus the last frame. stride = 1 if self.training else self.memory_temporal_stride_for_eval - - if self.samurai_mode: - valid_indices = [] - if frame_idx > 1: # Ensure we have previous frames to evaluate - for i in range(frame_idx - 1, 1, -1): # Iterate backwards through previous frames - iou_score = output_dict["non_cond_frame_outputs"][i]["best_iou_score"] # Get mask affinity score - obj_score = output_dict["non_cond_frame_outputs"][i]["object_score_logits"] # Get object score - kf_score = output_dict["non_cond_frame_outputs"][i]["kf_score"] if "kf_score" in output_dict["non_cond_frame_outputs"][i] else None # Get motion score if available - # Check if the scores meet the criteria for being a valid index - if iou_score.item() > self.memory_bank_iou_threshold and \ - obj_score.item() > self.memory_bank_obj_score_threshold and \ - (kf_score is None or kf_score.item() > self.memory_bank_kf_score_threshold): - valid_indices.insert(0, i) - # Check the number of valid indices - if len(valid_indices) >= self.max_obj_ptrs_in_encoder - 1: - break - if frame_idx - 1 not in valid_indices: - valid_indices.append(frame_idx - 1) - for t_pos in range(1, self.num_maskmem): # Iterate over the number of mask memories - idx = t_pos - self.num_maskmem # Calculate the index for valid indices - if idx < -len(valid_indices): # Skip if index is out of bounds - continue - out = output_dict["non_cond_frame_outputs"].get(valid_indices[idx], None) # Get output for the valid index - if out is None: # If not found, check unselected outputs - out = unselected_cond_outputs.get(valid_indices[idx], None) - t_pos_and_prevs.append((t_pos, out)) # Append the temporal position and output to the list - else: - for t_pos in range(1, self.num_maskmem): - t_rel = self.num_maskmem - t_pos # how many frames before current frame - if t_rel == 1: - # for t_rel == 1, we take the last frame (regardless of r) - if not track_in_reverse: - # the frame immediately before this frame (i.e. frame_idx - 1) - prev_frame_idx = frame_idx - t_rel - else: - # the frame immediately after this frame (i.e. 
frame_idx + 1) - prev_frame_idx = frame_idx + t_rel + for t_pos in range(1, self.num_maskmem): + t_rel = self.num_maskmem - t_pos # how many frames before current frame + if t_rel == 1: + # for t_rel == 1, we take the last frame (regardless of r) + if not track_in_reverse: + # the frame immediately before this frame (i.e. frame_idx - 1) + prev_frame_idx = frame_idx - t_rel else: - # for t_rel >= 2, we take the memory frame from every r-th frames - if not track_in_reverse: - # first find the nearest frame among every r-th frames before this frame - # for r=1, this would be (frame_idx - 2) - prev_frame_idx = ((frame_idx - 2) // stride) * stride - # then seek further among every r-th frames - prev_frame_idx = prev_frame_idx - (t_rel - 2) * stride - else: - # first find the nearest frame among every r-th frames after this frame - # for r=1, this would be (frame_idx + 2) - prev_frame_idx = -(-(frame_idx + 2) // stride) * stride - # then seek further among every r-th frames - prev_frame_idx = prev_frame_idx + (t_rel - 2) * stride - out = output_dict["non_cond_frame_outputs"].get(prev_frame_idx, None) - if out is None: - # If an unselected conditioning frame is among the last (self.num_maskmem - 1) - # frames, we still attend to it as if it's a non-conditioning frame. - out = unselected_cond_outputs.get(prev_frame_idx, None) - t_pos_and_prevs.append((t_pos, out)) + else: + # the frame immediately after this frame (i.e. frame_idx + 1) + prev_frame_idx = frame_idx + t_rel + else: + # for t_rel >= 2, we take the memory frame from every r-th frame + if not track_in_reverse: + # first find the nearest frame among every r-th frame before this frame + # for r=1, this would be (frame_idx - 2) + prev_frame_idx = ((frame_idx - 2) // stride) * stride + # then seek further among every r-th frame + prev_frame_idx = prev_frame_idx - (t_rel - 2) * stride + else: + # first find the nearest frame among every r-th frame after this frame + # for r=1, this would be (frame_idx + 2) + prev_frame_idx = -(-(frame_idx + 2) // stride) * stride + # then seek further among every r-th frame + prev_frame_idx = prev_frame_idx + (t_rel - 2) * stride + out = output_dict["non_cond_frame_outputs"].get(prev_frame_idx, None) + if out is None: + # If an unselected conditioning frame is among the last (self.num_maskmem - 1) + # frames, we still attend to it as if it's a non-conditioning frame.
+ out = unselected_cond_outputs.get(prev_frame_idx, None) + t_pos_and_prevs.append((t_pos, out)) for t_pos, prev in t_pos_and_prevs: if prev is None: @@ -726,9 +579,7 @@ maskmem_enc = prev["maskmem_pos_enc"][-1].to(device) maskmem_enc = maskmem_enc.flatten(2).permute(2, 0, 1) # Temporal positional encoding - maskmem_enc = ( - maskmem_enc + self.maskmem_tpos_enc[self.num_maskmem - t_pos - 1] - ) + maskmem_enc = maskmem_enc + self.maskmem_tpos_enc[self.num_maskmem - t_pos - 1] to_cat_memory_pos_embed.append(maskmem_enc) # Construct the list of past object pointers @@ -738,20 +589,14 @@ # (optionally, only include object pointers in the past during evaluation) if not self.training and self.only_obj_ptrs_in_the_past_for_eval: ptr_cond_outputs = { - t: out - for t, out in selected_cond_outputs.items() - if (t >= frame_idx if track_in_reverse else t <= frame_idx) + t: out for t, out in selected_cond_outputs.items() if (t >= frame_idx if track_in_reverse else t <= frame_idx) } else: ptr_cond_outputs = selected_cond_outputs pos_and_ptrs = [ # Temporal pos encoding contains how far away each pointer is from current frame ( - ( - (frame_idx - t) * tpos_sign_mul - if self.use_signed_tpos_enc_to_obj_ptrs - else abs(frame_idx - t) - ), + ((frame_idx - t) * tpos_sign_mul if self.use_signed_tpos_enc_to_obj_ptrs else abs(frame_idx - t)), out["obj_ptr"], ) for t, out in ptr_cond_outputs.items() @@ -761,9 +606,7 @@ t = frame_idx + t_diff if track_in_reverse else frame_idx - t_diff if t < 0 or (num_frames is not None and t >= num_frames): break - out = output_dict["non_cond_frame_outputs"].get( - t, unselected_cond_outputs.get(t, None) - ) + out = output_dict["non_cond_frame_outputs"].get(t, unselected_cond_outputs.get(t, None)) if out is not None: pos_and_ptrs.append((t_diff, out["obj_ptr"])) # If we have at least one object pointer, add them to the cross attention @@ -776,9 +619,7 @@ if self.add_tpos_enc_to_obj_ptrs: t_diff_max = max_obj_ptrs_in_encoder - 1 tpos_dim = C if self.proj_tpos_enc_in_obj_ptrs else self.mem_dim - obj_pos = torch.tensor(pos_list).to( - device=device, non_blocking=True - ) + obj_pos = torch.tensor(pos_list).to(device=device, non_blocking=True) obj_pos = get_1d_sine_pe(obj_pos / t_diff_max, dim=tpos_dim) obj_pos = self.obj_ptr_tpos_proj(obj_pos) obj_pos = obj_pos.unsqueeze(1).expand(-1, B, self.mem_dim) @@ -786,9 +627,7 @@ obj_pos = obj_ptrs.new_zeros(len(pos_list), B, self.mem_dim) if self.mem_dim < C: # split a pointer into (C // self.mem_dim) tokens for self.mem_dim < C - obj_ptrs = obj_ptrs.reshape( - -1, B, C // self.mem_dim, self.mem_dim - ) + obj_ptrs = obj_ptrs.reshape(-1, B, C // self.mem_dim, self.mem_dim) obj_ptrs = obj_ptrs.permute(0, 2, 1, 3).flatten(0, 1) obj_pos = obj_pos.repeat_interleave(C // self.mem_dim, dim=0) to_cat_memory.append(obj_ptrs) @@ -841,9 +680,7 @@ # optionally, apply non-overlapping constraints to the masks (it's applied # in the batch dimension and should only be used during eval, where all # the objects come from the same video under batch size 1).
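Editorial aside on the hunk above: the de-indented memory-selection loop is easy to misread. For `t_rel == 1` the immediately adjacent frame is always taken, regardless of the stride; for `t_rel >= 2` the index first snaps to the nearest stride-aligned frame and then steps away in multiples of `stride`. A minimal, self-contained sketch of the forward-tracking arithmetic (`memory_frame_indices` is a hypothetical helper used for illustration, not part of this patch):

```python
# Sketch of the memory-frame index arithmetic above (forward tracking only).
# `memory_frame_indices` is a hypothetical name, not part of the codebase.
def memory_frame_indices(frame_idx: int, num_maskmem: int = 7, stride: int = 1) -> list:
    indices = []
    for t_pos in range(1, num_maskmem):
        t_rel = num_maskmem - t_pos  # how many frames before the current frame
        if t_rel == 1:
            # always take the immediately preceding frame, regardless of stride
            indices.append(frame_idx - 1)
        else:
            # snap to the nearest stride-aligned frame at or before (frame_idx - 2),
            # then step further back in multiples of the stride
            nearest = ((frame_idx - 2) // stride) * stride
            indices.append(nearest - (t_rel - 2) * stride)
    return indices

print(memory_frame_indices(10, stride=2))  # [0, 2, 4, 6, 8, 9]
```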
- pred_masks_high_res = self._apply_non_overlapping_constraints( - pred_masks_high_res - ) + pred_masks_high_res = self._apply_non_overlapping_constraints(pred_masks_high_res) # scale the raw mask logits with a temperature before applying sigmoid binarize = self.binarize_mask_from_pts_for_mem_enc and is_mask_from_pts if binarize and not self.training: @@ -856,18 +693,14 @@ class SAM2Base(torch.nn.Module): mask_for_mem = mask_for_mem * self.sigmoid_scale_for_mem_enc if self.sigmoid_bias_for_mem_enc != 0.0: mask_for_mem = mask_for_mem + self.sigmoid_bias_for_mem_enc - maskmem_out = self.memory_encoder( - pix_feat, mask_for_mem, skip_mask_sigmoid=True # sigmoid already applied - ) + maskmem_out = self.memory_encoder(pix_feat, mask_for_mem, skip_mask_sigmoid=True) # sigmoid already applied maskmem_features = maskmem_out["vision_features"] maskmem_pos_enc = maskmem_out["vision_pos_enc"] # add a no-object embedding to the spatial memory to indicate that the frame # is predicted to be occluded (i.e. no object is appearing in the frame) if self.no_obj_embed_spatial is not None: is_obj_appearing = (object_score_logits > 0).float() - maskmem_features += ( - 1 - is_obj_appearing[..., None, None] - ) * self.no_obj_embed_spatial[..., None, None].expand( + maskmem_features += (1 - is_obj_appearing[..., None, None]) * self.no_obj_embed_spatial[..., None, None].expand( *maskmem_features.shape ) @@ -891,8 +724,7 @@ class SAM2Base(torch.nn.Module): # High-resolution feature maps for the SAM head, reshape (HW)BC => BCHW if len(current_vision_feats) > 1: high_res_features = [ - x.permute(1, 2, 0).view(x.size(1), x.size(2), *s) - for x, s in zip(current_vision_feats[:-1], feat_sizes[:-1]) + x.permute(1, 2, 0).view(x.size(1), x.size(2), *s) for x, s in zip(current_vision_feats[:-1], feat_sizes[:-1]) ] else: high_res_features = None @@ -901,9 +733,7 @@ class SAM2Base(torch.nn.Module): # (see it as a GT mask) without using a SAM prompt encoder + mask decoder. pix_feat = current_vision_feats[-1].permute(1, 2, 0) pix_feat = pix_feat.view(-1, self.hidden_dim, *feat_sizes[-1]) - sam_outputs = self._use_mask_as_output( - pix_feat, high_res_features, mask_inputs - ) + sam_outputs = self._use_mask_as_output(pix_feat, high_res_features, mask_inputs) else: # fused the visual feature with previous memory features in the memory bank pix_feat = self._prepare_memory_conditioned_features( @@ -1002,15 +832,11 @@ class SAM2Base(torch.nn.Module): high_res_masks, obj_ptr, object_score_logits, - best_iou_score, - kf_ious ) = sam_outputs current_out["pred_masks"] = low_res_masks current_out["pred_masks_high_res"] = high_res_masks current_out["obj_ptr"] = obj_ptr - current_out["best_iou_score"] = best_iou_score - current_out["kf_ious"] = kf_ious if not self.training: # Only add this in inference (to avoid unused param in activation checkpointing; # it's mainly used in the demo to encode spatial memories w/ consolidated masks) diff --git a/sam2/modeling/sam2_base_pose.py b/bboxmaskpose/sam2/modeling/sam2_base_pose.py similarity index 93% rename from sam2/modeling/sam2_base_pose.py rename to bboxmaskpose/sam2/modeling/sam2_base_pose.py index f7bd3f58c10971fdf71f6bb2951282f252c39f06..691cc97db201b239e0b183cf49735de9eb96102f 100644 --- a/sam2/modeling/sam2_base_pose.py +++ b/bboxmaskpose/sam2/modeling/sam2_base_pose.py @@ -4,20 +4,17 @@ # This source code is licensed under the license found in the # LICENSE file in the root directory of this source tree. 
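Before the `sam2_base_pose.py` hunks, one note on a helper this file keeps calling: `_apply_non_overlapping_constraints` resolves overlapping objects along the batch dimension by keeping, at each pixel, only the object with the highest mask logit. A minimal sketch of that behavior, assuming mask logits of shape `(num_objects, 1, H, W)`; the clamp value mirrors upstream SAM2 but is illustrative here:

```python
import torch

def apply_non_overlapping_constraints(pred_masks: torch.Tensor) -> torch.Tensor:
    """Keep only the highest-scoring object at each pixel; suppress the rest."""
    num_objects = pred_masks.size(0)
    if num_objects == 1:
        return pred_masks
    # index of the winning object at every spatial location, shape (1, 1, H, W)
    max_obj_inds = torch.argmax(pred_masks, dim=0, keepdim=True)
    batch_obj_inds = torch.arange(num_objects, device=pred_masks.device)[:, None, None, None]
    keep = max_obj_inds == batch_obj_inds
    # losing objects are clamped to a low logit so they binarize to background
    return torch.where(keep, pred_masks, torch.clamp(pred_masks, max=-10.0))
```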
-from loguru import logger - import torch import torch.distributed import torch.nn.functional as F - from torch.nn.init import trunc_normal_ -from sam2.modeling.sam.mask_decoder import MaskDecoder -from sam2.modeling.sam.pose_encoder import PoseEncoder -from sam2.modeling.sam.transformer import TwoWayTransformer -from sam2.modeling.sam2_utils import get_1d_sine_pe, MLP, select_closest_cond_frames - -from sam2.utils.kalman_filter import KalmanFilter +from bboxmaskpose.sam2.modeling.sam2_utils import MLP, get_1d_sine_pe, select_closest_cond_frames +from bboxmaskpose.sam2.modeling.sam.mask_decoder import MaskDecoder +from bboxmaskpose.sam2.modeling.sam.pose_encoder import PoseEncoder +from bboxmaskpose.sam2.modeling.sam.transformer import TwoWayTransformer +from bboxmaskpose.sam2.utils.kalman_filter import KalmanFilter +from loguru import logger # a large negative value as a placeholder score for missing objects NO_OBJ_SCORE = -1024.0 @@ -137,16 +134,12 @@ class SAM2Base(torch.nn.Module): # Part 3: memory encoder for the previous frame's outputs self.memory_encoder = memory_encoder self.mem_dim = self.hidden_dim - if hasattr(self.memory_encoder, "out_proj") and hasattr( - self.memory_encoder.out_proj, "weight" - ): + if hasattr(self.memory_encoder, "out_proj") and hasattr(self.memory_encoder.out_proj, "weight"): # if there is compression of memories along channel dim self.mem_dim = self.memory_encoder.out_proj.weight.shape[0] self.num_maskmem = num_maskmem # Number of memories accessible # Temporal encoding of the memories - self.maskmem_tpos_enc = torch.nn.Parameter( - torch.zeros(num_maskmem, 1, 1, self.mem_dim) - ) + self.maskmem_tpos_enc = torch.nn.Parameter(torch.zeros(num_maskmem, 1, 1, self.mem_dim)) trunc_normal_(self.maskmem_tpos_enc, std=0.02) # a single token to indicate no memory embedding from previous frames self.no_mem_embed = torch.nn.Parameter(torch.zeros(1, 1, self.hidden_dim)) @@ -205,8 +198,8 @@ class SAM2Base(torch.nn.Module): self.stable_frames = 0 # Debug purpose - self.history = {} # debug - self.frame_cnt = 0 # debug + self.history = {} # debug + self.frame_cnt = 0 # debug # Hyperparameters for SAMURAI self.stable_frames_threshold = stable_frames_threshold @@ -222,9 +215,7 @@ class SAM2Base(torch.nn.Module): # Model compilation if compile_image_encoder: # Compile the forward function (not the full module) to allow loading checkpoints. - print( - "Image encoder compilation is enabled. First forward pass will be slow." - ) + print("Image encoder compilation is enabled. 
First forward pass will be slow.") self.image_encoder.forward = torch.compile( self.image_encoder.forward, mode="max-autotune", @@ -280,9 +271,7 @@ class SAM2Base(torch.nn.Module): # a linear projection on SAM output tokens to turn them into object pointers self.obj_ptr_proj = torch.nn.Linear(self.hidden_dim, self.hidden_dim) if self.use_mlp_for_obj_ptr_proj: - self.obj_ptr_proj = MLP( - self.hidden_dim, self.hidden_dim, self.hidden_dim, 3 - ) + self.obj_ptr_proj = MLP(self.hidden_dim, self.hidden_dim, self.hidden_dim, 3) else: self.obj_ptr_proj = torch.nn.Identity() if self.proj_tpos_enc_in_obj_ptrs: @@ -480,7 +469,7 @@ class SAM2Base(torch.nn.Module): sam_output_token = sam_output_tokens[batch_inds, best_iou_inds] if False: - # make all these on cpu + # make all these on cpu self.history[self.frame_cnt] = { "kf_predicted_bbox": self.kf.xyah_to_xyxy(self.kf_mean[:4]), # "multi_masks": high_res_multimasks.cpu(), @@ -495,7 +484,9 @@ class SAM2Base(torch.nn.Module): if ious[0][best_iou_inds] < self.stable_ious_threshold: self.stable_frames = 0 else: - self.kf_mean, self.kf_covariance = self.kf.update(self.kf_mean, self.kf_covariance, self.kf.xyxy_to_xyah(high_res_multibboxes[best_iou_inds])) + self.kf_mean, self.kf_covariance = self.kf.update( + self.kf_mean, self.kf_covariance, self.kf.xyxy_to_xyah(high_res_multibboxes[best_iou_inds]) + ) elif multimask_output and not self.samurai_mode: # take the best mask prediction (with the highest IoU estimation) best_iou_inds = torch.argmax(ious, dim=-1) @@ -553,9 +544,7 @@ class SAM2Base(torch.nn.Module): ious = mask_inputs.new_ones(mask_inputs.size(0), 1).float() if not self.use_obj_ptrs_in_encoder: # all zeros as a dummy object pointer (of shape [B, C]) - obj_ptr = torch.zeros( - mask_inputs.size(0), self.hidden_dim, device=mask_inputs.device - ) + obj_ptr = torch.zeros(mask_inputs.size(0), self.hidden_dim, device=mask_inputs.device) else: # produce an object pointer using the SAM decoder from the mask input _, _, _, _, _, obj_ptr, _, _, _ = self._forward_sam_heads( @@ -591,12 +580,8 @@ class SAM2Base(torch.nn.Module): if self.use_high_res_features_in_sam: # precompute projected level 0 and level 1 features in SAM decoder # to avoid running it again on every SAM click - backbone_out["backbone_fpn"][0] = self.sam_mask_decoder.conv_s0( - backbone_out["backbone_fpn"][0] - ) - backbone_out["backbone_fpn"][1] = self.sam_mask_decoder.conv_s1( - backbone_out["backbone_fpn"][1] - ) + backbone_out["backbone_fpn"][0] = self.sam_mask_decoder.conv_s0(backbone_out["backbone_fpn"][0]) + backbone_out["backbone_fpn"][1] = self.sam_mask_decoder.conv_s1(backbone_out["backbone_fpn"][1]) return backbone_out def _prepare_backbone_features(self, backbone_out): @@ -659,21 +644,27 @@ class SAM2Base(torch.nn.Module): stride = 1 if self.training else self.memory_temporal_stride_for_eval if self.samurai_mode: - valid_indices = [] + valid_indices = [] if frame_idx > 1: # Ensure we have previous frames to evaluate for i in range(frame_idx - 1, 1, -1): # Iterate backwards through previous frames iou_score = output_dict["non_cond_frame_outputs"][i]["best_iou_score"] # Get mask affinity score obj_score = output_dict["non_cond_frame_outputs"][i]["object_score_logits"] # Get object score - kf_score = output_dict["non_cond_frame_outputs"][i]["kf_score"] if "kf_score" in output_dict["non_cond_frame_outputs"][i] else None # Get motion score if available + kf_score = ( + output_dict["non_cond_frame_outputs"][i]["kf_score"] + if "kf_score" in output_dict["non_cond_frame_outputs"][i] + 
else None + ) # Get motion score if available # Check if the scores meet the criteria for being a valid index - if iou_score.item() > self.memory_bank_iou_threshold and \ - obj_score.item() > self.memory_bank_obj_score_threshold and \ - (kf_score is None or kf_score.item() > self.memory_bank_kf_score_threshold): - valid_indices.insert(0, i) + if ( + iou_score.item() > self.memory_bank_iou_threshold + and obj_score.item() > self.memory_bank_obj_score_threshold + and (kf_score is None or kf_score.item() > self.memory_bank_kf_score_threshold) + ): + valid_indices.insert(0, i) # Check the number of valid indices - if len(valid_indices) >= self.max_obj_ptrs_in_encoder - 1: + if len(valid_indices) >= self.max_obj_ptrs_in_encoder - 1: break - if frame_idx - 1 not in valid_indices: + if frame_idx - 1 not in valid_indices: valid_indices.append(frame_idx - 1) for t_pos in range(1, self.num_maskmem): # Iterate over the number of mask memories idx = t_pos - self.num_maskmem # Calculate the index for valid indices @@ -726,9 +717,7 @@ maskmem_enc = prev["maskmem_pos_enc"][-1].to(device) maskmem_enc = maskmem_enc.flatten(2).permute(2, 0, 1) # Temporal positional encoding - maskmem_enc = ( - maskmem_enc + self.maskmem_tpos_enc[self.num_maskmem - t_pos - 1] - ) + maskmem_enc = maskmem_enc + self.maskmem_tpos_enc[self.num_maskmem - t_pos - 1] to_cat_memory_pos_embed.append(maskmem_enc) # Construct the list of past object pointers @@ -738,20 +727,14 @@ # (optionally, only include object pointers in the past during evaluation) if not self.training and self.only_obj_ptrs_in_the_past_for_eval: ptr_cond_outputs = { - t: out - for t, out in selected_cond_outputs.items() - if (t >= frame_idx if track_in_reverse else t <= frame_idx) + t: out for t, out in selected_cond_outputs.items() if (t >= frame_idx if track_in_reverse else t <= frame_idx) } else: ptr_cond_outputs = selected_cond_outputs pos_and_ptrs = [ # Temporal pos encoding contains how far away each pointer is from current frame ( - ( - (frame_idx - t) * tpos_sign_mul - if self.use_signed_tpos_enc_to_obj_ptrs - else abs(frame_idx - t) - ), + ((frame_idx - t) * tpos_sign_mul if self.use_signed_tpos_enc_to_obj_ptrs else abs(frame_idx - t)), out["obj_ptr"], ) for t, out in ptr_cond_outputs.items() @@ -761,9 +744,7 @@ t = frame_idx + t_diff if track_in_reverse else frame_idx - t_diff if t < 0 or (num_frames is not None and t >= num_frames): break - out = output_dict["non_cond_frame_outputs"].get( - t, unselected_cond_outputs.get(t, None) - ) + out = output_dict["non_cond_frame_outputs"].get(t, unselected_cond_outputs.get(t, None)) if out is not None: pos_and_ptrs.append((t_diff, out["obj_ptr"])) # If we have at least one object pointer, add them to the cross attention @@ -776,9 +757,7 @@ if self.add_tpos_enc_to_obj_ptrs: t_diff_max = max_obj_ptrs_in_encoder - 1 tpos_dim = C if self.proj_tpos_enc_in_obj_ptrs else self.mem_dim - obj_pos = torch.tensor(pos_list).to( - device=device, non_blocking=True - ) + obj_pos = torch.tensor(pos_list).to(device=device, non_blocking=True) obj_pos = get_1d_sine_pe(obj_pos / t_diff_max, dim=tpos_dim) obj_pos = self.obj_ptr_tpos_proj(obj_pos) obj_pos = obj_pos.unsqueeze(1).expand(-1, B, self.mem_dim) @@ -786,9 +765,7 @@ obj_pos = obj_ptrs.new_zeros(len(pos_list), B, self.mem_dim) if self.mem_dim < C: # split a pointer into (C // self.mem_dim) tokens for self.mem_dim
< C - obj_ptrs = obj_ptrs.reshape( - -1, B, C // self.mem_dim, self.mem_dim - ) + obj_ptrs = obj_ptrs.reshape(-1, B, C // self.mem_dim, self.mem_dim) obj_ptrs = obj_ptrs.permute(0, 2, 1, 3).flatten(0, 1) obj_pos = obj_pos.repeat_interleave(C // self.mem_dim, dim=0) to_cat_memory.append(obj_ptrs) @@ -841,9 +818,7 @@ class SAM2Base(torch.nn.Module): # optionally, apply non-overlapping constraints to the masks (it's applied # in the batch dimension and should only be used during eval, where all # the objects come from the same video under batch size 1). - pred_masks_high_res = self._apply_non_overlapping_constraints( - pred_masks_high_res - ) + pred_masks_high_res = self._apply_non_overlapping_constraints(pred_masks_high_res) # scale the raw mask logits with a temperature before applying sigmoid binarize = self.binarize_mask_from_pts_for_mem_enc and is_mask_from_pts if binarize and not self.training: @@ -856,18 +831,14 @@ class SAM2Base(torch.nn.Module): mask_for_mem = mask_for_mem * self.sigmoid_scale_for_mem_enc if self.sigmoid_bias_for_mem_enc != 0.0: mask_for_mem = mask_for_mem + self.sigmoid_bias_for_mem_enc - maskmem_out = self.memory_encoder( - pix_feat, mask_for_mem, skip_mask_sigmoid=True # sigmoid already applied - ) + maskmem_out = self.memory_encoder(pix_feat, mask_for_mem, skip_mask_sigmoid=True) # sigmoid already applied maskmem_features = maskmem_out["vision_features"] maskmem_pos_enc = maskmem_out["vision_pos_enc"] # add a no-object embedding to the spatial memory to indicate that the frame # is predicted to be occluded (i.e. no object is appearing in the frame) if self.no_obj_embed_spatial is not None: is_obj_appearing = (object_score_logits > 0).float() - maskmem_features += ( - 1 - is_obj_appearing[..., None, None] - ) * self.no_obj_embed_spatial[..., None, None].expand( + maskmem_features += (1 - is_obj_appearing[..., None, None]) * self.no_obj_embed_spatial[..., None, None].expand( *maskmem_features.shape ) @@ -891,8 +862,7 @@ class SAM2Base(torch.nn.Module): # High-resolution feature maps for the SAM head, reshape (HW)BC => BCHW if len(current_vision_feats) > 1: high_res_features = [ - x.permute(1, 2, 0).view(x.size(1), x.size(2), *s) - for x, s in zip(current_vision_feats[:-1], feat_sizes[:-1]) + x.permute(1, 2, 0).view(x.size(1), x.size(2), *s) for x, s in zip(current_vision_feats[:-1], feat_sizes[:-1]) ] else: high_res_features = None @@ -901,9 +871,7 @@ class SAM2Base(torch.nn.Module): # (see it as a GT mask) without using a SAM prompt encoder + mask decoder. 
pix_feat = current_vision_feats[-1].permute(1, 2, 0) pix_feat = pix_feat.view(-1, self.hidden_dim, *feat_sizes[-1]) - sam_outputs = self._use_mask_as_output( - pix_feat, high_res_features, mask_inputs - ) + sam_outputs = self._use_mask_as_output(pix_feat, high_res_features, mask_inputs) else: # fused the visual feature with previous memory features in the memory bank pix_feat = self._prepare_memory_conditioned_features( @@ -994,17 +962,7 @@ class SAM2Base(torch.nn.Module): prev_sam_mask_logits, ) - ( - _, - _, - _, - low_res_masks, - high_res_masks, - obj_ptr, - object_score_logits, - best_iou_score, - kf_ious - ) = sam_outputs + _, _, _, low_res_masks, high_res_masks, obj_ptr, object_score_logits, best_iou_score, kf_ious = sam_outputs current_out["pred_masks"] = low_res_masks current_out["pred_masks_high_res"] = high_res_masks diff --git a/sam2/modeling/sam2_utils.py b/bboxmaskpose/sam2/modeling/sam2_utils.py similarity index 95% rename from sam2/modeling/sam2_utils.py rename to bboxmaskpose/sam2/modeling/sam2_utils.py index e16caae3a9a49e451b2d03d1ee60c47f8e9ed23c..27016bd561c10a0717d6bf99add168540a310e8b 100644 --- a/sam2/modeling/sam2_utils.py +++ b/bboxmaskpose/sam2/modeling/sam2_utils.py @@ -13,7 +13,7 @@ import torch import torch.nn as nn import torch.nn.functional as F -from sam2.utils.misc import mask_to_box +from bboxmaskpose.sam2.utils.misc import mask_to_box def select_closest_cond_frames(frame_idx, cond_frame_outputs, max_cond_frame_num): @@ -54,9 +54,7 @@ def select_closest_cond_frames(frame_idx, cond_frame_outputs, max_cond_frame_num key=lambda x: abs(x - frame_idx), )[:num_remain] selected_outputs.update((t, cond_frame_outputs[t]) for t in inds_remain) - unselected_outputs = { - t: v for t, v in cond_frame_outputs.items() if t not in selected_outputs - } + unselected_outputs = {t: v for t, v in cond_frame_outputs.items() if t not in selected_outputs} return selected_outputs, unselected_outputs @@ -122,9 +120,7 @@ class MLP(nn.Module): super().__init__() self.num_layers = num_layers h = [hidden_dim] * (num_layers - 1) - self.layers = nn.ModuleList( - nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim]) - ) + self.layers = nn.ModuleList(nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim])) self.sigmoid_output = sigmoid_output self.act = activation() @@ -175,9 +171,7 @@ def sample_box_points( device = masks.device box_coords = mask_to_box(masks) B, _, H, W = masks.shape - box_labels = torch.tensor( - [top_left_label, bottom_right_label], dtype=torch.int, device=device - ).repeat(B) + box_labels = torch.tensor([top_left_label, bottom_right_label], dtype=torch.int, device=device).repeat(B) if noise > 0.0: if not isinstance(noise_bound, torch.Tensor): noise_bound = torch.tensor(noise_bound, device=device) @@ -189,9 +183,7 @@ def sample_box_points( box_noise = box_noise * torch.stack((max_dx, max_dy, max_dx, max_dy), dim=-1) box_coords = box_coords + box_noise - img_bounds = ( - torch.tensor([W, H, W, H], device=device) - 1 - ) # uncentered pixel coords + img_bounds = torch.tensor([W, H, W, H], device=device) - 1 # uncentered pixel coords box_coords.clamp_(torch.zeros_like(img_bounds), img_bounds) # In place clamping box_coords = box_coords.reshape(-1, 2, 2) # always 2 points diff --git a/sam2/sam2_image_predictor.py b/bboxmaskpose/sam2/sam2_image_predictor.py similarity index 82% rename from sam2/sam2_image_predictor.py rename to bboxmaskpose/sam2/sam2_image_predictor.py index 
84b01b1f764ece7a545cc254b9ad407b5b2e11c5..bfbec15634099d591e9fa4c0328064dcc80900d7 100644 --- a/sam2/sam2_image_predictor.py +++ b/bboxmaskpose/sam2/sam2_image_predictor.py @@ -5,16 +5,14 @@ # LICENSE file in the root directory of this source tree. import logging - from typing import List, Optional, Tuple, Union import numpy as np import torch from PIL.Image import Image -from sam2.modeling.sam2_base import SAM2Base - -from sam2.utils.transforms import SAM2Transforms +from bboxmaskpose.sam2.modeling.sam2_base import SAM2Base +from bboxmaskpose.sam2.utils.transforms import SAM2Transforms class SAM2ImagePredictor: @@ -61,9 +59,9 @@ class SAM2ImagePredictor: # Spatial dim for backbone feature maps isize = self.model.image_size self._bb_feat_sizes = [ - (isize//4, isize//4), - (isize//8, isize//8), - (isize//16, isize//16), + (isize // 4, isize // 4), + (isize // 8, isize // 8), + (isize // 16, isize // 16), ] @classmethod @@ -78,7 +76,7 @@ class SAM2ImagePredictor: Returns: (SAM2ImagePredictor): The loaded model. """ - from sam2.build_sam import build_sam2_hf + from bboxmaskpose.sam2.build_sam import build_sam2_hf sam_model = build_sam2_hf(model_id, **kwargs) return cls(sam_model, **kwargs) @@ -111,9 +109,7 @@ class SAM2ImagePredictor: input_image = self._transforms(image) input_image = input_image[None, ...].to(self.device) - assert ( - len(input_image.shape) == 4 and input_image.shape[1] == 3 - ), f"input_image must be of size 1x3xHxW, got {input_image.shape}" + assert len(input_image.shape) == 4 and input_image.shape[1] == 3, f"input_image must be of size 1x3xHxW, got {input_image.shape}" logging.info("Computing image embeddings for the provided image...") backbone_out = self.model.forward_image(input_image) _, vision_feats, _, _ = self.model._prepare_backbone_features(backbone_out) @@ -122,10 +118,9 @@ class SAM2ImagePredictor: vision_feats[-1] = vision_feats[-1] + self.model.no_mem_embed # breakpoint() - feats = [ - feat.permute(1, 2, 0).view(1, -1, *feat_size) - for feat, feat_size in zip(vision_feats[::-1], self._bb_feat_sizes[::-1]) - ][::-1] + feats = [feat.permute(1, 2, 0).view(1, -1, *feat_size) for feat, feat_size in zip(vision_feats[::-1], self._bb_feat_sizes[::-1])][ + ::-1 + ] self._features = {"image_embed": feats[-1], "high_res_feats": feats[:-1]} self._is_image_set = True logging.info("Image embeddings computed.") @@ -148,17 +143,13 @@ class SAM2ImagePredictor: assert isinstance(image_list, list) self._orig_hw = [] for image in image_list: - assert isinstance( - image, np.ndarray - ), "Images are expected to be an np.ndarray in RGB format, and of shape HWC" + assert isinstance(image, np.ndarray), "Images are expected to be an np.ndarray in RGB format, and of shape HWC" self._orig_hw.append(image.shape[:2]) # Transform the image to the form expected by the model img_batch = self._transforms.forward_batch(image_list) img_batch = img_batch.to(self.device) batch_size = img_batch.shape[0] - assert ( - len(img_batch.shape) == 4 and img_batch.shape[1] == 3 - ), f"img_batch must be of size Bx3xHxW, got {img_batch.shape}" + assert len(img_batch.shape) == 4 and img_batch.shape[1] == 3, f"img_batch must be of size Bx3xHxW, got {img_batch.shape}" logging.info("Computing image embeddings for the provided images...") backbone_out = self.model.forward_image(img_batch) _, vision_feats, _, _ = self.model._prepare_backbone_features(backbone_out) @@ -167,8 +158,7 @@ class SAM2ImagePredictor: vision_feats[-1] = vision_feats[-1] + self.model.no_mem_embed feats = [ - feat.permute(1, 2, 
0).view(batch_size, -1, *feat_size) - for feat, feat_size in zip(vision_feats[::-1], self._bb_feat_sizes[::-1]) + feat.permute(1, 2, 0).view(batch_size, -1, *feat_size) for feat, feat_size in zip(vision_feats[::-1], self._bb_feat_sizes[::-1]) ][::-1] self._features = {"image_embed": feats[-1], "high_res_feats": feats[:-1]} self._is_image_set = True @@ -190,25 +180,17 @@ class SAM2ImagePredictor: """ assert self._is_batch, "This function should only be used when in batched mode" if not self._is_image_set: - raise RuntimeError( - "An image must be set with .set_image_batch(...) before mask prediction." - ) + raise RuntimeError("An image must be set with .set_image_batch(...) before mask prediction.") num_images = len(self._features["image_embed"]) all_masks = [] all_ious = [] all_low_res_masks = [] for img_idx in range(num_images): # Transform input prompts - point_coords = ( - point_coords_batch[img_idx] if point_coords_batch is not None else None - ) - point_labels = ( - point_labels_batch[img_idx] if point_labels_batch is not None else None - ) + point_coords = point_coords_batch[img_idx] if point_coords_batch is not None else None + point_labels = point_labels_batch[img_idx] if point_labels_batch is not None else None box = box_batch[img_idx] if box_batch is not None else None - mask_input = ( - mask_input_batch[img_idx] if mask_input_batch is not None else None - ) + mask_input = mask_input_batch[img_idx] if mask_input_batch is not None else None mask_input, unnorm_coords, labels, unnorm_box = self._prep_prompts( point_coords, point_labels, @@ -227,9 +209,7 @@ class SAM2ImagePredictor: img_idx=img_idx, ) masks_np = masks.squeeze(0).float().detach().cpu().numpy() - iou_predictions_np = ( - iou_predictions.squeeze(0).float().detach().cpu().numpy() - ) + iou_predictions_np = iou_predictions.squeeze(0).float().detach().cpu().numpy() low_res_masks_np = low_res_masks.squeeze(0).float().detach().cpu().numpy() all_masks.append(masks_np) all_ious.append(iou_predictions_np) @@ -281,15 +261,11 @@ class SAM2ImagePredictor: a subsequent iteration as mask input. """ if not self._is_image_set: - raise RuntimeError( - "An image must be set with .set_image(...) before mask prediction." - ) + raise RuntimeError("An image must be set with .set_image(...) before mask prediction.") # Transform input prompts - mask_input, unnorm_coords, labels, unnorm_box = self._prep_prompts( - point_coords, point_labels, box, mask_input, normalize_coords - ) + mask_input, unnorm_coords, labels, unnorm_box = self._prep_prompts(point_coords, point_labels, box, mask_input, normalize_coords) masks, iou_predictions, low_res_masks = self._predict( unnorm_coords, @@ -305,33 +281,21 @@ class SAM2ImagePredictor: low_res_masks_np = low_res_masks.squeeze(0).float().detach().cpu().numpy() return masks_np, iou_predictions_np, low_res_masks_np - def _prep_prompts( - self, point_coords, point_labels, box, mask_logits, normalize_coords, img_idx=-1 - ): + def _prep_prompts(self, point_coords, point_labels, box, mask_logits, normalize_coords, img_idx=-1): unnorm_coords, labels, unnorm_box, mask_input = None, None, None, None if point_coords is not None: - assert ( - point_labels is not None - ), "point_labels must be supplied if point_coords is supplied." 
- point_coords = torch.as_tensor( - point_coords, dtype=torch.float, device=self.device - ) - unnorm_coords = self._transforms.transform_coords( - point_coords, normalize=normalize_coords, orig_hw=self._orig_hw[img_idx] - ) + assert point_labels is not None, "point_labels must be supplied if point_coords is supplied." + point_coords = torch.as_tensor(point_coords, dtype=torch.float, device=self.device) + unnorm_coords = self._transforms.transform_coords(point_coords, normalize=normalize_coords, orig_hw=self._orig_hw[img_idx]) labels = torch.as_tensor(point_labels, dtype=torch.int, device=self.device) if len(unnorm_coords.shape) == 2: unnorm_coords, labels = unnorm_coords[None, ...], labels[None, ...] if box is not None: box = torch.as_tensor(box, dtype=torch.float, device=self.device) - unnorm_box = self._transforms.transform_boxes( - box, normalize=normalize_coords, orig_hw=self._orig_hw[img_idx] - ) # Bx2x2 + unnorm_box = self._transforms.transform_boxes(box, normalize=normalize_coords, orig_hw=self._orig_hw[img_idx]) # Bx2x2 if mask_logits is not None: - mask_input = torch.as_tensor( - mask_logits, dtype=torch.float, device=self.device - ) + mask_input = torch.as_tensor(mask_logits, dtype=torch.float, device=self.device) if len(mask_input.shape) == 3: mask_input = mask_input[None, :, :, :] return mask_input, unnorm_coords, labels, unnorm_box @@ -383,9 +347,7 @@ class SAM2ImagePredictor: a subsequent iteration as mask input. """ if not self._is_image_set: - raise RuntimeError( - "An image must be set with .set_image(...) before mask prediction." - ) + raise RuntimeError("An image must be set with .set_image(...) before mask prediction.") if point_coords is not None: concat_points = (point_coords, point_labels) @@ -413,13 +375,8 @@ class SAM2ImagePredictor: ) # Predict masks - batched_mode = ( - concat_points is not None and concat_points[0].shape[0] > 1 - ) # multi object prediction - high_res_features = [ - feat_level[img_idx].unsqueeze(0) - for feat_level in self._features["high_res_feats"] - ] + batched_mode = concat_points is not None and concat_points[0].shape[0] > 1 # multi object prediction + high_res_features = [feat_level[img_idx].unsqueeze(0) for feat_level in self._features["high_res_feats"]] low_res_masks, iou_predictions, _, _ = self.model.sam_mask_decoder( image_embeddings=self._features["image_embed"][img_idx].unsqueeze(0), image_pe=self.model.sam_prompt_encoder.get_dense_pe(), @@ -431,9 +388,7 @@ class SAM2ImagePredictor: ) # Upscale the masks to the original image resolution - masks = self._transforms.postprocess_masks( - low_res_masks, self._orig_hw[img_idx] - ) + masks = self._transforms.postprocess_masks(low_res_masks, self._orig_hw[img_idx]) low_res_masks = torch.clamp(low_res_masks, -32.0, 32.0) if not return_logits: masks = masks > self.mask_threshold @@ -447,12 +402,8 @@ class SAM2ImagePredictor: the embedding spatial dimension of SAM (typically C=256, H=W=64). """ if not self._is_image_set: - raise RuntimeError( - "An image must be set with .set_image(...) to generate an embedding." - ) - assert ( - self._features is not None - ), "Features must exist if an image has been set." + raise RuntimeError("An image must be set with .set_image(...) to generate an embedding.") + assert self._features is not None, "Features must exist if an image has been set." 
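Stepping back from the reformatting churn in this file: the public image-predictor API is unchanged in shape by this diff. A usage sketch under assumed paths (the config and checkpoint names are placeholders, and importing `build_sam2` from `bboxmaskpose.sam2.build_sam` is an assumption based on the module rename above):

```python
import numpy as np
from PIL import Image

from bboxmaskpose.sam2.build_sam import build_sam2  # assumed import; paths below are placeholders
from bboxmaskpose.sam2.sam2_image_predictor import SAM2ImagePredictor

model = build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt")
predictor = SAM2ImagePredictor(model)

image = np.array(Image.open("person.jpg").convert("RGB"))  # HWC uint8 RGB
predictor.set_image(image)

# one positive click; predict() returns (masks, iou_predictions, low_res_masks) as numpy arrays
masks, ious, low_res = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(ious)]  # pick the highest-scoring of the candidate masks
```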
return self._features["image_embed"] @property diff --git a/sam2/sam2_video_predictor.py b/bboxmaskpose/sam2/sam2_video_predictor.py similarity index 80% rename from sam2/sam2_video_predictor.py rename to bboxmaskpose/sam2/sam2_video_predictor.py index 1f2626ceba4f8f079e66553e549db4fb8d28a777..65883c492926b4129a516c4fe3ed86296bd76e3a 100644 --- a/sam2/sam2_video_predictor.py +++ b/bboxmaskpose/sam2/sam2_video_predictor.py @@ -9,7 +9,6 @@ from collections import OrderedDict import torch import torch.nn.functional as F - from tqdm import tqdm from sam2.modeling.sam2_base import NO_OBJ_SCORE, SAM2Base @@ -27,11 +26,6 @@ class SAM2VideoPredictor(SAM2Base): # whether to clear non-conditioning memory of the surrounding frames (which may contain outdated information) after adding correction clicks; # (note that this would only apply to *single-object tracking* unless `clear_non_cond_mem_for_multi_obj` is also set to True) clear_non_cond_mem_around_input=False, -<<<<<<< HEAD - # whether to also clear non-conditioning memory of the surrounding frames (only effective when `clear_non_cond_mem_around_input` is True). - clear_non_cond_mem_for_multi_obj=False, -======= ->>>>>>> 2b90b9f5ceec907a1c18123530e92e794ad901a4 # if `add_all_frames_to_correct_as_cond` is True, we also append to the conditioning frame list any frame that receives a later correction click # if `add_all_frames_to_correct_as_cond` is False, we restrict the conditioning frame list to only the initial conditioning frames add_all_frames_to_correct_as_cond=False, @@ -41,10 +35,6 @@ self.fill_hole_area = fill_hole_area self.non_overlap_masks = non_overlap_masks self.clear_non_cond_mem_around_input = clear_non_cond_mem_around_input -<<<<<<< HEAD - self.clear_non_cond_mem_for_multi_obj = clear_non_cond_mem_for_multi_obj -======= ->>>>>>> 2b90b9f5ceec907a1c18123530e92e794ad901a4 self.add_all_frames_to_correct_as_cond = add_all_frames_to_correct_as_cond @torch.inference_mode() @@ -296,9 +286,7 @@ is_cond=is_cond, consolidate_at_video_res=True, ) - _, video_res_masks = self._get_orig_video_res_output( - inference_state, consolidated_out["pred_masks_video_res"] - ) + _, video_res_masks = self._get_orig_video_res_output(inference_state, consolidated_out["pred_masks_video_res"]) return frame_idx, obj_ids, video_res_masks def add_new_points(self, *args, **kwargs): @@ -384,9 +372,7 @@ is_cond=is_cond, consolidate_at_video_res=True, ) - _, video_res_masks = self._get_orig_video_res_output( - inference_state, consolidated_out["pred_masks_video_res"] - ) + _, video_res_masks = self._get_orig_video_res_output(inference_state, consolidated_out["pred_masks_video_res"]) return frame_idx, obj_ids, video_res_masks def _get_orig_video_res_output(self, inference_state, any_res_masks): @@ -450,23 +436,6 @@ dtype=torch.float32, device=inference_state["storage_device"], ), -<<<<<<< HEAD - "obj_ptr": torch.full( - size=(batch_size, self.hidden_dim), - fill_value=NO_OBJ_SCORE, - dtype=torch.float32, - device=inference_state["device"], - ), - "object_score_logits": torch.full( - size=(batch_size, 1), - # default to 10.0 for object_score_logits, i.e.
assuming the object is - # present as sigmoid(10)=1, same as in `predict_masks` of `MaskDecoder` - fill_value=10.0, - dtype=torch.float32, - device=inference_state["device"], - ), -======= ->>>>>>> 2b90b9f5ceec907a1c18123530e92e794ad901a4 } for obj_idx in range(batch_size): obj_temp_output_dict = inference_state["temp_output_dict_per_obj"][obj_idx] @@ -499,36 +468,6 @@ class SAM2VideoPredictor(SAM2Base): align_corners=False, ) consolidated_pred_masks[obj_idx : obj_idx + 1] = resized_obj_mask -<<<<<<< HEAD - consolidated_out["obj_ptr"][obj_idx : obj_idx + 1] = out["obj_ptr"] - consolidated_out["object_score_logits"][obj_idx : obj_idx + 1] = out[ - "object_score_logits" - ] - - # Optionally, apply non-overlapping constraints on the consolidated scores - # and rerun the memory encoder - if run_mem_encoder: - device = inference_state["device"] - high_res_masks = torch.nn.functional.interpolate( - consolidated_out["pred_masks"].to(device, non_blocking=True), - size=(self.image_size, self.image_size), - mode="bilinear", - align_corners=False, - ) - if self.non_overlap_masks_for_mem_enc: - high_res_masks = self._apply_non_overlapping_constraints(high_res_masks) - maskmem_features, maskmem_pos_enc = self._run_memory_encoder( - inference_state=inference_state, - frame_idx=frame_idx, - batch_size=batch_size, - high_res_masks=high_res_masks, - object_score_logits=consolidated_out["object_score_logits"], - is_mask_from_pts=True, # these frames are what the user interacted with - ) - consolidated_out["maskmem_features"] = maskmem_features - consolidated_out["maskmem_pos_enc"] = maskmem_pos_enc -======= ->>>>>>> 2b90b9f5ceec907a1c18123530e92e794ad901a4 return consolidated_out @@ -538,9 +477,7 @@ class SAM2VideoPredictor(SAM2Base): # Check and make sure that every object has received input points or masks. batch_size = self._get_obj_num(inference_state) if batch_size == 0: - raise RuntimeError( - "No input points or masks are provided for any object; please add inputs first." - ) + raise RuntimeError("No input points or masks are provided for any object; please add inputs first.") # Consolidate per-object temporary outputs in "temp_output_dict_per_obj" and # add them into "output_dict". 
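The consolidation step above is what video propagation builds on; the calling pattern itself is untouched by this diff. A sketch with placeholder paths (`build_sam2_video_predictor` is the standard SAM2 entry point; its availability under `bboxmaskpose.sam2.build_sam` is an assumption):

```python
import torch
from bboxmaskpose.sam2.build_sam import build_sam2_video_predictor  # assumed import path

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt"
)

with torch.inference_mode():
    state = predictor.init_state(video_path="videos/clip_0")  # directory of JPEG frames
    # prompt object 1 with a box on frame 0; the output lands in the temp dicts consolidated above
    predictor.add_new_points_or_box(state, frame_idx=0, obj_id=1, box=[100, 150, 360, 720])
    # propagation then yields per-frame masks at the original video resolution
    video_masks = {}
    for frame_idx, obj_ids, video_res_masks in predictor.propagate_in_video(state):
        video_masks[frame_idx] = (video_res_masks > 0.0).cpu()
```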
@@ -549,9 +486,7 @@ class SAM2VideoPredictor(SAM2Base): obj_temp_output_dict = inference_state["temp_output_dict_per_obj"][obj_idx] for is_cond in [False, True]: # Separately consolidate conditioning and non-conditioning temp outputs - storage_key = ( - "cond_frame_outputs" if is_cond else "non_cond_frame_outputs" - ) + storage_key = "cond_frame_outputs" if is_cond else "non_cond_frame_outputs" # Find all the frames that contain temporary outputs for any objects # (these should be the frames that have just received clicks for mask inputs # via `add_new_points_or_box` or `add_new_mask`) @@ -579,9 +514,7 @@ class SAM2VideoPredictor(SAM2Base): obj_output_dict[storage_key][frame_idx] = out if self.clear_non_cond_mem_around_input: # clear non-conditioning memory of the surrounding frames - self._clear_obj_non_cond_mem_around_input( - inference_state, frame_idx, obj_idx - ) + self._clear_obj_non_cond_mem_around_input(inference_state, frame_idx, obj_idx) # clear temporary outputs in `temp_output_dict_per_obj` obj_temp_output_dict[storage_key].clear() @@ -590,9 +523,7 @@ class SAM2VideoPredictor(SAM2Base): obj_output_dict = inference_state["output_dict_per_obj"][obj_idx] if len(obj_output_dict["cond_frame_outputs"]) == 0: obj_id = self._obj_idx_to_id(inference_state, obj_idx) - raise RuntimeError( - f"No input points or masks are provided for object id {obj_id}; please add inputs first." - ) + raise RuntimeError(f"No input points or masks are provided for object id {obj_id}; please add inputs first.") # edge case: if an output is added to "cond_frame_outputs", we remove any prior # output on the same frame in "non_cond_frame_outputs" for frame_idx in obj_output_dict["cond_frame_outputs"]: @@ -617,9 +548,7 @@ class SAM2VideoPredictor(SAM2Base): if start_frame_idx is None: # default: start from the earliest frame with input points start_frame_idx = min( - t - for obj_output_dict in inference_state["output_dict_per_obj"].values() - for t in obj_output_dict["cond_frame_outputs"] + t for obj_output_dict in inference_state["output_dict_per_obj"].values() for t in obj_output_dict["cond_frame_outputs"] ) if max_frame_num_to_track is None: # default: track all the frames in the video @@ -631,9 +560,7 @@ class SAM2VideoPredictor(SAM2Base): else: processing_order = [] # skip reverse tracking if starting from frame 0 else: - end_frame_idx = min( - start_frame_idx + max_frame_num_to_track, num_frames - 1 - ) + end_frame_idx = min(start_frame_idx + max_frame_num_to_track, num_frames - 1) processing_order = range(start_frame_idx, end_frame_idx + 1) for frame_idx in tqdm(processing_order, desc="propagate in video"): @@ -651,9 +578,7 @@ class SAM2VideoPredictor(SAM2Base): pred_masks = current_out["pred_masks"].to(device, non_blocking=True) if self.clear_non_cond_mem_around_input: # clear non-conditioning memory of the surrounding frames - self._clear_obj_non_cond_mem_around_input( - inference_state, frame_idx, obj_idx - ) + self._clear_obj_non_cond_mem_around_input(inference_state, frame_idx, obj_idx) else: storage_key = "non_cond_frame_outputs" current_out, pred_masks = self._run_single_frame_inference( @@ -669,9 +594,7 @@ class SAM2VideoPredictor(SAM2Base): ) obj_output_dict[storage_key][frame_idx] = current_out - inference_state["frames_tracked_per_obj"][obj_idx][frame_idx] = { - "reverse": reverse - } + inference_state["frames_tracked_per_obj"][obj_idx][frame_idx] = {"reverse": reverse} pred_masks_per_obj[obj_idx] = pred_masks # Resize the output mask to the original video resolution (we directly use @@ 
-680,42 +603,11 @@ class SAM2VideoPredictor(SAM2Base): all_pred_masks = torch.cat(pred_masks_per_obj, dim=0) else: all_pred_masks = pred_masks_per_obj[0] - _, video_res_masks = self._get_orig_video_res_output( - inference_state, all_pred_masks - ) + _, video_res_masks = self._get_orig_video_res_output(inference_state, all_pred_masks) yield frame_idx, obj_ids, video_res_masks @torch.inference_mode() - def clear_all_prompts_in_frame( - self, inference_state, frame_idx, obj_id, need_output=True - ): -<<<<<<< HEAD - """ - Split a multi-object output into per-object output slices and add them into - `output_dict_per_obj`. The resulting slices share the same tensor storage. - """ - maskmem_features = current_out["maskmem_features"] - assert maskmem_features is None or isinstance(maskmem_features, torch.Tensor) - - maskmem_pos_enc = current_out["maskmem_pos_enc"] - assert maskmem_pos_enc is None or isinstance(maskmem_pos_enc, list) - - output_dict_per_obj = inference_state["output_dict_per_obj"] - for obj_idx, obj_output_dict in output_dict_per_obj.items(): - obj_slice = slice(obj_idx, obj_idx + 1) - obj_out = { - "maskmem_features": None, - "maskmem_pos_enc": None, - "pred_masks": current_out["pred_masks"][obj_slice], - "obj_ptr": current_out["obj_ptr"][obj_slice], - "object_score_logits": current_out["object_score_logits"][obj_slice], - } - if maskmem_features is not None: - obj_out["maskmem_features"] = maskmem_features[obj_slice] - if maskmem_pos_enc is not None: - obj_out["maskmem_pos_enc"] = [x[obj_slice] for x in maskmem_pos_enc] - obj_output_dict[storage_key][frame_idx] = obj_out -======= + def clear_all_prompts_in_frame(self, inference_state, frame_idx, obj_id, need_output=True): """Remove all input points or mask in a specific frame for a given object.""" obj_idx = self._obj_id_to_idx(inference_state, obj_id) @@ -740,91 +632,14 @@ class SAM2VideoPredictor(SAM2Base): return # Finally, output updated masks per object (after removing the inputs above) obj_ids = inference_state["obj_ids"] - is_cond = any( - frame_idx in obj_temp_output_dict["cond_frame_outputs"] - for obj_temp_output_dict in temp_output_dict_per_obj.values() - ) + is_cond = any(frame_idx in obj_temp_output_dict["cond_frame_outputs"] for obj_temp_output_dict in temp_output_dict_per_obj.values()) consolidated_out = self._consolidate_temp_output_across_obj( inference_state, frame_idx, is_cond=is_cond, consolidate_at_video_res=True, ) - _, video_res_masks = self._get_orig_video_res_output( - inference_state, consolidated_out["pred_masks_video_res"] - ) - return frame_idx, obj_ids, video_res_masks ->>>>>>> 2b90b9f5ceec907a1c18123530e92e794ad901a4 - - @torch.inference_mode() - def clear_all_prompts_in_frame( - self, inference_state, frame_idx, obj_id, need_output=True - ): - """Remove all input points or mask in a specific frame for a given object.""" - obj_idx = self._obj_id_to_idx(inference_state, obj_id) - - # Clear the conditioning information on the given frame - inference_state["point_inputs_per_obj"][obj_idx].pop(frame_idx, None) - inference_state["mask_inputs_per_obj"][obj_idx].pop(frame_idx, None) - - temp_output_dict_per_obj = inference_state["temp_output_dict_per_obj"] - temp_output_dict_per_obj[obj_idx]["cond_frame_outputs"].pop(frame_idx, None) - temp_output_dict_per_obj[obj_idx]["non_cond_frame_outputs"].pop(frame_idx, None) - - # Check and see if there are still any inputs left on this frame - batch_size = self._get_obj_num(inference_state) - frame_has_input = False - for obj_idx2 in range(batch_size): - if 
frame_idx in inference_state["point_inputs_per_obj"][obj_idx2]: - frame_has_input = True - break - if frame_idx in inference_state["mask_inputs_per_obj"][obj_idx2]: - frame_has_input = True - break - - # If this frame has no remaining inputs for any objects, we further clear its - # conditioning frame status - if not frame_has_input: - output_dict = inference_state["output_dict"] - consolidated_frame_inds = inference_state["consolidated_frame_inds"] - consolidated_frame_inds["cond_frame_outputs"].discard(frame_idx) - consolidated_frame_inds["non_cond_frame_outputs"].discard(frame_idx) - # Remove the frame's conditioning output (possibly downgrading it to non-conditioning) - out = output_dict["cond_frame_outputs"].pop(frame_idx, None) - if out is not None: - # The frame is not a conditioning frame anymore since it's not receiving inputs, - # so we "downgrade" its output (if exists) to a non-conditioning frame output. - output_dict["non_cond_frame_outputs"][frame_idx] = out - inference_state["frames_already_tracked"].pop(frame_idx, None) - # Similarly, do it for the sliced output on each object. - for obj_idx2 in range(batch_size): - obj_output_dict = inference_state["output_dict_per_obj"][obj_idx2] - obj_out = obj_output_dict["cond_frame_outputs"].pop(frame_idx, None) - if obj_out is not None: - obj_output_dict["non_cond_frame_outputs"][frame_idx] = obj_out - - # If all the conditioning frames have been removed, we also clear the tracking outputs - if len(output_dict["cond_frame_outputs"]) == 0: - self._reset_tracking_results(inference_state) - - if not need_output: - return - # Finally, output updated masks per object (after removing the inputs above) - obj_ids = inference_state["obj_ids"] - is_cond = any( - frame_idx in obj_temp_output_dict["cond_frame_outputs"] - for obj_temp_output_dict in temp_output_dict_per_obj.values() - ) - consolidated_out = self._consolidate_temp_output_across_obj( - inference_state, - frame_idx, - is_cond=is_cond, - run_mem_encoder=False, - consolidate_at_video_res=True, - ) - _, video_res_masks = self._get_orig_video_res_output( - inference_state, consolidated_out["pred_masks_video_res"] - ) + _, video_res_masks = self._get_orig_video_res_output(inference_state, consolidated_out["pred_masks_video_res"]) return frame_idx, obj_ids, video_res_masks @torch.inference_mode() @@ -859,9 +674,7 @@ class SAM2VideoPredictor(SAM2Base): def _get_image_feature(self, inference_state, frame_idx, batch_size): """Compute the image features on a given frame.""" # Look up in the cache first - image, backbone_out = inference_state["cached_features"].get( - frame_idx, (None, None) - ) + image, backbone_out = inference_state["cached_features"].get(frame_idx, (None, None)) if backbone_out is None: # Cache miss -- we will run inference on a single image device = inference_state["device"] @@ -878,9 +691,7 @@ class SAM2VideoPredictor(SAM2Base): "vision_pos_enc": backbone_out["vision_pos_enc"].copy(), } for i, feat in enumerate(expanded_backbone_out["backbone_fpn"]): - expanded_backbone_out["backbone_fpn"][i] = feat.expand( - batch_size, -1, -1, -1 - ) + expanded_backbone_out["backbone_fpn"][i] = feat.expand(batch_size, -1, -1, -1) for i, pos in enumerate(expanded_backbone_out["vision_pos_enc"]): pos = pos.expand(batch_size, -1, -1, -1) expanded_backbone_out["vision_pos_enc"][i] = pos @@ -935,33 +746,23 @@ class SAM2VideoPredictor(SAM2Base): if maskmem_features is not None: maskmem_features = maskmem_features.to(torch.bfloat16) maskmem_features = maskmem_features.to(storage_device, 
non_blocking=True) - pred_masks_gpu = current_out["pred_masks"] # (B, 1, H, W) + pred_masks_gpu = current_out["pred_masks"] # potentially fill holes in the predicted masks if self.fill_hole_area > 0: - pred_masks_gpu = fill_holes_in_mask_scores( - pred_masks_gpu, self.fill_hole_area - ) + pred_masks_gpu = fill_holes_in_mask_scores(pred_masks_gpu, self.fill_hole_area) pred_masks = pred_masks_gpu.to(storage_device, non_blocking=True) # "maskmem_pos_enc" is the same across frames, so we only need to store one copy of it maskmem_pos_enc = self._get_maskmem_pos_enc(inference_state, current_out) # object pointer is a small tensor, so we always keep it on GPU memory for fast access obj_ptr = current_out["obj_ptr"] object_score_logits = current_out["object_score_logits"] -<<<<<<< HEAD - best_iou_score = current_out["best_iou_score"] -======= ->>>>>>> 2b90b9f5ceec907a1c18123530e92e794ad901a4 # make a compact version of this frame's output to reduce the state size compact_current_out = { - "maskmem_features": maskmem_features, # (B, C, H, W) - "maskmem_pos_enc": maskmem_pos_enc, + "maskmem_features": maskmem_features, + "maskmem_pos_enc": maskmem_pos_enc, "pred_masks": pred_masks, "obj_ptr": obj_ptr, "object_score_logits": object_score_logits, -<<<<<<< HEAD - "best_iou_score": best_iou_score, -======= ->>>>>>> 2b90b9f5ceec907a1c18123530e92e794ad901a4 } return compact_current_out, pred_masks_gpu @@ -980,9 +781,7 @@ class SAM2VideoPredictor(SAM2Base): memory also need to be computed again with the memory encoder. """ # Retrieve correct image features - _, _, current_vision_feats, _, feat_sizes = self._get_image_feature( - inference_state, frame_idx, batch_size - ) + _, _, current_vision_feats, _, feat_sizes = self._get_image_feature(inference_state, frame_idx, batch_size) maskmem_features, maskmem_pos_enc = self._encode_new_memory( current_vision_feats=current_vision_feats, feat_sizes=feat_sizes, @@ -996,9 +795,7 @@ class SAM2VideoPredictor(SAM2Base): maskmem_features = maskmem_features.to(torch.bfloat16) maskmem_features = maskmem_features.to(storage_device, non_blocking=True) # "maskmem_pos_enc" is the same across frames, so we only need to store one copy of it - maskmem_pos_enc = self._get_maskmem_pos_enc( - inference_state, {"maskmem_pos_enc": maskmem_pos_enc} - ) + maskmem_pos_enc = self._get_maskmem_pos_enc(inference_state, {"maskmem_pos_enc": maskmem_pos_enc}) return maskmem_features, maskmem_pos_enc def _get_maskmem_pos_enc(self, inference_state, current_out): @@ -1019,9 +816,7 @@ class SAM2VideoPredictor(SAM2Base): maskmem_pos_enc = model_constants["maskmem_pos_enc"] # expand the cached maskmem_pos_enc to the actual batch size batch_size = out_maskmem_pos_enc[0].size(0) - expanded_maskmem_pos_enc = [ - x.expand(batch_size, -1, -1, -1) for x in maskmem_pos_enc - ] + expanded_maskmem_pos_enc = [x.expand(batch_size, -1, -1, -1) for x in maskmem_pos_enc] else: expanded_maskmem_pos_enc = None return expanded_maskmem_pos_enc @@ -1039,8 +834,7 @@ class SAM2VideoPredictor(SAM2Base): if not strict: return inference_state["obj_ids"], updated_frames raise RuntimeError( - f"Cannot remove object id {obj_id} as it doesn't exist. " - f"All existing object ids: {inference_state['obj_ids']}." + f"Cannot remove object id {obj_id} as it doesn't exist. " f"All existing object ids: {inference_state['obj_ids']}." ) # If this is the only remaining object id, we simply reset the state. 
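One orientation note before the next hunks: every per-object dict in the inference state is keyed by object index, so removing an object means popping all old keys and re-inserting the survivors under shifted indices, which is what `_map_keys` in Step 2 below does. A standalone sketch of that idea (`old_idx_to_new_idx` is built in Step 1; the names mirror the surrounding code):

```python
def map_keys(container: dict, old_obj_inds, old_idx_to_new_idx) -> None:
    """Re-key a per-object dict in place after an object index is removed."""
    new_kvs = []
    for k in old_obj_inds:
        v = container.pop(k)
        if k in old_idx_to_new_idx:  # dropped indices simply vanish
            new_kvs.append((old_idx_to_new_idx[k], v))
    container.update(new_kvs)

# e.g. removing obj_idx 1 out of {0, 1, 2}: 0 stays 0, 2 becomes 1
d = {0: "a", 1: "b", 2: "c"}
map_keys(d, old_obj_inds=[0, 1, 2], old_idx_to_new_idx={0: 0, 2: 1})
assert d == {0: "a", 1: "c"}
```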
@@ -1054,16 +848,10 @@ class SAM2VideoPredictor(SAM2Base): # (note that this step is required as it might downgrade conditioning frames to # non-conditioning ones) obj_input_frames_inds = set() - obj_input_frames_inds.update( - inference_state["point_inputs_per_obj"][old_obj_idx_to_rm] - ) - obj_input_frames_inds.update( - inference_state["mask_inputs_per_obj"][old_obj_idx_to_rm] - ) + obj_input_frames_inds.update(inference_state["point_inputs_per_obj"][old_obj_idx_to_rm]) + obj_input_frames_inds.update(inference_state["mask_inputs_per_obj"][old_obj_idx_to_rm]) for frame_idx in obj_input_frames_inds: - self.clear_all_prompts_in_frame( - inference_state, frame_idx, obj_id, need_output=False - ) + self.clear_all_prompts_in_frame(inference_state, frame_idx, obj_id, need_output=False) # Step 1: Update the object id mapping (note that it must be done after Step 0, # since Step 0 still requires the old object id mappings in inference_state) @@ -1080,11 +868,6 @@ class SAM2VideoPredictor(SAM2Base): inference_state["obj_ids"] = new_obj_ids # Step 2: For per-object tensor storage, we shift their obj_idx in the dict keys. -<<<<<<< HEAD - # (note that "consolidated_frame_inds" doesn't need to be updated in this step as - # it's already handled in Step 0) -======= ->>>>>>> 2b90b9f5ceec907a1c18123530e92e794ad901a4 def _map_keys(container): new_kvs = [] for k in old_obj_inds: @@ -1097,57 +880,23 @@ class SAM2VideoPredictor(SAM2Base): _map_keys(inference_state["mask_inputs_per_obj"]) _map_keys(inference_state["output_dict_per_obj"]) _map_keys(inference_state["temp_output_dict_per_obj"]) -<<<<<<< HEAD - - # Step 3: For packed tensor storage, we index the remaining ids and rebuild the per-object slices. - def _slice_state(output_dict, storage_key): - for frame_idx, out in output_dict[storage_key].items(): - out["maskmem_features"] = out["maskmem_features"][remain_old_obj_inds] - out["maskmem_pos_enc"] = [ - x[remain_old_obj_inds] for x in out["maskmem_pos_enc"] - ] - # "maskmem_pos_enc" is the same across frames, so we only need to store one copy of it - out["maskmem_pos_enc"] = self._get_maskmem_pos_enc(inference_state, out) - out["pred_masks"] = out["pred_masks"][remain_old_obj_inds] - out["obj_ptr"] = out["obj_ptr"][remain_old_obj_inds] - out["object_score_logits"] = out["object_score_logits"][ - remain_old_obj_inds - ] - # also update the per-object slices - self._add_output_per_object( - inference_state, frame_idx, out, storage_key - ) - - _slice_state(inference_state["output_dict"], "cond_frame_outputs") - _slice_state(inference_state["output_dict"], "non_cond_frame_outputs") - - # Step 4: Further collect the outputs on those frames in `obj_input_frames_inds`, which -======= _map_keys(inference_state["frames_tracked_per_obj"]) # Step 3: Further collect the outputs on those frames in `obj_input_frames_inds`, which ->>>>>>> 2b90b9f5ceec907a1c18123530e92e794ad901a4 # could show an updated mask for objects previously occluded by the object being removed if need_output: temp_output_dict_per_obj = inference_state["temp_output_dict_per_obj"] for frame_idx in obj_input_frames_inds: is_cond = any( - frame_idx in obj_temp_output_dict["cond_frame_outputs"] - for obj_temp_output_dict in temp_output_dict_per_obj.values() + frame_idx in obj_temp_output_dict["cond_frame_outputs"] for obj_temp_output_dict in temp_output_dict_per_obj.values() ) consolidated_out = self._consolidate_temp_output_across_obj( inference_state, frame_idx, is_cond=is_cond, -<<<<<<< HEAD - run_mem_encoder=False, -======= ->>>>>>> 
2b90b9f5ceec907a1c18123530e92e794ad901a4 consolidate_at_video_res=True, ) - _, video_res_masks = self._get_orig_video_res_output( - inference_state, consolidated_out["pred_masks_video_res"] - ) + _, video_res_masks = self._get_orig_video_res_output(inference_state, consolidated_out["pred_masks_video_res"]) updated_frames.append((frame_idx, video_res_masks)) return inference_state["obj_ids"], updated_frames @@ -1218,18 +967,12 @@ class SAM2VideoPredictorVOS(SAM2VideoPredictor): if self.use_high_res_features_in_sam: # precompute projected level 0 and level 1 features in SAM decoder # to avoid running it again on every SAM click - backbone_out["backbone_fpn"][0] = self.sam_mask_decoder.conv_s0( - backbone_out["backbone_fpn"][0] - ) - backbone_out["backbone_fpn"][1] = self.sam_mask_decoder.conv_s1( - backbone_out["backbone_fpn"][1] - ) + backbone_out["backbone_fpn"][0] = self.sam_mask_decoder.conv_s0(backbone_out["backbone_fpn"][0]) + backbone_out["backbone_fpn"][1] = self.sam_mask_decoder.conv_s1(backbone_out["backbone_fpn"][1]) # Clone to help torch.compile for i in range(len(backbone_out["backbone_fpn"])): backbone_out["backbone_fpn"][i] = backbone_out["backbone_fpn"][i].clone() - backbone_out["vision_pos_enc"][i] = backbone_out["vision_pos_enc"][ - i - ].clone() + backbone_out["vision_pos_enc"][i] = backbone_out["vision_pos_enc"][i].clone() return backbone_out def _forward_sam_heads( @@ -1388,9 +1131,7 @@ class SAM2VideoPredictorVOS(SAM2VideoPredictor): # optionally, apply non-overlapping constraints to the masks (it's applied # in the batch dimension and should only be used during eval, where all # the objects come from the same video under batch size 1). - pred_masks_high_res = self._apply_non_overlapping_constraints( - pred_masks_high_res - ) + pred_masks_high_res = self._apply_non_overlapping_constraints(pred_masks_high_res) # scale the raw mask logits with a temperature before applying sigmoid binarize = self.binarize_mask_from_pts_for_mem_enc and is_mask_from_pts if binarize and not self.training: @@ -1403,9 +1144,7 @@ class SAM2VideoPredictorVOS(SAM2VideoPredictor): mask_for_mem = mask_for_mem * self.sigmoid_scale_for_mem_enc if self.sigmoid_bias_for_mem_enc != 0.0: mask_for_mem = mask_for_mem + self.sigmoid_bias_for_mem_enc - maskmem_out = self.memory_encoder( - pix_feat, mask_for_mem, skip_mask_sigmoid=True # sigmoid already applied - ) + maskmem_out = self.memory_encoder(pix_feat, mask_for_mem, skip_mask_sigmoid=True) # sigmoid already applied # Clone the feats and pos_enc to enable compilation maskmem_features = maskmem_out["vision_features"].clone() maskmem_pos_enc = [m.clone() for m in maskmem_out["vision_pos_enc"]] @@ -1413,9 +1152,7 @@ class SAM2VideoPredictorVOS(SAM2VideoPredictor): # is predicted to be occluded (i.e. 
no object is appearing in the frame) if self.no_obj_embed_spatial is not None: is_obj_appearing = (object_score_logits > 0).float() - maskmem_features += ( - 1 - is_obj_appearing[..., None, None] - ) * self.no_obj_embed_spatial[..., None, None].expand( + maskmem_features += (1 - is_obj_appearing[..., None, None]) * self.no_obj_embed_spatial[..., None, None].expand( *maskmem_features.shape ) diff --git a/sam2/sam2_video_predictor_legacy.py b/bboxmaskpose/sam2/sam2_video_predictor_legacy.py similarity index 93% rename from sam2/sam2_video_predictor_legacy.py rename to bboxmaskpose/sam2/sam2_video_predictor_legacy.py index c7e01ccf972491904b013526333826b337354db1..22ad9aeeb684c775e6d0674abcb12fe3606808dd 100644 --- a/sam2/sam2_video_predictor_legacy.py +++ b/bboxmaskpose/sam2/sam2_video_predictor_legacy.py @@ -8,11 +8,10 @@ import warnings from collections import OrderedDict import torch - from tqdm import tqdm -from sam2.modeling.sam2_base import NO_OBJ_SCORE, SAM2Base -from sam2.utils.misc import concat_points, fill_holes_in_mask_scores, load_video_frames +from bboxmaskpose.sam2.modeling.sam2_base import NO_OBJ_SCORE, SAM2Base +from bboxmaskpose.sam2.utils.misc import concat_points, fill_holes_in_mask_scores, load_video_frames class SAM2VideoPredictor(SAM2Base): @@ -122,7 +121,7 @@ class SAM2VideoPredictor(SAM2Base): Returns: (SAM2VideoPredictor): The loaded model. """ - from sam2.build_sam import build_sam2_video_predictor_hf + from bboxmaskpose.sam2.build_sam import build_sam2_video_predictor_hf sam_model = build_sam2_video_predictor_hf(model_id, **kwargs) return sam_model @@ -308,9 +307,7 @@ class SAM2VideoPredictor(SAM2Base): run_mem_encoder=False, consolidate_at_video_res=True, ) - _, video_res_masks = self._get_orig_video_res_output( - inference_state, consolidated_out["pred_masks_video_res"] - ) + _, video_res_masks = self._get_orig_video_res_output(inference_state, consolidated_out["pred_masks_video_res"]) return frame_idx, obj_ids, video_res_masks def add_new_points(self, *args, **kwargs): @@ -396,9 +393,7 @@ class SAM2VideoPredictor(SAM2Base): run_mem_encoder=False, consolidate_at_video_res=True, ) - _, video_res_masks = self._get_orig_video_res_output( - inference_state, consolidated_out["pred_masks_video_res"] - ) + _, video_res_masks = self._get_orig_video_res_output(inference_state, consolidated_out["pred_masks_video_res"]) return frame_idx, obj_ids, video_res_masks def _get_orig_video_res_output(self, inference_state, any_res_masks): @@ -503,9 +498,7 @@ class SAM2VideoPredictor(SAM2Base): # i.e. when we need to build the memory for tracking). 
if run_mem_encoder: if empty_mask_ptr is None: - empty_mask_ptr = self._get_empty_mask_ptr( - inference_state, frame_idx - ) + empty_mask_ptr = self._get_empty_mask_ptr(inference_state, frame_idx) # fill object pointer with a dummy pointer (based on an empty mask) consolidated_out["obj_ptr"][obj_idx : obj_idx + 1] = empty_mask_ptr continue @@ -524,9 +517,7 @@ class SAM2VideoPredictor(SAM2Base): ) consolidated_pred_masks[obj_idx : obj_idx + 1] = resized_obj_mask consolidated_out["obj_ptr"][obj_idx : obj_idx + 1] = out["obj_ptr"] - consolidated_out["object_score_logits"][obj_idx : obj_idx + 1] = out[ - "object_score_logits" - ] + consolidated_out["object_score_logits"][obj_idx : obj_idx + 1] = out["object_score_logits"] # Optionally, apply non-overlapping constraints on the consolidated scores # and rerun the memory encoder @@ -621,12 +612,8 @@ class SAM2VideoPredictor(SAM2Base): ) # merge them into "output_dict" and also create per-object slices output_dict[storage_key][frame_idx] = consolidated_out - self._add_output_per_object( - inference_state, frame_idx, consolidated_out, storage_key - ) - clear_non_cond_mem = self.clear_non_cond_mem_around_input and ( - self.clear_non_cond_mem_for_multi_obj or batch_size <= 1 - ) + self._add_output_per_object(inference_state, frame_idx, consolidated_out, storage_key) + clear_non_cond_mem = self.clear_non_cond_mem_around_input and (self.clear_non_cond_mem_for_multi_obj or batch_size <= 1) if clear_non_cond_mem: # clear non-conditioning memory of the surrounding frames self._clear_non_cond_mem_around_input(inference_state, frame_idx) @@ -648,10 +635,7 @@ class SAM2VideoPredictor(SAM2Base): # Make sure that the frame indices in "consolidated_frame_inds" are exactly those frames # with either points or mask inputs (which should be true under a correct workflow). - all_consolidated_frame_inds = ( - consolidated_frame_inds["cond_frame_outputs"] - | consolidated_frame_inds["non_cond_frame_outputs"] - ) + all_consolidated_frame_inds = consolidated_frame_inds["cond_frame_outputs"] | consolidated_frame_inds["non_cond_frame_outputs"] input_frames_inds = set() for point_inputs_per_frame in inference_state["point_inputs_per_obj"].values(): input_frames_inds.update(point_inputs_per_frame.keys()) @@ -677,9 +661,7 @@ class SAM2VideoPredictor(SAM2Base): batch_size = self._get_obj_num(inference_state) if len(output_dict["cond_frame_outputs"]) == 0: raise RuntimeError("No points are provided; please add points first") - clear_non_cond_mem = self.clear_non_cond_mem_around_input and ( - self.clear_non_cond_mem_for_multi_obj or batch_size <= 1 - ) + clear_non_cond_mem = self.clear_non_cond_mem_around_input and (self.clear_non_cond_mem_for_multi_obj or batch_size <= 1) # set start index, end index, and processing order if start_frame_idx is None: @@ -695,9 +677,7 @@ class SAM2VideoPredictor(SAM2Base): else: processing_order = [] # skip reverse tracking if starting from frame 0 else: - end_frame_idx = min( - start_frame_idx + max_frame_num_to_track, num_frames - 1 - ) + end_frame_idx = min(start_frame_idx + max_frame_num_to_track, num_frames - 1) processing_order = range(start_frame_idx, end_frame_idx + 1) for frame_idx in tqdm(processing_order, desc="propagate in video"): @@ -732,21 +712,15 @@ class SAM2VideoPredictor(SAM2Base): output_dict[storage_key][frame_idx] = current_out # Create slices of per-object outputs for subsequent interaction with each # individual object after tracking. 
- self._add_output_per_object( - inference_state, frame_idx, current_out, storage_key - ) + self._add_output_per_object(inference_state, frame_idx, current_out, storage_key) inference_state["frames_already_tracked"][frame_idx] = {"reverse": reverse} # Resize the output mask to the original video resolution (we directly use # the mask scores on GPU for output to avoid any CPU conversion in between) - _, video_res_masks = self._get_orig_video_res_output( - inference_state, pred_masks - ) + _, video_res_masks = self._get_orig_video_res_output(inference_state, pred_masks) yield frame_idx, obj_ids, video_res_masks - def _add_output_per_object( - self, inference_state, frame_idx, current_out, storage_key - ): + def _add_output_per_object(self, inference_state, frame_idx, current_out, storage_key): """ Split a multi-object output into per-object output slices and add them into `output_dict_per_obj`. The resulting slices share the same tensor storage. @@ -774,9 +748,7 @@ class SAM2VideoPredictor(SAM2Base): obj_output_dict[storage_key][frame_idx] = obj_out @torch.inference_mode() - def clear_all_prompts_in_frame( - self, inference_state, frame_idx, obj_id, need_output=True - ): + def clear_all_prompts_in_frame(self, inference_state, frame_idx, obj_id, need_output=True): """Remove all input points or mask in a specific frame for a given object.""" obj_idx = self._obj_id_to_idx(inference_state, obj_id) @@ -828,10 +800,7 @@ class SAM2VideoPredictor(SAM2Base): return # Finally, output updated masks per object (after removing the inputs above) obj_ids = inference_state["obj_ids"] - is_cond = any( - frame_idx in obj_temp_output_dict["cond_frame_outputs"] - for obj_temp_output_dict in temp_output_dict_per_obj.values() - ) + is_cond = any(frame_idx in obj_temp_output_dict["cond_frame_outputs"] for obj_temp_output_dict in temp_output_dict_per_obj.values()) consolidated_out = self._consolidate_temp_output_across_obj( inference_state, frame_idx, @@ -839,9 +808,7 @@ class SAM2VideoPredictor(SAM2Base): run_mem_encoder=False, consolidate_at_video_res=True, ) - _, video_res_masks = self._get_orig_video_res_output( - inference_state, consolidated_out["pred_masks_video_res"] - ) + _, video_res_masks = self._get_orig_video_res_output(inference_state, consolidated_out["pred_masks_video_res"]) return frame_idx, obj_ids, video_res_masks @torch.inference_mode() @@ -879,9 +846,7 @@ class SAM2VideoPredictor(SAM2Base): def _get_image_feature(self, inference_state, frame_idx, batch_size): """Compute the image features on a given frame.""" # Look up in the cache first - image, backbone_out = inference_state["cached_features"].get( - frame_idx, (None, None) - ) + image, backbone_out = inference_state["cached_features"].get(frame_idx, (None, None)) if backbone_out is None: # Cache miss -- we will run inference on a single image device = inference_state["device"] @@ -898,9 +863,7 @@ class SAM2VideoPredictor(SAM2Base): "vision_pos_enc": backbone_out["vision_pos_enc"].copy(), } for i, feat in enumerate(expanded_backbone_out["backbone_fpn"]): - expanded_backbone_out["backbone_fpn"][i] = feat.expand( - batch_size, -1, -1, -1 - ) + expanded_backbone_out["backbone_fpn"][i] = feat.expand(batch_size, -1, -1, -1) for i, pos in enumerate(expanded_backbone_out["vision_pos_enc"]): pos = pos.expand(batch_size, -1, -1, -1) expanded_backbone_out["vision_pos_enc"][i] = pos @@ -958,9 +921,7 @@ class SAM2VideoPredictor(SAM2Base): pred_masks_gpu = current_out["pred_masks"] # potentially fill holes in the predicted masks if self.fill_hole_area > 
0: - pred_masks_gpu = fill_holes_in_mask_scores( - pred_masks_gpu, self.fill_hole_area - ) + pred_masks_gpu = fill_holes_in_mask_scores(pred_masks_gpu, self.fill_hole_area) pred_masks = pred_masks_gpu.to(storage_device, non_blocking=True) # "maskmem_pos_enc" is the same across frames, so we only need to store one copy of it maskmem_pos_enc = self._get_maskmem_pos_enc(inference_state, current_out) @@ -992,9 +953,7 @@ class SAM2VideoPredictor(SAM2Base): memory also need to be computed again with the memory encoder. """ # Retrieve correct image features - _, _, current_vision_feats, _, feat_sizes = self._get_image_feature( - inference_state, frame_idx, batch_size - ) + _, _, current_vision_feats, _, feat_sizes = self._get_image_feature(inference_state, frame_idx, batch_size) maskmem_features, maskmem_pos_enc = self._encode_new_memory( current_vision_feats=current_vision_feats, feat_sizes=feat_sizes, @@ -1008,9 +967,7 @@ class SAM2VideoPredictor(SAM2Base): maskmem_features = maskmem_features.to(torch.bfloat16) maskmem_features = maskmem_features.to(storage_device, non_blocking=True) # "maskmem_pos_enc" is the same across frames, so we only need to store one copy of it - maskmem_pos_enc = self._get_maskmem_pos_enc( - inference_state, {"maskmem_pos_enc": maskmem_pos_enc} - ) + maskmem_pos_enc = self._get_maskmem_pos_enc(inference_state, {"maskmem_pos_enc": maskmem_pos_enc}) return maskmem_features, maskmem_pos_enc def _get_maskmem_pos_enc(self, inference_state, current_out): @@ -1031,9 +988,7 @@ class SAM2VideoPredictor(SAM2Base): maskmem_pos_enc = model_constants["maskmem_pos_enc"] # expand the cached maskmem_pos_enc to the actual batch size batch_size = out_maskmem_pos_enc[0].size(0) - expanded_maskmem_pos_enc = [ - x.expand(batch_size, -1, -1, -1) for x in maskmem_pos_enc - ] + expanded_maskmem_pos_enc = [x.expand(batch_size, -1, -1, -1) for x in maskmem_pos_enc] else: expanded_maskmem_pos_enc = None return expanded_maskmem_pos_enc @@ -1051,8 +1006,7 @@ class SAM2VideoPredictor(SAM2Base): if not strict: return inference_state["obj_ids"], updated_frames raise RuntimeError( - f"Cannot remove object id {obj_id} as it doesn't exist. " - f"All existing object ids: {inference_state['obj_ids']}." + f"Cannot remove object id {obj_id} as it doesn't exist. " f"All existing object ids: {inference_state['obj_ids']}." ) # If this is the only remaining object id, we simply reset the state. 
@@ -1066,16 +1020,10 @@ class SAM2VideoPredictor(SAM2Base): # (note that this step is required as it might downgrade conditioning frames to # non-conditioning ones) obj_input_frames_inds = set() - obj_input_frames_inds.update( - inference_state["point_inputs_per_obj"][old_obj_idx_to_rm] - ) - obj_input_frames_inds.update( - inference_state["mask_inputs_per_obj"][old_obj_idx_to_rm] - ) + obj_input_frames_inds.update(inference_state["point_inputs_per_obj"][old_obj_idx_to_rm]) + obj_input_frames_inds.update(inference_state["mask_inputs_per_obj"][old_obj_idx_to_rm]) for frame_idx in obj_input_frames_inds: - self.clear_all_prompts_in_frame( - inference_state, frame_idx, obj_id, need_output=False - ) + self.clear_all_prompts_in_frame(inference_state, frame_idx, obj_id, need_output=False) # Step 1: Update the object id mapping (note that it must be done after Step 0, # since Step 0 still requires the old object id mappings in inference_state) @@ -1111,20 +1059,14 @@ class SAM2VideoPredictor(SAM2Base): def _slice_state(output_dict, storage_key): for frame_idx, out in output_dict[storage_key].items(): out["maskmem_features"] = out["maskmem_features"][remain_old_obj_inds] - out["maskmem_pos_enc"] = [ - x[remain_old_obj_inds] for x in out["maskmem_pos_enc"] - ] + out["maskmem_pos_enc"] = [x[remain_old_obj_inds] for x in out["maskmem_pos_enc"]] # "maskmem_pos_enc" is the same across frames, so we only need to store one copy of it out["maskmem_pos_enc"] = self._get_maskmem_pos_enc(inference_state, out) out["pred_masks"] = out["pred_masks"][remain_old_obj_inds] out["obj_ptr"] = out["obj_ptr"][remain_old_obj_inds] - out["object_score_logits"] = out["object_score_logits"][ - remain_old_obj_inds - ] + out["object_score_logits"] = out["object_score_logits"][remain_old_obj_inds] # also update the per-object slices - self._add_output_per_object( - inference_state, frame_idx, out, storage_key - ) + self._add_output_per_object(inference_state, frame_idx, out, storage_key) _slice_state(inference_state["output_dict"], "cond_frame_outputs") _slice_state(inference_state["output_dict"], "non_cond_frame_outputs") @@ -1135,8 +1077,7 @@ class SAM2VideoPredictor(SAM2Base): temp_output_dict_per_obj = inference_state["temp_output_dict_per_obj"] for frame_idx in obj_input_frames_inds: is_cond = any( - frame_idx in obj_temp_output_dict["cond_frame_outputs"] - for obj_temp_output_dict in temp_output_dict_per_obj.values() + frame_idx in obj_temp_output_dict["cond_frame_outputs"] for obj_temp_output_dict in temp_output_dict_per_obj.values() ) consolidated_out = self._consolidate_temp_output_across_obj( inference_state, @@ -1145,9 +1086,7 @@ class SAM2VideoPredictor(SAM2Base): run_mem_encoder=False, consolidate_at_video_res=True, ) - _, video_res_masks = self._get_orig_video_res_output( - inference_state, consolidated_out["pred_masks_video_res"] - ) + _, video_res_masks = self._get_orig_video_res_output(inference_state, consolidated_out["pred_masks_video_res"]) updated_frames.append((frame_idx, video_res_masks)) return inference_state["obj_ids"], updated_frames diff --git a/bboxmaskpose/sam2/training/README.md b/bboxmaskpose/sam2/training/README.md new file mode 100644 index 0000000000000000000000000000000000000000..b0c829d49d051d8f72e7bef959e33e6f0329c94d --- /dev/null +++ b/bboxmaskpose/sam2/training/README.md @@ -0,0 +1,116 @@ +# Training Code for SAM 2 + +This folder contains the training code for SAM 2, a foundation model for promptable visual segmentation in images and videos. 
+The code allows users to train and fine-tune SAM 2 on their own datasets (image, video, or both).
+
+## Structure
+
+The training code is organized into the following subfolders:
+
+* `dataset`: This folder contains image and video dataset and dataloader classes as well as their transforms.
+* `model`: This folder contains the main model class (`SAM2Train`) for training/fine-tuning. `SAM2Train` inherits from the `SAM2Base` model and provides functions to enable training or fine-tuning SAM 2. It also accepts all training-time parameters used for simulating user prompts (e.g. iterative point sampling).
+* `utils`: This folder contains training utils such as loggers and distributed training utils.
+* `scripts`: This folder contains the script to extract the frames of the SA-V dataset to be used in training.
+* `loss_fns.py`: This file has the main loss class (`MultiStepMultiMasksAndIous`) used for training.
+* `optimizer.py`: This file contains all optimizer utils that support arbitrary schedulers.
+* `trainer.py`: This file contains the `Trainer` class that accepts all the `Hydra` configurable modules (model, optimizer, datasets, etc.) and implements the main train/eval loop.
+* `train.py`: This script is used to launch training jobs. It supports single-node and multi-node jobs. For usage, please check the [Getting Started](README.md#getting-started) section or run `python training/train.py -h`.
+
+## Getting Started
+
+To get started with the training code, we provide a simple example of fine-tuning our checkpoints on the [MOSE](https://henghuiding.github.io/MOSE/) dataset, which can be extended to your custom datasets.
+
+#### Requirements:
+- We assume training on A100 GPUs with **80 GB** of memory.
+- Download the MOSE dataset using one of the provided links from [here](https://github.com/henghuiding/MOSE-api?tab=readme-ov-file#download).
+
+#### Steps to fine-tune on MOSE:
+- Install the packages required for training by running `pip install -e ".[dev]"`.
+- Set the paths for the MOSE dataset in `configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml`.
+  ```yaml
+  dataset:
+    # PATHS to Dataset
+    img_folder: null # PATH to MOSE JPEGImages folder
+    gt_folder: null # PATH to MOSE Annotations folder
+    file_list_txt: null # Optional PATH to filelist containing a subset of videos to be used for training
+  ```
+- To fine-tune the base model on MOSE using 8 GPUs, run
+
+  ```bash
+  python training/train.py \
+    -c configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml \
+    --use-cluster 0 \
+    --num-gpus 8
+  ```
+
+  We also support multi-node training on a cluster using [SLURM](https://slurm.schedmd.com/documentation.html); for example, you can train on 2 nodes by running
+
+  ```bash
+  python training/train.py \
+    -c configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml \
+    --use-cluster 1 \
+    --num-gpus 8 \
+    --num-nodes 2 \
+    --partition $PARTITION \
+    --qos $QOS \
+    --account $ACCOUNT
+  ```
+  where `partition`, `qos`, and `account` are optional and depend on your SLURM configuration.
+  By default, the checkpoint and logs will be saved under the `sam2_logs` directory in the root of the repo. Alternatively, you can set the experiment log directory in the config file as follows:
+
+  ```yaml
+  experiment_log_dir: null # Path to log directory, defaults to ./sam2_logs/${config_name}
+  ```
+  The training losses can be monitored using `tensorboard` logs stored under `tensorboard/` in the experiment log directory.
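+  Once fine-tuning finishes, the checkpoint saved under `checkpoints/` in the experiment log directory can be loaded like the released SAM 2 checkpoints. Below is a minimal sketch, assuming the default `./sam2_logs/${config_name}` layout; the checkpoint path and the dummy image are illustrative placeholders:
+
+  ```python
+  import numpy as np
+  import torch
+
+  from bboxmaskpose.sam2.build_sam import build_sam2
+  from bboxmaskpose.sam2.sam2_image_predictor import SAM2ImagePredictor
+
+  # Illustrative path: checkpoints are written under the experiment log directory.
+  checkpoint = "sam2_logs/configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml/checkpoints/checkpoint.pt"
+  model_cfg = "configs/sam2.1/sam2.1_hiera_b+.yaml"
+
+  predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))
+
+  image = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a real RGB frame
+  with torch.inference_mode():
+      predictor.set_image(image)
+      masks, scores, logits = predictor.predict(
+          point_coords=np.array([[640, 360]]),  # a single positive click
+          point_labels=np.array([1]),
+      )
+  ```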
+  We also provide a sample validation [split](../training/assets/MOSE_sample_val_list.txt) for evaluation purposes. To generate predictions, follow this [guide](../tools/README.md) on how to use our `vos_inference.py` script. After generating the predictions, you can run `sav_evaluator.py` as detailed [here](../sav_dataset/README.md#sa-v-val-and-test-evaluation). The expected MOSE J&F after fine-tuning the Base plus model is 79.4.
+
+
+  After training/fine-tuning, you can use the new checkpoint (saved in `checkpoints/` in the experiment log directory) in the same way as the released SAM 2 checkpoints (as illustrated [here](../README.md#image-prediction) and sketched above).
+## Training on images and videos
+The code supports training on images and videos (similar to how SAM 2 is trained). We provide classes for loading SA-1B as a sample image dataset, SA-V as a sample video dataset, as well as any DAVIS-style video dataset (e.g. MOSE). Note that to train on SA-V, you must first extract all videos to JPEG frames using the provided extraction [script](./scripts/sav_frame_extraction_submitit.py). Below is an example of how to set up the datasets in your config to train on a mix of image and video datasets:
+
+```yaml
+data:
+  train:
+    _target_: training.dataset.sam2_datasets.TorchTrainMixedDataset
+    phases_per_epoch: ${phases_per_epoch} # Chunks a single epoch into smaller phases
+    batch_sizes: # List of batch sizes corresponding to each dataset
+      - ${bs1} # Batch size of dataset 1
+      - ${bs2} # Batch size of dataset 2
+    datasets:
+      # SA1B as an example of an image dataset
+      - _target_: training.dataset.vos_dataset.VOSDataset
+        training: true
+        video_dataset:
+          _target_: training.dataset.vos_raw_dataset.SA1BRawDataset
+          img_folder: ${path_to_img_folder}
+          gt_folder: ${path_to_gt_folder}
+          file_list_txt: ${path_to_train_filelist} # Optional
+        sampler:
+          _target_: training.dataset.vos_sampler.RandomUniformSampler
+          num_frames: 1
+          max_num_objects: ${max_num_objects_per_image}
+        transforms: ${image_transforms}
+      # SA-V as an example of a video dataset
+      - _target_: training.dataset.vos_dataset.VOSDataset
+        training: true
+        video_dataset:
+          _target_: training.dataset.vos_raw_dataset.JSONRawDataset
+          img_folder: ${path_to_img_folder}
+          gt_folder: ${path_to_gt_folder}
+          file_list_txt: ${path_to_train_filelist} # Optional
+          ann_every: 4
+        sampler:
+          _target_: training.dataset.vos_sampler.RandomUniformSampler
+          num_frames: 8 # Number of frames per video
+          max_num_objects: ${max_num_objects_per_video}
+          reverse_time_prob: ${reverse_time_prob} # probability to reverse video
+        transforms: ${video_transforms}
+    shuffle: True
+    num_workers: ${num_train_workers}
+    pin_memory: True
+    drop_last: True
+    collate_fn:
+      _target_: training.utils.data_utils.collate_fn
+      _partial_: true
+      dict_key: all
+```
diff --git a/sam2/utils/__init__.py b/bboxmaskpose/sam2/training/__init__.py
similarity index 100%
rename from sam2/utils/__init__.py
rename to bboxmaskpose/sam2/training/__init__.py
diff --git a/bboxmaskpose/sam2/training/assets/COCO/COCO_custom_small_train.txt b/bboxmaskpose/sam2/training/assets/COCO/COCO_custom_small_train.txt
new file mode 100644
index 0000000000000000000000000000000000000000..1686d674d05f3ac1c2736b44ac461d0d972b068f
--- /dev/null
+++ b/bboxmaskpose/sam2/training/assets/COCO/COCO_custom_small_train.txt
@@ -0,0 +1,11 @@
+sa_12754.jpg
+sa_17741.jpg
+sa_19157.jpg
+sa_19523.jpg
+sa_19608.jpg
+sa_22816.jpg
+sa_31092.jpg
+sa_32124.jpg
+sa_37209.jpg
+sa_50713.jpg
+sa_57703.jpg
\ No newline at end of file
diff --git
a/bboxmaskpose/sam2/training/assets/MOSE_custom_val.txt b/bboxmaskpose/sam2/training/assets/MOSE_custom_val.txt new file mode 100644 index 0000000000000000000000000000000000000000..c69df25fc7c3123d976b24fb16a7ed5f3229b866 --- /dev/null +++ b/bboxmaskpose/sam2/training/assets/MOSE_custom_val.txt @@ -0,0 +1,7 @@ +95718597 +012b09a0 +15e281a9 +9069547d +2b84174e +5b479de9 +817263d6 \ No newline at end of file diff --git a/bboxmaskpose/sam2/training/assets/MOSE_sample_train_list.txt b/bboxmaskpose/sam2/training/assets/MOSE_sample_train_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..28b22e3170f63de0fba3c77ef999f958cd6c48ff --- /dev/null +++ b/bboxmaskpose/sam2/training/assets/MOSE_sample_train_list.txt @@ -0,0 +1,1246 @@ +28191f94 +662487fe +80906bf9 +7e704f2e +efa25913 +b6f03bd9 +6834d249 +5a723c30 +07779415 +4ce088c6 +199995b5 +54273925 +4fa342f5 +110da3cf +65856fa0 +46705bb3 +d869a3cf +555aa049 +8f01fb2c +37b07a28 +5e80b3dd +ba0e4dd4 +6f5144b6 +acec8407 +93723f88 +c7c7528c +97f58761 +e71f9faa +e64c13dc +8830d59d +0e4aeed9 +63437cf3 +95215aa1 +255f86ef +dc54aab2 +327cd258 +198021ad +c690220c +d25ff89d +7875b874 +4fa6d325 +9fc933f6 +4d8baafe +55ae6921 +6a3bc149 +89f8163f +2d65d2ac +dba172b1 +a14de179 +4017d1b3 +52ddf44c +3ba93641 +34a5f964 +da7dee28 +872b76de +1dc12eca +265a69f4 +86a2b59f +51e5ca25 +ddf80bcd +6786602e +4fa28c89 +f56942e9 +2184bb93 +d883e976 +bfe1469e +bc4e7b11 +1c80acb0 +2b0e34d3 +56b9ce41 +15f0b0cd +cc5d0dd1 +1b7eada8 +7286b176 +0ab42ab1 +adb82dc9 +c060b1e6 +3da63bd5 +5488796e +d7066e20 +aab5ed11 +17f66311 +24df9789 +208fa934 +7ce2c865 +debe4249 +4c56bbea +149dbae2 +beb693c9 +49eb0315 +e7ad4717 +4e016d5a +95e24093 +07b5d86c +80701b6c +337dfa1e +b624a46e +3f849de8 +5db21df2 +47891b4c +a966d7fd +013103f6 +da5e4bc5 +ba9ea03d +526195de +57f3a53e +b3aff7f8 +26048547 +bb7ee856 +aef0d049 +e35a8262 +57ad022e +f45d3823 +e5e9eb29 +39cc637e +a4fc4f17 +dd5a4739 +bbe97d18 +33602f6b +9061dac9 +23454d80 +a20baeec +794f01d4 +02de2f2a +055fca57 +a69df343 +e307510e +d07ad1be +1fc5e086 +db6533a5 +fe9706b7 +87e32230 +8ba58e4c +561f6380 +2ab9ba0f +86571569 +756cc6c9 +aa185af5 +c6d7f94b +7f54c579 +71f4b40e +4190c83a +fef0aba4 +2f7c71bb +e4b6f2ef +76adaeea +11cdeb64 +733f2a02 +e50dbddb +f643141f +d2e75e95 +84559bc3 +7ade3068 +e69db797 +0b787263 +57895315 +d7969c29 +62529cd4 +203733e7 +48fd97a6 +723fd024 +849f0efb +aafea009 +dd4eb8f1 +d18554ae +f3c0f0cf +90fe55b9 +b0ffaf3b +e79ecd47 +d670ce7b +56a5643a +90ff1d09 +1fb378d9 +57014c7d +994ed763 +5bc7ea74 +e99bd793 +cbb66185 +5f3fcff6 +05ed1023 +85efa9e3 +652929ce +905d8740 +a6fcde01 +0fdf67f7 +a5cf4c8d +e1c48bdd +782551f7 +6acd353f +c30641cf +81d12756 +51befc31 +9d5ab5ca +d262b7e4 +2cd705a9 +f7360199 +d3f3bf9d +028f6f64 +94767cb4 +3a739934 +72433603 +ec66879d +6149becc +5845c157 +c5082b3c +f89b54d0 +f3ada126 +409dcb8a +4411fdee +eb93ed20 +9cb1ba0e +b8e1ec26 +7edd8b4f +5e9412c0 +2744f35a +dafeb75e +f3f072f2 +6f1df574 +5a064706 +89c76ac4 +a6adef89 +76303516 +dbd67417 +a53ef3fa +10552818 +ac7deb19 +2d403c59 +55c157f1 +214aeac3 +a9f5e251 +d7807996 +d1dba33b +1367e367 +44476e77 +0644075b +eda37457 +f2de4198 +9a4ce701 +46e00caf +2ae75f99 +cd49fb99 +4e4483e7 +a0669957 +a6f0d882 +9ce1d54a +1fc2314b +21f363b3 +32ecef67 +70bcaf68 +115348f9 +60827ada +a218e951 +6d30d5ac +6da17988 +f22c39ce +5825f0e0 +f415f9ad +0d4feda2 +832fc243 +414ca58b +a92390a0 +ddd383cc +43dc67f7 +962ae0e2 +6dd74e7b +2bcd6c3b +b394847f +637fd121 +d46e771b +f6bfc699 +63f138de +932ad0a6 +2080824a +52fa9174 +843d3bf7 +f3431885 +5c20c48a +134a2ab0 +2ea465de 
+f6786ab5 +2bf49664 +a49ce97b +6a50e93a +a7c21e95 +616ad8ec +0a8d7b41 +b0c90527 +2d893fb7 +19310598 +7744dc51 +4539b907 +9d299f60 +e495537a +0b02886a +f4c4a2ca +e957b2b5 +e6f3bf07 +258944c8 +54364322 +ebb77f95 +0af03282 +cbdbc6c3 +494ecef0 +ee91f783 +9698f06e +11e16068 +b942ce0a +423a50e6 +fb16e746 +9c88ae45 +8620c024 +d3af3c85 +780a25de +e569a15f +c4f9f19e +1106f3a7 +d37e29a7 +e53611da +fdb2e432 +18ad3117 +6fcd426d +3bfa8379 +3b19c5c3 +ff1142df +cd182615 +b60ea255 +b3f5d019 +6dc5e55d +103166c7 +37af9ac1 +ad1881d1 +731149b3 +90e3338a +6aa0b6f2 +a25316a3 +dc8679e0 +571fb490 +80afed16 +983a551b +a58578e5 +2bc0bba4 +1143b3fe +fdd8dd49 +7fe2bf77 +890ef032 +8466eeb2 +c791ddbb +631b82bd +78bf9b51 +a99df45f +2bdb692f +e89b1501 +4e6aa1e8 +e5665030 +fe21fd5c +635577d5 +4414cd3a +03c99e83 +ff041cd1 +c33adbc2 +a988ec74 +576031e0 +03c21af7 +79b25f4b +bbc485d6 +d36d5a0d +efdab888 +b20e6781 +81fdc526 +e1c26a53 +7c6d3504 +52a04667 +f22e34d4 +bb936ead +13f0606c +d2abc61e +af509e8f +bea1c144 +e15e4de8 +e727099f +b30744df +ffb6a2e4 +0d31d3a6 +a23048fe +7d452630 +6c736334 +046ed4f4 +94f4c2aa +c290cfd3 +f7203226 +2fdae3c5 +7c78e351 +02b72b8d +2d22d3be +ba28d02e +197f6587 +43199a98 +b563b04f +9293b755 +9cef7489 +d156b96f +15e9161e +6d094cd5 +0d876a65 +c818d30a +8094b12b +a4a8e24b +14655f54 +11c14893 +8a48f62a +7f3d9c22 +d952481c +03e0f9b8 +28980657 +6a0b5563 +5879983c +37549a79 +4a7162bd +7a6aa1ef +0dc1b78c +f6dba17b +1dba51af +b2f4d608 +e2e6f421 +464066da +5d24e4ea +1e75004d +a02ed92c +673adbcc +c2a0c0fd +85addee5 +54b8f502 +f5d2d8d3 +a19507e1 +803e1756 +0d1fe009 +5968c2d8 +b926e1ad +a9162e14 +ae470d2b +bd731802 +68c879f2 +21fe05d9 +c1ed21d0 +831498e4 +cc45a7f2 +cb170015 +59750be4 +30d1cb6b +03e5f069 +106d33db +3f003746 +3e5ad020 +8bc5a91c +64b89eb5 +bfd28682 +f8687b9a +7bbf38ee +d6d92b30 +ceaa6c65 +677c8ed7 +dc33acf8 +cfd1de31 +e5be4781 +85585220 +5d2316f6 +dd3f4a07 +34535f5f +3ae0bc5d +f521e3c5 +74c2284f +12a42fd9 +61403519 +88cd32f3 +662a1846 +825a1944 +cf376cf1 +8465d99c +61a2e246 +62d44645 +103b3ca8 +c7e745ed +4ed71139 +230c2edf +529c6889 +9e509c0d +54b9dea2 +a8934c0d +29cffe2f +48017512 +c9f7f69d +ce691ee6 +21c89360 +3b97c07b +ebd82d35 +2895bb8b +7043c5c1 +85d694d7 +88fd7507 +18d8931e +aa718745 +89b671bb +0d8d30ae +26163977 +a6121689 +1589579d +159789c4 +f5ca8271 +fcc16740 +3158be0b +860fc1f7 +3f54a330 +82f24ce7 +069f6a2a +2fa9c523 +c9f1d87f +efe9cbca +8f969ea5 +4f5db794 +62c501f8 +2d3b0320 +c99637f0 +0f3b1fcb +6e4ee861 +e0d9aff0 +230ddb91 +e14d1f96 +c83aa6a1 +eabdf66a +6783a303 +81659eb2 +ce954bd7 +9a48c0c9 +0ab807b4 +f0617f71 +fe86f2f8 +61d80e22 +e4b6d2a0 +ac093040 +0e05fabe +d0b507c3 +3d828137 +c4fa0bab +f7783321 +ec27366a +404e4c58 +073baf48 +0f685e01 +b0e98fdd +b4891f7f +a46b7b77 +ee059f99 +3c87888e +8d23ddcc +2d8d7d35 +5680be79 +fc79c03e +20660b72 +53f67585 +90956534 +7e709e2d +dae93f5c +54b9dbba +cc41ba05 +1e207fe0 +a9c6abf2 +35e0ca09 +e3dcd186 +1b8bb699 +92162474 +cdad6812 +50b91533 +570215ac +6042d64a +b6e2c041 +08746283 +7a056996 +b8651773 +adf443e1 +6a6e0e3b +886ed981 +c1d57fea +43030c4c +7ebfbf57 +0770ad03 +e85301d5 +31ac3d98 +acaef45e +8f415dd1 +fe2dc281 +2c0b9d99 +8e24501e +911ec4ad +8036b58e +c3b350b9 +b6cadd11 +a3a80cf7 +88ab50cd +59c755a8 +1339321a +91b2f707 +97b0811e +1da33959 +31b09833 +c1a40349 +708098a9 +1f220f98 +999e07cb +0b5e5d29 +94c63453 +b826d642 +a598602d +4c83eab8 +2efd5e50 +6ec5da3a +9fcd95eb +9a2c6b5b +c205a718 +e638e950 +cb43141c +494dd91d +c4957274 +4975a81d +a1f4c54d +51e6fafa +514490e5 +b0d09e6a +c6726eb8 +06772c9a +5a65ffd7 +3657c62b +03012cfd +529df209 +f1c38e66 
+ab417352 +118a067e +8957514f +22e8b380 +3b1a4616 +a4457543 +57c9f6e0 +e362c16b +0f809e41 +857e375e +9cff25e3 +d754fb65 +6ad44b86 +051052d8 +a4564b94 +f68507d0 +80a7cf7b +ad8cd1e0 +60b19cd3 +274fe944 +f06632aa +628a337b +92c96c05 +87fc565c +6f6e6c37 +228a0234 +6487110a +aa911a8e +40c47fa3 +9606508b +6ba9e61f +c8c1d5a9 +cf01df5b +9421b9ad +006e6b64 +1c28e081 +06273084 +8925e11b +b46c822b +00501424 +cfd946b2 +2e92a7dc +1c5f5bb6 +1d29944c +8248698e +19247506 +1eac1aff +ee9caa47 +4a41cbf8 +d97c9309 +4ca87c14 +9707f1e3 +8bb9a221 +6605e67d +95cf72d7 +1c6fb814 +033130b2 +4344808d +5f14e5d2 +a810399b +e325a6d4 +7014ddf4 +725d4bfb +790285e8 +1a6a731f +fbfb6e30 +0d4d88f6 +80ce18a4 +572495b7 +4b44dc50 +95dce33c +4a6fb202 +3142014e +a3c56751 +96b2a414 +c4aa176c +fd1e394f +93f0f509 +f494e9fa +bfa42a75 +db5319c7 +aa92e070 +81220a93 +e4a72496 +fc467bf1 +5397b01d +1dc0c9a0 +f6f8b4a6 +53dc7db4 +8ef303eb +62ca45c9 +e9d3465e +3784e3f6 +8c934e67 +5ba84e3f +30e41f1e +61cf0ec8 +e93e8f01 +fc6086dd +a95f0aea +33a04ef2 +6f295adb +d2aa8c66 +724cc810 +d8623d26 +8d0d641a +4bda7a76 +38030c69 +56199c41 +d2f4b9e2 +a7b8ac96 +64044df1 +fd1078cc +0165667b +16e1cca7 +915f0d9a +eeaaa67e +378430d5 +a84c60e6 +b4ae36cc +2a3a0571 +13e6df75 +aa348c45 +59d7a11d +68954daf +d6f883c6 +f28b429a +32dc49d4 +ccf14ee0 +7d512591 +9bdabdb2 +ed878d94 +54eda06d +132561ee +3c4b6736 +0367af42 +531c1c36 +843d8f25 +333bdbdc +c3c21268 +07b00746 +c7fe0584 +49fc9f2e +9ed4317a +d29991b4 +98b0033d +f0b922bf +89fe6899 +58264713 +2f49220a +6ff85ca5 +4b96b2c8 +a42f54f5 +aa425600 +22fdee40 +dde85a9d +3722f6fe +e7529cbc +5ae23f9f +cc32235b +730bc486 +b12701b7 +a96b3010 +16130bd3 +2c713560 +f7935d24 +a7eb6616 +0d6e7177 +100edaef +0442a954 +60f4fa43 +37bf7edf +76b18413 +ab0646a9 +c575434d +1e356390 +5416fbb7 +df7cf932 +269872de +9033b607 +c2e88575 +932542cd +23e046fb +3d08dadd +7999adc5 +ed81c485 +3bd7facd +1feae28e +8d72533b +6a8d35d6 +65308bdc +7f0b7662 +98290486 +fee3371f +c463c7e5 +faf7d852 +75c34dc5 +96a6722e +e5605136 +851bc5d9 +15c41c4b +6a39e104 +5fbff256 +0e7001dd +5411113f +3ea2f7f2 +242b74b1 +87727003 +ec6dd0e9 +980baf58 +9d0b7bf1 +9113c9d4 +5ebef6bd +a5f70ce7 +b0240233 +06ad78e0 +8745edd0 +d8e8d984 +ac32a655 +38568758 +d48c552d +0b27d5f7 +c65d0736 +800e3c14 +d37a5857 +bcebc660 +d3ab52cc +405e3ee7 +e33cddc9 +b0197182 +89fd5681 +9e192417 +8554c402 +aae923b8 +31af515d +75b26f88 +60471744 +460945aa +c0fe8e1a +1731babb +2e85e35d +f9c20062 +115da184 +ddfa88c7 +359003f8 +dfa99126 +bf04814f +f407a414 +e18723c4 +0a7a3629 +c07ab37e +1251a1c9 +4d09d22a +5984ed74 +34504f63 +ced51047 +08ff419c +d942e98c +2697f864 +3b671a61 +72a2f7e2 +48e7cafe +6adad2f7 +18840617 +1e44f47e +36cc4055 +8c494902 +2982de7a +6a428397 +c4a0ecfb +231d6945 +fe470104 +f93e1bd0 +bd18bc5a +7bd70d93 +8f81a0ee +db78e7a1 +7593caea +86d5b29b +5457b298 +0d967fd1 +62372d4c +68259db3 +f0944ea2 +7b017dbf +bcb6e338 +03692b14 +f7d36a47 +1ca2531a +6728528d +1fc0e6a8 +0ba9c5ad +a386eaa2 +b0c5459f +1d64aff3 +b97d4f1a +b3745d91 +c461003e +910bf878 +ae42601c +8d2ddeff +aaecaa39 +250b5034 +edb11192 +7bfe9b57 +6d533759 +51586b36 +a38d648a +8fdb48e5 +6075d6b0 +3588ea03 +bc844942 +398d41f5 +660e3b70 +0b99f522 +f169fd1b +7bfa2ab5 +ab461319 +25153e58 +002b4dce +a2df1bee +550a7357 +b604f2dd +2f477d05 +bdf9eb5a +857ddc6e +c8f0fd41 +6df96f15 +e147ab26 +788da8e8 +02221fb0 +d1d95c61 +a3f0cb28 +3a6e6ace +67c2909a +220382ab +eaed776d +aff08a61 +b99d1bd6 +9d9ae988 +34ccea00 +41dae436 +18513251 +ad57acd1 +67f110fc +3f09f5c9 +25ef7d43 +12a5d0d7 +3ff48b8b +26ed56e6 +c047a092 +bb8639e1 +8788747f +584838d4 +f8e5f837 
+657242e8 +cb8eedf4 +74a917f1 +578f71da +c9b27125 +22e1f53c +f40145c2 +4795259b +3f313a2f +c9012bf6 +22167a50 +6e7f9437 +ef51a724 +356e0fcb +d3ea999d +08a5c662 +85aa3b0e +579fadec +7bc95dc2 +c097af8e +f01d8b9f +80fb79c6 +ea65e6b7 +29ff29f6 +9e1f739d +b7fb59c9 +e2160f17 +0be33bc1 +e96b9b04 +b1affe79 +c4f4b2e2 +f4c8ffb1 +6a009e50 +a8828854 +2786f841 +a64e724c +5f54d077 +7040385d +6e0f0ecc +f33d3c15 +8108b358 +46a502de +1e0fb02a +ddbdfa32 +e7b34ab6 +c9080ed1 +395224b3 +33f9ab47 +c245ecda +c28d81a9 +37303a3b +6380dd6f +2fb5a55b +83b7c53c +41c8d0d2 +3aab2d13 +dc7d21fb +86a88668 +37bb38fe +ab6413a8 +bbe585b2 +a0ca072a +9d5940d2 +ddb1d0b1 +a946317a +988b29a4 +89dc0432 +5df8490d +5e167efa +50a86faa +fe6a535a +a9f8b8b4 +6e2dce1b +d0696759 +c09da3b2 +f07dd347 +67408899 +406165ff +a4a9d03d +9b5f0f47 +5f3e8022 +1d7a23e0 +25af2eeb +82a3db34 +c9351029 +6c93d44c +f088ad1c +9ee59f51 +b5276b3f +ca74a924 +781af187 +fa3e0b85 +b898c99e +1ca51f06 +5a92a0c1 +138c81fe +d0722d0f +05a7d84d +e18f1dea +799a2d61 +8276e558 +f0ba8748 +ce733e8a +2f9d0911 +58f24fa4 +66a25278 +3135d31d +4b9223ee +bdd5e6b3 +ddbebec1 +8dbebbd9 +3020b38f +e607450d +724a5d1c +91b754c5 +2e85e790 +3a407bd9 +fd137178 +a304029b +4023fc77 +440d5072 +2eb73c7c +164a7305 +b33ade7c +277ad883 +b0f7e75c +74107936 +83924bdb +b72beb78 +86c01d64 +f6f441eb +23b9a3ea +80b73f1a +93c6411d +1e95ef5e +800b5eac +9519832a +ae043406 +b06a902e +1dbca5cc +571f88a1 +b1faf52b +45572497 +8d016cdb +f92cdae8 +316931f8 +f9884439 +e1b7f212 +e23c6392 +ccfae073 +5aa1efda +74f0687c +eaff3301 +b6520a94 +c5398714 +15e7e4d1 +0fc00006 +8cf49218 +3a8ddc0a +e7e2a0b9 +eec4c008 +8d73085e +77e246da +00e92ab4 +f76f6cf9 +19801183 +233406ef +b80e028c +342c0b2a +a2768c47 +99350a74 +adbd400b +f3978ade +b87a4f6c +fa95a6a2 +6dff20c9 +935b5ad8 +dbbbb401 +1b6472c1 +9c0e6331 +04ae7a6b +4c94e4f3 +90cb46cb +2831ecf5 +ff77a145 +79af6097 +ba61a719 +abcb7665 +7e87750e +c4c7bc5d +3a670b81 +3d9a7023 +82667d52 +a4587f62 +ca619b7f +7c5462f5 +bda5c60d +e6e48ac8 +405c6000 +7981f344 +f7375ab3 +bb467ff9 +cfc68a82 +e417a6d8 +1a6177c1 +7b75dace +b1af350d +484d48a3 +1f805416 +7416ab4e +1291276c +9e85179b +5a74660c +7e6d00df +01e3cec8 +ee2c0688 +f6de8226 +a217538c +b432c3ef +49e5ff4e +035359e5 +8ae8e7ed +2da12766 +cac39070 +115adda4 +1a2872dc +fac3378e +294e7bf8 +a1a4991f +c062f4d7 +72b2b77d +158062aa +9ae447a7 +a7b05677 +fdfd5d56 +eac1a9e6 +a5905593 +59992293 +84298fae +f708e55f +093d3d93 +75d26197 +924f5d88 +3184a7ec +b454fdbc +2d9101b8 +ae70fb7c +4385b2c4 +63b37343 +0b4b662c +2883ae72 +ffcab778 +0f96e2d7 +897066e3 +f23e98ad +797a7b7e +2fc476f9 diff --git a/bboxmaskpose/sam2/training/assets/MOSE_sample_val_list.txt b/bboxmaskpose/sam2/training/assets/MOSE_sample_val_list.txt new file mode 100644 index 0000000000000000000000000000000000000000..9721028718245ff5297fdae59d35a7c89cb5f56a --- /dev/null +++ b/bboxmaskpose/sam2/training/assets/MOSE_sample_val_list.txt @@ -0,0 +1,200 @@ +32e5d721 +5bad0bab +267bfd6c +0a43a414 +56c56ca9 +9a1146b3 +c6ad7aaf +78a1f4b1 +fc455e73 +072e7b3f +77ccb57d +a76ee415 +8cdcfc17 +5d518b42 +376dd830 +0e843fc8 +2af0e766 +2bd4e845 +de2f2a6a +ade9ee91 +001ca3cb +fc4c1c67 +8ef55579 +b84ce852 +4cc8528a +767ffaaa +112a2ef0 +a338c8aa +cbd144f5 +5ff72128 +86a949e2 +9f2323ac +1fab1d1c +75924351 +ef55817b +02deca50 +4d979d99 +4d65f873 +28470fa0 +0d1575fe +06ea172e +29a6ddc2 +797f1bec +780e7a99 +b9ed5b44 +02a236b4 +607d8ff5 +af5666b2 +0558d0ed +a938c6b2 +103df575 +77110e80 +739e5a07 +6763a576 +06ebc138 +ba4b3b09 +b35cc2f3 +4e0597a0 +5949ee84 +5348d547 +323c4236 +b3b51117 +55727ddd +ab2714f3 
+d2878895 +c0734cb3 +94f7c53e +2a2745e5 +442ffb54 +3592425a +50ae03b0 +5f150435 +3067f9fa +9ffb2818 +adeaf5aa +31caacec +1cd99b86 +aa22f9d0 +8fa50320 +e6348d2c +42ff84a5 +8c8b7913 +c96adcbc +495be321 +db735509 +ee113fc4 +a678cdab +c409ca4d +68d2b259 +592b4dee +4e2b4dc7 +eb4d26e1 +2009a00f +bec5c89d +67191f24 +a3e85b4b +da7080cd +80d978e9 +36dcb93f +a41e8c44 +12fdc864 +46d140ea +657c9dd9 +a86f84ee +90c1c43d +33015509 +afc7664d +23df06e1 +291d4799 +0ab75563 +251bf059 +bcefdcc4 +ce9a2796 +94d3403a +8f2e04bc +f9cda066 +9dfa2cc5 +66924c91 +e765a09e +15654ee1 +48e0bd39 +ee095221 +2463609b +544d0d1f +51b8c2e1 +d321dde4 +4cb11a5f +d7058a0d +37af282a +fabae187 +7be91184 +181ec185 +2d16ceeb +b56be4b1 +6699eff0 +79acac96 +d61c4665 +0c13e1e7 +100f6ecf +71217dfc +82df0888 +4c42c747 +c9fdf703 +d2efeb4b +69ed9d14 +64914fb6 +255bedbc +4ea934d8 +a034feb2 +e4f4ddae +e36a3026 +c1489591 +111bb373 +e1d9fb32 +93e22d48 +c1ec4b26 +d9638e69 +60ab04c5 +cfe7773a +62132822 +2f5fb2a3 +7bdd197d +033333fd +130fcdbe +12e509c2 +67138c33 +6f90cc5f +4e3020fe +bbdd8bb7 +b399ccdb +fecd10d2 +2e0967f7 +f509054f +792c6ff7 +48e2afc5 +d904c048 +111e0a5c +b83024e2 +e6a7b79c +bdc5ccf7 +b8146d00 +9d394f1a +645b84f9 +95ab2d0f +e6f8a31d +b4f876fb +dc2c570d +3afd02d7 +5c80c82c +b1b32ddd +9f25fc61 +ba538072 +f8916fef +43c04ad2 +a658e949 +2861dd53 +f6e40aba +09d305d1 +aac33bff +8d9d4c08 diff --git a/bboxmaskpose/sam2/training/dataset/__init__.py b/bboxmaskpose/sam2/training/dataset/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5277f46157403e47fd830fc519144b97ef69d4ae --- /dev/null +++ b/bboxmaskpose/sam2/training/dataset/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. diff --git a/bboxmaskpose/sam2/training/dataset/sam2_datasets.py b/bboxmaskpose/sam2/training/dataset/sam2_datasets.py new file mode 100644 index 0000000000000000000000000000000000000000..8a9fe6eb2d6e22551f9033464771678ef5551486 --- /dev/null +++ b/bboxmaskpose/sam2/training/dataset/sam2_datasets.py @@ -0,0 +1,171 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import math +from typing import Callable, Iterable, List, Optional, Sequence + +import torch +from torch.utils.data import BatchSampler, DataLoader, Dataset, IterableDataset, Subset +from torch.utils.data.distributed import DistributedSampler + + +class MixedDataLoader: + def __init__(self, dataloaders: List[DataLoader], mixing_prob: torch.FloatTensor): + """ + Args: + dataloaders (List[DataLoader]): List of DataLoaders to be mixed. 
+            mixing_prob (torch.FloatTensor): Probability of sampling from each dataloader.
+
+        """
+        assert len(dataloaders) == mixing_prob.shape[0]
+        self.dataloaders = dataloaders
+        self.mixing_prob = mixing_prob
+        # Iterator state
+        self._iter_dls = None
+        self._iter_mixing_prob = None
+        self.random_generator = torch.Generator()
+
+    def __len__(self):
+        return sum([len(d) for d in self.dataloaders])
+
+    def __iter__(self):
+        # Synchronize dataloader seeds
+        self.random_generator.manual_seed(42)
+        self._iter_dls = [iter(loader) for loader in self.dataloaders]
+        self._iter_mixing_prob = self.mixing_prob.clone()
+        return self
+
+    def __next__(self):
+        """
+        Sample a dataloader to draw from based on the mixing probabilities. If one of the dataloaders is exhausted, we continue sampling from the other loaders until all are exhausted.
+        """
+        if self._iter_dls is None:
+            raise TypeError(f"{type(self).__name__} object is not an iterator")
+
+        while self._iter_mixing_prob.any():  # at least one dataloader with non-zero probability
+            dataset_idx = self._iter_mixing_prob.multinomial(1, generator=self.random_generator).item()
+            try:
+                item = next(self._iter_dls[dataset_idx])
+                return item
+            except StopIteration:
+                # No more iterations for this dataset; set its mixing probability to zero and try again.
+                self._iter_mixing_prob[dataset_idx] = 0
+            except Exception as e:
+                # Log and re-raise any other unexpected error.
+                logging.error(e)
+                raise e
+
+        # Exhausted all iterators
+        raise StopIteration
+
+
+class TorchTrainMixedDataset:
+    def __init__(
+        self,
+        datasets: List[Dataset],
+        batch_sizes: List[int],
+        num_workers: int,
+        shuffle: bool,
+        pin_memory: bool,
+        drop_last: bool,
+        collate_fn: Optional[Callable] = None,
+        worker_init_fn: Optional[Callable] = None,
+        phases_per_epoch: int = 1,
+        dataset_prob: Optional[List[float]] = None,
+    ) -> None:
+        """
+        Args:
+            datasets (List[Dataset]): List of Datasets to be mixed.
+            batch_sizes (List[int]): Batch sizes for each dataset in the list.
+            num_workers (int): Number of workers per dataloader.
+            shuffle (bool): Whether or not to shuffle data.
+            pin_memory (bool): If True, use pinned memory when loading tensors from disk.
+            drop_last (bool): Whether or not to drop the last batch of data.
+            collate_fn (Callable): Function to merge a list of samples into a mini-batch.
+            worker_init_fn (Callable): Function to init each dataloader worker.
+            phases_per_epoch (int): Number of phases per epoch.
+            dataset_prob (List[float]): Probability of choosing the dataloader to sample from. Should sum to 1.0.
+        """
+
+        self.datasets = datasets
+        self.batch_sizes = batch_sizes
+        self.num_workers = num_workers
+        self.shuffle = shuffle
+        self.pin_memory = pin_memory
+        self.drop_last = drop_last
+        self.collate_fn = collate_fn
+        self.worker_init_fn = worker_init_fn
+        assert len(self.datasets) > 0
+        for dataset in self.datasets:
+            assert not isinstance(dataset, IterableDataset), "Not supported"
+            # `RepeatFactorWrapper` requires calling set_epoch first to get its length
+            self._set_dataset_epoch(dataset, 0)
+        self.phases_per_epoch = phases_per_epoch
+        self.chunks = [None] * len(datasets)
+        if dataset_prob is None:
+            # If not provided, assign each dataset a probability proportional to its length.
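+            # Worked example (hypothetical numbers): with datasets of 1000 and 250
+            # samples, batch sizes 8 and 4, and drop_last=True, the per-epoch batch
+            # counts are 125 and 62, giving sampling probabilities of ~0.67 and ~0.33.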
+            dataset_lens = [(math.floor(len(d) / bs) if drop_last else math.ceil(len(d) / bs)) for d, bs in zip(datasets, batch_sizes)]
+            total_len = sum(dataset_lens)
+            dataset_prob = torch.tensor([d_len / total_len for d_len in dataset_lens])
+        else:
+            assert len(dataset_prob) == len(datasets)
+            dataset_prob = torch.tensor(dataset_prob)
+
+        logging.info(f"Dataset mixing probabilities: {dataset_prob.tolist()}")
+        assert dataset_prob.sum().item() == 1.0, "Probabilities should sum to 1.0"
+        self.dataset_prob = dataset_prob
+
+    def _set_dataset_epoch(self, dataset, epoch: int) -> None:
+        if hasattr(dataset, "epoch"):
+            dataset.epoch = epoch
+        if hasattr(dataset, "set_epoch"):
+            dataset.set_epoch(epoch)
+
+    def get_loader(self, epoch) -> Iterable:
+        dataloaders = []
+        for d_idx, (dataset, batch_size) in enumerate(zip(self.datasets, self.batch_sizes)):
+            if self.phases_per_epoch > 1:
+                # Major epoch that loops over the entire dataset
+                # len(main_epoch) == phases_per_epoch * len(epoch)
+                main_epoch = epoch // self.phases_per_epoch
+
+                # Phase within the main epoch
+                local_phase = epoch % self.phases_per_epoch
+
+                # Start of a new data-epoch, or the job was resumed after preemption.
+                if local_phase == 0 or self.chunks[d_idx] is None:
+                    # Set seed for dataset epoch.
+                    # If using RepeatFactorWrapper, this step correctly re-samples indices before chunking.
+                    self._set_dataset_epoch(dataset, main_epoch)
+
+                    # Separate random generator for subset sampling
+                    g = torch.Generator()
+                    g.manual_seed(main_epoch)
+                    self.chunks[d_idx] = torch.chunk(
+                        torch.randperm(len(dataset), generator=g),
+                        self.phases_per_epoch,
+                    )
+
+                dataset = Subset(dataset, self.chunks[d_idx][local_phase])
+            else:
+                self._set_dataset_epoch(dataset, epoch)
+
+            sampler = DistributedSampler(dataset, shuffle=self.shuffle)
+            sampler.set_epoch(epoch)
+
+            batch_sampler = BatchSampler(sampler, batch_size, drop_last=self.drop_last)
+            dataloaders.append(
+                DataLoader(
+                    dataset,
+                    num_workers=self.num_workers,
+                    pin_memory=self.pin_memory,
+                    batch_sampler=batch_sampler,
+                    collate_fn=self.collate_fn,
+                    worker_init_fn=self.worker_init_fn,
+                )
+            )
+        return MixedDataLoader(dataloaders, self.dataset_prob)
diff --git a/bboxmaskpose/sam2/training/dataset/transforms.py b/bboxmaskpose/sam2/training/dataset/transforms.py
new file mode 100644
index 0000000000000000000000000000000000000000..f7b54ff1303887748a8b12b7834cd1803da3cdbc
--- /dev/null
+++ b/bboxmaskpose/sam2/training/dataset/transforms.py
@@ -0,0 +1,477 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""
+Transforms and data augmentation for both image + bbox.
+""" + +import logging +import random +from typing import Iterable + +import torch +import torchvision.transforms as T +import torchvision.transforms.functional as F +import torchvision.transforms.v2.functional as Fv2 +from PIL import Image as PILImage +from torchvision.transforms import InterpolationMode + +from training.utils.data_utils import VideoDatapoint + + +def hflip(datapoint, index): + + datapoint.frames[index].data = F.hflip(datapoint.frames[index].data) + for obj in datapoint.frames[index].objects: + if obj.segment is not None: + obj.segment[0] = F.hflip(obj.segment[0]) + + return datapoint + + +def get_size_with_aspect_ratio(image_size, size, max_size=None): + w, h = image_size + if max_size is not None: + min_original_size = float(min((w, h))) + max_original_size = float(max((w, h))) + if max_original_size / min_original_size * size > max_size: + size = max_size * min_original_size / max_original_size + + if (w <= h and w == size) or (h <= w and h == size): + return (h, w) + + if w < h: + ow = int(round(size)) + oh = int(round(size * h / w)) + else: + oh = int(round(size)) + ow = int(round(size * w / h)) + + return (oh, ow) + + +def resize(datapoint, index, size, max_size=None, square=False, v2=False): + # size can be min_size (scalar) or (w, h) tuple + + def get_size(image_size, size, max_size=None): + if isinstance(size, (list, tuple)): + return size[::-1] + else: + return get_size_with_aspect_ratio(image_size, size, max_size) + + if square: + size = size, size + else: + cur_size = datapoint.frames[index].data.size()[-2:][::-1] if v2 else datapoint.frames[index].data.size + size = get_size(cur_size, size, max_size) + + old_size = datapoint.frames[index].data.size()[-2:][::-1] if v2 else datapoint.frames[index].data.size + if v2: + datapoint.frames[index].data = Fv2.resize(datapoint.frames[index].data, size, antialias=True) + else: + datapoint.frames[index].data = F.resize(datapoint.frames[index].data, size) + + new_size = datapoint.frames[index].data.size()[-2:][::-1] if v2 else datapoint.frames[index].data.size + + for obj in datapoint.frames[index].objects: + if obj.segment is not None: + obj.segment[0] = F.resize(obj.segment[0][None, None], size).squeeze() + + h, w = size + datapoint.frames[index].size = (h, w) + return datapoint + + +def pad(datapoint, index, padding, v2=False): + old_h, old_w = datapoint.frames[index].size + h, w = old_h, old_w + if len(padding) == 2: + # assumes that we only pad on the bottom right corners + datapoint.frames[index].data = F.pad(datapoint.frames[index].data, (0, 0, padding[0], padding[1])) + h += padding[1] + w += padding[0] + else: + # left, top, right, bottom + datapoint.frames[index].data = F.pad( + datapoint.frames[index].data, + (padding[0], padding[1], padding[2], padding[3]), + ) + h += padding[1] + padding[3] + w += padding[0] + padding[2] + + datapoint.frames[index].size = (h, w) + + for obj in datapoint.frames[index].objects: + if obj.segment is not None: + if v2: + if len(padding) == 2: + obj.segment[0] = Fv2.pad(obj.segment[0], (0, 0, padding[0], padding[1])) + else: + obj.segment[0] = Fv2.pad(obj.segment[0], tuple(padding)) + else: + if len(padding) == 2: + obj.segment[0] = F.pad(obj.segment[0], (0, 0, padding[0], padding[1])) + else: + obj.segment[0] = F.pad(obj.segment[0], tuple(padding)) + return datapoint + + +class RandomHorizontalFlip: + def __init__(self, consistent_transform, p=0.5): + self.p = p + self.consistent_transform = consistent_transform + + def __call__(self, datapoint, **kwargs): + if 
self.consistent_transform: + if random.random() < self.p: + for i in range(len(datapoint.frames)): + datapoint = hflip(datapoint, i) + return datapoint + for i in range(len(datapoint.frames)): + if random.random() < self.p: + datapoint = hflip(datapoint, i) + return datapoint + + +class RandomResizeAPI: + def __init__(self, sizes, consistent_transform, max_size=None, square=False, v2=False): + if isinstance(sizes, int): + sizes = (sizes,) + assert isinstance(sizes, Iterable) + self.sizes = list(sizes) + self.max_size = max_size + self.square = square + self.consistent_transform = consistent_transform + self.v2 = v2 + + def __call__(self, datapoint, **kwargs): + if self.consistent_transform: + size = random.choice(self.sizes) + for i in range(len(datapoint.frames)): + datapoint = resize(datapoint, i, size, self.max_size, square=self.square, v2=self.v2) + return datapoint + for i in range(len(datapoint.frames)): + size = random.choice(self.sizes) + datapoint = resize(datapoint, i, size, self.max_size, square=self.square, v2=self.v2) + return datapoint + + +class ToTensorAPI: + def __init__(self, v2=False): + self.v2 = v2 + + def __call__(self, datapoint: VideoDatapoint, **kwargs): + for img in datapoint.frames: + if self.v2: + img.data = Fv2.to_image_tensor(img.data) + else: + img.data = F.to_tensor(img.data) + return datapoint + + +class NormalizeAPI: + def __init__(self, mean, std, v2=False): + self.mean = mean + self.std = std + self.v2 = v2 + + def __call__(self, datapoint: VideoDatapoint, **kwargs): + for img in datapoint.frames: + if self.v2: + img.data = Fv2.convert_image_dtype(img.data, torch.float32) + img.data = Fv2.normalize(img.data, mean=self.mean, std=self.std) + else: + img.data = F.normalize(img.data, mean=self.mean, std=self.std) + + return datapoint + + +class ComposeAPI: + def __init__(self, transforms): + self.transforms = transforms + + def __call__(self, datapoint, **kwargs): + for t in self.transforms: + datapoint = t(datapoint, **kwargs) + return datapoint + + def __repr__(self): + format_string = self.__class__.__name__ + "(" + for t in self.transforms: + format_string += "\n" + format_string += " {0}".format(t) + format_string += "\n)" + return format_string + + +class RandomGrayscale: + def __init__(self, consistent_transform, p=0.5): + self.p = p + self.consistent_transform = consistent_transform + self.Grayscale = T.Grayscale(num_output_channels=3) + + def __call__(self, datapoint: VideoDatapoint, **kwargs): + if self.consistent_transform: + if random.random() < self.p: + for img in datapoint.frames: + img.data = self.Grayscale(img.data) + return datapoint + for img in datapoint.frames: + if random.random() < self.p: + img.data = self.Grayscale(img.data) + return datapoint + + +class ColorJitter: + def __init__(self, consistent_transform, brightness, contrast, saturation, hue): + self.consistent_transform = consistent_transform + self.brightness = brightness if isinstance(brightness, list) else [max(0, 1 - brightness), 1 + brightness] + self.contrast = contrast if isinstance(contrast, list) else [max(0, 1 - contrast), 1 + contrast] + self.saturation = saturation if isinstance(saturation, list) else [max(0, 1 - saturation), 1 + saturation] + self.hue = hue if isinstance(hue, list) or hue is None else ([-hue, hue]) + + def __call__(self, datapoint: VideoDatapoint, **kwargs): + if self.consistent_transform: + # Create a color jitter transformation params + ( + fn_idx, + brightness_factor, + contrast_factor, + saturation_factor, + hue_factor, + ) = 
T.ColorJitter.get_params(self.brightness, self.contrast, self.saturation, self.hue) + for img in datapoint.frames: + if not self.consistent_transform: + ( + fn_idx, + brightness_factor, + contrast_factor, + saturation_factor, + hue_factor, + ) = T.ColorJitter.get_params(self.brightness, self.contrast, self.saturation, self.hue) + for fn_id in fn_idx: + if fn_id == 0 and brightness_factor is not None: + img.data = F.adjust_brightness(img.data, brightness_factor) + elif fn_id == 1 and contrast_factor is not None: + img.data = F.adjust_contrast(img.data, contrast_factor) + elif fn_id == 2 and saturation_factor is not None: + img.data = F.adjust_saturation(img.data, saturation_factor) + elif fn_id == 3 and hue_factor is not None: + img.data = F.adjust_hue(img.data, hue_factor) + return datapoint + + +class RandomAffine: + def __init__( + self, + degrees, + consistent_transform, + scale=None, + translate=None, + shear=None, + image_mean=(123, 116, 103), + log_warning=True, + num_tentatives=1, + image_interpolation="bicubic", + ): + """ + The mask is required for this transform. + if consistent_transform if True, then the same random affine is applied to all frames and masks. + """ + self.degrees = degrees if isinstance(degrees, list) else ([-degrees, degrees]) + self.scale = scale + self.shear = shear if isinstance(shear, list) else ([-shear, shear] if shear else None) + self.translate = translate + self.fill_img = image_mean + self.consistent_transform = consistent_transform + self.log_warning = log_warning + self.num_tentatives = num_tentatives + + if image_interpolation == "bicubic": + self.image_interpolation = InterpolationMode.BICUBIC + elif image_interpolation == "bilinear": + self.image_interpolation = InterpolationMode.BILINEAR + else: + raise NotImplementedError + + def __call__(self, datapoint: VideoDatapoint, **kwargs): + for _tentative in range(self.num_tentatives): + res = self.transform_datapoint(datapoint) + if res is not None: + return res + + if self.log_warning: + logging.warning(f"Skip RandomAffine for zero-area mask in first frame after {self.num_tentatives} tentatives") + return datapoint + + def transform_datapoint(self, datapoint: VideoDatapoint): + _, height, width = F.get_dimensions(datapoint.frames[0].data) + img_size = [width, height] + + if self.consistent_transform: + # Create a random affine transformation + affine_params = T.RandomAffine.get_params( + degrees=self.degrees, + translate=self.translate, + scale_ranges=self.scale, + shears=self.shear, + img_size=img_size, + ) + + for img_idx, img in enumerate(datapoint.frames): + this_masks = [obj.segment[0].unsqueeze(0) if obj.segment is not None else None for obj in img.objects] + if not self.consistent_transform: + # if not consistent we create a new affine params for every frame&mask pair Create a random affine transformation + affine_params = T.RandomAffine.get_params( + degrees=self.degrees, + translate=self.translate, + scale_ranges=self.scale, + shears=self.shear, + img_size=img_size, + ) + + transformed_bboxes, transformed_masks = [], [] + for i in range(len(img.objects)): + if this_masks[i] is None: + transformed_masks.append(None) + # Dummy bbox for a dummy target + transformed_bboxes.append(torch.tensor([[0, 0, 1, 1]])) + else: + transformed_mask = F.affine( + this_masks[i], + *affine_params, + interpolation=InterpolationMode.NEAREST, + fill=0.0, + ) + if img_idx == 0 and transformed_mask.max() == 0: + # We are dealing with a video and the object is not visible in the first frame + # Return the datapoint 
without transformation + return None + transformed_masks.append(transformed_mask.squeeze()) + + for i in range(len(img.objects)): + img.objects[i].segment[0] = transformed_masks[i] + + img.data = F.affine( + img.data, + *affine_params, + interpolation=self.image_interpolation, + fill=self.fill_img, + ) + return datapoint + + +def random_mosaic_frame( + datapoint, + index, + grid_h, + grid_w, + target_grid_y, + target_grid_x, + should_hflip, +): + # Step 1: downsize the images and paste them into a mosaic + image_data = datapoint.frames[index].data + is_pil = isinstance(image_data, PILImage.Image) + if is_pil: + H_im = image_data.height + W_im = image_data.width + image_data_output = PILImage.new("RGB", (W_im, H_im)) + else: + H_im = image_data.size(-2) + W_im = image_data.size(-1) + image_data_output = torch.zeros_like(image_data) + + downsize_cache = {} + for grid_y in range(grid_h): + for grid_x in range(grid_w): + y_offset_b = grid_y * H_im // grid_h + x_offset_b = grid_x * W_im // grid_w + y_offset_e = (grid_y + 1) * H_im // grid_h + x_offset_e = (grid_x + 1) * W_im // grid_w + H_im_downsize = y_offset_e - y_offset_b + W_im_downsize = x_offset_e - x_offset_b + + if (H_im_downsize, W_im_downsize) in downsize_cache: + image_data_downsize = downsize_cache[(H_im_downsize, W_im_downsize)] + else: + image_data_downsize = F.resize( + image_data, + size=(H_im_downsize, W_im_downsize), + interpolation=InterpolationMode.BILINEAR, + antialias=True, # antialiasing for downsizing + ) + downsize_cache[(H_im_downsize, W_im_downsize)] = image_data_downsize + if should_hflip[grid_y, grid_x].item(): + image_data_downsize = F.hflip(image_data_downsize) + + if is_pil: + image_data_output.paste(image_data_downsize, (x_offset_b, y_offset_b)) + else: + image_data_output[:, y_offset_b:y_offset_e, x_offset_b:x_offset_e] = image_data_downsize + + datapoint.frames[index].data = image_data_output + + # Step 2: downsize the masks and paste them into the target grid of the mosaic + for obj in datapoint.frames[index].objects: + if obj.segment is None: + continue + assert obj.segment[0].shape == (H_im, W_im) and obj.segment[0].dtype == torch.uint8 + segment_output = torch.zeros_like(obj.segment[0]) + + target_y_offset_b = target_grid_y * H_im // grid_h + target_x_offset_b = target_grid_x * W_im // grid_w + target_y_offset_e = (target_grid_y + 1) * H_im // grid_h + target_x_offset_e = (target_grid_x + 1) * W_im // grid_w + target_H_im_downsize = target_y_offset_e - target_y_offset_b + target_W_im_downsize = target_x_offset_e - target_x_offset_b + + segment_downsize = F.resize( + obj.segment[0][None, None], + size=(target_H_im_downsize, target_W_im_downsize), + interpolation=InterpolationMode.BILINEAR, + antialias=True, # antialiasing for downsizing + )[0, 0] + if should_hflip[target_grid_y, target_grid_x].item(): + segment_downsize = F.hflip(segment_downsize[None, None])[0, 0] + + segment_output[target_y_offset_b:target_y_offset_e, target_x_offset_b:target_x_offset_e] = segment_downsize + obj.segment[0] = segment_output + + return datapoint + + +class RandomMosaicVideoAPI: + def __init__(self, prob=0.15, grid_h=2, grid_w=2, use_random_hflip=False): + self.prob = prob + self.grid_h = grid_h + self.grid_w = grid_w + self.use_random_hflip = use_random_hflip + + def __call__(self, datapoint, **kwargs): + if random.random() > self.prob: + return datapoint + + # select a random location to place the target mask in the mosaic + target_grid_y = random.randint(0, self.grid_h - 1) + target_grid_x = random.randint(0, self.grid_w 
- 1) + # whether to flip each grid in the mosaic horizontally + if self.use_random_hflip: + should_hflip = torch.rand(self.grid_h, self.grid_w) < 0.5 + else: + should_hflip = torch.zeros(self.grid_h, self.grid_w, dtype=torch.bool) + for i in range(len(datapoint.frames)): + datapoint = random_mosaic_frame( + datapoint, + i, + grid_h=self.grid_h, + grid_w=self.grid_w, + target_grid_y=target_grid_y, + target_grid_x=target_grid_x, + should_hflip=should_hflip, + ) + + return datapoint diff --git a/bboxmaskpose/sam2/training/dataset/utils.py b/bboxmaskpose/sam2/training/dataset/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..4bdcc4167fea1468083d74a1d29d701ee680052c --- /dev/null +++ b/bboxmaskpose/sam2/training/dataset/utils.py @@ -0,0 +1,98 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +"""Some wrapping utilities extended from pytorch's to support repeat factor sampling in particular""" + +from typing import Iterable + +import torch +from torch.utils.data import ConcatDataset as TorchConcatDataset, Dataset, Subset as TorchSubset + + +class ConcatDataset(TorchConcatDataset): + def __init__(self, datasets: Iterable[Dataset]) -> None: + super(ConcatDataset, self).__init__(datasets) + + self.repeat_factors = torch.cat([d.repeat_factors for d in datasets]) + + def set_epoch(self, epoch: int): + for dataset in self.datasets: + if hasattr(dataset, "epoch"): + dataset.epoch = epoch + if hasattr(dataset, "set_epoch"): + dataset.set_epoch(epoch) + + +class Subset(TorchSubset): + def __init__(self, dataset, indices) -> None: + super(Subset, self).__init__(dataset, indices) + + self.repeat_factors = dataset.repeat_factors[indices] + assert len(indices) == len(self.repeat_factors) + + +# Adapted from Detectron2 +class RepeatFactorWrapper(Dataset): + """ + Thin wrapper around a dataset to implement repeat factor sampling. + The underlying dataset must have a repeat_factors member to indicate the per-image factor. + Set it to uniformly ones to disable repeat factor sampling + """ + + def __init__(self, dataset, seed: int = 0): + self.dataset = dataset + self.epoch_ids = None + self._seed = seed + + # Split into whole number (_int_part) and fractional (_frac_part) parts. + self._int_part = torch.trunc(dataset.repeat_factors) + self._frac_part = dataset.repeat_factors - self._int_part + + def _get_epoch_indices(self, generator): + """ + Create a list of dataset indices (with repeats) to use for one epoch. + + Args: + generator (torch.Generator): pseudo random number generator used for + stochastic rounding. + + Returns: + torch.Tensor: list of dataset indices to use in one epoch. Each index + is repeated based on its calculated repeat factor. 
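+
+        Example (illustrative): with repeat_factors = [1.0, 2.5], dataset index 0
+        appears exactly once per epoch, while index 1 appears 2 or 3 times
+        (3 with probability 0.5), i.e. 2.5 times in expectation.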
+        """
+        # Since repeat factors are fractional, we use stochastic rounding so
+        # that the target repeat factor is achieved in expectation over the
+        # course of training
+        rands = torch.rand(len(self._frac_part), generator=generator)
+        rep_factors = self._int_part + (rands < self._frac_part).float()
+        # Construct a list of indices in which we repeat images as specified
+        indices = []
+        for dataset_index, rep_factor in enumerate(rep_factors):
+            indices.extend([dataset_index] * int(rep_factor.item()))
+        return torch.tensor(indices, dtype=torch.int64)
+
+    def __len__(self):
+        if self.epoch_ids is None:
+            # We raise an error here instead of returning len(self.dataset), to avoid
+            # accidentally using the unwrapped length: the effective length is
+            # len(self.epoch_ids), and it changes every time set_epoch is called.
+            raise RuntimeError("please call set_epoch first to get wrapped length")
+
+        return len(self.epoch_ids)
+
+    def set_epoch(self, epoch: int):
+        g = torch.Generator()
+        g.manual_seed(self._seed + epoch)
+        self.epoch_ids = self._get_epoch_indices(g)
+        if hasattr(self.dataset, "set_epoch"):
+            self.dataset.set_epoch(epoch)
+
+    def __getitem__(self, idx):
+        if self.epoch_ids is None:
+            raise RuntimeError("Repeat ids haven't been computed. Did you forget to call set_epoch?")
+
+        return self.dataset[self.epoch_ids[idx]]
diff --git a/bboxmaskpose/sam2/training/dataset/vos_dataset.py b/bboxmaskpose/sam2/training/dataset/vos_dataset.py
new file mode 100644
index 0000000000000000000000000000000000000000..846ca2a976c556b76f4c6383ad250997bc2be159
--- /dev/null
+++ b/bboxmaskpose/sam2/training/dataset/vos_dataset.py
@@ -0,0 +1,161 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.

+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
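+
+# Rough wiring sketch (illustrative only; the concrete raw-dataset, sampler and
+# transform instances are built from the training configs, not in this file,
+# and the folder names below are made up):
+#
+#   raw = PNGRawDataset(img_folder="JPEGImages", gt_folder="Annotations")
+#   sampler = RandomUniformSampler(num_frames=8, max_num_objects=3)
+#   dataset = VOSDataset(transforms=[...], training=True,
+#                        video_dataset=raw, sampler=sampler, multiplier=1)
+#   datapoint = dataset[0]  # -> VideoDatapoint (after all transforms)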
+ +import logging +import random +from copy import deepcopy + +import numpy as np +import torch +from PIL import Image as PILImage +from torchvision.datasets.vision import VisionDataset + +from iopath.common.file_io import g_pathmgr +from training.dataset.vos_raw_dataset import VOSRawDataset +from training.dataset.vos_sampler import VOSSampler +from training.dataset.vos_segment_loader import JSONSegmentLoader +from training.utils.data_utils import Frame, Object, VideoDatapoint + +MAX_RETRIES = 100 + + +class VOSDataset(VisionDataset): + def __init__( + self, + transforms, + training: bool, + video_dataset: VOSRawDataset, + sampler: VOSSampler, + multiplier: int, + always_target=True, + target_segments_available=True, + ): + print(f"Max retries = {MAX_RETRIES}") + self._transforms = transforms + self.training = training + self.video_dataset = video_dataset + self.sampler = sampler + + self.repeat_factors = torch.ones(len(self.video_dataset), dtype=torch.float32) + self.repeat_factors *= multiplier + print(f"Raw dataset length = {len(self.video_dataset)}") + + self.curr_epoch = 0 # Used in case data loader behavior changes across epochs + self.always_target = always_target + self.target_segments_available = target_segments_available + + def _get_datapoint(self, idx): + + for retry in range(MAX_RETRIES): + try: + if isinstance(idx, torch.Tensor): + idx = idx.item() + # sample a video + video, segment_loader = self.video_dataset.get_video(idx) + # sample frames and object indices to be used in a datapoint + sampled_frms_and_objs = self.sampler.sample(video, segment_loader, epoch=self.curr_epoch) + + break # Succesfully loaded video + except Exception as e: + if self.training: + logging.warning(f"Loading failed (id={idx}); Retry {retry} with exception: {e}") + idx = random.randrange(0, len(self.video_dataset)) + else: + # Shouldn't fail to load a val video + raise e + + datapoint = self.construct(video, sampled_frms_and_objs, segment_loader) + + for transform in self._transforms: + datapoint = transform(datapoint, epoch=self.curr_epoch) + + return datapoint + + def construct(self, video, sampled_frms_and_objs, segment_loader): + """ + Constructs a VideoDatapoint sample to pass to transforms + """ + + sampled_frames = sampled_frms_and_objs.frames + sampled_object_ids = sampled_frms_and_objs.object_ids + + images = [] + rgb_images = load_images(sampled_frames) + # Iterate over the sampled frames and store their rgb data and object data (bbox, segment) + for frame_idx, frame in enumerate(sampled_frames): + w, h = rgb_images[frame_idx].size + images.append( + Frame( + data=rgb_images[frame_idx], + objects=[], + ) + ) + # We load the gt segments associated with the current frame + if isinstance(segment_loader, JSONSegmentLoader): + segments = segment_loader.load(frame.frame_idx, obj_ids=sampled_object_ids) + else: + segments = segment_loader.load(frame.frame_idx) + + for obj_id in sampled_object_ids: + # Extract the segment + if obj_id in segments: + assert segments[obj_id] is not None, "None targets are not supported" + # segment is uint8 and remains uint8 throughout the transforms + segment = segments[obj_id] + segment[0] = segments[obj_id][0].to(torch.uint8) + else: + # There is no target, we either use a zero mask target or drop this object + if not self.always_target: + continue + segment = [torch.zeros(h, w, dtype=torch.uint8), torch.zeros(1), torch.zeros(1)] + + images[frame_idx].objects.append( + Object( + object_id=obj_id, + frame_index=frame.frame_idx, + segment=segment, + ) + ) + + return 
VideoDatapoint( + frames=images, + video_id=video.video_id, + size=(h, w), + ) + + def __getitem__(self, idx): + return self._get_datapoint(idx) + + def __len__(self): + return len(self.video_dataset) + + +def load_images(frames): + all_images = [] + cache = {} + for frame in frames: + + if frame.data is None: + # Load the frame rgb data from file + path = frame.image_path + if path in cache: + all_images.append(deepcopy(all_images[cache[path]])) + continue + with g_pathmgr.open(path, "rb") as fopen: + all_images.append(PILImage.open(fopen).convert("RGB")) + cache[path] = len(all_images) - 1 + else: + # The frame rgb data has already been loaded + # Convert it to a PILImage + all_images.append(tensor_2_PIL(frame.data)) + + return all_images + + +def tensor_2_PIL(data: torch.Tensor) -> PILImage.Image: + data = data.cpu().numpy().transpose((1, 2, 0)) * 255.0 + data = data.astype(np.uint8) + return PILImage.fromarray(data) diff --git a/bboxmaskpose/sam2/training/dataset/vos_raw_dataset.py b/bboxmaskpose/sam2/training/dataset/vos_raw_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..91e445e2242479758c43d27519faa09fd0caa91e --- /dev/null +++ b/bboxmaskpose/sam2/training/dataset/vos_raw_dataset.py @@ -0,0 +1,279 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +import glob +import logging +import os +from dataclasses import dataclass +from typing import List, Optional + +import pandas as pd +import torch + +from iopath.common.file_io import g_pathmgr +from omegaconf.listconfig import ListConfig +from training.dataset.vos_segment_loader import JSONSegmentLoader, MultiplePNGSegmentLoader, PalettisedPNGSegmentLoader, SA1BSegmentLoader + + +@dataclass +class VOSFrame: + frame_idx: int + image_path: str + data: Optional[torch.Tensor] = None + is_conditioning_only: Optional[bool] = False + + +@dataclass +class VOSVideo: + video_name: str + video_id: int + frames: List[VOSFrame] + + def __len__(self): + return len(self.frames) + + +class VOSRawDataset: + def __init__(self): + pass + + def get_video(self, idx): + raise NotImplementedError() + + +class PNGRawDataset(VOSRawDataset): + def __init__( + self, + img_folder, + gt_folder, + file_list_txt=None, + excluded_videos_list_txt=None, + sample_rate=1, + is_palette=True, + single_object_mode=False, + truncate_video=-1, + frames_sampling_mult=False, + ): + self.img_folder = img_folder + self.gt_folder = gt_folder + self.sample_rate = sample_rate + self.is_palette = is_palette + self.single_object_mode = single_object_mode + self.truncate_video = truncate_video + + # Read the subset defined in file_list_txt + if file_list_txt is not None: + with g_pathmgr.open(file_list_txt, "r") as f: + subset = [os.path.splitext(line.strip())[0] for line in f] + else: + subset = os.listdir(self.img_folder) + + # Read and process excluded files if provided + if excluded_videos_list_txt is not None: + with g_pathmgr.open(excluded_videos_list_txt, "r") as f: + excluded_files = [os.path.splitext(line.strip())[0] for line in f] + else: + excluded_files = [] + + # Check if it's not in excluded_files + self.video_names = sorted([video_name for video_name in subset if video_name not in excluded_files]) + + if self.single_object_mode: + # single object mode + self.video_names = sorted( + [ + os.path.join(video_name, obj) + for video_name in self.video_names + for obj in 
os.listdir(os.path.join(self.gt_folder, video_name)) + ] + ) + + if frames_sampling_mult: + video_names_mult = [] + for video_name in self.video_names: + num_frames = len(os.listdir(os.path.join(self.img_folder, video_name))) + video_names_mult.extend([video_name] * num_frames) + self.video_names = video_names_mult + + def get_video(self, idx): + """ + Given a VOSVideo object, return the mask tensors. + """ + video_name = self.video_names[idx] + + if self.single_object_mode: + video_frame_root = os.path.join(self.img_folder, os.path.dirname(video_name)) + else: + video_frame_root = os.path.join(self.img_folder, video_name) + + video_mask_root = os.path.join(self.gt_folder, video_name) + + if self.is_palette: + segment_loader = PalettisedPNGSegmentLoader(video_mask_root) + else: + segment_loader = MultiplePNGSegmentLoader(video_mask_root, self.single_object_mode) + + all_frames = sorted(glob.glob(os.path.join(video_frame_root, "*.jpg"))) + if self.truncate_video > 0: + all_frames = all_frames[: self.truncate_video] + frames = [] + for _, fpath in enumerate(all_frames[:: self.sample_rate]): + fid = int(os.path.basename(fpath).split(".")[0]) + frames.append(VOSFrame(fid, image_path=fpath)) + video = VOSVideo(video_name, idx, frames) + + return video, segment_loader + + def __len__(self): + return len(self.video_names) + + +class SA1BRawDataset(VOSRawDataset): + def __init__( + self, + img_folder, + gt_folder, + file_list_txt=None, + excluded_videos_list_txt=None, + num_frames=1, + mask_area_frac_thresh=1.1, # no filtering by default + uncertain_iou=-1, # no filtering by default + ): + self.img_folder = img_folder + self.gt_folder = gt_folder + self.num_frames = num_frames + self.mask_area_frac_thresh = mask_area_frac_thresh + self.uncertain_iou = uncertain_iou # stability score + + print("file list txt", file_list_txt) + # Read the subset defined in file_list_txt + if file_list_txt is not None: + with g_pathmgr.open(file_list_txt, "r") as f: + subset = [os.path.splitext(line.strip())[0] for line in f] + else: + subset = os.listdir(self.img_folder) + subset = [path.split(".")[0] for path in subset if path.endswith(".jpg")] # remove extension + # Read and process excluded files if provided + if excluded_videos_list_txt is not None: + with g_pathmgr.open(excluded_videos_list_txt, "r") as f: + excluded_files = [os.path.splitext(line.strip())[0] for line in f] + else: + excluded_files = [] + + # Check if it's not in excluded_files and it exists + self.video_names = [video_name for video_name in subset if video_name not in excluded_files] + + def get_video(self, idx): + """ + Given a VOSVideo object, return the mask tensors. 
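+
+        More precisely: `idx` indexes into self.video_names, and the method
+        returns a (VOSVideo, SA1BSegmentLoader) pair; each SA-1B image is
+        wrapped as a pseudo-video of `num_frames` copies of the same frame.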
+ """ + video_name = self.video_names[idx] + + video_frame_path = os.path.join(self.img_folder, video_name + ".jpg") + video_mask_path = os.path.join(self.gt_folder, video_name + ".json") + + segment_loader = SA1BSegmentLoader( + video_mask_path, + mask_area_frac_thresh=self.mask_area_frac_thresh, + video_frame_path=video_frame_path, + uncertain_iou=self.uncertain_iou, + ) + + frames = [] + for frame_idx in range(self.num_frames): + frames.append(VOSFrame(frame_idx, image_path=video_frame_path)) + video_name = video_name.split("_")[-1] # filename is sa_{int} + # video id needs to be image_id to be able to load correct annotation file during eval + video = VOSVideo(video_name, int(video_name), frames) + return video, segment_loader + + def __len__(self): + return len(self.video_names) + + +class JSONRawDataset(VOSRawDataset): + """ + Dataset where the annotation in the format of SA-V json files + """ + + def __init__( + self, + img_folder, + gt_folder, + file_list_txt=None, + excluded_videos_list_txt=None, + sample_rate=1, + rm_unannotated=True, + ann_every=1, + frames_fps=24, + ): + self.gt_folder = gt_folder + self.img_folder = img_folder + self.sample_rate = sample_rate + self.rm_unannotated = rm_unannotated + self.ann_every = ann_every + self.frames_fps = frames_fps + + # Read and process excluded files if provided + excluded_files = [] + if excluded_videos_list_txt is not None: + if isinstance(excluded_videos_list_txt, str): + excluded_videos_lists = [excluded_videos_list_txt] + elif isinstance(excluded_videos_list_txt, ListConfig): + excluded_videos_lists = list(excluded_videos_list_txt) + else: + raise NotImplementedError + + for excluded_videos_list_txt in excluded_videos_lists: + with open(excluded_videos_list_txt, "r") as f: + excluded_files.extend([os.path.splitext(line.strip())[0] for line in f]) + excluded_files = set(excluded_files) + + # Read the subset defined in file_list_txt + if file_list_txt is not None: + with g_pathmgr.open(file_list_txt, "r") as f: + subset = [os.path.splitext(line.strip())[0] for line in f] + else: + subset = os.listdir(self.img_folder) + + self.video_names = sorted([video_name for video_name in subset if video_name not in excluded_files]) + + def get_video(self, video_idx): + """ + Given a VOSVideo object, return the mask tensors. 
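+
+        More precisely: `video_idx` indexes into self.video_names, and the
+        method returns a (VOSVideo, JSONSegmentLoader) pair with annotations
+        read from `{gt_folder}/{video_name}_manual.json`.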
+ """ + video_name = self.video_names[video_idx] + video_json_path = os.path.join(self.gt_folder, video_name + "_manual.json") + segment_loader = JSONSegmentLoader( + video_json_path=video_json_path, + ann_every=self.ann_every, + frames_fps=self.frames_fps, + ) + + frame_ids = [int(os.path.splitext(frame_name)[0]) for frame_name in sorted(os.listdir(os.path.join(self.img_folder, video_name)))] + + frames = [ + VOSFrame( + frame_id, + image_path=os.path.join(self.img_folder, f"{video_name}/%05d.jpg" % (frame_id)), + ) + for frame_id in frame_ids[:: self.sample_rate] + ] + + if self.rm_unannotated: + # Eliminate the frames that have not been annotated + valid_frame_ids = [ + i * segment_loader.ann_every + for i, annot in enumerate(segment_loader.frame_annots) + if annot is not None and None not in annot + ] + frames = [f for f in frames if f.frame_idx in valid_frame_ids] + + video = VOSVideo(video_name, video_idx, frames) + return video, segment_loader + + def __len__(self): + return len(self.video_names) diff --git a/bboxmaskpose/sam2/training/dataset/vos_sampler.py b/bboxmaskpose/sam2/training/dataset/vos_sampler.py new file mode 100644 index 0000000000000000000000000000000000000000..fd8dcd60a4a4842f81ada4595960d45f51ba05a7 --- /dev/null +++ b/bboxmaskpose/sam2/training/dataset/vos_sampler.py @@ -0,0 +1,103 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +import random +from dataclasses import dataclass +from typing import List + +from training.dataset.vos_segment_loader import LazySegments + +MAX_RETRIES = 1000 + + +@dataclass +class SampledFramesAndObjects: + frames: List[int] + object_ids: List[int] + + +class VOSSampler: + def __init__(self, sort_frames=True): + # frames are ordered by frame id when sort_frames is True + self.sort_frames = sort_frames + + def sample(self, video): + raise NotImplementedError() + + +class RandomUniformSampler(VOSSampler): + def __init__( + self, + num_frames, + max_num_objects, + reverse_time_prob=0.0, + ): + self.num_frames = num_frames + self.max_num_objects = max_num_objects + self.reverse_time_prob = reverse_time_prob + + def sample(self, video, segment_loader, epoch=None): + + for retry in range(MAX_RETRIES): + if len(video.frames) < self.num_frames: + raise Exception( + f"Cannot sample {self.num_frames} frames from video {video.video_name} as it only has {len(video.frames)} annotated frames." 
+ ) + start = random.randrange(0, len(video.frames) - self.num_frames + 1) + frames = [video.frames[start + step] for step in range(self.num_frames)] + if random.uniform(0, 1) < self.reverse_time_prob: + # Reverse time + frames = frames[::-1] + + # Get first frame object ids + visible_object_ids = [] + loaded_segms = segment_loader.load(frames[0].frame_idx) + if isinstance(loaded_segms, LazySegments): + # LazySegments for SA1BRawDataset + visible_object_ids = list(loaded_segms.keys()) + else: + for object_id, segment in segment_loader.load(frames[0].frame_idx).items(): + if segment.sum(): + visible_object_ids.append(object_id) + + # First frame needs to have at least a target to track + if len(visible_object_ids) > 0: + break + if retry >= MAX_RETRIES - 1: + return None + + object_ids = random.sample( + visible_object_ids, + min(len(visible_object_ids), self.max_num_objects), + ) + return SampledFramesAndObjects(frames=frames, object_ids=object_ids) + + +class EvalSampler(VOSSampler): + """ + VOS Sampler for evaluation: sampling all the frames and all the objects in a video + """ + + def __init__( + self, + ): + super().__init__() + + def sample(self, video, segment_loader, epoch=None): + """ + Sampling all the frames and all the objects + """ + if self.sort_frames: + # ordered by frame id + frames = sorted(video.frames, key=lambda x: x.frame_idx) + else: + # use the original order + frames = video.frames + object_ids = segment_loader.load(frames[0].frame_idx).keys() + if len(object_ids) == 0: + raise Exception("First frame of the video has no objects") + + return SampledFramesAndObjects(frames=frames, object_ids=object_ids) diff --git a/bboxmaskpose/sam2/training/dataset/vos_segment_loader.py b/bboxmaskpose/sam2/training/dataset/vos_segment_loader.py new file mode 100644 index 0000000000000000000000000000000000000000..6f5063249ac44774bffab66593cdb489d1700601 --- /dev/null +++ b/bboxmaskpose/sam2/training/dataset/vos_segment_loader.py @@ -0,0 +1,309 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+
+import glob
+import json
+import os
+
+import numpy as np
+import pandas as pd
+import torch
+from PIL import Image as PILImage
+
+try:
+    from pycocotools import mask as mask_utils
+except ImportError:
+    # pycocotools is optional; it is only needed for RLE-encoded annotations
+    pass
+
+
+class JSONSegmentLoader:
+    def __init__(self, video_json_path, ann_every=1, frames_fps=24, valid_obj_ids=None):
+        # Annotations in the json are provided every `ann_every`-th frame
+        self.ann_every = ann_every
+        # Ids of the objects to consider when sampling this video
+        self.valid_obj_ids = valid_obj_ids
+        with open(video_json_path, "r") as f:
+            data = json.load(f)
+            if isinstance(data, list):
+                self.frame_annots = data
+            elif isinstance(data, dict):
+                masklet_field_name = "masklet" if "masklet" in data else "masks"
+                self.frame_annots = data[masklet_field_name]
+                if "fps" in data:
+                    if isinstance(data["fps"], list):
+                        annotations_fps = int(data["fps"][0])
+                    else:
+                        annotations_fps = int(data["fps"])
+
+                    assert frames_fps % annotations_fps == 0
+                    self.ann_every = frames_fps // annotations_fps
+            else:
+                raise NotImplementedError
+
+    def load(self, frame_id, obj_ids=None):
+        assert frame_id % self.ann_every == 0
+        rle_mask = self.frame_annots[frame_id // self.ann_every]
+
+        valid_objs_ids = set(range(len(rle_mask)))
+        if self.valid_obj_ids is not None:
+            # Remove the masklets that have been filtered out for this video
+            valid_objs_ids &= set(self.valid_obj_ids)
+        if obj_ids is not None:
+            # Only keep the objects that have been sampled
+            valid_objs_ids &= set(obj_ids)
+        valid_objs_ids = sorted(list(valid_objs_ids))
+
+        # Construct rle_mask_filtered that only contains the rle masks we are interested in
+        id_2_idx = {}
+        rle_mask_filtered = []
+        for obj_id in valid_objs_ids:
+            if rle_mask[obj_id] is not None:
+                id_2_idx[obj_id] = len(rle_mask_filtered)
+                rle_mask_filtered.append(rle_mask[obj_id])
+            else:
+                id_2_idx[obj_id] = None
+
+        # Decode the masks
+        raw_segments = torch.from_numpy(mask_utils.decode(rle_mask_filtered)).permute(2, 0, 1)  # (num_obj, h, w)
+        segments = {}
+        for obj_id in valid_objs_ids:
+            if id_2_idx[obj_id] is None:
+                segments[obj_id] = None
+            else:
+                idx = id_2_idx[obj_id]
+                segments[obj_id] = raw_segments[idx]
+
+        return segments
+
+    def get_valid_obj_frames_ids(self, num_frames_min=None):
+        # For each object, find all the frames with a valid (not None) mask
+        num_objects = len(self.frame_annots[0])
+
+        # The result dict associates each obj_id with the id of its valid frames
+        res = {obj_id: [] for obj_id in range(num_objects)}
+
+        for annot_idx, annot in enumerate(self.frame_annots):
+            for obj_id in range(num_objects):
+                if annot[obj_id] is not None:
+                    res[obj_id].append(int(annot_idx * self.ann_every))
+
+        if num_frames_min is not None:
+            # Remove masklets that have less than num_frames_min valid masks
+            for obj_id, valid_frames in list(res.items()):
+                if len(valid_frames) < num_frames_min:
+                    res.pop(obj_id)
+
+        return res
+
+
+class PalettisedPNGSegmentLoader:
+    def __init__(self, video_png_root):
+        """
+        SegmentLoader for datasets with masks stored as palettised PNGs.
+        video_png_root: the folder that contains all the masks stored as PNGs
+        """
+        self.video_png_root = video_png_root
+        # build a mapping from frame id to their PNG mask path
+        # note that in some datasets, the PNG paths could have more
+        # than 5 digits, e.g. 
"00000000.png" instead of "00000.png" + png_filenames = os.listdir(self.video_png_root) + self.frame_id_to_png_filename = {} + for filename in png_filenames: + frame_id, _ = os.path.splitext(filename) + self.frame_id_to_png_filename[int(frame_id)] = filename + + def load(self, frame_id): + """ + load the single palettised mask from the disk (path: f'{self.video_png_root}/{frame_id:05d}.png') + Args: + frame_id: int, define the mask path + Return: + binary_segments: dict + """ + # check the path + mask_path = os.path.join(self.video_png_root, self.frame_id_to_png_filename[frame_id]) + + # load the mask + masks = PILImage.open(mask_path).convert("P") + masks = np.array(masks) + + object_id = pd.unique(masks.flatten()) + object_id = object_id[object_id != 0] # remove background (0) + + # convert into N binary segmentation masks + binary_segments = {} + for i in object_id: + bs = masks == i + binary_segments[i] = torch.from_numpy(bs) + + return binary_segments + + def __len__(self): + return + + +class MultiplePNGSegmentLoader: + def __init__(self, video_png_root, single_object_mode=False): + """ + video_png_root: the folder contains all the masks stored in png + single_object_mode: whether to load only a single object at a time + """ + self.video_png_root = video_png_root + self.single_object_mode = single_object_mode + # read a mask to know the resolution of the video + if self.single_object_mode: + tmp_mask_path = glob.glob(os.path.join(video_png_root, "*.png"))[0] + else: + tmp_mask_path = glob.glob(os.path.join(video_png_root, "*", "*.png"))[0] + tmp_mask = np.array(PILImage.open(tmp_mask_path)) + self.H = tmp_mask.shape[0] + self.W = tmp_mask.shape[1] + if self.single_object_mode: + self.obj_id = int(video_png_root.split("/")[-1]) + 1 # offset by 1 as bg is 0 + else: + self.obj_id = None + + def load(self, frame_id): + if self.single_object_mode: + return self._load_single_png(frame_id) + else: + return self._load_multiple_pngs(frame_id) + + def _load_single_png(self, frame_id): + """ + load single png from the disk (path: f'{self.obj_id}/{frame_id:05d}.png') + Args: + frame_id: int, define the mask path + Return: + binary_segments: dict + """ + mask_path = os.path.join(self.video_png_root, f"{frame_id:05d}.png") + binary_segments = {} + + if os.path.exists(mask_path): + mask = np.array(PILImage.open(mask_path)) + else: + # if png doesn't exist, empty mask + mask = np.zeros((self.H, self.W), dtype=bool) + binary_segments[self.obj_id] = torch.from_numpy(mask > 0) + return binary_segments + + def _load_multiple_pngs(self, frame_id): + """ + load multiple png masks from the disk (path: f'{obj_id}/{frame_id:05d}.png') + Args: + frame_id: int, define the mask path + Return: + binary_segments: dict + """ + # get the path + all_objects = sorted(glob.glob(os.path.join(self.video_png_root, "*"))) + num_objects = len(all_objects) + assert num_objects > 0 + + # load the masks + binary_segments = {} + for obj_folder in all_objects: + # obj_folder is {video_name}/{obj_id}, obj_id is specified by the name of the folder + obj_id = int(obj_folder.split("/")[-1]) + obj_id = obj_id + 1 # offset 1 as bg is 0 + mask_path = os.path.join(obj_folder, f"{frame_id:05d}.png") + if os.path.exists(mask_path): + mask = np.array(PILImage.open(mask_path)) + else: + mask = np.zeros((self.H, self.W), dtype=bool) + binary_segments[obj_id] = torch.from_numpy(mask > 0) + + return binary_segments + + def __len__(self): + return + + +class LazySegments: + """ + Only decodes segments that are actually used. 
+ """ + + def __init__(self): + self.segments = {} + self.cache = {} + + def __setitem__(self, key, item): + self.segments[key] = item + + def __getitem__(self, key): + if key in self.cache: + return self.cache[key] + rle = self.segments[key][0] + mask = torch.from_numpy(mask_utils.decode([rle])).permute(2, 0, 1)[0] + self.cache[key] = [mask, self.segments[key][1], self.segments[key][2]] + return self.cache[key] + + def __contains__(self, key): + return key in self.segments + + def __len__(self): + return len(self.segments) + + def keys(self): + return self.segments.keys() + + +class SA1BSegmentLoader: + def __init__( + self, + video_mask_path, + mask_area_frac_thresh=1.1, + video_frame_path=None, + uncertain_iou=-1, + ): + with open(video_mask_path, "r") as f: + self.frame_annots = json.load(f) + + if mask_area_frac_thresh <= 1.0: + # Lazily read frame + orig_w, orig_h = PILImage.open(video_frame_path).size + area = orig_w * orig_h + + width_coeff = 1024.0 / float(self.frame_annots["image"]["width"]) + height_coeff = 1024.0 / float(self.frame_annots["image"]["height"]) + self.frame_annots = self.frame_annots["annotations"] + + rle_masks = [] + points = [] + labels = [] + + for frame_annot in self.frame_annots: + if not frame_annot["area"] > 0: + + continue + if ("uncertain_iou" in frame_annot) and (frame_annot["uncertain_iou"] < uncertain_iou): + continue + if mask_area_frac_thresh <= 1.0 and (frame_annot["area"] / area) >= mask_area_frac_thresh: + continue + + rle_masks.append(frame_annot["segmentation"]) + + # # ADD BBOX: + # frame_annot["point_coords"].append(frame_annot["bbox"][0:2]) + # frame_annot["point_coords"].append(frame_annot["bbox"][2:4]) + # frame_annot["point_labels"].extend([2, 3]) + + pt_coords = torch.tensor(frame_annot["point_coords"], device="cuda:0").float() + pt_labels = torch.tensor(frame_annot["point_labels"], device="cuda:0").float() + pt_coords[:, 0] *= width_coeff + pt_coords[:, 1] *= height_coeff + + points.append(pt_coords) + labels.append(pt_labels) + + self.segments = LazySegments() + for i in range(len(rle_masks)): + self.segments[i] = [rle_masks[i], points[i], labels[i]] + + def load(self, frame_idx): + # print("vos segment loader / segments", self.segments[0]) + return self.segments diff --git a/bboxmaskpose/sam2/training/loss_fns.py b/bboxmaskpose/sam2/training/loss_fns.py new file mode 100644 index 0000000000000000000000000000000000000000..0c4c0b469d3527f739153efcd06e6e69f1815845 --- /dev/null +++ b/bboxmaskpose/sam2/training/loss_fns.py @@ -0,0 +1,288 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +from collections import defaultdict +from typing import Dict, List + +import torch +import torch.distributed +import torch.nn as nn +import torch.nn.functional as F + +from training.trainer import CORE_LOSS_KEY +from training.utils.distributed import get_world_size, is_dist_avail_and_initialized + + +def dice_loss(inputs, targets, num_objects, loss_on_multimask=False): + """ + Compute the DICE loss, similar to generalized IOU for masks + Args: + inputs: A float tensor of arbitrary shape. + The predictions for each example. + targets: A float tensor with the same shape as inputs. Stores the binary + classification label for each element in inputs + (0 for the negative class and 1 for the positive class). 
+ num_objects: Number of objects in the batch + loss_on_multimask: True if multimask prediction is enabled + Returns: + Dice loss tensor + """ + inputs = inputs.sigmoid() + if loss_on_multimask: + # inputs and targets are [N, M, H, W] where M corresponds to multiple predicted masks + assert inputs.dim() == 4 and targets.dim() == 4 + # flatten spatial dimension while keeping multimask channel dimension + inputs = inputs.flatten(2) + targets = targets.flatten(2) + numerator = 2 * (inputs * targets).sum(-1) + else: + inputs = inputs.flatten(1) + numerator = 2 * (inputs * targets).sum(1) + denominator = inputs.sum(-1) + targets.sum(-1) + loss = 1 - (numerator + 1) / (denominator + 1) + if loss_on_multimask: + return loss / num_objects + return loss.sum() / num_objects + + +def sigmoid_focal_loss( + inputs, + targets, + num_objects, + alpha: float = 0.25, + gamma: float = 2, + loss_on_multimask=False, +): + """ + Loss used in RetinaNet for dense detection: https://arxiv.org/abs/1708.02002. + Args: + inputs: A float tensor of arbitrary shape. + The predictions for each example. + targets: A float tensor with the same shape as inputs. Stores the binary + classification label for each element in inputs + (0 for the negative class and 1 for the positive class). + num_objects: Number of objects in the batch + alpha: (optional) Weighting factor in range (0,1) to balance + positive vs negative examples. Default = -1 (no weighting). + gamma: Exponent of the modulating factor (1 - p_t) to + balance easy vs hard examples. + loss_on_multimask: True if multimask prediction is enabled + Returns: + focal loss tensor + """ + prob = inputs.sigmoid() + ce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction="none") + p_t = prob * targets + (1 - prob) * (1 - targets) + loss = ce_loss * ((1 - p_t) ** gamma) + if alpha >= 0: + alpha_t = alpha * targets + (1 - alpha) * (1 - targets) + loss = alpha_t * loss + + if loss_on_multimask: + # loss is [N, M, H, W] where M corresponds to multiple predicted masks + assert loss.dim() == 4 + return loss.flatten(2).mean(-1) / num_objects # average over spatial dims + return loss.mean(1).sum() / num_objects + + +def iou_loss(inputs, targets, pred_ious, num_objects, loss_on_multimask=False, use_l1_loss=False): + """ + Args: + inputs: A float tensor of arbitrary shape. + The predictions for each example. + targets: A float tensor with the same shape as inputs. Stores the binary + classification label for each element in inputs + (0 for the negative class and 1 for the positive class). 
+ pred_ious: A float tensor containing the predicted IoUs scores per mask + num_objects: Number of objects in the batch + loss_on_multimask: True if multimask prediction is enabled + use_l1_loss: Whether to use L1 loss is used instead of MSE loss + Returns: + IoU loss tensor + """ + assert inputs.dim() == 4 and targets.dim() == 4 + pred_mask = inputs.flatten(2) > 0 + gt_mask = targets.flatten(2) > 0 + area_i = torch.sum(pred_mask & gt_mask, dim=-1).float() + area_u = torch.sum(pred_mask | gt_mask, dim=-1).float() + actual_ious = area_i / torch.clamp(area_u, min=1.0) + + if use_l1_loss: + loss = F.l1_loss(pred_ious, actual_ious, reduction="none") + else: + loss = F.mse_loss(pred_ious, actual_ious, reduction="none") + if loss_on_multimask: + return loss / num_objects + return loss.sum() / num_objects + + +class MultiStepMultiMasksAndIous(nn.Module): + def __init__( + self, + weight_dict, + focal_alpha=0.25, + focal_gamma=2, + supervise_all_iou=False, + iou_use_l1_loss=False, + pred_obj_scores=False, + focal_gamma_obj_score=0.0, + focal_alpha_obj_score=-1, + ): + """ + This class computes the multi-step multi-mask and IoU losses. + Args: + weight_dict: dict containing weights for focal, dice, iou losses + focal_alpha: alpha for sigmoid focal loss + focal_gamma: gamma for sigmoid focal loss + supervise_all_iou: if True, back-prop iou losses for all predicted masks + iou_use_l1_loss: use L1 loss instead of MSE loss for iou + pred_obj_scores: if True, compute loss for object scores + focal_gamma_obj_score: gamma for sigmoid focal loss on object scores + focal_alpha_obj_score: alpha for sigmoid focal loss on object scores + """ + + super().__init__() + self.weight_dict = weight_dict + self.focal_alpha = focal_alpha + self.focal_gamma = focal_gamma + assert "loss_mask" in self.weight_dict + assert "loss_dice" in self.weight_dict + assert "loss_iou" in self.weight_dict + if "loss_class" not in self.weight_dict: + self.weight_dict["loss_class"] = 0.0 + + self.focal_alpha_obj_score = focal_alpha_obj_score + self.focal_gamma_obj_score = focal_gamma_obj_score + self.supervise_all_iou = supervise_all_iou + self.iou_use_l1_loss = iou_use_l1_loss + self.pred_obj_scores = pred_obj_scores + + def forward(self, outs_batch: List[Dict], targets_batch: torch.Tensor): + assert len(outs_batch) == len(targets_batch) + num_objects = torch.tensor( + (targets_batch.shape[1]), device=targets_batch.device, dtype=torch.float + ) # Number of objects is fixed within a batch + if is_dist_avail_and_initialized(): + torch.distributed.all_reduce(num_objects) + num_objects = torch.clamp(num_objects / get_world_size(), min=1).item() + + losses = defaultdict(int) + for outs, targets in zip(outs_batch, targets_batch): + cur_losses = self._forward(outs, targets, num_objects) + for k, v in cur_losses.items(): + losses[k] += v + + return losses + + def _forward(self, outputs: Dict, targets: torch.Tensor, num_objects): + """ + Compute the losses related to the masks: the focal loss and the dice loss. + and also the MAE or MSE loss between predicted IoUs and actual IoUs. + + Here "multistep_pred_multimasks_high_res" is a list of multimasks (tensors + of shape [N, M, H, W], where M could be 1 or larger, corresponding to + one or multiple predicted masks from a click. + + We back-propagate focal, dice losses only on the prediction channel + with the lowest focal+dice loss between predicted mask and ground-truth. + If `supervise_all_iou` is True, we backpropagate ious losses for all predicted masks. 
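+
+        Illustrative example (the weights are hypothetical): with M = 3 mask
+        channels and weight_dict = {"loss_mask": 20, "loss_dice": 1, ...},
+        the channel minimizing 20 * focal + 1 * dice is picked per object, and
+        only that channel receives mask/dice (and, by default, IoU) gradients.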
+ """ + + target_masks = targets.unsqueeze(1).float() + assert target_masks.dim() == 4 # [N, 1, H, W] + src_masks_list = outputs["multistep_pred_multimasks_high_res"] + ious_list = outputs["multistep_pred_ious"] + object_score_logits_list = outputs["multistep_object_score_logits"] + + assert len(src_masks_list) == len(ious_list) + assert len(object_score_logits_list) == len(ious_list) + + # accumulate the loss over prediction steps + losses = {"loss_mask": 0, "loss_dice": 0, "loss_iou": 0, "loss_class": 0} + for src_masks, ious, object_score_logits in zip(src_masks_list, ious_list, object_score_logits_list): + self._update_losses(losses, src_masks, target_masks, ious, num_objects, object_score_logits) + losses[CORE_LOSS_KEY] = self.reduce_loss(losses) + return losses + + def _update_losses(self, losses, src_masks, target_masks, ious, num_objects, object_score_logits): + target_masks = target_masks.expand_as(src_masks) + # get focal, dice and iou loss on all output masks in a prediction step + loss_multimask = sigmoid_focal_loss( + src_masks, + target_masks, + num_objects, + alpha=self.focal_alpha, + gamma=self.focal_gamma, + loss_on_multimask=True, + ) + loss_multidice = dice_loss(src_masks, target_masks, num_objects, loss_on_multimask=True) + if not self.pred_obj_scores: + loss_class = torch.tensor(0.0, dtype=loss_multimask.dtype, device=loss_multimask.device) + target_obj = torch.ones( + loss_multimask.shape[0], + 1, + dtype=loss_multimask.dtype, + device=loss_multimask.device, + ) + else: + target_obj = torch.any((target_masks[:, 0] > 0).flatten(1), dim=-1)[..., None].float() + loss_class = sigmoid_focal_loss( + object_score_logits, + target_obj, + num_objects, + alpha=self.focal_alpha_obj_score, + gamma=self.focal_gamma_obj_score, + ) + + loss_multiiou = iou_loss( + src_masks, + target_masks, + ious, + num_objects, + loss_on_multimask=True, + use_l1_loss=self.iou_use_l1_loss, + ) + assert loss_multimask.dim() == 2 + assert loss_multidice.dim() == 2 + assert loss_multiiou.dim() == 2 + if loss_multimask.size(1) > 1: + # take the mask indices with the smallest focal + dice loss for back propagation + loss_combo = loss_multimask * self.weight_dict["loss_mask"] + loss_multidice * self.weight_dict["loss_dice"] + best_loss_inds = torch.argmin(loss_combo, dim=-1) + batch_inds = torch.arange(loss_combo.size(0), device=loss_combo.device) + loss_mask = loss_multimask[batch_inds, best_loss_inds].unsqueeze(1) + loss_dice = loss_multidice[batch_inds, best_loss_inds].unsqueeze(1) + # calculate the iou prediction and slot losses only in the index + # with the minimum loss for each mask (to be consistent w/ SAM) + if self.supervise_all_iou: + loss_iou = loss_multiiou.mean(dim=-1).unsqueeze(1) + else: + loss_iou = loss_multiiou[batch_inds, best_loss_inds].unsqueeze(1) + else: + loss_mask = loss_multimask + loss_dice = loss_multidice + loss_iou = loss_multiiou + + # backprop focal, dice and iou loss only if obj present + loss_mask = loss_mask * target_obj + loss_dice = loss_dice * target_obj + loss_iou = loss_iou * target_obj + + # sum over batch dimension (note that the losses are already divided by num_objects) + losses["loss_mask"] += loss_mask.sum() + losses["loss_dice"] += loss_dice.sum() + losses["loss_iou"] += loss_iou.sum() + losses["loss_class"] += loss_class + + def reduce_loss(self, losses): + reduced_loss = 0.0 + for loss_key, weight in self.weight_dict.items(): + if loss_key not in losses: + raise ValueError(f"{type(self)} doesn't compute {loss_key}") + if weight != 0: + reduced_loss += 
losses[loss_key] * weight
+
+        return reduced_loss
diff --git a/bboxmaskpose/sam2/training/model/__init__.py b/bboxmaskpose/sam2/training/model/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..5277f46157403e47fd830fc519144b97ef69d4ae
--- /dev/null
+++ b/bboxmaskpose/sam2/training/model/__init__.py
@@ -0,0 +1,5 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.

+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
diff --git a/bboxmaskpose/sam2/training/model/sam2.py b/bboxmaskpose/sam2/training/model/sam2.py
new file mode 100644
index 0000000000000000000000000000000000000000..4a187ddd68a1f0658b3cf37327f5e7cb11a3264a
--- /dev/null
+++ b/bboxmaskpose/sam2/training/model/sam2.py
@@ -0,0 +1,543 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.

+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+import logging
+
+import numpy as np
+import torch
+import torch.distributed
+
+from sam2.modeling.sam2_base import SAM2Base
+from sam2.modeling.sam2_utils import get_1d_sine_pe, get_next_point, sample_box_points, select_closest_cond_frames
+from sam2.utils.misc import concat_points
+from training.utils.data_utils import BatchedVideoDatapoint
+
+
+class SAM2Train(SAM2Base):
+    def __init__(
+        self,
+        image_encoder,
+        memory_attention=None,
+        memory_encoder=None,
+        prob_to_use_pt_input_for_train=0.0,
+        prob_to_use_pt_input_for_eval=0.0,
+        prob_to_use_box_input_for_train=0.0,
+        prob_to_use_box_input_for_eval=0.0,
+        # if it is greater than 1, we apply interactive point sampling to the 1st frame and to other randomly selected frames
+        num_frames_to_correct_for_train=1,  # default: only iteratively sample on first frame
+        num_frames_to_correct_for_eval=1,  # default: only iteratively sample on first frame
+        rand_frames_to_correct_for_train=False,
+        rand_frames_to_correct_for_eval=False,
+        # how many frames to use as initial conditioning frames (for both point input and mask input; the first frame is always used as an initial conditioning frame)
+        # - if `rand_init_cond_frames` below is True, we randomly sample 1~num_init_cond_frames initial conditioning frames
+        # - otherwise we sample a fixed number of num_init_cond_frames initial conditioning frames
+        # note: for point input, we sample correction points on all such initial conditioning frames, and we require that `num_frames_to_correct` >= `num_init_cond_frames`;
+        # these are initial conditioning frames because as we track the video, more conditioning frames might be added
+        # when a frame receives correction clicks under point input if `add_all_frames_to_correct_as_cond=True`
+        num_init_cond_frames_for_train=1,  # default: only use the first frame as initial conditioning frame
+        num_init_cond_frames_for_eval=1,  # default: only use the first frame as initial conditioning frame
+        rand_init_cond_frames_for_train=True,  # default: random 1~num_init_cond_frames_for_train cond frames (to be consistent w/ previous TA data loader)
+        rand_init_cond_frames_for_eval=False,
+        # if `add_all_frames_to_correct_as_cond` is True, we also append to the conditioning frame list any frame that receives a later correction click
+        # if `add_all_frames_to_correct_as_cond` is False, we restrict the conditioning frame list to the initial conditioning frames only
+        add_all_frames_to_correct_as_cond=False,
+        # how many additional correction points to sample (on each frame selected to be corrected)
+        # note that the first frame receives an initial input click (in addition to any correction clicks)
+        num_correction_pt_per_frame=7,
+        # method for point sampling during evaluation
+        # "uniform" (sample uniformly from error region) or "center" (use the point with the largest distance to error region boundary)
+        # default to "center" to be consistent with evaluation in the SAM paper
+        pt_sampling_for_eval="center",
+        # During training, we optionally allow sampling the correction points from GT regions
+        # instead of the prediction error regions with a small probability. This might allow the
+        # model to overfit less to the error regions in training datasets
+        prob_to_sample_from_gt_for_train=0.0,
+        use_act_ckpt_iterative_pt_sampling=False,
+        # whether to forward image features per frame (as it's being tracked) during evaluation, instead of forwarding image features
+        # of all frames at once. This avoids backbone OOM errors on very long videos in evaluation, but could be slightly slower.
+        forward_backbone_per_frame_for_eval=False,
+        freeze_image_encoder=True,  # CHANGED: the image encoder is frozen by default in this fork
+        **kwargs,
+    ):
+        super().__init__(image_encoder, memory_attention, memory_encoder, **kwargs)
+        self.use_act_ckpt_iterative_pt_sampling = use_act_ckpt_iterative_pt_sampling
+        self.forward_backbone_per_frame_for_eval = forward_backbone_per_frame_for_eval
+
+        # Point sampler and conditioning frames
+        self.prob_to_use_pt_input_for_train = prob_to_use_pt_input_for_train
+        self.prob_to_use_box_input_for_train = prob_to_use_box_input_for_train
+        self.prob_to_use_pt_input_for_eval = prob_to_use_pt_input_for_eval
+        self.prob_to_use_box_input_for_eval = prob_to_use_box_input_for_eval
+        if prob_to_use_pt_input_for_train > 0 or prob_to_use_pt_input_for_eval > 0:
+            logging.info(f"Training with points (sampled from masks) as inputs with p={prob_to_use_pt_input_for_train}")
+            assert num_frames_to_correct_for_train >= num_init_cond_frames_for_train
+            assert num_frames_to_correct_for_eval >= num_init_cond_frames_for_eval
+
+        self.num_frames_to_correct_for_train = num_frames_to_correct_for_train
+        self.num_frames_to_correct_for_eval = num_frames_to_correct_for_eval
+        self.rand_frames_to_correct_for_train = rand_frames_to_correct_for_train
+        self.rand_frames_to_correct_for_eval = rand_frames_to_correct_for_eval
+        # Initial multi-conditioning frames
+        self.num_init_cond_frames_for_train = num_init_cond_frames_for_train
+        self.num_init_cond_frames_for_eval = num_init_cond_frames_for_eval
+        self.rand_init_cond_frames_for_train = rand_init_cond_frames_for_train
+        self.rand_init_cond_frames_for_eval = rand_init_cond_frames_for_eval
+        self.add_all_frames_to_correct_as_cond = add_all_frames_to_correct_as_cond
+        self.num_correction_pt_per_frame = num_correction_pt_per_frame
+        self.pt_sampling_for_eval = pt_sampling_for_eval
+        self.prob_to_sample_from_gt_for_train = prob_to_sample_from_gt_for_train
+        # A random number generator with a fixed initial seed across GPUs
+        self.rng = np.random.default_rng(seed=42)
+
+        if freeze_image_encoder:
+            for p in self.image_encoder.parameters():
+                p.requires_grad = False
+
+    def forward(self, input: BatchedVideoDatapoint):
+        if self.training or not self.forward_backbone_per_frame_for_eval:
+            # precompute image features on all frames before tracking
+            backbone_out = self.forward_image(input.flat_img_batch)
+        else:
+            # defer image feature computation on a frame until it's being tracked
+            backbone_out = {"backbone_fpn": None, "vision_pos_enc": None}
+        backbone_out = 
self.prepare_prompt_inputs(backbone_out, input) + previous_stages_out = self.forward_tracking(backbone_out, input) + + return previous_stages_out + + def _prepare_backbone_features_per_frame(self, img_batch, img_ids): + """Compute the image backbone features on the fly for the given img_ids.""" + # Only forward backbone on unique image ids to avoid repetitive computation + # (if `img_ids` has only one element, it's already unique so we skip this step). + if img_ids.numel() > 1: + unique_img_ids, inv_ids = torch.unique(img_ids, return_inverse=True) + else: + unique_img_ids, inv_ids = img_ids, None + + # Compute the image features on those unique image ids + image = img_batch[unique_img_ids] + backbone_out = self.forward_image(image) + ( + _, + vision_feats, + vision_pos_embeds, + feat_sizes, + ) = self._prepare_backbone_features(backbone_out) + # Inverse-map image features for `unique_img_ids` to the final image features + # for the original input `img_ids`. + if inv_ids is not None: + image = image[inv_ids] + vision_feats = [x[:, inv_ids] for x in vision_feats] + vision_pos_embeds = [x[:, inv_ids] for x in vision_pos_embeds] + + return image, vision_feats, vision_pos_embeds, feat_sizes + + def prepare_prompt_inputs(self, backbone_out, input, start_frame_idx=0): + """ + Prepare input mask, point or box prompts. Optionally, we allow tracking from + a custom `start_frame_idx` to the end of the video (for evaluation purposes). + """ + # Load the ground-truth masks on all frames (so that we can later + # sample correction points from them) + # gt_masks_per_frame = { + # stage_id: targets.segments.unsqueeze(1) # [B, 1, H_im, W_im] + # for stage_id, targets in enumerate(input.find_targets) + # } + gt_masks_per_frame = {stage_id: masks.unsqueeze(1) for stage_id, masks in enumerate(input.masks)} # [B, 1, H_im, W_im] + + # gt_masks_per_frame = input.masks.unsqueeze(2) # [T,B,1,H_im,W_im] keep everything in tensor form + backbone_out["gt_masks_per_frame"] = gt_masks_per_frame + num_frames = input.num_frames + backbone_out["num_frames"] = num_frames + + # Randomly decide whether to use point inputs or mask inputs + if self.training: + prob_to_use_pt_input = self.prob_to_use_pt_input_for_train + prob_to_use_box_input = self.prob_to_use_box_input_for_train + num_frames_to_correct = self.num_frames_to_correct_for_train + rand_frames_to_correct = self.rand_frames_to_correct_for_train + num_init_cond_frames = self.num_init_cond_frames_for_train + rand_init_cond_frames = self.rand_init_cond_frames_for_train + else: + prob_to_use_pt_input = self.prob_to_use_pt_input_for_eval + prob_to_use_box_input = self.prob_to_use_box_input_for_eval + num_frames_to_correct = self.num_frames_to_correct_for_eval + rand_frames_to_correct = self.rand_frames_to_correct_for_eval + num_init_cond_frames = self.num_init_cond_frames_for_eval + rand_init_cond_frames = self.rand_init_cond_frames_for_eval + if num_frames == 1: + # here we handle a special case for mixing video + SAM on image training, + # where we force using point input for the SAM task on static images + prob_to_use_pt_input = 1.0 + num_frames_to_correct = 1 + num_init_cond_frames = 1 + assert num_init_cond_frames >= 1 + # (here `self.rng.random()` returns value in range 0.0 <= X < 1.0) + use_pt_input = self.rng.random() < prob_to_use_pt_input + if rand_init_cond_frames and num_init_cond_frames > 1: + # randomly select 1 to `num_init_cond_frames` frames as initial conditioning frames + num_init_cond_frames = self.rng.integers(1, num_init_cond_frames, endpoint=True) 
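            # (`endpoint=True` makes the upper bound inclusive, i.e. this draws
            # uniformly from {1, ..., num_init_cond_frames})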
+        if use_pt_input and rand_frames_to_correct and num_frames_to_correct > num_init_cond_frames:
+            # randomly select `num_init_cond_frames` to `num_frames_to_correct` frames to sample
+            # correction clicks (only for the case of point input)
+            num_frames_to_correct = self.rng.integers(num_init_cond_frames, num_frames_to_correct, endpoint=True)
+        backbone_out["use_pt_input"] = use_pt_input
+
+        # Sample initial conditioning frames
+        if num_init_cond_frames == 1:
+            init_cond_frames = [start_frame_idx]  # starting frame
+        else:
+            # starting frame + randomly selected remaining frames (without replacement)
+            init_cond_frames = [start_frame_idx] + self.rng.choice(
+                range(start_frame_idx + 1, num_frames),
+                num_init_cond_frames - 1,
+                replace=False,
+            ).tolist()
+        backbone_out["init_cond_frames"] = init_cond_frames
+        backbone_out["frames_not_in_init_cond"] = [t for t in range(start_frame_idx, num_frames) if t not in init_cond_frames]
+        # Prepare mask or point inputs on initial conditioning frames
+        backbone_out["mask_inputs_per_frame"] = {}  # {frame_idx: <input_masks>}
+        backbone_out["point_inputs_per_frame"] = {}  # {frame_idx: <input_points>}
+        # Force point inputs on the conditioning frames: pose keypoints are always
+        # passed as point prompts here, so the mask-input branch below is unused.
+        use_pt_input = True
+        for t in init_cond_frames:
+            if not use_pt_input:
+                backbone_out["mask_inputs_per_frame"][t] = gt_masks_per_frame[t]
+            else:
+                pts = input.points[t]
+
+                # find the smallest number of keypoints across the batch
+                min_k = min(p.shape[0] for p in pts)
+
+                # trim all tensors to this size and move them to the GT-mask device
+                pts_trimmed = [p.to(gt_masks_per_frame[t].device)[:min_k] for p in pts]
+                # now stack
+                point_coords = torch.stack(pts_trimmed, dim=0)  # [B, min_k, 2]
+                labels_trimmed = [l.to(gt_masks_per_frame[t].device)[:min_k] for l in input.labels[t]]
+                point_labels = torch.stack(labels_trimmed, dim=0)  # [B, min_k]
+                pts_from_gt, labels_from_gt = get_next_point(
+                    gt_masks=gt_masks_per_frame[t],
+                    pred_masks=None,
+                    method=("uniform_or_kpt" if self.training else self.pt_sampling_for_eval),
+                    points=point_coords,
+                )
+                # append the GT points & labels;
+                # expand dims to match batch if necessary
+                if pts_from_gt.dim() == 2:
+                    pts_from_gt = pts_from_gt.unsqueeze(0)  # [1, N_gt, 2]
+                    labels_from_gt = labels_from_gt.unsqueeze(0)  # [1, N_gt]
+
+                point_coords = torch.cat([point_coords, pts_from_gt], dim=1)  # concatenate along keypoint dimension
+                point_labels = torch.cat([point_labels, labels_from_gt], dim=1)  # same for labels
+                backbone_out["point_inputs_per_frame"][t] = {
+                    "point_coords": point_coords,
+                    "point_labels": point_labels,
+                    "n_pose_kpts": min_k,
+                }
+
+        # Sample frames where we will add correction clicks on the fly
+        # based on the error between prediction and ground-truth masks
+        if not use_pt_input:
+            # no correction points will be sampled when using mask inputs
+            frames_to_add_correction_pt = []
+        elif num_frames_to_correct == num_init_cond_frames:
+            frames_to_add_correction_pt = init_cond_frames
+        else:
+            assert num_frames_to_correct > num_init_cond_frames
+            # initial cond frame + randomly selected remaining frames (without replacement)
+            extra_num = num_frames_to_correct - num_init_cond_frames
+            frames_to_add_correction_pt = (
+                init_cond_frames + self.rng.choice(backbone_out["frames_not_in_init_cond"], extra_num, replace=False).tolist()
+            )
+        backbone_out["frames_to_add_correction_pt"] = frames_to_add_correction_pt
+
+        return backbone_out
+
+    def forward_tracking(self, backbone_out, input: BatchedVideoDatapoint, return_dict=False):
+        """Forward video tracking on each frame (and sample correction clicks)."""
+        img_feats_already_computed = backbone_out["backbone_fpn"] is not None
backbone_out["backbone_fpn"] is not None + if img_feats_already_computed: + # Prepare the backbone features + # - vision_feats and vision_pos_embeds are in (HW)BC format + ( + _, + vision_feats, + vision_pos_embeds, + feat_sizes, + ) = self._prepare_backbone_features(backbone_out) + + # Starting the stage loop + num_frames = backbone_out["num_frames"] + init_cond_frames = backbone_out["init_cond_frames"] + frames_to_add_correction_pt = backbone_out["frames_to_add_correction_pt"] + # first process all the initial conditioning frames to encode them as memory, + # and then conditioning on them to track the remaining frames + processing_order = init_cond_frames + backbone_out["frames_not_in_init_cond"] + output_dict = { + "cond_frame_outputs": {}, # dict containing {frame_idx: } + "non_cond_frame_outputs": {}, # dict containing {frame_idx: } + } + for stage_id in processing_order: + # Get the image features for the current frames + # img_ids = input.find_inputs[stage_id].img_ids + img_ids = input.flat_obj_to_img_idx[stage_id] + if img_feats_already_computed: + # Retrieve image features according to img_ids (if they are already computed). + current_vision_feats = [x[:, img_ids] for x in vision_feats] + current_vision_pos_embeds = [x[:, img_ids] for x in vision_pos_embeds] + else: + # Otherwise, compute the image features on the fly for the given img_ids + # (this might be used for evaluation on long videos to avoid backbone OOM). + ( + _, + current_vision_feats, + current_vision_pos_embeds, + feat_sizes, + ) = self._prepare_backbone_features_per_frame(input.flat_img_batch, img_ids) + + # Get output masks based on this frame's prompts and previous memory + current_out = self.track_step( + frame_idx=stage_id, + is_init_cond_frame=stage_id in init_cond_frames, + current_vision_feats=current_vision_feats, + current_vision_pos_embeds=current_vision_pos_embeds, + feat_sizes=feat_sizes, + point_inputs=backbone_out["point_inputs_per_frame"].get(stage_id, None), + mask_inputs=backbone_out["mask_inputs_per_frame"].get(stage_id, None), + gt_masks=backbone_out["gt_masks_per_frame"].get(stage_id, None), + frames_to_add_correction_pt=frames_to_add_correction_pt, + output_dict=output_dict, + num_frames=num_frames, + ) + # Append the output, depending on whether it's a conditioning frame + add_output_as_cond_frame = stage_id in init_cond_frames or ( + self.add_all_frames_to_correct_as_cond and stage_id in frames_to_add_correction_pt + ) + if add_output_as_cond_frame: + output_dict["cond_frame_outputs"][stage_id] = current_out + else: + output_dict["non_cond_frame_outputs"][stage_id] = current_out + + if return_dict: + return output_dict + # turn `output_dict` into a list for loss function + all_frame_outputs = {} + all_frame_outputs.update(output_dict["cond_frame_outputs"]) + all_frame_outputs.update(output_dict["non_cond_frame_outputs"]) + all_frame_outputs = [all_frame_outputs[t] for t in range(num_frames)] + # Make DDP happy with activation checkpointing by removing unused keys + all_frame_outputs = [{k: v for k, v in d.items() if k != "obj_ptr"} for d in all_frame_outputs] + + return all_frame_outputs + + def track_step( + self, + frame_idx, + is_init_cond_frame, + current_vision_feats, + current_vision_pos_embeds, + feat_sizes, + point_inputs, + mask_inputs, + output_dict, + num_frames, + track_in_reverse=False, # tracking in reverse time order (for demo usage) + run_mem_encoder=True, # Whether to run the memory encoder on the predicted masks. 
+        prev_sam_mask_logits=None,  # The previously predicted SAM mask logits.
+        frames_to_add_correction_pt=None,
+        gt_masks=None,
+    ):
+        if frames_to_add_correction_pt is None:
+            frames_to_add_correction_pt = []
+
+        # number of point prompts kept per object after sorting below
+        n_kpts = 1
+        device = gt_masks.device
+
+        # `point_inputs` is assumed to be present: in this setup every processed
+        # frame receives pose-keypoint prompts (see `prepare_prompt_inputs`).
+        point_coords = point_inputs["point_coords"]
+        point_labels = point_inputs["point_labels"]
+        n_pose_kpts = point_inputs["n_pose_kpts"]
+
+        # Sort the prompts by label within each batch element so that the crop to
+        # `n_kpts` below keeps a deterministic point; skip this block if the
+        # keypoints are already pre-sorted.
+        point_labels, indices = torch.sort(point_labels, dim=1)
+        point_coords = torch.gather(point_coords, 1, indices.unsqueeze(-1).expand(-1, -1, point_coords.shape[-1]))
+
+        # treat every kept prompt as a positive click
+        point_labels.fill_(1)
+        cropped_point_inputs = {k: (v.clone().to(device) if torch.is_tensor(v) else v) for k, v in point_inputs.items()}
+        cropped_point_inputs["point_coords"] = point_coords[:, :n_kpts]
+        cropped_point_inputs["point_labels"] = point_labels[:, :n_kpts]
+
+        current_out, sam_outputs, high_res_features, pix_feat = self._track_step(
+            frame_idx,
+            is_init_cond_frame,
+            current_vision_feats,
+            current_vision_pos_embeds,
+            feat_sizes,
+            cropped_point_inputs,
+            mask_inputs,
+            output_dict,
+            num_frames,
+            track_in_reverse,
+            prev_sam_mask_logits,
+        )
+
+        (
+            low_res_multimasks,
+            high_res_multimasks,
+            ious,
+            low_res_masks,
+            high_res_masks,
+            obj_ptr,
+            object_score_logits,
+        ) = sam_outputs
+
+        current_out["multistep_pred_masks"] = low_res_masks
+        current_out["multistep_pred_masks_high_res"] = high_res_masks
+        current_out["multistep_pred_multimasks"] = [low_res_multimasks]
+        current_out["multistep_pred_multimasks_high_res"] = [high_res_multimasks]
+        current_out["multistep_pred_ious"] = [ious]
+        current_out["multistep_point_inputs"] = [cropped_point_inputs]
+        current_out["multistep_object_score_logits"] = [object_score_logits]
+
+        if frame_idx in frames_to_add_correction_pt:
+            point_inputs, final_sam_outputs = self._iter_correct_pt_sampling(
+                is_init_cond_frame,
+                cropped_point_inputs,
+                gt_masks,
+                high_res_features,
+                pix_feat,
+                low_res_multimasks,
+                high_res_multimasks,
+                ious,
+                low_res_masks,
+                high_res_masks,
+                object_score_logits,
+                current_out,
+                n_pose_kpts,
+                point_inputs,
+            )
+            (
+                _,
+                _,
+                _,
+                low_res_masks,
+                high_res_masks,
+                obj_ptr,
+                object_score_logits,
+            ) = final_sam_outputs
+
+        # Use the final prediction (after all correction steps) for output and eval
+        current_out["pred_masks"] = low_res_masks
+        current_out["pred_masks_high_res"] = high_res_masks
+        current_out["obj_ptr"] = obj_ptr
+
+        # Finally run the memory encoder on the predicted mask to encode
+        # it into a new memory feature (that can be used in future frames)
+        self._encode_memory_in_output(
+            current_vision_feats,
+            feat_sizes,
+            point_inputs,
+            run_mem_encoder,
+            high_res_masks,
+            object_score_logits,
+            current_out,
+        )
+        return current_out
+
+    def _iter_correct_pt_sampling(
+        self,
+        is_init_cond_frame,
+        point_inputs,
+        gt_masks,
+        high_res_features,
+        pix_feat_with_mem,
+        low_res_multimasks,
+        high_res_multimasks,
+        ious,
+        low_res_masks,
+        high_res_masks,
+        object_score_logits,
+        current_out,
+        n_pose_kpts,
+        all_pose_kpts,
+    ):
+
+        assert gt_masks is not None
+        all_pred_masks = [low_res_masks]
+        all_pred_high_res_masks = [high_res_masks]
+        all_pred_multimasks = [low_res_multimasks]
+        all_pred_high_res_multimasks = [high_res_multimasks]
+        all_pred_ious = [ious]
+        all_point_inputs = [point_inputs]
+        all_object_score_logits = [object_score_logits]
+        for correction_i in range(self.num_correction_pt_per_frame):
+            # sample a new point from the error between prediction and ground-truth
+            # (with a small probability, directly sample from GT masks instead of errors)
+            if self.training and self.prob_to_sample_from_gt_for_train > 0:
+                sample_from_gt = self.rng.random() < self.prob_to_sample_from_gt_for_train
+            else:
+                sample_from_gt = False
+            # if `pred_for_new_pt` is None, only GT masks will be used for point sampling
+            pred_for_new_pt = None if sample_from_gt else (high_res_masks > 0)
+            new_points, new_labels = get_next_point(
+                gt_masks=gt_masks,
+                pred_masks=pred_for_new_pt,
+                method="uniform_or_kpt" if self.training else self.pt_sampling_for_eval,
+                points=all_pose_kpts["point_coords"][:, :n_pose_kpts],
+            )
+            point_inputs = concat_points(point_inputs, new_points, new_labels)
+            # Feed the mask logits of the previous SAM outputs into the next SAM decoder step.
+            # For tracking, this means that when the user adds a correction click, we also feed
+            # the tracking output mask logits along with the click as input to the SAM decoder.
+            mask_inputs = low_res_masks
+            multimask_output = self._use_multimask(is_init_cond_frame, point_inputs)
+            if self.use_act_ckpt_iterative_pt_sampling and not multimask_output:
+                sam_outputs = torch.utils.checkpoint.checkpoint(
+                    self._forward_sam_heads,
+                    backbone_features=pix_feat_with_mem,
+                    point_inputs=point_inputs,
+                    mask_inputs=mask_inputs,
+                    high_res_features=high_res_features,
+                    multimask_output=multimask_output,
+                    use_reentrant=False,
+                )
+            else:
+                sam_outputs = self._forward_sam_heads(
+                    backbone_features=pix_feat_with_mem,
+                    point_inputs=point_inputs,
+                    mask_inputs=mask_inputs,
+                    high_res_features=high_res_features,
+                    multimask_output=multimask_output,
+                )
+            (
+                low_res_multimasks,
+                high_res_multimasks,
+                ious,
+                low_res_masks,
+                high_res_masks,
+                _,
+                object_score_logits,
+            ) = sam_outputs
+            all_pred_masks.append(low_res_masks)
+            all_pred_high_res_masks.append(high_res_masks)
+            all_pred_multimasks.append(low_res_multimasks)
+            all_pred_high_res_multimasks.append(high_res_multimasks)
+            all_pred_ious.append(ious)
+            all_point_inputs.append(point_inputs)
+            all_object_score_logits.append(object_score_logits)
+
+        # Concatenate the masks along the channel dimension (to compute losses on
+        # all of them, using `MultiStepInteractiveMasks`)
+        current_out["multistep_pred_masks"] = torch.cat(all_pred_masks, dim=1)
+        current_out["multistep_pred_masks_high_res"] = torch.cat(all_pred_high_res_masks, dim=1)
+        current_out["multistep_pred_multimasks"] = all_pred_multimasks
+        current_out["multistep_pred_multimasks_high_res"] = all_pred_high_res_multimasks
+        current_out["multistep_pred_ious"] = all_pred_ious
+        current_out["multistep_point_inputs"] = all_point_inputs
+        current_out["multistep_object_score_logits"] = all_object_score_logits
+
+        return point_inputs, sam_outputs
diff --git a/bboxmaskpose/sam2/training/optimizer.py b/bboxmaskpose/sam2/training/optimizer.py
new file mode 100644
index 0000000000000000000000000000000000000000..2d74e996bcfe1f56747ff686fdcff2134129770d
--- /dev/null
+++ b/bboxmaskpose/sam2/training/optimizer.py
@@ -0,0 +1,435 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
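+
+# Rough sketch (illustrative only; the keys `options`, `param_names`, and
+# `module_cls_names` follow this file, but the concrete values are hypothetical)
+# of the `options_conf` structure consumed by `construct_optimizer` below.
+# Each option ("lr", "weight_decay", ...) maps to a list of scheduler configs;
+# each config may restrict itself to a parameter subset, and at most one config
+# per option acts as the default for all remaining parameters:
+#
+#   options:
+#     lr:
+#       - scheduler: <cosine schedule>          # default: all parameters
+#       - scheduler: <scaled cosine schedule>
+#         param_names: ["image_encoder.*"]      # unix-style patterns
+#     weight_decay:
+#       - scheduler: <constant 0.1>             # default
+#       - scheduler: <constant 0.0>
+#         module_cls_names: ["torch.nn.LayerNorm"]
+#
+# `map_scheduler_cfgs_to_param_groups` then intersects these parameter sets
+# into disjoint param groups, one per combination of options.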
+
+import fnmatch
+import inspect
+import itertools
+import logging
+import types
+from typing import Any, Callable, Dict, Iterable, List, Mapping, Optional, Set, Tuple, Type, Union
+
+import torch
+import torch.nn as nn
+from torch import Tensor
+
+import hydra
+from omegaconf import DictConfig
+
+
+class Optimizer:
+    def __init__(self, optimizer, schedulers=None) -> None:
+        self.optimizer = optimizer
+        self.schedulers = schedulers
+        self._validate_optimizer_schedulers()
+        self.step_schedulers(0.0, 0)
+
+    def _validate_optimizer_schedulers(self):
+        if self.schedulers is None:
+            return
+        for set_of_schedulers in self.schedulers:
+            for option, _ in set_of_schedulers.items():
+                assert option in self.optimizer.defaults, (
+                    f"Optimizer option {option} not found in {self.optimizer}. Valid options are {self.optimizer.defaults.keys()}"
+                )
+
+    def step_schedulers(self, where: float, step: int) -> None:
+        if self.schedulers is None:
+            return
+        for i, param_group in enumerate(self.optimizer.param_groups):
+            for option, scheduler in self.schedulers[i].items():
+                if "step" in inspect.signature(scheduler.__call__).parameters:
+                    new_value = scheduler(step=step, where=where)
+                elif hasattr(scheduler, "scheduler") and "step" in inspect.signature(scheduler.scheduler.__call__).parameters:
+                    # To handle ValueScaler wrappers
+                    new_value = scheduler(step=step, where=where)
+                else:
+                    new_value = scheduler(where)
+                param_group[option] = new_value
+
+    def step(self, where, step, closure=None):
+        self.step_schedulers(where, step)
+        return self.optimizer.step(closure)
+
+    def zero_grad(self, *args, **kwargs):
+        return self.optimizer.zero_grad(*args, **kwargs)
+
+
+def set_default_parameters(scheduler_cfgs: List[DictConfig], all_parameter_names: Set[str]) -> None:
+    """Set up the "default" scheduler with the right parameters.
+
+    Args:
+        scheduler_cfgs: A list of scheduler configs, where each scheduler also
+            specifies which parameters it applies to, based on the names of parameters
+            or the class of the modules. At most one scheduler is allowed to skip this
+            specification, which is used as a "default" specification for any remaining
+            parameters.
+        all_parameter_names: Names of all the parameters to consider.
+    """
+    constraints = [scheduler_cfg.parameter_names for scheduler_cfg in scheduler_cfgs if scheduler_cfg.parameter_names is not None]
+    if len(constraints) == 0:
+        default_params = set(all_parameter_names)
+    else:
+        default_params = all_parameter_names - set.union(*constraints)
+    default_count = 0
+    for scheduler_cfg in scheduler_cfgs:
+        if scheduler_cfg.parameter_names is None:
+            scheduler_cfg.parameter_names = default_params
+            default_count += 1
+    assert default_count <= 1, "Only one scheduler per option can be default"
+    if default_count == 0:
+        # No default scheduler specified, add a default, but without any scheduler
+        # for that option
+        scheduler_cfgs.append({"parameter_names": default_params})
+
+
+def name_constraints_to_parameters(param_constraints: List[Set[str]], named_parameters: Dict[str, Tensor]) -> List[torch.nn.Parameter]:
+    """Return parameters which match the intersection of parameter constraints.
+
+    Note that this returns the parameters themselves, not their names.
+
+    Args:
+        param_constraints: A list, with each element being a set of allowed parameters.
+        named_parameters: Mapping from a parameter name to the parameter itself.
+
+    Returns:
+        A list containing the parameters which overlap with _each_ constraint set from
+        param_constraints.
+ """ + matching_names = set.intersection(*param_constraints) + return [value for name, value in named_parameters.items() if name in matching_names] + + +def map_scheduler_cfgs_to_param_groups( + all_scheduler_cfgs: Iterable[List[Dict]], + named_parameters: Dict[str, Tensor], +) -> Tuple[List[Dict[Any, Any]], List[Dict[str, List[torch.nn.Parameter]]]]: + """Produce parameter groups corresponding to all the scheduler configs. + + Takes all the scheduler configs, each of which applies to a specific optimizer + option (like "lr" or "weight_decay") and has a set of parameter names which it + applies to, and produces a final set of param groups where each param group + covers all the options which apply to a particular set of parameters. + + Args: + all_scheduler_cfgs: All the scheduler configs covering every option. + named_parameters: Mapping from a parameter name to the parameter itself. + Returns: + Tuple of lists of schedulers and param_groups, where schedulers[i] + applies to param_groups[i]. + """ + + scheduler_cfgs_per_param_group = itertools.product(*all_scheduler_cfgs) + schedulers = [] + param_groups = [] + for scheduler_cfgs in scheduler_cfgs_per_param_group: + param_constraints = [scheduler_cfg["parameter_names"] for scheduler_cfg in scheduler_cfgs] + matching_parameters = name_constraints_to_parameters(param_constraints, named_parameters) + if len(matching_parameters) == 0: # If no overlap of parameters, skip + continue + schedulers_for_group = { + scheduler_cfg["option"]: scheduler_cfg["scheduler"] for scheduler_cfg in scheduler_cfgs if "option" in scheduler_cfg + } + schedulers.append(schedulers_for_group) + param_groups.append({"params": matching_parameters}) + return schedulers, param_groups + + +def validate_param_group_params(param_groups: List[Dict], model: nn.Module): + """Check that the param groups are non-overlapping and cover all the parameters. + + Args: + param_groups: List of all param groups + model: Model to validate against. The check ensures that all the model + parameters are part of param_groups + """ + for pg in param_groups: + # no param should be repeated within a group + assert len(pg["params"]) == len(set(pg["params"])) + parameters = [set(param_group["params"]) for param_group in param_groups] + model_parameters = {parameter for _, parameter in model.named_parameters()} + for p1, p2 in itertools.permutations(parameters, 2): + assert p1.isdisjoint(p2), "Scheduler generated param_groups should be disjoint" + assert set.union(*parameters) == model_parameters, ( + "Scheduler generated param_groups must include all parameters of the model." + f" Found {len(set.union(*parameters))} params whereas model has" + f" {len(model_parameters)} params" + ) + + +def unix_module_cls_pattern_to_parameter_names( + filter_module_cls_names: List[str], + module_cls_to_param_names: Dict[Type, str], +) -> Union[None, Set[str]]: + """Returns param names which pass the filters specified in filter_module_cls_names. + + Args: + filter_module_cls_names: A list of filter strings containing class names, like + ["torch.nn.LayerNorm", "torch.nn.BatchNorm2d"] + module_cls_to_param_names: Mapping from module classes to the parameter names + they contain. See `get_module_cls_to_param_names`. 
+ """ + if filter_module_cls_names is None: + return set() + allowed_parameter_names = [] + for module_cls_name in filter_module_cls_names: + module_cls = hydra.utils.get_class(module_cls_name) + if module_cls not in module_cls_to_param_names: + raise AssertionError(f"module_cls_name {module_cls_name} does not " "match any classes in the model") + matching_parameters = module_cls_to_param_names[module_cls] + assert len(matching_parameters) > 0, f"module_cls_name {module_cls_name} does not contain any parameters in the model" + logging.info(f"Matches for module_cls_name [{module_cls_name}]: {matching_parameters} ") + allowed_parameter_names.append(matching_parameters) + return set.union(*allowed_parameter_names) + + +def unix_param_pattern_to_parameter_names( + filter_param_names: Optional[List[str]], + parameter_names: Dict[str, torch.Tensor], +) -> Union[None, Set[str]]: + """Returns param names which pass the filters specified in filter_param_names. + + Args: + filter_param_names: A list of unix-style filter strings with optional + wildcards, like ["block.2.*", "block.2.linear.weight"] + module_cls_to_param_names: Mapping from module classes to the parameter names + they contain. See `get_module_cls_to_param_names`. + """ + + if filter_param_names is None: + return set() + allowed_parameter_names = [] + for param_name in filter_param_names: + matching_parameters = set(fnmatch.filter(parameter_names, param_name)) + assert len(matching_parameters) >= 1, f"param_name {param_name} does not match any parameters in the model" + logging.info(f"Matches for param_name [{param_name}]: {matching_parameters}") + allowed_parameter_names.append(matching_parameters) + return set.union(*allowed_parameter_names) + + +def _unix_pattern_to_parameter_names( + scheduler_cfg: DictConfig, + parameter_names: Set[str], + module_cls_to_param_names: Dict[Type, str], +) -> Union[None, Set[str]]: + """Returns param names which pass the filters specified in scheduler_cfg. + + Args: + scheduler_cfg: The config for the scheduler + parameter_names: The set of all parameter names which will be filtered + """ + if "param_names" not in scheduler_cfg and "module_cls_names" not in scheduler_cfg: + return None + return unix_param_pattern_to_parameter_names(scheduler_cfg.get("param_names"), parameter_names).union( + unix_module_cls_pattern_to_parameter_names(scheduler_cfg.get("module_cls_names"), module_cls_to_param_names) + ) + + +def get_module_cls_to_param_names(model: nn.Module, param_allowlist: Set[str] = None) -> Dict[Type, str]: + """Produce a mapping from all the modules classes to the names of parames they own. + + Only counts a parameter as part of the immediate parent module, i.e. recursive + parents do not count. 
+
+    Args:
+        model: Model to iterate over
+        param_allowlist: If specified, only these param names will be processed
+    """
+
+    module_cls_to_params = {}
+    for module_name, module in model.named_modules():
+        module_cls = type(module)
+        module_cls_to_params.setdefault(module_cls, set())
+        for param_name, _ in module.named_parameters(recurse=False):
+            full_param_name = get_full_parameter_name(module_name, param_name)
+            if param_allowlist is None or full_param_name in param_allowlist:
+                module_cls_to_params[module_cls].add(full_param_name)
+    return module_cls_to_params
+
+
+def construct_optimizer(
+    model: torch.nn.Module,
+    optimizer_conf: Any,
+    options_conf: Mapping[str, List] = None,
+    param_group_modifiers_conf: List[Callable] = None,
+    param_allowlist: Optional[Set[str]] = None,
+    validate_param_groups=True,
+) -> Optimizer:
+    """
+    Constructs an SGD, Adam, or AdamW optimizer (optionally with momentum), i.e.,
+    a torch.optim.Optimizer with support for zero weight decay on BatchNorm
+    and/or no-update 1-D parameters, based on the config.
+
+    Supports wrapping the optimizer with Layer-wise Adaptive Rate Scaling
+    (LARS): https://arxiv.org/abs/1708.03888
+
+    Args:
+        model: model to perform stochastic gradient descent
+            optimization or ADAM optimization.
+        optimizer_conf: Hydra config containing a partial torch optimizer like SGD or
+            ADAM, still missing the params argument which this function provides to
+            produce the final optimizer
+        param_group_modifiers_conf: Optional user specified functions which can modify
+            the final scheduler configs before the optimizer's param groups are built
+        param_allowlist: The parameters to optimize. Parameters which are not part of
+            this allowlist will be skipped.
+        validate_param_groups: If enabled, validates that the produced param_groups don't
+            overlap and cover all the model parameters.
+ """ + if param_allowlist is None: + param_allowlist = {name for name, _ in model.named_parameters()} + + named_parameters = {name: param for name, param in model.named_parameters() if name in param_allowlist} + + if not options_conf: + optimizer = hydra.utils.instantiate(optimizer_conf, named_parameters.values()) + return Optimizer(optimizer) + + all_parameter_names = {name for name, _ in model.named_parameters() if name in param_allowlist} + module_cls_to_all_param_names = get_module_cls_to_param_names(model, param_allowlist) + + scheduler_cfgs_per_option = hydra.utils.instantiate(options_conf) + all_scheduler_cfgs = [] + for option, scheduler_cfgs in scheduler_cfgs_per_option.items(): + for config in scheduler_cfgs: + config.option = option + config.parameter_names = _unix_pattern_to_parameter_names(config, all_parameter_names, module_cls_to_all_param_names) + set_default_parameters(scheduler_cfgs, all_parameter_names) + all_scheduler_cfgs.append(scheduler_cfgs) + + if param_group_modifiers_conf: + for custom_param_modifier in param_group_modifiers_conf: + custom_param_modifier = hydra.utils.instantiate(custom_param_modifier) + all_scheduler_cfgs = custom_param_modifier(scheduler_cfgs=all_scheduler_cfgs, model=model) + schedulers, param_groups = map_scheduler_cfgs_to_param_groups(all_scheduler_cfgs, named_parameters) + if validate_param_groups: + validate_param_group_params(param_groups, model) + optimizer = hydra.utils.instantiate(optimizer_conf, param_groups) + return Optimizer(optimizer, schedulers) + + +def get_full_parameter_name(module_name, param_name): + if module_name == "": + return param_name + return f"{module_name}.{param_name}" + + +class GradientClipper: + """ + Gradient clipping utils that works for DDP + """ + + def __init__(self, max_norm: float = 1.0, norm_type: int = 2): + assert isinstance(max_norm, (int, float)) or max_norm is None + self.max_norm = max_norm if max_norm is None else float(max_norm) + self.norm_type = norm_type + + def __call__(self, model: nn.Module): + if self.max_norm is None: + return # no-op + + nn.utils.clip_grad_norm_(model.parameters(), max_norm=self.max_norm, norm_type=self.norm_type) + + +class ValueScaler: + def __init__(self, scheduler, mult_val: float): + self.scheduler = scheduler + self.mult_val = mult_val + + def __call__(self, *args, **kwargs): + val = self.scheduler(*args, **kwargs) + return val * self.mult_val + + +def rgetattr(obj, rattrs: str = None): + """ + Like getattr(), but supports dotted notation for nested objects. + rattrs is a str of form 'attr1.attr2', returns obj.attr1.attr2 + """ + if rattrs is None: + return obj + attrs = rattrs.split(".") + for attr in attrs: + obj = getattr(obj, attr) + return obj + + +def layer_decay_param_modifier( + scheduler_cfgs: List[List[Dict]], + model, + layer_decay_value: float, + layer_decay_min: Optional[float] = None, + apply_to: Optional[str] = None, + overrides: List[Dict] = (), +) -> List[List[Dict]]: + """ + Args + - scheduler_cfgs: a list of omegaconf.ListConfigs. + Each element in the list is a omegaconfg.DictConfig with the following structure + { + "scheduler": + "option": possible options are "lr", "weight_decay" etc. + "parameter_names": Set of str indicating param names that this scheduler applies to + } + - model: a model that implements a method `get_layer_id` that maps layer_name to an integer and + and a method get_num_layers. + Alternatively, use apply_to argument to select a specific component of the model. 
+    - layer_decay_value: float
+    - layer_decay_min: min val for layer decay
+    - apply_to: optional arg to select which component of the model to apply the layer decay modifier to
+    - overrides: manually overrides the lr scale for parameters matching specific patterns;
+        a list of dicts, each with keys "pattern" and "value".
+    Returns
+    - scheduler_configs: same structure as the input, elements can be modified
+    """
+    model = rgetattr(model, apply_to)
+    num_layers = model.get_num_layers() + 1
+    layer_decays = [layer_decay_value ** (num_layers - i) for i in range(num_layers + 1)]
+    if layer_decay_min is not None:
+        layer_decays = [max(val, layer_decay_min) for val in layer_decays]
+    final_scheduler_cfgs = []
+    # scheduler_cfgs is a list of lists
+    for scheduler_cfg_group in scheduler_cfgs:
+        curr_cfg_group = []
+        # scheduler_cfg_group is a list of dictionaries
+        for scheduler_cfg in scheduler_cfg_group:
+            if scheduler_cfg["option"] != "lr":
+                curr_cfg_group.append(scheduler_cfg)
+                continue
+            # Need sorted so that the list of parameter names is deterministic and consistent
+            # across re-runs of this job. Else it was causing issues with loading the optimizer
+            # state during a job restart (D38591759)
+            parameter_names = sorted(scheduler_cfg["parameter_names"])
+
+            # Only want one cfg group per layer
+            layer_cfg_groups = {}
+            for param_name in parameter_names:
+                layer_id = num_layers
+                this_scale = layer_decays[layer_id]
+                if param_name.startswith(apply_to):
+                    layer_id = model.get_layer_id(param_name)
+                    this_scale = layer_decays[layer_id]
+                # Overrides
+                for override in overrides:
+                    if fnmatch.fnmatchcase(param_name, override["pattern"]):
+                        this_scale = float(override["value"])
+                        layer_id = override["pattern"]
+                        break
+
+                if layer_id not in layer_cfg_groups:
+                    curr_param = {
+                        "option": scheduler_cfg["option"],
+                        "scheduler": ValueScaler(scheduler_cfg["scheduler"], this_scale),
+                        "parameter_names": {param_name},
+                    }
+                else:
+                    curr_param = layer_cfg_groups[layer_id]
+                    curr_param["parameter_names"].add(param_name)
+                layer_cfg_groups[layer_id] = curr_param
+
+            for layer_cfg in layer_cfg_groups.values():
+                curr_cfg_group.append(layer_cfg)
+
+        final_scheduler_cfgs.append(curr_cfg_group)
+    return final_scheduler_cfgs
diff --git a/bboxmaskpose/sam2/training/scripts/sav_frame_extraction_submitit.py b/bboxmaskpose/sam2/training/scripts/sav_frame_extraction_submitit.py
new file mode 100644
index 0000000000000000000000000000000000000000..44e5345c47ae8878472e8cd9bb384ed3090ef45a
--- /dev/null
+++ b/bboxmaskpose/sam2/training/scripts/sav_frame_extraction_submitit.py
@@ -0,0 +1,155 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
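+
+# Example invocation (illustrative only; every path and SLURM setting below is
+# a hypothetical placeholder, not a project default):
+#
+#   python training/scripts/sav_frame_extraction_submitit.py \
+#       --sav-vid-dir /data/sa-v/videos \
+#       --output-dir /data/sa-v/frames \
+#       --sav-frame-sample-rate 4 \
+#       --n-jobs 64 --timeout 120 \
+#       --partition <partition> --account <account> --qos <qos> \
+#       --slurm-output-root-dir /checkpoints/slurm
+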
+import argparse
+import os
+from pathlib import Path
+
+import cv2
+import numpy as np
+import tqdm
+
+import submitit
+
+
+def get_args_parser():
+    parser = argparse.ArgumentParser(
+        description="[SA-V Preprocessing] Extracting JPEG frames",
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+    )
+
+    # ------------
+    # DATA
+    # ------------
+    data_parser = parser.add_argument_group(
+        title="SA-V dataset data root",
+        description="What data to load and how to process it.",
+    )
+    data_parser.add_argument(
+        "--sav-vid-dir",
+        type=str,
+        required=True,
+        help=("Where to find the SAV videos"),
+    )
+    data_parser.add_argument(
+        "--sav-frame-sample-rate",
+        type=int,
+        default=4,
+        help="Rate at which to sub-sample frames",
+    )
+
+    # ------------
+    # LAUNCH
+    # ------------
+    launch_parser = parser.add_argument_group(
+        title="Cluster launch settings",
+        description="Number of jobs and retry settings.",
+    )
+    launch_parser.add_argument(
+        "--n-jobs",
+        type=int,
+        required=True,
+        help="Shard the run over this many jobs.",
+    )
+    launch_parser.add_argument("--timeout", type=int, required=True, help="SLURM timeout parameter in minutes.")
+    launch_parser.add_argument("--partition", type=str, required=True, help="Partition to launch on.")
+    launch_parser.add_argument("--account", type=str, required=True, help="Account to launch with.")
+    launch_parser.add_argument("--qos", type=str, required=True, help="QOS.")
+
+    # ------------
+    # OUTPUT
+    # ------------
+    output_parser = parser.add_argument_group(title="Setting for results output", description="Where and how to save results.")
+    output_parser.add_argument(
+        "--output-dir",
+        type=str,
+        required=True,
+        help=("Where to dump the extracted jpeg frames"),
+    )
+    output_parser.add_argument(
+        "--slurm-output-root-dir",
+        type=str,
+        required=True,
+        help=("Where to save slurm outputs"),
+    )
+    return parser
+
+
+def decode_video(video_path: str):
+    assert os.path.exists(video_path)
+    video = cv2.VideoCapture(video_path)
+    video_frames = []
+    while video.isOpened():
+        ret, frame = video.read()
+        if ret:
+            video_frames.append(frame)
+        else:
+            break
+    video.release()  # free the decoder handle
+    return video_frames
+
+
+def extract_frames(video_path, sample_rate):
+    frames = decode_video(video_path)
+    return frames[::sample_rate]
+
+
+def submitit_launch(video_paths, sample_rate, save_root):
+    for path in tqdm.tqdm(video_paths):
+        frames = extract_frames(path, sample_rate)
+        output_folder = os.path.join(save_root, Path(path).stem)
+        if not os.path.exists(output_folder):
+            os.makedirs(output_folder)
+        for fid, frame in enumerate(frames):
+            frame_path = os.path.join(output_folder, f"{fid*sample_rate:05d}.jpg")
+            cv2.imwrite(frame_path, frame)
+    print(f"Saved output to {save_root}")
+
+
+if __name__ == "__main__":
+    parser = get_args_parser()
+    args = parser.parse_args()
+
+    sav_vid_dir = args.sav_vid_dir
+    save_root = args.output_dir
+    sample_rate = args.sav_frame_sample_rate
+
+    # List all SA-V videos
+    mp4_files = sorted([str(p) for p in Path(sav_vid_dir).glob("*/*.mp4")])
+    mp4_files = np.array(mp4_files)
+    chunked_mp4_files = [x.tolist() for x in np.array_split(mp4_files, args.n_jobs)]
+
+    print(f"Processing videos in: {sav_vid_dir}")
+    print(f"Processing {len(mp4_files)} files")
+    print(f"Beginning processing in {args.n_jobs} processes")
+
+    # Submitit params
+    jobs_dir = os.path.join(args.slurm_output_root_dir, "%j")
+    cpus_per_task = 4
+    executor = submitit.AutoExecutor(folder=jobs_dir)
+    executor.update_parameters(
+        timeout_min=args.timeout,
+        gpus_per_node=0,
+        tasks_per_node=1,
+        slurm_array_parallelism=args.n_jobs,
+        cpus_per_task=cpus_per_task,
+        slurm_partition=args.partition,
+        slurm_account=args.account,
+        slurm_qos=args.qos,
+    )
+    executor.update_parameters(slurm_srun_args=["-vv", "--cpu-bind", "none"])
+
+    # Launch
+    jobs = []
+    with executor.batch():
+        for mp4_chunk in tqdm.tqdm(chunked_mp4_files):
+            job = executor.submit(
+                submitit_launch,
+                video_paths=mp4_chunk,
+                sample_rate=sample_rate,
+                save_root=save_root,
+            )
+            jobs.append(job)
+
+    for j in jobs:
+        print(f"Slurm JobID: {j.job_id}")
+    print(f"Saving outputs to {save_root}")
+    print(f"Slurm outputs at {args.slurm_output_root_dir}")
diff --git a/bboxmaskpose/sam2/training/train.py b/bboxmaskpose/sam2/training/train.py
new file mode 100644
index 0000000000000000000000000000000000000000..34483ce88757cdc4dbf34cd55e0be9f80456cc58
--- /dev/null
+++ b/bboxmaskpose/sam2/training/train.py
@@ -0,0 +1,230 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+import logging
+import os
+import random
+import sys
+import traceback
+from argparse import ArgumentParser
+
+import torch
+
+import submitit
+from hydra import compose, initialize_config_module
+from hydra.utils import instantiate
+from iopath.common.file_io import g_pathmgr
+from omegaconf import OmegaConf
+from training.utils.train_utils import makedir, register_omegaconf_resolvers
+
+os.environ["HYDRA_FULL_ERROR"] = "1"
+
+
+def single_proc_run(local_rank, main_port, cfg, world_size):
+    """Single GPU process"""
+    os.environ["MASTER_ADDR"] = "localhost"
+    os.environ["MASTER_PORT"] = str(main_port)
+    os.environ["RANK"] = str(local_rank)
+    os.environ["LOCAL_RANK"] = str(local_rank)
+    os.environ["WORLD_SIZE"] = str(world_size)
+    try:
+        register_omegaconf_resolvers()
+    except Exception as e:
+        logging.info(e)
+
+    trainer = instantiate(cfg.trainer, _recursive_=False)
+    trainer.run()
+
+
+def single_node_runner(cfg, main_port: int):
+    assert cfg.launcher.num_nodes == 1
+    num_proc = cfg.launcher.gpus_per_node
+    torch.multiprocessing.set_start_method("spawn")  # CUDA runtime does not support `fork`
+    if num_proc == 1:
+        # directly call single_proc so we can easily set breakpoints
+        # mp.spawn does not let us set breakpoints
+        single_proc_run(local_rank=0, main_port=main_port, cfg=cfg, world_size=num_proc)
+    else:
+        mp_runner = torch.multiprocessing.start_processes
+        args = (main_port, cfg, num_proc)
+        # Note: "spawn" is used here to match `set_start_method` above, since the
+        # CUDA runtime does not support forking a process once CUDA has been
+        # initialized.
+        mp_runner(single_proc_run, args=args, nprocs=num_proc, start_method="spawn")
+
+
+def format_exception(e: Exception, limit=20):
+    traceback_str = "".join(traceback.format_tb(e.__traceback__, limit=limit))
+    return f"{type(e).__name__}: {e}\nTraceback:\n{traceback_str}"
+
+
+class SubmititRunner(submitit.helpers.Checkpointable):
+    """A callable which is passed to submitit to launch the jobs."""
+
+    def __init__(self, port, cfg):
+        self.cfg = cfg
+        self.port = port
+        self.has_setup = False
+
+    def run_trainer(self):
+        job_env = submitit.JobEnvironment()
+        # Need to add this again so the hydra.job.set_env PYTHONPATH
+        # is also set when launching jobs.
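+        # (Submitit re-executes this callable in a fresh Python process on each
+        # SLURM task, so `sys.path` changes made by the launching process are
+        # not inherited here.)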
+        add_pythonpath_to_sys_path()
+        os.environ["MASTER_ADDR"] = job_env.hostnames[0]
+        os.environ["MASTER_PORT"] = str(self.port)
+        os.environ["RANK"] = str(job_env.global_rank)
+        os.environ["LOCAL_RANK"] = str(job_env.local_rank)
+        os.environ["WORLD_SIZE"] = str(job_env.num_tasks)
+
+        register_omegaconf_resolvers()
+        cfg_resolved = OmegaConf.to_container(self.cfg, resolve=False)
+        cfg_resolved = OmegaConf.create(cfg_resolved)
+
+        trainer = instantiate(cfg_resolved.trainer, _recursive_=False)
+        trainer.run()
+
+    def __call__(self):
+        job_env = submitit.JobEnvironment()
+        self.setup_job_info(job_env.job_id, job_env.global_rank)
+        try:
+            self.run_trainer()
+        except Exception as e:
+            # Log the exception, then re-raise it.
+            message = format_exception(e)
+            logging.error(message)
+            raise e
+
+    def setup_job_info(self, job_id, rank):
+        """Set up slurm job info"""
+        self.job_info = {
+            "job_id": job_id,
+            "rank": rank,
+            "cluster": self.cfg.get("cluster", None),
+            "experiment_log_dir": self.cfg.launcher.experiment_log_dir,
+        }
+
+        self.has_setup = True
+
+
+def add_pythonpath_to_sys_path():
+    if "PYTHONPATH" not in os.environ or not os.environ["PYTHONPATH"]:
+        return
+    sys.path = os.environ["PYTHONPATH"].split(":") + sys.path
+
+
+def main(args) -> None:
+    cfg = compose(config_name=args.config)
+    if cfg.launcher.experiment_log_dir is None:
+        cfg.launcher.experiment_log_dir = os.path.join(os.getcwd(), "sam2_logs", args.config)
+    print("###################### Train App Config ####################")
+    print(OmegaConf.to_yaml(cfg))
+    print("############################################################")
+
+    add_pythonpath_to_sys_path()
+    makedir(cfg.launcher.experiment_log_dir)
+    with g_pathmgr.open(os.path.join(cfg.launcher.experiment_log_dir, "config.yaml"), "w") as f:
+        f.write(OmegaConf.to_yaml(cfg))
+
+    cfg_resolved = OmegaConf.to_container(cfg, resolve=False)
+    cfg_resolved = OmegaConf.create(cfg_resolved)
+
+    with g_pathmgr.open(os.path.join(cfg.launcher.experiment_log_dir, "config_resolved.yaml"), "w") as f:
+        f.write(OmegaConf.to_yaml(cfg_resolved, resolve=True))
+
+    submitit_conf = cfg.get("submitit", None)
+    assert submitit_conf is not None, "Missing submitit config"
+
+    submitit_dir = cfg.launcher.experiment_log_dir
+    submitit_dir = os.path.join(submitit_dir, "submitit_logs")
+    # Prioritize cmd line args
+    cfg.launcher.gpus_per_node = args.num_gpus if args.num_gpus is not None else cfg.launcher.gpus_per_node
+    cfg.launcher.num_nodes = args.num_nodes if args.num_nodes is not None else cfg.launcher.num_nodes
+    submitit_conf.use_cluster = args.use_cluster if args.use_cluster is not None else submitit_conf.use_cluster
+    if submitit_conf.use_cluster:
+        executor = submitit.AutoExecutor(folder=submitit_dir)
+        submitit_conf.partition = args.partition if args.partition is not None else submitit_conf.get("partition", None)
+        submitit_conf.account = args.account if args.account is not None else submitit_conf.get("account", None)
+        submitit_conf.qos = args.qos if args.qos is not None else submitit_conf.get("qos", None)
+        job_kwargs = {
+            "timeout_min": 60 * submitit_conf.timeout_hour,
+            "name": (submitit_conf.name if hasattr(submitit_conf, "name") else args.config),
+            "slurm_partition": submitit_conf.partition,
+            "gpus_per_node": cfg.launcher.gpus_per_node,
+            "tasks_per_node": cfg.launcher.gpus_per_node,  # one task per GPU
+            "cpus_per_task": submitit_conf.cpus_per_task,
+            "nodes": cfg.launcher.num_nodes,
+            "slurm_additional_parameters": {
+                "exclude": " 
".join(submitit_conf.get("exclude_nodes", [])), + }, + } + if "include_nodes" in submitit_conf: + assert len(submitit_conf["include_nodes"]) >= cfg.launcher.num_nodes, "Not enough nodes" + job_kwargs["slurm_additional_parameters"]["nodelist"] = " ".join(submitit_conf["include_nodes"]) + if submitit_conf.account is not None: + job_kwargs["slurm_additional_parameters"]["account"] = submitit_conf.account + if submitit_conf.qos is not None: + job_kwargs["slurm_additional_parameters"]["qos"] = submitit_conf.qos + + if submitit_conf.get("mem_gb", None) is not None: + job_kwargs["mem_gb"] = submitit_conf.mem_gb + elif submitit_conf.get("mem", None) is not None: + job_kwargs["slurm_mem"] = submitit_conf.mem + + if submitit_conf.get("constraints", None) is not None: + job_kwargs["slurm_constraint"] = submitit_conf.constraints + + if submitit_conf.get("comment", None) is not None: + job_kwargs["slurm_comment"] = submitit_conf.comment + + # Supports only cpu-bind option within srun_args. New options can be added here + if submitit_conf.get("srun_args", None) is not None: + job_kwargs["slurm_srun_args"] = [] + if submitit_conf.srun_args.get("cpu_bind", None) is not None: + job_kwargs["slurm_srun_args"].extend(["--cpu-bind", submitit_conf.srun_args.cpu_bind]) + + print("###################### SLURM Config ####################") + print(job_kwargs) + print("##########################################") + executor.update_parameters(**job_kwargs) + + main_port = random.randint(submitit_conf.port_range[0], submitit_conf.port_range[1]) + runner = SubmititRunner(main_port, cfg) + job = executor.submit(runner) + print(f"Submitit Job ID: {job.job_id}") + runner.setup_job_info(job.job_id, rank=0) + else: + cfg.launcher.num_nodes = 1 + main_port = random.randint(submitit_conf.port_range[0], submitit_conf.port_range[1]) + single_node_runner(cfg, main_port) + + +if __name__ == "__main__": + + initialize_config_module("sam2", version_base="1.2") + parser = ArgumentParser() + parser.add_argument( + "-c", + "--config", + required=True, + type=str, + help="path to config file (e.g. configs/sam2.1_training/sam2.1_hiera_b+_MOSE_finetune.yaml)", + ) + parser.add_argument( + "--use-cluster", + type=int, + default=None, + help="whether to launch on a cluster, 0: run locally, 1: run on a cluster", + ) + parser.add_argument("--partition", type=str, default=None, help="SLURM partition") + parser.add_argument("--account", type=str, default=None, help="SLURM account") + parser.add_argument("--qos", type=str, default=None, help="SLURM qos") + parser.add_argument("--num-gpus", type=int, default=None, help="number of GPUS per node") + parser.add_argument("--num-nodes", type=int, default=None, help="Number of nodes") + args = parser.parse_args() + args.use_cluster = bool(args.use_cluster) if args.use_cluster is not None else None + register_omegaconf_resolvers() + main(args) diff --git a/bboxmaskpose/sam2/training/trainer.py b/bboxmaskpose/sam2/training/trainer.py new file mode 100644 index 0000000000000000000000000000000000000000..9a8edb7fb6ce59e54074c5b82ef2ba96c9a6b49b --- /dev/null +++ b/bboxmaskpose/sam2/training/trainer.py @@ -0,0 +1,1038 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+ +import gc +import json +import logging +import math +import os +import time +from collections import OrderedDict +from dataclasses import dataclass, field +from typing import Any, Dict, List, Mapping, Optional + +import numpy as np +import torch +import torch.distributed as dist +import torch.nn as nn + +from hydra.utils import instantiate +from iopath.common.file_io import g_pathmgr +from training.optimizer import construct_optimizer +from training.utils.checkpoint_utils import ( + assert_skipped_parameters_are_frozen, + exclude_params_matching_unix_pattern, + load_state_dict_into_model, + with_check_parameter_frozen, +) +from training.utils.data_utils import BatchedVideoDatapoint +from training.utils.distributed import all_reduce_max, barrier, get_rank +from training.utils.logger import Logger, setup_logging +from training.utils.train_utils import ( + AverageMeter, + DurationMeter, + MemMeter, + Phase, + ProgressMeter, + collect_dict_keys, + get_amp_type, + get_machine_local_and_dist_rank, + get_resume_checkpoint, + human_readable_time, + is_dist_avail_and_initialized, + log_env_variables, + makedir, + set_seeds, + setup_distributed_backend, +) + +CORE_LOSS_KEY = "core_loss" + + +def unwrap_ddp_if_wrapped(model): + if isinstance(model, torch.nn.parallel.DistributedDataParallel): + return model.module + return model + + +@dataclass +class OptimAMPConf: + enabled: bool = False + amp_dtype: str = "float16" + + +@dataclass +class OptimConf: + optimizer: torch.optim.Optimizer = None + options: Optional[Dict[str, Any]] = None + param_group_modifiers: Optional[List] = None + amp: Optional[Dict[str, Any]] = None + gradient_clip: Any = None + gradient_logger: Any = None + + def __post_init__(self): + # amp + if not isinstance(self.amp, OptimAMPConf): + if self.amp is None: + self.amp = {} + assert isinstance(self.amp, Mapping) + self.amp = OptimAMPConf(**self.amp) + + +@dataclass +class DistributedConf: + backend: Optional[str] = None # inferred from accelerator type + comms_dtype: Optional[str] = None + find_unused_parameters: bool = False + timeout_mins: int = 30 + + +@dataclass +class CudaConf: + cudnn_deterministic: bool = False + cudnn_benchmark: bool = True + allow_tf32: bool = False + # if not None, `matmul_allow_tf32` key will override `allow_tf32` for matmul + matmul_allow_tf32: Optional[bool] = None + # if not None, `cudnn_allow_tf32` key will override `allow_tf32` for cudnn + cudnn_allow_tf32: Optional[bool] = None + + +@dataclass +class CheckpointConf: + save_dir: str + save_freq: int + save_list: List[int] = field(default_factory=list) + model_weight_initializer: Any = None + save_best_meters: List[str] = None + skip_saving_parameters: List[str] = field(default_factory=list) + initialize_after_preemption: Optional[bool] = None + # if not None, training will be resumed from this checkpoint + resume_from: Optional[str] = None + + def infer_missing(self): + if self.initialize_after_preemption is None: + with_skip_saving = len(self.skip_saving_parameters) > 0 + self.initialize_after_preemption = with_skip_saving + return self + + +@dataclass +class LoggingConf: + log_dir: str + log_freq: int # In iterations + tensorboard_writer: Any + log_level_primary: str = "INFO" + log_level_secondary: str = "ERROR" + log_scalar_frequency: int = 100 + log_visual_frequency: int = 100 + scalar_keys_to_log: Optional[Dict[str, Any]] = None + log_batch_stats: bool = False + + +class Trainer: + """ + Trainer supporting the DDP training strategies. 
+ """ + + EPSILON = 1e-8 + + def __init__( + self, + *, # the order of these args can change at any time, so they are keyword-only + data: Dict[str, Any], + model: Dict[str, Any], + logging: Dict[str, Any], + checkpoint: Dict[str, Any], + max_epochs: int, + mode: str = "train", + accelerator: str = "cuda", + seed_value: int = 123, + val_epoch_freq: int = 1, + distributed: Dict[str, bool] = None, + cuda: Dict[str, bool] = None, + env_variables: Optional[Dict[str, Any]] = None, + optim: Optional[Dict[str, Any]] = None, + optim_overrides: Optional[List[Dict[str, Any]]] = None, + meters: Optional[Dict[str, Any]] = None, + loss: Optional[Dict[str, Any]] = None, + ): + + self._setup_env_variables(env_variables) + self._setup_timers() + + self.data_conf = data + self.model_conf = model + self.logging_conf = LoggingConf(**logging) + self.checkpoint_conf = CheckpointConf(**checkpoint).infer_missing() + self.max_epochs = max_epochs + self.mode = mode + self.val_epoch_freq = val_epoch_freq + self.optim_conf = OptimConf(**optim) if optim is not None else None + self.meters_conf = meters + self.loss_conf = loss + distributed = DistributedConf(**distributed or {}) + cuda = CudaConf(**cuda or {}) + self.where = 0.0 + + self._infer_distributed_backend_if_none(distributed, accelerator) + + self._setup_device(accelerator) + + self._setup_torch_dist_and_backend(cuda, distributed) + + makedir(self.logging_conf.log_dir) + setup_logging( + __name__, + output_dir=self.logging_conf.log_dir, + rank=self.rank, + log_level_primary=self.logging_conf.log_level_primary, + log_level_secondary=self.logging_conf.log_level_secondary, + ) + + set_seeds(seed_value, self.max_epochs, self.distributed_rank) + log_env_variables() + + assert is_dist_avail_and_initialized(), "Torch distributed needs to be initialized before calling the trainer." + + self._setup_components() # Except Optimizer everything is setup here. + self._move_to_device() + self._construct_optimizers() + self._setup_dataloaders() + + self.time_elapsed_meter = DurationMeter("Time Elapsed", self.device, ":.2f") + + if self.checkpoint_conf.resume_from is not None: + assert os.path.exists( + self.checkpoint_conf.resume_from + ), f"The 'resume_from' checkpoint {self.checkpoint_conf.resume_from} does not exist!" + dst = os.path.join(self.checkpoint_conf.save_dir, "checkpoint.pt") + if self.distributed_rank == 0 and not os.path.exists(dst): + # Copy the "resume_from" checkpoint to the checkpoint folder + # if there is not a checkpoint to resume from already there + makedir(self.checkpoint_conf.save_dir) + g_pathmgr.copy(self.checkpoint_conf.resume_from, dst) + barrier() + + self.load_checkpoint() + self._setup_ddp_distributed_training(distributed, accelerator) + barrier() + + def _setup_timers(self): + """ + Initializes counters for elapsed time and eta. 
+ """ + self.start_time = time.time() + self.ckpt_time_elapsed = 0 + self.est_epoch_time = dict.fromkeys([Phase.TRAIN, Phase.VAL], 0) + + def _get_meters(self, phase_filters=None): + if self.meters is None: + return {} + meters = {} + for phase, phase_meters in self.meters.items(): + if phase_filters is not None and phase not in phase_filters: + continue + for key, key_meters in phase_meters.items(): + if key_meters is None: + continue + for name, meter in key_meters.items(): + meters[f"{phase}_{key}/{name}"] = meter + return meters + + def _infer_distributed_backend_if_none(self, distributed_conf, accelerator): + if distributed_conf.backend is None: + distributed_conf.backend = "nccl" if accelerator == "cuda" else "gloo" + + def _setup_env_variables(self, env_variables_conf) -> None: + if env_variables_conf is not None: + for variable_name, value in env_variables_conf.items(): + os.environ[variable_name] = value + + def _setup_torch_dist_and_backend(self, cuda_conf, distributed_conf) -> None: + if torch.cuda.is_available(): + torch.backends.cudnn.deterministic = cuda_conf.cudnn_deterministic + torch.backends.cudnn.benchmark = cuda_conf.cudnn_benchmark + torch.backends.cuda.matmul.allow_tf32 = ( + cuda_conf.matmul_allow_tf32 if cuda_conf.matmul_allow_tf32 is not None else cuda_conf.allow_tf32 + ) + torch.backends.cudnn.allow_tf32 = cuda_conf.cudnn_allow_tf32 if cuda_conf.cudnn_allow_tf32 is not None else cuda_conf.allow_tf32 + + self.rank = setup_distributed_backend(distributed_conf.backend, distributed_conf.timeout_mins) + + def _setup_device(self, accelerator): + self.local_rank, self.distributed_rank = get_machine_local_and_dist_rank() + if accelerator == "cuda": + self.device = torch.device("cuda", self.local_rank) + torch.cuda.set_device(self.local_rank) + elif accelerator == "cpu": + self.device = torch.device("cpu") + else: + raise ValueError(f"Unsupported accelerator: {accelerator}") + + def _setup_ddp_distributed_training(self, distributed_conf, accelerator): + + assert isinstance(self.model, torch.nn.Module) + + self.model = nn.parallel.DistributedDataParallel( + self.model, + device_ids=[self.local_rank] if accelerator == "cuda" else [], + find_unused_parameters=distributed_conf.find_unused_parameters, + ) + if distributed_conf.comms_dtype is not None: # noqa + from torch.distributed.algorithms import ddp_comm_hooks + + amp_type = get_amp_type(distributed_conf.comms_dtype) + if amp_type == torch.bfloat16: + hook = ddp_comm_hooks.default_hooks.bf16_compress_hook + logging.info("Enabling bfloat16 grad communication") + else: + hook = ddp_comm_hooks.default_hooks.fp16_compress_hook + logging.info("Enabling fp16 grad communication") + process_group = None + self.model.register_comm_hook(process_group, hook) + + def _move_to_device(self): + logging.info(f"Moving components to device {self.device} and local rank {self.local_rank}.") + + self.model.to(self.device) + + logging.info(f"Done moving components to device {self.device} and local rank {self.local_rank}.") + + def save_checkpoint(self, epoch, checkpoint_names=None): + checkpoint_folder = self.checkpoint_conf.save_dir + makedir(checkpoint_folder) + if checkpoint_names is None: + checkpoint_names = ["checkpoint"] + if (self.checkpoint_conf.save_freq > 0 and (int(epoch) % self.checkpoint_conf.save_freq == 0)) or int( + epoch + ) in self.checkpoint_conf.save_list: + checkpoint_names.append(f"checkpoint_{int(epoch)}") + + checkpoint_paths = [] + for ckpt_name in checkpoint_names: + 
checkpoint_paths.append(os.path.join(checkpoint_folder, f"{ckpt_name}.pt"))
+
+        state_dict = unwrap_ddp_if_wrapped(self.model).state_dict()
+        state_dict = exclude_params_matching_unix_pattern(patterns=self.checkpoint_conf.skip_saving_parameters, state_dict=state_dict)
+
+        checkpoint = {
+            "model": state_dict,
+            "optimizer": self.optim.optimizer.state_dict(),
+            "epoch": epoch,
+            "loss": self.loss.state_dict(),
+            "steps": self.steps,
+            "time_elapsed": self.time_elapsed_meter.val,
+            "best_meter_values": self.best_meter_values,
+        }
+        if self.optim_conf.amp.enabled:
+            checkpoint["scaler"] = self.scaler.state_dict()
+
+        # DDP checkpoints are only saved on rank 0 (all workers are identical)
+        if self.distributed_rank != 0:
+            return
+
+        for checkpoint_path in checkpoint_paths:
+            self._save_checkpoint(checkpoint, checkpoint_path)
+
+    def _save_checkpoint(self, checkpoint, checkpoint_path):
+        """
+        Save a checkpoint while guarding against the job being killed in the middle
+        of checkpoint saving (which corrupts the checkpoint file and ruins the
+        entire training since usually only the last checkpoint is kept per run).
+
+        We first save the new checkpoint to a temp file (with a '.tmp' suffix), and
+        then move it to overwrite the old checkpoint_path.
+        """
+        checkpoint_path_tmp = f"{checkpoint_path}.tmp"
+        with g_pathmgr.open(checkpoint_path_tmp, "wb") as f:
+            torch.save(checkpoint, f)
+        # after torch.save is completed, replace the old checkpoint with the new one
+        if g_pathmgr.exists(checkpoint_path):
+            # remove the old checkpoint_path file first (otherwise g_pathmgr.mv fails)
+            g_pathmgr.rm(checkpoint_path)
+        success = g_pathmgr.mv(checkpoint_path_tmp, checkpoint_path)
+        assert success
+
+    def load_checkpoint(self):
+        ckpt_path = get_resume_checkpoint(self.checkpoint_conf.save_dir)
+        if ckpt_path is None:
+            self._init_model_state()
+        else:
+            if self.checkpoint_conf.initialize_after_preemption:
+                self._call_model_initializer()
+            self._load_resuming_checkpoint(ckpt_path)
+
+    def _init_model_state(self):
+        # Checking that parameters that won't be saved are indeed frozen
+        # We do this check here before even saving the model to catch errors
+        # as early as possible and not at the end of the first epoch
+        assert_skipped_parameters_are_frozen(
+            patterns=self.checkpoint_conf.skip_saving_parameters,
+            model=self.model,
+        )
+
+        # Checking that parameters that won't be saved are initialized from
+        # within the model definition, unless `initialize_after_preemption`
+        # is explicitly set to `True`. 
If not, this is a bug, and after + # preemption, the `skip_saving_parameters` will have random values + allow_init_skip_parameters = self.checkpoint_conf.initialize_after_preemption + with with_check_parameter_frozen( + patterns=self.checkpoint_conf.skip_saving_parameters, + model=self.model, + disabled=allow_init_skip_parameters, + ): + self._call_model_initializer() + + def _call_model_initializer(self): + model_weight_initializer = instantiate(self.checkpoint_conf.model_weight_initializer) + if model_weight_initializer is not None: + logging.info(f"Loading pretrained checkpoint from {self.checkpoint_conf.model_weight_initializer}") + self.model = model_weight_initializer(model=self.model) + + def _load_resuming_checkpoint(self, ckpt_path: str): + logging.info(f"Resuming training from {ckpt_path}") + + with g_pathmgr.open(ckpt_path, "rb") as f: + checkpoint = torch.load(f, map_location="cpu") + load_state_dict_into_model( + model=self.model, + state_dict=checkpoint["model"], + ignore_missing_keys=self.checkpoint_conf.skip_saving_parameters, + ) + + self.optim.optimizer.load_state_dict(checkpoint["optimizer"]) + self.loss.load_state_dict(checkpoint["loss"], strict=True) + self.epoch = checkpoint["epoch"] + self.steps = checkpoint["steps"] + self.ckpt_time_elapsed = checkpoint.get("time_elapsed") + + if self.optim_conf.amp.enabled and "scaler" in checkpoint: + self.scaler.load_state_dict(checkpoint["scaler"]) + + self.best_meter_values = checkpoint.get("best_meter_values", {}) + + if "train_dataset" in checkpoint and self.train_dataset is not None: + self.train_dataset.load_checkpoint_state(checkpoint["train_dataset"]) + + def is_intermediate_val_epoch(self, epoch): + return epoch % self.val_epoch_freq == 0 and epoch < self.max_epochs - 1 + + def _step( + self, + batch: BatchedVideoDatapoint, + model: nn.Module, + phase: str, + ): + + outputs = model(batch) + targets = batch.masks + batch_size = len(batch.img_batch) + + key = batch.dict_key # key for dataset + loss = self.loss[key](outputs, targets) + loss_str = f"Losses/{phase}_{key}_loss" + + loss_log_str = os.path.join("Step_Losses", loss_str) + + # loss contains multiple sub-components we wish to log + step_losses = {} + if isinstance(loss, dict): + step_losses.update({f"Losses/{phase}_{key}_{k}": v for k, v in loss.items()}) + loss = self._log_loss_detailed_and_return_core_loss(loss, loss_log_str, self.steps[phase]) + + if self.steps[phase] % self.logging_conf.log_scalar_frequency == 0: + self.logger.log( + loss_log_str, + loss, + self.steps[phase], + ) + + self.steps[phase] += 1 + + ret_tuple = {loss_str: loss}, batch_size, step_losses + + if phase in self.meters and key in self.meters[phase]: + meters_dict = self.meters[phase][key] + if meters_dict is not None: + for _, meter in meters_dict.items(): + meter.update( + find_stages=outputs, + find_metadatas=batch.metadata, + ) + + return ret_tuple + + def run(self): + assert self.mode in ["train", "train_only", "val"] + if self.mode == "train": + if self.epoch > 0: + logging.info(f"Resuming training from epoch: {self.epoch}") + # resuming from a checkpoint + if self.is_intermediate_val_epoch(self.epoch - 1): + logging.info("Running previous val epoch") + self.epoch -= 1 + self.run_val() + self.epoch += 1 + self.run_train() + self.run_val() + elif self.mode == "val": + self.run_val() + elif self.mode == "train_only": + self.run_train() + + def _setup_dataloaders(self): + self.train_dataset = None + self.val_dataset = None + + if self.mode in ["train", "val"]: + self.val_dataset = 
instantiate(self.data_conf.get(Phase.VAL, None)) + + if self.mode in ["train", "train_only"]: + self.train_dataset = instantiate(self.data_conf.train) + + def run_train(self): + + while self.epoch < self.max_epochs: + dataloader = self.train_dataset.get_loader(epoch=int(self.epoch)) + barrier() + outs = self.train_epoch(dataloader) + self.logger.log_dict(outs, self.epoch) # Logged only on rank 0 + + # log train to text file. + if self.distributed_rank == 0: + with g_pathmgr.open( + os.path.join(self.logging_conf.log_dir, "train_stats.json"), + "a", + ) as f: + f.write(json.dumps(outs) + "\n") + + # Save checkpoint before validating + self.save_checkpoint(self.epoch + 1) + + del dataloader + gc.collect() + + # Run val, not running on last epoch since will run after the + # loop anyway + if self.is_intermediate_val_epoch(self.epoch): + self.run_val() + + if self.distributed_rank == 0: + self.best_meter_values.update(self._get_trainer_state("train")) + with g_pathmgr.open( + os.path.join(self.logging_conf.log_dir, "best_stats.json"), + "a", + ) as f: + f.write(json.dumps(self.best_meter_values) + "\n") + + self.epoch += 1 + # epoch was incremented in the loop but the val step runs out of the loop + self.epoch -= 1 + + def run_val(self): + if not self.val_dataset: + return + + dataloader = self.val_dataset.get_loader(epoch=int(self.epoch)) + outs = self.val_epoch(dataloader, phase=Phase.VAL) + del dataloader + gc.collect() + self.logger.log_dict(outs, self.epoch) # Logged only on rank 0 + + if self.distributed_rank == 0: + with g_pathmgr.open( + os.path.join(self.logging_conf.log_dir, "val_stats.json"), + "a", + ) as f: + f.write(json.dumps(outs) + "\n") + + def val_epoch(self, val_loader, phase): + batch_time = AverageMeter("Batch Time", self.device, ":.2f") + data_time = AverageMeter("Data Time", self.device, ":.2f") + mem = MemMeter("Mem (GB)", self.device, ":.2f") + + iters_per_epoch = len(val_loader) + + curr_phases = [phase] + curr_models = [self.model] + + loss_names = [] + for p in curr_phases: + for key in self.loss.keys(): + loss_names.append(f"Losses/{p}_{key}_loss") + + loss_mts = OrderedDict([(name, AverageMeter(name, self.device, ":.2e")) for name in loss_names]) + extra_loss_mts = {} + + for model in curr_models: + model.eval() + if hasattr(unwrap_ddp_if_wrapped(model), "on_validation_epoch_start"): + unwrap_ddp_if_wrapped(model).on_validation_epoch_start() + + progress = ProgressMeter( + iters_per_epoch, + [batch_time, data_time, mem, self.time_elapsed_meter, *loss_mts.values()], + self._get_meters(curr_phases), + prefix="Val Epoch: [{}]".format(self.epoch), + ) + + end = time.time() + + for data_iter, batch in enumerate(val_loader): + + # measure data loading time + data_time.update(time.time() - end) + + batch = batch.to(self.device, non_blocking=True) + + # compute output + with torch.no_grad(): + with torch.cuda.amp.autocast( + enabled=(self.optim_conf.amp.enabled if self.optim_conf else False), + dtype=(get_amp_type(self.optim_conf.amp.amp_dtype) if self.optim_conf else None), + ): + for phase, model in zip(curr_phases, curr_models): + loss_dict, batch_size, extra_losses = self._step( + batch, + model, + phase, + ) + + assert len(loss_dict) == 1 + loss_key, loss = loss_dict.popitem() + + loss_mts[loss_key].update(loss.item(), batch_size) + + for k, v in extra_losses.items(): + if k not in extra_loss_mts: + extra_loss_mts[k] = AverageMeter(k, self.device, ":.2e") + extra_loss_mts[k].update(v.item(), batch_size) + + # measure elapsed time + batch_time.update(time.time() - 
end) + end = time.time() + + self.time_elapsed_meter.update(time.time() - self.start_time + self.ckpt_time_elapsed) + + if torch.cuda.is_available(): + mem.update(reset_peak_usage=True) + + if data_iter % self.logging_conf.log_freq == 0: + progress.display(data_iter) + + if data_iter % self.logging_conf.log_scalar_frequency == 0: + # Log progress meters. + for progress_meter in progress.meters: + self.logger.log( + os.path.join("Step_Stats", phase, progress_meter.name), + progress_meter.val, + self.steps[Phase.VAL], + ) + + if data_iter % 10 == 0: + dist.barrier() + + self.est_epoch_time[phase] = batch_time.avg * iters_per_epoch + self._log_timers(phase) + for model in curr_models: + if hasattr(unwrap_ddp_if_wrapped(model), "on_validation_epoch_end"): + unwrap_ddp_if_wrapped(model).on_validation_epoch_end() + + out_dict = self._log_meters_and_save_best_ckpts(curr_phases) + + for k, v in loss_mts.items(): + out_dict[k] = v.avg + for k, v in extra_loss_mts.items(): + out_dict[k] = v.avg + + for phase in curr_phases: + out_dict.update(self._get_trainer_state(phase)) + self._reset_meters(curr_phases) + logging.info(f"Meters: {out_dict}") + + return out_dict + + def _get_trainer_state(self, phase): + return { + "Trainer/where": self.where, + "Trainer/epoch": self.epoch, + f"Trainer/steps_{phase}": self.steps[phase], + } + + def train_epoch(self, train_loader): + + # Init stat meters + batch_time_meter = AverageMeter("Batch Time", self.device, ":.2f") + data_time_meter = AverageMeter("Data Time", self.device, ":.2f") + mem_meter = MemMeter("Mem (GB)", self.device, ":.2f") + data_times = [] + phase = Phase.TRAIN + + iters_per_epoch = len(train_loader) + + loss_names = [] + for batch_key in self.loss.keys(): + loss_names.append(f"Losses/{phase}_{batch_key}_loss") + + loss_mts = OrderedDict([(name, AverageMeter(name, self.device, ":.2e")) for name in loss_names]) + extra_loss_mts = {} + + progress = ProgressMeter( + iters_per_epoch, + [ + batch_time_meter, + data_time_meter, + mem_meter, + self.time_elapsed_meter, + *loss_mts.values(), + ], + self._get_meters([phase]), + prefix="Train Epoch: [{}]".format(self.epoch), + ) + + # Model training loop + self.model.train() + end = time.time() + + for data_iter, batch in enumerate(train_loader): + # measure data loading time + data_time_meter.update(time.time() - end) + data_times.append(data_time_meter.val) + batch = batch.to(self.device, non_blocking=True) # move tensors in a tensorclass + + try: + self._run_step(batch, phase, loss_mts, extra_loss_mts) + + # compute gradient and do optim step + exact_epoch = self.epoch + float(data_iter) / iters_per_epoch + self.where = float(exact_epoch) / self.max_epochs + assert self.where <= 1 + self.EPSILON + if self.where < 1.0: + self.optim.step_schedulers(self.where, step=int(exact_epoch * iters_per_epoch)) + else: + logging.warning(f"Skipping scheduler update since the training is at the end, i.e, {self.where} of [0,1].") + + # Log schedulers + if data_iter % self.logging_conf.log_scalar_frequency == 0: + for j, param_group in enumerate(self.optim.optimizer.param_groups): + for option in self.optim.schedulers[j]: + optim_prefix = "" + f"{j}_" if len(self.optim.optimizer.param_groups) > 1 else "" + self.logger.log( + os.path.join("Optim", f"{optim_prefix}", option), + param_group[option], + self.steps[phase], + ) + + # Clipping gradients and detecting diverging gradients + if self.gradient_clipper is not None: + self.scaler.unscale_(self.optim.optimizer) + self.gradient_clipper(model=self.model) + + if 
self.gradient_logger is not None: + self.gradient_logger(self.model, rank=self.distributed_rank, where=self.where) + + # Optimizer step: the scaler will make sure gradients are not + # applied if the gradients are infinite + self.scaler.step(self.optim.optimizer) + self.scaler.update() + + # measure elapsed time + batch_time_meter.update(time.time() - end) + end = time.time() + + self.time_elapsed_meter.update(time.time() - self.start_time + self.ckpt_time_elapsed) + + mem_meter.update(reset_peak_usage=True) + if data_iter % self.logging_conf.log_freq == 0: + progress.display(data_iter) + + if data_iter % self.logging_conf.log_scalar_frequency == 0: + # Log progress meters. + for progress_meter in progress.meters: + self.logger.log( + os.path.join("Step_Stats", phase, progress_meter.name), + progress_meter.val, + self.steps[phase], + ) + + # Catching NaN/Inf errors in the loss + except FloatingPointError as e: + raise e + + self.est_epoch_time[Phase.TRAIN] = batch_time_meter.avg * iters_per_epoch + self._log_timers(Phase.TRAIN) + self._log_sync_data_times(Phase.TRAIN, data_times) + + out_dict = self._log_meters_and_save_best_ckpts([Phase.TRAIN]) + + for k, v in loss_mts.items(): + out_dict[k] = v.avg + for k, v in extra_loss_mts.items(): + out_dict[k] = v.avg + out_dict.update(self._get_trainer_state(phase)) + logging.info(f"Losses and meters: {out_dict}") + self._reset_meters([phase]) + + os.system("bash epoch_evaluation.sh") ## CHANGED - OUTPUT AP 50:95 XTCOCOTOOLS EVALUATION TO JSON + + return out_dict + + def _log_sync_data_times(self, phase, data_times): + data_times = all_reduce_max(torch.tensor(data_times)).tolist() + steps = range(self.steps[phase] - len(data_times), self.steps[phase]) + for step, data_time in zip(steps, data_times): + if step % self.logging_conf.log_scalar_frequency == 0: + self.logger.log( + os.path.join("Step_Stats", phase, "Data Time Synced"), + data_time, + step, + ) + + def _run_step( + self, + batch: BatchedVideoDatapoint, + phase: str, + loss_mts: Dict[str, AverageMeter], + extra_loss_mts: Dict[str, AverageMeter], + raise_on_error: bool = True, + ): + """ + Run the forward / backward + """ + + # it's important to set grads to None, especially with Adam since 0 + # grads will also update a model even if the step doesn't produce + # gradients + self.optim.zero_grad(set_to_none=True) + with torch.cuda.amp.autocast( + enabled=self.optim_conf.amp.enabled, + dtype=get_amp_type(self.optim_conf.amp.amp_dtype), + ): + loss_dict, batch_size, extra_losses = self._step( + batch, + self.model, + phase, + ) + + assert len(loss_dict) == 1 + loss_key, loss = loss_dict.popitem() + + if not math.isfinite(loss.item()): + error_msg = f"Loss is {loss.item()}, attempting to stop training" + logging.error(error_msg) + if raise_on_error: + raise FloatingPointError(error_msg) + else: + return + + self.scaler.scale(loss).backward() + loss_mts[loss_key].update(loss.item(), batch_size) + for extra_loss_key, extra_loss in extra_losses.items(): + if extra_loss_key not in extra_loss_mts: + extra_loss_mts[extra_loss_key] = AverageMeter(extra_loss_key, self.device, ":.2e") + extra_loss_mts[extra_loss_key].update(extra_loss.item(), batch_size) + + def _log_meters_and_save_best_ckpts(self, phases: List[str]): + logging.info("Synchronizing meters") + out_dict = {} + checkpoint_save_keys = [] + for key, meter in self._get_meters(phases).items(): + meter_output = meter.compute_synced() + is_better_check = getattr(meter, "is_better", None) + + for meter_subkey, meter_value in 
meter_output.items(): + out_dict[os.path.join("Meters_train", key, meter_subkey)] = meter_value + + if is_better_check is None: + continue + + tracked_meter_key = os.path.join(key, meter_subkey) + if tracked_meter_key not in self.best_meter_values or is_better_check( + meter_value, + self.best_meter_values[tracked_meter_key], + ): + self.best_meter_values[tracked_meter_key] = meter_value + + if self.checkpoint_conf.save_best_meters is not None and key in self.checkpoint_conf.save_best_meters: + checkpoint_save_keys.append(tracked_meter_key.replace("/", "_")) + + if len(checkpoint_save_keys) > 0: + self.save_checkpoint(self.epoch + 1, checkpoint_save_keys) + + return out_dict + + def _log_timers(self, phase): + time_remaining = 0 + epochs_remaining = self.max_epochs - self.epoch - 1 + val_epochs_remaining = sum(n % self.val_epoch_freq == 0 for n in range(self.epoch, self.max_epochs)) + + # Adding the guaranteed val run at the end if val_epoch_freq doesn't coincide with + # the end epoch. + if (self.max_epochs - 1) % self.val_epoch_freq != 0: + val_epochs_remaining += 1 + + # Remove the current val run from estimate + if phase == Phase.VAL: + val_epochs_remaining -= 1 + + time_remaining += epochs_remaining * self.est_epoch_time[Phase.TRAIN] + val_epochs_remaining * self.est_epoch_time[Phase.VAL] + + self.logger.log( + os.path.join("Step_Stats", phase, self.time_elapsed_meter.name), + self.time_elapsed_meter.val, + self.steps[phase], + ) + + logging.info(f"Estimated time remaining: {human_readable_time(time_remaining)}") + + def _reset_meters(self, phases: str) -> None: + for meter in self._get_meters(phases).values(): + meter.reset() + + def _check_val_key_match(self, val_keys, phase): + if val_keys is not None: + # Check if there are any duplicates + assert len(val_keys) == len(set(val_keys)), f"Duplicate keys in val datasets, keys: {val_keys}" + + # Check that the keys match the meter keys + if self.meters_conf is not None and phase in self.meters_conf: + assert set(val_keys) == set(self.meters_conf[phase].keys()), ( + f"Keys in val datasets do not match the keys in meters." + f"\nMissing in meters: {set(val_keys) - set(self.meters_conf[phase].keys())}" + f"\nMissing in val datasets: {set(self.meters_conf[phase].keys()) - set(val_keys)}" + ) + + if self.loss_conf is not None: + loss_keys = set(self.loss_conf.keys()) - set(["all"]) + assert all([k in loss_keys for k in val_keys]), ( + f"Keys in val datasets do not match the keys in losses." 
+ f"\nMissing in losses: {set(val_keys) - loss_keys}" + f"\nMissing in val datasets: {loss_keys - set(val_keys)}" + ) + + def _setup_components(self): + + # Get the keys for all the val datasets, if any + val_phase = Phase.VAL + val_keys = None + if self.data_conf.get(val_phase, None) is not None: + val_keys = collect_dict_keys(self.data_conf[val_phase]) + # Additional checks on the sanity of the config for val datasets + self._check_val_key_match(val_keys, phase=val_phase) + + logging.info("Setting up components: Model, loss, optim, meters etc.") + self.epoch = 0 + self.steps = {Phase.TRAIN: 0, Phase.VAL: 0} + + self.logger = Logger(self.logging_conf) + + self.model = instantiate(self.model_conf, _convert_="all") + print_model_summary(self.model) + + self.loss = None + if self.loss_conf: + self.loss = {key: el for (key, el) in instantiate(self.loss_conf, _convert_="all").items()} # wrap_base_loss(el) + self.loss = nn.ModuleDict(self.loss) + + self.meters = {} + self.best_meter_values = {} + if self.meters_conf: + self.meters = instantiate(self.meters_conf, _convert_="all") + + self.scaler = torch.amp.GradScaler( + self.device, + enabled=self.optim_conf.amp.enabled if self.optim_conf else False, + ) + + self.gradient_clipper = instantiate(self.optim_conf.gradient_clip) if self.optim_conf else None + self.gradient_logger = instantiate(self.optim_conf.gradient_logger) if self.optim_conf else None + + logging.info("Finished setting up components: Model, loss, optim, meters etc.") + + def _construct_optimizers(self): + self.optim = construct_optimizer( + self.model, + self.optim_conf.optimizer, + self.optim_conf.options, + self.optim_conf.param_group_modifiers, + ) + + def _log_loss_detailed_and_return_core_loss(self, loss, loss_str, step): + core_loss = loss.pop(CORE_LOSS_KEY) + if step % self.logging_conf.log_scalar_frequency == 0: + for k in loss: + log_str = os.path.join(loss_str, k) + self.logger.log(log_str, loss[k], step) + return core_loss + + +def print_model_summary(model: torch.nn.Module, log_dir: str = ""): + """ + Prints the model and the number of parameters in the model. + # Multiple packages provide this info in a nice table format + # However, they need us to provide an `input` (as they also write down the output sizes) + # Our models are complex, and a single input is restrictive. + # https://github.com/sksq96/pytorch-summary + # https://github.com/nmhkahn/torchsummaryX + """ + if get_rank() != 0: + return + param_kwargs = {} + trainable_parameters = sum(p.numel() for p in model.parameters(**param_kwargs) if p.requires_grad) + total_parameters = sum(p.numel() for p in model.parameters(**param_kwargs)) + non_trainable_parameters = total_parameters - trainable_parameters + logging.info("==" * 10) + logging.info(f"Summary for model {type(model)}") + logging.info(f"Model is {model}") + logging.info(f"\tTotal parameters {get_human_readable_count(total_parameters)}") + logging.info(f"\tTrainable parameters {get_human_readable_count(trainable_parameters)}") + logging.info(f"\tNon-Trainable parameters {get_human_readable_count(non_trainable_parameters)}") + logging.info("==" * 10) + + if log_dir: + output_fpath = os.path.join(log_dir, "model.txt") + with g_pathmgr.open(output_fpath, "w") as f: + print(model, file=f) + + +PARAMETER_NUM_UNITS = [" ", "K", "M", "B", "T"] + + +def get_human_readable_count(number: int) -> str: + """ + Abbreviates an integer number with K, M, B, T for thousands, millions, + billions and trillions, respectively. 
+ Examples: + >>> get_human_readable_count(123) + '123 ' + >>> get_human_readable_count(1234) # (one thousand) + '1.2 K' + >>> get_human_readable_count(2e6) # (two million) + '2.0 M' + >>> get_human_readable_count(3e9) # (three billion) + '3.0 B' + >>> get_human_readable_count(4e14) # (four hundred trillion) + '400 T' + >>> get_human_readable_count(5e15) # (more than trillion) + '5,000 T' + Args: + number: a positive integer number + Return: + A string formatted according to the pattern described above. + """ + assert number >= 0 + labels = PARAMETER_NUM_UNITS + num_digits = int(np.floor(np.log10(number)) + 1 if number > 0 else 1) + num_groups = int(np.ceil(num_digits / 3)) + num_groups = min(num_groups, len(labels)) # don't abbreviate beyond trillions + shift = -3 * (num_groups - 1) + number = number * (10**shift) + index = num_groups - 1 + if index < 1 or number >= 100: + return f"{int(number):,d} {labels[index]}" + else: + return f"{number:,.1f} {labels[index]}" diff --git a/bboxmaskpose/sam2/training/utils/__init__.py b/bboxmaskpose/sam2/training/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5277f46157403e47fd830fc519144b97ef69d4ae --- /dev/null +++ b/bboxmaskpose/sam2/training/utils/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. diff --git a/bboxmaskpose/sam2/training/utils/checkpoint_utils.py b/bboxmaskpose/sam2/training/utils/checkpoint_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..8d7847cd05d2bd9d049ec839851f70b966851a28 --- /dev/null +++ b/bboxmaskpose/sam2/training/utils/checkpoint_utils.py @@ -0,0 +1,308 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +import contextlib +import fnmatch +import logging +from typing import Any, Callable, Dict, List, Mapping, Optional, Sequence, Set, Tuple, Union + +import numpy as np +import torch +import torch.nn as nn +from torch.jit._script import RecursiveScriptModule + +from iopath.common.file_io import g_pathmgr + + +def unix_pattern_to_parameter_names(constraints: List[str], all_parameter_names: Sequence[str]) -> Union[None, Set[str]]: + """ + Go through the list of parameter names and select those that match + any of the provided constraints + """ + parameter_names = [] + for param_name in constraints: + matching_parameters = set(fnmatch.filter(all_parameter_names, param_name)) + assert len(matching_parameters) > 0, f"param_names {param_name} don't match any param in the given names." 
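+         # e.g. (illustrative) a pattern like "decoder.*" collects "decoder.weight",
+         # "decoder.bias", etc.; the union over all patterns is returned below.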
+ parameter_names.append(matching_parameters) + return set.union(*parameter_names) + + +def filter_params_matching_unix_pattern(patterns: List[str], state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]: + """ + Remove from the state dictionary the parameters matching the provided unix patterns + + Args: + patterns: the list of unix patterns to exclude + state_dict: the dictionary to filter + + Returns: + A new state dictionary + """ + if len(patterns) == 0: + return {} + + all_keys = list(state_dict.keys()) + included_keys = unix_pattern_to_parameter_names(patterns, all_keys) + return {k: state_dict[k] for k in included_keys} + + +def exclude_params_matching_unix_pattern(patterns: List[str], state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]: + """ + Remove from the state dictionary the parameters matching the provided unix patterns + + Args: + patterns: the list of unix patterns to exclude + state_dict: the dictionary to filter + + Returns: + A new state dictionary + """ + if len(patterns) == 0: + return state_dict + + all_keys = list(state_dict.keys()) + excluded_keys = unix_pattern_to_parameter_names(patterns, all_keys) + return {k: v for k, v in state_dict.items() if k not in excluded_keys} + + +def _get_state_dict_summary(state_dict: Dict[str, torch.Tensor]): + keys = [] + trace = [] + for k, v in state_dict.items(): + keys.append(k) + trace.append(v.sum().item()) + trace = np.array(trace)[np.argsort(keys)] + return trace + + +def assert_skipped_parameters_are_frozen(model: nn.Module, patterns: List[str]): + """ + Verifies that all the parameters matching the provided patterns + are frozen - this acts as a safeguard when ignoring parameter + when saving checkpoints - if the parameters are in fact trainable + """ + if not patterns: + return + + frozen_state_dict = filter_params_matching_unix_pattern(patterns=patterns, state_dict=model.state_dict()) + non_frozen_keys = {n for n, p in model.named_parameters() if n in frozen_state_dict and p.requires_grad} + if non_frozen_keys: + raise ValueError(f"Parameters excluded with `skip_saving_parameters` should be frozen: {non_frozen_keys}") + + +@contextlib.contextmanager +def with_check_parameter_frozen(model: nn.Module, patterns: List[str], disabled: bool = True): + """ + Context manager that inspects a model surrounding a piece of code + and verifies if the model has been updated by this piece of code + + The function will raise an exception if the model has been updated + on at least one of the parameter that matches one of the pattern + + Args: + model: the model that might have been updated + patterns: for the parameters we want to observe + allowed: + """ + if not patterns or disabled: + yield + return + + frozen_state_dict = filter_params_matching_unix_pattern(patterns=patterns, state_dict=model.state_dict()) + summary_before = _get_state_dict_summary(frozen_state_dict) + + yield + + frozen_state_dict = filter_params_matching_unix_pattern(patterns=patterns, state_dict=model.state_dict()) + summary_after = _get_state_dict_summary(frozen_state_dict) + + if not np.allclose(summary_before, summary_after, atol=1e-6): + raise ValueError( + f""" + The `model_weight_initializer` has initialized parameters frozen with `skip_saving_parameters`. + You can resolve this error by either initializing those parameters from within the model definition + or using the flag `trainer.checkpoint.initialize_after_preemption` to True. 
+             """
+         )
+
+
+ class CkptExcludeKernel:
+     """
+     Removes the keys from the given model state_dict that match the key_pattern.
+
+     Args:
+         key_pattern: Patterns used to select the keys in the state_dict
+             that are eligible for this kernel.
+     """
+
+     def __init__(self, key_pattern: List[str]):
+         self.key_pattern = key_pattern
+
+     def __call__(self, state_dict: Dict):
+         """
+         Args:
+             state_dict: A dictionary representing the given checkpoint's state dict.
+         """
+         if len(self.key_pattern) == 0:
+             return state_dict
+         exclude_keys = unix_pattern_to_parameter_names(self.key_pattern, state_dict.keys())
+         return {k: v for k, v in state_dict.items() if k not in exclude_keys}
+
+
+ def load_checkpoint(
+     path_list: List[str],
+     pick_recursive_keys: Optional[List[str]] = None,
+     map_location: str = "cpu",
+ ) -> Any:
+     """
+     Loads a checkpoint from the specified path.
+
+     Args:
+         path_list: A list of paths which contain the checkpoint. Each element
+             is tried (in order) until a file that exists is found. That file is then
+             used to read the checkpoint.
+         pick_recursive_keys: Picks sub dicts from the loaded checkpoint if not None.
+             For pick_recursive_keys = ["a", "b"], will return checkpoint_dict["a"]["b"]
+         map_location (str): a function, torch.device, string or a dict specifying how to
+             remap storage locations
+
+     Returns: The loaded checkpoint (or the sub-dict selected via pick_recursive_keys).
+     """
+     path_exists = False
+     for path in path_list:
+         if g_pathmgr.exists(path):
+             path_exists = True
+             break
+
+     if not path_exists:
+         raise ValueError(f"No path exists in {path_list}")
+
+     with g_pathmgr.open(path, "rb") as f:
+         checkpoint = torch.load(f, map_location=map_location)
+
+     logging.info(f"Loaded checkpoint from {path}")
+     if pick_recursive_keys is not None:
+         for key in pick_recursive_keys:
+             checkpoint = checkpoint[key]
+     return checkpoint
+
+
+ def get_state_dict(checkpoint, ckpt_state_dict_keys):
+     if isinstance(checkpoint, RecursiveScriptModule):
+         # This is a torchscript JIT model
+         return checkpoint.state_dict()
+     pre_train_dict = checkpoint
+     for i, key in enumerate(ckpt_state_dict_keys):
+         if (isinstance(pre_train_dict, Mapping) and key not in pre_train_dict) or (
+             isinstance(pre_train_dict, Sequence) and key >= len(pre_train_dict)
+         ):
+             key_str = '["' + '"]["'.join(map(str, ckpt_state_dict_keys[:i])) + '"]'
+             raise KeyError(f"'{key}' not found in checkpoint{key_str} " f"with keys: {pre_train_dict.keys()}")
+         pre_train_dict = pre_train_dict[key]
+     return pre_train_dict
+
+
+ def load_checkpoint_and_apply_kernels(
+     checkpoint_path: str,
+     checkpoint_kernels: List[Callable] = None,
+     ckpt_state_dict_keys: Tuple[str, ...] = ("state_dict",),
+     map_location: str = "cpu",
+ ) -> Dict[str, Any]:
+     """
+     Performs checkpoint loading with a variety of pre-processing kernels applied in
+     sequence.
+
+     Args:
+         checkpoint_path (str): Path to the checkpoint.
+         checkpoint_kernels List(Callable): A list of checkpoint processing kernels
+             to apply in the specified order. Supported kernels include `CkptIncludeKernel`,
+             `CkptExcludeKernel`, etc. These kernels are applied in the
+             given order.
+         ckpt_state_dict_keys (Tuple[str, ...]): Keys containing the model state dict.
+         map_location (str): a function, torch.device, string or a dict specifying how to
+             remap storage locations
+
+     Returns: The pre-processed state dict with the matching pre-trained weights.
+     """
+     assert g_pathmgr.exists(checkpoint_path), "Checkpoint '{}' not found".format(checkpoint_path)
+
+     # Load the checkpoint on CPU to avoid GPU mem spike.
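+     # Illustrative usage (hypothetical path and pattern):
+     #   sd = load_checkpoint_and_apply_kernels(
+     #       "pretrained.pt", [CkptExcludeKernel(["head.*"])], ("state_dict",))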
+ with g_pathmgr.open(checkpoint_path, "rb") as f: + checkpoint = torch.load(f, map_location=map_location) + + pre_train_dict = get_state_dict(checkpoint, ckpt_state_dict_keys) + + # Not logging into info etc since it's a huge log + logging.debug("Loaded Checkpoint State Dict pre-kernel application: %s" % str(", ".join(list(pre_train_dict.keys())))) + # Apply kernels + if checkpoint_kernels is not None: + for f in checkpoint_kernels: + pre_train_dict = f(state_dict=pre_train_dict) + + logging.debug("Loaded Checkpoint State Dict Post-kernel application %s" % str(", ".join(list(pre_train_dict.keys())))) + + return pre_train_dict + + +def check_load_state_dict_errors( + missing_keys, + unexpected_keys, + strict: bool, + ignore_missing_keys: List[str] = None, + ignore_unexpected_keys: List[str] = None, +): + if ignore_missing_keys is not None and len(ignore_missing_keys) > 0: + ignored_keys = unix_pattern_to_parameter_names(ignore_missing_keys, missing_keys) + missing_keys = [key for key in missing_keys if key not in ignored_keys] + + if ignore_unexpected_keys is not None and len(ignore_unexpected_keys) > 0: + ignored_unexpected_keys = unix_pattern_to_parameter_names(ignore_unexpected_keys, unexpected_keys) + unexpected_keys = [key for key in unexpected_keys if key not in ignored_unexpected_keys] + + err = "State key mismatch." + if unexpected_keys: + err += f" Unexpected keys: {unexpected_keys}." + if missing_keys: + err += f" Missing keys: {missing_keys}." + + if unexpected_keys or missing_keys: + logging.warning(err) + # if unexpected_keys or strict: + # raise KeyError(err) + + +def load_state_dict_into_model( + state_dict: Dict, + model: nn.Module, + strict: bool = True, + ignore_missing_keys: List[str] = None, + ignore_unexpected_keys: List[str] = None, + checkpoint_kernels: List[Callable] = None, +): + """ + Loads a state dict into the given model. + + Args: + state_dict: A dictionary containing the model's + state dict, or a subset if strict is False + model: Model to load the checkpoint weights into + strict: raise if the state_dict has missing state keys + ignore_missing_keys: unix pattern of keys to ignore + """ + # Apply kernels + if checkpoint_kernels is not None: + for f in checkpoint_kernels: + state_dict = f(state_dict=state_dict) + missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False) + + check_load_state_dict_errors( + missing_keys, + unexpected_keys, + strict=strict, + ignore_missing_keys=ignore_missing_keys, + ignore_unexpected_keys=ignore_unexpected_keys, + ) + return model diff --git a/bboxmaskpose/sam2/training/utils/data_utils.py b/bboxmaskpose/sam2/training/utils/data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..6c8ab502edc33c4df3153a1f0cc2c43c538b0f97 --- /dev/null +++ b/bboxmaskpose/sam2/training/utils/data_utils.py @@ -0,0 +1,176 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +""" +Misc functions, including distributed helpers. + +Mostly copy-paste from torchvision references. +""" + +from dataclasses import dataclass +from typing import List, Optional, Tuple, Union + +import torch +from PIL import Image as PILImage + +from tensordict import tensorclass + + +@tensorclass +class BatchedVideoMetaData: + """ + This class represents metadata about a batch of videos. 
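+     One row corresponds to one annotated object; `collate_fn` below shows how
+     these tensors are assembled.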
+ Attributes: + unique_objects_identifier: A tensor of shape Bx3 containing unique identifiers for each object in the batch. Index consists of (video_id, obj_id, frame_id) + frame_orig_size: A tensor of shape Bx2 containing the original size of each frame in the batch. + """ + + unique_objects_identifier: torch.LongTensor + frame_orig_size: torch.LongTensor + + +@tensorclass +class BatchedVideoDatapoint: + """ + This class represents a batch of videos with associated annotations and metadata. + Attributes: + img_batch: A [TxBxCxHxW] tensor containing the image data for each frame in the batch, where T is the number of frames per video, and B is the number of videos in the batch. + obj_to_frame_idx: A [TxOx2] tensor containing the image_batch index which the object belongs to. O is the number of objects in the batch. + masks: A [TxOxHxW] tensor containing binary masks for each object in the batch. + metadata: An instance of BatchedVideoMetaData containing metadata about the batch. + dict_key: A string key used to identify the batch. + """ + + img_batch: torch.FloatTensor + obj_to_frame_idx: torch.IntTensor + masks: torch.BoolTensor + metadata: BatchedVideoMetaData + + points: torch.Tensor + labels: torch.Tensor + + dict_key: str + + def pin_memory(self, device=None): + return self.apply(torch.Tensor.pin_memory, device=device) + + @property + def num_frames(self) -> int: + """ + Returns the number of frames per video. + """ + return self.batch_size[0] + + @property + def num_videos(self) -> int: + """ + Returns the number of videos in the batch. + """ + return self.img_batch.shape[1] + + @property + def flat_obj_to_img_idx(self) -> torch.IntTensor: + """ + Returns a flattened tensor containing the object to img index. + The flat index can be used to access a flattened img_batch of shape [(T*B)xCxHxW] + """ + frame_idx, video_idx = self.obj_to_frame_idx.unbind(dim=-1) + flat_idx = video_idx * self.num_frames + frame_idx + return flat_idx + + @property + def flat_img_batch(self) -> torch.FloatTensor: + """ + Returns a flattened img_batch_tensor of shape [(B*T)xCxHxW] + """ + + return self.img_batch.transpose(0, 1).flatten(0, 1) + + +@dataclass +class Object: + # Id of the object in the media + object_id: int + # Index of the frame in the media (0 if single image) + frame_index: int + segment: list # Union[torch.Tensor, dict] # RLE dict or binary mask + + +@dataclass +class Frame: + data: Union[torch.Tensor, PILImage.Image] + objects: List[Object] + + +@dataclass +class VideoDatapoint: + """Refers to an image/video and all its annotations""" + + frames: List[Frame] + video_id: int + size: Tuple[int, int] + + +def collate_fn( + batch: List[VideoDatapoint], + dict_key, +) -> BatchedVideoDatapoint: + """ + Args: + batch: A list of VideoDatapoint instances. + dict_key (str): A string key used to identify the batch. + """ + img_batch = [] + for video in batch: + img_batch += [torch.stack([frame.data for frame in video.frames], dim=0)] + + img_batch = torch.stack(img_batch, dim=0).permute((1, 0, 2, 3, 4)) + T = img_batch.shape[0] + # Prepare data structures for sequential processing. Per-frame processing but batched across videos. 
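+     # Frame-major layout: entry t of each list gathers, across all videos in the
+     # batch, the objects present at frame t, so frame t of every video can be
+     # processed in one batched step.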
+ step_t_objects_identifier = [[] for _ in range(T)] + step_t_frame_orig_size = [[] for _ in range(T)] + + step_t_masks = [[] for _ in range(T)] + points = [[] for _ in range(T)] + labels = [[] for _ in range(T)] + step_t_obj_to_frame_idx = [[] for _ in range(T)] # List to store frame indices for each time step + + for video_idx, video in enumerate(batch): + orig_video_id = video.video_id + orig_frame_size = video.size + for t, frame in enumerate(video.frames): + objects = frame.objects + for obj in objects: + orig_obj_id = obj.object_id + orig_frame_idx = obj.frame_index + step_t_obj_to_frame_idx[t].append(torch.tensor([t, video_idx], dtype=torch.int)) + step_t_masks[t].append(obj.segment[0].to(torch.bool)) + points[t].append(obj.segment[1]) + labels[t].append(obj.segment[2]) + step_t_objects_identifier[t].append(torch.tensor([orig_video_id, orig_obj_id, orig_frame_idx])) + step_t_frame_orig_size[t].append(torch.tensor(orig_frame_size)) + + obj_to_frame_idx = torch.stack( + [torch.stack(obj_to_frame_idx, dim=0) for obj_to_frame_idx in step_t_obj_to_frame_idx], + dim=0, + ) + masks = torch.stack([torch.stack(masks, dim=0) for masks in step_t_masks], dim=0) + objects_identifier = torch.stack([torch.stack(id, dim=0) for id in step_t_objects_identifier], dim=0) + frame_orig_size = torch.stack([torch.stack(id, dim=0) for id in step_t_frame_orig_size], dim=0) + + return BatchedVideoDatapoint( + img_batch=img_batch, + obj_to_frame_idx=obj_to_frame_idx, + masks=masks, + points=points, + labels=labels, + metadata=BatchedVideoMetaData( + unique_objects_identifier=objects_identifier, + frame_orig_size=frame_orig_size, + ), + dict_key=dict_key, + batch_size=[T], + ) diff --git a/bboxmaskpose/sam2/training/utils/distributed.py b/bboxmaskpose/sam2/training/utils/distributed.py new file mode 100644 index 0000000000000000000000000000000000000000..d27fa211e644ca83665fc4cb0fdf21dc1a92d56b --- /dev/null +++ b/bboxmaskpose/sam2/training/utils/distributed.py @@ -0,0 +1,546 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +import datetime +import functools +import io +import logging +import os +import random +import tempfile +import time +from typing import Any, Callable, List, Tuple + +import torch +import torch.autograd as autograd +import torch.distributed as dist + +# Default to GPU 0 +_cuda_device_index: int = 0 + +# Setting _cuda_device_index to -1 internally implies that we should use CPU +_CPU_DEVICE_INDEX = -1 +_PRIMARY_RANK = 0 + + +@functools.lru_cache() +def _get_global_gloo_group(): + """ + Return a process group based on gloo backend, containing all the ranks + The result is cached. + """ + + if dist.get_backend() == "nccl": + # Increase timeout from 1800 sec to 43200 sec (12 hr) to avoid some processes + # being much slower than others causing a timeout (which can happen in relation + # or LVIS class mAP evaluation). + timeout = 43200 + return dist.new_group( + backend="gloo", + timeout=datetime.timedelta(seconds=timeout), + ) + + return dist.group.WORLD + + +def is_main_process(): + """Return true if the current process is the main one""" + return get_rank() == 0 + + +def all_gather_via_filesys(data, filesys_save_dir=None, gather_to_rank_0_only=False): + """ + Run all_gather on arbitrary picklable data (not necessarily tensors), similar to + `all_gather` above, but using filesystem instead of collective ops. 
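+     This path is selected through the MDETR_FILESYS_REDUCE* environment variables
+     or the force_filesys flag of `all_gather` below.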
+
+     If gather_to_rank_0_only is True, only rank 0 will load the gathered object list
+     (and other ranks will have an empty list).
+     """
+     world_size = get_world_size()
+     if world_size == 1:
+         return [data]
+
+     print("gathering via files")
+     cpu_group = _get_global_gloo_group()
+
+     # if unspecified, we will save to the current python file dir
+     if filesys_save_dir is not None:
+         save_dir = filesys_save_dir
+     elif "EXP_DIR" in os.environ:
+         save_dir = os.environ["EXP_DIR"]
+     else:
+         # fall back to the directory where this code is stored
+         save_dir = os.path.dirname(__file__)
+     save_dir = os.path.join(save_dir, "all_gather_via_filesys")
+     if is_main_process():
+         os.makedirs(save_dir, exist_ok=True)
+
+     # use a timestamp and salt to distinguish different all_gather calls
+     timestamp = int(time.time()) if is_main_process() else 0
+     salt = random.randint(0, 2**31 - 1) if is_main_process() else 0
+     # broadcast the timestamp and salt across ranks
+     # (all-reduce will do the broadcasting since only rank 0 is non-zero)
+     timestamp_and_salt = torch.tensor([timestamp, salt], dtype=torch.long)
+     dist.all_reduce(timestamp_and_salt, group=cpu_group)
+     timestamp, salt = timestamp_and_salt.tolist()
+
+     # save the data to a file on the disk
+     rank_save = get_rank()
+     save_data_filename = f"data_to_gather_{timestamp}_{salt}_{rank_save}.pkl"
+     save_data_path = os.path.join(save_dir, save_data_filename)
+     assert not os.path.exists(save_data_path), f"{save_data_path} already exists"
+     torch.save(data, save_data_path)
+     dist.barrier(group=cpu_group)
+
+     # read the data from the files
+     data_list = []
+     if rank_save == 0 or not gather_to_rank_0_only:
+         for rank_load in range(world_size):
+             load_data_filename = f"data_to_gather_{timestamp}_{salt}_{rank_load}.pkl"
+             load_data_path = os.path.join(save_dir, load_data_filename)
+             assert os.path.exists(load_data_path), f"cannot read {load_data_path}"
+             data_list.append(torch.load(load_data_path))
+     dist.barrier(group=cpu_group)
+
+     # delete the saved file
+     os.remove(save_data_path)
+     return data_list
+
+
+ def all_gather(data, force_cpu=False, force_filesys=False, filesys_save_dir=None):
+     """
+     Run all_gather on arbitrary picklable data (not necessarily tensors)
+     Args:
+         data: any picklable object
+     Returns:
+         list[data]: list of data gathered from each rank
+     """
+
+     world_size = get_world_size()
+     if world_size == 1:
+         return [data]
+
+     if os.getenv("MDETR_FILESYS_REDUCE_RANK_0_ONLY") == "1":
+         return all_gather_via_filesys(data, filesys_save_dir, gather_to_rank_0_only=True)
+
+     if os.getenv("MDETR_FILESYS_REDUCE") == "1" or force_filesys:
+         return all_gather_via_filesys(data, filesys_save_dir)
+
+     cpu_group = None
+     if os.getenv("MDETR_CPU_REDUCE") == "1" or force_cpu:
+         cpu_group = _get_global_gloo_group()
+
+     buffer = io.BytesIO()
+     torch.save(data, buffer)
+     data_view = buffer.getbuffer()
+     device = "cuda" if cpu_group is None else "cpu"
+     tensor = torch.ByteTensor(data_view).to(device)
+
+     # obtain Tensor size of each rank
+     local_size = torch.tensor([tensor.numel()], device=device, dtype=torch.long)
+     size_list = [torch.tensor([0], device=device, dtype=torch.long) for _ in range(world_size)]
+     if cpu_group is None:
+         dist.all_gather(size_list, local_size)
+     else:
+         print("gathering on cpu")
+         dist.all_gather(size_list, local_size, group=cpu_group)
+     size_list = [int(size.item()) for size in size_list]
+     max_size = max(size_list)
+     assert isinstance(local_size.item(), int)
+     local_size = int(local_size.item())
+
+     # receiving Tensor from all ranks
+     # we pad the tensor because torch all_gather does not support
+     # gathering tensors of different shapes
+     tensor_list = []
+     for _ in size_list:
+         tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device=device))
+     if local_size != max_size:
+         padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device=device)
+         tensor = torch.cat((tensor, padding), dim=0)
+     if cpu_group is None:
+         dist.all_gather(tensor_list, tensor)
+     else:
+         dist.all_gather(tensor_list, tensor, group=cpu_group)
+
+     data_list = []
+     for size, tensor in zip(size_list, tensor_list):
+         tensor = torch.split(tensor, [size, max_size - size], dim=0)[0]
+         buffer = io.BytesIO(tensor.cpu().numpy())
+         obj = torch.load(buffer)
+         data_list.append(obj)
+
+     return data_list
+
+
+ def convert_to_distributed_tensor(tensor: torch.Tensor) -> Tuple[torch.Tensor, str]:
+     """
+     For some backends, such as NCCL, communication only works if the
+     tensor is on the GPU. This helper function converts to the correct
+     device and returns the tensor + original device.
+     """
+     orig_device = "cpu" if not tensor.is_cuda else "gpu"
+     if torch.distributed.is_available() and torch.distributed.get_backend() == torch.distributed.Backend.NCCL and not tensor.is_cuda:
+         tensor = tensor.cuda()
+     return (tensor, orig_device)
+
+
+ def convert_to_normal_tensor(tensor: torch.Tensor, orig_device: str) -> torch.Tensor:
+     """
+     For some backends, such as NCCL, communication only works if the
+     tensor is on the GPU. This converts the tensor back to its original device.
+     """
+     if tensor.is_cuda and orig_device == "cpu":
+         tensor = tensor.cpu()
+     return tensor
+
+
+ def is_distributed_training_run() -> bool:
+     return torch.distributed.is_available() and torch.distributed.is_initialized() and (torch.distributed.get_world_size() > 1)
+
+
+ def is_primary() -> bool:
+     """
+     Returns True if this is rank 0 of a distributed training job OR if it is
+     a single trainer job. Otherwise False.
+     """
+     return get_rank() == _PRIMARY_RANK
+
+
+ def all_reduce_mean(tensor: torch.Tensor) -> torch.Tensor:
+     """
+     Wrapper over torch.distributed.all_reduce for performing mean reduction
+     of tensor over all processes.
+     """
+     return all_reduce_op(
+         tensor,
+         torch.distributed.ReduceOp.SUM,
+         lambda t: t / torch.distributed.get_world_size(),
+     )
+
+
+ def all_reduce_sum(tensor: torch.Tensor) -> torch.Tensor:
+     """
+     Wrapper over torch.distributed.all_reduce for performing sum
+     reduction of tensor over all processes in both distributed /
+     non-distributed scenarios.
+     """
+     return all_reduce_op(tensor, torch.distributed.ReduceOp.SUM)
+
+
+ def all_reduce_min(tensor: torch.Tensor) -> torch.Tensor:
+     """
+     Wrapper over torch.distributed.all_reduce for performing min
+     reduction of tensor over all processes in both distributed /
+     non-distributed scenarios.
+     """
+     return all_reduce_op(tensor, torch.distributed.ReduceOp.MIN)
+
+
+ def all_reduce_max(tensor: torch.Tensor) -> torch.Tensor:
+     """
+     Wrapper over torch.distributed.all_reduce for performing max
+     reduction of tensor over all processes in both distributed /
+     non-distributed scenarios.
+     """
+     return all_reduce_op(tensor, torch.distributed.ReduceOp.MAX)
+
+
+ def all_reduce_op(
+     tensor: torch.Tensor,
+     op: torch.distributed.ReduceOp,
+     after_op_func: Callable[[torch.Tensor], torch.Tensor] = None,
+ ) -> torch.Tensor:
+     """
+     Wrapper over torch.distributed.all_reduce for performing
+     reduction of tensor over all processes in both distributed /
+     non-distributed scenarios.
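+
+     In the non-distributed case the input tensor is returned unchanged and
+     `after_op_func` is not applied.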
+ """ + if is_distributed_training_run(): + tensor, orig_device = convert_to_distributed_tensor(tensor) + torch.distributed.all_reduce(tensor, op) + if after_op_func is not None: + tensor = after_op_func(tensor) + tensor = convert_to_normal_tensor(tensor, orig_device) + return tensor + + +def gather_tensors_from_all(tensor: torch.Tensor) -> List[torch.Tensor]: + """ + Wrapper over torch.distributed.all_gather for performing + 'gather' of 'tensor' over all processes in both distributed / + non-distributed scenarios. + """ + if tensor.ndim == 0: + # 0 dim tensors cannot be gathered. so unsqueeze + tensor = tensor.unsqueeze(0) + + if is_distributed_training_run(): + tensor, orig_device = convert_to_distributed_tensor(tensor) + gathered_tensors = [torch.zeros_like(tensor) for _ in range(torch.distributed.get_world_size())] + torch.distributed.all_gather(gathered_tensors, tensor) + gathered_tensors = [convert_to_normal_tensor(_tensor, orig_device) for _tensor in gathered_tensors] + else: + gathered_tensors = [tensor] + + return gathered_tensors + + +def gather_from_all(tensor: torch.Tensor) -> torch.Tensor: + gathered_tensors = gather_tensors_from_all(tensor) + gathered_tensor = torch.cat(gathered_tensors, 0) + return gathered_tensor + + +def broadcast(tensor: torch.Tensor, src: int = 0) -> torch.Tensor: + """ + Wrapper over torch.distributed.broadcast for broadcasting a tensor from the source + to all processes in both distributed / non-distributed scenarios. + """ + if is_distributed_training_run(): + tensor, orig_device = convert_to_distributed_tensor(tensor) + torch.distributed.broadcast(tensor, src) + tensor = convert_to_normal_tensor(tensor, orig_device) + return tensor + + +def barrier() -> None: + """ + Wrapper over torch.distributed.barrier, returns without waiting + if the distributed process group is not initialized instead of throwing error. 
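+     Safe to call unconditionally, including in single-process runs.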
+ """ + if not torch.distributed.is_available() or not torch.distributed.is_initialized(): + return + torch.distributed.barrier() + + +def get_world_size() -> int: + """ + Simple wrapper for correctly getting worldsize in both distributed + / non-distributed settings + """ + return torch.distributed.get_world_size() if torch.distributed.is_available() and torch.distributed.is_initialized() else 1 + + +def get_rank() -> int: + """ + Simple wrapper for correctly getting rank in both distributed + / non-distributed settings + """ + return torch.distributed.get_rank() if torch.distributed.is_available() and torch.distributed.is_initialized() else 0 + + +def get_primary_rank() -> int: + return _PRIMARY_RANK + + +def set_cuda_device_index(idx: int) -> None: + global _cuda_device_index + _cuda_device_index = idx + torch.cuda.set_device(_cuda_device_index) + + +def set_cpu_device() -> None: + global _cuda_device_index + _cuda_device_index = _CPU_DEVICE_INDEX + + +def get_cuda_device_index() -> int: + return _cuda_device_index + + +def init_distributed_data_parallel_model( + model: torch.nn.Module, + broadcast_buffers: bool = False, + find_unused_parameters: bool = True, + bucket_cap_mb: int = 25, +) -> torch.nn.parallel.DistributedDataParallel: + global _cuda_device_index + + if _cuda_device_index == _CPU_DEVICE_INDEX: + # CPU-only model, don't specify device + return torch.nn.parallel.DistributedDataParallel( + model, + broadcast_buffers=broadcast_buffers, + find_unused_parameters=find_unused_parameters, + bucket_cap_mb=bucket_cap_mb, + ) + else: + # GPU model + return torch.nn.parallel.DistributedDataParallel( + model, + device_ids=[_cuda_device_index], + output_device=_cuda_device_index, + broadcast_buffers=broadcast_buffers, + find_unused_parameters=find_unused_parameters, + bucket_cap_mb=bucket_cap_mb, + ) + + +def broadcast_object(obj: Any, src: int = _PRIMARY_RANK, use_disk: bool = True) -> Any: + """Broadcast an object from a source to all workers. + + Args: + obj: Object to broadcast, must be serializable + src: Source rank for broadcast (default is primary) + use_disk: If enabled, removes redundant CPU memory copies by writing to + disk + """ + # Either broadcast from primary to the fleet (default), + # or use the src setting as the original rank + if get_rank() == src: + # Emit data + buffer = io.BytesIO() + torch.save(obj, buffer) + data_view = buffer.getbuffer() + length_tensor = torch.LongTensor([len(data_view)]) + length_tensor = broadcast(length_tensor, src=src) + data_tensor = torch.ByteTensor(data_view) + data_tensor = broadcast(data_tensor, src=src) + else: + # Fetch from the source + length_tensor = torch.LongTensor([0]) + length_tensor = broadcast(length_tensor, src=src) + data_tensor = torch.empty([length_tensor.item()], dtype=torch.uint8) + data_tensor = broadcast(data_tensor, src=src) + if use_disk: + with tempfile.TemporaryFile("r+b") as f: + f.write(data_tensor.numpy()) + # remove reference to the data tensor and hope that Python garbage + # collects it + del data_tensor + f.seek(0) + obj = torch.load(f) + else: + buffer = io.BytesIO(data_tensor.numpy()) + obj = torch.load(buffer) + return obj + + +def all_gather_tensor(tensor: torch.Tensor, world_size=None): + if world_size is None: + world_size = get_world_size() + # make contiguous because NCCL won't gather the tensor otherwise + assert tensor.is_contiguous(), f"{tensor.shape} is not contiguous!" 
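+     # Callers can satisfy this precondition with `tensor = tensor.contiguous()`.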
+     tensor, orig_device = convert_to_distributed_tensor(tensor)
+     tensor_all = [torch.ones_like(tensor) for _ in range(world_size)]
+     dist.all_gather(tensor_all, tensor, async_op=False)  # performance opt
+     tensor_all = [convert_to_normal_tensor(tensor, orig_device) for tensor in tensor_all]
+     return tensor_all
+
+
+ def all_gather_batch(tensors: List[torch.Tensor]):
+     """
+     Performs all_gather operation on the provided tensors.
+     """
+     # Queue the gathered tensors
+     world_size = get_world_size()
+     # There is no need for reduction in the single-proc case
+     if world_size == 1:
+         return tensors
+     tensor_list = []
+     output_tensor = []
+     for tensor in tensors:
+         tensor_all = all_gather_tensor(tensor, world_size)
+         tensor_list.append(tensor_all)
+
+     for tensor_all in tensor_list:
+         output_tensor.append(torch.cat(tensor_all, dim=0))
+     return output_tensor
+
+
+ class GatherLayer(autograd.Function):
+     """
+     Gather tensors from all workers with support for backward propagation:
+     this implementation does not cut the gradients as torch.distributed.all_gather does.
+     """
+
+     @staticmethod
+     def forward(ctx, x):
+         output = [torch.zeros_like(x) for _ in range(dist.get_world_size())]
+         dist.all_gather(output, x)
+         return tuple(output)
+
+     @staticmethod
+     def backward(ctx, *grads):
+         all_gradients = torch.stack(grads)
+         dist.all_reduce(all_gradients)
+         return all_gradients[dist.get_rank()]
+
+
+ def all_gather_batch_with_grad(tensors):
+     """
+     Performs all_gather operation on the provided tensors.
+     The graph remains connected for backward grad computation.
+     """
+     # Queue the gathered tensors
+     world_size = get_world_size()
+     # There is no need for reduction in the single-proc case
+     if world_size == 1:
+         return tensors
+     tensor_list = []
+     output_tensor = []
+
+     for tensor in tensors:
+         tensor_all = GatherLayer.apply(tensor)
+         tensor_list.append(tensor_all)
+
+     for tensor_all in tensor_list:
+         output_tensor.append(torch.cat(tensor_all, dim=0))
+     return output_tensor
+
+
+ def unwrap_ddp_if_wrapped(model):
+     if isinstance(model, torch.nn.parallel.DistributedDataParallel):
+         return model.module
+     return model
+
+
+ def create_new_process_group(group_size):
+     """
+     Creates process groups of a given `group_size` and returns the
+     process group that the current GPU participates in.
+
+     `group_size` must divide the total number of GPUs (world_size).
+
+     Modified from
+     https://github.com/NVIDIA/apex/blob/4e1ae43f7f7ac69113ef426dd15f37123f0a2ed3/apex/parallel/__init__.py#L60
+
+     Args:
+         group_size (int): number of GPUs to collaborate for sync bn
+     """
+
+     assert group_size > 0
+
+     world_size = torch.distributed.get_world_size()
+     if world_size <= 8:
+         if group_size > world_size:
+             logging.warning(
+                 f"Requested group size [{group_size}] > world size [{world_size}]. "
+                 "Assuming local debug run and capping it to world size."
+ ) + group_size = world_size + assert world_size >= group_size + assert world_size % group_size == 0 + + group = None + for group_num in range(world_size // group_size): + group_ids = range(group_num * group_size, (group_num + 1) * group_size) + cur_group = torch.distributed.new_group(ranks=group_ids) + if torch.distributed.get_rank() // group_size == group_num: + group = cur_group + # can not drop out and return here, every process must go through creation of all subgroups + + assert group is not None + return group + + +def is_dist_avail_and_initialized(): + if not dist.is_available(): + return False + if not dist.is_initialized(): + return False + return True diff --git a/bboxmaskpose/sam2/training/utils/logger.py b/bboxmaskpose/sam2/training/utils/logger.py new file mode 100644 index 0000000000000000000000000000000000000000..ee194a79019213b3cb064e758dff43e2daae623a --- /dev/null +++ b/bboxmaskpose/sam2/training/utils/logger.py @@ -0,0 +1,235 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +# Code borrowed from TLC - https://www.internalfb.com/code/fbsource/fbcode/pytorch/tlc/torchtlc/loggers/tensorboard.py +import atexit +import functools +import logging +import sys +import uuid +from typing import Any, Dict, Optional, Union + +from numpy import ndarray +from torch import Tensor +from torch.utils.tensorboard import SummaryWriter + +from hydra.utils import instantiate +from iopath.common.file_io import g_pathmgr +from training.utils.train_utils import get_machine_local_and_dist_rank, makedir + +Scalar = Union[Tensor, ndarray, int, float] + + +def make_tensorboard_logger(log_dir: str, **writer_kwargs: Any): + makedir(log_dir) + summary_writer_method = SummaryWriter + return TensorBoardLogger(path=log_dir, summary_writer_method=summary_writer_method, **writer_kwargs) + + +class TensorBoardWriterWrapper: + """ + A wrapper around a SummaryWriter object. + """ + + def __init__( + self, + path: str, + *args: Any, + filename_suffix: str = None, + summary_writer_method: Any = SummaryWriter, + **kwargs: Any, + ) -> None: + """Create a new TensorBoard logger. + On construction, the logger creates a new events file that logs + will be written to. If the environment variable `RANK` is defined, + logger will only log if RANK = 0. + + NOTE: If using the logger with distributed training: + - This logger can call collective operations + - Logs will be written on rank 0 only + - Logger must be constructed synchronously *after* initializing distributed process group. + + Args: + path (str): path to write logs to + *args, **kwargs: Extra arguments to pass to SummaryWriter + """ + self._writer: Optional[SummaryWriter] = None + _, self._rank = get_machine_local_and_dist_rank() + self._path: str = path + if self._rank == 0: + logging.info(f"TensorBoard SummaryWriter instantiated. 
Files will be stored in: {path}")
+            self._writer = summary_writer_method(
+                log_dir=path,
+                *args,
+                filename_suffix=filename_suffix or str(uuid.uuid4()),
+                **kwargs,
+            )
+        else:
+            logging.debug(f"Not logging meters on this host because env RANK: {self._rank} != 0")
+        atexit.register(self.close)
+
+    @property
+    def writer(self) -> Optional[SummaryWriter]:
+        return self._writer
+
+    @property
+    def path(self) -> str:
+        return self._path
+
+    def flush(self) -> None:
+        """Writes pending logs to disk."""
+
+        if not self._writer:
+            return
+
+        self._writer.flush()
+
+    def close(self) -> None:
+        """Close writer, flushing pending logs to disk.
+        Logs cannot be written after `close` is called.
+        """
+
+        if not self._writer:
+            return
+
+        self._writer.close()
+        self._writer = None
+
+
+class TensorBoardLogger(TensorBoardWriterWrapper):
+    """
+    A simple logger for TensorBoard.
+    """
+
+    def log_dict(self, payload: Dict[str, Scalar], step: int) -> None:
+        """Add multiple scalar values to TensorBoard.
+
+        Args:
+            payload (dict): dictionary of tag name and scalar value
+            step (int, Optional): step value to record
+        """
+        if not self._writer:
+            return
+        for k, v in payload.items():
+            self.log(k, v, step)
+
+    def log(self, name: str, data: Scalar, step: int) -> None:
+        """Add scalar data to TensorBoard.
+
+        Args:
+            name (string): tag name used to group scalars
+            data (float/int/Tensor): scalar data to log
+            step (int, optional): step value to record
+        """
+        if not self._writer:
+            return
+        self._writer.add_scalar(name, data, global_step=step, new_style=True)
+
+    def log_hparams(self, hparams: Dict[str, Scalar], meters: Dict[str, Scalar]) -> None:
+        """Add hyperparameter data to TensorBoard.
+
+        Args:
+            hparams (dict): dictionary of hyperparameter names and corresponding values
+            meters (dict): dictionary of meter names and corresponding values
+        """
+        if not self._writer:
+            return
+        self._writer.add_hparams(hparams, meters)
+
+
+class Logger:
+    """
+    A logger class that can interface with multiple loggers. It currently supports only TensorBoard for simplicity, but you can extend it with your own loggers.
+    """
+
+    def __init__(self, logging_conf):
+        # allow turning off TensorBoard with "should_log: false" in config
+        tb_config = logging_conf.tensorboard_writer
+        tb_should_log = tb_config and tb_config.pop("should_log", True)
+        self.tb_logger = instantiate(tb_config) if tb_should_log else None
+
+    def log_dict(self, payload: Dict[str, Scalar], step: int) -> None:
+        if self.tb_logger:
+            self.tb_logger.log_dict(payload, step)
+
+    def log(self, name: str, data: Scalar, step: int) -> None:
+        if self.tb_logger:
+            self.tb_logger.log(name, data, step)
+
+    def log_hparams(self, hparams: Dict[str, Scalar], meters: Dict[str, Scalar]) -> None:
+        if self.tb_logger:
+            self.tb_logger.log_hparams(hparams, meters)
+
+
+# cache the opened file object, so that different calls to `setup_logging`
+# with the same file name can safely write to the same file.
+@functools.lru_cache(maxsize=None)
+def _cached_log_stream(filename):
+    # we tune the buffering value so that the logs are updated
+    # frequently.
+    log_buffer_kb = 10 * 1024  # 10 KiB; the `buffering` argument is in bytes
+    io = g_pathmgr.open(filename, mode="a", buffering=log_buffer_kb)
+    atexit.register(io.close)
+    return io
+
+
+def setup_logging(
+    name,
+    output_dir=None,
+    rank=0,
+    log_level_primary="INFO",
+    log_level_secondary="ERROR",
+):
+    """
+    Setup various logging streams: stdout and file handlers.
+    File handlers are only set up on the master GPU (rank 0).
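+
+    Example (logger name and output directory are hypothetical):
+        setup_logging("bmp_train", output_dir="./logs", rank=0)
+        logging.getLogger("bmp_train").info("written to stdout and ./logs/log.txt")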
+    """
+    # get the filename if we want to log to the file as well
+    log_filename = None
+    if output_dir:
+        makedir(output_dir)
+        if rank == 0:
+            log_filename = f"{output_dir}/log.txt"
+
+    logger = logging.getLogger(name)
+    logger.setLevel(log_level_primary)
+
+    # create formatter
+    FORMAT = "%(levelname)s %(asctime)s %(filename)s:%(lineno)4d: %(message)s"
+    formatter = logging.Formatter(FORMAT)
+
+    # Cleanup any existing handlers (iterate over a copy, since removeHandler mutates the list)
+    for h in list(logger.handlers):
+        logger.removeHandler(h)
+    logger.root.handlers = []
+
+    # setup the console handler
+    console_handler = logging.StreamHandler(sys.stdout)
+    console_handler.setFormatter(formatter)
+    logger.addHandler(console_handler)
+    if rank == 0:
+        console_handler.setLevel(log_level_primary)
+    else:
+        console_handler.setLevel(log_level_secondary)
+
+    # we also log to a file if the user wants it
+    if log_filename and rank == 0:
+        file_handler = logging.StreamHandler(_cached_log_stream(log_filename))
+        file_handler.setLevel(log_level_primary)
+        file_handler.setFormatter(formatter)
+        logger.addHandler(file_handler)
+
+    logging.root = logger
+
+
+def shutdown_logging():
+    """
+    After training is done, make sure all logger streams are shut down.
+    """
+    logging.info("Shutting down loggers...")
+    handlers = logging.root.handlers
+    for handler in handlers:
+        handler.close()
diff --git a/bboxmaskpose/sam2/training/utils/train_utils.py b/bboxmaskpose/sam2/training/utils/train_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..9245102f608376800bc9e01a6bc824e7c36ba264
--- /dev/null
+++ b/bboxmaskpose/sam2/training/utils/train_utils.py
@@ -0,0 +1,273 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.

+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
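+
+# Assorted training helpers: config introspection, distributed setup, seeding,
+# environment logging, and progress/metric meters.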
+
+import logging
+import math
+import os
+import random
+import re
+from datetime import timedelta
+from typing import Optional
+
+import numpy as np
+import torch
+import torch.distributed as dist
+
+import hydra
+import omegaconf
+from iopath.common.file_io import g_pathmgr
+from omegaconf import OmegaConf
+
+
+def multiply_all(*args):
+    return np.prod(np.array(args)).item()
+
+
+def collect_dict_keys(config):
+    """Recursively iterate through a dataset configuration and collect every `dict_key` that is defined."""
+    val_keys = []
+    # If this config points to the collate function, then it has a key
+    if "_target_" in config and re.match(r".*collate_fn.*", config["_target_"]):
+        val_keys.append(config["dict_key"])
+    else:
+        # Recursively proceed
+        for v in config.values():
+            if isinstance(v, type(config)):
+                val_keys.extend(collect_dict_keys(v))
+            elif isinstance(v, omegaconf.listconfig.ListConfig):
+                for item in v:
+                    if isinstance(item, type(config)):
+                        val_keys.extend(collect_dict_keys(item))
+    return val_keys
+
+
+class Phase:
+    TRAIN = "train"
+    VAL = "val"
+
+
+def register_omegaconf_resolvers():
+    OmegaConf.register_new_resolver("get_method", hydra.utils.get_method)
+    OmegaConf.register_new_resolver("get_class", hydra.utils.get_class)
+    OmegaConf.register_new_resolver("add", lambda x, y: x + y)
+    OmegaConf.register_new_resolver("times", multiply_all)
+    OmegaConf.register_new_resolver("divide", lambda x, y: x / y)
+    OmegaConf.register_new_resolver("pow", lambda x, y: x**y)
+    OmegaConf.register_new_resolver("subtract", lambda x, y: x - y)
+    OmegaConf.register_new_resolver("range", lambda x: list(range(x)))
+    OmegaConf.register_new_resolver("int", lambda x: int(x))
+    OmegaConf.register_new_resolver("ceil_int", lambda x: int(math.ceil(x)))
+    OmegaConf.register_new_resolver("merge", lambda *x: OmegaConf.merge(*x))
+
+
+def setup_distributed_backend(backend, timeout_mins):
+    """
+    Initialize torch.distributed and set the CUDA device.
+    Expects environment variables to be set as per
+    https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization
+    along with the environment variable "LOCAL_RANK" which is used to set the CUDA device.
+    """
+    # enable TORCH_NCCL_ASYNC_ERROR_HANDLING so that dist nccl ops time out after timeout_mins
+    # of waiting
+    os.environ["TORCH_NCCL_ASYNC_ERROR_HANDLING"] = "1"
+    logging.info(f"Setting up torch.distributed with a timeout of {timeout_mins} mins")
+    dist.init_process_group(backend=backend, timeout=timedelta(minutes=timeout_mins))
+    return dist.get_rank()
+
+
+def get_machine_local_and_dist_rank():
+    """
+    Get the distributed and local rank of the current gpu.
+    """
+    local_rank = os.environ.get("LOCAL_RANK", None)
+    distributed_rank = os.environ.get("RANK", None)
+    assert local_rank is not None and distributed_rank is not None, "Please set the RANK and LOCAL_RANK environment variables."
+    return int(local_rank), int(distributed_rank)
+
+
+def print_cfg(cfg):
+    """
+    Supports printing both Hydra DictConfig and also the AttrDict config
+    """
+    logging.info("Training with config:")
+    logging.info(OmegaConf.to_yaml(cfg))
+
+
+def set_seeds(seed_value, max_epochs, dist_rank):
+    """
+    Set the python random, numpy and torch seeds for each gpu. Also set the
+    CUDA seeds if CUDA is available, which makes training deterministic.
+    """
+    # The pytorch sampler increments the seed by 1 every epoch.
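+    # Scaling by max_epochs therefore keeps every rank's seed range disjoint for the whole run.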
+ seed_value = (seed_value + dist_rank) * max_epochs + logging.info(f"MACHINE SEED: {seed_value}") + random.seed(seed_value) + np.random.seed(seed_value) + torch.manual_seed(seed_value) + if torch.cuda.is_available(): + torch.cuda.manual_seed_all(seed_value) + + +def makedir(dir_path): + """ + Create the directory if it does not exist. + """ + is_success = False + try: + if not g_pathmgr.exists(dir_path): + g_pathmgr.mkdirs(dir_path) + is_success = True + except BaseException: + logging.info(f"Error creating directory: {dir_path}") + return is_success + + +def is_dist_avail_and_initialized(): + if not dist.is_available(): + return False + if not dist.is_initialized(): + return False + return True + + +def get_amp_type(amp_type: Optional[str] = None): + if amp_type is None: + return None + assert amp_type in ["bfloat16", "float16"], "Invalid Amp type." + if amp_type == "bfloat16": + return torch.bfloat16 + else: + return torch.float16 + + +def log_env_variables(): + env_keys = sorted(list(os.environ.keys())) + st = "" + for k in env_keys: + v = os.environ[k] + st += f"{k}={v}\n" + logging.info("Logging ENV_VARIABLES") + logging.info(st) + + +class AverageMeter: + """Computes and stores the average and current value""" + + def __init__(self, name, device, fmt=":f"): + self.name = name + self.fmt = fmt + self.device = device + self.reset() + + def reset(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + self._allow_updates = True + + def update(self, val, n=1): + self.val = val + self.sum += val * n + self.count += n + self.avg = self.sum / self.count + + def __str__(self): + fmtstr = "{name}: {val" + self.fmt + "} ({avg" + self.fmt + "})" + return fmtstr.format(**self.__dict__) + + +class MemMeter: + """Computes and stores the current, avg, and max of peak Mem usage per iteration""" + + def __init__(self, name, device, fmt=":f"): + self.name = name + self.fmt = fmt + self.device = device + self.reset() + + def reset(self): + self.val = 0 # Per iteration max usage + self.avg = 0 # Avg per iteration max usage + self.peak = 0 # Peak usage for lifetime of program + self.sum = 0 + self.count = 0 + self._allow_updates = True + + def update(self, n=1, reset_peak_usage=True): + self.val = torch.cuda.max_memory_allocated() // 1e9 + self.sum += self.val * n + self.count += n + self.avg = self.sum / self.count + self.peak = max(self.peak, self.val) + if reset_peak_usage: + torch.cuda.reset_peak_memory_stats() + + def __str__(self): + fmtstr = "{name}: {val" + self.fmt + "} ({avg" + self.fmt + "}/{peak" + self.fmt + "})" + return fmtstr.format(**self.__dict__) + + +def human_readable_time(time_seconds): + time = int(time_seconds) + minutes, seconds = divmod(time, 60) + hours, minutes = divmod(minutes, 60) + days, hours = divmod(hours, 24) + return f"{days:02}d {hours:02}h {minutes:02}m" + + +class DurationMeter: + def __init__(self, name, device, fmt=":f"): + self.name = name + self.device = device + self.fmt = fmt + self.val = 0 + + def reset(self): + self.val = 0 + + def update(self, val): + self.val = val + + def add(self, val): + self.val += val + + def __str__(self): + return f"{self.name}: {human_readable_time(self.val)}" + + +class ProgressMeter: + def __init__(self, num_batches, meters, real_meters, prefix=""): + self.batch_fmtstr = self._get_batch_fmtstr(num_batches) + self.meters = meters + self.real_meters = real_meters + self.prefix = prefix + + def display(self, batch, enable_print=False): + entries = [self.prefix + self.batch_fmtstr.format(batch)] + entries += 
[str(meter) for meter in self.meters] + entries += [ + " | ".join([f"{os.path.join(name, subname)}: {val:.4f}" for subname, val in meter.compute().items()]) + for name, meter in self.real_meters.items() + ] + logging.info(" | ".join(entries)) + if enable_print: + print(" | ".join(entries)) + + def _get_batch_fmtstr(self, num_batches): + num_digits = len(str(num_batches // 1)) + fmt = "{:" + str(num_digits) + "d}" + return "[" + fmt + "/" + fmt.format(num_batches) + "]" + + +def get_resume_checkpoint(checkpoint_save_dir): + if not g_pathmgr.isdir(checkpoint_save_dir): + return None + ckpt_file = os.path.join(checkpoint_save_dir, "checkpoint.pt") + if not g_pathmgr.isfile(ckpt_file): + return None + + return ckpt_file diff --git a/bboxmaskpose/sam2/utils/__init__.py b/bboxmaskpose/sam2/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5277f46157403e47fd830fc519144b97ef69d4ae --- /dev/null +++ b/bboxmaskpose/sam2/utils/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. + +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. diff --git a/sam2/utils/amg.py b/bboxmaskpose/sam2/utils/amg.py similarity index 88% rename from sam2/utils/amg.py rename to bboxmaskpose/sam2/utils/amg.py index 986842960cf5deca00614b7b1cde1ab77dad7e6e..c4acb838e7fd9bf3d9df25de00c422317a53f152 100644 --- a/sam2/utils/amg.py +++ b/bboxmaskpose/sam2/utils/amg.py @@ -23,15 +23,11 @@ class MaskData: def __init__(self, **kwargs) -> None: for v in kwargs.values(): - assert isinstance( - v, (list, np.ndarray, torch.Tensor) - ), "MaskData only supports list, numpy arrays, and torch tensors." + assert isinstance(v, (list, np.ndarray, torch.Tensor)), "MaskData only supports list, numpy arrays, and torch tensors." self._stats = dict(**kwargs) def __setitem__(self, key: str, item: Any) -> None: - assert isinstance( - item, (list, np.ndarray, torch.Tensor) - ), "MaskData only supports list, numpy arrays, and torch tensors." + assert isinstance(item, (list, np.ndarray, torch.Tensor)), "MaskData only supports list, numpy arrays, and torch tensors." self._stats[key] = item def __delitem__(self, key: str) -> None: @@ -77,9 +73,7 @@ class MaskData: self._stats[k] = v.float().detach().cpu().numpy() -def is_box_near_crop_edge( - boxes: torch.Tensor, crop_box: List[int], orig_box: List[int], atol: float = 20.0 -) -> torch.Tensor: +def is_box_near_crop_edge(boxes: torch.Tensor, crop_box: List[int], orig_box: List[int], atol: float = 20.0) -> torch.Tensor: """Filter masks at the edge of a crop, but not at the edge of the original image.""" crop_box_torch = torch.as_tensor(crop_box, dtype=torch.float, device=boxes.device) orig_box_torch = torch.as_tensor(orig_box, dtype=torch.float, device=boxes.device) @@ -98,9 +92,7 @@ def box_xyxy_to_xywh(box_xyxy: torch.Tensor) -> torch.Tensor: def batch_iterator(batch_size: int, *args) -> Generator[List[Any], None, None]: - assert len(args) > 0 and all( - len(a) == len(args[0]) for a in args - ), "Batched iteration must have inputs of all the same size." + assert len(args) > 0 and all(len(a) == len(args[0]) for a in args), "Batched iteration must have inputs of all the same size." 
n_batches = len(args[0]) // batch_size + int(len(args[0]) % batch_size != 0) for b in range(n_batches): yield [arg[b * batch_size : (b + 1) * batch_size] for arg in args] @@ -155,9 +147,7 @@ def area_from_rle(rle: Dict[str, Any]) -> int: return sum(rle["counts"][1::2]) -def calculate_stability_score( - masks: torch.Tensor, mask_threshold: float, threshold_offset: float -) -> torch.Tensor: +def calculate_stability_score(masks: torch.Tensor, mask_threshold: float, threshold_offset: float) -> torch.Tensor: """ Computes the stability score for a batch of masks. The stability score is the IoU between the binary masks obtained by thresholding @@ -165,16 +155,8 @@ def calculate_stability_score( """ # One mask is always contained inside the other. # Save memory by preventing unnecessary cast to torch.int64 - intersections = ( - (masks > (mask_threshold + threshold_offset)) - .sum(-1, dtype=torch.int16) - .sum(-1, dtype=torch.int32) - ) - unions = ( - (masks > (mask_threshold - threshold_offset)) - .sum(-1, dtype=torch.int16) - .sum(-1, dtype=torch.int32) - ) + intersections = (masks > (mask_threshold + threshold_offset)).sum(-1, dtype=torch.int16).sum(-1, dtype=torch.int32) + unions = (masks > (mask_threshold - threshold_offset)).sum(-1, dtype=torch.int16).sum(-1, dtype=torch.int32) return intersections / unions @@ -188,9 +170,7 @@ def build_point_grid(n_per_side: int) -> np.ndarray: return points -def build_all_layer_point_grids( - n_per_side: int, n_layers: int, scale_per_layer: int -) -> List[np.ndarray]: +def build_all_layer_point_grids(n_per_side: int, n_layers: int, scale_per_layer: int) -> List[np.ndarray]: """Generates point grids for all crop layers.""" points_by_layer = [] for i in range(n_layers + 1): @@ -199,9 +179,7 @@ def build_all_layer_point_grids( return points_by_layer -def generate_crop_boxes( - im_size: Tuple[int, ...], n_layers: int, overlap_ratio: float -) -> Tuple[List[List[int]], List[int]]: +def generate_crop_boxes(im_size: Tuple[int, ...], n_layers: int, overlap_ratio: float) -> Tuple[List[List[int]], List[int]]: """ Generates a list of crop boxes of different sizes. Each layer has (2**i)**2 boxes for the ith layer. @@ -254,9 +232,7 @@ def uncrop_points(points: torch.Tensor, crop_box: List[int]) -> torch.Tensor: return points + offset -def uncrop_masks( - masks: torch.Tensor, crop_box: List[int], orig_h: int, orig_w: int -) -> torch.Tensor: +def uncrop_masks(masks: torch.Tensor, crop_box: List[int], orig_h: int, orig_w: int) -> torch.Tensor: x0, y0, x1, y1 = crop_box if x0 == 0 and y0 == 0 and x1 == orig_w and y1 == orig_h: return masks @@ -266,9 +242,7 @@ def uncrop_masks( return torch.nn.functional.pad(masks, pad, value=0) -def remove_small_regions( - mask: np.ndarray, area_thresh: float, mode: str -) -> Tuple[np.ndarray, bool]: +def remove_small_regions(mask: np.ndarray, area_thresh: float, mode: str) -> Tuple[np.ndarray, bool]: """ Removes small disconnected regions and holes in a mask. Returns the mask and an indicator of if the mask has been modified. diff --git a/sam2/utils/kalman_filter.py b/bboxmaskpose/sam2/utils/kalman_filter.py similarity index 81% rename from sam2/utils/kalman_filter.py rename to bboxmaskpose/sam2/utils/kalman_filter.py index 4eba007a90b1272c69bf6607d4e3246100f0ac48..208d69382c4a85187dc4b39b3cd74d6596c2eb27 100644 --- a/sam2/utils/kalman_filter.py +++ b/bboxmaskpose/sam2/utils/kalman_filter.py @@ -1,22 +1,14 @@ +# Adapted from the DeepSORT tracking library. +# Modified by Miroslav Purkrabek and Constantin Kolomiiets, BBoxMaskPose authors. 
import numpy as np import scipy.linalg - """ Table for the 0.95 quantile of the chi-square distribution with N degrees of freedom (contains values for N=1, ..., 9). Taken from MATLAB/Octave's chi2inv function and used as Mahalanobis gating threshold. """ -chi2inv95 = { - 1: 3.8415, - 2: 5.9915, - 3: 7.8147, - 4: 9.4877, - 5: 11.070, - 6: 12.592, - 7: 14.067, - 8: 15.507, - 9: 16.919} +chi2inv95 = {1: 3.8415, 2: 5.9915, 3: 7.8147, 4: 9.4877, 5: 11.070, 6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919} class KalmanFilter(object): @@ -37,7 +29,7 @@ class KalmanFilter(object): """ def __init__(self): - ndim, dt = 4, 1. + ndim, dt = 4, 1.0 # Create Kalman filter model matrices. self._motion_mat = np.eye(2 * ndim, 2 * ndim) @@ -48,8 +40,8 @@ class KalmanFilter(object): # Motion and observation uncertainty are chosen relative to the current # state estimate. These weights control the amount of uncertainty in # the model. This is a bit hacky. - self._std_weight_position = 1. / 20 - self._std_weight_velocity = 1. / 160 + self._std_weight_position = 1.0 / 20 + self._std_weight_velocity = 1.0 / 160 def initiate(self, measurement): """Create track from unassociated measurement. @@ -80,7 +72,8 @@ class KalmanFilter(object): 10 * self._std_weight_velocity * measurement[3], 10 * self._std_weight_velocity * measurement[3], 1e-5, - 10 * self._std_weight_velocity * measurement[3]] + 10 * self._std_weight_velocity * measurement[3], + ] covariance = np.diag(np.square(std)) return mean, covariance @@ -103,22 +96,13 @@ class KalmanFilter(object): state. Unobserved velocities are initialized to 0 mean. """ - std_pos = [ - self._std_weight_position * mean[3], - self._std_weight_position * mean[3], - 1e-2, - self._std_weight_position * mean[3]] - std_vel = [ - self._std_weight_velocity * mean[3], - self._std_weight_velocity * mean[3], - 1e-5, - self._std_weight_velocity * mean[3]] + std_pos = [self._std_weight_position * mean[3], self._std_weight_position * mean[3], 1e-2, self._std_weight_position * mean[3]] + std_vel = [self._std_weight_velocity * mean[3], self._std_weight_velocity * mean[3], 1e-5, self._std_weight_velocity * mean[3]] motion_cov = np.diag(np.square(np.r_[std_pos, std_vel])) - #mean = np.dot(self._motion_mat, mean) + # mean = np.dot(self._motion_mat, mean) mean = np.dot(mean, self._motion_mat.T) - covariance = np.linalg.multi_dot(( - self._motion_mat, covariance, self._motion_mat.T)) + motion_cov + covariance = np.linalg.multi_dot((self._motion_mat, covariance, self._motion_mat.T)) + motion_cov return mean, covariance @@ -139,16 +123,11 @@ class KalmanFilter(object): estimate. 
""" - std = [ - self._std_weight_position * mean[3], - self._std_weight_position * mean[3], - 1e-1, - self._std_weight_position * mean[3]] + std = [self._std_weight_position * mean[3], self._std_weight_position * mean[3], 1e-1, self._std_weight_position * mean[3]] innovation_cov = np.diag(np.square(std)) mean = np.dot(self._update_mat, mean) - covariance = np.linalg.multi_dot(( - self._update_mat, covariance, self._update_mat.T)) + covariance = np.linalg.multi_dot((self._update_mat, covariance, self._update_mat.T)) return mean, covariance + innovation_cov def multi_predict(self, mean, covariance): @@ -171,12 +150,14 @@ class KalmanFilter(object): self._std_weight_position * mean[:, 3], self._std_weight_position * mean[:, 3], 1e-2 * np.ones_like(mean[:, 3]), - self._std_weight_position * mean[:, 3]] + self._std_weight_position * mean[:, 3], + ] std_vel = [ self._std_weight_velocity * mean[:, 3], self._std_weight_velocity * mean[:, 3], 1e-5 * np.ones_like(mean[:, 3]), - self._std_weight_velocity * mean[:, 3]] + self._std_weight_velocity * mean[:, 3], + ] sqr = np.square(np.r_[std_pos, std_vel]).T motion_cov = [] @@ -212,20 +193,15 @@ class KalmanFilter(object): """ projected_mean, projected_cov = self.project(mean, covariance) - chol_factor, lower = scipy.linalg.cho_factor( - projected_cov, lower=True, check_finite=False) - kalman_gain = scipy.linalg.cho_solve( - (chol_factor, lower), np.dot(covariance, self._update_mat.T).T, - check_finite=False).T + chol_factor, lower = scipy.linalg.cho_factor(projected_cov, lower=True, check_finite=False) + kalman_gain = scipy.linalg.cho_solve((chol_factor, lower), np.dot(covariance, self._update_mat.T).T, check_finite=False).T innovation = measurement - projected_mean new_mean = mean + np.dot(innovation, kalman_gain.T) - new_covariance = covariance - np.linalg.multi_dot(( - kalman_gain, projected_cov, kalman_gain.T)) + new_covariance = covariance - np.linalg.multi_dot((kalman_gain, projected_cov, kalman_gain.T)) return new_mean, new_covariance - def gating_distance(self, mean, covariance, measurements, - only_position=False, metric='maha'): + def gating_distance(self, mean, covariance, measurements, only_position=False, metric="maha"): """Compute gating distance between state distribution and measurements. A suitable distance threshold can be obtained from `chi2inv95`. 
If `only_position` is False, the chi-square distribution has 4 degrees of @@ -256,17 +232,15 @@ class KalmanFilter(object): measurements = measurements[:, :2] d = measurements - mean - if metric == 'gaussian': + if metric == "gaussian": return np.sum(d * d, axis=1) - elif metric == 'maha': + elif metric == "maha": cholesky_factor = np.linalg.cholesky(covariance) - z = scipy.linalg.solve_triangular( - cholesky_factor, d.T, lower=True, check_finite=False, - overwrite_b=True) + z = scipy.linalg.solve_triangular(cholesky_factor, d.T, lower=True, check_finite=False, overwrite_b=True) squared_maha = np.sum(z * z, axis=0) return squared_maha else: - raise ValueError('invalid distance metric') + raise ValueError("invalid distance metric") def compute_iou(self, pred_bbox, bboxes): """ diff --git a/sam2/utils/misc.py b/bboxmaskpose/sam2/utils/misc.py similarity index 97% rename from sam2/utils/misc.py rename to bboxmaskpose/sam2/utils/misc.py index 9c214039093c8d78bc662fbf2855eb5c6ee2980a..0373f1fd7c44692efece88bb968068693a6afea3 100644 --- a/sam2/utils/misc.py +++ b/bboxmaskpose/sam2/utils/misc.py @@ -152,9 +152,7 @@ class AsyncVideoFrameLoader: if img is not None: return img - img, video_height, video_width = _load_img_as_tensor( - self.img_paths[index], self.image_size - ) + img, video_height, video_width = _load_img_as_tensor(self.img_paths[index], self.image_size) self.video_height = video_height self.video_width = video_width # normalize by mean and std @@ -205,9 +203,7 @@ def load_video_frames( compute_device=compute_device, ) else: - raise NotImplementedError( - "Only MP4 video and JPEG folder are supported at this moment" - ) + raise NotImplementedError("Only MP4 video and JPEG folder are supported at this moment") def load_video_frames_from_jpg_images( @@ -240,11 +236,7 @@ def load_video_frames_from_jpg_images( "ffmpeg to start the JPEG file from 00000.jpg." ) - frame_names = [ - p - for p in os.listdir(jpg_folder) - if os.path.splitext(p)[-1] in [".jpg", ".jpeg", ".JPG", ".JPEG"] - ] + frame_names = [p for p in os.listdir(jpg_folder) if os.path.splitext(p)[-1] in [".jpg", ".jpeg", ".JPG", ".JPEG"]] frame_names.sort(key=lambda p: int(os.path.splitext(p)[0])) num_frames = len(frame_names) if num_frames == 0: diff --git a/sam2/utils/transforms.py b/bboxmaskpose/sam2/utils/transforms.py similarity index 86% rename from sam2/utils/transforms.py rename to bboxmaskpose/sam2/utils/transforms.py index cc17bebfab104b659c5469e8434cf357ae7e24b6..9d348c05d0f77711e58874dd1bb3ab3b9d9332aa 100644 --- a/sam2/utils/transforms.py +++ b/bboxmaskpose/sam2/utils/transforms.py @@ -13,9 +13,7 @@ from torchvision.transforms import Normalize, Resize, ToTensor class SAM2Transforms(nn.Module): - def __init__( - self, resolution, mask_threshold, max_hole_area=0.0, max_sprinkle_area=0.0 - ): + def __init__(self, resolution, mask_threshold, max_hole_area=0.0, max_sprinkle_area=0.0): """ Transforms for SAM2. """ @@ -43,9 +41,7 @@ class SAM2Transforms(nn.Module): img_batch = torch.stack(img_batch, dim=0) return img_batch - def transform_coords( - self, coords: torch.Tensor, normalize=False, orig_hw=None - ) -> torch.Tensor: + def transform_coords(self, coords: torch.Tensor, normalize=False, orig_hw=None) -> torch.Tensor: """ Expects a torch tensor with length 2 in the last dimension. The coordinates can be in absolute image or normalized coordinates, If the coords are in absolute image coordinates, normalize should be set to True and original image size is required. 
@@ -63,9 +59,7 @@ class SAM2Transforms(nn.Module): coords = coords * self.resolution # unnormalize coords return coords - def transform_boxes( - self, boxes: torch.Tensor, normalize=False, orig_hw=None - ) -> torch.Tensor: + def transform_boxes(self, boxes: torch.Tensor, normalize=False, orig_hw=None) -> torch.Tensor: """ Expects a tensor of shape Bx4. The coordinates can be in absolute image or normalized coordinates, if the coords are in absolute image coordinates, normalize should be set to True and original image size is required. @@ -77,7 +71,7 @@ class SAM2Transforms(nn.Module): """ Perform PostProcessing on output masks. """ - from sam2.utils.misc import get_connected_components + from bboxmaskpose.sam2.utils.misc import get_connected_components masks = masks.float() input_masks = masks @@ -86,18 +80,14 @@ class SAM2Transforms(nn.Module): if self.max_hole_area > 0: # Holes are those connected components in background with area <= self.fill_hole_area # (background regions are those with mask scores <= self.mask_threshold) - labels, areas = get_connected_components( - mask_flat <= self.mask_threshold - ) + labels, areas = get_connected_components(mask_flat <= self.mask_threshold) is_hole = (labels > 0) & (areas <= self.max_hole_area) is_hole = is_hole.reshape_as(masks) # We fill holes with a small positive mask score (10.0) to change them to foreground. masks = torch.where(is_hole, self.mask_threshold + 10.0, masks) if self.max_sprinkle_area > 0: - labels, areas = get_connected_components( - mask_flat > self.mask_threshold - ) + labels, areas = get_connected_components(mask_flat > self.mask_threshold) is_hole = (labels > 0) & (areas <= self.max_sprinkle_area) is_hole = is_hole.reshape_as(masks) # We fill holes with negative mask score (-10.0) to change them to background. diff --git a/sam2/visualization.py b/bboxmaskpose/sam2/visualization.py similarity index 73% rename from sam2/visualization.py rename to bboxmaskpose/sam2/visualization.py index a506ef92ad573802de6d685ba5f11a68a674fa48..203b72fec27be3436fac40eedf69c86b103e01df 100644 --- a/sam2/visualization.py +++ b/bboxmaskpose/sam2/visualization.py @@ -1,18 +1,49 @@ +# Copyright (c) Miroslav Purkrabek. All rights reserved. import os + import cv2 import numpy as np - -from sam2.distinctipy import get_colors - from pycocotools import mask as Mask - -def batch_visualize_masks(args, image, masks_rle, image_kpts, bboxes_xyxy, dt_bboxes, gt_masks_raw, bbox_ious, mask_ious, image_path=None, mask_out=False, alpha=1.0): +from bboxmaskpose.sam2.distinctipy import get_colors + + +def batch_visualize_masks( + args, + image, + masks_rle, + image_kpts, + bboxes_xyxy, + dt_bboxes, + gt_masks_raw, + bbox_ious, + mask_ious, + image_path=None, + mask_out=False, + alpha=1.0, +): + """ + Visualize predicted and ground-truth masks side by side on an image. + + Args: + args: Configuration object with debug_folder and num_pos_keypoints. + image (np.ndarray): BGR image array. + masks_rle (list): Predicted masks in RLE format. + image_kpts (np.ndarray or None): Keypoints array of shape (N, K, 3). + bboxes_xyxy (np.ndarray or None): GT bounding boxes in [x1,y1,x2,y2]. + dt_bboxes (np.ndarray or None): Detected bounding boxes in [x,y,w,h]. + gt_masks_raw (list): Ground-truth masks in RLE format (or None per entry). + bbox_ious (list): IoU scores per bbox instance. + mask_ious (list): IoU scores per mask instance. + image_path (str or None): Optional path to derive the save filename. 
+ mask_out (bool): If True, renders a masked-out image instead of colored masks. + alpha (float): Blending factor for mask overlay. + """ # Decode dt_masks_rle dt_masks = [] for mask_rle in masks_rle: mask = Mask.decode(mask_rle) - dt_masks.append(mask) + dt_masks.append(mask) dt_masks = np.array(dt_masks) # Decode gt_masks_raw @@ -31,13 +62,13 @@ def batch_visualize_masks(args, image, masks_rle, image_kpts, bboxes_xyxy, dt_bb # Generate random color for each mask if mask_out: dt_mask_image = dt_masks.max(axis=0) - dt_mask_image = (~ dt_mask_image.astype(bool)).astype(np.uint8) + dt_mask_image = (~dt_mask_image.astype(bool)).astype(np.uint8) dt_mask_image = cv2.resize(dt_mask_image, (image.shape[1], image.shape[0]), interpolation=cv2.INTER_NEAREST) dt_mask_image = image * dt_mask_image[:, :, None] - dt_mask_image = cv2.addWeighted(image, 1-alpha, dt_mask_image, alpha, 0) + dt_mask_image = cv2.addWeighted(image, 1 - alpha, dt_mask_image, alpha, 0) else: colors = (np.array(get_colors(dt_masks.shape[0])) * 255).astype(int) - + # colors = np.random.randint(0, 255, (dt_masks.shape[0], 3)) # # Make sure no colors are too dark # np.clip(colors, 50, 255, out=colors) @@ -53,7 +84,7 @@ def batch_visualize_masks(args, image, masks_rle, image_kpts, bboxes_xyxy, dt_bb # # Remove masks that are too small # dt_masks_area = dt_masks.any(axis=3).sum(axis=(1, 2)) # dt_masks[dt_masks_area < 300*300] = 0 - + # Collapse masks to 3 channels dt_mask_image = dt_masks.max(axis=0) gt_mask_image = gt_masks.max(axis=0) @@ -114,14 +145,16 @@ def batch_visualize_masks(args, image, masks_rle, image_kpts, bboxes_xyxy, dt_bb save_name = os.path.basename(image_path) else: save_name = "batch_bbox_{:06.2f}_mask_{:06.2f}_{:02d}kpts_{:06d}.jpg".format( - bbox_ious.mean(), mask_ious.mean(), args.num_pos_keypoints, np.random.randint(1000000), + bbox_ious.mean(), + mask_ious.mean(), + args.num_pos_keypoints, + np.random.randint(1000000), ) - if 'debug_folder' not in args: + if "debug_folder" not in args: args.debug_folder = "debug" if mask_out: - cv2.imwrite(os.path.join(args.debug_folder, save_name), dt_mask_image) + cv2.imwrite(os.path.join(args.debug_folder, save_name), dt_mask_image) else: - cv2.imwrite(os.path.join(args.debug_folder, save_name), np.hstack([gt_mask_image, dt_mask_image])) - + cv2.imwrite(os.path.join(args.debug_folder, save_name), np.hstack([gt_mask_image, dt_mask_image])) diff --git a/bboxmaskpose/sam2_utils.py b/bboxmaskpose/sam2_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..6cb393937a53cd3f8be3737b78dbd84e2a85bfd0 --- /dev/null +++ b/bboxmaskpose/sam2_utils.py @@ -0,0 +1,786 @@ +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. 
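+#
+# Minimal usage sketch (config and checkpoint paths are placeholders):
+#
+#   predictor = prepare_model("sam2_hiera_l.yaml", "checkpoints/sam2.pt")
+#   refined_dets = process_image_with_SAM(sam_args, image, predictor, new_dets)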
+
+"""
+SAM2 utilities for BMP demo:
+- Build and prepare SAM model
+- Convert poses to segmentation
+- Compute mask-pose consistency
+"""
+
+from typing import Any, List, Optional, Tuple
+
+import numpy as np
+import torch
+from mmengine.structures import InstanceData
+from pycocotools import mask as Mask
+
+from bboxmaskpose.sam2.build_sam import build_sam2
+from bboxmaskpose.sam2.sam2_image_predictor import SAM2ImagePredictor
+
+# Threshold for keypoint validity in mask-pose consistency
+STRICT_KPT_THRESHOLD: float = 0.5
+
+
+def _validate_sam_args(sam_args):
+    """Validate that all required sam_args attributes are present."""
+    required = [
+        "crop",
+        "use_bbox",
+        "confidence_thr",
+        "ignore_small_bboxes",
+        "num_pos_keypoints",
+        "num_pos_keypoints_if_crowd",
+        "num_neg_keypoints",
+        "crowd_by_max_iou",
+        "batch",
+        "exclusive_masks",
+        "extend_bbox",
+        "pose_mask_consistency",
+        "visibility_thr",
+        "selection_method",
+    ]
+    for param in required:
+        if not hasattr(sam_args, param):
+            raise AttributeError(f"Missing required arg {param} in sam_args")
+
+
+def _get_max_ious(bboxes: List[np.ndarray]) -> np.ndarray:
+    """Compute maximum IoU for each bbox against others."""
+    if len(bboxes) == 0:
+        return np.zeros((0,), dtype=np.float32)
+    is_crowd = [0] * len(bboxes)
+    ious = Mask.iou(bboxes, bboxes, is_crowd)
+    mat = np.array(ious)
+    np.fill_diagonal(mat, 0)
+    return mat.max(axis=1)
+
+
+def _compute_one_mask_pose_consistency(
+    mask: np.ndarray,
+    pos_keypoints: Optional[np.ndarray] = None,
+    neg_keypoints: Optional[np.ndarray] = None,
+) -> float:
+    """Compute a consistency score between a mask and the given keypoints.
+
+    Positive keypoints are expected inside the mask, negative keypoints outside.
+    """
+    if mask is None:
+        return 0.0
+
+    def _clip_points(points: np.ndarray) -> np.ndarray:
+        pts_int = np.floor(points[:, :2]).astype(int)
+        pts_int[:, 0] = np.clip(pts_int[:, 0], 0, mask.shape[1] - 1)
+        pts_int[:, 1] = np.clip(pts_int[:, 1], 0, mask.shape[0] - 1)
+        return pts_int
+
+    def _mean_inside(points: np.ndarray) -> float:
+        if points.size == 0:
+            return 0.0
+        pts_int = _clip_points(points)
+        vals = mask[pts_int[:, 1], pts_int[:, 0]]
+        return vals.mean() if vals.size > 0 else 0.0
+
+    pos_mean = 0.0
+    if pos_keypoints is not None:
+        valid = pos_keypoints[:, 2] > STRICT_KPT_THRESHOLD
+        pos_mean = _mean_inside(pos_keypoints[valid])
+
+    neg_mean = 0.0
+    if neg_keypoints is not None:
+        valid = neg_keypoints[:, 2] > STRICT_KPT_THRESHOLD
+        pts = neg_keypoints[valid]
+        if pts.size > 0:
+            # Clip to the mask bounds, mirroring the positive-keypoint path.
+            pts_int = _clip_points(pts)
+            inside = mask[pts_int[:, 1], pts_int[:, 0]]
+            neg_mean = (~inside.astype(bool)).mean()
+
+    return 0.5 * pos_mean + 0.5 * neg_mean
+
+
+def _require_instance_keypoint_channels(
+    instances: InstanceData,
+    role: str,
+) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
+    """
+    Extract and validate keypoint channels from InstanceData.
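+
+    Falls back to the third keypoint channel when `keypoint_scores` is absent;
+    trailing singleton dimensions on the score-like channels are squeezed away.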
+ + Returns: + coords: (N, K, 2) + scores: (N, K) + visibilities: (N, K) + probabilities: (N, K) + """ + if not hasattr(instances, "keypoints") or instances.keypoints is None: + raise AttributeError(f"{role} instances must contain keypoints") + if not hasattr(instances, "keypoint_vis") or instances.keypoint_vis is None: + raise AttributeError(f"{role} instances must contain keypoint_vis") + if not hasattr(instances, "keypoint_prob") or instances.keypoint_prob is None: + raise AttributeError(f"{role} instances must contain keypoint_prob") + + keypoints = np.asarray(instances.keypoints) + visibilities = np.asarray(instances.keypoint_vis) + probabilities = np.asarray(instances.keypoint_prob) + + if keypoints.ndim != 3 or keypoints.shape[-1] < 3: + raise ValueError(f"{role} keypoints must have shape (N, K, 3)") + + if hasattr(instances, "keypoint_scores") and instances.keypoint_scores is not None: + scores = np.asarray(instances.keypoint_scores) + else: + scores = keypoints[:, :, 2] + + if scores.ndim == 3 and scores.shape[-1] == 1: + scores = scores[..., 0] + if visibilities.ndim == 3 and visibilities.shape[-1] == 1: + visibilities = visibilities[..., 0] + if probabilities.ndim == 3 and probabilities.shape[-1] == 1: + probabilities = probabilities[..., 0] + + expected_shape = keypoints.shape[:2] + if scores.shape != expected_shape: + raise ValueError(f"{role} keypoint_scores shape {scores.shape} does not match {expected_shape}") + if visibilities.shape != expected_shape: + raise ValueError(f"{role} keypoint_vis shape {visibilities.shape} does not match {expected_shape}") + if probabilities.shape != expected_shape: + raise ValueError(f"{role} keypoint_prob shape {probabilities.shape} does not match {expected_shape}") + + coords = keypoints[:, :, :2] + return coords, scores, visibilities, probabilities + + +def _select_keypoints( + args: Any, + coords: np.ndarray, + scores: np.ndarray, + visibilities: np.ndarray, + num_visible: int, + bbox: Optional[Tuple[float, float, float, float]] = None, + method: Optional[str] = None, +) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + """ + Select and order keypoints for SAM prompting using explicit channels. + + Visibility thresholding is always performed over visibilities. + Method-specific ranking uses: + - k_most_visible: visibility + - distance+confidence: confidence + + Returns: + selected_coords: (M, 2) + selected_drive_values: (M,) method-dependent ranking signal + selected_indices: (M,) original keypoint indices + """ + methods = ["k_most_visible", "distance", "distance+confidence", "closest"] + sel_method = method or args.selection_method + if sel_method not in methods: + raise ValueError(f"Unknown method for keypoint selection: {sel_method}") + + if num_visible <= 0: + return ( + np.empty((0, 2), dtype=np.float32), + np.empty((0,), dtype=np.float32), + np.empty((0,), dtype=np.int64), + ) + + coords = np.asarray(coords, dtype=np.float32) + scores = np.asarray(scores, dtype=np.float32).reshape(-1) + visibilities = np.asarray(visibilities, dtype=np.float32).reshape(-1) + + if coords.ndim != 2 or coords.shape[1] != 2: + raise ValueError("coords must have shape (K, 2)") + if not (coords.shape[0] == scores.shape[0] == visibilities.shape[0]): + raise ValueError("coords, scores and visibilities must share K") + + kept_indices = np.arange(coords.shape[0]) + + # Optional face-anchor for non-k_most_visible methods. 
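+    # The first three keypoints are assumed to follow the COCO convention
+    # (nose, left eye, right eye), so the most visible facial point can serve as an anchor.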
+ if sel_method != "k_most_visible" and coords.shape[0] >= 3: + facial_rel_idx = int(np.argmax(visibilities[:3])) + if visibilities[facial_rel_idx] >= args.visibility_thr: + facial_abs_idx = facial_rel_idx + remaining_abs = np.arange(3, coords.shape[0]) + reorder_abs = np.concatenate([np.array([facial_abs_idx]), remaining_abs]) + coords = coords[reorder_abs] + scores = scores[reorder_abs] + visibilities = visibilities[reorder_abs] + kept_indices = kept_indices[reorder_abs] + + # Visibility filtering for all methods. + vis_mask = visibilities >= args.visibility_thr + coords = coords[vis_mask] + scores = scores[vis_mask] + visibilities = visibilities[vis_mask] + kept_indices = kept_indices[vis_mask] + + if coords.shape[0] == 0: + return ( + np.empty((0, 2), dtype=np.float32), + np.empty((0,), dtype=np.float32), + np.empty((0,), dtype=np.int64), + ) + + if sel_method == "k_most_visible": + order = np.argsort(visibilities)[::-1] + drive_values = visibilities + + elif sel_method == "distance": + if bbox is None: + bbox_center = np.array([coords[:, 0].mean(), coords[:, 1].mean()]) + else: + bbox_center = np.array([(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2]) + dists = np.linalg.norm(coords - bbox_center, axis=1) + if coords.shape[0] > 1: + dist_matrix = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2) + np.fill_diagonal(dist_matrix, np.inf) + min_inter_dist = np.min(dist_matrix, axis=1) + else: + min_inter_dist = np.zeros((coords.shape[0],), dtype=np.float32) + order = np.argsort(dists + 3 * min_inter_dist)[::-1] + drive_values = scores + + elif sel_method == "distance+confidence": + conf_order = np.argsort(scores)[::-1] + ordered_coords = coords[conf_order] + ordered_scores = scores[conf_order] + ordered_vis = visibilities[conf_order] + ordered_indices = kept_indices[conf_order] + + if ordered_coords.shape[0] <= 1: + greedy_indices = np.arange(ordered_coords.shape[0]) + else: + dist_matrix = np.linalg.norm(ordered_coords[:, None, :] - ordered_coords[None, :, :], axis=2) + greedy_indices = [0] + available_scores = ordered_scores.copy() + available_scores[0] = -1 + for _ in range(ordered_coords.shape[0] - 1): + min_dist = np.min(dist_matrix[:, greedy_indices], axis=1) + min_dist[available_scores < np.percentile(available_scores, 80)] = -1 + next_idx = int(np.argmax(min_dist)) + greedy_indices.append(next_idx) + available_scores[next_idx] = -1 + greedy_indices = np.array(greedy_indices, dtype=np.int64) + + coords = ordered_coords[greedy_indices] + scores = ordered_scores[greedy_indices] + visibilities = ordered_vis[greedy_indices] + kept_indices = ordered_indices[greedy_indices] + order = np.arange(coords.shape[0]) + drive_values = scores + + else: # closest + if bbox is None: + bbox_center = np.array([coords[:, 0].mean(), coords[:, 1].mean()]) + else: + bbox_center = np.array([(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2]) + dists = np.linalg.norm(coords - bbox_center, axis=1) + order = np.argsort(dists) + drive_values = scores + + selected_coords = coords[order] + selected_drive_values = drive_values[order] + selected_indices = kept_indices[order] + return selected_coords, selected_drive_values, selected_indices + + +def prepare_model(model_cfg: Any, model_checkpoint: str) -> SAM2ImagePredictor: + """Build and return a SAM2ImagePredictor model on the appropriate device.""" + if torch.cuda.is_available(): + device = torch.device("cuda") + elif torch.backends.mps.is_available(): + device = torch.device("mps") + else: + device = torch.device("cpu") + + sam2 = 
build_sam2(model_cfg, model_checkpoint, device=device, apply_postprocessing=True) + model = SAM2ImagePredictor( + sam2, + max_hole_area=10.0, + max_sprinkle_area=50.0, + ) + return model + + +def _compute_mask_pose_consistency(masks: List[np.ndarray], keypoints_list: List[np.ndarray]) -> np.ndarray: + """Compute mask-pose consistency score for each mask-keypoints pair.""" + scores: List[float] = [] + for idx, (mask, kpts) in enumerate(zip(masks, keypoints_list)): + other_kpts = np.concatenate([keypoints_list[:idx], keypoints_list[idx + 1 :]], axis=0).reshape(-1, 3) + score = _compute_one_mask_pose_consistency(mask, kpts, other_kpts) + scores.append(score) + return np.array(scores) + + +def _pose2seg( + args: Any, + model: SAM2ImagePredictor, + bbox_xyxy: Optional[List[float]] = None, + pos_coords: Optional[np.ndarray] = None, + pos_scores: Optional[np.ndarray] = None, + pos_visibilities: Optional[np.ndarray] = None, + pos_probabilities: Optional[np.ndarray] = None, + neg_coords: Optional[np.ndarray] = None, + neg_scores: Optional[np.ndarray] = None, + neg_visibilities: Optional[np.ndarray] = None, + neg_probabilities: Optional[np.ndarray] = None, + image: Optional[np.ndarray] = None, + gt_mask: Optional[Any] = None, + num_pos_keypoints: Optional[int] = None, + gt_mask_is_binary: bool = False, +) -> Tuple[np.ndarray, np.ndarray, np.ndarray, float]: + """Run SAM segmentation conditioned on explicit keypoint channels.""" + num_pos_keypoints = args.num_pos_keypoints if num_pos_keypoints is None else num_pos_keypoints + + # Positive keypoints. + if pos_coords is not None and pos_scores is not None and pos_visibilities is not None: + pos_coords = pos_coords.reshape(-1, 2) + pos_scores = pos_scores.reshape(-1) + pos_visibilities = pos_visibilities.reshape(-1) + if pos_probabilities is not None: + _ = pos_probabilities.reshape(-1) + + valid_pos = (pos_scores > args.confidence_thr) & (pos_visibilities > args.visibility_thr) + pos_kpts, pos_drive, _ = _select_keypoints( + args, + coords=pos_coords, + scores=pos_scores, + visibilities=pos_visibilities, + num_visible=int(np.sum(valid_pos)), + bbox=bbox_xyxy, + ) + pos_kpts_backup = ( + np.concatenate([pos_kpts, pos_drive[:, None]], axis=1) if pos_kpts.shape[0] > 0 else np.empty((0, 3), dtype=np.float32) + ) + + if pos_kpts.shape[0] > num_pos_keypoints: + pos_kpts = pos_kpts[:num_pos_keypoints, :] + pos_kpts_backup = pos_kpts_backup[:num_pos_keypoints, :] + else: + pos_kpts = np.empty((0, 2), dtype=np.float32) + pos_kpts_backup = np.empty((0, 3), dtype=np.float32) + + # Negative keypoints. 
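+    # These come from other instances and are passed to SAM as background prompts,
+    # discouraging the predicted mask from leaking onto neighbouring people.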
+    if neg_coords is not None and neg_scores is not None and neg_visibilities is not None:
+        neg_coords = neg_coords.reshape(-1, 2)
+        neg_scores = neg_scores.reshape(-1)
+        neg_visibilities = neg_visibilities.reshape(-1)
+        if neg_probabilities is not None:
+            _ = neg_probabilities.reshape(-1)
+
+        valid_neg = (neg_scores > args.confidence_thr) & (neg_visibilities > args.visibility_thr)
+        neg_kpts, neg_drive, _ = _select_keypoints(
+            args,
+            coords=neg_coords,
+            scores=neg_scores,
+            visibilities=neg_visibilities,
+            num_visible=int(np.sum(valid_neg)),
+            bbox=bbox_xyxy,
+            method="closest",
+        )
+        selected_neg_kpts = neg_kpts
+        neg_kpts_backup = (
+            np.concatenate([neg_kpts, neg_drive[:, None]], axis=1) if neg_kpts.shape[0] > 0 else np.empty((0, 3), dtype=np.float32)
+        )
+
+        if neg_kpts.shape[0] > args.num_neg_keypoints:
+            selected_neg_kpts = neg_kpts[: args.num_neg_keypoints, :]
+    else:
+        selected_neg_kpts = np.empty((0, 2), dtype=np.float32)
+        neg_kpts_backup = np.empty((0, 3), dtype=np.float32)
+
+    # SAM prompts.
+    kpts = np.concatenate([pos_kpts, selected_neg_kpts], axis=0)
+    kpts_labels = np.concatenate([np.ones(pos_kpts.shape[0]), np.zeros(selected_neg_kpts.shape[0])], axis=0)
+
+    bbox = bbox_xyxy if args.use_bbox else None
+
+    if args.extend_bbox and bbox is not None and pos_kpts.shape[0] > 0:
+        pose_bbox = np.array([pos_kpts[:, 0].min() - 2, pos_kpts[:, 1].min() - 2, pos_kpts[:, 0].max() + 2, pos_kpts[:, 1].max() + 2])
+        expanded_bbox = np.array(bbox)
+        expanded_bbox[:2] = np.minimum(bbox[:2], pose_bbox[:2])
+        expanded_bbox[2:] = np.maximum(bbox[2:], pose_bbox[2:])
+        bbox = expanded_bbox
+
+    if args.crop and args.use_bbox and image is not None:
+        crop_bbox = np.array(bbox)
+        bbox_center = np.array([(crop_bbox[0] + crop_bbox[2]) / 2, (crop_bbox[1] + crop_bbox[3]) / 2])
+        bbox_size = np.array([crop_bbox[2] - crop_bbox[0], crop_bbox[3] - crop_bbox[1]])
+        bbox_size = 1.5 * bbox_size
+        crop_bbox = np.array(
+            [
+                bbox_center[0] - bbox_size[0] / 2,
+                bbox_center[1] - bbox_size[1] / 2,
+                bbox_center[0] + bbox_size[0] / 2,
+                bbox_center[1] + bbox_size[1] / 2,
+            ]
+        )
+        crop_bbox = np.round(crop_bbox).astype(int)
+        crop_bbox = np.clip(crop_bbox, 0, [image.shape[1], image.shape[0], image.shape[1], image.shape[0]])
+        original_image_size = image.shape[:2]
+        image = image[crop_bbox[1] : crop_bbox[3], crop_bbox[0] : crop_bbox[2], :]
+
+        kpts = kpts - crop_bbox[:2]
+        bbox[:2] = bbox[:2] - crop_bbox[:2]
+        bbox[2:] = bbox[2:] - crop_bbox[:2]
+
+        model.set_image(image)
+
+    masks, scores, logits = model.predict(
+        point_coords=kpts,
+        point_labels=kpts_labels,
+        box=bbox,
+        multimask_output=False,
+    )
+    mask = masks[0]
+    scores = scores[0]
+
+    if args.crop and args.use_bbox and image is not None:
+        mask_padded = np.zeros(original_image_size, dtype=np.uint8)
+        mask_padded[crop_bbox[1] : crop_bbox[3], crop_bbox[0] : crop_bbox[2]] = mask
+        mask = mask_padded
+
+        bbox[:2] = bbox[:2] + crop_bbox[:2]
+        bbox[2:] = bbox[2:] + crop_bbox[:2]
+
+    if args.pose_mask_consistency:
+        if gt_mask_is_binary:
+            gt_mask_binary = gt_mask
+        else:
+            gt_mask_binary = Mask.decode(gt_mask).astype(bool) if gt_mask is not None else None
+
+        gt_mask_pose_consistency = _compute_one_mask_pose_consistency(gt_mask_binary, pos_kpts_backup, neg_kpts_backup)
+        dt_mask_pose_consistency = _compute_one_mask_pose_consistency(mask, pos_kpts_backup, neg_kpts_backup)
+
+        # Keep the mask that agrees better with the pose; on (near-)ties prefer the smaller one.
+        tol = 0.1
+        dt_is_same = np.abs(dt_mask_pose_consistency - gt_mask_pose_consistency) < tol
+        if dt_is_same:
+            mask = gt_mask_binary if (gt_mask_binary is not None and gt_mask_binary.sum() < mask.sum()) else mask
+        
else: + mask = gt_mask_binary if gt_mask_pose_consistency > dt_mask_pose_consistency else mask + + return mask, pos_kpts_backup, neg_kpts_backup, scores + + +def process_image_with_SAM( + sam_args: Any, + image: np.ndarray, + model: SAM2ImagePredictor, + new_dets: InstanceData, + old_dets: Optional[InstanceData] = None, +) -> InstanceData: + """Validate args and route to single or batch processing.""" + _validate_sam_args(sam_args) + if sam_args.batch: + return _process_image_batch(sam_args, image, model, new_dets, old_dets) + return _process_image_single(sam_args, image, model, new_dets, old_dets) + + +def _process_image_single( + sam_args: Any, + image: np.ndarray, + model: SAM2ImagePredictor, + new_dets: InstanceData, + old_dets: Optional[InstanceData] = None, +) -> InstanceData: + """Refine instance segmentation masks using SAM2 with pose-conditioned prompts.""" + _validate_sam_args(sam_args) + + if not (sam_args.crop and sam_args.use_bbox): + model.set_image(image) + + new_coords, new_scores, new_visibilities, new_probabilities = _require_instance_keypoint_channels(new_dets, role="new") + + n_new_dets = len(new_dets.bboxes) + n_old_dets = 0 + if old_dets is not None: + n_old_dets = len(old_dets.bboxes) + old_coords, old_scores, old_visibilities, old_probabilities = _require_instance_keypoint_channels(old_dets, role="old") + + all_bboxes = new_dets.bboxes.copy() + if old_dets is not None: + all_bboxes = np.concatenate([all_bboxes, old_dets.bboxes], axis=0) + + max_ious = _get_max_ious(all_bboxes) + + new_dets.refined_masks = np.zeros((n_new_dets, image.shape[0], image.shape[1]), dtype=np.uint8) + new_dets.sam_scores = np.zeros_like(new_dets.bbox_scores) + new_dets.sam_kpts = np.zeros((len(new_dets.bboxes), sam_args.num_pos_keypoints, 3), dtype=np.float32) + + for instance_idx in range(len(new_dets.bboxes)): + bbox_xywh = new_dets.bboxes[instance_idx] + bbox_area = bbox_xywh[2] * bbox_xywh[3] + + if sam_args.ignore_small_bboxes and bbox_area < 100 * 100: + continue + + dt_mask = new_dets.pred_masks[instance_idx] if hasattr(new_dets, "pred_masks") else None + + bbox_xyxy = [bbox_xywh[0], bbox_xywh[1], bbox_xywh[0] + bbox_xywh[2], bbox_xywh[1] + bbox_xywh[3]] + + this_coords = new_coords[instance_idx] + this_scores = new_scores[instance_idx] + this_vis = new_visibilities[instance_idx] + this_probs = new_probabilities[instance_idx] + + other_coords = None + other_scores = None + other_vis = None + other_probs = None + + if old_dets is not None: + other_coords = old_coords.copy().reshape(n_old_dets, -1, 2) + other_scores = old_scores.copy().reshape(n_old_dets, -1) + other_vis = old_visibilities.copy().reshape(n_old_dets, -1) + other_probs = old_probabilities.copy().reshape(n_old_dets, -1) + + if len(new_coords) > 1: + other_new_coords = np.concatenate([new_coords[:instance_idx], new_coords[instance_idx + 1 :]], axis=0) + other_new_scores = np.concatenate([new_scores[:instance_idx], new_scores[instance_idx + 1 :]], axis=0) + other_new_vis = np.concatenate([new_visibilities[:instance_idx], new_visibilities[instance_idx + 1 :]], axis=0) + other_new_probs = np.concatenate([new_probabilities[:instance_idx], new_probabilities[instance_idx + 1 :]], axis=0) + + other_coords = np.concatenate([other_coords, other_new_coords], axis=0) if other_coords is not None else other_new_coords + other_scores = np.concatenate([other_scores, other_new_scores], axis=0) if other_scores is not None else other_new_scores + other_vis = np.concatenate([other_vis, other_new_vis], axis=0) if other_vis is not None else 
other_new_vis + other_probs = np.concatenate([other_probs, other_new_probs], axis=0) if other_probs is not None else other_new_probs + + num_pos_keypoints = sam_args.num_pos_keypoints + if sam_args.crowd_by_max_iou is not None and max_ious[instance_idx] > sam_args.crowd_by_max_iou: + bbox_xyxy = None + num_pos_keypoints = sam_args.num_pos_keypoints_if_crowd + + dt_mask, pos_kpts, neg_kpts, scores = _pose2seg( + sam_args, + model, + bbox_xyxy, + pos_coords=this_coords, + pos_scores=this_scores, + pos_visibilities=this_vis, + pos_probabilities=this_probs, + neg_coords=other_coords, + neg_scores=other_scores, + neg_visibilities=other_vis, + neg_probabilities=other_probs, + image=image if (sam_args.crop and sam_args.use_bbox) else None, + gt_mask=dt_mask, + num_pos_keypoints=num_pos_keypoints, + gt_mask_is_binary=True, + ) + + new_dets.refined_masks[instance_idx] = dt_mask + new_dets.sam_scores[instance_idx] = scores + + if len(pos_kpts) != sam_args.num_pos_keypoints: + pos_kpts = np.concatenate([pos_kpts, np.zeros((sam_args.num_pos_keypoints - len(pos_kpts), 3), dtype=np.float32)], axis=0) + new_dets.sam_kpts[instance_idx] = pos_kpts + + n_masks = len(new_dets.refined_masks) + (len(old_dets.refined_masks) if old_dets is not None else 0) + + if sam_args.exclusive_masks and n_masks > 1: + all_masks = ( + np.concatenate([new_dets.refined_masks, old_dets.refined_masks], axis=0) if old_dets is not None else new_dets.refined_masks + ) + all_scores = np.concatenate([new_dets.sam_scores, old_dets.sam_scores], axis=0) if old_dets is not None else new_dets.sam_scores + refined_masks = _apply_exclusive_masks(all_masks, all_scores) + new_dets.refined_masks = refined_masks[: len(new_dets.refined_masks)] + + return new_dets + + +def _process_image_batch( + sam_args: Any, + image: np.ndarray, + model: SAM2ImagePredictor, + new_dets: InstanceData, + old_dets: Optional[InstanceData] = None, +) -> InstanceData: + """Batch process multiple detection instances with SAM2 refinement.""" + n_new_dets = len(new_dets.bboxes) + + model.set_image(image) + + new_coords, new_scores, new_visibilities, new_probabilities = _require_instance_keypoint_channels(new_dets, role="new") + + image_coords = [] + image_scores = [] + image_visibilities = [] + image_probabilities = [] + image_bboxes = [] + num_valid_kpts = [] + + for instance_idx in range(len(new_dets.bboxes)): + bbox_xywh = new_dets.bboxes[instance_idx].copy() + bbox_area = bbox_xywh[2] * bbox_xywh[3] + if sam_args.ignore_small_bboxes and bbox_area < 100 * 100: + continue + + this_coords = new_coords[instance_idx].copy().reshape(-1, 2) + this_scores = new_scores[instance_idx].copy().reshape(-1) + this_vis = new_visibilities[instance_idx].copy().reshape(-1) + this_probs = new_probabilities[instance_idx].copy().reshape(-1) + + visible_kpts = (this_vis > sam_args.visibility_thr) & (this_scores > sam_args.confidence_thr) + num_visible = int(visible_kpts.sum()) + if num_visible <= 0: + continue + + num_valid_kpts.append(num_visible) + image_bboxes.append(np.array(bbox_xywh)) + image_coords.append(this_coords) + image_scores.append(this_scores) + image_visibilities.append(this_vis) + image_probabilities.append(this_probs) + + if old_dets is not None: + old_coords, old_scores, old_visibilities, old_probabilities = _require_instance_keypoint_channels(old_dets, role="old") + for instance_idx in range(len(old_dets.bboxes)): + bbox_xywh = old_dets.bboxes[instance_idx].copy() + bbox_area = bbox_xywh[2] * bbox_xywh[3] + if sam_args.ignore_small_bboxes and bbox_area < 100 * 100: + 
continue + + this_coords = old_coords[instance_idx].reshape(-1, 2) + this_scores = old_scores[instance_idx].reshape(-1) + this_vis = old_visibilities[instance_idx].reshape(-1) + this_probs = old_probabilities[instance_idx].reshape(-1) + + visible_kpts = (this_vis > sam_args.visibility_thr) & (this_scores > sam_args.confidence_thr) + num_visible = int(visible_kpts.sum()) + if num_visible <= 0: + continue + + num_valid_kpts.append(num_visible) + image_bboxes.append(np.array(bbox_xywh)) + image_coords.append(this_coords) + image_scores.append(this_scores) + image_visibilities.append(this_vis) + image_probabilities.append(this_probs) + + if len(image_bboxes) == 0: + new_dets.refined_masks = np.zeros((n_new_dets, image.shape[0], image.shape[1]), dtype=np.uint8) + new_dets.sam_scores = np.zeros((n_new_dets,), dtype=np.float32) + new_dets.sam_kpts = np.zeros((n_new_dets, sam_args.num_pos_keypoints, 3), dtype=np.float32) + return new_dets + + image_coords = np.array(image_coords) + image_scores = np.array(image_scores) + image_visibilities = np.array(image_visibilities) + image_probabilities = np.array(image_probabilities) + image_bboxes = np.array(image_bboxes) + num_valid_kpts = np.array(num_valid_kpts) + + prepared_kpts = [] + prepared_kpts_backup = [] + for bbox, coords, scores, visibilities, probabilities, num_visible in zip( + image_bboxes, + image_coords, + image_scores, + image_visibilities, + image_probabilities, + num_valid_kpts, + ): + _ = probabilities # extracted for clarity; currently unused in selection + this_kpts, this_drive_vals, _ = _select_keypoints( + sam_args, + coords=coords, + scores=scores, + visibilities=visibilities, + num_visible=int(num_visible), + bbox=bbox, + ) + + if this_kpts.shape[0] == 0: + continue + + if this_kpts.shape[0] < num_valid_kpts.max(): + this_kpts = np.concatenate([this_kpts, np.tile(this_kpts[-1], (num_valid_kpts.max() - this_kpts.shape[0], 1))], axis=0) + this_drive_vals = np.concatenate( + [this_drive_vals, np.tile(this_drive_vals[-1], (num_valid_kpts.max() - this_drive_vals.shape[0],))], + axis=0, + ) + + prepared_kpts.append(this_kpts) + prepared_kpts_backup.append(np.concatenate([this_kpts, this_drive_vals[:, None]], axis=1)) + + if len(prepared_kpts) == 0: + new_dets.refined_masks = np.zeros((n_new_dets, image.shape[0], image.shape[1]), dtype=np.uint8) + new_dets.sam_scores = np.zeros((n_new_dets,), dtype=np.float32) + new_dets.sam_kpts = np.zeros((n_new_dets, sam_args.num_pos_keypoints, 3), dtype=np.float32) + return new_dets + + image_kpts = np.array(prepared_kpts) + image_kpts_backup = np.array(prepared_kpts_backup) + kpts_labels = np.ones(image_kpts.shape[:2]) + + max_ious = _get_max_ious(image_bboxes) + num_pos_keypoints = sam_args.num_pos_keypoints + use_bbox = sam_args.use_bbox + if sam_args.crowd_by_max_iou is not None and len(max_ious) > 0 and max_ious.max() > sam_args.crowd_by_max_iou: + use_bbox = False + num_pos_keypoints = sam_args.num_pos_keypoints_if_crowd + + if num_pos_keypoints > 0 and num_pos_keypoints < image_kpts.shape[1]: + image_kpts = image_kpts[:, :num_pos_keypoints, :] + kpts_labels = kpts_labels[:, :num_pos_keypoints] + image_kpts_backup = image_kpts_backup[:, :num_pos_keypoints, :] + elif num_pos_keypoints == 0: + image_kpts = None + kpts_labels = None + image_kpts_backup = np.empty((0, 3), dtype=np.float32) + + image_bboxes_xyxy = None + if use_bbox: + image_bboxes_xyxy = np.array(image_bboxes) + image_bboxes_xyxy[:, 2:] += image_bboxes_xyxy[:, :2] + + if sam_args.extend_bbox and image_kpts is not None and 
image_kpts.size > 0: + pose_bbox = np.stack( + [ + np.min(image_kpts[:, :, 0], axis=1) - 2, + np.min(image_kpts[:, :, 1], axis=1) - 2, + np.max(image_kpts[:, :, 0], axis=1) + 2, + np.max(image_kpts[:, :, 1], axis=1) + 2, + ], + axis=1, + ) + expanded_bbox = np.array(image_bboxes_xyxy) + expanded_bbox[:, :2] = np.minimum(expanded_bbox[:, :2], pose_bbox[:, :2]) + expanded_bbox[:, 2:] = np.maximum(expanded_bbox[:, 2:], pose_bbox[:, 2:]) + image_bboxes_xyxy = expanded_bbox + + masks, scores, logits = model.predict( + point_coords=image_kpts, + point_labels=kpts_labels, + box=image_bboxes_xyxy, + multimask_output=False, + ) + + if len(masks.shape) == 3: + masks = masks[None, :, :, :] + masks = masks[:, 0, :, :] + n_masks_out = masks.shape[0] + scores = scores.reshape(n_masks_out) + + if sam_args.exclusive_masks and n_masks_out > 1: + masks = _apply_exclusive_masks(masks, scores) + + gt_masks = new_dets.pred_masks.copy() if new_dets.pred_masks is not None else None + if sam_args.pose_mask_consistency and gt_masks is not None: + dt_mask_pose_consistency = _compute_mask_pose_consistency(masks, image_kpts_backup) + gt_mask_pose_consistency = _compute_mask_pose_consistency(gt_masks, image_kpts_backup) + + dt_masks_area = np.array([m.sum() for m in masks]) + gt_masks_area = np.array([m.sum() for m in gt_masks]) if gt_masks is not None else np.zeros_like(dt_masks_area) + + tol = 0.1 + pmc_is_equal = np.isclose(dt_mask_pose_consistency, gt_mask_pose_consistency, atol=tol) + dt_is_worse = (dt_mask_pose_consistency < (gt_mask_pose_consistency - tol)) | (pmc_is_equal & (dt_masks_area > gt_masks_area)) + + new_masks = [] + for dt_mask, gt_mask, dt_worse in zip(masks, gt_masks, dt_is_worse): + new_masks.append(gt_mask if dt_worse else dt_mask) + masks = np.array(new_masks) + + new_dets.refined_masks = masks[:n_new_dets] + new_dets.sam_scores = scores[:n_new_dets] + new_dets.sam_kpts = image_kpts_backup[:n_new_dets] + + return new_dets + + +def _apply_exclusive_masks(masks: np.ndarray, scores: np.ndarray) -> np.ndarray: + """Ensure masks are non-overlapping by keeping per pixel the highest-score mask.""" + no_mask = masks.sum(axis=0) == 0 + masked_scores = masks * scores[:, None, None] + argmax_masks = np.argmax(masked_scores, axis=0) + new_masks = argmax_masks[None, :, :] == (np.arange(masks.shape[0])[:, None, None]) + new_masks[:, no_mask] = 0 + return new_masks diff --git a/bboxmaskpose/sam3d_build_fov_estimator.py b/bboxmaskpose/sam3d_build_fov_estimator.py new file mode 100644 index 0000000000000000000000000000000000000000..fd66b491754a6bab360092872cc2ff508e9e9f23 --- /dev/null +++ b/bboxmaskpose/sam3d_build_fov_estimator.py @@ -0,0 +1,71 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. 
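+
+# A worked example (hypothetical numbers) of the normalized-to-absolute
+# intrinsics conversion implemented in denormalize_f() below: for a 640x480
+# image with normalized fx=1.2, fy=1.6, cx=cy=0.5,
+#   fx_abs = 1.2 * 640 = 768,  fy_abs = 1.6 * 480 = 768,
+#   cx_abs = 0.5 * 640 = 320,  cy_abs = 0.5 * 480 = 240,
+# so K_abs = [[768, 0, 320], [0, 768, 240], [0, 0, 1]]. run_moge() then
+# copies the vertical focal into fx (square-pixel assumption) and adds a
+# batch dimension.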
+ +import torch + + +class FOVEstimator: + def __init__(self, name="moge2", device="cuda", **kwargs): + self.device = device + + if name == "moge2": + print("########### Using fov estimator: MoGe2...") + self.fov_estimator = load_moge(device, **kwargs) + self.fov_estimator_func = run_moge + + self.fov_estimator.eval() + else: + raise NotImplementedError + + def get_cam_intrinsics(self, img, **kwargs): + return self.fov_estimator_func(self.fov_estimator, img, self.device, **kwargs) + + +def load_moge(device, path=""): + from moge.model.v2 import MoGeModel + + if path == "": + path = "Ruicheng/moge-2-vitl-normal" + moge_model = MoGeModel.from_pretrained(path).to(device) + return moge_model + + +def run_moge(model, input_image, device): + # We expect the image to be RGB already + H, W, _ = input_image.shape + input_image = torch.tensor(input_image / 255, dtype=torch.float32, device=device).permute(2, 0, 1) + + # Infer w/ MoGe2 + moge_data = model.infer(input_image) + + # get intrinsics + intrinsics = denormalize_f(moge_data["intrinsics"].cpu().numpy(), H, W) + v_focal = intrinsics[1, 1] + + # override hfov with v_focal + intrinsics[0, 0] = v_focal + # add batch dim + cam_intrinsics = intrinsics[None] + + return cam_intrinsics + + +def denormalize_f(norm_K, height, width): + # Extract cx and cy from the normalized K matrix + cx_norm = norm_K[0][2] # c_x is at K[0][2] + cy_norm = norm_K[1][2] # c_y is at K[1][2] + + fx_norm = norm_K[0][0] # Normalized fx + fy_norm = norm_K[1][1] # Normalized fy + # s_norm = norm_K[0][1] # Skew (usually 0) + + # Scale to absolute values + fx_abs = fx_norm * width + fy_abs = fy_norm * height + cx_abs = cx_norm * width + cy_abs = cy_norm * height + # s_abs = s_norm * width + s_abs = 0 + + # Construct absolute K matrix + abs_K = torch.tensor([[fx_abs, s_abs, cx_abs], [0.0, fy_abs, cy_abs], [0.0, 0.0, 1.0]]) + return abs_K diff --git a/bboxmaskpose/sam3d_utils.py b/bboxmaskpose/sam3d_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..e8a1f3a7fa9e76292a15ad1761ceb8e3930783dc --- /dev/null +++ b/bboxmaskpose/sam3d_utils.py @@ -0,0 +1,250 @@ +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. + +""" +SAM-3D-Body integration utilities for BBoxMaskPose. + +This module provides a lightweight wrapper for integrating SAM-3D-Body +(3D human mesh recovery) into the BBoxMaskPose pipeline. +""" + +import os +from pathlib import Path +from typing import Dict, List, Optional, Tuple, Union + +import cv2 +import numpy as np +import torch + + +def check_sam3d_available() -> bool: + """ + Check if SAM-3D-Body package is available. + + Returns: + bool: True if sam_3d_body can be imported, False otherwise. + """ + try: + import sam_3d_body + + return True + except ImportError: + return False + + +class SAM3DBodyWrapper: + """ + Wrapper class for SAM-3D-Body model. + + This class provides a simplified interface for 3D human mesh recovery + that integrates seamlessly with BBoxMaskPose outputs. + + Example: + >>> sam3d = SAM3DBodyWrapper(device="cuda") + >>> meshes = sam3d.predict( + ... image="path/to/image.jpg", + ... bboxes=bmp_result['bboxes'], + ... masks=bmp_result['masks'] + ... ) + """ + + def __init__( + self, + checkpoint_path: Optional[str] = None, + mhr_path: Optional[str] = None, + device: str = "cuda", + use_detector: bool = False, + use_segmentor: bool = False, + use_fov: bool = True, + fov_name: str = "moge2", + ): + """ + Initialize SAM-3D-Body wrapper. + + Args: + checkpoint_path: Path to SAM-3D-Body checkpoint. 
If None, will attempt
+                to load from HuggingFace (facebook/sam-3d-body-dinov3).
+            mhr_path: Path to MHR (Momentum Human Rig) model file.
+            device: Device for inference ('cuda' or 'cpu').
+            use_detector: Whether to use built-in human detector (not needed with BMP).
+            use_segmentor: Whether to use built-in segmentor (not needed with BMP).
+            use_fov: Whether to use FOV estimator for camera calibration.
+            fov_name: FOV estimator name ('moge2' recommended).
+        """
+        if not check_sam3d_available():
+            raise ImportError(
+                "SAM-3D-Body package not found. Please install it following:\n"
+                "https://github.com/facebookresearch/sam-3d-body/blob/main/INSTALL.md\n\n"
+                "Quick install:\n"
+                "pip install pytorch-lightning pyrender opencv-python yacs scikit-image "
+                "einops timm dill pandas rich hydra-core pyrootutils webdataset networkx==3.2.1 "
+                "roma joblib seaborn appdirs ffmpeg cython jsonlines loguru optree fvcore "
+                "black pycocotools huggingface_hub\n"
+                "pip install 'git+https://github.com/facebookresearch/detectron2.git@a1ce2f9' "
+                "--no-build-isolation --no-deps\n"
+                "pip install git+https://github.com/microsoft/MoGe.git"
+            )
+
+        from sam_3d_body import SAM3DBodyEstimator, load_sam_3d_body, load_sam_3d_body_hf
+
+        self.device = torch.device(device) if isinstance(device, str) else device
+
+        # Load SAM-3D-Body model
+        if checkpoint_path is not None:
+            print(f"Loading SAM-3D-Body from checkpoint: {checkpoint_path}")
+            self.model, self.model_cfg = load_sam_3d_body(checkpoint_path, device=self.device, mhr_path=mhr_path)
+        else:
+            # Load from HuggingFace
+            print("Loading SAM-3D-Body from HuggingFace (facebook/sam-3d-body-dinov3)")
+            print("Note: This requires HuggingFace authentication and access approval.")
+            self.model, self.model_cfg = load_sam_3d_body_hf(repo_id="facebook/sam-3d-body-dinov3", device=self.device)
+
+        print("✓ SAM-3D-Body model loaded successfully")
+
+        # Initialize optional components
+        human_detector = None
+        human_segmentor = None
+        fov_estimator = None
+
+        # if use_detector:
+        #     from sam_3d_body.tools.build_detector import HumanDetector
+        #     human_detector = HumanDetector(name="vitdet", device=self.device)
+
+        # if use_segmentor:
+        #     from sam_3d_body.tools.build_sam import HumanSegmentor
+        #     human_segmentor = HumanSegmentor(name="sam2", device=self.device)
+
+        if use_fov:
+            try:
+                from .sam3d_build_fov_estimator import FOVEstimator
+
+                fov_estimator = FOVEstimator(name=fov_name, device=self.device)
+            except Exception as e:
+                print(f"Warning: Could not load FOV estimator: {e}")
+                print("Continuing without FOV estimation (will use default FOV)")
+
+        # Create estimator
+        self.estimator = SAM3DBodyEstimator(
+            sam_3d_body_model=self.model,
+            model_cfg=self.model_cfg,
+            human_detector=human_detector,
+            human_segmentor=human_segmentor,
+            fov_estimator=fov_estimator,
+        )
+
+        print("✓ SAM-3D-Body initialized successfully")
+
+    def predict(
+        self,
+        image: Union[str, np.ndarray],
+        bboxes: Optional[np.ndarray] = None,
+        masks: Optional[np.ndarray] = None,
+        keypoints: Optional[np.ndarray] = None,
+        bbox_thr: float = 0.5,
+        nms_thr: float = 0.3,
+        use_mask: bool = True,
+        inference_type: str = "full",
+    ) -> List[Dict]:
+        """
+        Predict 3D human meshes from image.
+
+        Args:
+            image: Input image (path or numpy array in BGR format).
+            bboxes: Bounding boxes (N, 4) in [x1, y1, x2, y2] format.
+                If None, will use internal detector (if available).
+            masks: Binary masks (N, H, W) or (N, H, W, 1).
+                If provided, will be used for mask-conditioned inference.
+ bbox_thr: Bounding box detection threshold (only if using internal detector). + nms_thr: NMS threshold (only if using internal detector). + use_mask: Whether to use mask-conditioned inference. + inference_type: Type of inference to run: + - "full": Full-body inference with both body and hand decoders (default) + - "body": Inference with body decoder only (faster) + - "hand": Inference with hand decoder only + + Returns: + List of prediction dicts, one per detected person. Each dict contains: + - 'vertices': (V, 3) 3D mesh vertices in camera coordinates + - 'faces': (F, 3) mesh face indices + - 'joints': (J, 3) 3D joint locations + - 'bbox': (4,) bounding box [x1, y1, x2, y2] + - 'mask': (H, W) binary mask (if provided) + And other intermediate outputs from SAM-3D-Body + """ + # Handle different image input formats + if isinstance(image, str): + img = cv2.imread(image) + if img is None: + raise ValueError(f"Could not read image: {image}") + else: + img = image + + # Process masks if provided + processed_masks = None + if masks is not None: + # Ensure masks are in correct format (N, H, W) + if masks.ndim == 4 and masks.shape[-1] == 1: + masks = masks.squeeze(-1) + + # Ensure masks are in [0, 255] range + # BMP outputs binary masks [0, 1], so we need to scale + if masks.max() <= 1.0: + processed_masks = (masks * 255).astype(np.uint8) + else: + # Already in [0, 255] range + processed_masks = np.clip(masks, 0, 255).astype(np.uint8) + use_mask = True + + # Run SAM-3D-Body inference + outputs = self.estimator.process_one_image( + img, + bboxes=bboxes, + masks=processed_masks, + bbox_thr=bbox_thr, + nms_thr=nms_thr, + use_mask=use_mask, + keypoints=keypoints, + inference_type=inference_type, + ) + + return outputs + + @property + def faces(self) -> np.ndarray: + """Get mesh face indices for visualization.""" + return self.estimator.faces + + +def visualize_3d_meshes( + image: np.ndarray, + outputs: List[Dict], + faces: np.ndarray, + masks: Optional[np.ndarray] = None, + keypoints: Optional[np.ndarray] = None, + output_path: Optional[str] = None, +) -> np.ndarray: + """ + Visualize 3D mesh predictions on the input image. + + Args: + image: Input image (BGR format). + outputs: List of prediction dicts from SAM3DBodyWrapper.predict(). + faces: Mesh face indices. + masks: Optional binary masks for each detected person. + output_path: Optional path to save the visualization. + + Returns: + Visualization image with rendered 3D meshes (BGR format). + """ + try: + from .sam3d_vis_utils import visualize_sample_together + + vis_img = visualize_sample_together(image, outputs, faces, masks=masks, keypoints=keypoints) + + if output_path is not None: + cv2.imwrite(output_path, vis_img.astype(np.uint8)) + + return vis_img.astype(np.uint8) + except ImportError: + print("Warning: SAM-3D-Body visualization tools not available") + print("Returning original image") + return image diff --git a/bboxmaskpose/sam3d_vis_utils.py b/bboxmaskpose/sam3d_vis_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..88acb31b6278562a9b8e948046429e86842a8740 --- /dev/null +++ b/bboxmaskpose/sam3d_vis_utils.py @@ -0,0 +1,454 @@ +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. +# and +# Copyright (c) Meta Platforms, Inc. and affiliates. 
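+
+# A minimal usage sketch (assumed image path and import path from this repo's
+# file layout; "bmp_result" stands for the dict returned by
+# BBoxMaskPose.predict, as in the SAM3DBodyWrapper docstring):
+#   >>> import cv2
+#   >>> from bboxmaskpose.sam3d_utils import SAM3DBodyWrapper, visualize_3d_meshes
+#   >>> sam3d = SAM3DBodyWrapper(device="cuda")
+#   >>> img = cv2.imread("path/to/image.jpg")
+#   >>> outputs = sam3d.predict(img, bboxes=bmp_result["bboxes"], masks=bmp_result["masks"])
+#   >>> vis = visualize_3d_meshes(img, outputs, sam3d.faces, masks=bmp_result["masks"])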
+import cv2 +import numpy as np + +from sam_3d_body.metadata.mhr70 import pose_info as mhr70_pose_info +from sam_3d_body.visualization.renderer import Renderer +from sam_3d_body.visualization.skeleton_visualizer import SkeletonVisualizer + +try: + import distinctipy + + HAS_DISTINCTIPY = True +except ImportError: + HAS_DISTINCTIPY = False + +try: + from posevis import pose_visualization + + has_posevis = True +except ImportError: + has_posevis = False + + +LIGHT_BLUE = (0.65098039, 0.74117647, 0.85882353) +# Amount of yaw used when showing meshes from the "side". +SIDE_VIEW_ROTATION_DEG = 60 + +visualizer = SkeletonVisualizer(line_width=2, radius=5) +visualizer.set_pose_meta(mhr70_pose_info) + + +def _build_color_palettes(num_colors, pastel_factor=0.5, order=None): + if num_colors <= 0: + return ( + np.zeros((0, 3), dtype=np.float32), + np.zeros((0, 3), dtype=np.uint8), + ) + + if HAS_DISTINCTIPY: + rgb = np.array(distinctipy.get_colors(num_colors, exclude_colors=[(0, 1, 0), (0, 0, 0), (1, 1, 1)], rng=0), dtype=np.float32) + bgr_float = rgb[:, ::-1] + else: + random_colors = [] + for _ in range(num_colors): + random_hsv = np.array([np.random.uniform(0, 255), 255, 255], dtype=np.float32).reshape(1, 1, 3) + random_bgr = cv2.cvtColor(random_hsv.astype(np.uint8), cv2.COLOR_HSV2BGR).flatten().astype(np.float32) / 255.0 + random_colors.append(random_bgr) + bgr_float = np.stack(random_colors, axis=0) + + if pastel_factor > 0: + hsv = cv2.cvtColor( + (bgr_float.reshape(1, num_colors, 3) * 255).astype(np.uint8), + cv2.COLOR_BGR2HSV, + ) + # Lower the saturation to get pastel colors + hsv = hsv.astype(np.float32) + s_floor = 20.0 + v_target = 240.0 + hsv[:, :, 1] = hsv[:, :, 1] * (1.0 - pastel_factor) + pastel_factor * s_floor + hsv[:, :, 2] = hsv[:, :, 2] * (1.0 - pastel_factor) + pastel_factor * v_target + hsv = np.clip(hsv, 0, 255).astype(np.uint8) + bgr_float = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR).reshape(num_colors, 3).astype(np.float32) / 255.0 + + if order is not None: + ordered_bgr_float = np.zeros_like(bgr_float) + ordered_bgr_float[order, :] = bgr_float + bgr_float = ordered_bgr_float + + return bgr_float, (bgr_float * 255).astype(np.uint8) + + +def _merge_mesh_instances(outputs_sorted, faces): + if not outputs_sorted: + return None, None, None, [] + + merged_vertices = [] + merged_faces = [] + vertex_counts = [] + vertex_offset = 0 + faces_np = np.asarray(faces, dtype=np.int32) + for person_output in outputs_sorted: + verts = np.asarray(person_output["pred_vertices"], dtype=np.float32) + np.asarray(person_output["pred_cam_t"], dtype=np.float32) + merged_vertices.append(verts) + merged_faces.append(faces_np + vertex_offset) + vertex_counts.append(verts.shape[0]) + vertex_offset += verts.shape[0] + + merged_vertices = np.concatenate(merged_vertices, axis=0) + merged_faces = np.concatenate(merged_faces, axis=0) + + verts_per_person = vertex_counts[0] if vertex_counts else 0 + tail_vertices = min(2 * verts_per_person, merged_vertices.shape[0]) + if tail_vertices > 0: + tail = merged_vertices[-tail_vertices:] + fake_pred_cam_t = (np.max(tail, axis=0) + np.min(tail, axis=0)) / 2.0 + else: + fake_pred_cam_t = np.zeros(3, dtype=np.float32) + + merged_vertices = merged_vertices - fake_pred_cam_t + return merged_vertices, merged_faces, fake_pred_cam_t, vertex_counts + + +def _expand_vertex_colors(mesh_colors, vertex_counts): + if not mesh_colors or not vertex_counts: + return None + per_vertex_colors = [] + for color, count in zip(mesh_colors, vertex_counts): + tiled = np.tile(np.asarray(color, 
dtype=np.float32), (count, 1)) + per_vertex_colors.append(tiled) + return np.concatenate(per_vertex_colors, axis=0) + + +def visualize_sample(img_cv2, outputs, faces, distinct_colors=False): + img_keypoints = img_cv2.copy() + img_mesh = img_cv2.copy() + + if distinct_colors: + palette_float, palette_uint8 = _build_color_palettes(len(outputs)) + else: + palette_float, palette_uint8 = None, None + + rend_img = [] + for pid, person_output in enumerate(outputs): + if distinct_colors and palette_float is not None and pid < len(palette_float): + mesh_color = tuple(palette_float[pid].tolist()) + bbox_color = palette_uint8[pid].tolist() + else: + mesh_color = LIGHT_BLUE + bbox_color = (0, 255, 0) + keypoints_2d = person_output["pred_keypoints_2d"] + keypoints_2d = np.concatenate([keypoints_2d, np.ones((keypoints_2d.shape[0], 1))], axis=-1) + img1 = visualizer.draw_skeleton(img_keypoints.copy(), keypoints_2d) + + img1 = cv2.rectangle( + img1, + (int(person_output["bbox"][0]), int(person_output["bbox"][1])), + (int(person_output["bbox"][2]), int(person_output["bbox"][3])), + bbox_color, + 2, + ) + + if "lhand_bbox" in person_output: + img1 = cv2.rectangle( + img1, + ( + int(person_output["lhand_bbox"][0]), + int(person_output["lhand_bbox"][1]), + ), + ( + int(person_output["lhand_bbox"][2]), + int(person_output["lhand_bbox"][3]), + ), + (255, 0, 0), + 2, + ) + + if "rhand_bbox" in person_output: + img1 = cv2.rectangle( + img1, + ( + int(person_output["rhand_bbox"][0]), + int(person_output["rhand_bbox"][1]), + ), + ( + int(person_output["rhand_bbox"][2]), + int(person_output["rhand_bbox"][3]), + ), + (0, 0, 255), + 2, + ) + + renderer = Renderer(focal_length=person_output["focal_length"], faces=faces) + img2 = ( + renderer( + person_output["pred_vertices"], + person_output["pred_cam_t"], + img_mesh.copy(), + mesh_base_color=mesh_color, + scene_bg_color=(1, 1, 1), + ) + * 255 + ) + + white_img = np.ones_like(img_cv2) * 255 + img3 = ( + renderer( + person_output["pred_vertices"], + person_output["pred_cam_t"], + white_img, + mesh_base_color=mesh_color, + scene_bg_color=(1, 1, 1), + side_view=True, + rot_angle=SIDE_VIEW_ROTATION_DEG, + ) + * 255 + ) + + cur_img = np.concatenate([img_cv2, img1, img2, img3], axis=1) + rend_img.append(cur_img) + + return rend_img + + +def visualize_sample_animation( + img_cv2, + outputs, + faces, + masks=None, + keypoints=None, + distinct_colors=True, +): + # Render everything together + img_mesh = img_cv2.copy() + + # First, sort by depth, furthest to closest + all_depths = np.stack([tmp["pred_cam_t"] for tmp in outputs], axis=0)[:, 2] + sorted_indices = np.argsort(-all_depths) + outputs_sorted = [outputs[idx] for idx in sorted_indices] + + mesh_colors = [] + if distinct_colors: + + scores = [out["bbox_score"] for out in outputs_sorted] + sorted_color_indices = np.argsort(np.array(scores))[::-1] + palette_float, palette_uint8 = _build_color_palettes(len(outputs_sorted), order=sorted_color_indices, pastel_factor=0.2) + random_rgb_colors = None + else: + palette_float = palette_uint8 = None + random_rgb_colors = [] + for _ in range(len(outputs_sorted)): + random_hsv = np.array([np.random.uniform(0, 255), 255, 255]).reshape(1, 1, 3) + random_rgb = cv2.cvtColor(random_hsv.astype(np.uint8), cv2.COLOR_HSV2BGR).flatten().astype(np.uint8) + random_rgb_colors.append(random_rgb) + + # Then, draw all keypoints and bboxes. 
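+    # (In this animation variant the loop below only assigns per-person mesh
+    # colors; bounding boxes and 2D keypoints are drawn in
+    # visualize_sample_together instead.)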
+ for pid, person_output in enumerate(outputs_sorted): + if distinct_colors and palette_float is not None and pid < len(palette_float): + mesh_color = tuple(palette_float[pid].tolist()) + color_uint8 = palette_uint8[pid].tolist() + else: + mesh_color = LIGHT_BLUE + color_uint8 = random_rgb_colors[pid].tolist() + mesh_colors.append(mesh_color) + + merged_vertices, merged_faces, fake_pred_cam_t, vertex_counts = _merge_mesh_instances(outputs_sorted, faces) + + rendered_images = [] + + if merged_vertices is None: + img_mesh = img_cv2.copy() + img_mesh_side = np.ones_like(img_cv2) * 255 + else: + renderer = Renderer(focal_length=outputs_sorted[-1]["focal_length"], faces=merged_faces) + vertex_colors = _expand_vertex_colors(mesh_colors, vertex_counts) if distinct_colors else None + img_mesh = ( + renderer( + merged_vertices, + fake_pred_cam_t, + img_mesh.astype(np.float32), + mesh_base_color=LIGHT_BLUE, + scene_bg_color=(1, 1, 1), + vertex_colors=vertex_colors, + ) + * 255 + ).astype(np.uint8) + white_img = np.ones_like(img_cv2, dtype=np.float32) * 255.0 + + rendered_images.append(img_mesh) + + num_frames_per_transition = 10 + angles = np.concatenate( + [ + np.linspace(0, SIDE_VIEW_ROTATION_DEG, num=num_frames_per_transition, endpoint=True), + np.linspace(SIDE_VIEW_ROTATION_DEG, 0, num=num_frames_per_transition, endpoint=True), + np.linspace(0, -SIDE_VIEW_ROTATION_DEG, num=num_frames_per_transition, endpoint=True), + np.linspace(-SIDE_VIEW_ROTATION_DEG, 0, num=num_frames_per_transition, endpoint=True), + ], + axis=0, + ) + + for angle in angles: + img_mesh_side = ( + renderer( + merged_vertices, + fake_pred_cam_t, + white_img, + mesh_base_color=LIGHT_BLUE, + scene_bg_color=(1, 1, 1), + vertex_colors=vertex_colors, + side_view=True, + rot_angle=angle, + ) + * 255 + ).astype(np.uint8) + rendered_images.append(img_mesh_side) + + rendered_images.append(img_mesh) + + return rendered_images + + +def visualize_sample_together( + img_cv2, + outputs, + faces, + masks=None, + keypoints=None, + distinct_colors=True, +): + # Render everything together + img_bboxes = img_cv2.copy() + img_keypoints = img_cv2.copy() + img_mesh = img_cv2.copy() + + # First, sort by depth, furthest to closest + all_depths = np.stack([tmp["pred_cam_t"] for tmp in outputs], axis=0)[:, 2] + sorted_indices = np.argsort(-all_depths) + outputs_sorted = [outputs[idx] for idx in sorted_indices] + + if masks is not None: + masks = [masks[idx] for idx in sorted_indices] + if keypoints is not None: + keypoints = [keypoints[idx] for idx in sorted_indices] + + mesh_colors = [] + if distinct_colors: + + scores = [out["bbox_score"] for out in outputs_sorted] + sorted_color_indices = np.argsort(np.array(scores))[::-1] + palette_float, palette_uint8 = _build_color_palettes(len(outputs_sorted), order=sorted_color_indices, pastel_factor=0.2) + random_rgb_colors = None + else: + palette_float = palette_uint8 = None + random_rgb_colors = [] + for _ in range(len(outputs_sorted)): + random_hsv = np.array([np.random.uniform(0, 255), 255, 255]).reshape(1, 1, 3) + random_rgb = cv2.cvtColor(random_hsv.astype(np.uint8), cv2.COLOR_HSV2BGR).flatten().astype(np.uint8) + random_rgb_colors.append(random_rgb) + + # Then, draw all keypoints and bboxes. 
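+    # (Painter's algorithm: np.argsort(-all_depths) above ordered instances
+    # far-to-near, so nearer people are drawn last and occlude farther ones;
+    # masks are alpha-blended onto the image via cv2.addWeighted at 0.4/0.6.)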
+ for pid, person_output in enumerate(outputs_sorted): + if distinct_colors and palette_float is not None and pid < len(palette_float): + mesh_color = tuple(palette_float[pid].tolist()) + color_uint8 = palette_uint8[pid].tolist()[::-1] + else: + mesh_color = LIGHT_BLUE + color_uint8 = random_rgb_colors[pid].tolist()[::-1] + mesh_colors.append(mesh_color) + mask = masks[pid] if masks is not None else None + + bbox = person_output["bbox"] + + if masks is not None: + masks_image = img_bboxes.copy() + mask = masks[pid] + binary_mask = mask > 0.5 + masks_image[binary_mask, :] = color_uint8 + rows = np.any(binary_mask, axis=1) + cols = np.any(binary_mask, axis=0) + rmin, rmax = np.where(rows)[0][[0, -1]] + cmin, cmax = np.where(cols)[0][[0, -1]] + bbox = [cmin, rmin, cmax, rmax] + img_bboxes = cv2.addWeighted(img_bboxes, 0.4, masks_image, 0.6, 0) + + bbox_score = person_output.get("bbox_score", 0.0) + img_bboxes = cv2.rectangle( + img_bboxes, + (int(bbox[0]), int(bbox[1])), + (int(bbox[2]), int(bbox[3])), + color_uint8, + 2, + ) + img_bboxes = cv2.putText( + img_bboxes, + f"{bbox_score:.2f}", + (int(bbox[0]), int(bbox[1]) - 10), + cv2.FONT_HERSHEY_SIMPLEX, + 0.5, + color_uint8, + 2, + ) + + # Draw masks semi-transparently if available + + # Draw keypoints if available + if keypoints is not None: + keypoints_2d = keypoints[pid] + if has_posevis: + img_bboxes = pose_visualization( + img_bboxes, + keypoints_2d, + format="COCO", + greyness=1.0, + show_markers=True, + show_bones=True, + line_type="solid", + width_multiplier=1.0, + bbox_width_multiplier=1.0, + show_bbox=False, + differ_individuals=True, + conf_thr=0.3, + errors=None, + color=color_uint8, + keep_image_size=True, + return_padding=False, + ) + else: + for kp in keypoints_2d: + cv2.circle(img_bboxes, (int(kp[0]), int(kp[1])), 3, color_uint8, -1) + + keypoints_2d = person_output["pred_keypoints_2d"] + keypoints_2d = np.concatenate([keypoints_2d, np.ones((keypoints_2d.shape[0], 1))], axis=-1) + img_keypoints = visualizer.draw_skeleton(img_keypoints, keypoints_2d) + + merged_vertices, merged_faces, fake_pred_cam_t, vertex_counts = _merge_mesh_instances(outputs_sorted, faces) + + if merged_vertices is None: + img_mesh = img_cv2.copy() + img_mesh_side = np.ones_like(img_cv2) * 255 + else: + renderer = Renderer(focal_length=outputs_sorted[-1]["focal_length"], faces=merged_faces) + vertex_colors = _expand_vertex_colors(mesh_colors, vertex_counts) if distinct_colors else None + img_mesh = ( + renderer( + merged_vertices, + fake_pred_cam_t, + img_mesh.astype(np.float32), + mesh_base_color=LIGHT_BLUE, + scene_bg_color=(1, 1, 1), + vertex_colors=vertex_colors, + ) + * 255 + ).astype(np.uint8) + + white_img = np.ones_like(img_cv2, dtype=np.float32) * 255.0 + img_mesh_side = ( + renderer( + merged_vertices, + fake_pred_cam_t, + white_img, + mesh_base_color=LIGHT_BLUE, + scene_bg_color=(1, 1, 1), + vertex_colors=vertex_colors, + side_view=True, + rot_angle=SIDE_VIEW_ROTATION_DEG, + ) + * 255 + ).astype(np.uint8) + + cur_img = np.concatenate([img_bboxes, img_keypoints, img_mesh, img_mesh_side], axis=1) + + return cur_img diff --git a/demo/bmp_demo.py b/demo/bmp_demo.py deleted file mode 100644 index b360b6ebcf7da06f9b0da580be1d7bf45d44f2fa..0000000000000000000000000000000000000000 --- a/demo/bmp_demo.py +++ /dev/null @@ -1,250 +0,0 @@ -# Copyright (c) OpenMMLab. All rights reserved. -""" -BMP Demo script: sequentially runs detection, pose estimation, SAM-based mask refinement, and visualization. 
-Usage:
-    python bmp_demo.py <bmp_config> <input> [--output-root <output_root>]
-"""
-
-import os
-import shutil
-from argparse import ArgumentParser, Namespace
-from pathlib import Path
-
-import mmcv
-import mmengine
-import numpy as np
-import yaml
-from demo_utils import DotDict, concat_instances, create_GIF, filter_instances, pose_nms, visualize_itteration
-from mm_utils import run_MMDetector, run_MMPose
-from mmdet.apis import init_detector
-from mmengine.logging import print_log
-from mmengine.structures import InstanceData
-from sam2_utils import prepare_model as prepare_sam2_model
-from sam2_utils import process_image_with_SAM
-
-from mmpose.apis import init_model as init_pose_estimator
-from mmpose.utils import adapt_mmdet_pipeline
-
-# Default thresholds
-DEFAULT_DET_CAT_ID: int = 0  # "person"
-DEFAULT_BBOX_THR: float = 0.3
-DEFAULT_NMS_THR: float = 0.3
-DEFAULT_KPT_THR: float = 0.3
-
-
-def parse_args() -> Namespace:
-    """
-    Parse command-line arguments for BMP demo.
-
-    Returns:
-        Namespace: Contains bmp_config (Path), input (Path), output_root (Path), device (str).
-    """
-    parser = ArgumentParser(description="BBoxMaskPose demo")
-    parser.add_argument("bmp_config", type=Path, help="Path to BMP YAML config file")
-    parser.add_argument("input", type=Path, help="Input image file")
-    parser.add_argument("--output-root", type=Path, default=None, help="Directory to save outputs (default: ./outputs)")
-    parser.add_argument("--device", type=str, default="cuda:0", help="Device for inference (e.g., cuda:0 or cpu)")
-    parser.add_argument("--create-gif", action="store_true", default=False, help="Create GIF of all BMP iterations")
-    args = parser.parse_args()
-    if args.output_root is None:
-        args.output_root = os.path.join(Path(__file__).parent, "outputs")
-    return args
-
-
-def parse_yaml_config(yaml_path: Path) -> DotDict:
-    """
-    Load BMP configuration from a YAML file.
-
-    Args:
-        yaml_path (Path): Path to YAML config.
-    Returns:
-        DotDict: Nested config dictionary.
-    """
-    with open(yaml_path, "r") as f:
-        cfg = yaml.safe_load(f)
-    return DotDict(cfg)
-
-
-def process_one_image(
-    args: Namespace,
-    bmp_config: DotDict,
-    img_path: Path,
-    detector: object,
-    detector_prime: object,
-    pose_estimator: object,
-    sam2_model: object,
-) -> InstanceData:
-    """
-    Run the full BMP pipeline on a single image: detection, pose, SAM mask refinement, and visualization.
-
-    Args:
-        args (Namespace): Parsed CLI arguments.
-        bmp_config (DotDict): Configuration parameters.
-        img_path (Path): Path to the input image.
-        detector: Primary MMDetection model.
-        detector_prime: Secondary MMDetection model for iterations.
-        pose_estimator: MMPose model for keypoint estimation.
-        sam2_model: SAM model for mask refinement.
-    Returns:
-        InstanceData: Final merged detections and refined masks.
- """ - # Load image - img = mmcv.imread(str(img_path), channel_order="bgr") - if img is None: - raise ValueError("Failed to read image from {}.".format(img_path)) - - # Prepare output directory - output_dir = os.path.join(args.output_root, img_path.stem) - shutil.rmtree(str(output_dir), ignore_errors=True) - mmengine.mkdir_or_exist(str(output_dir)) - - img_for_detection = img.copy() - all_detections = None - for iteration in range(bmp_config.num_bmp_iters): - print_log("BMP Iteration {}/{} started".format(iteration + 1, bmp_config.num_bmp_iters), logger="current") - - # Step 1: Detection - det_instances = run_MMDetector( - detector if iteration == 0 else detector_prime, - img_for_detection, - det_cat_id=DEFAULT_DET_CAT_ID, - bbox_thr=DEFAULT_BBOX_THR, - nms_thr=DEFAULT_NMS_THR, - ) - print_log("Detected {} instances".format(len(det_instances.bboxes)), logger="current") - if len(det_instances.bboxes) == 0: - print_log("No detections found, skipping.", logger="current") - continue - - # Step 2: Pose estimation - pose_instances = run_MMPose( - pose_estimator, - img.copy(), - detections=det_instances, - kpt_thr=DEFAULT_KPT_THR, - ) - # Restrict to first 17 COCO keypoints - pose_instances.keypoints = pose_instances.keypoints[:, :17, :] - pose_instances.keypoint_scores = pose_instances.keypoint_scores[:, :17] - pose_instances.keypoints = np.concatenate( - [pose_instances.keypoints, pose_instances.keypoint_scores[:, :, None]], axis=-1 - ) - - # Step 3: Pose-NMS and SAM refinement - all_keypoints = ( - pose_instances.keypoints - if all_detections is None - else np.concatenate([all_detections.keypoints, pose_instances.keypoints], axis=0) - ) - all_bboxes = ( - pose_instances.bboxes - if all_detections is None - else np.concatenate([all_detections.bboxes, pose_instances.bboxes], axis=0) - ) - num_valid_kpts = np.sum(all_keypoints[:, :, 2] > bmp_config.sam2.prompting.confidence_thr, axis=1) - keep_indices = pose_nms( - DotDict({"confidence_thr": bmp_config.sam2.prompting.confidence_thr, "oks_thr": bmp_config.oks_nms_thr}), - image_kpts=all_keypoints, - image_bboxes=all_bboxes, - num_valid_kpts=num_valid_kpts, - ) - keep_indices = sorted(keep_indices) # Sort by original index - num_old_detections = 0 if all_detections is None else len(all_detections.bboxes) - keep_new_indices = [i - num_old_detections for i in keep_indices if i >= num_old_detections] - keep_old_indices = [i for i in keep_indices if i < num_old_detections] - if len(keep_new_indices) == 0: - print_log("No new instances passed pose NMS, skipping SAM refinement.", logger="current") - continue - # filter new detections and compute scores - new_dets = filter_instances(pose_instances, keep_new_indices) - new_dets.scores = pose_instances.keypoint_scores[keep_new_indices].mean(axis=-1) - old_dets = None - if len(keep_old_indices) > 0: - old_dets = filter_instances(all_detections, keep_old_indices) - print_log( - "Pose NMS reduced instances to {:d} ({:d}+{:d}) instances".format( - len(new_dets.bboxes) + num_old_detections, num_old_detections, len(new_dets.bboxes) - ), - logger="current", - ) - - new_detections = process_image_with_SAM( - DotDict(bmp_config.sam2.prompting), - img.copy(), - sam2_model, - new_dets, - old_dets if old_dets is not None else None, - ) - - # Merge detections - if all_detections is None: - all_detections = new_detections - else: - all_detections = concat_instances(all_detections, new_dets) - - # Step 4: Visualization - img_for_detection = visualize_itteration( - img.copy(), - all_detections, - iteration_idx=iteration, 
- output_root=str(output_dir), - img_name=img_path.stem, - ) - print_log("Iteration {} completed".format(iteration + 1), logger="current") - - # Create GIF of iterations if requested - if args.create_gif: - image_file = os.path.join(output_dir, "{:s}.jpg".format(img_path.stem)) - create_GIF( - img_path=str(image_file), - output_root=str(output_dir), - bmp_x=bmp_config.num_bmp_iters, - ) - return all_detections - - -def main() -> None: - """ - Entry point for the BMP demo: loads models and processes one image. - """ - args = parse_args() - bmp_config = parse_yaml_config(args.bmp_config) - - # Ensure output root exists - mmengine.mkdir_or_exist(str(args.output_root)) - - # build detectors - detector = init_detector(bmp_config.detector.det_config, bmp_config.detector.det_checkpoint, device=args.device) - detector.cfg = adapt_mmdet_pipeline(detector.cfg) - if ( - bmp_config.detector.det_config == bmp_config.detector.det_prime_config - and bmp_config.detector.det_checkpoint == bmp_config.detector.det_prime_checkpoint - ) or (bmp_config.detector.det_prime_config is None or bmp_config.detector.det_prime_checkpoint is None): - print_log("Using the same detector as D and D'", logger="current") - detector_prime = detector - else: - detector_prime = init_detector( - bmp_config.detector.det_prime_config, bmp_config.detector.det_prime_checkpoint, device=args.device - ) - detector_prime.cfg = adapt_mmdet_pipeline(detector_prime.cfg) - print_log("Using a different detector for D'", logger="current") - - # build pose estimator - pose_estimator = init_pose_estimator( - bmp_config.pose_estimator.pose_config, - bmp_config.pose_estimator.pose_checkpoint, - device=args.device, - cfg_options=dict(model=dict(test_cfg=dict(output_heatmaps=False))), - ) - - sam2 = prepare_sam2_model( - model_cfg=bmp_config.sam2.sam2_config, - model_checkpoint=bmp_config.sam2.sam2_checkpoint, - ) - - # Run inference on one image - _ = process_one_image(args, bmp_config, args.input, detector, detector_prime, pose_estimator, sam2) - - -if __name__ == "__main__": - main() diff --git a/demo/sam2_utils.py b/demo/sam2_utils.py deleted file mode 100644 index 0b8834c68ccfcca4d020b4254fa40906c300625d..0000000000000000000000000000000000000000 --- a/demo/sam2_utils.py +++ /dev/null @@ -1,714 +0,0 @@ -""" -SAM2 utilities for BMP demo: -- Build and prepare SAM model -- Convert poses to segmentation -- Compute mask-pose consistency -""" - -from typing import Any, List, Optional, Tuple - -import numpy as np -import torch -from mmengine.structures import InstanceData -from pycocotools import mask as Mask -from sam2.build_sam import build_sam2 -from sam2.sam2_image_predictor import SAM2ImagePredictor - -# Threshold for keypoint validity in mask-pose consistency -STRICT_KPT_THRESHOLD: float = 0.5 - - -def _validate_sam_args(sam_args): - """ - Validate that all required sam_args attributes are present. - """ - required = [ - "crop", - "use_bbox", - "confidence_thr", - "ignore_small_bboxes", - "num_pos_keypoints", - "num_pos_keypoints_if_crowd", - "crowd_by_max_iou", - "batch", - "exclusive_masks", - "extend_bbox", - "pose_mask_consistency", - "visibility_thr", - ] - for param in required: - if not hasattr(sam_args, param): - raise AttributeError(f"Missing required arg {param} in sam_args") - - -def _get_max_ious(bboxes: List[np.ndarray]) -> np.ndarray: - """ - Compute maximum IoU for each bbox against others. 
- """ - is_crowd = [0] * len(bboxes) - ious = Mask.iou(bboxes, bboxes, is_crowd) - mat = np.array(ious) - np.fill_diagonal(mat, 0) - return mat.max(axis=1) - - -def _compute_one_mask_pose_consistency( - mask: np.ndarray, pos_keypoints: Optional[np.ndarray] = None, neg_keypoints: Optional[np.ndarray] = None -) -> float: - """ - Compute a consistency score between a mask and given keypoints. - - Args: - mask (np.ndarray): Binary mask of shape (H, W). - pos_keypoints (Optional[np.ndarray]): Positive keypoints array (N, 3). - neg_keypoints (Optional[np.ndarray]): Negative keypoints array (M, 3). - - Returns: - float: Weighted mean of positive and negative keypoint consistency. - """ - if mask is None: - return 0.0 - - def _mean_inside(points: np.ndarray) -> float: - if points.size == 0: - return 0.0 - pts_int = np.floor(points[:, :2]).astype(int) - pts_int[:, 0] = np.clip(pts_int[:, 0], 0, mask.shape[1] - 1) - pts_int[:, 1] = np.clip(pts_int[:, 1], 0, mask.shape[0] - 1) - vals = mask[pts_int[:, 1], pts_int[:, 0]] - return vals.mean() if vals.size > 0 else 0.0 - - pos_mean = 0.0 - if pos_keypoints is not None: - valid = pos_keypoints[:, 2] > STRICT_KPT_THRESHOLD - pos_mean = _mean_inside(pos_keypoints[valid]) - - neg_mean = 0.0 - if neg_keypoints is not None: - valid = neg_keypoints[:, 2] > STRICT_KPT_THRESHOLD - pts = neg_keypoints[valid][:, :2] - inside = mask[np.floor(pts[:, 1]).astype(int), np.floor(pts[:, 0]).astype(int)] - neg_mean = (~inside.astype(bool)).mean() if inside.size > 0 else 0.0 - - return 0.5 * pos_mean + 0.5 * neg_mean - - -def _select_keypoints( - args: Any, - kpts: np.ndarray, - num_visible: int, - bbox: Optional[Tuple[float, float, float, float]] = None, - method: Optional[str] = "distance+confidence", -) -> Tuple[np.ndarray, np.ndarray]: - """ - Select and order keypoints for SAM prompting based on specified method. - - Args: - args: Configuration object with selection_method and visibility_thr attributes. - kpts (np.ndarray): Keypoints array of shape (K, 3). - num_visible (int): Number of keypoints above visibility threshold. - bbox (Optional[Tuple]): Optional bbox for distance methods. - method (Optional[str]): Override selection method. - - Returns: - Tuple[np.ndarray, np.ndarray]: Selected keypoint coordinates (N,2) and confidences (N,). - - Raises: - ValueError: If an unknown method is specified. 
- """ - if num_visible == 0: - return kpts[:, :2], kpts[:, 2] - - methods = ["confidence", "distance", "distance+confidence", "closest"] - sel_method = method or args.selection_method - if sel_method not in methods: - raise ValueError("Unknown method for keypoint selection: {}".format(sel_method)) - - # Select at maximum keypoint from the face - facial_kpts = kpts[:3, :] - facial_conf = kpts[:3, 2] - facial_point = facial_kpts[np.argmax(facial_conf)] - if facial_point[-1] >= args.visibility_thr: - kpts = np.concatenate([facial_point[None, :], kpts[3:]], axis=0) - - conf = kpts[:, 2] - vis_mask = conf >= args.visibility_thr - coords = kpts[vis_mask, :2] - confs = conf[vis_mask] - - if sel_method == "confidence": - order = np.argsort(confs)[::-1] - coords = coords[order] - confs = confs[order] - elif sel_method == "distance": - if bbox is None: - bbox_center = np.array([coords[:, 0].mean(), coords[:, 1].mean()]) - else: - bbox_center = np.array([(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2]) - dists = np.linalg.norm(coords[:, :2] - bbox_center, axis=1) - dist_matrix = np.linalg.norm(coords[:, None, :2] - coords[None, :, :2], axis=2) - np.fill_diagonal(dist_matrix, np.inf) - min_inter_dist = np.min(dist_matrix, axis=1) - order = np.argsort(dists + 3 * min_inter_dist)[::-1] - coords = coords[order, :2] - confs = confs[order] - elif sel_method == "distance+confidence": - order = np.argsort(confs)[::-1] - confidences = kpts[order, 2] - coords = coords[order, :2] - confs = confs[order] - - dist_matrix = np.linalg.norm(coords[:, None, :2] - coords[None, :, :2], axis=2) - - selected_idx = [0] - confidences[0] = -1 - for _ in range(coords.shape[0] - 1): - min_dist = np.min(dist_matrix[:, selected_idx], axis=1) - min_dist[confidences < np.percentile(confidences, 80)] = -1 - - next_idx = np.argmax(min_dist) - selected_idx.append(next_idx) - confidences[next_idx] = -1 - - coords = coords[selected_idx] - confs = confs[selected_idx] - elif sel_method == "closest": - coords = coords[confs > STRICT_KPT_THRESHOLD, :] - confs = confs[confs > STRICT_KPT_THRESHOLD] - if bbox is None: - bbox_center = np.array([coords[:, 0].mean(), coords[:, 1].mean()]) - else: - bbox_center = np.array([(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2]) - dists = np.linalg.norm(coords[:, :2] - bbox_center, axis=1) - order = np.argsort(dists) - coords = coords[order, :2] - confs = confs[order] - - return coords, confs - - -def prepare_model(model_cfg: Any, model_checkpoint: str) -> SAM2ImagePredictor: - """ - Build and return a SAM2ImagePredictor model on the appropriate device. - - Args: - model_cfg: Configuration for SAM2 model. - model_checkpoint (str): Path to model checkpoint. - - Returns: - SAM2ImagePredictor: Initialized SAM2 image predictor. - """ - if torch.cuda.is_available(): - device = torch.device("cuda") - elif torch.backends.mps.is_available(): - device = torch.device("mps") - else: - device = torch.device("cpu") - - sam2 = build_sam2(model_cfg, model_checkpoint, device=device, apply_postprocessing=True) - model = SAM2ImagePredictor( - sam2, - max_hole_area=10.0, - max_sprinkle_area=50.0, - ) - return model - - -def _compute_mask_pose_consistency(masks: List[np.ndarray], keypoints_list: List[np.ndarray]) -> np.ndarray: - """ - Compute mask-pose consistency score for each mask-keypoints pair. - - Args: - masks (List[np.ndarray]): Binary masks list. - keypoints_list (List[np.ndarray]): List of keypoint arrays per instance. - - Returns: - np.ndarray: Consistency scores array of shape (N,). 
- """ - scores: List[float] = [] - for mask, kpts in zip(masks, keypoints_list): - other_kpts = np.concatenate([keypoints_list[:idx], keypoints_list[idx + 1 :]], axis=0).reshape(-1, 3) - score = _compute_one_mask_pose_consistency(mask, kpts, other_kpts) - scores.append(score) - - return np.array(scores) - - -def _pose2seg( - args: Any, - model: SAM2ImagePredictor, - bbox_xyxy: Optional[List[float]] = None, - pos_kpts: Optional[np.ndarray] = None, - neg_kpts: Optional[np.ndarray] = None, - image: Optional[np.ndarray] = None, - gt_mask: Optional[Any] = None, - num_pos_keypoints: Optional[int] = None, - gt_mask_is_binary: bool = False, -) -> Tuple[np.ndarray, np.ndarray, np.ndarray, float]: - """ - Run SAM segmentation conditioned on pose keypoints and optional ground truth mask. - - Args: - args: Configuration object with prompting settings. - model (SAM2ImagePredictor): Prepared SAM2 model. - bbox_xyxy (Optional[List[float]]): Bounding box coordinates in xyxy format. - pos_kpts (Optional[np.ndarray]): Positive keypoints array. - neg_kpts (Optional[np.ndarray]): Negative keypoints array. - image (Optional[np.ndarray]): Input image array. - gt_mask (Optional[Any]): Ground truth mask (optional). - num_pos_keypoints (Optional[int]): Number of positive keypoints to use. - gt_mask_is_binary (bool): Flag indicating if ground truth mask is binary. - - Returns: - Tuple of (mask, pos_kpts_backup, neg_kpts_backup, score). - """ - num_pos_keypoints = args.num_pos_keypoints if num_pos_keypoints is None else num_pos_keypoints - - # Filter-out un-annotated and invisible keypoints - if pos_kpts is not None: - pos_kpts = pos_kpts.reshape(-1, 3) - valid_kpts = pos_kpts[:, 2] > args.visibility_thr - - pose_bbox = np.array([pos_kpts[:, 0].min(), pos_kpts[:, 1].min(), pos_kpts[:, 0].max(), pos_kpts[:, 1].max()]) - pos_kpts, conf = _select_keypoints(args, pos_kpts, num_visible=valid_kpts.sum(), bbox=bbox_xyxy) - - pos_kpts_backup = np.concatenate([pos_kpts, conf[:, None]], axis=1) - - if pos_kpts.shape[0] > num_pos_keypoints: - pos_kpts = pos_kpts[:num_pos_keypoints, :] - pos_kpts_backup = pos_kpts_backup[:num_pos_keypoints, :] - - else: - pose_bbox = None - pos_kpts = np.empty((0, 2), dtype=np.float32) - pos_kpts_backup = np.empty((0, 3), dtype=np.float32) - - if neg_kpts is not None: - neg_kpts = neg_kpts.reshape(-1, 3) - valid_kpts = neg_kpts[:, 2] > args.visibility_thr - - neg_kpts, conf = _select_keypoints( - args, neg_kpts, num_visible=valid_kpts.sum(), bbox=bbox_xyxy, method="closest" - ) - selected_neg_kpts = neg_kpts - neg_kpts_backup = np.concatenate([neg_kpts, conf[:, None]], axis=1) - - if neg_kpts.shape[0] > args.num_neg_keypoints: - selected_neg_kpts = neg_kpts[: args.num_neg_keypoints, :] - - else: - selected_neg_kpts = np.empty((0, 2), dtype=np.float32) - neg_kpts_backup = np.empty((0, 3), dtype=np.float32) - - # Concatenate positive and negative keypoints - kpts = np.concatenate([pos_kpts, selected_neg_kpts], axis=0) - kpts_labels = np.concatenate([np.ones(pos_kpts.shape[0]), np.zeros(selected_neg_kpts.shape[0])], axis=0) - - bbox = bbox_xyxy if args.use_bbox else None - - if args.extend_bbox and not bbox is None: - # Expand the bbox such that it contains all positive keypoints - pose_bbox = np.array( - [pos_kpts[:, 0].min() - 2, pos_kpts[:, 1].min() - 2, pos_kpts[:, 0].max() + 2, pos_kpts[:, 1].max() + 2] - ) - expanded_bbox = np.array(bbox) - expanded_bbox[:2] = np.minimum(bbox[:2], pose_bbox[:2]) - expanded_bbox[2:] = np.maximum(bbox[2:], pose_bbox[2:]) - bbox = expanded_bbox - - if args.crop 
and args.use_bbox and image is not None: - # Crop the image to the 1.5 * bbox size - crop_bbox = np.array(bbox) - bbox_center = np.array([(crop_bbox[0] + crop_bbox[2]) / 2, (crop_bbox[1] + crop_bbox[3]) / 2]) - bbox_size = np.array([crop_bbox[2] - crop_bbox[0], crop_bbox[3] - crop_bbox[1]]) - bbox_size = 1.5 * bbox_size - crop_bbox = np.array( - [ - bbox_center[0] - bbox_size[0] / 2, - bbox_center[1] - bbox_size[1] / 2, - bbox_center[0] + bbox_size[0] / 2, - bbox_center[1] + bbox_size[1] / 2, - ] - ) - crop_bbox = np.round(crop_bbox).astype(int) - crop_bbox = np.clip(crop_bbox, 0, [image.shape[1], image.shape[0], image.shape[1], image.shape[0]]) - original_image_size = image.shape[:2] - image = image[crop_bbox[1] : crop_bbox[3], crop_bbox[0] : crop_bbox[2], :] - - # Update the keypoints - kpts = kpts - crop_bbox[:2] - bbox[:2] = bbox[:2] - crop_bbox[:2] - bbox[2:] = bbox[2:] - crop_bbox[:2] - - model.set_image(image) - - masks, scores, logits = model.predict( - point_coords=kpts, - point_labels=kpts_labels, - box=bbox, - multimask_output=False, - ) - mask = masks[0] - scores = scores[0] - - if args.crop and args.use_bbox and image is not None: - # Pad the mask to the original image size - mask_padded = np.zeros(original_image_size, dtype=np.uint8) - mask_padded[crop_bbox[1] : crop_bbox[3], crop_bbox[0] : crop_bbox[2]] = mask - mask = mask_padded - - bbox[:2] = bbox[:2] + crop_bbox[:2] - bbox[2:] = bbox[2:] + crop_bbox[:2] - - if args.pose_mask_consistency: - if gt_mask_is_binary: - gt_mask_binary = gt_mask - else: - gt_mask_binary = Mask.decode(gt_mask).astype(bool) if gt_mask is not None else None - - gt_mask_pose_consistency = _compute_one_mask_pose_consistency(gt_mask_binary, pos_kpts_backup, neg_kpts_backup) - dt_mask_pose_consistency = _compute_one_mask_pose_consistency(mask, pos_kpts_backup, neg_kpts_backup) - - tol = 0.1 - dt_is_same = np.abs(dt_mask_pose_consistency - gt_mask_pose_consistency) < tol - if dt_is_same: - mask = gt_mask_binary if gt_mask_binary.sum() < mask.sum() else mask - else: - mask = gt_mask_binary if gt_mask_pose_consistency > dt_mask_pose_consistency else mask - - return mask, pos_kpts_backup, neg_kpts_backup, scores - - -def process_image_with_SAM( - sam_args: Any, - image: np.ndarray, - model: SAM2ImagePredictor, - new_dets: InstanceData, - old_dets: Optional[InstanceData] = None, -) -> InstanceData: - """ - Wrapper that validates args and routes to single or batch processing. - """ - _validate_sam_args(sam_args) - if sam_args.batch: - return _process_image_batch(sam_args, image, model, new_dets, old_dets) - return _process_image_single(sam_args, image, model, new_dets, old_dets) - - -def _process_image_single( - sam_args: Any, - image: np.ndarray, - model: SAM2ImagePredictor, - new_dets: InstanceData, - old_dets: Optional[InstanceData] = None, -) -> InstanceData: - """ - Refine instance segmentation masks using SAM2 with pose-conditioned prompts. - - Args: - sam_args (Any): DotDict containing required SAM parameters: - crop (bool), use_bbox (bool), confidence_thr (float), - ignore_small_bboxes (bool), num_pos_keypoints (int), - num_pos_keypoints_if_crowd (int), crowd_by_max_iou (Optional[float]), - batch (bool), exclusive_masks (bool), extend_bbox (bool), pose_mask_consistency (bool). - image (np.ndarray): BGR image array of shape (H, W, 3). - model (SAM2ImagePredictor): Initialized SAM2 predictor. - new_dets (InstanceData): New detections with attributes: - bboxes, pred_masks, keypoints, bbox_scores. 
- old_dets (Optional[InstanceData]): Previous detections for negative prompts. - - Returns: - InstanceData: `new_dets` updated in-place with - `.refined_masks`, `.sam_scores`, and `.sam_kpts`. - """ - _validate_sam_args(sam_args) - - if not (sam_args.crop and sam_args.use_bbox): - model.set_image(image) - - # Ignore all keypoints with confidence below the threshold - new_keypoints = new_dets.keypoints.copy() - for kpts in new_keypoints: - conf_mask = kpts[:, 2] < sam_args.confidence_thr - kpts[conf_mask, :] = 0 - n_new_dets = len(new_dets.bboxes) - n_old_dets = 0 - if old_dets is not None: - n_old_dets = len(old_dets.bboxes) - old_keypoints = old_dets.keypoints.copy() - for kpts in old_keypoints: - conf_mask = kpts[:, 2] < sam_args.confidence_thr - kpts[conf_mask, :] = 0 - - all_bboxes = new_dets.bboxes.copy() - if old_dets is not None: - all_bboxes = np.concatenate([all_bboxes, old_dets.bboxes], axis=0) - - max_ious = _get_max_ious(all_bboxes) - - gt_bboxes = [] - new_dets.refined_masks = np.zeros((n_new_dets, image.shape[0], image.shape[1]), dtype=np.uint8) - new_dets.sam_scores = np.zeros_like(new_dets.bbox_scores) - new_dets.sam_kpts = np.zeros((len(new_dets.bboxes), sam_args.num_pos_keypoints, 3), dtype=np.float32) - for instance_idx in range(len(new_dets.bboxes)): - bbox_xywh = new_dets.bboxes[instance_idx] - bbox_area = bbox_xywh[2] * bbox_xywh[3] - - if sam_args.ignore_small_bboxes and bbox_area < 100 * 100: - continue - dt_mask = new_dets.pred_masks[instance_idx] if new_dets.pred_masks is not None else None - - bbox_xyxy = [bbox_xywh[0], bbox_xywh[1], bbox_xywh[0] + bbox_xywh[2], bbox_xywh[1] + bbox_xywh[3]] - gt_bboxes.append(bbox_xyxy) - this_kpts = new_keypoints[instance_idx].reshape(1, -1, 3) - other_kpts = None - if old_dets is not None: - other_kpts = old_keypoints.copy().reshape(n_old_dets, -1, 3) - if len(new_keypoints) > 1: - other_new_kpts = np.concatenate([new_keypoints[:instance_idx], new_keypoints[instance_idx + 1 :]], axis=0) - other_kpts = ( - np.concatenate([other_kpts, other_new_kpts], axis=0) if other_kpts is not None else other_new_kpts - ) - - num_pos_keypoints = sam_args.num_pos_keypoints - if sam_args.crowd_by_max_iou is not None and max_ious[instance_idx] > sam_args.crowd_by_max_iou: - bbox_xyxy = None - num_pos_keypoints = sam_args.num_pos_keypoints_if_crowd - - dt_mask, pos_kpts, neg_kpts, scores = _pose2seg( - sam_args, - model, - bbox_xyxy, - pos_kpts=this_kpts, - neg_kpts=other_kpts, - image=image if (sam_args.crop and sam_args.use_bbox) else None, - gt_mask=dt_mask, - num_pos_keypoints=num_pos_keypoints, - gt_mask_is_binary=True, - ) - - new_dets.refined_masks[instance_idx] = dt_mask - new_dets.sam_scores[instance_idx] = scores - - # If the number of positive keypoints is less than the required number, fill the rest with zeros - if len(pos_kpts) != sam_args.num_pos_keypoints: - pos_kpts = np.concatenate( - [pos_kpts, np.zeros((sam_args.num_pos_keypoints - len(pos_kpts), 3), dtype=np.float32)], axis=0 - ) - new_dets.sam_kpts[instance_idx] = pos_kpts - - n_masks = len(new_dets.refined_masks) + (len(old_dets.refined_masks) if old_dets is not None else 0) - - if sam_args.exclusive_masks and n_masks > 1: - all_masks = ( - np.concatenate([new_dets.refined_masks, old_dets.refined_masks], axis=0) - if old_dets is not None - else new_dets.refined_masks - ) - all_scores = ( - np.concatenate([new_dets.sam_scores, old_dets.sam_scores], axis=0) - if old_dets is not None - else new_dets.sam_scores - ) - refined_masks = _apply_exclusive_masks(all_masks, all_scores) - 
new_dets.refined_masks = refined_masks[: len(new_dets.refined_masks)] - - return new_dets - - -def _process_image_batch( - sam_args: Any, - image: np.ndarray, - model: SAM2ImagePredictor, - new_dets: InstanceData, - old_dets: Optional[InstanceData] = None, -) -> InstanceData: - """ - Batch process multiple detection instances with SAM2 refinement. - - Args: - sam_args (Any): DotDict of SAM parameters (same as `process_image_with_SAM`). - image (np.ndarray): Input BGR image. - model (SAM2ImagePredictor): Prepared SAM2 predictor. - new_dets (InstanceData): New detection instances. - old_dets (Optional[InstanceData]): Previous detections for negative prompts. - - Returns: - InstanceData: `new_dets` updated as in `process_image_with_SAM`. - """ - n_new_dets = len(new_dets.bboxes) - - model.set_image(image) - - image_kpts = [] - image_bboxes = [] - num_valid_kpts = [] - for instance_idx in range(len(new_dets.bboxes)): - - bbox_xywh = new_dets.bboxes[instance_idx].copy() - bbox_area = bbox_xywh[2] * bbox_xywh[3] - if sam_args.ignore_small_bboxes and bbox_area < 100 * 100: - continue - - this_kpts = new_dets.keypoints[instance_idx].copy().reshape(-1, 3) - kpts_vis = np.array(this_kpts[:, 2]) - visible_kpts = (kpts_vis > sam_args.visibility_thr) & (this_kpts[:, 2] > sam_args.confidence_thr) - num_visible = (visible_kpts).sum() - if num_visible <= 0: - continue - num_valid_kpts.append(num_visible) - image_bboxes.append(np.array(bbox_xywh)) - this_kpts[~visible_kpts, :2] = 0 - this_kpts[:, 2] = visible_kpts - image_kpts.append(this_kpts) - if old_dets is not None: - for instance_idx in range(len(old_dets.bboxes)): - bbox_xywh = old_dets.bboxes[instance_idx].copy() - bbox_area = bbox_xywh[2] * bbox_xywh[3] - if sam_args.ignore_small_bboxes and bbox_area < 100 * 100: - continue - this_kpts = old_dets.keypoints[instance_idx].reshape(-1, 3) - kpts_vis = np.array(this_kpts[:, 2]) - visible_kpts = (kpts_vis > sam_args.visibility_thr) & (this_kpts[:, 2] > sam_args.confidence_thr) - num_visible = (visible_kpts).sum() - if num_visible <= 0: - continue - num_valid_kpts.append(num_visible) - image_bboxes.append(np.array(bbox_xywh)) - this_kpts[~visible_kpts, :2] = 0 - this_kpts[:, 2] = visible_kpts - image_kpts.append(this_kpts) - - image_kpts = np.array(image_kpts) - image_bboxes = np.array(image_bboxes) - num_valid_kpts = np.array(num_valid_kpts) - - image_kpts_backup = image_kpts.copy() - - # Prepare keypoints such that all instances have the same number of keypoints - # First sort keypoints by their distance to the center of the bounding box - # If some are missing, duplicate the last one - prepared_kpts = [] - prepared_kpts_backup = [] - for bbox, kpts, num_visible in zip(image_bboxes, image_kpts, num_valid_kpts): - - this_kpts, this_conf = _select_keypoints(sam_args, kpts, num_visible, bbox) - - # Duplicate the last keypoint if some are missing - if this_kpts.shape[0] < num_valid_kpts.max(): - this_kpts = np.concatenate( - [this_kpts, np.tile(this_kpts[-1], (num_valid_kpts.max() - this_kpts.shape[0], 1))], axis=0 - ) - this_conf = np.concatenate( - [this_conf, np.tile(this_conf[-1], (num_valid_kpts.max() - this_conf.shape[0],))], axis=0 - ) - - prepared_kpts.append(this_kpts) - prepared_kpts_backup.append(np.concatenate([this_kpts, this_conf[:, None]], axis=1)) - image_kpts = np.array(prepared_kpts) - image_kpts_backup = np.array(prepared_kpts_backup) - kpts_labels = np.ones(image_kpts.shape[:2]) - - # Compute IoUs between all bounding boxes - max_ious = _get_max_ious(image_bboxes) - num_pos_keypoints = 
sam_args.num_pos_keypoints - use_bbox = sam_args.use_bbox - if sam_args.crowd_by_max_iou is not None and max_ious[instance_idx] > sam_args.crowd_by_max_iou: - use_bbox = False - num_pos_keypoints = sam_args.num_pos_keypoints_if_crowd - - # Threshold the number of positive keypoints - if num_pos_keypoints > 0 and num_pos_keypoints < image_kpts.shape[1]: - image_kpts = image_kpts[:, :num_pos_keypoints, :] - kpts_labels = kpts_labels[:, :num_pos_keypoints] - image_kpts_backup = image_kpts_backup[:, :num_pos_keypoints, :] - - elif num_pos_keypoints == 0: - image_kpts = None - kpts_labels = None - image_kpts_backup = np.empty((0, 3), dtype=np.float32) - - image_bboxes_xyxy = None - if use_bbox: - image_bboxes_xyxy = np.array(image_bboxes) - image_bboxes_xyxy[:, 2:] += image_bboxes_xyxy[:, :2] - - # Expand the bbox to include the positive keypoints - if sam_args.extend_bbox: - pose_bbox = np.stack( - [ - np.min(image_kpts[:, :, 0], axis=1) - 2, - np.min(image_kpts[:, :, 1], axis=1) - 2, - np.max(image_kpts[:, :, 0], axis=1) + 2, - np.max(image_kpts[:, :, 1], axis=1) + 2, - ], - axis=1, - ) - expanded_bbox = np.array(image_bboxes_xyxy) - expanded_bbox[:, :2] = np.minimum(expanded_bbox[:, :2], pose_bbox[:, :2]) - expanded_bbox[:, 2:] = np.maximum(expanded_bbox[:, 2:], pose_bbox[:, 2:]) - # bbox_expanded = (np.abs(expanded_bbox - image_bboxes_xyxy) > 1e-4).any(axis=1) - image_bboxes_xyxy = expanded_bbox - - # Process even old detections to get their 'negative' keypoints - masks, scores, logits = model.predict( - point_coords=image_kpts, - point_labels=kpts_labels, - box=image_bboxes_xyxy, - multimask_output=False, - ) - - # Reshape the masks to (N, C, H, W). If the model outputs (C, H, W), add a number of masks dimension - if len(masks.shape) == 3: - masks = masks[None, :, :, :] - masks = masks[:, 0, :, :] - N = masks.shape[0] - scores = scores.reshape(N) - - if sam_args.exclusive_masks and N > 1: - # Make sure the masks are non-overlapping - # If two masks overlap, set the pixel to the one with the highest score - masks = _apply_exclusive_masks(masks, scores) - - gt_masks = new_dets.pred_masks.copy() if new_dets.pred_masks is not None else None - if sam_args.pose_mask_consistency and gt_masks is not None: - # Measure 'mask-pose_conistency' by computing number of keypoints inside the mask - # Compute for both gt (if available) and predicted masks and then choose the one with higher consistency - dt_mask_pose_consistency = _compute_mask_pose_consistency(masks, image_kpts_backup) - gt_mask_pose_consistency = _compute_mask_pose_consistency(gt_masks, image_kpts_backup) - - dt_masks_area = np.array([m.sum() for m in masks]) - gt_masks_area = np.array([m.sum() for m in gt_masks]) if gt_masks is not None else np.zeros_like(dt_masks_area) - - # If PM-c is approx the same, prefer the smaller mask - tol = 0.1 - pmc_is_equal = np.isclose(dt_mask_pose_consistency, gt_mask_pose_consistency, atol=tol) - dt_is_worse = (dt_mask_pose_consistency < (gt_mask_pose_consistency - tol)) | pmc_is_equal & ( - dt_masks_area > gt_masks_area - ) - - new_masks = [] - for dt_mask, gt_mask, dt_worse in zip(masks, gt_masks, dt_is_worse): - if dt_worse: - new_masks.append(gt_mask) - else: - new_masks.append(dt_mask) - masks = np.array(new_masks) - - new_dets.refined_masks = masks[:n_new_dets] - new_dets.sam_scores = scores[:n_new_dets] - new_dets.sam_kpts = image_kpts_backup[:n_new_dets] - - return new_dets - - -def _apply_exclusive_masks(masks: np.ndarray, scores: np.ndarray) -> np.ndarray: - """ - Ensure masks are 
non-overlapping by keeping at each pixel the mask with the highest score. - """ - no_mask = masks.sum(axis=0) == 0 - masked_scores = masks * scores[:, None, None] - argmax_masks = np.argmax(masked_scores, axis=0) - new_masks = argmax_masks[None, :, :] == (np.arange(masks.shape[0])[:, None, None]) - new_masks[:, no_mask] = 0 - return new_masks diff --git a/demos/BMP_demo.py b/demos/BMP_demo.py new file mode 100644 index 0000000000000000000000000000000000000000..da8598e702e3174f49177d8c74a200e9c4216e11 --- /dev/null +++ b/demos/BMP_demo.py @@ -0,0 +1,149 @@ +#!/usr/bin/env python3 +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. +""" +BBoxMaskPose Demo - Demonstrate BBoxMaskPose public API usage. + +This demo shows how to create a PMPose model externally and inject it into +BBoxMaskPose; the internal-pose-model pattern is covered in demos/quickstart.ipynb. + +Usage: + python demos/BMP_demo.py --image <path/to/image> --output <output/dir> +""" + +import argparse +import os +from pathlib import Path + +import cv2 +import numpy as np + +from bboxmaskpose import BBoxMaskPose +from pmpose import PMPose + + +def parse_args(): + """Parse command-line arguments.""" + parser = argparse.ArgumentParser(description="BBoxMaskPose Demo") + parser.add_argument("--image", type=str, default="demo/data/004806.jpg", help="Path to input image") + parser.add_argument("--output", type=str, default="demos/outputs/bboxmaskpose", help="Directory to save outputs") + parser.add_argument("--device", type=str, default="cuda", help="Device for inference (cuda or cpu)") + parser.add_argument("--config", type=str, default="bmp_v2", help="BMP config (bmp_D3, bmp_J1 or bmp_v2)") + parser.add_argument( + "--mode", + type=str, + default="both", + choices=["internal", "external", "both"], + help="Demo mode: internal (BMP creates pose model), external (inject PMPose), or both; only the external pattern is exercised by this script", + ) + return parser.parse_args() + + +def bmp_demo(image_path: str, output_dir: Path, device: str, config: str): + """ + Create a PMPose model externally and inject it into BMP. + + This pattern is useful when you want to: + - Reuse the same pose model across multiple BMP instances + - Pre-configure the pose model with custom settings + - Have fine-grained control over the pose model + """ + print("\n" + "=" * 60) + print("Demo: BMP with External PMPose Model") + print("=" * 60) + + # Step 1: Create PMPose model + print("\n[Step 1] Creating PMPose model...") + print(f" Device: {device}") + + pose_model = PMPose( + device=device, + variant="PMPose-b", + from_pretrained=True, + ) + print(" ✓ PMPose model created") + + # Step 2: Inject into BBoxMaskPose + print("\n[Step 2] Initializing BBoxMaskPose with external pose model...") + print(f" Config: {config}") + + bmp_model = BBoxMaskPose( + config=config, + device=device, + pose_model=pose_model, # Inject the PMPose instance. If None, BBoxMaskPose creates a default one. 
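+ # NOTE: the injected instance is used as-is, so a single PMPose model can be shared by several BBoxMaskPose objects without reloading pose weights (see the bmp_demo docstring above).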
+ ) + print(" ✓ BBoxMaskPose initialized with external pose model") + + # Step 3: Run pipeline + print(f"\n[Step 3] Running full BMP pipeline on: {image_path}") + result = bmp_model.predict( + image=image_path, + bboxes=None, # Run detector + return_intermediates=False, + ) + + print(" ✓ Pipeline complete") + print("\n Results:") + print(f" - Detected {len(result['bboxes'])} people") + print(f" - Keypoints shape: {result['keypoints'].shape}") + print(f" - Masks shape: {result['masks'].shape}") + + # Visualize + print("\n[Step 4] Visualizing results...") + img = cv2.imread(image_path) + vis_pose = bmp_model.visualize( + image=img, + result=result, + vis_type="pose", + ) + vis_mask = bmp_model.visualize( + image=img, + result=result, + vis_type="mask", + ) + vis_img = np.hstack((vis_pose, vis_mask)) + + output_path = output_dir / f"{Path(image_path).stem}_bmp.jpg" + cv2.imwrite(str(output_path), vis_img) + print(f" ✓ Saved to: {output_path}") + + return result + + +def main(): + """Main demo function.""" + args = parse_args() + + # Create output directory + output_dir = Path(args.output) + output_dir.mkdir(parents=True, exist_ok=True) + + # Verify image exists + if not os.path.exists(args.image): + raise FileNotFoundError(f"Image not found: {args.image}") + + print("=" * 60) + print("BBoxMaskPose Demo - Public API Usage") + print("=" * 60) + print(f"\nImage: {args.image}") + print(f"Config: {args.config}") + print(f"Device: {args.device}") + print(f"Mode: {args.mode}") + + result1 = bmp_demo( + args.image, + output_dir, + args.device, + args.config, + ) + + print("\n" + "=" * 60) + print("Demo completed successfully!") + print("=" * 60) + print(f"\nOutputs saved to: {output_dir}") + + print(f"\nDetected bboxes:\n{result1['bboxes']}") + + +if __name__ == "__main__": + main() diff --git a/demos/BMPv2_demo.py b/demos/BMPv2_demo.py new file mode 100644 index 0000000000000000000000000000000000000000..68e8a6ba1b156f2c77db6b52d25ddb5ee4a025a8 --- /dev/null +++ b/demos/BMPv2_demo.py @@ -0,0 +1,400 @@ +#!/usr/bin/env python3 +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. +""" +BBoxMaskPose v2 Demo - Demonstrate BBoxMaskPose + SAM-3D-Body integration. + +This demo extends BMP_demo.py to additionally predict 3D human meshes +using SAM-3D-Body. The pipeline: +1. Run BMP to get bboxes, masks, and 2D poses +2. Pass masks and bboxes to SAM-3D-Body for 3D mesh recovery +3. 
Output and visualize all results: masks, 2D poses, and 3D meshes + +Usage: + python demos/BMPv2_demo.py --image <path/to/image> --output <output/dir> + +Requirements: + - SAM-3D-Body must be installed (see installation guide) + - SAM-3D-Body checkpoints must be available (download from HuggingFace) + +Examples: + # Basic usage with default checkpoint (downloads from HuggingFace) + python demos/BMPv2_demo.py --image data/004806.jpg --device cuda + + # With local checkpoint + python demos/BMPv2_demo.py --image data/004806.jpg --device cuda \ + --sam3d_checkpoint checkpoints/sam-3d-body-dinov3/model.ckpt \ + --mhr_path checkpoints/sam-3d-body-dinov3/assets/mhr_model.pt + + # Without mask conditioning + python demos/BMPv2_demo.py --image data/004806.jpg --no_mask_conditioning +""" + +import argparse +import os +import sys +from pathlib import Path + +import cv2 +import numpy as np + +from bboxmaskpose import BBoxMaskPose +from pmpose import PMPose + + +def parse_args(): + """Parse command-line arguments.""" + parser = argparse.ArgumentParser( + description="BBoxMaskPose v2 Demo - Full Pipeline with 3D Mesh Recovery", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + # Basic usage (downloads checkpoint from HuggingFace) + python demos/BMPv2_demo.py --image data/004806.jpg --device cuda + + # With local checkpoint + python demos/BMPv2_demo.py --image data/004806.jpg --device cuda \\ + --sam3d_checkpoint checkpoints/sam-3d-body-dinov3/model.ckpt \\ + --mhr_path checkpoints/sam-3d-body-dinov3/assets/mhr_model.pt + +Note: First-time usage may download models from HuggingFace (requires authentication) + """, + ) + parser.add_argument("--image", type=str, default="data/004806.jpg", help="Path to input image") + parser.add_argument("--output", type=str, default="demos/outputs/bboxmaskpose_v2", help="Directory to save outputs") + parser.add_argument("--device", type=str, default="cuda", help="Device for inference (cuda or cpu)") + parser.add_argument("--config", type=str, default="bmp_D3", help="BMP config (bmp_D3 or bmp_J1)") + + # SAM-3D-Body specific arguments + parser.add_argument( + "--sam3d_checkpoint", + type=str, + default=None, + help="Path to SAM-3D-Body checkpoint. If None, auto-downloads from HuggingFace. " + "To use a local checkpoint, download it to 'checkpoints/sam-3d-body-dinov3/' and it will be auto-detected.", + ) + parser.add_argument( + "--mhr_path", type=str, default=None, help="Path to MHR model file. Auto-detected if the checkpoint is in 'checkpoints/sam-3d-body-dinov3/'." 
+ ) + parser.add_argument( + "--no_mask_conditioning", action="store_true", help="Disable mask-conditioned 3D inference (faster but less accurate)" + ) + parser.add_argument( + "--kpts_conditioning", + action="store_true", + default=False, + help="Enable keypoints-conditioned 3D inference (condition SAM-3D-Body on BMP's 2D keypoints)", + ) + parser.add_argument("--no_fov", action="store_true", help="Disable FOV estimator (enabled by default if MoGe is installed)") + parser.add_argument( + "--inference_type", + type=str, + default="full", + choices=["full", "body", "hand"], + help="Type of 3D inference: 'full' (body+hands, slower), 'body' (body only, faster), 'hand' (hands only)", + ) + parser.add_argument("--skip_3d", action="store_true", help="Skip 3D mesh recovery (only run BMP pipeline)") + + return parser.parse_args() + + +def check_sam3d_available(): + """Check if SAM-3D-Body is installed.""" + try: + from bboxmaskpose.sam3d_utils import check_sam3d_available + + return check_sam3d_available() + except ImportError: + return False + + +def print_installation_guide(): + """Print SAM-3D-Body installation instructions.""" + print("\n" + "=" * 70) + print("SAM-3D-Body Installation Required") + print("=" * 70) + print("\nSAM-3D-Body is not installed. To use 3D mesh recovery, install it:") + print("\n1. Install core dependencies:") + print(" pip install -r requirements/sam3d.txt") + print("\n2. Install detectron2:") + print(" pip install 'git+https://github.com/facebookresearch/detectron2.git@a1ce2f9' \\") + print(" --no-build-isolation --no-deps") + print("\n3. Install MoGe (optional, for FOV estimation):") + print(" pip install git+https://github.com/microsoft/MoGe.git") + print("\n4. Clone and install SAM-3D-Body:") + print(" git clone https://github.com/facebookresearch/sam-3d-body.git") + print(" cd sam-3d-body") + print(" pip install -e .") + print("\n5. Request access to model checkpoints:") + print(" https://huggingface.co/facebook/sam-3d-body-dinov3") + print("\nFor more details, see:") + print("https://github.com/facebookresearch/sam-3d-body/blob/main/INSTALL.md") + print("=" * 70 + "\n") + + +def bmpv2_demo( + image_path: str, + output_dir: Path, + device: str, + config: str, + sam3d_checkpoint: str = None, + mhr_path: str = None, + use_mask_conditioning: bool = True, + use_kpts_conditioning: bool = False, + use_fov: bool = True, + inference_type: str = "full", + skip_3d: bool = False, +): + """ + Run BBoxMaskPose v2 demo: BMP + SAM-3D-Body pipeline. + + Args: + image_path: Path to input image. + output_dir: Output directory for results. + device: Device for inference ('cuda' or 'cpu'). + config: BMP configuration name. + sam3d_checkpoint: Path to SAM-3D-Body checkpoint. + mhr_path: Path to MHR model file. + use_mask_conditioning: Whether to use mask-conditioned 3D inference. + use_fov: Whether to use FOV estimator. + inference_type: Type of 3D inference ('full', 'body', or 'hand'). + skip_3d: If True, skip 3D mesh recovery. 
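+ use_kpts_conditioning: Whether to additionally condition 3D inference on the 2D keypoints predicted by BMP.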
+ """ + # Auto-detect checkpoint paths if not provided + default_checkpoint_dir = Path("checkpoints/sam-3d-body-dinov3") + if sam3d_checkpoint is None and default_checkpoint_dir.exists(): + checkpoint_file = default_checkpoint_dir / "model.ckpt" + if checkpoint_file.exists(): + sam3d_checkpoint = str(checkpoint_file) + print(f"Auto-detected SAM-3D-Body checkpoint: {sam3d_checkpoint}") + + if mhr_path is None and default_checkpoint_dir.exists(): + mhr_file = default_checkpoint_dir / "assets" / "mhr_model.pt" + if mhr_file.exists(): + mhr_path = str(mhr_file) + print(f"Auto-detected MHR model: {mhr_path}") + + print("\n" + "=" * 70) + print("BBoxMaskPose v2 Demo - Full Pipeline with 3D Mesh Recovery") + print("=" * 70) + + # ======================================================================== + # STEP 1: Run BBoxMaskPose pipeline (detection + pose + segmentation) + # ======================================================================== + print("\n[STEP 1/3] Running BBoxMaskPose Pipeline") + print("-" * 70) + + # Initialize BBoxMaskPose (creates internal pose model from config) + print(f" โ€ข Initializing BBoxMaskPose (config: {config})...") + bmp_model = BBoxMaskPose( + config=config, + device=device, + ) + print(" โœ“ BBoxMaskPose ready") + + # Run full BMP pipeline + print(f" โ€ข Running full BMP pipeline on: {image_path}") + result = bmp_model.predict( + image=image_path, + bboxes=None, # Run detector + return_intermediates=False, + ) + + num_people = len(result["bboxes"]) + print(f" โœ“ Pipeline complete - detected {num_people} people") + print(f" - Bboxes: {result['bboxes'].shape}") + print(f" - Keypoints: {result['keypoints'].shape}") + print(f" - Masks: {result['masks'].shape}") + + # Visualize BMP results + print(" โ€ข Generating BMP visualizations...") + img = cv2.imread(image_path) + vis_pose = bmp_model.visualize(image=img, result=result, vis_type="pose") + vis_mask = bmp_model.visualize(image=img, result=result, vis_type="mask") + + # Save BMP outputs + bmp_output_path = output_dir / f"{Path(image_path).stem}_bmp_pose.jpg" + cv2.imwrite(str(bmp_output_path), vis_pose) + print(f" โœ“ Saved pose visualization: {bmp_output_path}") + + bmp_mask_path = output_dir / f"{Path(image_path).stem}_bmp_mask.jpg" + cv2.imwrite(str(bmp_mask_path), vis_mask) + print(f" โœ“ Saved mask visualization: {bmp_mask_path}") + + # ======================================================================== + # STEP 2: Check if 3D mesh recovery should be run + # ======================================================================== + if skip_3d: + print("\n[STEP 2/3] Skipping 3D mesh recovery (--skip_3d flag set)") + print("\n" + "=" * 70) + print("Demo completed successfully! (BMP only)") + print("=" * 70) + return result, None + + if not check_sam3d_available(): + print("\n[STEP 2/3] SAM-3D-Body not available") + print_installation_guide() + print("Skipping 3D mesh recovery. Use --skip_3d to suppress this message.") + print("\n" + "=" * 70) + print("Demo completed successfully! 
(BMP only)") + print("=" * 70) + return result, None + + # ======================================================================== + # STEP 3: Run SAM-3D-Body for 3D mesh recovery + # ======================================================================== + print("\n[STEP 2/3] Initializing SAM-3D-Body") + print("-" * 70) + + from bboxmaskpose.sam3d_utils import SAM3DBodyWrapper, visualize_3d_meshes + + try: + print(" โ€ข Loading SAM-3D-Body model...") + sam3d = SAM3DBodyWrapper( + checkpoint_path=sam3d_checkpoint, + mhr_path=mhr_path, + device=device, + use_detector=False, # We already have detections from BMP + use_segmentor=False, # We already have masks from BMP + use_fov=use_fov, + ) + print(" โœ“ SAM-3D-Body ready") + except Exception as e: + # Traceback e + import traceback + + traceback.print_exc() + + print(f"\n โœ— Error loading SAM-3D-Body: {e}") + print(" \nPlease check:") + print(" 1. SAM-3D-Body is installed correctly") + print(" 2. You have HuggingFace access to facebook/sam-3d-body-dinov3") + print(" 3. You are authenticated with HuggingFace (huggingface-cli login)") + print("\nSkipping 3D mesh recovery...") + return result, None + + print("\n[STEP 3/3] Running 3D Mesh Recovery") + print("-" * 70) + + # Prepare inputs for SAM-3D-Body + bboxes = result["bboxes"] # (N, 4) in [x1, y1, x2, y2] format + masks = result["masks"] # (N, H, W) binary masks + keypoints = result["keypoints"] # (N, 17, 2) keypoints + keypoints = keypoints[:, :17, :] # Ensure only COCO keypoints are used (if more are present) + + print(f" โ€ข Running SAM-3D-Body on {len(bboxes)} detected people...") + print(f" - Using mask conditioning: {use_mask_conditioning}") + print(f" - Using keypoints conditioning: {use_kpts_conditioning}") + print(f" - Inference type: {inference_type}") + + # Run 3D inference + outputs_3d = sam3d.predict( + image=image_path, + bboxes=bboxes, + masks=masks if use_mask_conditioning else None, + keypoints=keypoints if use_kpts_conditioning else None, + use_mask=use_mask_conditioning, + inference_type=inference_type, + ) + + print(f" โœ“ 3D mesh recovery complete") + print(f" - Generated {len(outputs_3d)} 3D meshes") + + # Visualize 3D meshes + print(" โ€ข Generating 3D mesh visualization...") + vis_3d_path = output_dir / f"{Path(image_path).stem}_3d_mesh.jpg" + vis_3d = visualize_3d_meshes( + image=img, + outputs=outputs_3d, + faces=sam3d.faces, + masks=masks if use_mask_conditioning else None, + keypoints=keypoints if use_kpts_conditioning else None, + output_path=str(vis_3d_path), + ) + print(f" โœ“ Saved 3D visualization: {vis_3d_path}") + + # Create combined visualization (BMP + 3D) + print(" โ€ข Creating combined visualization...") + + # Ensure all images have the same height for horizontal stacking + target_height = vis_pose.shape[0] + + # Resize vis_mask if needed + if vis_mask.shape[0] != target_height: + aspect_ratio = vis_mask.shape[1] / vis_mask.shape[0] + target_width = int(target_height * aspect_ratio) + vis_mask = cv2.resize(vis_mask, (target_width, target_height)) + + # Resize vis_3d if needed + if vis_3d.shape[0] != target_height: + aspect_ratio = vis_3d.shape[1] / vis_3d.shape[0] + target_width = int(target_height * aspect_ratio) + vis_3d = cv2.resize(vis_3d, (target_width, target_height)) + + combined = np.hstack((vis_pose, vis_mask, vis_3d)) + combined_path = output_dir / f"{Path(image_path).stem}_combined.jpg" + cv2.imwrite(str(combined_path), combined) + print(f" โœ“ Saved combined visualization: {combined_path}") + + return result, outputs_3d + + +def main(): + 
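+ # Programmatic use is also possible; a sketch mirroring the CLI defaults: + # result, meshes = bmpv2_demo("data/004806.jpg", Path("out"), "cuda", "bmp_D3") + # (the remaining keyword arguments keep the defaults from the signature above)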
"""Main demo function.""" + args = parse_args() + + # Create output directory + output_dir = Path(args.output) + output_dir.mkdir(parents=True, exist_ok=True) + + # Verify image exists + if not os.path.exists(args.image): + raise FileNotFoundError(f"Image not found: {args.image}") + + print("\n" + "=" * 70) + print("BBoxMaskPose v2 Demo Configuration") + print("=" * 70) + print(f"Image: {args.image}") + print(f"Output directory: {args.output}") + print(f"Device: {args.device}") + print(f"BMP config: {args.config}") + print(f"SAM-3D checkpoint: {args.sam3d_checkpoint or 'Auto-detect or HuggingFace'}") + print(f"Mask conditioning: {not args.no_mask_conditioning}") + print(f"Keypoints conditioning: {args.kpts_conditioning}") + print(f"FOV estimation: {not args.no_fov}") + print(f"Inference type: {args.inference_type}") + print(f"Skip 3D: {args.skip_3d}") + + # Run demo + result_bmp, result_3d = bmpv2_demo( + image_path=args.image, + output_dir=output_dir, + device=args.device, + config=args.config, + sam3d_checkpoint=args.sam3d_checkpoint, + mhr_path=args.mhr_path, + use_mask_conditioning=not args.no_mask_conditioning, + use_kpts_conditioning=args.kpts_conditioning, + use_fov=not args.no_fov, + inference_type=args.inference_type, + skip_3d=args.skip_3d, + ) + + # Print summary + print("\n" + "=" * 70) + print("Demo Completed Successfully!") + print("=" * 70) + print(f"\nOutputs saved to: {output_dir}") + print(f" โ€ข BMP pose visualization: {Path(args.image).stem}_bmp_pose.jpg") + print(f" โ€ข BMP mask visualization: {Path(args.image).stem}_bmp_mask.jpg") + if result_3d is not None: + print(f" โ€ข 3D mesh visualization: {Path(args.image).stem}_3d_mesh.jpg") + print(f" โ€ข Combined visualization: {Path(args.image).stem}_combined.jpg") + + print(f"\nDetected {len(result_bmp['bboxes'])} people in the image") + if result_3d is not None: + print(f"Recovered {len(result_3d)} 3D meshes") + print() + + +if __name__ == "__main__": + main() diff --git a/demos/PMPose_demo.py b/demos/PMPose_demo.py new file mode 100644 index 0000000000000000000000000000000000000000..4f164a1dec395c33f6f6dbade4ad67b301200e19 --- /dev/null +++ b/demos/PMPose_demo.py @@ -0,0 +1,187 @@ +#!/usr/bin/env python3 +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. +""" +PMPose Demo - Demonstrate PMPose public API usage. + +This demo shows how to use the PMPose wrapper API for pose estimation +given an image and bounding boxes. 
+ +Usage: + python demos/PMPose_demo.py --image <path/to/image> --output <output/dir> +""" + +import argparse +import os +from pathlib import Path + +import cv2 +import numpy as np + +from pmpose import PMPose + + +def parse_args(): + """Parse command-line arguments.""" + parser = argparse.ArgumentParser(description="PMPose Demo") + parser.add_argument("--image", type=str, default="demo/data/004806.jpg", help="Path to input image") + parser.add_argument("--output", type=str, default="demos/outputs/pmpose", help="Directory to save outputs") + parser.add_argument("--device", type=str, default="cuda", help="Device for inference (cuda or cpu)") + parser.add_argument("--bboxes", type=str, default=None, help="Bboxes as 'x1,y1,x2,y2;x1,y1,x2,y2;...' (semicolons separate instances)") + return parser.parse_args() + + +def parse_bboxes(bbox_str: str): + """Parse bbox string into numpy array.""" + if bbox_str is None: + return None + + bboxes = [] + for bbox in bbox_str.split(";"): + coords = [float(x) for x in bbox.split(",")] + if len(coords) != 4: + raise ValueError(f"Invalid bbox format: {bbox}") + bboxes.append(coords) + + return np.array(bboxes, dtype=np.float32) + + +def get_default_bboxes(image_path: str): + """ + Get some default bboxes for demo purposes. + + For the OCHuman 004806.jpg image, we use pre-defined bboxes + that cover the people in the image. + """ + # These are approximate bboxes for the people in demo/data/004806.jpg + # You can adjust these for your specific image + if "004806" in image_path: + # OCHuman image with multiple people + return np.array( + [ + [1.343687, 55.028114, 530.4726, 863.68], + [196.49245, 48.729275, 528.9763, 832.8075], + ], + dtype=np.float32, + ) + else: + # Generic full-image bbox + img = cv2.imread(image_path) + if img is None: + raise ValueError(f"Cannot read image: {image_path}") + h, w = img.shape[:2] + return np.array([[0, 0, w, h]], dtype=np.float32) + + +def main(): + """Main demo function.""" + args = parse_args() + + # Create output directory + output_dir = Path(args.output) + output_dir.mkdir(parents=True, exist_ok=True) + + print("=" * 60) + print("PMPose Demo - Public API Usage") + print("=" * 60) + + # Step 1: Initialize PMPose model + print("\n[Step 1] Initializing PMPose model...") + print(f" Device: {args.device}") + + pose_model = PMPose( + device=args.device, + variant="PMPose-b", + from_pretrained=True, + ) + print(" ✓ Model initialized successfully") + + # Step 2: Load image and prepare bboxes + print(f"\n[Step 2] Loading image: {args.image}") + if not os.path.exists(args.image): + raise FileNotFoundError(f"Image not found: {args.image}") + + img = cv2.imread(args.image) + if img is None: + raise ValueError(f"Failed to read image: {args.image}") + + h, w = img.shape[:2] + print(f" Image size: {w}x{h}") + + # Get bboxes + if args.bboxes: + bboxes = parse_bboxes(args.bboxes) + print(f" Using provided bboxes: {len(bboxes)} boxes") + else: + bboxes = get_default_bboxes(args.image) + print(f" Using default bboxes: {len(bboxes)} boxes") + + # Create dummy polygon masks, since MaskPose expects masks in polygon format: + # one rectangle per bbox, covering the whole box area + masks_polygon = [] + for i in range(bboxes.shape[0]): + x1, y1, x2, y2 = bboxes[i] + polygon = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=np.float32) + masks_polygon.append(polygon) + + print(f" Bboxes shape: {bboxes.shape}") + + # Step 3: Run pose estimation + print("\n[Step 3] Running pose estimation...") + keypoints, presence, 
visibility, heatmaps = pose_model.predict( + image=img, + bboxes=bboxes, + masks=masks_polygon, + return_probmaps=False, + ) + + print(" โœ“ Pose estimation complete") + print(f" Keypoints shape: {keypoints.shape}") + print(f" Presence shape: {presence.shape}") + print(f" Visibility shape: {visibility.shape}") + + # Print some statistics + num_people = keypoints.shape[0] + num_keypoints = keypoints.shape[1] + avg_scores = keypoints[:, :, 2].mean(axis=1) + + print(f"\n Results:") + print(f" - Detected {num_people} people") + print(f" - {num_keypoints} keypoints per person") + for i, score in enumerate(avg_scores): + print(f" - Person {i+1}: avg confidence = {score:.3f}") + + # Step 4: Visualize results + print("\n[Step 4] Visualizing results...") + vis_img = pose_model.visualize( + image=img, + keypoints=keypoints, + bboxes=bboxes, + save_path=None, + ) + + # Save visualization + output_path = output_dir / f"{Path(args.image).stem}_pmpose.jpg" + cv2.imwrite(str(output_path), vis_img) + print(f" โœ“ Visualization saved to: {output_path}") + + # Save keypoints as numpy file + keypoints_path = output_dir / f"{Path(args.image).stem}_keypoints.npy" + np.save( + str(keypoints_path), + { + "keypoints": keypoints, + "presence": presence, + "visibility": visibility, + "bboxes": bboxes, + }, + ) + print(f" โœ“ Keypoints saved to: {keypoints_path}") + + print("\n" + "=" * 60) + print("Demo completed successfully!") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/demos/quickstart.ipynb b/demos/quickstart.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..ff63da120120c17bf5140a086b9459b2711a9506 --- /dev/null +++ b/demos/quickstart.ipynb @@ -0,0 +1,353 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# BBoxMaskPose Quickstart\n", + "\n", + "This notebook demonstrates the public API for both PMPose and BBoxMaskPose.\n", + "\n", + "## Installation Reminder\n", + "\n", + "Before running this notebook, ensure you have installed BBoxMaskPose:\n", + "\n", + "```bash\n", + "# Install dependencies\n", + "pip install -U openmim\n", + "mim install mmengine \"mmcv==2.1.0\" \"mmdet==3.3.0\" \"mmpretrain==1.2.0\"\n", + "pip install -r requirements.txt\n", + "\n", + "# Install in editable mode\n", + "pip install -e .\n", + "\n", + "# Download SAM2 weights\n", + "bash models/SAM/download_ckpts.sh\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 1: PMPose - Pose Estimation with Bounding Boxes\n", + "\n", + "PMPose (currently MaskPose) performs pose estimation given an image and bounding boxes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import cv2\n", + "import numpy as np\n", + "from pathlib import Path\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Import PMPose public API\n", + "from pmpose import PMPose" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize PMPose Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize PMPose with pretrained weights\n", + "pose_model = PMPose(\n", + " device=\"cuda\", # Use 'cpu' if no GPU available\n", + " variant=\"default\", # Default MaskPose-b model\n", + " from_pretrained=True, # Download pretrained weights\n", + ")\n", + "\n", + "print(\"โœ“ PMPose model initialized\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load Image and Define Bounding Boxes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load example image\n", + "image_path = \"demo/data/004806.jpg\"\n", + "img = cv2.imread(image_path)\n", + "img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n", + "\n", + "# Display image\n", + "plt.figure(figsize=(12, 8))\n", + "plt.imshow(img_rgb)\n", + "plt.title(\"Input Image\")\n", + "plt.axis('off')\n", + "plt.show()\n", + "\n", + "# Define bounding boxes for people in the image [x1, y1, x2, y2]\n", + "bboxes = np.array([\n", + " [180, 100, 380, 500], # Person 1\n", + " [350, 150, 550, 500], # Person 2\n", + " [500, 120, 700, 480], # Person 3\n", + "], dtype=np.float32)\n", + "\n", + "print(f\"Image shape: {img.shape}\")\n", + "print(f\"Number of bboxes: {len(bboxes)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run Pose Estimation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Run pose estimation\n", + "keypoints, presence, visibility, heatmaps = pose_model.predict(\n", + " image=img,\n", + " bboxes=bboxes,\n", + " masks=None, # Optional instance masks\n", + " return_probmaps=False, # Set True to get heatmaps\n", + ")\n", + "\n", + "print(f\"Keypoints shape: {keypoints.shape}\") # (N, K, 3) - [x, y, score]\n", + "print(f\"Presence shape: {presence.shape}\") # (N, K) - presence probability\n", + "print(f\"Visibility shape: {visibility.shape}\") # (N, K) - visibility flag\n", + "\n", + "# Print average confidence per person\n", + "for i in range(len(keypoints)):\n", + " avg_conf = keypoints[i, :, 2].mean()\n", + " print(f\"Person {i+1}: average confidence = {avg_conf:.3f}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize Results" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Visualize pose estimation results\n", + "vis_img = pose_model.visualize(\n", + " image=img,\n", + " keypoints=keypoints,\n", + " bboxes=bboxes,\n", + ")\n", + "\n", + "# Display\n", + "vis_img_rgb = cv2.cvtColor(vis_img, cv2.COLOR_BGR2RGB)\n", + "plt.figure(figsize=(12, 8))\n", + "plt.imshow(vis_img_rgb)\n", + "plt.title(\"PMPose Results\")\n", + "plt.axis('off')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 2: BBoxMaskPose - Full Detection + Pose + Segmentation Pipeline\n", + "\n", + "BBoxMaskPose runs the complete pipeline: detection โ†’ pose estimation โ†’ SAM refinement." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import BBoxMaskPose public API\n", + "from bboxmaskpose import BBoxMaskPose" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Option 1: BMP with Internal Pose Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize BBoxMaskPose (creates internal PMPose)\n", + "bmp_model = BBoxMaskPose(\n", + " config=\"BMP_D3\", # BMP configuration\n", + " device=\"cuda\", # Use 'cpu' if no GPU\n", + " pose_model=None, # Let BMP create internal model\n", + ")\n", + "\n", + "print(\"โœ“ BBoxMaskPose initialized with internal pose model\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run Full Pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Run full BMP pipeline\n", + "result = bmp_model.predict(\n", + " image=image_path,\n", + " bboxes=None, # Let detector find bboxes\n", + " return_intermediates=False,\n", + ")\n", + "\n", + "print(f\"Detected {len(result['bboxes'])} people\")\n", + "print(f\"Keypoints shape: {result['keypoints'].shape}\")\n", + "print(f\"Masks shape: {result['masks'].shape}\")\n", + "print(f\"Presence shape: {result['presence'].shape}\")\n", + "print(f\"Visibility shape: {result['visibility'].shape}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize BMP Results" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Visualize complete results\n", + "vis_img = bmp_model.visualize(\n", + " image=image_path,\n", + " result=result,\n", + ")\n", + "\n", + "# Display\n", + "vis_img_rgb = cv2.cvtColor(vis_img, cv2.COLOR_BGR2RGB)\n", + "plt.figure(figsize=(12, 8))\n", + "plt.imshow(vis_img_rgb)\n", + "plt.title(\"BBoxMaskPose Results\")\n", + "plt.axis('off')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Option 2: BMP with External PMPose Model\n", + "\n", + "You can also create a PMPose model separately and inject it into BMP." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create PMPose model first\n", + "my_pose_model = PMPose(\n", + " device=\"cuda\",\n", + " variant=\"default\",\n", + " from_pretrained=True,\n", + ")\n", + "\n", + "# Inject into BBoxMaskPose\n", + "bmp_model_with_external_pose = BBoxMaskPose(\n", + " config=\"BMP_D3\",\n", + " device=\"cuda\",\n", + " pose_model=my_pose_model, # Use our PMPose instance\n", + ")\n", + "\n", + "print(\"โœ“ BBoxMaskPose initialized with external pose model\")\n", + "\n", + "# Run pipeline (same as before)\n", + "result2 = bmp_model_with_external_pose.predict(\n", + " image=image_path,\n", + " bboxes=None,\n", + ")\n", + "\n", + "print(f\"Detected {len(result2['bboxes'])} people\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "This notebook demonstrated:\n", + "\n", + "1. **PMPose API**: Pose estimation with bounding boxes\n", + " - `PMPose()` initialization\n", + " - `predict()` for inference\n", + " - `visualize()` for visualization\n", + "\n", + "2. 
**BBoxMaskPose API**: Full detection + pose + segmentation pipeline\n", + " - Internal pose model creation\n", + " - External pose model injection\n", + " - `predict()` for full pipeline\n", + " - `visualize()` for results\n", + "\n", + "Both APIs provide stable, easy-to-use interfaces while maintaining backward compatibility with the underlying MaskPose model." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/mmpose/__init__.py b/mmpose/__init__.py index dda49513faf22bf632da4f03ce57f175c3d7f853..7a7e44a1f0d5d52db853149d5a8b97bd5475ad68 100644 --- a/mmpose/__init__.py +++ b/mmpose/__init__.py @@ -5,23 +5,22 @@ from mmengine.utils import digit_version from .version import __version__, short_version -mmcv_minimum_version = '2.0.0rc4' -mmcv_maximum_version = '2.3.0' +mmcv_minimum_version = "2.0.0rc4" +mmcv_maximum_version = "2.3.0" mmcv_version = digit_version(mmcv.__version__) -mmengine_minimum_version = '0.6.0' -mmengine_maximum_version = '1.0.0' +mmengine_minimum_version = "0.6.0" +mmengine_maximum_version = "1.0.0" mmengine_version = digit_version(mmengine.__version__) -assert (mmcv_version >= digit_version(mmcv_minimum_version) - and mmcv_version <= digit_version(mmcv_maximum_version)), \ - f'MMCV=={mmcv.__version__} is used but incompatible. ' \ - f'Please install mmcv>={mmcv_minimum_version}, <={mmcv_maximum_version}.' +assert mmcv_version >= digit_version(mmcv_minimum_version) and mmcv_version <= digit_version(mmcv_maximum_version), ( + f"MMCV=={mmcv.__version__} is used but incompatible. " f"Please install mmcv>={mmcv_minimum_version}, <={mmcv_maximum_version}." +) -assert (mmengine_version >= digit_version(mmengine_minimum_version) - and mmengine_version <= digit_version(mmengine_maximum_version)), \ - f'MMEngine=={mmengine.__version__} is used but incompatible. ' \ - f'Please install mmengine>={mmengine_minimum_version}, ' \ - f'<={mmengine_maximum_version}.' +assert mmengine_version >= digit_version(mmengine_minimum_version) and mmengine_version <= digit_version(mmengine_maximum_version), ( + f"MMEngine=={mmengine.__version__} is used but incompatible. " + f"Please install mmengine>={mmengine_minimum_version}, " + f"<={mmengine_maximum_version}." +) -__all__ = ['__version__', 'short_version'] +__all__ = ["__version__", "short_version"] diff --git a/mmpose/apis/__init__.py b/mmpose/apis/__init__.py index 322ee9cf73d9fe7c796d5f47093f4c0a94b623fd..309f9d10aa3cd8de9357b03e11400dcc3da8577e 100644 --- a/mmpose/apis/__init__.py +++ b/mmpose/apis/__init__.py @@ -1,16 +1,23 @@ # Copyright (c) OpenMMLab. All rights reserved. 
-from .inference import (collect_multi_frames, inference_bottomup, - inference_topdown, init_model) -from .inference_3d import (collate_pose_sequence, convert_keypoint_definition, - extract_pose_sequence, inference_pose_lifter_model) +from .inference import collect_multi_frames, inference_bottomup, inference_topdown, init_model +from .inference_3d import collate_pose_sequence, convert_keypoint_definition, extract_pose_sequence, inference_pose_lifter_model from .inference_tracking import _compute_iou, _track_by_iou, _track_by_oks from .inferencers import MMPoseInferencer, Pose2DInferencer from .visualization import visualize __all__ = [ - 'init_model', 'inference_topdown', 'inference_bottomup', - 'collect_multi_frames', 'Pose2DInferencer', 'MMPoseInferencer', - '_track_by_iou', '_track_by_oks', '_compute_iou', - 'inference_pose_lifter_model', 'extract_pose_sequence', - 'convert_keypoint_definition', 'collate_pose_sequence', 'visualize' + "init_model", + "inference_topdown", + "inference_bottomup", + "collect_multi_frames", + "Pose2DInferencer", + "MMPoseInferencer", + "_track_by_iou", + "_track_by_oks", + "_compute_iou", + "inference_pose_lifter_model", + "extract_pose_sequence", + "convert_keypoint_definition", + "collate_pose_sequence", + "visualize", ] diff --git a/mmpose/apis/inference.py b/mmpose/apis/inference.py index e88ea6dfb3fccd2a6d4faf7be424be2556c353f0..087da9a5e16cc60cce3afe34afe35d2029e6bfb9 100644 --- a/mmpose/apis/inference.py +++ b/mmpose/apis/inference.py @@ -3,6 +3,7 @@ import warnings from pathlib import Path from typing import List, Optional, Union +import cv2 import numpy as np import torch import torch.nn as nn @@ -18,10 +19,8 @@ from mmpose.models.builder import build_pose_estimator from mmpose.structures import PoseDataSample from mmpose.structures.bbox import bbox_xywh2xyxy -import cv2 -def dataset_meta_from_config(config: Config, - dataset_mode: str = 'train') -> Optional[dict]: +def dataset_meta_from_config(config: Config, dataset_mode: str = "train") -> Optional[dict]: """Get dataset metainfo from the model config. Args: @@ -37,25 +36,22 @@ def dataset_meta_from_config(config: Config, Return ``None`` if failing to get dataset metainfo from the config. """ try: - if dataset_mode == 'train': + if dataset_mode == "train": dataset_cfg = config.train_dataloader.dataset - elif dataset_mode == 'val': + elif dataset_mode == "val": dataset_cfg = config.val_dataloader.dataset - elif dataset_mode == 'test': + elif dataset_mode == "test": dataset_cfg = config.test_dataloader.dataset else: - raise ValueError( - f'Invalid dataset {dataset_mode} to get metainfo. ' - 'Should be one of "train", "val", or "test".') + raise ValueError(f"Invalid dataset {dataset_mode} to get metainfo. 
" 'Should be one of "train", "val", or "test".') - if 'metainfo' in dataset_cfg: + if "metainfo" in dataset_cfg: metainfo = dataset_cfg.metainfo else: import mmpose.datasets.datasets # noqa: F401, F403 from mmpose.registry import DATASETS - dataset_class = dataset_cfg.type if isinstance( - dataset_cfg.type, type) else DATASETS.get(dataset_cfg.type) + dataset_class = dataset_cfg.type if isinstance(dataset_cfg.type, type) else DATASETS.get(dataset_cfg.type) metainfo = dataset_class.METAINFO metainfo = parse_pose_metainfo(metainfo) @@ -66,10 +62,9 @@ def dataset_meta_from_config(config: Config, return metainfo -def init_model(config: Union[str, Path, Config], - checkpoint: Optional[str] = None, - device: str = 'cuda:0', - cfg_options: Optional[dict] = None) -> nn.Module: +def init_model( + config: Union[str, Path, Config], checkpoint: Optional[str] = None, device: str = "cuda:0", cfg_options: Optional[dict] = None +) -> nn.Module: """Initialize a pose estimator from a config file. Args: @@ -89,16 +84,15 @@ def init_model(config: Union[str, Path, Config], if isinstance(config, (str, Path)): config = Config.fromfile(config) elif not isinstance(config, Config): - raise TypeError('config must be a filename or Config object, ' - f'but got {type(config)}') + raise TypeError("config must be a filename or Config object, " f"but got {type(config)}") if cfg_options is not None: config.merge_from_dict(cfg_options) - elif 'init_cfg' in config.model.backbone: + elif "init_cfg" in config.model.backbone: config.model.backbone.init_cfg = None config.model.train_cfg = None # register all modules in mmpose into the registries - scope = config.get('default_scope', 'mmpose') + scope = config.get("default_scope", "mmpose") if scope is not None: init_default_scope(scope) @@ -108,21 +102,19 @@ def init_model(config: Union[str, Path, Config], dataset_meta = None if checkpoint is not None: - ckpt = load_checkpoint(model, checkpoint, map_location='cpu') + ckpt = load_checkpoint(model, checkpoint, map_location="cpu") - if 'dataset_meta' in ckpt.get('meta', {}): + if "dataset_meta" in ckpt.get("meta", {}): # checkpoint from mmpose 1.x - dataset_meta = ckpt['meta']['dataset_meta'] + dataset_meta = ckpt["meta"]["dataset_meta"] if dataset_meta is None: - dataset_meta = dataset_meta_from_config(config, dataset_mode='train') + dataset_meta = dataset_meta_from_config(config, dataset_mode="train") if dataset_meta is None: - warnings.simplefilter('once') - warnings.warn('Can not load dataset_meta from the checkpoint or the ' - 'model config. Use COCO metainfo by default.') - dataset_meta = parse_pose_metainfo( - dict(from_file='configs/_base_/datasets/coco.py')) + warnings.simplefilter("once") + warnings.warn("Can not load dataset_meta from the checkpoint or the " "model config. Use COCO metainfo by default.") + dataset_meta = parse_pose_metainfo(dict(from_file="configs/_base_/datasets/coco.py")) model.dataset_meta = dataset_meta @@ -132,11 +124,13 @@ def init_model(config: Union[str, Path, Config], return model -def inference_topdown(model: nn.Module, - img: Union[np.ndarray, str], - bboxes: Optional[Union[List, np.ndarray]] = None, - masks: Optional[Union[List, np.ndarray]] = None, - bbox_format: str = 'xyxy') -> List[PoseDataSample]: +def inference_topdown( + model: nn.Module, + img: Union[np.ndarray, str], + bboxes: Optional[Union[List, np.ndarray]] = None, + masks: Optional[Union[List, np.ndarray]] = None, + bbox_format: str = "xyxy", +) -> List[PoseDataSample]: """Inference image with a top-down pose estimator. 
Args: @@ -154,7 +148,7 @@ def inference_topdown(model: nn.Module, ``data_sample.pred_instances.keypoints`` and ``data_sample.pred_instances.keypoint_scores``. """ - scope = model.cfg.get('default_scope', 'mmpose') + scope = model.cfg.get("default_scope", "mmpose") if scope is not None: init_default_scope(scope) pipeline = Compose(model.cfg.test_dataloader.dataset.pipeline) @@ -171,23 +165,21 @@ def inference_topdown(model: nn.Module, if isinstance(bboxes, list): bboxes = np.array(bboxes) - assert bbox_format in {'xyxy', 'xywh'}, \ - f'Invalid bbox_format "{bbox_format}".' + assert bbox_format in {"xyxy", "xywh"}, f'Invalid bbox_format "{bbox_format}".' - if bbox_format == 'xywh': + if bbox_format == "xywh": bboxes = bbox_xywh2xyxy(bboxes) if masks is None or len(masks) == 0: - masks = np.zeros((bboxes.shape[0], img.shape[0], img.shape[1]), - dtype=np.uint8) - + masks = np.zeros((bboxes.shape[0], img.shape[0], img.shape[1]), dtype=np.uint8) + # Masks are expected in polygon format poly_masks = [] for mask in masks: if np.sum(mask) == 0: poly_masks.append(None) else: - contours, _ = cv2.findContours((mask*255).astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) + contours, _ = cv2.findContours((mask * 255).astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) polygons = [contour.flatten() for contour in contours if len(contour) > 3] poly_masks.append(polygons if polygons else None) @@ -198,9 +190,9 @@ def inference_topdown(model: nn.Module, data_info = dict(img_path=img) else: data_info = dict(img=img) - data_info['bbox'] = bbox[None] # shape (1, 4) - data_info['segmentation'] = pmask - data_info['bbox_score'] = np.ones(1, dtype=np.float32) # shape (1,) + data_info["bbox"] = bbox[None] # shape (1, 4) + data_info["segmentation"] = pmask + data_info["bbox_score"] = np.ones(1, dtype=np.float32) # shape (1,) data_info.update(model.dataset_meta) data_list.append(pipeline(data_info)) diff --git a/mmpose/apis/inference_3d.py b/mmpose/apis/inference_3d.py index b4151e804a593da7cb5355ece804924ccbd7f0b0..ff1b8a35e3e0363d55faba662d8c0e9bbe75b56c 100644 --- a/mmpose/apis/inference_3d.py +++ b/mmpose/apis/inference_3d.py @@ -8,8 +8,7 @@ from mmengine.structures import InstanceData from mmpose.structures import PoseDataSample -def convert_keypoint_definition(keypoints, pose_det_dataset, - pose_lift_dataset): +def convert_keypoint_definition(keypoints, pose_det_dataset, pose_lift_dataset): """Convert pose det dataset keypoints definition to pose lifter dataset keypoints definition, so that they are compatible with the definitions required for 3D pose lifting. @@ -22,69 +21,56 @@ def convert_keypoint_definition(keypoints, pose_det_dataset, Returns: ndarray[K, 2 or 3]: the transformed 2D keypoints. """ - assert pose_lift_dataset in [ - 'h36m', 'h3wb'], '`pose_lift_dataset` should be ' \ - f'`h36m`, but got {pose_lift_dataset}.' - - keypoints_new = np.zeros((keypoints.shape[0], 17, keypoints.shape[2]), - dtype=keypoints.dtype) - if pose_lift_dataset in ['h36m', 'h3wb']: - if pose_det_dataset in ['h36m', 'coco_wholebody']: + assert pose_lift_dataset in ["h36m", "h3wb"], "`pose_lift_dataset` should be " f"`h36m`, but got {pose_lift_dataset}." 
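+ + # Worked example (coco -> h36m): COCO defines l_hip=11 and r_hip=12, so the h36m pelvis (index 0) computed below is (keypoints[:, 11] + keypoints[:, 12]) / 2; the remaining joints follow from analogous midpoints and reindexing.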
+ + keypoints_new = np.zeros((keypoints.shape[0], 17, keypoints.shape[2]), dtype=keypoints.dtype) + if pose_lift_dataset in ["h36m", "h3wb"]: + if pose_det_dataset in ["h36m", "coco_wholebody"]: keypoints_new = keypoints - elif pose_det_dataset in ['coco', 'posetrack18']: + elif pose_det_dataset in ["coco", "posetrack18"]: # pelvis (root) is in the middle of l_hip and r_hip keypoints_new[:, 0] = (keypoints[:, 11] + keypoints[:, 12]) / 2 # thorax is in the middle of l_shoulder and r_shoulder keypoints_new[:, 8] = (keypoints[:, 5] + keypoints[:, 6]) / 2 # spine is in the middle of thorax and pelvis - keypoints_new[:, - 7] = (keypoints_new[:, 0] + keypoints_new[:, 8]) / 2 + keypoints_new[:, 7] = (keypoints_new[:, 0] + keypoints_new[:, 8]) / 2 # in COCO, head is in the middle of l_eye and r_eye # in PoseTrack18, head is in the middle of head_bottom and head_top keypoints_new[:, 10] = (keypoints[:, 1] + keypoints[:, 2]) / 2 # rearrange other keypoints - keypoints_new[:, [1, 2, 3, 4, 5, 6, 9, 11, 12, 13, 14, 15, 16]] = \ - keypoints[:, [12, 14, 16, 11, 13, 15, 0, 5, 7, 9, 6, 8, 10]] - elif pose_det_dataset in ['aic']: + keypoints_new[:, [1, 2, 3, 4, 5, 6, 9, 11, 12, 13, 14, 15, 16]] = keypoints[:, [12, 14, 16, 11, 13, 15, 0, 5, 7, 9, 6, 8, 10]] + elif pose_det_dataset in ["aic"]: # pelvis (root) is in the middle of l_hip and r_hip keypoints_new[:, 0] = (keypoints[:, 9] + keypoints[:, 6]) / 2 # thorax is in the middle of l_shoulder and r_shoulder keypoints_new[:, 8] = (keypoints[:, 3] + keypoints[:, 0]) / 2 # spine is in the middle of thorax and pelvis - keypoints_new[:, - 7] = (keypoints_new[:, 0] + keypoints_new[:, 8]) / 2 + keypoints_new[:, 7] = (keypoints_new[:, 0] + keypoints_new[:, 8]) / 2 # neck base (top end of neck) is 1/4 the way from # neck (bottom end of neck) to head top keypoints_new[:, 9] = (3 * keypoints[:, 13] + keypoints[:, 12]) / 4 # head (spherical centre of head) is 7/12 the way from # neck (bottom end of neck) to head top - keypoints_new[:, 10] = (5 * keypoints[:, 13] + - 7 * keypoints[:, 12]) / 12 + keypoints_new[:, 10] = (5 * keypoints[:, 13] + 7 * keypoints[:, 12]) / 12 - keypoints_new[:, [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]] = \ - keypoints[:, [6, 7, 8, 9, 10, 11, 3, 4, 5, 0, 1, 2]] - elif pose_det_dataset in ['crowdpose']: + keypoints_new[:, [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]] = keypoints[:, [6, 7, 8, 9, 10, 11, 3, 4, 5, 0, 1, 2]] + elif pose_det_dataset in ["crowdpose"]: # pelvis (root) is in the middle of l_hip and r_hip keypoints_new[:, 0] = (keypoints[:, 6] + keypoints[:, 7]) / 2 # thorax is in the middle of l_shoulder and r_shoulder keypoints_new[:, 8] = (keypoints[:, 0] + keypoints[:, 1]) / 2 # spine is in the middle of thorax and pelvis - keypoints_new[:, - 7] = (keypoints_new[:, 0] + keypoints_new[:, 8]) / 2 + keypoints_new[:, 7] = (keypoints_new[:, 0] + keypoints_new[:, 8]) / 2 # neck base (top end of neck) is 1/4 the way from # neck (bottom end of neck) to head top keypoints_new[:, 9] = (3 * keypoints[:, 13] + keypoints[:, 12]) / 4 # head (spherical centre of head) is 7/12 the way from # neck (bottom end of neck) to head top - keypoints_new[:, 10] = (5 * keypoints[:, 13] + - 7 * keypoints[:, 12]) / 12 + keypoints_new[:, 10] = (5 * keypoints[:, 13] + 7 * keypoints[:, 12]) / 12 - keypoints_new[:, [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]] = \ - keypoints[:, [7, 9, 11, 6, 8, 10, 0, 2, 4, 1, 3, 5]] + keypoints_new[:, [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]] = keypoints[:, [7, 9, 11, 6, 8, 10, 0, 2, 4, 1, 3, 5]] else: - raise NotImplementedError( 
- f'unsupported conversion between {pose_lift_dataset} and ' - f'{pose_det_dataset}') + raise NotImplementedError(f"unsupported conversion between {pose_lift_dataset} and " f"{pose_det_dataset}") return keypoints_new @@ -119,16 +105,12 @@ def extract_pose_sequence(pose_results, frame_idx, causal, seq_len, step=1): pad_left = max(0, frames_left - frame_idx // step) pad_right = max(0, frames_right - (num_frames - 1 - frame_idx) // step) start = max(frame_idx % step, frame_idx - frames_left * step) - end = min(num_frames - (num_frames - 1 - frame_idx) % step, - frame_idx + frames_right * step + 1) - pose_results_seq = [pose_results[0]] * pad_left + \ - pose_results[start:end:step] + [pose_results[-1]] * pad_right + end = min(num_frames - (num_frames - 1 - frame_idx) % step, frame_idx + frames_right * step + 1) + pose_results_seq = [pose_results[0]] * pad_left + pose_results[start:end:step] + [pose_results[-1]] * pad_right return pose_results_seq -def collate_pose_sequence(pose_results_2d, - with_track_id=True, - target_frame=-1): +def collate_pose_sequence(pose_results_2d, with_track_id=True, target_frame=-1): """Reorganize multi-frame pose detection results into individual pose sequences. @@ -164,8 +146,7 @@ def collate_pose_sequence(pose_results_2d, target_frame = (T + target_frame) % T # convert negative index to positive - N = len( - pose_results_2d[target_frame]) # use identities in the target frame + N = len(pose_results_2d[target_frame]) # use identities in the target frame if N == 0: return [] @@ -181,21 +162,15 @@ def collate_pose_sequence(pose_results_2d, pred_instances = InstanceData() gt_instances = pose_results_2d[target_frame][idx].gt_instances.clone() - pred_instances = pose_results_2d[target_frame][ - idx].pred_instances.clone() + pred_instances = pose_results_2d[target_frame][idx].pred_instances.clone() pose_seq.pred_instances = pred_instances pose_seq.gt_instances = gt_instances if not with_track_id: - pose_seq.pred_instances.keypoints = np.stack([ - frame[idx].pred_instances.keypoints - for frame in pose_results_2d - ], - axis=1) + pose_seq.pred_instances.keypoints = np.stack([frame[idx].pred_instances.keypoints for frame in pose_results_2d], axis=1) else: keypoints = np.zeros((B, T, K, C), dtype=np.float32) - keypoints[:, target_frame] = pose_results_2d[target_frame][ - idx].pred_instances.keypoints + keypoints[:, target_frame] = pose_results_2d[target_frame][idx].pred_instances.keypoints # find the left most frame containing track_ids[idx] for frame_idx in range(target_frame - 1, -1, -1): contains_idx = False @@ -206,7 +181,7 @@ def collate_pose_sequence(pose_results_2d, break if not contains_idx: # replicate the left most frame - keypoints[:, :frame_idx + 1] = keypoints[:, frame_idx + 1] + keypoints[:, : frame_idx + 1] = keypoints[:, frame_idx + 1] break # find the right most frame containing track_idx[idx] for frame_idx in range(target_frame + 1, T): @@ -218,19 +193,15 @@ def collate_pose_sequence(pose_results_2d, break if not contains_idx: # replicate the right most frame - keypoints[:, frame_idx + 1:] = keypoints[:, frame_idx] + keypoints[:, frame_idx + 1 :] = keypoints[:, frame_idx] break - pose_seq.pred_instances.set_field(keypoints, 'keypoints') + pose_seq.pred_instances.set_field(keypoints, "keypoints") pose_sequences.append(pose_seq) return pose_sequences -def inference_pose_lifter_model(model, - pose_results_2d, - with_track_id=True, - image_size=None, - norm_pose_2d=False): +def inference_pose_lifter_model(model, pose_results_2d, with_track_id=True, 
image_size=None, norm_pose_2d=False): """Inference 3D pose from 2D pose sequences using a pose lifter model. Args: @@ -253,17 +224,17 @@ def inference_pose_lifter_model(model, the predicted keypoints and scores are saved at ``data_sample.pred_instances.keypoints_3d``. """ - init_default_scope(model.cfg.get('default_scope', 'mmpose')) + init_default_scope(model.cfg.get("default_scope", "mmpose")) pipeline = Compose(model.cfg.test_dataloader.dataset.pipeline) - causal = model.cfg.test_dataloader.dataset.get('causal', False) + causal = model.cfg.test_dataloader.dataset.get("causal", False) target_idx = -1 if causal else len(pose_results_2d) // 2 dataset_info = model.dataset_meta if dataset_info is not None: - if 'stats_info' in dataset_info: - bbox_center = dataset_info['stats_info']['bbox_center'] - bbox_scale = dataset_info['stats_info']['bbox_scale'] + if "stats_info" in dataset_info: + bbox_center = dataset_info["stats_info"]["bbox_center"] + bbox_scale = dataset_info["stats_info"]["bbox_scale"] else: if norm_pose_2d: # compute the average bbox center and scale from the @@ -274,11 +245,8 @@ def inference_pose_lifter_model(model, for pose_res in pose_results_2d: for data_sample in pose_res: for bbox in data_sample.pred_instances.bboxes: - bbox_center += np.array([[(bbox[0] + bbox[2]) / 2, - (bbox[1] + bbox[3]) / 2] - ]) - bbox_scale += max(bbox[2] - bbox[0], - bbox[3] - bbox[1]) + bbox_center += np.array([[(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2]]) + bbox_scale += max(bbox[2] - bbox[0], bbox[3] - bbox[1]) num_bbox += 1 bbox_center /= num_bbox bbox_scale /= num_bbox @@ -292,8 +260,7 @@ def inference_pose_lifter_model(model, for j, data_sample in enumerate(pose_res): data_sample_copy = PoseDataSample() data_sample_copy.gt_instances = data_sample.gt_instances.clone() - data_sample_copy.pred_instances = data_sample.pred_instances.clone( - ) + data_sample_copy.pred_instances = data_sample.pred_instances.clone() data_sample_copy.track_id = data_sample.track_id kpts = data_sample.pred_instances.keypoints bboxes = data_sample.pred_instances.bboxes @@ -302,20 +269,16 @@ def inference_pose_lifter_model(model, kpt = kpts[k] if norm_pose_2d: bbox = bboxes[k] - center = np.array([[(bbox[0] + bbox[2]) / 2, - (bbox[1] + bbox[3]) / 2]]) + center = np.array([[(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2]]) scale = max(bbox[2] - bbox[0], bbox[3] - bbox[1]) - keypoints.append((kpt[:, :2] - center) / scale * - bbox_scale + bbox_center) + keypoints.append((kpt[:, :2] - center) / scale * bbox_scale + bbox_center) else: keypoints.append(kpt[:, :2]) - data_sample_copy.pred_instances.set_field( - np.array(keypoints), 'keypoints') + data_sample_copy.pred_instances.set_field(np.array(keypoints), "keypoints") pose_res_copy.append(data_sample_copy) pose_results_2d_copy.append(pose_res_copy) - pose_sequences_2d = collate_pose_sequence(pose_results_2d_copy, - with_track_id, target_idx) + pose_sequences_2d = collate_pose_sequence(pose_results_2d_copy, with_track_id, target_idx) if not pose_sequences_2d: return [] @@ -325,24 +288,25 @@ def inference_pose_lifter_model(model, data_info = dict() keypoints_2d = pose_seq.pred_instances.keypoints - keypoints_2d = np.squeeze( - keypoints_2d, axis=0) if keypoints_2d.ndim == 4 else keypoints_2d + keypoints_2d = np.squeeze(keypoints_2d, axis=0) if keypoints_2d.ndim == 4 else keypoints_2d T, K, C = keypoints_2d.shape - data_info['keypoints'] = keypoints_2d - data_info['keypoints_visible'] = np.ones(( - T, - K, - ), dtype=np.float32) - data_info['lifting_target'] = 
np.zeros((1, K, 3), dtype=np.float32) - data_info['factor'] = np.zeros((T, ), dtype=np.float32) - data_info['lifting_target_visible'] = np.ones((1, K, 1), - dtype=np.float32) + data_info["keypoints"] = keypoints_2d + data_info["keypoints_visible"] = np.ones( + ( + T, + K, + ), + dtype=np.float32, + ) + data_info["lifting_target"] = np.zeros((1, K, 3), dtype=np.float32) + data_info["factor"] = np.zeros((T,), dtype=np.float32) + data_info["lifting_target_visible"] = np.ones((1, K, 1), dtype=np.float32) if image_size is not None: assert len(image_size) == 2 - data_info['camera_param'] = dict(w=image_size[0], h=image_size[1]) + data_info["camera_param"] = dict(w=image_size[0], h=image_size[1]) data_info.update(model.dataset_meta) data_list.append(pipeline(data_info)) diff --git a/mmpose/apis/inference_tracking.py b/mmpose/apis/inference_tracking.py index c823adcfc7107e1e63ba0a62ad48148d7fc354c9..f4f92ac4e37921f59bf23333464af7c97cace586 100644 --- a/mmpose/apis/inference_tracking.py +++ b/mmpose/apis/inference_tracking.py @@ -29,7 +29,7 @@ def _compute_iou(bboxA, bboxB): union_area = float(bboxA_area + bboxB_area - inter_area) if union_area == 0: union_area = 1e-5 - warnings.warn('union_area=0 is unexpected') + warnings.warn("union_area=0 is unexpected") iou = inter_area / union_area @@ -64,9 +64,7 @@ def _track_by_iou(res, results_last, thr): def _track_by_oks(res, results_last, thr, sigmas=None): """Get track id using OKS tracking greedily.""" - keypoint = np.concatenate((res.pred_instances.keypoints, - res.pred_instances.keypoint_scores[:, :, None]), - axis=2) + keypoint = np.concatenate((res.pred_instances.keypoints, res.pred_instances.keypoint_scores[:, :, None]), axis=2) keypoint = np.squeeze(keypoint, axis=0).reshape((-1)) area = np.squeeze(res.pred_instances.areas, axis=0) max_index = -1 @@ -75,21 +73,17 @@ def _track_by_oks(res, results_last, thr, sigmas=None): if len(results_last) == 0: return -1, results_last, match_result - keypoints_last = np.array([ - np.squeeze( - np.concatenate( - (res_last.pred_instances.keypoints, - res_last.pred_instances.keypoint_scores[:, :, None]), - axis=2), - axis=0).reshape((-1)) for res_last in results_last - ]) - area_last = np.array([ - np.squeeze(res_last.pred_instances.areas, axis=0) - for res_last in results_last - ]) - - oks_score = oks_iou( - keypoint, keypoints_last, area, area_last, sigmas=sigmas) + keypoints_last = np.array( + [ + np.squeeze( + np.concatenate((res_last.pred_instances.keypoints, res_last.pred_instances.keypoint_scores[:, :, None]), axis=2), axis=0 + ).reshape((-1)) + for res_last in results_last + ] + ) + area_last = np.array([np.squeeze(res_last.pred_instances.areas, axis=0) for res_last in results_last]) + + oks_score = oks_iou(keypoint, keypoints_last, area, area_last, sigmas=sigmas) max_index = np.argmax(oks_score) diff --git a/mmpose/apis/inferencers/__init__.py b/mmpose/apis/inferencers/__init__.py index 0e2b5c8293f261ef5651f2d379e35c484ae53e40..0114bfd2e15e3d46437504e8bed8eb7ff5b56963 100644 --- a/mmpose/apis/inferencers/__init__.py +++ b/mmpose/apis/inferencers/__init__.py @@ -5,7 +5,4 @@ from .pose2d_inferencer import Pose2DInferencer from .pose3d_inferencer import Pose3DInferencer from .utils import get_model_aliases -__all__ = [ - 'Pose2DInferencer', 'MMPoseInferencer', 'get_model_aliases', - 'Pose3DInferencer', 'Hand3DInferencer' -] +__all__ = ["Pose2DInferencer", "MMPoseInferencer", "get_model_aliases", "Pose3DInferencer", "Hand3DInferencer"] diff --git a/mmpose/apis/inferencers/base_mmpose_inferencer.py 
b/mmpose/apis/inferencers/base_mmpose_inferencer.py index 574063e824198bb535d3737df472300a78229c3f..404124697c5a23cbf0f458cb390809e99af50fdb 100644 --- a/mmpose/apis/inferencers/base_mmpose_inferencer.py +++ b/mmpose/apis/inferencers/base_mmpose_inferencer.py @@ -4,8 +4,7 @@ import logging import mimetypes import os from collections import defaultdict -from typing import (Callable, Dict, Generator, Iterable, List, Optional, - Sequence, Tuple, Union) +from typing import Callable, Dict, Generator, Iterable, List, Optional, Sequence, Tuple, Union import cv2 import mmcv @@ -14,8 +13,7 @@ import numpy as np import torch.nn as nn from mmengine.config import Config, ConfigDict from mmengine.dataset import Compose -from mmengine.fileio import (get_file_backend, isdir, join_path, - list_dir_or_file) +from mmengine.fileio import get_file_backend, isdir, join_path, list_dir_or_file from mmengine.infer.infer import BaseInferencer, ModelType from mmengine.logging import print_log from mmengine.registry import init_default_scope @@ -27,10 +25,12 @@ from rich.progress import track from mmpose.apis.inference import dataset_meta_from_config from mmpose.registry import DATASETS from mmpose.structures import PoseDataSample, split_instances + from .utils import default_det_models try: from mmdet.apis.det_inferencer import DetInferencer + has_mmdet = True except (ImportError, ModuleNotFoundError): has_mmdet = False @@ -47,22 +47,30 @@ ResType = Union[Dict, List[Dict], InstanceData, List[InstanceData]] class BaseMMPoseInferencer(BaseInferencer): """The base class for MMPose inferencers.""" - preprocess_kwargs: set = {'bbox_thr', 'nms_thr', 'bboxes'} + preprocess_kwargs: set = {"bbox_thr", "nms_thr", "bboxes"} forward_kwargs: set = set() visualize_kwargs: set = { - 'return_vis', 'show', 'wait_time', 'draw_bbox', 'radius', 'thickness', - 'kpt_thr', 'vis_out_dir', 'black_background' + "return_vis", + "show", + "wait_time", + "draw_bbox", + "radius", + "thickness", + "kpt_thr", + "vis_out_dir", + "black_background", } - postprocess_kwargs: set = {'pred_out_dir', 'return_datasample'} + postprocess_kwargs: set = {"pred_out_dir", "return_datasample"} - def __init__(self, - model: Union[ModelType, str, None] = None, - weights: Optional[str] = None, - device: Optional[str] = None, - scope: Optional[str] = None, - show_progress: bool = False) -> None: - super().__init__( - model, weights, device, scope, show_progress=show_progress) + def __init__( + self, + model: Union[ModelType, str, None] = None, + weights: Optional[str] = None, + device: Optional[str] = None, + scope: Optional[str] = None, + show_progress: bool = False, + ) -> None: + super().__init__(model, weights, device, scope, show_progress=show_progress) def _init_detector( self, @@ -71,20 +79,16 @@ class BaseMMPoseInferencer(BaseInferencer): det_cat_ids: Optional[Union[int, Tuple]] = None, device: Optional[str] = None, ): - object_type = DATASETS.get(self.cfg.dataset_type).__module__.split( - 'datasets.')[-1].split('.')[0].lower() + object_type = DATASETS.get(self.cfg.dataset_type).__module__.split("datasets.")[-1].split(".")[0].lower() - if det_model in ('whole_image', 'whole-image') or \ - (det_model is None and - object_type not in default_det_models): + if det_model in ("whole_image", "whole-image") or (det_model is None and object_type not in default_det_models): self.detector = None else: - det_scope = 'mmdet' + det_scope = "mmdet" if det_model is None: det_info = default_det_models[object_type] - det_model, det_weights, det_cat_ids = det_info[ - 
'model'], det_info['weights'], det_info['cat_ids'] + det_model, det_weights, det_cat_ids = det_info["model"], det_info["weights"], det_info["cat_ids"] elif os.path.exists(det_model): det_cfg = Config.fromfile(det_model) det_scope = det_cfg.default_scope @@ -97,24 +101,19 @@ class BaseMMPoseInferencer(BaseInferencer): scope=det_scope, ) # for compatibility with low version of mmdet - if 'show_progress' in inspect.signature( - DetInferencer).parameters: - det_kwargs['show_progress'] = False + if "show_progress" in inspect.signature(DetInferencer).parameters: + det_kwargs["show_progress"] = False self.detector = DetInferencer(**det_kwargs) else: - raise RuntimeError( - 'MMDetection (v3.0.0 or above) is required to build ' - 'inferencers for top-down pose estimation models.') + raise RuntimeError("MMDetection (v3.0.0 or above) is required to build " "inferencers for top-down pose estimation models.") if isinstance(det_cat_ids, (tuple, list)): self.det_cat_ids = det_cat_ids else: - self.det_cat_ids = (det_cat_ids, ) + self.det_cat_ids = (det_cat_ids,) - def _load_weights_to_model(self, model: nn.Module, - checkpoint: Optional[dict], - cfg: Optional[ConfigType]) -> None: + def _load_weights_to_model(self, model: nn.Module, checkpoint: Optional[dict], cfg: Optional[ConfigType]) -> None: """Loading model weights and meta information from cfg and checkpoint. Subclasses could override this method to load extra meta information @@ -127,28 +126,23 @@ class BaseMMPoseInferencer(BaseInferencer): """ if checkpoint is not None: _load_checkpoint_to_model(model, checkpoint) - checkpoint_meta = checkpoint.get('meta', {}) + checkpoint_meta = checkpoint.get("meta", {}) # save the dataset_meta in the model for convenience - if 'dataset_meta' in checkpoint_meta: + if "dataset_meta" in checkpoint_meta: # mmpose 1.x - model.dataset_meta = checkpoint_meta['dataset_meta'] + model.dataset_meta = checkpoint_meta["dataset_meta"] else: print_log( - 'dataset_meta are not saved in the checkpoint\'s ' - 'meta data, load via config.', - logger='current', - level=logging.WARNING) - model.dataset_meta = dataset_meta_from_config( - cfg, dataset_mode='train') + "dataset_meta are not saved in the checkpoint's " "meta data, load via config.", logger="current", level=logging.WARNING + ) + model.dataset_meta = dataset_meta_from_config(cfg, dataset_mode="train") else: print_log( - 'Checkpoint is not loaded, and the inference ' - 'result is calculated by the randomly initialized ' - 'model!', - logger='current', - level=logging.WARNING) - model.dataset_meta = dataset_meta_from_config( - cfg, dataset_mode='train') + "Checkpoint is not loaded, and the inference " "result is calculated by the randomly initialized " "model!", + logger="current", + level=logging.WARNING, + ) + model.dataset_meta = dataset_meta_from_config(cfg, dataset_mode="train") def _inputs_to_list(self, inputs: InputsType) -> Iterable: """Preprocess the inputs to a list. 
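# --- Editor's note: a minimal, runnable sketch (not part of the patch) of the
# MIME-type dispatch used by `_inputs_to_list` in the hunk below: string inputs
# are classified as images, videos, or folders before inference. The file
# names here are illustrative assumptions.
import mimetypes

for path in ("photo.jpg", "clip.mp4"):
    input_type = mimetypes.guess_type(path)[0].split("/")[0]
    print(path, "->", input_type)  # "image" for photo.jpg, "video" for clip.mp4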
@@ -172,42 +166,34 @@ class BaseMMPoseInferencer(BaseInferencer): if isinstance(inputs, str): backend = get_file_backend(inputs) - if hasattr(backend, 'isdir') and isdir(inputs): + if hasattr(backend, "isdir") and isdir(inputs): # Backends like HttpsBackend do not implement `isdir`, so only # those backends that implement `isdir` could accept the # inputs as a directory - filepath_list = [ - join_path(inputs, fname) - for fname in list_dir_or_file(inputs, list_dir=False) - ] + filepath_list = [join_path(inputs, fname) for fname in list_dir_or_file(inputs, list_dir=False)] inputs = [] for filepath in filepath_list: - input_type = mimetypes.guess_type(filepath)[0].split( - '/')[0] - if input_type == 'image': + input_type = mimetypes.guess_type(filepath)[0].split("/")[0] + if input_type == "image": inputs.append(filepath) inputs.sort() else: # if inputs is a path to a video file, it will be converted # to a list containing separated frame filenames - input_type = mimetypes.guess_type(inputs)[0].split('/')[0] - if input_type == 'video': + input_type = mimetypes.guess_type(inputs)[0].split("/")[0] + if input_type == "video": self._video_input = True video = mmcv.VideoReader(inputs) self.video_info = dict( - fps=video.fps, - name=os.path.basename(inputs), - writer=None, - width=video.width, - height=video.height, - predictions=[]) + fps=video.fps, name=os.path.basename(inputs), writer=None, width=video.width, height=video.height, predictions=[] + ) inputs = video - elif input_type == 'image': + elif input_type == "image": inputs = [inputs] else: - raise ValueError(f'Expected input to be an image, video, ' - f'or folder, but received {inputs} of ' - f'type {input_type}.') + raise ValueError( + f"Expected input to be an image, video, " f"or folder, but received {inputs} of " f"type {input_type}." + ) elif isinstance(inputs, np.ndarray): inputs = [inputs] @@ -232,32 +218,26 @@ class BaseMMPoseInferencer(BaseInferencer): # Ensure the inputs string is in the expected format. inputs = inputs.lower() - assert inputs.startswith('webcam'), f'Expected input to start with ' \ - f'"webcam", but got "{inputs}"' + assert inputs.startswith("webcam"), f"Expected input to start with " f'"webcam", but got "{inputs}"' # Parse the camera ID from the inputs string. - inputs_ = inputs.split(':') + inputs_ = inputs.split(":") if len(inputs_) == 1: camera_id = 0 elif len(inputs_) == 2 and str.isdigit(inputs_[1]): camera_id = int(inputs_[1]) else: - raise ValueError( - f'Expected webcam input to have format "webcam:id", ' - f'but got "{inputs}"') + raise ValueError(f'Expected webcam input to have format "webcam:id", ' f'but got "{inputs}"') # Attempt to open the video capture object. vcap = cv2.VideoCapture(camera_id) if not vcap.isOpened(): - print_log( - f'Cannot open camera (ID={camera_id})', - logger='current', - level=logging.WARNING) + print_log(f"Cannot open camera (ID={camera_id})", logger="current", level=logging.WARNING) return [] # Set video input flag and metadata. 
self._video_input = True - (major_ver, minor_ver, subminor_ver) = (cv2.__version__).split('.') + major_ver, minor_ver, subminor_ver = (cv2.__version__).split(".") if int(major_ver) < 3: fps = vcap.get(cv2.cv.CV_CAP_PROP_FPS) width = vcap.get(cv2.cv.CV_CAP_PROP_FRAME_WIDTH) @@ -266,13 +246,7 @@ class BaseMMPoseInferencer(BaseInferencer): fps = vcap.get(cv2.CAP_PROP_FPS) width = vcap.get(cv2.CAP_PROP_FRAME_WIDTH) height = vcap.get(cv2.CAP_PROP_FRAME_HEIGHT) - self.video_info = dict( - fps=fps, - name='webcam.mp4', - writer=None, - width=width, - height=height, - predictions=[]) + self.video_info = dict(fps=fps, name="webcam.mp4", writer=None, width=width, height=height, predictions=[]) def _webcam_reader() -> Generator: while True: @@ -299,7 +273,7 @@ class BaseMMPoseInferencer(BaseInferencer): ``np.ndarray``. The returned pipeline will be used to process a single data. """ - scope = cfg.get('default_scope', 'mmpose') + scope = cfg.get("default_scope", "mmpose") if scope is not None: init_default_scope(scope) return Compose(cfg.test_dataloader.dataset.pipeline) @@ -310,13 +284,9 @@ class BaseMMPoseInferencer(BaseInferencer): pass - def preprocess(self, - inputs: InputsType, - batch_size: int = 1, - bboxes: Optional[List] = None, - bbox_thr: float = 0.3, - nms_thr: float = 0.3, - **kwargs): + def preprocess( + self, inputs: InputsType, batch_size: int = 1, bboxes: Optional[List] = None, bbox_thr: float = 0.3, nms_thr: float = 0.3, **kwargs + ): """Process the inputs into a model-feedable format. Args: @@ -334,25 +304,19 @@ class BaseMMPoseInferencer(BaseInferencer): # One-stage pose estimators perform prediction filtering within the # head's `predict` method. Here, we set the arguments for filtering - if self.cfg.model.type == 'BottomupPoseEstimator': + if self.cfg.model.type == "BottomupPoseEstimator": # 1. init with default arguments test_cfg = self.model.head.test_cfg.copy() # 2. update the score_thr and nms_thr in the test_cfg of the head - if 'score_thr' in test_cfg: - test_cfg['score_thr'] = bbox_thr - if 'nms_thr' in test_cfg: - test_cfg['nms_thr'] = nms_thr + if "score_thr" in test_cfg: + test_cfg["score_thr"] = bbox_thr + if "nms_thr" in test_cfg: + test_cfg["nms_thr"] = nms_thr self.model.test_cfg = test_cfg for i, input in enumerate(inputs): bbox = bboxes[i] if bboxes else [] - data_infos = self.preprocess_single( - input, - index=i, - bboxes=bbox, - bbox_thr=bbox_thr, - nms_thr=nms_thr, - **kwargs) + data_infos = self.preprocess_single(input, index=i, bboxes=bbox, bbox_thr=bbox_thr, nms_thr=nms_thr, **kwargs) # only supports inference with batch size 1 yield self.collate_fn(data_infos), [input] @@ -384,10 +348,10 @@ class BaseMMPoseInferencer(BaseInferencer): dict: Inference and visualization results. 
""" if out_dir is not None: - if 'vis_out_dir' not in kwargs: - kwargs['vis_out_dir'] = f'{out_dir}/visualizations' - if 'pred_out_dir' not in kwargs: - kwargs['pred_out_dir'] = f'{out_dir}/predictions' + if "vis_out_dir" not in kwargs: + kwargs["vis_out_dir"] = f"{out_dir}/visualizations" + if "pred_out_dir" not in kwargs: + kwargs["pred_out_dir"] = f"{out_dir}/predictions" ( preprocess_kwargs, @@ -399,73 +363,67 @@ class BaseMMPoseInferencer(BaseInferencer): self.update_model_visualizer_settings(**kwargs) # preprocessing - if isinstance(inputs, str) and inputs.startswith('webcam'): + if isinstance(inputs, str) and inputs.startswith("webcam"): inputs = self._get_webcam_inputs(inputs) batch_size = 1 - if not visualize_kwargs.get('show', False): + if not visualize_kwargs.get("show", False): print_log( - 'The display mode is closed when using webcam ' - 'input. It will be turned on automatically.', - logger='current', - level=logging.WARNING) - visualize_kwargs['show'] = True + "The display mode is closed when using webcam " "input. It will be turned on automatically.", + logger="current", + level=logging.WARNING, + ) + visualize_kwargs["show"] = True else: inputs = self._inputs_to_list(inputs) # check the compatibility between inputs/outputs if not self._video_input and len(inputs) > 0: - vis_out_dir = visualize_kwargs.get('vis_out_dir', None) + vis_out_dir = visualize_kwargs.get("vis_out_dir", None) if vis_out_dir is not None: _, file_extension = os.path.splitext(vis_out_dir) - assert not file_extension, f'the argument `vis_out_dir` ' \ - f'should be a folder while the input contains multiple ' \ - f'images, but got {vis_out_dir}' + assert not file_extension, ( + f"the argument `vis_out_dir` " f"should be a folder while the input contains multiple " f"images, but got {vis_out_dir}" + ) - if 'bbox_thr' in self.forward_kwargs: - forward_kwargs['bbox_thr'] = preprocess_kwargs.get('bbox_thr', -1) - inputs = self.preprocess( - inputs, batch_size=batch_size, **preprocess_kwargs) + if "bbox_thr" in self.forward_kwargs: + forward_kwargs["bbox_thr"] = preprocess_kwargs.get("bbox_thr", -1) + inputs = self.preprocess(inputs, batch_size=batch_size, **preprocess_kwargs) preds = [] - for proc_inputs, ori_inputs in (track(inputs, description='Inference') - if self.show_progress else inputs): + for proc_inputs, ori_inputs in track(inputs, description="Inference") if self.show_progress else inputs: preds = self.forward(proc_inputs, **forward_kwargs) - visualization = self.visualize(ori_inputs, preds, - **visualize_kwargs) - results = self.postprocess( - preds, - visualization, - return_datasamples=return_datasamples, - **postprocess_kwargs) + visualization = self.visualize(ori_inputs, preds, **visualize_kwargs) + results = self.postprocess(preds, visualization, return_datasamples=return_datasamples, **postprocess_kwargs) yield results if self._video_input: - self._finalize_video_processing( - postprocess_kwargs.get('pred_out_dir', '')) + self._finalize_video_processing(postprocess_kwargs.get("pred_out_dir", "")) # In 3D Inferencers, some intermediate results (e.g. 2d keypoints) # will be temporarily stored in `self._buffer`. It's essential to # clear this information to prevent any interference with subsequent # inferences. 
- if hasattr(self, '_buffer'): + if hasattr(self, "_buffer"): self._buffer.clear() - def visualize(self, - inputs: list, - preds: List[PoseDataSample], - return_vis: bool = False, - show: bool = False, - draw_bbox: bool = False, - wait_time: float = 0, - radius: int = 3, - thickness: int = 1, - kpt_thr: float = 0.3, - vis_out_dir: str = '', - window_name: str = '', - black_background: bool = False, - **kwargs) -> List[np.ndarray]: + def visualize( + self, + inputs: list, + preds: List[PoseDataSample], + return_vis: bool = False, + show: bool = False, + draw_bbox: bool = False, + wait_time: float = 0, + radius: int = 3, + thickness: int = 1, + kpt_thr: float = 0.3, + vis_out_dir: str = "", + window_name: str = "", + black_background: bool = False, + **kwargs, + ) -> List[np.ndarray]: """Visualize predictions. Args: @@ -494,9 +452,8 @@ class BaseMMPoseInferencer(BaseInferencer): if (not return_vis) and (not show) and (not vis_out_dir): return - if getattr(self, 'visualizer', None) is None: - raise ValueError('Visualization needs the "visualizer" term' - 'defined in the config, but got None.') + if getattr(self, "visualizer", None) is None: + raise ValueError('Visualization needs the "visualizer" term' "defined in the config, but got None.") self.visualizer.radius = radius self.visualizer.line_width = thickness @@ -505,16 +462,15 @@ class BaseMMPoseInferencer(BaseInferencer): for single_input, pred in zip(inputs, preds): if isinstance(single_input, str): - img = mmcv.imread(single_input, channel_order='rgb') + img = mmcv.imread(single_input, channel_order="rgb") elif isinstance(single_input, np.ndarray): img = mmcv.bgr2rgb(single_input) else: - raise ValueError('Unsupported input type: ' - f'{type(single_input)}') + raise ValueError("Unsupported input type: " f"{type(single_input)}") if black_background: img = img * 0 - img_name = os.path.basename(pred.metainfo['img_path']) + img_name = os.path.basename(pred.metainfo["img_path"]) window_name = window_name if window_name else img_name # since visualization and inference utilize the same process, @@ -523,15 +479,8 @@ class BaseMMPoseInferencer(BaseInferencer): wait_time = 1e-5 if self._video_input else wait_time visualization = self.visualizer.add_datasample( - window_name, - img, - pred, - draw_gt=False, - draw_bbox=draw_bbox, - show=show, - wait_time=wait_time, - kpt_thr=kpt_thr, - **kwargs) + window_name, img, pred, draw_gt=False, draw_bbox=draw_bbox, show=show, wait_time=wait_time, kpt_thr=kpt_thr, **kwargs + ) results.append(visualization) if vis_out_dir: @@ -559,27 +508,24 @@ class BaseMMPoseInferencer(BaseInferencer): if self._video_input: - if self.video_info['writer'] is None: - fourcc = cv2.VideoWriter_fourcc(*'mp4v') + if self.video_info["writer"] is None: + fourcc = cv2.VideoWriter_fourcc(*"mp4v") if file_name is None: - file_name = os.path.basename(self.video_info['name']) + file_name = os.path.basename(self.video_info["name"]) out_file = join_path(dir_name, file_name) - self.video_info['output_file'] = out_file - self.video_info['writer'] = cv2.VideoWriter( - out_file, fourcc, self.video_info['fps'], - (visualization.shape[1], visualization.shape[0])) - self.video_info['writer'].write(out_img) + self.video_info["output_file"] = out_file + self.video_info["writer"] = cv2.VideoWriter( + out_file, fourcc, self.video_info["fps"], (visualization.shape[1], visualization.shape[0]) + ) + self.video_info["writer"].write(out_img) else: if file_name is None: - file_name = img_name if img_name else 'visualization.jpg' + file_name = img_name 
if img_name else "visualization.jpg" out_file = join_path(dir_name, file_name) mmcv.imwrite(out_img, out_file) - print_log( - f'the output image has been saved at {out_file}', - logger='current', - level=logging.INFO) + print_log(f"the output image has been saved at {out_file}", logger="current", level=logging.INFO) def postprocess( self, @@ -587,7 +533,7 @@ class BaseMMPoseInferencer(BaseInferencer): visualization: List[np.ndarray], return_datasample=None, return_datasamples=False, - pred_out_dir: str = '', + pred_out_dir: str = "", ) -> dict: """Process the predictions and visualization results from ``forward`` and ``visualize``. @@ -620,46 +566,44 @@ class BaseMMPoseInferencer(BaseInferencer): """ if return_datasample is not None: print_log( - 'The `return_datasample` argument is deprecated ' - 'and will be removed in future versions. Please ' - 'use `return_datasamples`.', - logger='current', - level=logging.WARNING) + "The `return_datasample` argument is deprecated " + "and will be removed in future versions. Please " + "use `return_datasamples`.", + logger="current", + level=logging.WARNING, + ) return_datasamples = return_datasample result_dict = defaultdict(list) - result_dict['visualization'] = visualization + result_dict["visualization"] = visualization for pred in preds: if not return_datasamples: # convert datasamples to list of instance predictions pred = split_instances(pred.pred_instances) - result_dict['predictions'].append(pred) + result_dict["predictions"].append(pred) - if pred_out_dir != '': - for pred, data_sample in zip(result_dict['predictions'], preds): + if pred_out_dir != "": + for pred, data_sample in zip(result_dict["predictions"], preds): if self._video_input: # For video or webcam input, predictions for each frame # are gathered in the 'predictions' key of 'video_info' # dictionary. All frame predictions are then stored into # a single file after processing all frames. - self.video_info['predictions'].append(pred) + self.video_info["predictions"].append(pred) else: # For non-video inputs, predictions are stored in separate # JSON files. The filename is determined by the basename # of the input image path with a '.json' extension. The # predictions are then dumped into this file. - fname = os.path.splitext( - os.path.basename( - data_sample.metainfo['img_path']))[0] + '.json' - mmengine.dump( - pred, join_path(pred_out_dir, fname), indent=' ') + fname = os.path.splitext(os.path.basename(data_sample.metainfo["img_path"]))[0] + ".json" + mmengine.dump(pred, join_path(pred_out_dir, fname), indent=" ") return result_dict def _finalize_video_processing( self, - pred_out_dir: str = '', + pred_out_dir: str = "", ): """Finalize video processing by releasing the video writer and saving predictions to a file. 
@@ -670,22 +614,14 @@ class BaseMMPoseInferencer(BaseInferencer): """ # Release the video writer if it exists - if self.video_info['writer'] is not None: - out_file = self.video_info['output_file'] - print_log( - f'the output video has been saved at {out_file}', - logger='current', - level=logging.INFO) - self.video_info['writer'].release() + if self.video_info["writer"] is not None: + out_file = self.video_info["output_file"] + print_log(f"the output video has been saved at {out_file}", logger="current", level=logging.INFO) + self.video_info["writer"].release() # Save predictions if pred_out_dir: - fname = os.path.splitext( - os.path.basename(self.video_info['name']))[0] + '.json' - predictions = [ - dict(frame_id=i, instances=pred) - for i, pred in enumerate(self.video_info['predictions']) - ] - - mmengine.dump( - predictions, join_path(pred_out_dir, fname), indent=' ') + fname = os.path.splitext(os.path.basename(self.video_info["name"]))[0] + ".json" + predictions = [dict(frame_id=i, instances=pred) for i, pred in enumerate(self.video_info["predictions"])] + + mmengine.dump(predictions, join_path(pred_out_dir, fname), indent=" ") diff --git a/mmpose/apis/inferencers/hand3d_inferencer.py b/mmpose/apis/inferencers/hand3d_inferencer.py index a7db53cb84bf0fc8abc0903a9d311da42502f097..0a974390ca5ef3fc83298f56d474a7a90967c289 100644 --- a/mmpose/apis/inferencers/hand3d_inferencer.py +++ b/mmpose/apis/inferencers/hand3d_inferencer.py @@ -17,6 +17,7 @@ from mmengine.structures import InstanceData from mmpose.evaluation.functional import nms from mmpose.registry import INFERENCERS from mmpose.structures import PoseDataSample, merge_data_samples + from .base_mmpose_inferencer import BaseMMPoseInferencer InstanceList = List[InstanceData] @@ -56,38 +57,35 @@ class Hand3DInferencer(BaseMMPoseInferencer): detection model. Defaults to None. 
""" - preprocess_kwargs: set = {'bbox_thr', 'nms_thr', 'bboxes'} - forward_kwargs: set = {'disable_rebase_keypoint'} + preprocess_kwargs: set = {"bbox_thr", "nms_thr", "bboxes"} + forward_kwargs: set = {"disable_rebase_keypoint"} visualize_kwargs: set = { - 'return_vis', - 'show', - 'wait_time', - 'draw_bbox', - 'radius', - 'thickness', - 'kpt_thr', - 'vis_out_dir', - 'num_instances', + "return_vis", + "show", + "wait_time", + "draw_bbox", + "radius", + "thickness", + "kpt_thr", + "vis_out_dir", + "num_instances", } - postprocess_kwargs: set = {'pred_out_dir', 'return_datasample'} - - def __init__(self, - model: Union[ModelType, str], - weights: Optional[str] = None, - device: Optional[str] = None, - scope: Optional[str] = 'mmpose', - det_model: Optional[Union[ModelType, str]] = None, - det_weights: Optional[str] = None, - det_cat_ids: Optional[Union[int, Tuple]] = None, - show_progress: bool = False) -> None: + postprocess_kwargs: set = {"pred_out_dir", "return_datasample"} + + def __init__( + self, + model: Union[ModelType, str], + weights: Optional[str] = None, + device: Optional[str] = None, + scope: Optional[str] = "mmpose", + det_model: Optional[Union[ModelType, str]] = None, + det_weights: Optional[str] = None, + det_cat_ids: Optional[Union[int, Tuple]] = None, + show_progress: bool = False, + ) -> None: init_default_scope(scope) - super().__init__( - model=model, - weights=weights, - device=device, - scope=scope, - show_progress=show_progress) + super().__init__(model=model, weights=weights, device=device, scope=scope, show_progress=show_progress) self.model = revert_sync_batchnorm(self.model) # assign dataset metainfo to self.visualizer @@ -104,13 +102,14 @@ class Hand3DInferencer(BaseMMPoseInferencer): self._video_input = False self._buffer = defaultdict(list) - def preprocess_single(self, - input: InputType, - index: int, - bbox_thr: float = 0.3, - nms_thr: float = 0.3, - bboxes: Union[List[List], List[np.ndarray], - np.ndarray] = []): + def preprocess_single( + self, + input: InputType, + index: int, + bbox_thr: float = 0.3, + nms_thr: float = 0.3, + bboxes: Union[List[List], List[np.ndarray], np.ndarray] = [], + ): """Process a single input into a model-feedable format. Args: @@ -128,42 +127,38 @@ class Hand3DInferencer(BaseMMPoseInferencer): if isinstance(input, str): data_info = dict(img_path=input) else: - data_info = dict(img=input, img_path=f'{index}.jpg'.rjust(10, '0')) + data_info = dict(img=input, img_path=f"{index}.jpg".rjust(10, "0")) data_info.update(self.model.dataset_meta) if self.detector is not None: try: - det_results = self.detector( - input, return_datasamples=True)['predictions'] + det_results = self.detector(input, return_datasamples=True)["predictions"] except ValueError: print_log( - 'Support for mmpose and mmdet versions up to 3.1.0 ' - 'will be discontinued in upcoming releases. To ' - 'ensure ongoing compatibility, please upgrade to ' - 'mmdet version 3.2.0 or later.', - logger='current', - level=logging.WARNING) - det_results = self.detector( - input, return_datasample=True)['predictions'] + "Support for mmpose and mmdet versions up to 3.1.0 " + "will be discontinued in upcoming releases. 
To " + "ensure ongoing compatibility, please upgrade to " + "mmdet version 3.2.0 or later.", + logger="current", + level=logging.WARNING, + ) + det_results = self.detector(input, return_datasample=True)["predictions"] pred_instance = det_results[0].pred_instances.cpu().numpy() - bboxes = np.concatenate( - (pred_instance.bboxes, pred_instance.scores[:, None]), axis=1) + bboxes = np.concatenate((pred_instance.bboxes, pred_instance.scores[:, None]), axis=1) label_mask = np.zeros(len(bboxes), dtype=np.uint8) for cat_id in self.det_cat_ids: - label_mask = np.logical_or(label_mask, - pred_instance.labels == cat_id) + label_mask = np.logical_or(label_mask, pred_instance.labels == cat_id) - bboxes = bboxes[np.logical_and(label_mask, - pred_instance.scores > bbox_thr)] + bboxes = bboxes[np.logical_and(label_mask, pred_instance.scores > bbox_thr)] bboxes = bboxes[nms(bboxes, nms_thr)] data_infos = [] if len(bboxes) > 0: for bbox in bboxes: inst = data_info.copy() - inst['bbox'] = bbox[None, :4] - inst['bbox_score'] = bbox[4:5] + inst["bbox"] = bbox[None, :4] + inst["bbox_score"] = bbox[4:5] data_infos.append(self.pipeline(inst)) else: inst = data_info.copy() @@ -173,16 +168,14 @@ class Hand3DInferencer(BaseMMPoseInferencer): input = mmcv.imread(input) h, w = input.shape[:2] - inst['bbox'] = np.array([[0, 0, w, h]], dtype=np.float32) - inst['bbox_score'] = np.ones(1, dtype=np.float32) + inst["bbox"] = np.array([[0, 0, w, h]], dtype=np.float32) + inst["bbox_score"] = np.ones(1, dtype=np.float32) data_infos.append(self.pipeline(inst)) return data_infos @torch.no_grad() - def forward(self, - inputs: Union[dict, tuple], - disable_rebase_keypoint: bool = False): + def forward(self, inputs: Union[dict, tuple], disable_rebase_keypoint: bool = False): """Performs a forward pass through the model. Args: @@ -220,8 +213,7 @@ class Hand3DInferencer(BaseMMPoseInferencer): if scores.max() > 1: scores /= 255 - res_2d.pred_instances.set_field(keypoints[..., :2].copy(), - 'keypoints') + res_2d.pred_instances.set_field(keypoints[..., :2].copy(), "keypoints") # rotate the keypoint to make z-axis correspondent to height # for better visualization @@ -231,8 +223,7 @@ class Hand3DInferencer(BaseMMPoseInferencer): # rebase height (z-axis) if not disable_rebase_keypoint: valid = scores > 0 - keypoints[..., 2] -= np.min( - keypoints[valid, 2], axis=-1, keepdims=True) + keypoints[..., 2] -= np.min(keypoints[valid, 2], axis=-1, keepdims=True) data_samples[idx].pred_instances.keypoints = keypoints data_samples[idx].pred_instances.keypoint_scores = scores @@ -241,7 +232,7 @@ class Hand3DInferencer(BaseMMPoseInferencer): data_samples = [merge_data_samples(data_samples)] data_samples_2d = merge_data_samples(data_samples_2d) - self._buffer['pose2d_results'] = data_samples_2d + self._buffer["pose2d_results"] = data_samples_2d return data_samples @@ -257,8 +248,8 @@ class Hand3DInferencer(BaseMMPoseInferencer): thickness: int = 1, kpt_thr: float = 0.3, num_instances: int = 1, - vis_out_dir: str = '', - window_name: str = '', + vis_out_dir: str = "", + window_name: str = "", ) -> List[np.ndarray]: """Visualize predictions. 
@@ -287,9 +278,8 @@ class Hand3DInferencer(BaseMMPoseInferencer): if (not return_vis) and (not show) and (not vis_out_dir): return - if getattr(self, 'visualizer', None) is None: - raise ValueError('Visualization needs the "visualizer" term' - 'defined in the config, but got None.') + if getattr(self, "visualizer", None) is None: + raise ValueError('Visualization needs the "visualizer" term' "defined in the config, but got None.") self.visualizer.radius = radius self.visualizer.line_width = thickness @@ -298,13 +288,12 @@ class Hand3DInferencer(BaseMMPoseInferencer): for single_input, pred in zip(inputs, preds): if isinstance(single_input, str): - img = mmcv.imread(single_input, channel_order='rgb') + img = mmcv.imread(single_input, channel_order="rgb") elif isinstance(single_input, np.ndarray): img = mmcv.bgr2rgb(single_input) else: - raise ValueError('Unsupported input type: ' - f'{type(single_input)}') - img_name = os.path.basename(pred.metainfo['img_path']) + raise ValueError("Unsupported input type: " f"{type(single_input)}") + img_name = os.path.basename(pred.metainfo["img_path"]) # since visualization and inference utilize the same process, # the wait time is reduced when a video input is utilized, @@ -318,7 +307,7 @@ class Hand3DInferencer(BaseMMPoseInferencer): window_name, img, data_sample=pred, - det_data_sample=self._buffer['pose2d_results'], + det_data_sample=self._buffer["pose2d_results"], draw_gt=False, draw_bbox=draw_bbox, show=show, @@ -328,7 +317,8 @@ class Hand3DInferencer(BaseMMPoseInferencer): axis_limit=200, axis_elev=15, kpt_thr=kpt_thr, - num_instances=num_instances) + num_instances=num_instances, + ) results.append(visualization) if vis_out_dir: diff --git a/mmpose/apis/inferencers/mmpose_inferencer.py b/mmpose/apis/inferencers/mmpose_inferencer.py index 4ade56cb04cf7a5b18758fd90430ae894d34983f..ff0c0a78ac67b2fbfb914c1ee4641f975d103fda 100644 --- a/mmpose/apis/inferencers/mmpose_inferencer.py +++ b/mmpose/apis/inferencers/mmpose_inferencer.py @@ -56,53 +56,53 @@ class MMPoseInferencer(BaseMMPoseInferencer): config will be used. Default is None. 
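# --- Editor's note: a sketch (not part of the patch) of the dispatch rule in
# MMPoseInferencer.__init__ below: a `pose3d` value containing "hand3d" routes
# to Hand3DInferencer, any other `pose3d` to Pose3DInferencer, and a bare
# `pose2d` to Pose2DInferencer. The model aliases are illustrative assumptions.
from mmpose.apis import MMPoseInferencer

hand = MMPoseInferencer(pose3d="hand3d")   # -> Hand3DInferencer
body = MMPoseInferencer(pose3d="human3d")  # -> Pose3DInferencer
flat = MMPoseInferencer(pose2d="human")    # -> Pose2DInferencer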
""" - preprocess_kwargs: set = { - 'bbox_thr', 'nms_thr', 'bboxes', 'use_oks_tracking', 'tracking_thr', - 'disable_norm_pose_2d' - } - forward_kwargs: set = { - 'merge_results', 'disable_rebase_keypoint', 'pose_based_nms' - } + preprocess_kwargs: set = {"bbox_thr", "nms_thr", "bboxes", "use_oks_tracking", "tracking_thr", "disable_norm_pose_2d"} + forward_kwargs: set = {"merge_results", "disable_rebase_keypoint", "pose_based_nms"} visualize_kwargs: set = { - 'return_vis', 'show', 'wait_time', 'draw_bbox', 'radius', 'thickness', - 'kpt_thr', 'vis_out_dir', 'skeleton_style', 'draw_heatmap', - 'black_background', 'num_instances' + "return_vis", + "show", + "wait_time", + "draw_bbox", + "radius", + "thickness", + "kpt_thr", + "vis_out_dir", + "skeleton_style", + "draw_heatmap", + "black_background", + "num_instances", } - postprocess_kwargs: set = {'pred_out_dir', 'return_datasample'} - - def __init__(self, - pose2d: Optional[str] = None, - pose2d_weights: Optional[str] = None, - pose3d: Optional[str] = None, - pose3d_weights: Optional[str] = None, - device: Optional[str] = None, - scope: str = 'mmpose', - det_model: Optional[Union[ModelType, str]] = None, - det_weights: Optional[str] = None, - det_cat_ids: Optional[Union[int, List]] = None, - show_progress: bool = False) -> None: + postprocess_kwargs: set = {"pred_out_dir", "return_datasample"} + + def __init__( + self, + pose2d: Optional[str] = None, + pose2d_weights: Optional[str] = None, + pose3d: Optional[str] = None, + pose3d_weights: Optional[str] = None, + device: Optional[str] = None, + scope: str = "mmpose", + det_model: Optional[Union[ModelType, str]] = None, + det_weights: Optional[str] = None, + det_cat_ids: Optional[Union[int, List]] = None, + show_progress: bool = False, + ) -> None: self.visualizer = None self.show_progress = show_progress if pose3d is not None: - if 'hand3d' in pose3d: - self.inferencer = Hand3DInferencer(pose3d, pose3d_weights, - device, scope, det_model, - det_weights, det_cat_ids, - show_progress) + if "hand3d" in pose3d: + self.inferencer = Hand3DInferencer( + pose3d, pose3d_weights, device, scope, det_model, det_weights, det_cat_ids, show_progress + ) else: - self.inferencer = Pose3DInferencer(pose3d, pose3d_weights, - pose2d, pose2d_weights, - device, scope, det_model, - det_weights, det_cat_ids, - show_progress) + self.inferencer = Pose3DInferencer( + pose3d, pose3d_weights, pose2d, pose2d_weights, device, scope, det_model, det_weights, det_cat_ids, show_progress + ) elif pose2d is not None: - self.inferencer = Pose2DInferencer(pose2d, pose2d_weights, device, - scope, det_model, det_weights, - det_cat_ids, show_progress) + self.inferencer = Pose2DInferencer(pose2d, pose2d_weights, device, scope, det_model, det_weights, det_cat_ids, show_progress) else: - raise ValueError('Either 2d or 3d pose estimation algorithm ' - 'should be provided.') + raise ValueError("Either 2d or 3d pose estimation algorithm " "should be provided.") def preprocess(self, inputs: InputsType, batch_size: int = 1, **kwargs): """Process the inputs into a model-feedable format. @@ -158,18 +158,21 @@ class MMPoseInferencer(BaseMMPoseInferencer): dict: Inference and visualization results. 
""" if out_dir is not None: - if 'vis_out_dir' not in kwargs: - kwargs['vis_out_dir'] = f'{out_dir}/visualizations' - if 'pred_out_dir' not in kwargs: - kwargs['pred_out_dir'] = f'{out_dir}/predictions' + if "vis_out_dir" not in kwargs: + kwargs["vis_out_dir"] = f"{out_dir}/visualizations" + if "pred_out_dir" not in kwargs: + kwargs["pred_out_dir"] = f"{out_dir}/predictions" kwargs = { key: value for key, value in kwargs.items() - if key in set.union(self.inferencer.preprocess_kwargs, - self.inferencer.forward_kwargs, - self.inferencer.visualize_kwargs, - self.inferencer.postprocess_kwargs) + if key + in set.union( + self.inferencer.preprocess_kwargs, + self.inferencer.forward_kwargs, + self.inferencer.visualize_kwargs, + self.inferencer.postprocess_kwargs, + ) } ( preprocess_kwargs, @@ -181,47 +184,37 @@ class MMPoseInferencer(BaseMMPoseInferencer): self.inferencer.update_model_visualizer_settings(**kwargs) # preprocessing - if isinstance(inputs, str) and inputs.startswith('webcam'): + if isinstance(inputs, str) and inputs.startswith("webcam"): inputs = self.inferencer._get_webcam_inputs(inputs) batch_size = 1 - if not visualize_kwargs.get('show', False): - warnings.warn('The display mode is closed when using webcam ' - 'input. It will be turned on automatically.') - visualize_kwargs['show'] = True + if not visualize_kwargs.get("show", False): + warnings.warn("The display mode is closed when using webcam " "input. It will be turned on automatically.") + visualize_kwargs["show"] = True else: inputs = self.inferencer._inputs_to_list(inputs) self._video_input = self.inferencer._video_input if self._video_input: self.video_info = self.inferencer.video_info - inputs = self.preprocess( - inputs, batch_size=batch_size, **preprocess_kwargs) + inputs = self.preprocess(inputs, batch_size=batch_size, **preprocess_kwargs) # forward - if 'bbox_thr' in self.inferencer.forward_kwargs: - forward_kwargs['bbox_thr'] = preprocess_kwargs.get('bbox_thr', -1) + if "bbox_thr" in self.inferencer.forward_kwargs: + forward_kwargs["bbox_thr"] = preprocess_kwargs.get("bbox_thr", -1) preds = [] - for proc_inputs, ori_inputs in (track(inputs, description='Inference') - if self.show_progress else inputs): + for proc_inputs, ori_inputs in track(inputs, description="Inference") if self.show_progress else inputs: preds = self.forward(proc_inputs, **forward_kwargs) - visualization = self.visualize(ori_inputs, preds, - **visualize_kwargs) - results = self.postprocess( - preds, - visualization, - return_datasamples=return_datasamples, - **postprocess_kwargs) + visualization = self.visualize(ori_inputs, preds, **visualize_kwargs) + results = self.postprocess(preds, visualization, return_datasamples=return_datasamples, **postprocess_kwargs) yield results if self._video_input: - self._finalize_video_processing( - postprocess_kwargs.get('pred_out_dir', '')) + self._finalize_video_processing(postprocess_kwargs.get("pred_out_dir", "")) - def visualize(self, inputs: InputsType, preds: PredType, - **kwargs) -> List[np.ndarray]: + def visualize(self, inputs: InputsType, preds: PredType, **kwargs) -> List[np.ndarray]: """Visualize predictions. Args: @@ -242,9 +235,8 @@ class MMPoseInferencer(BaseMMPoseInferencer): Returns: List[np.ndarray]: Visualization results. 
""" - window_name = '' + window_name = "" if self.inferencer._video_input: - window_name = self.inferencer.video_info['name'] + window_name = self.inferencer.video_info["name"] - return self.inferencer.visualize( - inputs, preds, window_name=window_name, **kwargs) + return self.inferencer.visualize(inputs, preds, window_name=window_name, **kwargs) diff --git a/mmpose/apis/inferencers/pose2d_inferencer.py b/mmpose/apis/inferencers/pose2d_inferencer.py index 8b6a2c3e96f9d537bccab05eed01de4a951377ca..c66f01b959a931465e85783f6e5bacb66a105d4d 100644 --- a/mmpose/apis/inferencers/pose2d_inferencer.py +++ b/mmpose/apis/inferencers/pose2d_inferencer.py @@ -15,6 +15,7 @@ from mmengine.structures import InstanceData from mmpose.evaluation.functional import nearby_joints_nms, nms from mmpose.registry import INFERENCERS from mmpose.structures import merge_data_samples + from .base_mmpose_inferencer import BaseMMPoseInferencer InstanceList = List[InstanceData] @@ -26,7 +27,7 @@ ConfigType = Union[Config, ConfigDict] ResType = Union[Dict, List[Dict], InstanceData, List[InstanceData]] -@INFERENCERS.register_module(name='pose-estimation') +@INFERENCERS.register_module(name="pose-estimation") @INFERENCERS.register_module() class Pose2DInferencer(BaseMMPoseInferencer): """The inferencer for 2D pose estimation. @@ -55,47 +56,44 @@ class Pose2DInferencer(BaseMMPoseInferencer): detection model. Defaults to None. """ - preprocess_kwargs: set = {'bbox_thr', 'nms_thr', 'bboxes'} - forward_kwargs: set = {'merge_results', 'pose_based_nms'} + preprocess_kwargs: set = {"bbox_thr", "nms_thr", "bboxes"} + forward_kwargs: set = {"merge_results", "pose_based_nms"} visualize_kwargs: set = { - 'return_vis', - 'show', - 'wait_time', - 'draw_bbox', - 'radius', - 'thickness', - 'kpt_thr', - 'vis_out_dir', - 'skeleton_style', - 'draw_heatmap', - 'black_background', + "return_vis", + "show", + "wait_time", + "draw_bbox", + "radius", + "thickness", + "kpt_thr", + "vis_out_dir", + "skeleton_style", + "draw_heatmap", + "black_background", } - postprocess_kwargs: set = {'pred_out_dir', 'return_datasample'} - - def __init__(self, - model: Union[ModelType, str], - weights: Optional[str] = None, - device: Optional[str] = None, - scope: Optional[str] = 'mmpose', - det_model: Optional[Union[ModelType, str]] = None, - det_weights: Optional[str] = None, - det_cat_ids: Optional[Union[int, Tuple]] = None, - show_progress: bool = False) -> None: + postprocess_kwargs: set = {"pred_out_dir", "return_datasample"} + + def __init__( + self, + model: Union[ModelType, str], + weights: Optional[str] = None, + device: Optional[str] = None, + scope: Optional[str] = "mmpose", + det_model: Optional[Union[ModelType, str]] = None, + det_weights: Optional[str] = None, + det_cat_ids: Optional[Union[int, Tuple]] = None, + show_progress: bool = False, + ) -> None: init_default_scope(scope) - super().__init__( - model=model, - weights=weights, - device=device, - scope=scope, - show_progress=show_progress) + super().__init__(model=model, weights=weights, device=device, scope=scope, show_progress=show_progress) self.model = revert_sync_batchnorm(self.model) # assign dataset metainfo to self.visualizer self.visualizer.set_dataset_meta(self.model.dataset_meta) # initialize detector for top-down models - if self.cfg.data_mode == 'topdown': + if self.cfg.data_mode == "topdown": self._init_detector( det_model=det_model, det_weights=det_weights, @@ -105,10 +103,7 @@ class Pose2DInferencer(BaseMMPoseInferencer): self._video_input = False - def 
update_model_visualizer_settings(self, - draw_heatmap: bool = False, - skeleton_style: str = 'mmpose', - **kwargs) -> None: + def update_model_visualizer_settings(self, draw_heatmap: bool = False, skeleton_style: str = "mmpose", **kwargs) -> None: """Update the settings of models and visualizer according to inference arguments. @@ -118,23 +113,22 @@ class Pose2DInferencer(BaseMMPoseInferencer): skeleton_style (str, optional): Skeleton style selection. Valid options are 'mmpose' and 'openpose'. Defaults to 'mmpose'. """ - self.model.test_cfg['output_heatmaps'] = draw_heatmap - - if skeleton_style not in ['mmpose', 'openpose']: - raise ValueError('`skeleton_style` must be either \'mmpose\' ' - 'or \'openpose\'') - - if skeleton_style == 'openpose': - self.visualizer.set_dataset_meta(self.model.dataset_meta, - skeleton_style) - - def preprocess_single(self, - input: InputType, - index: int, - bbox_thr: float = 0.3, - nms_thr: float = 0.3, - bboxes: Union[List[List], List[np.ndarray], - np.ndarray] = []): + self.model.test_cfg["output_heatmaps"] = draw_heatmap + + if skeleton_style not in ["mmpose", "openpose"]: + raise ValueError("`skeleton_style` must be either 'mmpose' " "or 'openpose'") + + if skeleton_style == "openpose": + self.visualizer.set_dataset_meta(self.model.dataset_meta, skeleton_style) + + def preprocess_single( + self, + input: InputType, + index: int, + bbox_thr: float = 0.3, + nms_thr: float = 0.3, + bboxes: Union[List[List], List[np.ndarray], np.ndarray] = [], + ): """Process a single input into a model-feedable format. Args: @@ -152,45 +146,40 @@ class Pose2DInferencer(BaseMMPoseInferencer): if isinstance(input, str): data_info = dict(img_path=input) else: - data_info = dict(img=input, img_path=f'{index}.jpg'.rjust(10, '0')) + data_info = dict(img=input, img_path=f"{index}.jpg".rjust(10, "0")) data_info.update(self.model.dataset_meta) - if self.cfg.data_mode == 'topdown': + if self.cfg.data_mode == "topdown": bboxes = [] if self.detector is not None: try: - det_results = self.detector( - input, return_datasamples=True)['predictions'] + det_results = self.detector(input, return_datasamples=True)["predictions"] except ValueError: print_log( - 'Support for mmpose and mmdet versions up to 3.1.0 ' - 'will be discontinued in upcoming releases. To ' - 'ensure ongoing compatibility, please upgrade to ' - 'mmdet version 3.2.0 or later.', - logger='current', - level=logging.WARNING) - det_results = self.detector( - input, return_datasample=True)['predictions'] + "Support for mmpose and mmdet versions up to 3.1.0 " + "will be discontinued in upcoming releases. 
To " + "ensure ongoing compatibility, please upgrade to " + "mmdet version 3.2.0 or later.", + logger="current", + level=logging.WARNING, + ) + det_results = self.detector(input, return_datasample=True)["predictions"] pred_instance = det_results[0].pred_instances.cpu().numpy() - bboxes = np.concatenate( - (pred_instance.bboxes, pred_instance.scores[:, None]), - axis=1) + bboxes = np.concatenate((pred_instance.bboxes, pred_instance.scores[:, None]), axis=1) label_mask = np.zeros(len(bboxes), dtype=np.uint8) for cat_id in self.det_cat_ids: - label_mask = np.logical_or(label_mask, - pred_instance.labels == cat_id) + label_mask = np.logical_or(label_mask, pred_instance.labels == cat_id) - bboxes = bboxes[np.logical_and( - label_mask, pred_instance.scores > bbox_thr)] + bboxes = bboxes[np.logical_and(label_mask, pred_instance.scores > bbox_thr)] bboxes = bboxes[nms(bboxes, nms_thr)] data_infos = [] if len(bboxes) > 0: for bbox in bboxes: inst = data_info.copy() - inst['bbox'] = bbox[None, :4] - inst['bbox_score'] = bbox[4:5] + inst["bbox"] = bbox[None, :4] + inst["bbox_score"] = bbox[4:5] data_infos.append(self.pipeline(inst)) else: inst = data_info.copy() @@ -200,8 +189,8 @@ class Pose2DInferencer(BaseMMPoseInferencer): input = mmcv.imread(input) h, w = input.shape[:2] - inst['bbox'] = np.array([[0, 0, w, h]], dtype=np.float32) - inst['bbox_score'] = np.ones(1, dtype=np.float32) + inst["bbox"] = np.array([[0, 0, w, h]], dtype=np.float32) + inst["bbox_score"] = np.ones(1, dtype=np.float32) data_infos.append(self.pipeline(inst)) else: # bottom-up @@ -210,11 +199,7 @@ class Pose2DInferencer(BaseMMPoseInferencer): return data_infos @torch.no_grad() - def forward(self, - inputs: Union[dict, tuple], - merge_results: bool = True, - bbox_thr: float = -1, - pose_based_nms: bool = False): + def forward(self, inputs: Union[dict, tuple], merge_results: bool = True, bbox_thr: float = -1, pose_based_nms: bool = False): """Performs a forward pass through the model. Args: @@ -232,14 +217,13 @@ class Pose2DInferencer(BaseMMPoseInferencer): A list of data samples with prediction instances. 
""" data_samples = self.model.test_step(inputs) - if self.cfg.data_mode == 'topdown' and merge_results: + if self.cfg.data_mode == "topdown" and merge_results: data_samples = [merge_data_samples(data_samples)] if bbox_thr > 0: for ds in data_samples: - if 'bbox_scores' in ds.pred_instances: - ds.pred_instances = ds.pred_instances[ - ds.pred_instances.bbox_scores > bbox_thr] + if "bbox_scores" in ds.pred_instances: + ds.pred_instances = ds.pred_instances[ds.pred_instances.bbox_scores > bbox_thr] if pose_based_nms: for ds in data_samples: @@ -251,10 +235,7 @@ class Pose2DInferencer(BaseMMPoseInferencer): num_keypoints = kpts.shape[-2] kept_indices = nearby_joints_nms( - [ - dict(keypoints=kpts[i], score=scores[i]) - for i in range(len(kpts)) - ], + [dict(keypoints=kpts[i], score=scores[i]) for i in range(len(kpts))], num_nearby_joints_thr=num_keypoints // 3, ) ds.pred_instances = ds.pred_instances[kept_indices] diff --git a/mmpose/apis/inferencers/pose3d_inferencer.py b/mmpose/apis/inferencers/pose3d_inferencer.py index f372438298c8ac8c4a6aaa9e171b9c799a9450b1..07dcbc3bcba78f2969a242e973113450c94ac850 100644 --- a/mmpose/apis/inferencers/pose3d_inferencer.py +++ b/mmpose/apis/inferencers/pose3d_inferencer.py @@ -13,10 +13,10 @@ from mmengine.model import revert_sync_batchnorm from mmengine.registry import init_default_scope from mmengine.structures import InstanceData -from mmpose.apis import (_track_by_iou, _track_by_oks, collate_pose_sequence, - convert_keypoint_definition, extract_pose_sequence) +from mmpose.apis import _track_by_iou, _track_by_oks, collate_pose_sequence, convert_keypoint_definition, extract_pose_sequence from mmpose.registry import INFERENCERS from mmpose.structures import PoseDataSample, merge_data_samples + from .base_mmpose_inferencer import BaseMMPoseInferencer from .pose2d_inferencer import Pose2DInferencer @@ -29,7 +29,7 @@ ConfigType = Union[Config, ConfigDict] ResType = Union[Dict, List[Dict], InstanceData, List[InstanceData]] -@INFERENCERS.register_module(name='pose-estimation-3d') +@INFERENCERS.register_module(name="pose-estimation-3d") @INFERENCERS.register_module() class Pose3DInferencer(BaseMMPoseInferencer): """The inferencer for 3D pose estimation. @@ -61,43 +61,37 @@ class Pose3DInferencer(BaseMMPoseInferencer): config will be used. Default is None. 
""" - preprocess_kwargs: set = { - 'bbox_thr', 'nms_thr', 'bboxes', 'use_oks_tracking', 'tracking_thr', - 'disable_norm_pose_2d' - } - forward_kwargs: set = {'disable_rebase_keypoint'} + preprocess_kwargs: set = {"bbox_thr", "nms_thr", "bboxes", "use_oks_tracking", "tracking_thr", "disable_norm_pose_2d"} + forward_kwargs: set = {"disable_rebase_keypoint"} visualize_kwargs: set = { - 'return_vis', - 'show', - 'wait_time', - 'draw_bbox', - 'radius', - 'thickness', - 'num_instances', - 'kpt_thr', - 'vis_out_dir', + "return_vis", + "show", + "wait_time", + "draw_bbox", + "radius", + "thickness", + "num_instances", + "kpt_thr", + "vis_out_dir", } - postprocess_kwargs: set = {'pred_out_dir', 'return_datasample'} - - def __init__(self, - model: Union[ModelType, str], - weights: Optional[str] = None, - pose2d_model: Optional[Union[ModelType, str]] = None, - pose2d_weights: Optional[str] = None, - device: Optional[str] = None, - scope: Optional[str] = 'mmpose', - det_model: Optional[Union[ModelType, str]] = None, - det_weights: Optional[str] = None, - det_cat_ids: Optional[Union[int, Tuple]] = None, - show_progress: bool = False) -> None: + postprocess_kwargs: set = {"pred_out_dir", "return_datasample"} + + def __init__( + self, + model: Union[ModelType, str], + weights: Optional[str] = None, + pose2d_model: Optional[Union[ModelType, str]] = None, + pose2d_weights: Optional[str] = None, + device: Optional[str] = None, + scope: Optional[str] = "mmpose", + det_model: Optional[Union[ModelType, str]] = None, + det_weights: Optional[str] = None, + det_cat_ids: Optional[Union[int, Tuple]] = None, + show_progress: bool = False, + ) -> None: init_default_scope(scope) - super().__init__( - model=model, - weights=weights, - device=device, - scope=scope, - show_progress=show_progress) + super().__init__(model=model, weights=weights, device=device, scope=scope, show_progress=show_progress) self.model = revert_sync_batchnorm(self.model) # assign dataset metainfo to self.visualizer @@ -105,36 +99,37 @@ class Pose3DInferencer(BaseMMPoseInferencer): # initialize 2d pose estimator self.pose2d_model = Pose2DInferencer( - pose2d_model if pose2d_model else 'human', pose2d_weights, device, - scope, det_model, det_weights, det_cat_ids) + pose2d_model if pose2d_model else "human", pose2d_weights, device, scope, det_model, det_weights, det_cat_ids + ) # helper functions self._keypoint_converter = partial( convert_keypoint_definition, - pose_det_dataset=self.pose2d_model.model. 
- dataset_meta['dataset_name'], - pose_lift_dataset=self.model.dataset_meta['dataset_name'], + pose_det_dataset=self.pose2d_model.model.dataset_meta["dataset_name"], + pose_lift_dataset=self.model.dataset_meta["dataset_name"], ) self._pose_seq_extractor = partial( extract_pose_sequence, - causal=self.cfg.test_dataloader.dataset.get('causal', False), - seq_len=self.cfg.test_dataloader.dataset.get('seq_len', 1), - step=self.cfg.test_dataloader.dataset.get('seq_step', 1)) + causal=self.cfg.test_dataloader.dataset.get("causal", False), + seq_len=self.cfg.test_dataloader.dataset.get("seq_len", 1), + step=self.cfg.test_dataloader.dataset.get("seq_step", 1), + ) self._video_input = False self._buffer = defaultdict(list) - def preprocess_single(self, - input: InputType, - index: int, - bbox_thr: float = 0.3, - nms_thr: float = 0.3, - bboxes: Union[List[List], List[np.ndarray], - np.ndarray] = [], - use_oks_tracking: bool = False, - tracking_thr: float = 0.3, - disable_norm_pose_2d: bool = False): + def preprocess_single( + self, + input: InputType, + index: int, + bbox_thr: float = 0.3, + nms_thr: float = 0.3, + bboxes: Union[List[List], List[np.ndarray], np.ndarray] = [], + use_oks_tracking: bool = False, + tracking_thr: float = 0.3, + disable_norm_pose_2d: bool = False, + ): """Process a single input into a model-feedable format. Args: @@ -165,30 +160,23 @@ class Pose3DInferencer(BaseMMPoseInferencer): # calculate 2d keypoints results_pose2d = next( - self.pose2d_model( - input, - bbox_thr=bbox_thr, - nms_thr=nms_thr, - bboxes=bboxes, - merge_results=False, - return_datasamples=True))['predictions'] + self.pose2d_model(input, bbox_thr=bbox_thr, nms_thr=nms_thr, bboxes=bboxes, merge_results=False, return_datasamples=True) + )["predictions"] for ds in results_pose2d: - ds.pred_instances.set_field( - (ds.pred_instances.bboxes[..., 2:] - - ds.pred_instances.bboxes[..., :2]).prod(-1), 'areas') + ds.pred_instances.set_field((ds.pred_instances.bboxes[..., 2:] - ds.pred_instances.bboxes[..., :2]).prod(-1), "areas") if not self._video_input: - height, width = results_pose2d[0].metainfo['ori_shape'] + height, width = results_pose2d[0].metainfo["ori_shape"] # Clear the buffer if inputs are individual images to prevent # carryover effects from previous images self._buffer.clear() else: - height = self.video_info['height'] - width = self.video_info['width'] - img_path = results_pose2d[0].metainfo['img_path'] + height = self.video_info["height"] + width = self.video_info["width"] + img_path = results_pose2d[0].metainfo["img_path"] # instance matching if use_oks_tracking: @@ -197,42 +185,38 @@ class Pose3DInferencer(BaseMMPoseInferencer): _track = _track_by_iou for result in results_pose2d: - track_id, self._buffer['results_pose2d_last'], _ = _track( - result, self._buffer['results_pose2d_last'], tracking_thr) + track_id, self._buffer["results_pose2d_last"], _ = _track(result, self._buffer["results_pose2d_last"], tracking_thr) if track_id == -1: pred_instances = result.pred_instances.cpu().numpy() keypoints = pred_instances.keypoints if np.count_nonzero(keypoints[:, :, 1]) >= 3: - next_id = self._buffer.get('next_id', 0) - result.set_field(next_id, 'track_id') - self._buffer['next_id'] = next_id + 1 + next_id = self._buffer.get("next_id", 0) + result.set_field(next_id, "track_id") + self._buffer["next_id"] = next_id + 1 else: # If the number of keypoints detected is small, # delete that person instance. 
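# --- Editor's note: a toy re-implementation (not part of the patch) of the
# id-assignment policy in this tracking loop: matched instances keep their
# track id, unmatched ones need at least 3 visible keypoints to receive a
# fresh id from the buffer counter, otherwise they are dropped with id -1.
def assign_track_id(matched_id, num_visible_kpts, buffer):
    if matched_id != -1:
        return matched_id
    if num_visible_kpts >= 3:  # enough keypoints: start a new track
        new_id = buffer.get("next_id", 0)
        buffer["next_id"] = new_id + 1
        return new_id
    return -1  # too few keypoints: instance is discarded

buf = {}
print(assign_track_id(-1, 5, buf), assign_track_id(7, 5, buf), assign_track_id(-1, 1, buf))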
result.pred_instances.keypoints[..., 1] = -10 result.pred_instances.bboxes *= 0 - result.set_field(-1, 'track_id') + result.set_field(-1, "track_id") else: - result.set_field(track_id, 'track_id') - self._buffer['pose2d_results'] = merge_data_samples(results_pose2d) + result.set_field(track_id, "track_id") + self._buffer["pose2d_results"] = merge_data_samples(results_pose2d) # convert keypoints results_pose2d_converted = [ds.cpu().numpy() for ds in results_pose2d] for ds in results_pose2d_converted: - ds.pred_instances.keypoints = self._keypoint_converter( - ds.pred_instances.keypoints) - self._buffer['pose_est_results_list'].append(results_pose2d_converted) + ds.pred_instances.keypoints = self._keypoint_converter(ds.pred_instances.keypoints) + self._buffer["pose_est_results_list"].append(results_pose2d_converted) # extract and pad input pose2d sequence - pose_results_2d = self._pose_seq_extractor( - self._buffer['pose_est_results_list'], - frame_idx=index if self._video_input else 0) - causal = self.cfg.test_dataloader.dataset.get('causal', False) + pose_results_2d = self._pose_seq_extractor(self._buffer["pose_est_results_list"], frame_idx=index if self._video_input else 0) + causal = self.cfg.test_dataloader.dataset.get("causal", False) target_idx = -1 if causal else len(pose_results_2d) // 2 - stats_info = self.model.dataset_meta.get('stats_info', {}) - bbox_center = stats_info.get('bbox_center', None) - bbox_scale = stats_info.get('bbox_scale', None) + stats_info = self.model.dataset_meta.get("stats_info", {}) + bbox_center = stats_info.get("bbox_center", None) + bbox_scale = stats_info.get("bbox_scale", None) pose_results_2d_copy = [] for pose_res in pose_results_2d: @@ -240,10 +224,8 @@ class Pose3DInferencer(BaseMMPoseInferencer): for data_sample in pose_res: data_sample_copy = PoseDataSample() - data_sample_copy.gt_instances = \ - data_sample.gt_instances.clone() - data_sample_copy.pred_instances = \ - data_sample.pred_instances.clone() + data_sample_copy.gt_instances = data_sample.gt_instances.clone() + data_sample_copy.pred_instances = data_sample.pred_instances.clone() data_sample_copy.track_id = data_sample.track_id kpts = data_sample.pred_instances.keypoints @@ -253,20 +235,16 @@ class Pose3DInferencer(BaseMMPoseInferencer): kpt = kpts[k] if not disable_norm_pose_2d: bbox = bboxes[k] - center = np.array([[(bbox[0] + bbox[2]) / 2, - (bbox[1] + bbox[3]) / 2]]) + center = np.array([[(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2]]) scale = max(bbox[2] - bbox[0], bbox[3] - bbox[1]) - keypoints.append((kpt[:, :2] - center) / scale * - bbox_scale + bbox_center) + keypoints.append((kpt[:, :2] - center) / scale * bbox_scale + bbox_center) else: keypoints.append(kpt[:, :2]) - data_sample_copy.pred_instances.set_field( - np.array(keypoints), 'keypoints') + data_sample_copy.pred_instances.set_field(np.array(keypoints), "keypoints") pose_res_copy.append(data_sample_copy) pose_results_2d_copy.append(pose_res_copy) - pose_sequences_2d = collate_pose_sequence(pose_results_2d_copy, True, - target_idx) + pose_sequences_2d = collate_pose_sequence(pose_results_2d_copy, True, target_idx) if not pose_sequences_2d: return [] @@ -275,36 +253,32 @@ class Pose3DInferencer(BaseMMPoseInferencer): data_info = dict() keypoints_2d = pose_seq.pred_instances.keypoints - keypoints_2d = np.squeeze( - keypoints_2d, - axis=0) if keypoints_2d.ndim == 4 else keypoints_2d + keypoints_2d = np.squeeze(keypoints_2d, axis=0) if keypoints_2d.ndim == 4 else keypoints_2d T, K, C = keypoints_2d.shape - 
data_info['keypoints'] = keypoints_2d - data_info['keypoints_visible'] = np.ones(( - T, - K, - ), - dtype=np.float32) - data_info['lifting_target'] = np.zeros((1, K, 3), dtype=np.float32) - data_info['factor'] = np.zeros((T, ), dtype=np.float32) - data_info['lifting_target_visible'] = np.ones((1, K, 1), - dtype=np.float32) - data_info['camera_param'] = dict(w=width, h=height) + data_info["keypoints"] = keypoints_2d + data_info["keypoints_visible"] = np.ones( + ( + T, + K, + ), + dtype=np.float32, + ) + data_info["lifting_target"] = np.zeros((1, K, 3), dtype=np.float32) + data_info["factor"] = np.zeros((T,), dtype=np.float32) + data_info["lifting_target_visible"] = np.ones((1, K, 1), dtype=np.float32) + data_info["camera_param"] = dict(w=width, h=height) data_info.update(self.model.dataset_meta) data_info = self.pipeline(data_info) - data_info['data_samples'].set_field( - img_path, 'img_path', field_type='metainfo') + data_info["data_samples"].set_field(img_path, "img_path", field_type="metainfo") data_list.append(data_info) return data_list @torch.no_grad() - def forward(self, - inputs: Union[dict, tuple], - disable_rebase_keypoint: bool = False): + def forward(self, inputs: Union[dict, tuple], disable_rebase_keypoint: bool = False): """Perform forward pass through the model and process the results. Args: @@ -319,18 +293,16 @@ class Pose3DInferencer(BaseMMPoseInferencer): pose_lift_results = self.model.test_step(inputs) # Post-processing of pose estimation results - pose_est_results_converted = self._buffer['pose_est_results_list'][-1] + pose_est_results_converted = self._buffer["pose_est_results_list"][-1] for idx, pose_lift_res in enumerate(pose_lift_results): # Update track_id from the pose estimation results - pose_lift_res.track_id = pose_est_results_converted[idx].get( - 'track_id', 1e4) + pose_lift_res.track_id = pose_est_results_converted[idx].get("track_id", 1e4) # align the shape of output keypoints coordinates and scores keypoints = pose_lift_res.pred_instances.keypoints keypoint_scores = pose_lift_res.pred_instances.keypoint_scores if keypoint_scores.ndim == 3: - pose_lift_results[idx].pred_instances.keypoint_scores = \ - np.squeeze(keypoint_scores, axis=1) + pose_lift_results[idx].pred_instances.keypoint_scores = np.squeeze(keypoint_scores, axis=1) if keypoints.ndim == 4: keypoints = np.squeeze(keypoints, axis=1) @@ -341,32 +313,31 @@ class Pose3DInferencer(BaseMMPoseInferencer): # If rebase_keypoint_height is True, adjust z-axis values if not disable_rebase_keypoint: - keypoints[..., 2] -= np.min( - keypoints[..., 2], axis=-1, keepdims=True) + keypoints[..., 2] -= np.min(keypoints[..., 2], axis=-1, keepdims=True) pose_lift_results[idx].pred_instances.keypoints = keypoints - pose_lift_results = sorted( - pose_lift_results, key=lambda x: x.get('track_id', 1e4)) + pose_lift_results = sorted(pose_lift_results, key=lambda x: x.get("track_id", 1e4)) data_samples = [merge_data_samples(pose_lift_results)] return data_samples - def visualize(self, - inputs: list, - preds: List[PoseDataSample], - return_vis: bool = False, - show: bool = False, - draw_bbox: bool = False, - wait_time: float = 0, - radius: int = 3, - thickness: int = 1, - kpt_thr: float = 0.3, - num_instances: int = 1, - vis_out_dir: str = '', - window_name: str = '', - window_close_event_handler: Optional[Callable] = None - ) -> List[np.ndarray]: + def visualize( + self, + inputs: list, + preds: List[PoseDataSample], + return_vis: bool = False, + show: bool = False, + draw_bbox: bool = False, + wait_time: float = 0, + 
radius: int = 3,
+        thickness: int = 1,
+        kpt_thr: float = 0.3,
+        num_instances: int = 1,
+        vis_out_dir: str = "",
+        window_name: str = "",
+        window_close_event_handler: Optional[Callable] = None,
+    ) -> List[np.ndarray]:
         """Visualize predictions.
 
         Args:
@@ -394,9 +365,8 @@
         if (not return_vis) and (not show) and (not vis_out_dir):
             return
 
-        if getattr(self, 'visualizer', None) is None:
-            raise ValueError('Visualization needs the "visualizer" term'
-                             'defined in the config, but got None.')
+        if getattr(self, "visualizer", None) is None:
+            raise ValueError('Visualization needs the "visualizer" term defined in the config, but got None.')
 
         self.visualizer.radius = radius
         self.visualizer.line_width = thickness
@@ -411,12 +381,11 @@
 
         for single_input, pred in zip(inputs, preds):
             if isinstance(single_input, str):
-                img = mmcv.imread(single_input, channel_order='rgb')
+                img = mmcv.imread(single_input, channel_order="rgb")
             elif isinstance(single_input, np.ndarray):
                 img = mmcv.bgr2rgb(single_input)
             else:
-                raise ValueError('Unsupported input type: '
-                                 f'{type(single_input)}')
+                raise ValueError(f"Unsupported input type: {type(single_input)}")
 
         # since visualization and inference utilize the same process,
         # the wait time is reduced when a video input is utilized,
@@ -430,21 +399,20 @@
                 window_name,
                 img,
                 data_sample=pred,
-                det_data_sample=self._buffer['pose2d_results'],
+                det_data_sample=self._buffer["pose2d_results"],
                 draw_gt=False,
                 draw_bbox=draw_bbox,
                 show=show,
                 wait_time=wait_time,
-                dataset_2d=self.pose2d_model.model.
-                dataset_meta['dataset_name'],
-                dataset_3d=self.model.dataset_meta['dataset_name'],
+                dataset_2d=self.pose2d_model.model.dataset_meta["dataset_name"],
+                dataset_3d=self.model.dataset_meta["dataset_name"],
                 kpt_thr=kpt_thr,
-                num_instances=num_instances)
+                num_instances=num_instances,
+            )
             results.append(visualization)
 
             if vis_out_dir:
-                img_name = os.path.basename(pred.metainfo['img_path']) \
-                    if 'img_path' in pred.metainfo else None
+                img_name = os.path.basename(pred.metainfo["img_path"]) if "img_path" in pred.metainfo else None
                 self.save_visualization(
                     visualization,
                     vis_out_dir,
diff --git a/mmpose/apis/inferencers/utils/__init__.py b/mmpose/apis/inferencers/utils/__init__.py
index 5cc40535b0d42a3b2ff41e97e26dcc30c440622b..b690197616b7aafd22d1970c44f4a66369ae9d9e 100644
--- a/mmpose/apis/inferencers/utils/__init__.py
+++ b/mmpose/apis/inferencers/utils/__init__.py
@@ -2,4 +2,4 @@
 from .default_det_models import default_det_models
 from .get_model_alias import get_model_aliases
 
-__all__ = ['default_det_models', 'get_model_aliases']
+__all__ = ["default_det_models", "get_model_aliases"]
diff --git a/mmpose/apis/inferencers/utils/default_det_models.py b/mmpose/apis/inferencers/utils/default_det_models.py
index a2deca961b00c75fe05d09b30b05394c175acf5b..6656774440246db890399d6ef1e890db2b655f7f 100644
--- a/mmpose/apis/inferencers/utils/default_det_models.py
+++ b/mmpose/apis/inferencers/utils/default_det_models.py
@@ -4,33 +4,26 @@
 import os.path as osp
 
 from mmengine.config.utils import MODULE2PACKAGE
 from mmengine.utils import get_installed_path
 
-mmpose_path = get_installed_path(MODULE2PACKAGE['mmpose'])
+mmpose_path = get_installed_path(MODULE2PACKAGE["mmpose"])
 
 default_det_models = dict(
     human=dict(
-        model=osp.join(
-            mmpose_path, '.mim', 'demo/mmdetection_cfg/'
-            'rtmdet_m_640-8xb32_coco-person.py'),
weights='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth', - cat_ids=(0, )), + model=osp.join(mmpose_path, ".mim", "demo/mmdetection_cfg/" "rtmdet_m_640-8xb32_coco-person.py"), + weights="https://download.openmmlab.com/mmpose/v1/projects/" "rtmposev1/rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth", + cat_ids=(0,), + ), face=dict( - model=osp.join(mmpose_path, '.mim', - 'demo/mmdetection_cfg/yolox-s_8xb8-300e_coco-face.py'), - weights='https://download.openmmlab.com/mmpose/mmdet_pretrained/' - 'yolo-x_8xb8-300e_coco-face_13274d7c.pth', - cat_ids=(0, )), + model=osp.join(mmpose_path, ".mim", "demo/mmdetection_cfg/yolox-s_8xb8-300e_coco-face.py"), + weights="https://download.openmmlab.com/mmpose/mmdet_pretrained/" "yolo-x_8xb8-300e_coco-face_13274d7c.pth", + cat_ids=(0,), + ), hand=dict( - model=osp.join(mmpose_path, '.mim', 'demo/mmdetection_cfg/' - 'rtmdet_nano_320-8xb32_hand.py'), - weights='https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/' - 'rtmdet_nano_8xb32-300e_hand-267f9c8f.pth', - cat_ids=(0, )), - animal=dict( - model='rtmdet-m', - weights=None, - cat_ids=(15, 16, 17, 18, 19, 20, 21, 22, 23)), + model=osp.join(mmpose_path, ".mim", "demo/mmdetection_cfg/" "rtmdet_nano_320-8xb32_hand.py"), + weights="https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/" "rtmdet_nano_8xb32-300e_hand-267f9c8f.pth", + cat_ids=(0,), + ), + animal=dict(model="rtmdet-m", weights=None, cat_ids=(15, 16, 17, 18, 19, 20, 21, 22, 23)), ) -default_det_models['body'] = default_det_models['human'] -default_det_models['wholebody'] = default_det_models['human'] +default_det_models["body"] = default_det_models["human"] +default_det_models["wholebody"] = default_det_models["human"] diff --git a/mmpose/apis/inferencers/utils/get_model_alias.py b/mmpose/apis/inferencers/utils/get_model_alias.py index 49de6528d6ea0df58cf7ae987176defbd4953739..e058c83163df72987681aa446ac4ffc1856f05ca 100644 --- a/mmpose/apis/inferencers/utils/get_model_alias.py +++ b/mmpose/apis/inferencers/utils/get_model_alias.py @@ -4,7 +4,7 @@ from typing import Dict from mmengine.infer import BaseInferencer -def get_model_aliases(scope: str = 'mmpose') -> Dict[str, str]: +def get_model_aliases(scope: str = "mmpose") -> Dict[str, str]: """Retrieve model aliases and their corresponding configuration names. Args: @@ -22,16 +22,17 @@ def get_model_aliases(scope: str = 'mmpose') -> Dict[str, str]: model_alias_dict = dict() for model_cfg in model_cfgs: - if 'Alias' in model_cfg: - if isinstance(model_cfg['Alias'], str): - model_alias_dict[model_cfg['Alias']] = model_cfg['Name'] - elif isinstance(model_cfg['Alias'], list): - for alias in model_cfg['Alias']: - model_alias_dict[alias] = model_cfg['Name'] + if "Alias" in model_cfg: + if isinstance(model_cfg["Alias"], str): + model_alias_dict[model_cfg["Alias"]] = model_cfg["Name"] + elif isinstance(model_cfg["Alias"], list): + for alias in model_cfg["Alias"]: + model_alias_dict[alias] = model_cfg["Name"] else: raise ValueError( - 'encounter an unexpected alias type. Please raise an ' - 'issue at https://github.com/open-mmlab/mmpose/issues ' - 'to announce us') + "encounter an unexpected alias type. 
Please raise an issue at https://github.com/open-mmlab/mmpose/issues to notify us"
+            )
 
     return model_alias_dict
diff --git a/mmpose/apis/visualization.py b/mmpose/apis/visualization.py
index ffc951ea427c363285b4b0daa5e48bab7716a5a0..abef904aaf18e1c821007d9883b67de12a9d7dcf 100644
--- a/mmpose/apis/visualization.py
+++ b/mmpose/apis/visualization.py
@@ -68,7 +68,7 @@ def visualize(
     metainfo: Union[str, dict] = None,
     visualizer: PoseLocalVisualizer = None,
     show_kpt_idx: bool = False,
-    skeleton_style: str = 'mmpose',
+    skeleton_style: str = "mmpose",
     show: bool = False,
     kpt_thr: float = 0.3,
 ):
@@ -87,9 +87,7 @@
         wait_time (int): Value of waitKey param.
         kpt_thr (float): Keypoint threshold.
     """
-    assert skeleton_style in [
-        'mmpose', 'openpose'
-    ], (f'Only support skeleton style in {["mmpose", "openpose"]}, ')
+    assert skeleton_style in ["mmpose", "openpose"], f'Only support skeleton style in {["mmpose", "openpose"]}'
 
     if visualizer is None:
         visualizer = PoseLocalVisualizer()
@@ -105,7 +103,7 @@
     visualizer.set_dataset_meta(metainfo, skeleton_style=skeleton_style)
 
     if isinstance(img, str):
-        img = mmcv.imread(img, channel_order='rgb')
+        img = mmcv.imread(img, channel_order="rgb")
     elif isinstance(img, np.ndarray):
         img = mmcv.bgr2rgb(img)
 
@@ -120,13 +118,14 @@
     tmp_datasample.pred_instances = tmp_instances
 
     visualizer.add_datasample(
-        'visualization',
+        "visualization",
         img,
         tmp_datasample,
         show_kpt_idx=show_kpt_idx,
         skeleton_style=skeleton_style,
         show=show,
         wait_time=0,
-        kpt_thr=kpt_thr)
+        kpt_thr=kpt_thr,
+    )
 
     return visualizer.get_image()
diff --git a/mmpose/codecs/__init__.py b/mmpose/codecs/__init__.py
index 31bc874a13d5db1d8d42093359940bda24db814f..d56f237c88b47b487e4c222000d8a5bcc389117e 100644
--- a/mmpose/codecs/__init__.py
+++ b/mmpose/codecs/__init__.py
@@ -9,17 +9,30 @@
 from .integral_regression_label import IntegralRegressionLabel
 from .megvii_heatmap import MegviiHeatmap
 from .motionbert_label import MotionBERTLabel
 from .msra_heatmap import MSRAHeatmap
+from .oks_argmax_heatmap import OKSArgMaxHeatmap
+from .onehot_heatmap import OneHotHeatmap
 from .regression_label import RegressionLabel
 from .simcc_label import SimCCLabel
 from .spr import SPR
 from .udp_heatmap import UDPHeatmap
 from .video_pose_lifting import VideoPoseLifting
-from .onehot_heatmap import OneHotHeatmap
 
 __all__ = [
-    'MSRAHeatmap', 'MegviiHeatmap', 'UDPHeatmap', 'RegressionLabel',
-    'SimCCLabel', 'IntegralRegressionLabel', 'AssociativeEmbedding', 'SPR',
-    'DecoupledHeatmap', 'VideoPoseLifting', 'ImagePoseLifting',
-    'MotionBERTLabel', 'YOLOXPoseAnnotationProcessor', 'EDPoseLabel',
-    'Hand3DHeatmap', 'OneHotHeatmap'
+    "MSRAHeatmap",
+    "MegviiHeatmap",
+    "UDPHeatmap",
+    "RegressionLabel",
+    "SimCCLabel",
+    "IntegralRegressionLabel",
+    "AssociativeEmbedding",
+    "SPR",
+    "DecoupledHeatmap",
+    "VideoPoseLifting",
+    "ImagePoseLifting",
+    "MotionBERTLabel",
+    "YOLOXPoseAnnotationProcessor",
+    "EDPoseLabel",
+    "Hand3DHeatmap",
+    "OKSArgMaxHeatmap",
+    "OneHotHeatmap",
 ]
diff --git a/mmpose/codecs/annotation_processors.py b/mmpose/codecs/annotation_processors.py
index 72a578df7000707ceb122469a4fe9ab85959625f..65b7ff42475e885c251aa98fe234c34973c513d5 100644
--- a/mmpose/codecs/annotation_processors.py
+++ b/mmpose/codecs/annotation_processors.py
@@ -4,6 +4,7 @@
 from typing import Dict, List, Optional, Tuple
 
 import numpy as np
 
 from mmpose.registry import KEYPOINT_CODECS
+
 from .base import BaseKeypointCodec
 
 INF = 1e6
@@ -31,35 +32,34 @@ class
YOLOXPoseAnnotationProcessor(BaseAnnotationProcessor): codec in deployment but is not used indeed. """ - auxiliary_encode_keys = {'category_id', 'bbox'} + auxiliary_encode_keys = {"category_id", "bbox"} label_mapping_table = dict( - bbox='bboxes', - bbox_labels='labels', - keypoints='keypoints', - keypoints_visible='keypoints_visible', - area='areas', + bbox="bboxes", + bbox_labels="labels", + keypoints="keypoints", + keypoints_visible="keypoints_visible", + area="areas", ) instance_mapping_table = dict( - bbox='bboxes', - bbox_score='bbox_scores', - keypoints='keypoints', - keypoints_visible='keypoints_visible', + bbox="bboxes", + bbox_score="bbox_scores", + keypoints="keypoints", + keypoints_visible="keypoints_visible", # remove 'bbox_scales' in default instance_mapping_table to avoid # length mismatch during training with multiple datasets ) - def __init__(self, - extend_bbox: bool = False, - input_size: Optional[Tuple] = None): + def __init__(self, extend_bbox: bool = False, input_size: Optional[Tuple] = None): super().__init__() self.extend_bbox = extend_bbox - def encode(self, - keypoints: Optional[np.ndarray] = None, - keypoints_visible: Optional[np.ndarray] = None, - bbox: Optional[np.ndarray] = None, - category_id: Optional[List[int]] = None - ) -> Dict[str, np.ndarray]: + def encode( + self, + keypoints: Optional[np.ndarray] = None, + keypoints_visible: Optional[np.ndarray] = None, + bbox: Optional[np.ndarray] = None, + category_id: Optional[List[int]] = None, + ) -> Dict[str, np.ndarray]: """Encode keypoints, bounding boxes, and category IDs. Args: @@ -90,11 +90,11 @@ class YOLOXPoseAnnotationProcessor(BaseAnnotationProcessor): kpts_max[keypoints_visible == 0] = NEG_INF bbox[..., 2:] = np.maximum(bbox[..., 2:], kpts_max.max(axis=1)) - results['bbox'] = bbox + results["bbox"] = bbox if category_id is not None: # Convert category IDs to labels bbox_labels = np.array(category_id).astype(np.int8) - 1 - results['bbox_labels'] = bbox_labels + results["bbox_labels"] = bbox_labels return results diff --git a/mmpose/codecs/associative_embedding.py b/mmpose/codecs/associative_embedding.py index def9bfd89ed9157ca45b60d5dcd33861e7eac9ec..396ea5564c7b7fc06f29a58c0e48dcb1ee97eece 100644 --- a/mmpose/codecs/associative_embedding.py +++ b/mmpose/codecs/associative_embedding.py @@ -4,15 +4,20 @@ from typing import Any, List, Optional, Tuple import numpy as np import torch -from munkres import Munkres from torch import Tensor from mmpose.registry import KEYPOINT_CODECS from mmpose.utils.tensor_utils import to_numpy +from munkres import Munkres + from .base import BaseKeypointCodec -from .utils import (batch_heatmap_nms, generate_gaussian_heatmaps, - generate_udp_gaussian_heatmaps, refine_keypoints, - refine_keypoints_dark_udp) +from .utils import ( + batch_heatmap_nms, + generate_gaussian_heatmaps, + generate_udp_gaussian_heatmaps, + refine_keypoints, + refine_keypoints_dark_udp, +) def _py_max_match(scores): @@ -30,13 +35,15 @@ def _py_max_match(scores): return tmp -def _group_keypoints_by_tags(vals: np.ndarray, - tags: np.ndarray, - locs: np.ndarray, - keypoint_order: List[int], - val_thr: float, - tag_thr: float = 1.0, - max_groups: Optional[int] = None) -> np.ndarray: +def _group_keypoints_by_tags( + vals: np.ndarray, + tags: np.ndarray, + locs: np.ndarray, + keypoint_order: List[int], + val_thr: float, + tag_thr: float = 1.0, + max_groups: Optional[int] = None, +) -> np.ndarray: """Group the keypoints by tags using Munkres algorithm. 
Note: @@ -112,31 +119,24 @@ def _group_keypoints_by_tags(vals: np.ndarray, num_grouped = diff.shape[1] if num_added > num_grouped: - diff_normed = np.concatenate( - (diff_normed, - np.zeros((num_added, num_added - num_grouped), - dtype=np.float32) + 1e10), - axis=1) + diff_normed = np.concatenate((diff_normed, np.zeros((num_added, num_added - num_grouped), dtype=np.float32) + 1e10), axis=1) pairs = _py_max_match(diff_normed) for row, col in pairs: - if (row < num_added and col < num_grouped - and diff_saved[row][col] < tag_thr): + if row < num_added and col < num_grouped and diff_saved[row][col] < tag_thr: key = grouped_keys[col] joint_dict[key][idx] = joints[row] tag_dict[key].append(tags[row]) else: key = tags[row][0] - joint_dict.setdefault(key, np.copy(default_))[idx] = \ - joints[row] + joint_dict.setdefault(key, np.copy(default_))[idx] = joints[row] tag_dict[key] = [tags[row]] joint_dict_keys = list(joint_dict.keys())[:max_groups] if joint_dict_keys: - results = np.array([joint_dict[i] - for i in joint_dict_keys]).astype(np.float32) - results = results[..., :D + 1] + results = np.array([joint_dict[i] for i in joint_dict_keys]).astype(np.float32) + results = results[..., : D + 1] else: results = np.empty((0, K, D + 1), dtype=np.float32) return results @@ -231,22 +231,15 @@ class AssociativeEmbedding(BaseKeypointCodec): self.decode_keypoint_order = decode_keypoint_order.copy() if self.use_udp: - self.scale_factor = ((np.array(input_size) - 1) / - (np.array(heatmap_size) - 1)).astype( - np.float32) + self.scale_factor = ((np.array(input_size) - 1) / (np.array(heatmap_size) - 1)).astype(np.float32) else: - self.scale_factor = (np.array(input_size) / - heatmap_size).astype(np.float32) + self.scale_factor = (np.array(input_size) / heatmap_size).astype(np.float32) if sigma is None: - sigma = (heatmap_size[0] * heatmap_size[1])**0.5 / 64 + sigma = (heatmap_size[0] * heatmap_size[1]) ** 0.5 / 64 self.sigma = sigma - def encode( - self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None - ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: + def encode(self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: """Encode keypoints into heatmaps and position indices. Note that the original keypoint coordinates should be in the input image space. 
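The hunk above reflows how `AssociativeEmbedding` derives its heatmap-to-input `scale_factor` and the default Gaussian `sigma`; the arithmetic is unchanged. A minimal standalone sketch of those two formulas (the resolutions below are illustrative assumptions, not values from any shipped config):

```python
import numpy as np

input_size = (512, 512)    # network input resolution (w, h), assumed for illustration
heatmap_size = (128, 128)  # output heatmap resolution (w, h)

# Non-UDP scaling: maps heatmap coordinates back to input-image pixels.
scale_factor = (np.array(input_size) / heatmap_size).astype(np.float32)

# The default sigma grows with the square root of the heatmap area.
sigma = (heatmap_size[0] * heatmap_size[1]) ** 0.5 / 64

print(scale_factor, sigma)  # [4. 4.] 2.0
```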
@@ -275,32 +268,22 @@ class AssociativeEmbedding(BaseKeypointCodec): if self.use_udp: heatmaps, keypoint_weights = generate_udp_gaussian_heatmaps( - heatmap_size=self.heatmap_size, - keypoints=_keypoints, - keypoints_visible=keypoints_visible, - sigma=self.sigma) + heatmap_size=self.heatmap_size, keypoints=_keypoints, keypoints_visible=keypoints_visible, sigma=self.sigma + ) else: heatmaps, keypoint_weights = generate_gaussian_heatmaps( - heatmap_size=self.heatmap_size, - keypoints=_keypoints, - keypoints_visible=keypoints_visible, - sigma=self.sigma) + heatmap_size=self.heatmap_size, keypoints=_keypoints, keypoints_visible=keypoints_visible, sigma=self.sigma + ) keypoint_indices = self._encode_keypoint_indices( - heatmap_size=self.heatmap_size, - keypoints=_keypoints, - keypoints_visible=keypoints_visible) + heatmap_size=self.heatmap_size, keypoints=_keypoints, keypoints_visible=keypoints_visible + ) - encoded = dict( - heatmaps=heatmaps, - keypoint_indices=keypoint_indices, - keypoint_weights=keypoint_weights) + encoded = dict(heatmaps=heatmaps, keypoint_indices=keypoint_indices, keypoint_weights=keypoint_weights) return encoded - def _encode_keypoint_indices(self, heatmap_size: Tuple[int, int], - keypoints: np.ndarray, - keypoints_visible: np.ndarray) -> np.ndarray: + def _encode_keypoint_indices(self, heatmap_size: Tuple[int, int], keypoints: np.ndarray, keypoints_visible: np.ndarray) -> np.ndarray: w, h = heatmap_size N, K, _ = keypoints.shape keypoint_indices = np.zeros((N, K, 2), dtype=np.int64) @@ -308,7 +291,7 @@ class AssociativeEmbedding(BaseKeypointCodec): for n, k in product(range(N), range(K)): x, y = (keypoints[n, k] + 0.5).astype(np.int64) index = y * w + x - vis = (keypoints_visible[n, k] > 0.5 and 0 <= x < w and 0 <= y < h) + vis = keypoints_visible[n, k] > 0.5 and 0 <= x < w and 0 <= y < h keypoint_indices[n, k] = [index, vis] return keypoint_indices @@ -316,8 +299,7 @@ class AssociativeEmbedding(BaseKeypointCodec): def decode(self, encoded: Any) -> Tuple[np.ndarray, np.ndarray]: raise NotImplementedError() - def _get_batch_topk(self, batch_heatmaps: Tensor, batch_tags: Tensor, - k: int): + def _get_batch_topk(self, batch_heatmaps: Tensor, batch_tags: Tensor, k: int): """Get top-k response values from the heatmaps and corresponding tag values from the tagging heatmaps. @@ -342,22 +324,18 @@ class AssociativeEmbedding(BaseKeypointCodec): L = batch_tags.shape[1] // K # shape of topk_val, top_indices: (B, K, TopK) - topk_vals, topk_indices = batch_heatmaps.flatten(-2, -1).topk( - k, dim=-1) + topk_vals, topk_indices = batch_heatmaps.flatten(-2, -1).topk(k, dim=-1) topk_tags_per_kpts = [ - torch.gather(_tag, dim=2, index=topk_indices) - for _tag in torch.unbind(batch_tags.view(B, L, K, H * W), dim=1) + torch.gather(_tag, dim=2, index=topk_indices) for _tag in torch.unbind(batch_tags.view(B, L, K, H * W), dim=1) ] topk_tags = torch.stack(topk_tags_per_kpts, dim=-1) # (B, K, TopK, L) - topk_locs = torch.stack([topk_indices % W, topk_indices // W], - dim=-1) # (B, K, TopK, 2) + topk_locs = torch.stack([topk_indices % W, topk_indices // W], dim=-1) # (B, K, TopK, 2) return topk_vals, topk_tags, topk_locs - def _group_keypoints(self, batch_vals: np.ndarray, batch_tags: np.ndarray, - batch_locs: np.ndarray): + def _group_keypoints(self, batch_vals: np.ndarray, batch_tags: np.ndarray, batch_locs: np.ndarray): """Group keypoints into groups (each represents an instance) by tags. 
Args:
@@ -384,15 +362,14 @@ class AssociativeEmbedding(BaseKeypointCodec):
             keypoint_order=self.decode_keypoint_order,
             val_thr=self.decode_keypoint_thr,
             tag_thr=self.decode_tag_thr,
-            max_groups=self.decode_max_instances)
+            max_groups=self.decode_max_instances,
+        )
 
         _results = map(_group_func, zip(batch_vals, batch_tags, batch_locs))
         results = list(_results)
         return results
 
-    def _fill_missing_keypoints(self, keypoints: np.ndarray,
-                                keypoint_scores: np.ndarray,
-                                heatmaps: np.ndarray, tags: np.ndarray):
+    def _fill_missing_keypoints(self, keypoints: np.ndarray, keypoint_scores: np.ndarray, heatmaps: np.ndarray, tags: np.ndarray):
         """Fill the missing keypoints in the initial predictions.
 
         Args:
@@ -433,8 +410,7 @@
             for k in range(K):
                 if keypoint_scores[n, k] > 0:
                     continue
-                dist_map = np.linalg.norm(
-                    keypoint_tags[k] - tag, ord=2, axis=0)
+                dist_map = np.linalg.norm(keypoint_tags[k] - tag, ord=2, axis=0)
                 cost_map = np.round(dist_map) * 100 - heatmaps[k]  # H, W
                 y, x = np.unravel_index(np.argmin(cost_map), shape=(H, W))
                 keypoints[n, k] = [x, y]
@@ -442,8 +418,7 @@
 
         return keypoints, keypoint_scores
 
-    def batch_decode(self, batch_heatmaps: Tensor, batch_tags: Tensor
-                     ) -> Tuple[List[np.ndarray], List[np.ndarray]]:
+    def batch_decode(self, batch_heatmaps: Tensor, batch_tags: Tensor) -> Tuple[List[np.ndarray], List[np.ndarray]]:
         """Decode the keypoint coordinates from a batch of heatmaps and
         tagging heatmaps. The decoded keypoint coordinates are in the input
         image space.
@@ -464,21 +439,19 @@
         """
         B, _, H, W = batch_heatmaps.shape
         assert batch_tags.shape[0] == B and batch_tags.shape[2:4] == (H, W), (
-            f'Mismatched shapes of heatmap ({batch_heatmaps.shape}) and '
-            f'tagging map ({batch_tags.shape})')
+            f"Mismatched shapes of heatmap ({batch_heatmaps.shape}) and " f"tagging map ({batch_tags.shape})"
+        )
 
         # Heatmap NMS
-        batch_heatmaps_peak = batch_heatmap_nms(batch_heatmaps,
-                                                self.decode_nms_kernel)
+        batch_heatmaps_peak = batch_heatmap_nms(batch_heatmaps, self.decode_nms_kernel)
 
         # Get top-k in each heatmap and convert to numpy
         batch_topk_vals, batch_topk_tags, batch_topk_locs = to_numpy(
-            self._get_batch_topk(
-                batch_heatmaps_peak, batch_tags, k=self.decode_topk))
+            self._get_batch_topk(batch_heatmaps_peak, batch_tags, k=self.decode_topk)
+        )
 
         # Group keypoint candidates into groups (instances)
-        batch_groups = self._group_keypoints(batch_topk_vals, batch_topk_tags,
-                                             batch_topk_locs)
+        batch_groups = self._group_keypoints(batch_topk_vals, batch_topk_tags, batch_topk_locs)
 
         # Convert to numpy
         batch_heatmaps_np = to_numpy(batch_heatmaps)
@@ -488,8 +461,7 @@
         batch_keypoints = []
         batch_keypoint_scores = []
         batch_instance_scores = []
-        for i, (groups, heatmaps, tags) in enumerate(
-                zip(batch_groups, batch_heatmaps_np, batch_tags_np)):
+        for i, (groups, heatmaps, tags) in enumerate(zip(batch_groups, batch_heatmaps_np, batch_tags_np)):
             keypoints, scores = groups[..., :-1], groups[..., -1]
             instance_scores = scores.mean(axis=-1)
 
@@ -497,26 +469,19 @@
             if keypoints.size > 0:
                 # refine keypoint coordinates according to heatmap distribution
                 if self.use_udp:
-                    keypoints = refine_keypoints_dark_udp(
-                        keypoints,
-                        heatmaps,
-                        blur_kernel_size=self.decode_gaussian_kernel)
+                    keypoints = refine_keypoints_dark_udp(keypoints, heatmaps, blur_kernel_size=self.decode_gaussian_kernel)
else: keypoints = refine_keypoints(keypoints, heatmaps) - keypoints += self.decode_center_shift * \ - (scores > 0).astype(keypoints.dtype)[..., None] + keypoints += self.decode_center_shift * (scores > 0).astype(keypoints.dtype)[..., None] # identify missing keypoints - keypoints, scores = self._fill_missing_keypoints( - keypoints, scores, heatmaps, tags) + keypoints, scores = self._fill_missing_keypoints(keypoints, scores, heatmaps, tags) batch_keypoints.append(keypoints) batch_keypoint_scores.append(scores) batch_instance_scores.append(instance_scores) # restore keypoint scale - batch_keypoints = [ - kpts * self.scale_factor for kpts in batch_keypoints - ] + batch_keypoints = [kpts * self.scale_factor for kpts in batch_keypoints] return batch_keypoints, batch_keypoint_scores, batch_instance_scores diff --git a/mmpose/codecs/base.py b/mmpose/codecs/base.py index b01e8c4b2c974a3cf115a005400baad8e0bf9cd6..05a4c623b81be5257a7085b97a25e69021f45a83 100644 --- a/mmpose/codecs/base.py +++ b/mmpose/codecs/base.py @@ -23,9 +23,7 @@ class BaseKeypointCodec(metaclass=ABCMeta): label_mapping_table = dict() @abstractmethod - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None) -> dict: + def encode(self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> dict: """Encode keypoints. Note: @@ -57,8 +55,7 @@ class BaseKeypointCodec(metaclass=ABCMeta): (N, K, D) """ - def batch_decode(self, batch_encoded: Any - ) -> Tuple[List[np.ndarray], List[np.ndarray]]: + def batch_decode(self, batch_encoded: Any) -> Tuple[List[np.ndarray], List[np.ndarray]]: """Decode keypoints. Args: @@ -77,5 +74,4 @@ class BaseKeypointCodec(metaclass=ABCMeta): @property def support_batch_decoding(self) -> bool: """Return whether the codec support decoding from batch data.""" - return is_method_overridden('batch_decode', BaseKeypointCodec, - self.__class__) + return is_method_overridden("batch_decode", BaseKeypointCodec, self.__class__) diff --git a/mmpose/codecs/decoupled_heatmap.py b/mmpose/codecs/decoupled_heatmap.py index b5929e3dcf3f24092d6aa2887e4a2ff7e4903b9b..035d8a025f80455f56161cc80d7ad0fd441e20df 100644 --- a/mmpose/codecs/decoupled_heatmap.py +++ b/mmpose/codecs/decoupled_heatmap.py @@ -5,9 +5,9 @@ from typing import Optional, Tuple import numpy as np from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec -from .utils import (generate_gaussian_heatmaps, get_diagonal_lengths, - get_instance_bbox, get_instance_root) +from .utils import generate_gaussian_heatmaps, get_diagonal_lengths, get_instance_bbox, get_instance_root from .utils.post_processing import get_heatmap_maximum from .utils.refinement import refine_keypoints @@ -63,22 +63,22 @@ class DecoupledHeatmap(BaseKeypointCodec): # DecoupledHeatmap requires bounding boxes to determine the size of each # instance, so that it can assign varying sigmas based on their size - auxiliary_encode_keys = {'bbox'} + auxiliary_encode_keys = {"bbox"} label_mapping_table = dict( - keypoint_weights='keypoint_weights', - instance_coords='instance_coords', + keypoint_weights="keypoint_weights", + instance_coords="instance_coords", ) field_mapping_table = dict( - heatmaps='heatmaps', - instance_heatmaps='instance_heatmaps', + heatmaps="heatmaps", + instance_heatmaps="instance_heatmaps", ) def __init__( self, input_size: Tuple[int, int], heatmap_size: Tuple[int, int], - root_type: str = 'kpt_center', + root_type: str = "kpt_center", heatmap_min_overlap: float = 0.7, encode_max_instances: int = 30, ): 
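The next hunk reflows `_get_instance_wise_sigmas`, which sizes each instance's Gaussian by solving quadratics derived from the required `heatmap_min_overlap`. A standalone sketch of the first of those conditions, mirroring the code shown below (the box dimensions are made-up example values):

```python
import numpy as np

w, h = 32.0, 48.0   # heatmap-space box width and height (toy values)
min_overlap = 0.7   # default heatmap_min_overlap

# Condition 1 from the hunk below: solve a1*r^2 - b1*r + c1 = 0 for radius r.
a1, b1 = 1, h + w
c1 = w * h * (1 - min_overlap) / (1 + min_overlap)
sq1 = np.sqrt(b1**2 - 4 * a1 * c1)
r1 = (b1 + sq1) / 2
print(r1)  # one candidate radius; the codec derives per-instance sigmas from radii like this
```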
@@ -90,8 +90,7 @@ class DecoupledHeatmap(BaseKeypointCodec): self.encode_max_instances = encode_max_instances self.heatmap_min_overlap = heatmap_min_overlap - self.scale_factor = (np.array(input_size) / - heatmap_size).astype(np.float32) + self.scale_factor = (np.array(input_size) / heatmap_size).astype(np.float32) def _get_instance_wise_sigmas( self, @@ -105,7 +104,7 @@ class DecoupledHeatmap(BaseKeypointCodec): Returns: np.ndarray: Array containing the sigma values for each instance. """ - sigmas = np.zeros((bbox.shape[0], ), dtype=np.float32) + sigmas = np.zeros((bbox.shape[0],), dtype=np.float32) heights = np.sqrt(np.power(bbox[:, 0] - bbox[:, 1], 2).sum(axis=-1)) widths = np.sqrt(np.power(bbox[:, 0] - bbox[:, 2], 2).sum(axis=-1)) @@ -116,8 +115,7 @@ class DecoupledHeatmap(BaseKeypointCodec): # compute sigma for each instance # condition 1 a1, b1 = 1, h + w - c1 = w * h * (1 - self.heatmap_min_overlap) / ( - 1 + self.heatmap_min_overlap) + c1 = w * h * (1 - self.heatmap_min_overlap) / (1 + self.heatmap_min_overlap) sq1 = np.sqrt(b1**2 - 4 * a1 * c1) r1 = (b1 + sq1) / 2 @@ -139,10 +137,7 @@ class DecoupledHeatmap(BaseKeypointCodec): return sigmas - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None, - bbox: Optional[np.ndarray] = None) -> dict: + def encode(self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None, bbox: Optional[np.ndarray] = None) -> dict: """Encode keypoints into heatmaps. Args: @@ -178,8 +173,7 @@ class DecoupledHeatmap(BaseKeypointCodec): _bbox = bbox.reshape(-1, 4, 2) / self.scale_factor # compute the root and scale of each instance - roots, roots_visible = get_instance_root(_keypoints, keypoints_visible, - self.root_type) + roots, roots_visible = get_instance_root(_keypoints, keypoints_visible, self.root_type) sigmas = self._get_instance_wise_sigmas(_bbox) @@ -187,9 +181,9 @@ class DecoupledHeatmap(BaseKeypointCodec): heatmaps, keypoint_weights = generate_gaussian_heatmaps( heatmap_size=self.heatmap_size, keypoints=np.concatenate((_keypoints, roots[:, None]), axis=1), - keypoints_visible=np.concatenate( - (keypoints_visible, roots_visible[:, None]), axis=1), - sigma=sigmas) + keypoints_visible=np.concatenate((keypoints_visible, roots_visible[:, None]), axis=1), + sigma=sigmas, + ) roots_visible = keypoint_weights[:, -1] # select instances @@ -199,15 +193,14 @@ class DecoupledHeatmap(BaseKeypointCodec): if roots_visible[i] < 1: continue # rand root point in 3x3 grid - x, y = roots[i] + np.random.randint(-1, 2, (2, )) + x, y = roots[i] + np.random.randint(-1, 2, (2,)) x = max(0, min(x, self.heatmap_size[0] - 1)) y = max(0, min(y, self.heatmap_size[1] - 1)) if (x, y) not in inst_roots: inst_roots.append((x, y)) inst_indices.append(i) if len(inst_indices) > self.encode_max_instances: - rand_indices = random.sample( - range(len(inst_indices)), self.encode_max_instances) + rand_indices = random.sample(range(len(inst_indices)), self.encode_max_instances) inst_roots = [inst_roots[i] for i in rand_indices] inst_indices = [inst_indices[i] for i in rand_indices] @@ -216,9 +209,10 @@ class DecoupledHeatmap(BaseKeypointCodec): for i in inst_indices: inst_heatmap, inst_heatmap_weight = generate_gaussian_heatmaps( heatmap_size=self.heatmap_size, - keypoints=_keypoints[i:i + 1], - keypoints_visible=keypoints_visible[i:i + 1], - sigma=sigmas[i].item()) + keypoints=_keypoints[i : i + 1], + keypoints_visible=keypoints_visible[i : i + 1], + sigma=sigmas[i].item(), + ) inst_heatmaps.append(inst_heatmap) 
inst_heatmap_weights.append(inst_heatmap_weight) @@ -228,19 +222,16 @@ class DecoupledHeatmap(BaseKeypointCodec): inst_roots = np.array(inst_roots, dtype=np.int32) else: inst_heatmaps = np.empty((0, *self.heatmap_size[::-1])) - inst_heatmap_weights = np.empty((0, )) + inst_heatmap_weights = np.empty((0,)) inst_roots = np.empty((0, 2), dtype=np.int32) encoded = dict( - heatmaps=heatmaps, - instance_heatmaps=inst_heatmaps, - keypoint_weights=inst_heatmap_weights, - instance_coords=inst_roots) + heatmaps=heatmaps, instance_heatmaps=inst_heatmaps, keypoint_weights=inst_heatmap_weights, instance_coords=inst_roots + ) return encoded - def decode(self, instance_heatmaps: np.ndarray, - instance_scores: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: + def decode(self, instance_heatmaps: np.ndarray, instance_scores: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: """Decode keypoint coordinates from decoupled heatmaps. The decoded keypoint coordinates are in the input image space. diff --git a/mmpose/codecs/edpose_label.py b/mmpose/codecs/edpose_label.py index 0433784886bbd28a38832d0b5ac614e75d446869..6c91440be6ce44c5709c358b1e822a28895400c5 100644 --- a/mmpose/codecs/edpose_label.py +++ b/mmpose/codecs/edpose_label.py @@ -5,6 +5,7 @@ import numpy as np from mmpose.registry import KEYPOINT_CODECS from mmpose.structures import bbox_cs2xyxy, bbox_xyxy2cs + from .base import BaseKeypointCodec @@ -33,12 +34,12 @@ class EDPoseLabel(BaseKeypointCodec): num_keypoints (int): The Number of keypoints """ - auxiliary_encode_keys = {'area', 'bboxes', 'img_shape'} + auxiliary_encode_keys = {"area", "bboxes", "img_shape"} instance_mapping_table = dict( - bbox='bboxes', - keypoints='keypoints', - keypoints_visible='keypoints_visible', - area='areas', + bbox="bboxes", + keypoints="keypoints", + keypoints_visible="keypoints_visible", + area="areas", ) def __init__(self, num_select: int = 100, num_keypoints: int = 17): @@ -95,16 +96,11 @@ class EDPoseLabel(BaseKeypointCodec): if keypoints is not None: keypoints = keypoints / np.array([w, h], dtype=np.float32) - encoded = dict( - keypoints=keypoints, - area=area, - bbox=bboxes, - keypoints_visible=keypoints_visible) + encoded = dict(keypoints=keypoints, area=area, bbox=bboxes, keypoints_visible=keypoints_visible) return encoded - def decode(self, input_shapes: np.ndarray, pred_logits: np.ndarray, - pred_boxes: np.ndarray, pred_keypoints: np.ndarray): + def decode(self, input_shapes: np.ndarray, pred_logits: np.ndarray, pred_boxes: np.ndarray, pred_keypoints: np.ndarray): """Select the final top-k keypoints, and decode the results from normalize size to origin input size. 
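The decode hunk that follows joins wrapped lines around the same top-k instance selection. A self-contained sketch of that selection pattern (array shapes and the random scores are illustrative assumptions):

```python
import numpy as np

num_select, num_keypoints = 3, 17
rng = np.random.default_rng(0)
pred_logits = rng.random((100, 1), dtype=np.float32)  # (num_queries, num_classes)

# Flatten the scores and keep the indices of the highest-scoring instances.
prob = pred_logits.reshape(-1)
topk_indexes = np.argsort(-prob)[:num_select]
topk_values = np.take_along_axis(prob, topk_indexes, axis=0)

# Broadcast one confidence value to every keypoint of each kept instance.
scores = np.tile(topk_values[:, np.newaxis], [1, num_keypoints])
print(scores.shape)  # (3, 17)
```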
@@ -124,15 +120,14 @@ class EDPoseLabel(BaseKeypointCodec): prob = pred_logits.reshape(-1) # Select top-k instances based on prediction scores - topk_indexes = np.argsort(-prob)[:self.num_select] + topk_indexes = np.argsort(-prob)[: self.num_select] topk_values = np.take_along_axis(prob, topk_indexes, axis=0) scores = np.tile(topk_values[:, np.newaxis], [1, num_keypoints]) # Decode bounding boxes topk_boxes = topk_indexes // pred_logits.shape[1] boxes = bbox_cs2xyxy(*np.split(pred_boxes, [2], axis=-1)) - boxes = np.take_along_axis( - boxes, np.tile(topk_boxes[:, np.newaxis], [1, 4]), axis=0) + boxes = np.take_along_axis(boxes, np.tile(topk_boxes[:, np.newaxis], [1, 4]), axis=0) # Convert from relative to absolute coordinates img_h, img_w = np.split(input_shapes, 2, axis=0) @@ -141,13 +136,9 @@ class EDPoseLabel(BaseKeypointCodec): # Decode keypoints topk_keypoints = topk_indexes // pred_logits.shape[1] - keypoints = np.take_along_axis( - pred_keypoints, - np.tile(topk_keypoints[:, np.newaxis], [1, num_keypoints * 3]), - axis=0) - keypoints = keypoints[:, :(num_keypoints * 2)] - keypoints = keypoints * np.tile( - np.hstack([img_w, img_h]), [num_keypoints])[np.newaxis, :] + keypoints = np.take_along_axis(pred_keypoints, np.tile(topk_keypoints[:, np.newaxis], [1, num_keypoints * 3]), axis=0) + keypoints = keypoints[:, : (num_keypoints * 2)] + keypoints = keypoints * np.tile(np.hstack([img_w, img_h]), [num_keypoints])[np.newaxis, :] keypoints = keypoints.reshape(-1, num_keypoints, 2) return boxes, keypoints, scores diff --git a/mmpose/codecs/hand_3d_heatmap.py b/mmpose/codecs/hand_3d_heatmap.py index b088e0d7faa27e0775152dd0579c2933ec481860..bd74e3ee53b29ac340902a6c3f87b883a952650b 100644 --- a/mmpose/codecs/hand_3d_heatmap.py +++ b/mmpose/codecs/hand_3d_heatmap.py @@ -4,6 +4,7 @@ from typing import Optional, Tuple import numpy as np from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec from .utils.gaussian_heatmap import generate_3d_gaussian_heatmaps from .utils.post_processing import get_heatmap_3d_maximum @@ -41,36 +42,43 @@ class Hand3DHeatmap(BaseKeypointCodec): """ auxiliary_encode_keys = { - 'dataset_keypoint_weights', 'rel_root_depth', 'rel_root_valid', - 'hand_type', 'hand_type_valid', 'focal', 'principal_pt' + "dataset_keypoint_weights", + "rel_root_depth", + "rel_root_valid", + "hand_type", + "hand_type_valid", + "focal", + "principal_pt", } instance_mapping_table = { - 'keypoints': 'keypoints', - 'keypoints_visible': 'keypoints_visible', - 'keypoints_cam': 'keypoints_cam', + "keypoints": "keypoints", + "keypoints_visible": "keypoints_visible", + "keypoints_cam": "keypoints_cam", } label_mapping_table = { - 'keypoint_weights': 'keypoint_weights', - 'root_depth_weight': 'root_depth_weight', - 'type_weight': 'type_weight', - 'root_depth': 'root_depth', - 'type': 'type' + "keypoint_weights": "keypoint_weights", + "root_depth_weight": "root_depth_weight", + "type_weight": "type_weight", + "root_depth": "root_depth", + "type": "type", } - def __init__(self, - image_size: Tuple[int, int] = [256, 256], - root_heatmap_size: int = 64, - heatmap_size: Tuple[int, int, int] = [64, 64, 64], - heatmap3d_depth_bound: float = 400.0, - heatmap_size_root: int = 64, - root_depth_bound: float = 400.0, - depth_size: int = 64, - use_different_joint_weights: bool = False, - sigma: int = 2, - joint_indices: Optional[list] = None, - max_bound: float = 1.0): + def __init__( + self, + image_size: Tuple[int, int] = [256, 256], + root_heatmap_size: int = 64, + heatmap_size: 
Tuple[int, int, int] = [64, 64, 64], + heatmap3d_depth_bound: float = 400.0, + heatmap_size_root: int = 64, + root_depth_bound: float = 400.0, + depth_size: int = 64, + use_different_joint_weights: bool = False, + sigma: int = 2, + joint_indices: Optional[list] = None, + max_bound: float = 1.0, + ): super().__init__() self.image_size = np.array(image_size) @@ -85,8 +93,7 @@ class Hand3DHeatmap(BaseKeypointCodec): self.sigma = sigma self.joint_indices = joint_indices self.max_bound = max_bound - self.scale_factor = (np.array(image_size) / - heatmap_size[:-1]).astype(np.float32) + self.scale_factor = (np.array(image_size) / heatmap_size[:-1]).astype(np.float32) def encode( self, @@ -132,8 +139,7 @@ class Hand3DHeatmap(BaseKeypointCodec): keypoints_visible = np.ones(keypoints.shape[:-1], dtype=np.float32) if self.use_different_joint_weights: - assert dataset_keypoint_weights is not None, 'To use different ' \ - 'joint weights,`dataset_keypoint_weights` cannot be None.' + assert dataset_keypoint_weights is not None, "To use different " "joint weights,`dataset_keypoint_weights` cannot be None." heatmaps, keypoint_weights = generate_3d_gaussian_heatmaps( heatmap_size=self.heatmap_size, @@ -145,12 +151,11 @@ class Hand3DHeatmap(BaseKeypointCodec): joint_indices=self.joint_indices, max_bound=self.max_bound, use_different_joint_weights=self.use_different_joint_weights, - dataset_keypoint_weights=dataset_keypoint_weights) + dataset_keypoint_weights=dataset_keypoint_weights, + ) - rel_root_depth = (rel_root_depth / self.root_depth_bound + - 0.5) * self.heatmap_size_root - rel_root_valid = rel_root_valid * (rel_root_depth >= 0) * ( - rel_root_depth <= self.heatmap_size_root) + rel_root_depth = (rel_root_depth / self.root_depth_bound + 0.5) * self.heatmap_size_root + rel_root_valid = rel_root_valid * (rel_root_depth >= 0) * (rel_root_depth <= self.heatmap_size_root) encoded = dict( heatmaps=heatmaps, @@ -158,11 +163,11 @@ class Hand3DHeatmap(BaseKeypointCodec): root_depth=rel_root_depth * np.ones(1, dtype=np.float32), type=hand_type, type_weight=hand_type_valid, - root_depth_weight=rel_root_valid * np.ones(1, dtype=np.float32)) + root_depth_weight=rel_root_valid * np.ones(1, dtype=np.float32), + ) return encoded - def decode(self, heatmaps: np.ndarray, root_depth: np.ndarray, - hand_type: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: + def decode(self, heatmaps: np.ndarray, root_depth: np.ndarray, hand_type: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: """Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space. 
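The `Hand3DHeatmap` hunks above and below move depth values between heatmap-bin coordinates and metric camera space with a shift-and-scale in each direction. A small round-trip sketch of that mapping (the depth value is made up; the bounds follow the defaults shown above):

```python
depth_size = 64      # number of depth bins in the 3D heatmap
depth_bound = 400.0  # metric depth range the bins cover

# encode: camera-space depth -> heatmap bin coordinate
d_cam = 57.3
d_bin = (d_cam / depth_bound + 0.5) * depth_size

# decode: heatmap bin coordinate -> camera-space depth (exact inverse)
d_back = (d_bin / depth_size - 0.5) * depth_bound
assert abs(d_back - d_cam) < 1e-9
```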
@@ -183,8 +188,7 @@ class Hand3DHeatmap(BaseKeypointCodec): keypoints, scores = get_heatmap_3d_maximum(heatmap3d) # transform keypoint depth to camera space - keypoints[..., 2] = (keypoints[..., 2] / self.depth_size - - 0.5) * self.heatmap3d_depth_bound + keypoints[..., 2] = (keypoints[..., 2] / self.depth_size - 0.5) * self.heatmap3d_depth_bound # Unsqueeze the instance dimension for single-instance results keypoints, scores = keypoints[None], scores[None] @@ -194,8 +198,7 @@ class Hand3DHeatmap(BaseKeypointCodec): # decode relative hand root depth # transform relative root depth to camera space - rel_root_depth = ((root_depth / self.root_heatmap_size - 0.5) * - self.root_depth_bound) + rel_root_depth = (root_depth / self.root_heatmap_size - 0.5) * self.root_depth_bound hand_type = (hand_type > 0).reshape(1, -1).astype(int) diff --git a/mmpose/codecs/image_pose_lifting.py b/mmpose/codecs/image_pose_lifting.py index 1665d88e1d90afc843db4fa453f4004d9ecd12d3..5c8da3c0bc200a298c39cd9575e30e77cc1ec5b9 100644 --- a/mmpose/codecs/image_pose_lifting.py +++ b/mmpose/codecs/image_pose_lifting.py @@ -4,6 +4,7 @@ from typing import List, Optional, Tuple, Union import numpy as np from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec @@ -39,29 +40,30 @@ class ImagePoseLifting(BaseKeypointCodec): coordinates in shape (K, C). """ - auxiliary_encode_keys = {'lifting_target', 'lifting_target_visible'} + auxiliary_encode_keys = {"lifting_target", "lifting_target_visible"} instance_mapping_table = dict( - lifting_target='lifting_target', - lifting_target_visible='lifting_target_visible', + lifting_target="lifting_target", + lifting_target_visible="lifting_target_visible", ) label_mapping_table = dict( - trajectory_weights='trajectory_weights', - lifting_target_label='lifting_target_label', - lifting_target_weight='lifting_target_weight') - - def __init__(self, - num_keypoints: int, - root_index: Union[int, List] = 0, - remove_root: bool = False, - save_index: bool = False, - reshape_keypoints: bool = True, - concat_vis: bool = False, - keypoints_mean: Optional[np.ndarray] = None, - keypoints_std: Optional[np.ndarray] = None, - target_mean: Optional[np.ndarray] = None, - target_std: Optional[np.ndarray] = None, - additional_encode_keys: Optional[List[str]] = None): + trajectory_weights="trajectory_weights", lifting_target_label="lifting_target_label", lifting_target_weight="lifting_target_weight" + ) + + def __init__( + self, + num_keypoints: int, + root_index: Union[int, List] = 0, + remove_root: bool = False, + save_index: bool = False, + reshape_keypoints: bool = True, + concat_vis: bool = False, + keypoints_mean: Optional[np.ndarray] = None, + keypoints_std: Optional[np.ndarray] = None, + target_mean: Optional[np.ndarray] = None, + target_std: Optional[np.ndarray] = None, + additional_encode_keys: Optional[List[str]] = None, + ): super().__init__() self.num_keypoints = num_keypoints @@ -73,27 +75,22 @@ class ImagePoseLifting(BaseKeypointCodec): self.reshape_keypoints = reshape_keypoints self.concat_vis = concat_vis if keypoints_mean is not None: - assert keypoints_std is not None, 'keypoints_std is None' - keypoints_mean = np.array( - keypoints_mean, - dtype=np.float32).reshape(1, num_keypoints, -1) - keypoints_std = np.array( - keypoints_std, dtype=np.float32).reshape(1, num_keypoints, -1) + assert keypoints_std is not None, "keypoints_std is None" + keypoints_mean = np.array(keypoints_mean, dtype=np.float32).reshape(1, num_keypoints, -1) + keypoints_std = 
np.array(keypoints_std, dtype=np.float32).reshape(1, num_keypoints, -1) assert keypoints_mean.shape == keypoints_std.shape, ( - f'keypoints_mean.shape {keypoints_mean.shape} != ' - f'keypoints_std.shape {keypoints_std.shape}') + f"keypoints_mean.shape {keypoints_mean.shape} != " f"keypoints_std.shape {keypoints_std.shape}" + ) if target_mean is not None: - assert target_std is not None, 'target_std is None' + assert target_std is not None, "target_std is None" target_dim = num_keypoints - 1 if remove_root else num_keypoints - target_mean = np.array( - target_mean, dtype=np.float32).reshape(1, target_dim, -1) - target_std = np.array( - target_std, dtype=np.float32).reshape(1, target_dim, -1) + target_mean = np.array(target_mean, dtype=np.float32).reshape(1, target_dim, -1) + target_std = np.array(target_std, dtype=np.float32).reshape(1, target_dim, -1) assert target_mean.shape == target_std.shape, ( - f'target_mean.shape {target_mean.shape} != ' - f'target_std.shape {target_std.shape}') + f"target_mean.shape {target_mean.shape} != " f"target_std.shape {target_std.shape}" + ) self.keypoints_mean = keypoints_mean self.keypoints_std = keypoints_std self.target_mean = target_mean @@ -102,11 +99,13 @@ class ImagePoseLifting(BaseKeypointCodec): if additional_encode_keys is not None: self.auxiliary_encode_keys.update(additional_encode_keys) - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None, - lifting_target: Optional[np.ndarray] = None, - lifting_target_visible: Optional[np.ndarray] = None) -> dict: + def encode( + self, + keypoints: np.ndarray, + keypoints_visible: Optional[np.ndarray] = None, + lifting_target: Optional[np.ndarray] = None, + lifting_target_visible: Optional[np.ndarray] = None, + ) -> dict: """Encoding keypoints from input image space to normalized space. Args: @@ -154,75 +153,64 @@ class ImagePoseLifting(BaseKeypointCodec): # set initial value for `lifting_target_weight` # and `trajectory_weights` if lifting_target_visible is None: - lifting_target_visible = np.ones( - lifting_target.shape[:-1], dtype=np.float32) + lifting_target_visible = np.ones(lifting_target.shape[:-1], dtype=np.float32) lifting_target_weight = lifting_target_visible - trajectory_weights = (1 / lifting_target[:, 2]) + trajectory_weights = 1 / lifting_target[:, 2] else: valid = lifting_target_visible > 0.5 - lifting_target_weight = np.where(valid, 1., 0.).astype(np.float32) + lifting_target_weight = np.where(valid, 1.0, 0.0).astype(np.float32) trajectory_weights = lifting_target_weight encoded = dict() # Zero-center the target pose around a given root keypoint - assert (lifting_target.ndim >= 2 and - lifting_target.shape[-2] > max(self.root_index)), \ - f'Got invalid joint shape {lifting_target.shape}' + assert lifting_target.ndim >= 2 and lifting_target.shape[-2] > max( + self.root_index + ), f"Got invalid joint shape {lifting_target.shape}" - root = np.mean( - lifting_target[..., self.root_index, :], axis=-2, dtype=np.float32) + root = np.mean(lifting_target[..., self.root_index, :], axis=-2, dtype=np.float32) lifting_target_label = lifting_target - root[np.newaxis, ...] 
if self.remove_root and len(self.root_index) == 1: root_index = self.root_index[0] - lifting_target_label = np.delete( - lifting_target_label, root_index, axis=-2) - lifting_target_visible = np.delete( - lifting_target_visible, root_index, axis=-2) - assert lifting_target_weight.ndim in { - 2, 3 - }, (f'lifting_target_weight.ndim {lifting_target_weight.ndim} ' - 'is not in {2, 3}') + lifting_target_label = np.delete(lifting_target_label, root_index, axis=-2) + lifting_target_visible = np.delete(lifting_target_visible, root_index, axis=-2) + assert lifting_target_weight.ndim in {2, 3}, f"lifting_target_weight.ndim {lifting_target_weight.ndim} " "is not in {2, 3}" axis_to_remove = -2 if lifting_target_weight.ndim == 3 else -1 - lifting_target_weight = np.delete( - lifting_target_weight, root_index, axis=axis_to_remove) + lifting_target_weight = np.delete(lifting_target_weight, root_index, axis=axis_to_remove) # Add a flag to avoid latter transforms that rely on the root # joint or the original joint index - encoded['target_root_removed'] = True + encoded["target_root_removed"] = True # Save the root index which is necessary to restore the global pose if self.save_index: - encoded['target_root_index'] = root_index + encoded["target_root_index"] = root_index # Normalize the 2D keypoint coordinate with mean and std keypoint_labels = keypoints.copy() if self.keypoints_mean is not None: assert self.keypoints_mean.shape[1:] == keypoints.shape[1:], ( - f'self.keypoints_mean.shape[1:] {self.keypoints_mean.shape[1:]} ' # noqa - f'!= keypoints.shape[1:] {keypoints.shape[1:]}') - encoded['keypoints_mean'] = self.keypoints_mean.copy() - encoded['keypoints_std'] = self.keypoints_std.copy() + f"self.keypoints_mean.shape[1:] {self.keypoints_mean.shape[1:]} " # noqa + f"!= keypoints.shape[1:] {keypoints.shape[1:]}" + ) + encoded["keypoints_mean"] = self.keypoints_mean.copy() + encoded["keypoints_std"] = self.keypoints_std.copy() - keypoint_labels = (keypoint_labels - - self.keypoints_mean) / self.keypoints_std + keypoint_labels = (keypoint_labels - self.keypoints_mean) / self.keypoints_std if self.target_mean is not None: assert self.target_mean.shape == lifting_target_label.shape, ( - f'self.target_mean.shape {self.target_mean.shape} ' - f'!= lifting_target_label.shape {lifting_target_label.shape}' # noqa + f"self.target_mean.shape {self.target_mean.shape} " + f"!= lifting_target_label.shape {lifting_target_label.shape}" # noqa ) - encoded['target_mean'] = self.target_mean.copy() - encoded['target_std'] = self.target_std.copy() + encoded["target_mean"] = self.target_mean.copy() + encoded["target_std"] = self.target_std.copy() - lifting_target_label = (lifting_target_label - - self.target_mean) / self.target_std + lifting_target_label = (lifting_target_label - self.target_mean) / self.target_std # Generate reshaped keypoint coordinates - assert keypoint_labels.ndim in { - 2, 3 - }, (f'keypoint_labels.ndim {keypoint_labels.ndim} is not in {2, 3}') + assert keypoint_labels.ndim in {2, 3}, f"keypoint_labels.ndim {keypoint_labels.ndim} is not in {2, 3}" if keypoint_labels.ndim == 2: keypoint_labels = keypoint_labels[None, ...] 
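The `ImagePoseLifting` hunks in this file zero-center the 3D target on a root joint and normalize coordinates with dataset statistics, and `decode` inverts both steps. A compact sketch of that encode/decode symmetry, using toy stand-ins for the dataset mean and std:

```python
import numpy as np

target = np.random.rand(17, 3).astype(np.float32)  # (K, 3) toy 3D pose
root_index = [0]

# encode: zero-center on the (possibly averaged) root joint, then whiten
root = np.mean(target[root_index, :], axis=-2, dtype=np.float32)  # (3,)
label = target - root[np.newaxis, ...]
mean = label.mean(axis=0)       # stand-in for the dataset target_mean
std = label.std(axis=0) + 1e-8  # stand-in for the dataset target_std
label = (label - mean) / std

# decode: un-whiten, then restore the global root position
restored = label * std + mean + root
assert np.allclose(restored, target, atol=1e-5)
```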
@@ -230,26 +218,22 @@ class ImagePoseLifting(BaseKeypointCodec): keypoints_visible_ = keypoints_visible if keypoints_visible.ndim == 2: keypoints_visible_ = keypoints_visible[..., None] - keypoint_labels = np.concatenate( - (keypoint_labels, keypoints_visible_), axis=2) + keypoint_labels = np.concatenate((keypoint_labels, keypoints_visible_), axis=2) if self.reshape_keypoints: N = keypoint_labels.shape[0] keypoint_labels = keypoint_labels.transpose(1, 2, 0).reshape(-1, N) - encoded['keypoint_labels'] = keypoint_labels - encoded['keypoint_labels_visible'] = keypoints_visible - encoded['lifting_target_label'] = lifting_target_label - encoded['lifting_target_weight'] = lifting_target_weight - encoded['trajectory_weights'] = trajectory_weights - encoded['target_root'] = root + encoded["keypoint_labels"] = keypoint_labels + encoded["keypoint_labels_visible"] = keypoints_visible + encoded["lifting_target_label"] = lifting_target_label + encoded["lifting_target_weight"] = lifting_target_weight + encoded["trajectory_weights"] = trajectory_weights + encoded["target_root"] = root return encoded - def decode(self, - encoded: np.ndarray, - target_root: Optional[np.ndarray] = None - ) -> Tuple[np.ndarray, np.ndarray]: + def decode(self, encoded: np.ndarray, target_root: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray]: """Decode keypoint coordinates from normalized space to input image space. @@ -266,15 +250,14 @@ class ImagePoseLifting(BaseKeypointCodec): if self.target_mean is not None and self.target_std is not None: assert self.target_mean.shape == keypoints.shape, ( - f'self.target_mean.shape {self.target_mean.shape} ' - f'!= keypoints.shape {keypoints.shape}') + f"self.target_mean.shape {self.target_mean.shape} " f"!= keypoints.shape {keypoints.shape}" + ) keypoints = keypoints * self.target_std + self.target_mean if target_root is not None and target_root.size > 0: keypoints = keypoints + target_root if self.remove_root and len(self.root_index) == 1: - keypoints = np.insert( - keypoints, self.root_index, target_root, axis=1) + keypoints = np.insert(keypoints, self.root_index, target_root, axis=1) scores = np.ones(keypoints.shape[:-1], dtype=np.float32) return keypoints, scores diff --git a/mmpose/codecs/integral_regression_label.py b/mmpose/codecs/integral_regression_label.py index a3ded1f00b89cfe6c67107529d0787eb1acc49cb..df110e34a688da1ad19e4b487b6262ee4133fd6d 100644 --- a/mmpose/codecs/integral_regression_label.py +++ b/mmpose/codecs/integral_regression_label.py @@ -5,6 +5,7 @@ from typing import Optional, Tuple import numpy as np from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec from .msra_heatmap import MSRAHeatmap from .regression_label import RegressionLabel @@ -46,28 +47,29 @@ class IntegralRegressionLabel(BaseKeypointCodec): """ label_mapping_table = dict( - keypoint_labels='keypoint_labels', - keypoint_weights='keypoint_weights', + keypoint_labels="keypoint_labels", + keypoint_weights="keypoint_weights", + ) + field_mapping_table = dict( + heatmaps="heatmaps", ) - field_mapping_table = dict(heatmaps='heatmaps', ) - - def __init__(self, - input_size: Tuple[int, int], - heatmap_size: Tuple[int, int], - sigma: float, - unbiased: bool = False, - blur_kernel_size: int = 11, - normalize: bool = True) -> None: + + def __init__( + self, + input_size: Tuple[int, int], + heatmap_size: Tuple[int, int], + sigma: float, + unbiased: bool = False, + blur_kernel_size: int = 11, + normalize: bool = True, + ) -> None: super().__init__() - self.heatmap_codec 
= MSRAHeatmap(input_size, heatmap_size, sigma,
-                                         unbiased, blur_kernel_size)
+        self.heatmap_codec = MSRAHeatmap(input_size, heatmap_size, sigma, unbiased, blur_kernel_size)
         self.keypoint_codec = RegressionLabel(input_size)
         self.normalize = normalize
 
-    def encode(self,
-               keypoints: np.ndarray,
-               keypoints_visible: Optional[np.ndarray] = None) -> dict:
+    def encode(self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> dict:
         """Encoding keypoints to regression labels and heatmaps.
 
         Args:
@@ -87,18 +89,15 @@
         encoded_hm = self.heatmap_codec.encode(keypoints, keypoints_visible)
         encoded_kp = self.keypoint_codec.encode(keypoints, keypoints_visible)
 
-        heatmaps = encoded_hm['heatmaps']
-        keypoint_labels = encoded_kp['keypoint_labels']
-        keypoint_weights = encoded_kp['keypoint_weights']
+        heatmaps = encoded_hm["heatmaps"]
+        keypoint_labels = encoded_kp["keypoint_labels"]
+        keypoint_weights = encoded_kp["keypoint_weights"]
 
         if self.normalize:
             val_sum = heatmaps.sum(axis=(-1, -2)).reshape(-1, 1, 1) + 1e-24
             heatmaps = heatmaps / val_sum
 
-        encoded = dict(
-            keypoint_labels=keypoint_labels,
-            heatmaps=heatmaps,
-            keypoint_weights=keypoint_weights)
+        encoded = dict(keypoint_labels=keypoint_labels, heatmaps=heatmaps, keypoint_weights=keypoint_weights)
 
         return encoded
diff --git a/mmpose/codecs/megvii_heatmap.py b/mmpose/codecs/megvii_heatmap.py
index 3af0a54ff832f87e3e546e5e0b754dd95fa40bba..6ad5d5aae425e3244d64ab841661325607fbab5a 100644
--- a/mmpose/codecs/megvii_heatmap.py
+++ b/mmpose/codecs/megvii_heatmap.py
@@ -6,6 +6,7 @@
 import cv2
 import numpy as np
 
 from mmpose.registry import KEYPOINT_CODECS
+
 from .base import BaseKeypointCodec
 from .utils import gaussian_blur, get_heatmap_maximum
 
@@ -39,8 +40,12 @@
     .. _`CPN`: https://arxiv.org/abs/1711.07319
     """
 
-    label_mapping_table = dict(keypoint_weights='keypoint_weights', )
-    field_mapping_table = dict(heatmaps='heatmaps', )
+    label_mapping_table = dict(
+        keypoint_weights="keypoint_weights",
+    )
+    field_mapping_table = dict(
+        heatmaps="heatmaps",
+    )
 
     def __init__(
         self,
@@ -53,12 +58,9 @@
         self.input_size = input_size
         self.heatmap_size = heatmap_size
         self.kernel_size = kernel_size
-        self.scale_factor = (np.array(input_size) /
-                             heatmap_size).astype(np.float32)
+        self.scale_factor = (np.array(input_size) / heatmap_size).astype(np.float32)
 
-    def encode(self,
-               keypoints: np.ndarray,
-               keypoints_visible: Optional[np.ndarray] = None) -> dict:
+    def encode(self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> dict:
         """Encode keypoints into heatmaps. Note that the original keypoint
         coordinates should be in the input image space.
 
@@ -78,9 +80,7 @@
         N, K, _ = keypoints.shape
         W, H = self.heatmap_size
 
-        assert N == 1, (
-            f'{self.__class__.__name__} only support single-instance '
-            'keypoint encoding')
+        assert N == 1, f"{self.__class__.__name__} only supports single-instance keypoint encoding"
 
         heatmaps = np.zeros((K, H, W), dtype=np.float32)
         keypoint_weights = keypoints_visible.copy()
 
@@ -96,12 +96,12 @@
                 keypoint_weights[n, k] = 0
                 continue
 
-            heatmaps[k, ky, kx] = 1.
+            heatmaps[k, ky, kx] = 1.0
             kernel_size = (self.kernel_size, self.kernel_size)
             heatmaps[k] = cv2.GaussianBlur(heatmaps[k], kernel_size, 0)
 
             # normalize the heatmap
-            heatmaps[k] = heatmaps[k] / heatmaps[k, ky, kx] * 255.
+ heatmaps[k] = heatmaps[k] / heatmaps[k, ky, kx] * 255.0 encoded = dict(heatmaps=heatmaps, keypoint_weights=keypoint_weights) @@ -131,11 +131,8 @@ class MegviiHeatmap(BaseKeypointCodec): px = int(keypoints[k, 0]) py = int(keypoints[k, 1]) if 1 < px < W - 1 and 1 < py < H - 1: - diff = np.array([ - heatmap[py][px + 1] - heatmap[py][px - 1], - heatmap[py + 1][px] - heatmap[py - 1][px] - ]) - keypoints[k] += (np.sign(diff) * 0.25 + 0.5) + diff = np.array([heatmap[py][px + 1] - heatmap[py][px - 1], heatmap[py + 1][px] - heatmap[py - 1][px]]) + keypoints[k] += np.sign(diff) * 0.25 + 0.5 scores = scores / 255.0 + 0.5 diff --git a/mmpose/codecs/motionbert_label.py b/mmpose/codecs/motionbert_label.py index 98024ea4e63d1ca836808c950d72b4760b969c41..36ebd3d51a8295dbf2224b365890d37a22ee09d6 100644 --- a/mmpose/codecs/motionbert_label.py +++ b/mmpose/codecs/motionbert_label.py @@ -6,6 +6,7 @@ from typing import Optional, Tuple import numpy as np from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec from .utils import camera_to_image_coord @@ -38,27 +39,26 @@ class MotionBERTLabel(BaseKeypointCodec): Default: ``'test'``. """ - auxiliary_encode_keys = { - 'lifting_target', 'lifting_target_visible', 'camera_param', 'factor' - } + auxiliary_encode_keys = {"lifting_target", "lifting_target_visible", "camera_param", "factor"} instance_mapping_table = dict( - lifting_target='lifting_target', - lifting_target_visible='lifting_target_visible', + lifting_target="lifting_target", + lifting_target_visible="lifting_target_visible", ) label_mapping_table = dict( - trajectory_weights='trajectory_weights', - lifting_target_label='lifting_target_label', - lifting_target_weight='lifting_target_weight') - - def __init__(self, - num_keypoints: int, - root_index: int = 0, - remove_root: bool = False, - save_index: bool = False, - concat_vis: bool = False, - rootrel: bool = False, - mode: str = 'test'): + trajectory_weights="trajectory_weights", lifting_target_label="lifting_target_label", lifting_target_weight="lifting_target_weight" + ) + + def __init__( + self, + num_keypoints: int, + root_index: int = 0, + remove_root: bool = False, + save_index: bool = False, + concat_vis: bool = False, + rootrel: bool = False, + mode: str = "test", + ): super().__init__() self.num_keypoints = num_keypoints @@ -67,18 +67,18 @@ class MotionBERTLabel(BaseKeypointCodec): self.save_index = save_index self.concat_vis = concat_vis self.rootrel = rootrel - assert mode.lower() in {'train', 'test' - }, (f'Unsupported mode {mode}, ' - 'mode should be one of ("train", "test").') + assert mode.lower() in {"train", "test"}, f"Unsupported mode {mode}, " 'mode should be one of ("train", "test").' self.mode = mode.lower() - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None, - lifting_target: Optional[np.ndarray] = None, - lifting_target_visible: Optional[np.ndarray] = None, - camera_param: Optional[dict] = None, - factor: Optional[np.ndarray] = None) -> dict: + def encode( + self, + keypoints: np.ndarray, + keypoints_visible: Optional[np.ndarray] = None, + lifting_target: Optional[np.ndarray] = None, + lifting_target_visible: Optional[np.ndarray] = None, + camera_param: Optional[dict] = None, + factor: Optional[np.ndarray] = None, + ) -> dict: """Encoding keypoints from input image space to normalized space. 
Args: @@ -112,12 +112,11 @@ class MotionBERTLabel(BaseKeypointCodec): # set initial value for `lifting_target_weight` if lifting_target_visible is None: - lifting_target_visible = np.ones( - lifting_target.shape[:-1], dtype=np.float32) + lifting_target_visible = np.ones(lifting_target.shape[:-1], dtype=np.float32) lifting_target_weight = lifting_target_visible else: valid = lifting_target_visible > 0.5 - lifting_target_weight = np.where(valid, 1., 0.).astype(np.float32) + lifting_target_weight = np.where(valid, 1.0, 0.0).astype(np.float32) if camera_param is None: camera_param = dict() @@ -128,57 +127,50 @@ class MotionBERTLabel(BaseKeypointCodec): lifting_target_label = lifting_target.copy() keypoint_labels = keypoints.copy() - assert keypoint_labels.ndim in { - 2, 3 - }, (f'Keypoint labels should have 2 or 3 dimensions, ' - f'but got {keypoint_labels.ndim}.') + assert keypoint_labels.ndim in {2, 3}, f"Keypoint labels should have 2 or 3 dimensions, " f"but got {keypoint_labels.ndim}." if keypoint_labels.ndim == 2: keypoint_labels = keypoint_labels[None, ...] # Normalize the 2D keypoint coordinate with image width and height _camera_param = deepcopy(camera_param) - assert 'w' in _camera_param and 'h' in _camera_param, ( - 'Camera parameters should contain "w" and "h".') - w, h = _camera_param['w'], _camera_param['h'] - keypoint_labels[ - ..., :2] = keypoint_labels[..., :2] / w * 2 - [1, h / w] + assert "w" in _camera_param and "h" in _camera_param, 'Camera parameters should contain "w" and "h".' + w, h = _camera_param["w"], _camera_param["h"] + keypoint_labels[..., :2] = keypoint_labels[..., :2] / w * 2 - [1, h / w] # convert target to image coordinate T = keypoint_labels.shape[0] - factor_ = np.array([4] * T, dtype=np.float32).reshape(T, ) - if 'f' in _camera_param and 'c' in _camera_param: - lifting_target_label, factor_ = camera_to_image_coord( - self.root_index, lifting_target_label, _camera_param) - if self.mode == 'train': + factor_ = np.array([4] * T, dtype=np.float32).reshape( + T, + ) + if "f" in _camera_param and "c" in _camera_param: + lifting_target_label, factor_ = camera_to_image_coord(self.root_index, lifting_target_label, _camera_param) + if self.mode == "train": w, h = w / 1000, h / 1000 - lifting_target_label[ - ..., :2] = lifting_target_label[..., :2] / w * 2 - [1, h / w] + lifting_target_label[..., :2] = lifting_target_label[..., :2] / w * 2 - [1, h / w] lifting_target_label[..., 2] = lifting_target_label[..., 2] / w * 2 - lifting_target_label[..., :, :] = lifting_target_label[ - ..., :, :] - lifting_target_label[..., - self.root_index:self.root_index + - 1, :] + lifting_target_label[..., :, :] = ( + lifting_target_label[..., :, :] - lifting_target_label[..., self.root_index : self.root_index + 1, :] + ) if factor is None or factor[0] == 0: factor = factor_ if factor.ndim == 1: factor = factor[:, None] - if self.mode == 'test': + if self.mode == "test": lifting_target_label *= factor[..., None] if self.concat_vis: keypoints_visible_ = keypoints_visible if keypoints_visible.ndim == 2: keypoints_visible_ = keypoints_visible[..., None] - keypoint_labels = np.concatenate( - (keypoint_labels, keypoints_visible_), axis=2) + keypoint_labels = np.concatenate((keypoint_labels, keypoints_visible_), axis=2) - encoded['keypoint_labels'] = keypoint_labels - encoded['keypoint_labels_visible'] = keypoints_visible - encoded['lifting_target_label'] = lifting_target_label - encoded['lifting_target_weight'] = lifting_target_weight - encoded['lifting_target'] = lifting_target_label - 
encoded['lifting_target_visible'] = lifting_target_visible - encoded['factor'] = factor + encoded["keypoint_labels"] = keypoint_labels + encoded["keypoint_labels_visible"] = keypoints_visible + encoded["lifting_target_label"] = lifting_target_label + encoded["lifting_target_weight"] = lifting_target_weight + encoded["lifting_target"] = lifting_target_label + encoded["lifting_target_visible"] = lifting_target_visible + encoded["factor"] = factor return encoded @@ -212,29 +204,24 @@ class MotionBERTLabel(BaseKeypointCodec): keypoints[..., 0, :] = 0 if w is not None and w.size > 0: - assert w.shape == h.shape, (f'w and h should have the same shape, ' - f'but got {w.shape} and {h.shape}.') + assert w.shape == h.shape, f"w and h should have the same shape, " f"but got {w.shape} and {h.shape}." assert w.shape[0] == keypoints.shape[0], ( - f'w and h should have the same batch size, ' - f'but got {w.shape[0]} and {keypoints.shape[0]}.') - assert w.ndim in {1, - 2}, (f'w and h should have 1 or 2 dimensions, ' - f'but got {w.ndim}.') + f"w and h should have the same batch size, " f"but got {w.shape[0]} and {keypoints.shape[0]}." + ) + assert w.ndim in {1, 2}, f"w and h should have 1 or 2 dimensions, " f"but got {w.ndim}." if w.ndim == 1: w = w[:, None] h = h[:, None] - trans = np.append( - np.ones((w.shape[0], 1)), h / w, axis=1)[:, None, :] + trans = np.append(np.ones((w.shape[0], 1)), h / w, axis=1)[:, None, :] keypoints[..., :2] = (keypoints[..., :2] + trans) * w[:, None] / 2 keypoints[..., 2:] = keypoints[..., 2:] * w[:, None] / 2 if factor is not None and factor.size > 0: assert factor.shape[0] == keypoints.shape[0], ( - f'factor should have the same batch size, ' - f'but got {factor.shape[0]} and {keypoints.shape[0]}.') + f"factor should have the same batch size, " f"but got {factor.shape[0]} and {keypoints.shape[0]}." + ) keypoints *= factor[..., None] - keypoints[..., :, :] = keypoints[..., :, :] - keypoints[ - ..., self.root_index:self.root_index + 1, :] - keypoints /= 1000. + keypoints[..., :, :] = keypoints[..., :, :] - keypoints[..., self.root_index : self.root_index + 1, :] + keypoints /= 1000.0 return keypoints, scores diff --git a/mmpose/codecs/msra_heatmap.py b/mmpose/codecs/msra_heatmap.py index 15742555b495560c9dfa095a3cdc93ba0eb5d928..4a5f71f32f50738870c65a2cb5c54b31a6a21804 100644 --- a/mmpose/codecs/msra_heatmap.py +++ b/mmpose/codecs/msra_heatmap.py @@ -4,9 +4,9 @@ from typing import Optional, Tuple import numpy as np from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec -from .utils.gaussian_heatmap import (generate_gaussian_heatmaps, - generate_unbiased_gaussian_heatmaps) +from .utils.gaussian_heatmap import generate_gaussian_heatmaps, generate_unbiased_gaussian_heatmaps from .utils.post_processing import get_heatmap_maximum from .utils.refinement import refine_keypoints, refine_keypoints_dark @@ -47,15 +47,16 @@ class MSRAHeatmap(BaseKeypointCodec): .. 
_`Dark Pose`: https://arxiv.org/abs/1910.06278 """ - label_mapping_table = dict(keypoint_weights='keypoint_weights', ) - field_mapping_table = dict(heatmaps='heatmaps', ) + label_mapping_table = dict( + keypoint_weights="keypoint_weights", + ) + field_mapping_table = dict( + heatmaps="heatmaps", + ) - def __init__(self, - input_size: Tuple[int, int], - heatmap_size: Tuple[int, int], - sigma: float, - unbiased: bool = False, - blur_kernel_size: int = 11) -> None: + def __init__( + self, input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: float, unbiased: bool = False, blur_kernel_size: int = 11 + ) -> None: super().__init__() self.input_size = input_size self.heatmap_size = heatmap_size @@ -71,12 +72,9 @@ class MSRAHeatmap(BaseKeypointCodec): # sigma~=1.5 if ks=7; # sigma~=1 if ks=3; self.blur_kernel_size = blur_kernel_size - self.scale_factor = (np.array(input_size) / - heatmap_size).astype(np.float32) + self.scale_factor = (np.array(input_size) / heatmap_size).astype(np.float32) - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None) -> dict: + def encode(self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> dict: """Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space. @@ -93,9 +91,7 @@ class MSRAHeatmap(BaseKeypointCodec): (N, K) """ - assert keypoints.shape[0] == 1, ( - f'{self.__class__.__name__} only support single-instance ' - 'keypoint encoding') + assert keypoints.shape[0] == 1, f"{self.__class__.__name__} only support single-instance " "keypoint encoding" if keypoints_visible is None: keypoints_visible = np.ones(keypoints.shape[:2], dtype=np.float32) @@ -105,13 +101,15 @@ class MSRAHeatmap(BaseKeypointCodec): heatmap_size=self.heatmap_size, keypoints=keypoints / self.scale_factor, keypoints_visible=keypoints_visible, - sigma=self.sigma) + sigma=self.sigma, + ) else: heatmaps, keypoint_weights = generate_gaussian_heatmaps( heatmap_size=self.heatmap_size, keypoints=keypoints / self.scale_factor, keypoints_visible=keypoints_visible, - sigma=self.sigma) + sigma=self.sigma, + ) encoded = dict(heatmaps=heatmaps, keypoint_weights=keypoint_weights) @@ -141,8 +139,7 @@ class MSRAHeatmap(BaseKeypointCodec): if self.unbiased: # Alleviate biased coordinate - keypoints = refine_keypoints_dark( - keypoints, heatmaps, blur_kernel_size=self.blur_kernel_size) + keypoints = refine_keypoints_dark(keypoints, heatmaps, blur_kernel_size=self.blur_kernel_size) else: keypoints = refine_keypoints(keypoints, heatmaps) diff --git a/mmpose/codecs/oks_argmax_heatmap.py b/mmpose/codecs/oks_argmax_heatmap.py new file mode 100644 index 0000000000000000000000000000000000000000..89065171435a1a574bbe29cbe41e2c98f76f9672 --- /dev/null +++ b/mmpose/codecs/oks_argmax_heatmap.py @@ -0,0 +1,342 @@ +# Copyright (c) Miroslav Purkrabek, ProbPose. All rights reserved. +import os +from typing import Optional, Tuple + +import cv2 +import numpy as np + +from mmpose.registry import KEYPOINT_CODECS +from mmpose.utils import get_root_logger + +from .base import BaseKeypointCodec +from .utils import ( + gaussian_blur, + generate_offset_heatmap, + generate_oks_maps, + generate_udp_gaussian_heatmaps, + get_heatmap_expected_value, + get_heatmap_maximum, + refine_keypoints_dark_udp, +) + + +@KEYPOINT_CODECS.register_module() +class OKSArgMaxHeatmap(BaseKeypointCodec): + r"""Generate keypoint heatmaps by Unbiased Data Processing (UDP). 
+ See the paper: `The Devil is in the Details: Delving into Unbiased Data + Processing for Human Pose Estimation`_ by Huang et al (2020) for details. + + Note: + + - instance number: N + - keypoint number: K + - keypoint dimension: D + - image size: [w, h] + - heatmap size: [W, H] + + Encoded: + + - heatmap (np.ndarray): The generated heatmap in shape (C_out, H, W) + where [W, H] is the `heatmap_size`, and the C_out is the output + channel number which depends on the `heatmap_type`. If + `heatmap_type=='gaussian'`, C_out equals to keypoint number K; + if `heatmap_type=='combined'`, C_out equals to K*3 + (x_offset, y_offset and class label) + - keypoint_weights (np.ndarray): The target weights in shape (K,) + + Args: + input_size (tuple): Image size in [w, h] + heatmap_size (tuple): Heatmap size in [W, H] + heatmap_type (str): The heatmap type to encode the keypoitns. Options + are: + + - ``'gaussian'``: Gaussian heatmap + - ``'combined'``: Combination of a binary label map and offset + maps for X and Y axes. + + sigma (float): The sigma value of the Gaussian heatmap when + ``heatmap_type=='gaussian'``. Defaults to 2.0 + radius_factor (float): The radius factor of the binary label + map when ``heatmap_type=='combined'``. The positive region is + defined as the neighbor of the keypoit with the radius + :math:`r=radius_factor*max(W, H)`. Defaults to 0.0546875 + blur_kernel_size (int): The Gaussian blur kernel size of the heatmap + modulation in DarkPose. Defaults to 11 + + .. _`The Devil is in the Details: Delving into Unbiased Data Processing for + Human Pose Estimation`: https://arxiv.org/abs/1911.07524 + """ + + label_mapping_table = dict( + keypoint_weights="keypoint_weights", + ) + field_mapping_table = dict( + heatmaps="heatmaps", + ) + + def __init__( + self, + input_size: Tuple[int, int], + heatmap_size: Tuple[int, int], + heatmap_type: str = "gaussian", + sigma: float = -1, + radius_factor: float = 0.0546875, + blur_kernel_size: int = 11, + increase_sigma_with_padding=False, + ) -> None: + super().__init__() + self.input_size = input_size + self.heatmap_size = heatmap_size + self.radius_factor = radius_factor + self.heatmap_type = heatmap_type + self.blur_kernel_size = blur_kernel_size + self.scale_factor = ((np.array(input_size) - 1) / (np.array(heatmap_size) - 1)).astype(np.float32) + self.increase_sigma_with_padding = increase_sigma_with_padding + self.sigma = sigma + + if self.heatmap_type not in {"gaussian", "combined"}: + raise ValueError( + f"{self.__class__.__name__} got invalid `heatmap_type` value" + f"{self.heatmap_type}. Should be one of " + '{"gaussian", "combined"}' + ) + + def encode( + self, + keypoints: np.ndarray, + keypoints_visible: Optional[np.ndarray] = None, + id_similarity: Optional[float] = 0.0, + keypoints_visibility: Optional[np.ndarray] = None, + ) -> dict: + """Encode keypoints into heatmaps. Note that the original keypoint + coordinates should be in the input image space. + + Args: + keypoints (np.ndarray): Keypoint coordinates in shape (N, K, D) + keypoints_visible (np.ndarray): Keypoint visibilities in shape + (N, K) + id_similarity (float): The usefulness of the identity information + for the whole pose. Defaults to 0.0 + keypoints_visibility (np.ndarray): The visibility bit for each + keypoint (N, K). Defaults to None + + Returns: + dict: + - heatmap (np.ndarray): The generated heatmap in shape + (C_out, H, W) where [W, H] is the `heatmap_size`, and the + C_out is the output channel number which depends on the + `heatmap_type`. 
If `heatmap_type=='gaussian'`, C_out equals to + keypoint number K; if `heatmap_type=='combined'`, C_out + equals to K*3 (x_offset, y_offset and class label) + - keypoint_weights (np.ndarray): The target weights in shape + (K,) + """ + assert keypoints.shape[0] == 1, f"{self.__class__.__name__} only support single-instance " "keypoint encoding" + + if keypoints_visibility is None: + keypoints_visibility = np.zeros(keypoints.shape[:2], dtype=np.float32) + + if keypoints_visible is None: + keypoints_visible = np.ones(keypoints.shape[:2], dtype=np.float32) + + heatmaps, keypoint_weights = generate_oks_maps( + heatmap_size=self.heatmap_size, + keypoints=keypoints / self.scale_factor, + keypoints_visible=keypoints_visible, + keypoints_visibility=keypoints_visibility, + sigma=self.sigma, + increase_sigma_with_padding=self.increase_sigma_with_padding, + ) + + annotated = keypoints_visible > 0 + + in_image = np.logical_and( + keypoints[:, :, 0] >= 0, + keypoints[:, :, 0] < self.input_size[0], + ) + in_image = np.logical_and( + in_image, + keypoints[:, :, 1] >= 0, + ) + in_image = np.logical_and( + in_image, + keypoints[:, :, 1] < self.input_size[1], + ) + + encoded = dict( + heatmaps=heatmaps, + keypoint_weights=keypoint_weights, + annotated=annotated, + in_image=in_image, + keypoints_scaled=keypoints, + # identification_similarity=id_similarity, + ) + + return encoded + + def decode(self, encoded: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: + """Decode keypoint coordinates from heatmaps. The decoded keypoint + coordinates are in the input image space. + + Args: + encoded (np.ndarray): Heatmaps in shape (K, H, W) + + Returns: + tuple: + - keypoints (np.ndarray): Decoded keypoint coordinates in shape + (N, K, D) + - scores (np.ndarray): The keypoint scores in shape (N, K). 
It + usually represents the confidence of the keypoint prediction + """ + heatmaps = encoded.copy() + W, H = self.heatmap_size + + logger = get_root_logger() + + if self.heatmap_type == "gaussian": + + # Set the bottom 20% of the heatmaps to zero to filter-out low values (~noise) + # bottom_threshold = np.percentile(heatmaps, 20) + # heatmaps[heatmaps < bottom_threshold] = 0 + + keypoints_max, scores = get_heatmap_maximum(heatmaps) + # unsqueeze the instance dimension for single-instance results + keypoints_max = keypoints_max[None] + scores = scores[None] + + try: + keypoints = refine_keypoints_dark_udp(keypoints_max.copy(), heatmaps, blur_kernel_size=self.blur_kernel_size) + # print(keypoints_max.shape, keypoints_max.dtype, keypoints.shape, keypoints.dtype) + except np.linalg.LinAlgError: + keypoints = keypoints_max.copy() + logger.warning("LinAlgError in np.linalg.solve, " "use the non-refined keypoints instead") + + draw_comparison = False + if draw_comparison: + + keypoints_exp, _, oks_maps = get_heatmap_expected_value(encoded.copy(), return_heatmap=True) + keypoints_exp = keypoints_exp.reshape(keypoints.shape) + + dist = np.linalg.norm(keypoints - keypoints_exp, axis=-1) + + # keypoints = keypoints_exp + # keypoints[dist > 1] = keypoints_exp[dist > 1] + + for k in range(keypoints.shape[1]): + # continue + d = dist[0, k] + + # 1/4 of heatmap pixel is 1 pixel in image space + if 0.5 < d: + + # Skip 80% of the heatmaps to save time and space + if np.random.rand() < 0.8: + continue + + # size = np.array([W, H]) + size = self.input_size + + # Draw heatmaps with estimated values + htm = encoded.copy()[k, :, :] + # kpt_max, _ = get_heatmap_maximum(htm.reshape(1, H, W).copy()) + # kpt_max = np.array(kpt_max).reshape(2) + htm = cv2.resize(htm, (size[0], size[1])) + htm = cv2.cvtColor(htm, cv2.COLOR_GRAY2BGR) + htm /= htm.max() + htm = cv2.applyColorMap((htm * 255).astype(np.uint8), cv2.COLORMAP_JET) + htm_exp = htm.copy() + htm_max = htm.copy() + kpt = keypoints[0, k, :] + kpt_max = keypoints_max[0, k, :] + kpt_exp = keypoints_exp[0, k, :] + + kpt = (kpt / [W - 1, H - 1] * size).flatten() + kpt_exp = (kpt_exp / [W - 1, H - 1] * size).flatten() + kpt_max = (kpt_max / [W - 1, H - 1] * size).flatten() + + # kpt[0] = np.clip(kpt[0], 0, size[0] - 1) + # kpt[1] = np.clip(kpt[1], 0, size[1] - 1) + # kpt_exp[0] = np.clip(kpt_exp[0], 0, size[0] - 1) + # kpt_exp[1] = np.clip(kpt_exp[1], 0, size[1] - 1) + # kpt_max[0] = np.clip(kpt_max[0], 0, size[0] - 1) + # kpt_max[1] = np.clip(kpt_max[1], 0, size[1] - 1) + + kpt = kpt.astype(int) + kpt_exp = kpt_exp.astype(int) + kpt_max = kpt_max.astype(int) + + htm_raw = htm.copy() + + htm_center = np.array(size) // 2 + htm = cv2.arrowedLine(htm, htm_center, kpt, (191, 64, 191), thickness=1, tipLength=0.05) + htm_exp = cv2.arrowedLine(htm_exp, htm_center, kpt_exp, (191, 64, 191), thickness=1, tipLength=0.05) + htm_max = cv2.arrowedLine(htm_max, htm_center, kpt_max, (191, 64, 191), thickness=1, tipLength=0.05) + + white_column = np.ones((size[1], 3, 3), dtype=np.uint8) * 150 + save_img = np.hstack((htm_max, white_column, htm, white_column, htm_exp)) + + oksm = oks_maps[k, :, :] + oksm = cv2.resize(oksm, (size[0], size[1])) + oksm /= oksm.max() + oksm = cv2.cvtColor(oksm, cv2.COLOR_GRAY2BGR) + oksm = cv2.applyColorMap((oksm * 255).astype(np.uint8), cv2.COLORMAP_JET) + + raw_htm = encoded[k, :, :].copy().reshape(1, H, W) + blur_htm = gaussian_blur(raw_htm.copy(), self.blur_kernel_size).squeeze() + blur_htm = cv2.resize(blur_htm, (size[0], size[1])) + blur_htm /= 
blur_htm.max() + blur_htm = cv2.cvtColor(blur_htm, cv2.COLOR_GRAY2BGR) + blur_htm = cv2.applyColorMap((blur_htm * 255).astype(np.uint8), cv2.COLORMAP_JET) + + raw_htm = cv2.resize(raw_htm.squeeze(), (size[0], size[1])) + raw_htm /= raw_htm.max() + raw_htm = cv2.cvtColor(raw_htm, cv2.COLOR_GRAY2BGR) + raw_htm = cv2.applyColorMap((raw_htm * 255).astype(np.uint8), cv2.COLORMAP_JET) + + oksm_merge = oksm.copy() + oksm_merge = cv2.drawMarker(oksm_merge, kpt_exp, (191, 64, 191), cv2.MARKER_CROSS, 10, 2) + + htm_merge = blur_htm.copy() + htm_merge = cv2.drawMarker(htm_merge, kpt, (255, 159, 207), cv2.MARKER_CROSS, 10, 2) + htm_merge = cv2.drawMarker(htm_merge, kpt_max, (255, 255, 255), cv2.MARKER_CROSS, 10, 2) + + save_heatmaps = np.hstack((raw_htm, white_column, blur_htm, white_column, oksm)) + white_row = np.ones((3, save_img.shape[1], 3), dtype=np.uint8) * 150 + save_img = np.vstack((save_img, white_row, save_heatmaps)) + + os.makedirs("debug", exist_ok=True) + save_path = "debug/{:04.1f}_{:d}_{:06d}.png".format(d, k, abs(hash(str(keypoints[0, k, :])) % (10**6))) + cv2.imwrite(save_path, save_img) + save_path = "debug/{:04.1f}_{:d}_{:06d}_merge.png".format(d, k, abs(hash(str(keypoints[0, k, :])) % (10**6))) + cv2.imwrite(save_path, np.hstack((htm_merge, white_column, oksm_merge))) + save_path = "debug/{:04.1f}_{:d}_{:06d}_blur.png".format(d, k, abs(hash(str(keypoints[0, k, :])) % (10**6))) + cv2.imwrite(save_path, htm_merge) + save_path = "debug/{:04.1f}_{:d}_{:06d}_oks.png".format(d, k, abs(hash(str(keypoints[0, k, :])) % (10**6))) + cv2.imwrite(save_path, oksm_merge) + + elif self.heatmap_type == "combined": + _K, H, W = heatmaps.shape + K = _K // 3 + + for cls_heatmap in heatmaps[::3]: + # Apply Gaussian blur on classification maps + ks = 2 * self.blur_kernel_size + 1 + cv2.GaussianBlur(cls_heatmap, (ks, ks), 0, cls_heatmap) + + # valid radius + radius = self.radius_factor * max(W, H) + + x_offset = heatmaps[1::3].flatten() * radius + y_offset = heatmaps[2::3].flatten() * radius + keypoints, scores = get_heatmap_maximum(heatmaps=heatmaps[::3]) + index = (keypoints[..., 0] + keypoints[..., 1] * W).flatten() + index += W * H * np.arange(0, K) + index = index.astype(int) + keypoints += np.stack((x_offset[index], y_offset[index]), axis=-1) + # unsqueeze the instance dimension for single-instance results + keypoints = keypoints[None].astype(np.float32) + scores = scores[None] + + keypoints = keypoints / [W - 1, H - 1] * self.input_size + + return keypoints, scores diff --git a/mmpose/codecs/onehot_heatmap.py b/mmpose/codecs/onehot_heatmap.py index e820271f6c92ec93cb3abec3009b7acb9d804e1f..7c95e2208f6e6be81e8dda1c2d7c961c643c865d 100644 --- a/mmpose/codecs/onehot_heatmap.py +++ b/mmpose/codecs/onehot_heatmap.py @@ -5,9 +5,9 @@ import cv2 import numpy as np from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec -from .utils import (generate_offset_heatmap, generate_onehot_heatmaps, - get_heatmap_maximum, refine_keypoints_dark_udp) +from .utils import generate_offset_heatmap, generate_onehot_heatmaps, get_heatmap_maximum, refine_keypoints_dark_udp @KEYPOINT_CODECS.register_module() @@ -57,20 +57,25 @@ class OneHotHeatmap(BaseKeypointCodec): Human Pose Estimation`: https://arxiv.org/abs/1911.07524 """ - label_mapping_table = dict(keypoint_weights='keypoint_weights', ) - field_mapping_table = dict(heatmaps='heatmaps', ) - - def __init__(self, - input_size: Tuple[int, int], - heatmap_size: Tuple[int, int], - heatmap_type: str = 'gaussian', - sigma: float = 2., - 
radius_factor: float = 0.0546875, - blur_kernel_size: int = 11, - increase_sigma_with_padding=False, - amap_scale: float = 1.0, - normalize=None, - ) -> None: + label_mapping_table = dict( + keypoint_weights="keypoint_weights", + ) + field_mapping_table = dict( + heatmaps="heatmaps", + ) + + def __init__( + self, + input_size: Tuple[int, int], + heatmap_size: Tuple[int, int], + heatmap_type: str = "gaussian", + sigma: float = 2.0, + radius_factor: float = 0.0546875, + blur_kernel_size: int = 11, + increase_sigma_with_padding=False, + amap_scale: float = 1.0, + normalize=None, + ) -> None: super().__init__() self.input_size = np.array(input_size) self.heatmap_size = np.array(heatmap_size) @@ -82,16 +87,16 @@ class OneHotHeatmap(BaseKeypointCodec): self.normalize = normalize self.amap_size = self.input_size * amap_scale - self.scale_factor = ((self.amap_size - 1) / - (self.heatmap_size - 1)).astype(np.float32) + self.scale_factor = ((self.amap_size - 1) / (self.heatmap_size - 1)).astype(np.float32) self.input_center = self.input_size / 2 self.top_left = self.input_center - self.amap_size / 2 - - if self.heatmap_type not in {'gaussian', 'combined'}: + + if self.heatmap_type not in {"gaussian", "combined"}: raise ValueError( - f'{self.__class__.__name__} got invalid `heatmap_type` value' - f'{self.heatmap_type}. Should be one of ' - '{"gaussian", "combined"}') + f"{self.__class__.__name__} got invalid `heatmap_type` value" + f"{self.heatmap_type}. Should be one of " + '{"gaussian", "combined"}' + ) def _kpts_to_activation_pts(self, keypoints: np.ndarray) -> np.ndarray: """ @@ -104,7 +109,7 @@ class OneHotHeatmap(BaseKeypointCodec): transformed_keypoints = keypoints - self.top_left transformed_keypoints = transformed_keypoints / self.scale_factor return transformed_keypoints - + def _activation_pts_to_kpts(self, keypoints: np.ndarray) -> np.ndarray: """ Transform the points in activation map to the keypoint coordinates. @@ -118,11 +123,13 @@ class OneHotHeatmap(BaseKeypointCodec): transformed_keypoints += self.top_left return transformed_keypoints - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None, - id_similarity: Optional[float] = 0.0, - keypoints_visibility: Optional[np.ndarray] = None) -> dict: + def encode( + self, + keypoints: np.ndarray, + keypoints_visible: Optional[np.ndarray] = None, + id_similarity: Optional[float] = 0.0, + keypoints_visibility: Optional[np.ndarray] = None, + ) -> dict: """Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space. 
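To make the activation-map bookkeeping concrete, here is a minimal round-trip sketch of `_kpts_to_activation_pts` and `_activation_pts_to_kpts` as defined above (sizes are hypothetical; with `amap_scale=1.0` the activation map coincides with the input image, so `top_left` reduces to the origin):

```python
import numpy as np

# Hypothetical sizes: input 192x256, heatmap 48x64, amap_scale = 1.0.
input_size = np.array([192.0, 256.0])
heatmap_size = np.array([48.0, 64.0])
amap_size = input_size * 1.0
scale_factor = ((amap_size - 1) / (heatmap_size - 1)).astype(np.float32)
top_left = input_size / 2 - amap_size / 2  # (0, 0) when amap_scale == 1.0

kpts = np.array([[[100.0, 120.0]]])        # (N=1, K=1, D=2), input image space
act = (kpts - top_left) / scale_factor     # _kpts_to_activation_pts
back = act * scale_factor + top_left       # _activation_pts_to_kpts
assert np.allclose(back, kpts)             # the two transforms are inverses
```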
@@ -146,36 +153,37 @@ class OneHotHeatmap(BaseKeypointCodec): - keypoint_weights (np.ndarray): The target weights in shape (K,) """ - assert keypoints.shape[0] == 1, ( - f'{self.__class__.__name__} only support single-instance ' - 'keypoint encoding') - + assert keypoints.shape[0] == 1, f"{self.__class__.__name__} only support single-instance " "keypoint encoding" + if keypoints_visibility is None: keypoints_visibility = np.zeros(keypoints.shape[:2], dtype=np.float32) if keypoints_visible is None: keypoints_visible = np.ones(keypoints.shape[:2], dtype=np.float32) - if self.heatmap_type == 'gaussian': + if self.heatmap_type == "gaussian": heatmaps, keypoint_weights = generate_onehot_heatmaps( heatmap_size=self.heatmap_size, keypoints=self._kpts_to_activation_pts(keypoints), keypoints_visible=keypoints_visible, sigma=self.sigma, keypoints_visibility=keypoints_visibility, - increase_sigma_with_padding=self.increase_sigma_with_padding) - elif self.heatmap_type == 'combined': + increase_sigma_with_padding=self.increase_sigma_with_padding, + ) + elif self.heatmap_type == "combined": heatmaps, keypoint_weights = generate_offset_heatmap( heatmap_size=self.heatmap_size, keypoints=self._kpts_to_activation_pts(keypoints), keypoints_visible=keypoints_visible, - radius_factor=self.radius_factor) + radius_factor=self.radius_factor, + ) else: raise ValueError( - f'{self.__class__.__name__} got invalid `heatmap_type` value' - f'{self.heatmap_type}. Should be one of ' - '{"gaussian", "combined"}') - + f"{self.__class__.__name__} got invalid `heatmap_type` value" + f"{self.heatmap_type}. Should be one of " + '{"gaussian", "combined"}' + ) + if self.normalize is not None: heatmaps_sum = np.sum(heatmaps, axis=(1, 2), keepdims=False) mask = heatmaps_sum > 0 @@ -183,7 +191,7 @@ class OneHotHeatmap(BaseKeypointCodec): heatmaps = heatmaps * self.normalize annotated = keypoints_visible > 0 - + heatmap_keypoints = self._kpts_to_activation_pts(keypoints) in_image = np.logical_and( heatmap_keypoints[:, :, 0] >= 0, @@ -197,7 +205,7 @@ class OneHotHeatmap(BaseKeypointCodec): in_image, heatmap_keypoints[:, :, 1] < self.heatmap_size[1], ) - + encoded = dict( heatmaps=heatmaps, keypoint_weights=keypoint_weights, @@ -226,16 +234,15 @@ class OneHotHeatmap(BaseKeypointCodec): """ heatmaps = encoded.copy() - if self.heatmap_type == 'gaussian': + if self.heatmap_type == "gaussian": keypoints, scores = get_heatmap_maximum(heatmaps) # unsqueeze the instance dimension for single-instance results keypoints = keypoints[None] scores = scores[None] - keypoints = refine_keypoints_dark_udp( - keypoints, heatmaps, blur_kernel_size=self.blur_kernel_size) + keypoints = refine_keypoints_dark_udp(keypoints, heatmaps, blur_kernel_size=self.blur_kernel_size) - elif self.heatmap_type == 'combined': + elif self.heatmap_type == "combined": _K, H, W = heatmaps.shape K = _K // 3 diff --git a/mmpose/codecs/regression_label.py b/mmpose/codecs/regression_label.py index 74cd21b73dcadc5ad4df2a5f270da9e0a2ce3a68..33f2402926c5b0398e17b33d2c769bfe87e09cac 100644 --- a/mmpose/codecs/regression_label.py +++ b/mmpose/codecs/regression_label.py @@ -5,6 +5,7 @@ from typing import Optional, Tuple import numpy as np from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec @@ -31,8 +32,8 @@ class RegressionLabel(BaseKeypointCodec): """ label_mapping_table = dict( - keypoint_labels='keypoint_labels', - keypoint_weights='keypoint_weights', + keypoint_labels="keypoint_labels", + keypoint_weights="keypoint_weights", ) def __init__(self, 
input_size: Tuple[int, int]) -> None: @@ -40,9 +41,7 @@ class RegressionLabel(BaseKeypointCodec): self.input_size = input_size - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None) -> dict: + def encode(self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> dict: """Encoding keypoints from input image space to normalized space. Args: @@ -61,15 +60,12 @@ class RegressionLabel(BaseKeypointCodec): keypoints_visible = np.ones(keypoints.shape[:2], dtype=np.float32) w, h = self.input_size - valid = ((keypoints >= 0) & - (keypoints <= [w - 1, h - 1])).all(axis=-1) & ( - keypoints_visible > 0.5) + valid = ((keypoints >= 0) & (keypoints <= [w - 1, h - 1])).all(axis=-1) & (keypoints_visible > 0.5) keypoint_labels = (keypoints / np.array([w, h])).astype(np.float32) - keypoint_weights = np.where(valid, 1., 0.).astype(np.float32) + keypoint_weights = np.where(valid, 1.0, 0.0).astype(np.float32) - encoded = dict( - keypoint_labels=keypoint_labels, keypoint_weights=keypoint_weights) + encoded = dict(keypoint_labels=keypoint_labels, keypoint_weights=keypoint_weights) return encoded @@ -98,9 +94,7 @@ class RegressionLabel(BaseKeypointCodec): scores = (1 - output_sigma).mean(axis=-1) else: - raise ValueError( - 'Keypoint dimension should be 2 or 4 (with sigma), ' - f'but got {encoded.shape[-1]}') + raise ValueError("Keypoint dimension should be 2 or 4 (with sigma), " f"but got {encoded.shape[-1]}") w, h = self.input_size keypoints = normalized_coords * np.array([w, h]) diff --git a/mmpose/codecs/simcc_label.py b/mmpose/codecs/simcc_label.py index e83960faafbf6e0852ae0dbdd361989cbcfaa24b..df9df481364b7c551185fd1a79f9381083b4d2ef 100644 --- a/mmpose/codecs/simcc_label.py +++ b/mmpose/codecs/simcc_label.py @@ -7,6 +7,7 @@ import numpy as np from mmpose.codecs.utils import get_simcc_maximum from mmpose.codecs.utils.refinement import refine_simcc_dark from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec @@ -59,15 +60,15 @@ class SimCCLabel(BaseKeypointCodec): """ label_mapping_table = dict( - keypoint_x_labels='keypoint_x_labels', - keypoint_y_labels='keypoint_y_labels', - keypoint_weights='keypoint_weights', + keypoint_x_labels="keypoint_x_labels", + keypoint_y_labels="keypoint_y_labels", + keypoint_weights="keypoint_weights", ) def __init__( self, input_size: Tuple[int, int], - smoothing_type: str = 'gaussian', + smoothing_type: str = "gaussian", sigma: Union[float, int, Tuple[float]] = 6.0, simcc_split_ratio: float = 2.0, label_smooth_weight: float = 0.0, @@ -92,22 +93,20 @@ class SimCCLabel(BaseKeypointCodec): else: self.sigma = np.array(sigma) - if self.smoothing_type not in {'gaussian', 'standard'}: + if self.smoothing_type not in {"gaussian", "standard"}: raise ValueError( - f'{self.__class__.__name__} got invalid `smoothing_type` value' - f'{self.smoothing_type}. Should be one of ' - '{"gaussian", "standard"}') + f"{self.__class__.__name__} got invalid `smoothing_type` value" + f"{self.smoothing_type}. 
Should be one of " + '{"gaussian", "standard"}' + ) - if self.smoothing_type == 'gaussian' and self.label_smooth_weight > 0: - raise ValueError('Attribute `label_smooth_weight` is only ' - 'used for `standard` mode.') + if self.smoothing_type == "gaussian" and self.label_smooth_weight > 0: + raise ValueError("Attribute `label_smooth_weight` is only " "used for `standard` mode.") if self.label_smooth_weight < 0.0 or self.label_smooth_weight > 1.0: - raise ValueError('`label_smooth_weight` should be in range [0, 1]') + raise ValueError("`label_smooth_weight` should be in range [0, 1]") - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None) -> dict: + def encode(self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> dict: """Encoding keypoints into SimCC labels. Note that the original keypoint coordinates should be in the input image space. @@ -134,27 +133,22 @@ class SimCCLabel(BaseKeypointCodec): if keypoints_visible is None: keypoints_visible = np.ones(keypoints.shape[:2], dtype=np.float32) - if self.smoothing_type == 'gaussian': - x_labels, y_labels, keypoint_weights = self._generate_gaussian( - keypoints, keypoints_visible) - elif self.smoothing_type == 'standard': - x_labels, y_labels, keypoint_weights = self._generate_standard( - keypoints, keypoints_visible) + if self.smoothing_type == "gaussian": + x_labels, y_labels, keypoint_weights = self._generate_gaussian(keypoints, keypoints_visible) + elif self.smoothing_type == "standard": + x_labels, y_labels, keypoint_weights = self._generate_standard(keypoints, keypoints_visible) else: raise ValueError( - f'{self.__class__.__name__} got invalid `smoothing_type` value' - f'{self.smoothing_type}. Should be one of ' - '{"gaussian", "standard"}') + f"{self.__class__.__name__} got invalid `smoothing_type` value" + f"{self.smoothing_type}. Should be one of " + '{"gaussian", "standard"}' + ) - encoded = dict( - keypoint_x_labels=x_labels, - keypoint_y_labels=y_labels, - keypoint_weights=keypoint_weights) + encoded = dict(keypoint_x_labels=x_labels, keypoint_y_labels=y_labels, keypoint_weights=keypoint_weights) return encoded - def decode(self, simcc_x: np.ndarray, - simcc_y: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: + def decode(self, simcc_x: np.ndarray, simcc_y: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: """Decode keypoint coordinates from SimCC representations. The decoded coordinates are in the input image space. 
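For intuition, a stripped-down SimCC decode is a per-axis argmax followed by division with `simcc_split_ratio`; the decode hunk below additionally applies DARK refinement via `refine_simcc_dark`, which this sketch omits. Shapes are assumed, and taking the element-wise minimum of the two axis maxima as the score is one common convention, not necessarily what `get_simcc_maximum` does:

```python
import numpy as np

simcc_split_ratio = 2.0
N, K, W, H = 1, 17, 192, 256
simcc_x = np.random.rand(N, K, int(W * simcc_split_ratio))  # x classification bins
simcc_y = np.random.rand(N, K, int(H * simcc_split_ratio))  # y classification bins

x_locs = simcc_x.argmax(axis=-1).astype(np.float32)
y_locs = simcc_y.argmax(axis=-1).astype(np.float32)
keypoints = np.stack([x_locs, y_locs], axis=-1) / simcc_split_ratio  # input space
scores = np.minimum(simcc_x.max(axis=-1), simcc_y.max(axis=-1))      # (N, K)
```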
@@ -183,27 +177,20 @@ class SimCCLabel(BaseKeypointCodec): y_blur = int((self.sigma[1] * 20 - 7) // 3) x_blur -= int((x_blur % 2) == 0) y_blur -= int((y_blur % 2) == 0) - keypoints[:, :, 0] = refine_simcc_dark(keypoints[:, :, 0], simcc_x, - x_blur) - keypoints[:, :, 1] = refine_simcc_dark(keypoints[:, :, 1], simcc_y, - y_blur) + keypoints[:, :, 0] = refine_simcc_dark(keypoints[:, :, 0], simcc_x, x_blur) + keypoints[:, :, 1] = refine_simcc_dark(keypoints[:, :, 1], simcc_y, y_blur) keypoints /= self.simcc_split_ratio if self.decode_visibility: _, visibility = get_simcc_maximum( - simcc_x * self.decode_beta * self.sigma[0], - simcc_y * self.decode_beta * self.sigma[1], - apply_softmax=True) + simcc_x * self.decode_beta * self.sigma[0], simcc_y * self.decode_beta * self.sigma[1], apply_softmax=True + ) return keypoints, (scores, visibility) else: return keypoints, scores - def _map_coordinates( - self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None - ) -> Tuple[np.ndarray, np.ndarray]: + def _map_coordinates(self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray]: """Mapping keypoint coordinates into SimCC space.""" keypoints_split = keypoints.copy() @@ -214,9 +201,7 @@ class SimCCLabel(BaseKeypointCodec): return keypoints_split, keypoint_weights def _generate_standard( - self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None + self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: """Encoding keypoints into SimCC labels with Standard Label Smoothing strategy. @@ -229,8 +214,7 @@ class SimCCLabel(BaseKeypointCodec): W = np.around(w * self.simcc_split_ratio).astype(int) H = np.around(h * self.simcc_split_ratio).astype(int) - keypoints_split, keypoint_weights = self._map_coordinates( - keypoints, keypoints_visible) + keypoints_split, keypoint_weights = self._map_coordinates(keypoints, keypoints_visible) target_x = np.zeros((N, K, W), dtype=np.float32) target_y = np.zeros((N, K, H), dtype=np.float32) @@ -258,9 +242,7 @@ class SimCCLabel(BaseKeypointCodec): return target_x, target_y, keypoint_weights def _generate_gaussian( - self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None + self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: """Encoding keypoints into SimCC labels with Gaussian Label Smoothing strategy.""" @@ -270,8 +252,7 @@ class SimCCLabel(BaseKeypointCodec): W = np.around(w * self.simcc_split_ratio).astype(int) H = np.around(h * self.simcc_split_ratio).astype(int) - keypoints_split, keypoint_weights = self._map_coordinates( - keypoints, keypoints_visible) + keypoints_split, keypoint_weights = self._map_coordinates(keypoints, keypoints_visible) target_x = np.zeros((N, K, W), dtype=np.float32) target_y = np.zeros((N, K, H), dtype=np.float32) @@ -300,8 +281,8 @@ class SimCCLabel(BaseKeypointCodec): mu_x, mu_y = mu - target_x[n, k] = np.exp(-((x - mu_x)**2) / (2 * self.sigma[0]**2)) - target_y[n, k] = np.exp(-((y - mu_y)**2) / (2 * self.sigma[1]**2)) + target_x[n, k] = np.exp(-((x - mu_x) ** 2) / (2 * self.sigma[0] ** 2)) + target_y[n, k] = np.exp(-((y - mu_y) ** 2) / (2 * self.sigma[1] ** 2)) if self.normalize: norm_value = self.sigma * np.sqrt(np.pi * 2) diff --git a/mmpose/codecs/spr.py b/mmpose/codecs/spr.py index fba17f15982f1b38ac07bf5f6d61bfec0286a660..d21ab90760d310213b28cf4262ae05beb5b31157 100644 --- 
a/mmpose/codecs/spr.py +++ b/mmpose/codecs/spr.py @@ -6,10 +6,9 @@ import torch from torch import Tensor from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec -from .utils import (batch_heatmap_nms, generate_displacement_heatmap, - generate_gaussian_heatmaps, get_diagonal_lengths, - get_instance_root) +from .utils import batch_heatmap_nms, generate_displacement_heatmap, generate_gaussian_heatmaps, get_diagonal_lengths, get_instance_root @KEYPOINT_CODECS.register_module() @@ -74,10 +73,10 @@ class SPR(BaseKeypointCodec): """ field_mapping_table = dict( - heatmaps='heatmaps', - heatmap_weights='heatmap_weights', - displacements='displacements', - displacement_weights='displacement_weights', + heatmaps="heatmaps", + heatmap_weights="heatmap_weights", + displacements="displacements", + displacement_weights="displacement_weights", ) def __init__( @@ -86,7 +85,7 @@ class SPR(BaseKeypointCodec): heatmap_size: Tuple[int, int], sigma: Optional[Union[float, Tuple[float]]] = None, generate_keypoint_heatmaps: bool = False, - root_type: str = 'kpt_center', + root_type: str = "kpt_center", minimal_diagonal_length: Union[int, float] = 5, background_weight: float = 0.1, decode_nms_kernel: int = 5, @@ -105,29 +104,23 @@ class SPR(BaseKeypointCodec): self.decode_max_instances = decode_max_instances self.decode_thr = decode_thr - self.scale_factor = (np.array(input_size) / - heatmap_size).astype(np.float32) + self.scale_factor = (np.array(input_size) / heatmap_size).astype(np.float32) if sigma is None: - sigma = (heatmap_size[0] * heatmap_size[1])**0.5 / 32 + sigma = (heatmap_size[0] * heatmap_size[1]) ** 0.5 / 32 if generate_keypoint_heatmaps: # sigma for root heatmap and keypoint heatmaps self.sigma = (sigma, sigma // 2) else: - self.sigma = (sigma, ) + self.sigma = (sigma,) else: if not isinstance(sigma, (tuple, list)): - sigma = (sigma, ) + sigma = (sigma,) if generate_keypoint_heatmaps: - assert len(sigma) == 2, 'sigma for keypoints must be given ' \ - 'if `generate_keypoint_heatmaps` ' \ - 'is True. e.g. sigma=(4, 2)' + assert len(sigma) == 2, "sigma for keypoints must be given " "if `generate_keypoint_heatmaps` " "is True. e.g. sigma=(4, 2)" self.sigma = sigma - def _get_heatmap_weights(self, - heatmaps, - fg_weight: float = 1, - bg_weight: float = 0): + def _get_heatmap_weights(self, heatmaps, fg_weight: float = 1, bg_weight: float = 0): """Generate weight array for heatmaps. Args: @@ -142,9 +135,7 @@ class SPR(BaseKeypointCodec): heatmap_weights[heatmaps > 0] = fg_weight return heatmap_weights - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None) -> dict: + def encode(self, keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> dict: """Encode keypoints into root heatmaps and keypoint displacement fields. Note that the original keypoint coordinates should be in the input image space. 
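The encode step that follows computes one root per instance and regresses every keypoint as an offset from it. A minimal sketch of the `'kpt_center'` root (assumed semantics: the mean of the visible keypoints) and the resulting displacement targets:

```python
import numpy as np

keypoints = np.array([[[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]])  # (N=1, K=3, 2)
visible = np.array([[1.0, 1.0, 0.0]])                               # last kpt unlabeled

mask = visible > 0
root = (keypoints * mask[..., None]).sum(axis=1) / mask.sum(axis=1, keepdims=True)
displacements = keypoints - root[:, None]  # targets for the displacement maps

print(root)  # [[20. 30.]] -- mean of the two visible keypoints
```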
@@ -176,8 +167,7 @@ class SPR(BaseKeypointCodec): _keypoints = keypoints / self.scale_factor # compute the root and scale of each instance - roots, roots_visible = get_instance_root(_keypoints, keypoints_visible, - self.root_type) + roots, roots_visible = get_instance_root(_keypoints, keypoints_visible, self.root_type) diagonal_lengths = get_diagonal_lengths(_keypoints, keypoints_visible) # discard the small instances @@ -185,49 +175,38 @@ class SPR(BaseKeypointCodec): # generate heatmaps heatmaps, _ = generate_gaussian_heatmaps( - heatmap_size=self.heatmap_size, - keypoints=roots[:, None], - keypoints_visible=roots_visible[:, None], - sigma=self.sigma[0]) - heatmap_weights = self._get_heatmap_weights( - heatmaps, bg_weight=self.background_weight) + heatmap_size=self.heatmap_size, keypoints=roots[:, None], keypoints_visible=roots_visible[:, None], sigma=self.sigma[0] + ) + heatmap_weights = self._get_heatmap_weights(heatmaps, bg_weight=self.background_weight) if self.generate_keypoint_heatmaps: keypoint_heatmaps, _ = generate_gaussian_heatmaps( - heatmap_size=self.heatmap_size, - keypoints=_keypoints, - keypoints_visible=keypoints_visible, - sigma=self.sigma[1]) + heatmap_size=self.heatmap_size, keypoints=_keypoints, keypoints_visible=keypoints_visible, sigma=self.sigma[1] + ) - keypoint_heatmaps_weights = self._get_heatmap_weights( - keypoint_heatmaps, bg_weight=self.background_weight) + keypoint_heatmaps_weights = self._get_heatmap_weights(keypoint_heatmaps, bg_weight=self.background_weight) heatmaps = np.concatenate((keypoint_heatmaps, heatmaps), axis=0) - heatmap_weights = np.concatenate( - (keypoint_heatmaps_weights, heatmap_weights), axis=0) + heatmap_weights = np.concatenate((keypoint_heatmaps_weights, heatmap_weights), axis=0) # generate displacements - displacements, displacement_weights = \ - generate_displacement_heatmap( - self.heatmap_size, - _keypoints, - keypoints_visible, - roots, - roots_visible, - diagonal_lengths, - self.sigma[0], - ) + displacements, displacement_weights = generate_displacement_heatmap( + self.heatmap_size, + _keypoints, + keypoints_visible, + roots, + roots_visible, + diagonal_lengths, + self.sigma[0], + ) encoded = dict( - heatmaps=heatmaps, - heatmap_weights=heatmap_weights, - displacements=displacements, - displacement_weights=displacement_weights) + heatmaps=heatmaps, heatmap_weights=heatmap_weights, displacements=displacements, displacement_weights=displacement_weights + ) return encoded - def decode(self, heatmaps: Tensor, - displacements: Tensor) -> Tuple[np.ndarray, np.ndarray]: + def decode(self, heatmaps: Tensor, displacements: Tensor) -> Tuple[np.ndarray, np.ndarray]: """Decode the keypoint coordinates from heatmaps and displacements. The decoded keypoint coordinates are in the input image space. 
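The decode hunk below groups keypoints by adding the dense displacement field to a pixel grid and reading the result at each root peak. A rough sketch of that grouping step, under the assumption of a `(2K, H, W)` displacement layout ordered `[x1, y1, x2, y2, ...]`:

```python
import torch

K, H, W = 3, 8, 8
displacements = torch.randn(2 * K, H, W)  # per-pixel offsets to each keypoint
xs, ys = torch.meshgrid(torch.arange(W), torch.arange(H), indexing="xy")
grid = torch.stack((xs, ys), dim=0).float()                   # (2, H, W) pixel grid
posemaps = (grid.repeat(K, 1, 1) + displacements).flatten(1)  # (2K, H*W)

root_idx = 27                               # hypothetical root-peak location
pose = posemaps[:, root_idx].reshape(K, 2)  # K (x, y) keypoints for one instance
```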
@@ -258,10 +237,8 @@ class SPR(BaseKeypointCodec): posemaps = (regular_grid[None] + displacements).flatten(2) # find local maximum on root heatmap - root_heatmap_peaks = batch_heatmap_nms(heatmaps[None, -1:], - self.decode_nms_kernel) - root_scores, pos_idx = root_heatmap_peaks.flatten().topk( - self.decode_max_instances) + root_heatmap_peaks = batch_heatmap_nms(heatmaps[None, -1:], self.decode_nms_kernel) + root_scores, pos_idx = root_heatmap_peaks.flatten().topk(self.decode_max_instances) mask = root_scores > self.decode_thr root_scores, pos_idx = root_scores[mask], pos_idx[mask] @@ -273,11 +250,7 @@ class SPR(BaseKeypointCodec): else: keypoint_scores = None - keypoints = torch.cat([ - kpt * self.scale_factor[i] - for i, kpt in enumerate(keypoints.split(1, -1)) - ], - dim=-1) + keypoints = torch.cat([kpt * self.scale_factor[i] for i, kpt in enumerate(keypoints.split(1, -1))], dim=-1) return keypoints, (root_scores, keypoint_scores) def get_keypoint_scores(self, heatmaps: Tensor, keypoints: Tensor): @@ -292,15 +265,20 @@ class SPR(BaseKeypointCodec): Tensor: Keypoint scores in [N, K] """ k, h, w = heatmaps.shape - keypoints = torch.stack(( - keypoints[..., 0] / (w - 1) * 2 - 1, - keypoints[..., 1] / (h - 1) * 2 - 1, - ), - dim=-1) + keypoints = torch.stack( + ( + keypoints[..., 0] / (w - 1) * 2 - 1, + keypoints[..., 1] / (h - 1) * 2 - 1, + ), + dim=-1, + ) keypoints = keypoints.transpose(0, 1).unsqueeze(1).contiguous() - keypoint_scores = torch.nn.functional.grid_sample( - heatmaps.unsqueeze(1), keypoints, - padding_mode='border').view(k, -1).transpose(0, 1).contiguous() + keypoint_scores = ( + torch.nn.functional.grid_sample(heatmaps.unsqueeze(1), keypoints, padding_mode="border") + .view(k, -1) + .transpose(0, 1) + .contiguous() + ) return keypoint_scores diff --git a/mmpose/codecs/udp_heatmap.py b/mmpose/codecs/udp_heatmap.py index 1fcdbd559166ff159d614d2c1e3048c27e942570..2f3059d8b8eef9c0e12696dc0f0b28445518b61b 100644 --- a/mmpose/codecs/udp_heatmap.py +++ b/mmpose/codecs/udp_heatmap.py @@ -5,9 +5,9 @@ import cv2 import numpy as np from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec -from .utils import (generate_offset_heatmap, generate_udp_gaussian_heatmaps, - get_heatmap_maximum, refine_keypoints_dark_udp) +from .utils import generate_offset_heatmap, generate_udp_gaussian_heatmaps, get_heatmap_maximum, refine_keypoints_dark_udp @KEYPOINT_CODECS.register_module() @@ -57,20 +57,25 @@ class UDPHeatmap(BaseKeypointCodec): Human Pose Estimation`: https://arxiv.org/abs/1911.07524 """ - label_mapping_table = dict(keypoint_weights='keypoint_weights', ) - field_mapping_table = dict(heatmaps='heatmaps', ) - - def __init__(self, - input_size: Tuple[int, int], - heatmap_size: Tuple[int, int], - heatmap_type: str = 'gaussian', - sigma: float = 2., - radius_factor: float = 0.0546875, - blur_kernel_size: int = 11, - increase_sigma_with_padding=False, - amap_scale: float = 1.0, - normalize=None, - ) -> None: + label_mapping_table = dict( + keypoint_weights="keypoint_weights", + ) + field_mapping_table = dict( + heatmaps="heatmaps", + ) + + def __init__( + self, + input_size: Tuple[int, int], + heatmap_size: Tuple[int, int], + heatmap_type: str = "gaussian", + sigma: float = 2.0, + radius_factor: float = 0.0546875, + blur_kernel_size: int = 11, + increase_sigma_with_padding=False, + amap_scale: float = 1.0, + normalize=None, + ) -> None: super().__init__() self.input_size = np.array(input_size) self.heatmap_size = np.array(heatmap_size) @@ -82,16 +87,16 @@ class 
UDPHeatmap(BaseKeypointCodec): self.normalize = normalize self.amap_size = self.input_size * amap_scale - self.scale_factor = ((self.amap_size - 1) / - (self.heatmap_size - 1)).astype(np.float32) + self.scale_factor = ((self.amap_size - 1) / (self.heatmap_size - 1)).astype(np.float32) self.input_center = self.input_size / 2 self.top_left = self.input_center - self.amap_size / 2 - - if self.heatmap_type not in {'gaussian', 'combined'}: + + if self.heatmap_type not in {"gaussian", "combined"}: raise ValueError( - f'{self.__class__.__name__} got invalid `heatmap_type` value' - f'{self.heatmap_type}. Should be one of ' - '{"gaussian", "combined"}') + f"{self.__class__.__name__} got invalid `heatmap_type` value" + f"{self.heatmap_type}. Should be one of " + '{"gaussian", "combined"}' + ) def _kpts_to_activation_pts(self, keypoints: np.ndarray) -> np.ndarray: """ @@ -104,7 +109,7 @@ class UDPHeatmap(BaseKeypointCodec): transformed_keypoints = keypoints - self.top_left transformed_keypoints = transformed_keypoints / self.scale_factor return transformed_keypoints - + def _activation_pts_to_kpts(self, keypoints: np.ndarray) -> np.ndarray: """ Transform the points in activation map to the keypoint coordinates. @@ -118,11 +123,13 @@ class UDPHeatmap(BaseKeypointCodec): transformed_keypoints += self.top_left return transformed_keypoints - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None, - id_similarity: Optional[float] = 0.0, - keypoints_visibility: Optional[np.ndarray] = None) -> dict: + def encode( + self, + keypoints: np.ndarray, + keypoints_visible: Optional[np.ndarray] = None, + id_similarity: Optional[float] = 0.0, + keypoints_visibility: Optional[np.ndarray] = None, + ) -> dict: """Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space. @@ -146,36 +153,37 @@ class UDPHeatmap(BaseKeypointCodec): - keypoint_weights (np.ndarray): The target weights in shape (K,) """ - assert keypoints.shape[0] == 1, ( - f'{self.__class__.__name__} only support single-instance ' - 'keypoint encoding') - + assert keypoints.shape[0] == 1, f"{self.__class__.__name__} only support single-instance " "keypoint encoding" + if keypoints_visibility is None: keypoints_visibility = np.zeros(keypoints.shape[:2], dtype=np.float32) if keypoints_visible is None: keypoints_visible = np.ones(keypoints.shape[:2], dtype=np.float32) - if self.heatmap_type == 'gaussian': + if self.heatmap_type == "gaussian": heatmaps, keypoint_weights = generate_udp_gaussian_heatmaps( heatmap_size=self.heatmap_size, keypoints=self._kpts_to_activation_pts(keypoints), keypoints_visible=keypoints_visible, sigma=self.sigma, keypoints_visibility=keypoints_visibility, - increase_sigma_with_padding=self.increase_sigma_with_padding) - elif self.heatmap_type == 'combined': + increase_sigma_with_padding=self.increase_sigma_with_padding, + ) + elif self.heatmap_type == "combined": heatmaps, keypoint_weights = generate_offset_heatmap( heatmap_size=self.heatmap_size, keypoints=self._kpts_to_activation_pts(keypoints), keypoints_visible=keypoints_visible, - radius_factor=self.radius_factor) + radius_factor=self.radius_factor, + ) else: raise ValueError( - f'{self.__class__.__name__} got invalid `heatmap_type` value' - f'{self.heatmap_type}. Should be one of ' - '{"gaussian", "combined"}') - + f"{self.__class__.__name__} got invalid `heatmap_type` value" + f"{self.heatmap_type}. 
Should be one of " + '{"gaussian", "combined"}' + ) + if self.normalize is not None: heatmaps_sum = np.sum(heatmaps, axis=(1, 2), keepdims=False) mask = heatmaps_sum > 0 @@ -183,7 +191,7 @@ class UDPHeatmap(BaseKeypointCodec): heatmaps = heatmaps * self.normalize annotated = keypoints_visible > 0 - + heatmap_keypoints = self._kpts_to_activation_pts(keypoints) in_image = np.logical_and( heatmap_keypoints[:, :, 0] >= 0, @@ -197,7 +205,7 @@ class UDPHeatmap(BaseKeypointCodec): in_image, heatmap_keypoints[:, :, 1] < self.heatmap_size[1], ) - + encoded = dict( heatmaps=heatmaps, keypoint_weights=keypoint_weights, @@ -226,16 +234,15 @@ class UDPHeatmap(BaseKeypointCodec): """ heatmaps = encoded.copy() - if self.heatmap_type == 'gaussian': + if self.heatmap_type == "gaussian": keypoints, scores = get_heatmap_maximum(heatmaps) # unsqueeze the instance dimension for single-instance results keypoints = keypoints[None] scores = scores[None] - keypoints = refine_keypoints_dark_udp( - keypoints, heatmaps, blur_kernel_size=self.blur_kernel_size) + keypoints = refine_keypoints_dark_udp(keypoints, heatmaps, blur_kernel_size=self.blur_kernel_size) - elif self.heatmap_type == 'combined': + elif self.heatmap_type == "combined": _K, H, W = heatmaps.shape K = _K // 3 diff --git a/mmpose/codecs/utils/__init__.py b/mmpose/codecs/utils/__init__.py index e11f254466857d73de73112c9a8c4f28112a98e4..de129a1e82013a5949ab80048b4c51a094c156de 100644 --- a/mmpose/codecs/utils/__init__.py +++ b/mmpose/codecs/utils/__init__.py @@ -1,32 +1,52 @@ # Copyright (c) OpenMMLab. All rights reserved. -from .camera_image_projection import (camera_to_image_coord, camera_to_pixel, - pixel_to_camera) -from .gaussian_heatmap import (generate_3d_gaussian_heatmaps, - generate_gaussian_heatmaps, - generate_udp_gaussian_heatmaps, - generate_unbiased_gaussian_heatmaps, - generate_onehot_heatmaps) -from .instance_property import (get_diagonal_lengths, get_instance_bbox, - get_instance_root) -from .offset_heatmap import (generate_displacement_heatmap, - generate_offset_heatmap) -from .post_processing import (batch_heatmap_nms, gaussian_blur, - gaussian_blur1d, get_heatmap_3d_maximum, - get_heatmap_maximum, get_simcc_maximum, - get_simcc_normalized, get_heatmap_expected_value) -from .refinement import (refine_keypoints, refine_keypoints_dark, - refine_keypoints_dark_udp, refine_simcc_dark) +from .camera_image_projection import camera_to_image_coord, camera_to_pixel, pixel_to_camera +from .gaussian_heatmap import ( + generate_3d_gaussian_heatmaps, + generate_gaussian_heatmaps, + generate_onehot_heatmaps, + generate_udp_gaussian_heatmaps, + generate_unbiased_gaussian_heatmaps, +) +from .instance_property import get_diagonal_lengths, get_instance_bbox, get_instance_root +from .offset_heatmap import generate_displacement_heatmap, generate_offset_heatmap from .oks_map import generate_oks_maps +from .post_processing import ( + batch_heatmap_nms, + gaussian_blur, + gaussian_blur1d, + get_heatmap_3d_maximum, + get_heatmap_expected_value, + get_heatmap_maximum, + get_simcc_maximum, + get_simcc_normalized, +) +from .refinement import refine_keypoints, refine_keypoints_dark, refine_keypoints_dark_udp, refine_simcc_dark __all__ = [ - 'generate_gaussian_heatmaps', 'generate_udp_gaussian_heatmaps', - 'generate_unbiased_gaussian_heatmaps', 'gaussian_blur', - 'get_heatmap_maximum', 'get_simcc_maximum', 'generate_offset_heatmap', - 'batch_heatmap_nms', 'refine_keypoints', 'refine_keypoints_dark', - 'refine_keypoints_dark_udp', 'generate_displacement_heatmap', - 
'refine_simcc_dark', 'gaussian_blur1d', 'get_diagonal_lengths', - 'get_instance_root', 'get_instance_bbox', 'get_simcc_normalized', - 'camera_to_image_coord', 'camera_to_pixel', 'pixel_to_camera', - 'get_heatmap_3d_maximum', 'generate_3d_gaussian_heatmaps', - 'generate_oks_maps', 'get_heatmap_expected_value', 'generate_onehot_heatmaps' + "generate_gaussian_heatmaps", + "generate_udp_gaussian_heatmaps", + "generate_unbiased_gaussian_heatmaps", + "gaussian_blur", + "get_heatmap_maximum", + "get_simcc_maximum", + "generate_offset_heatmap", + "batch_heatmap_nms", + "refine_keypoints", + "refine_keypoints_dark", + "refine_keypoints_dark_udp", + "generate_displacement_heatmap", + "refine_simcc_dark", + "gaussian_blur1d", + "get_diagonal_lengths", + "get_instance_root", + "get_instance_bbox", + "get_simcc_normalized", + "camera_to_image_coord", + "camera_to_pixel", + "pixel_to_camera", + "get_heatmap_3d_maximum", + "generate_3d_gaussian_heatmaps", + "generate_oks_maps", + "get_heatmap_expected_value", + "generate_onehot_heatmaps", ] diff --git a/mmpose/codecs/utils/camera_image_projection.py b/mmpose/codecs/utils/camera_image_projection.py index b26d1396f1d054b1f36fd50df4c469d6201f12e6..390943999ee97a4d4aeb6f1aa521e35041c5da65 100644 --- a/mmpose/codecs/utils/camera_image_projection.py +++ b/mmpose/codecs/utils/camera_image_projection.py @@ -4,8 +4,7 @@ from typing import Dict, Tuple import numpy as np -def camera_to_image_coord(root_index: int, kpts_3d_cam: np.ndarray, - camera_param: Dict) -> Tuple[np.ndarray, np.ndarray]: +def camera_to_image_coord(root_index: int, kpts_3d_cam: np.ndarray, camera_param: Dict) -> Tuple[np.ndarray, np.ndarray]: """Project keypoints from camera space to image space and calculate factor. Args: @@ -29,30 +28,23 @@ def camera_to_image_coord(root_index: int, kpts_3d_cam: np.ndarray, br_kpt[..., :2] += 1.0 tl_kpt = np.reshape(tl_kpt, (-1, 3)) br_kpt = np.reshape(br_kpt, (-1, 3)) - fx, fy = camera_param['f'] / 1000. - cx, cy = camera_param['c'] / 1000. + fx, fy = camera_param["f"] / 1000.0 + cx, cy = camera_param["c"] / 1000.0 tl2d = camera_to_pixel(tl_kpt, fx, fy, cx, cy) br2d = camera_to_pixel(br_kpt, fx, fy, cx, cy) rectangle_3d_size = 2.0 kpts_3d_image = np.zeros_like(kpts_3d_cam) - kpts_3d_image[..., :2] = camera_to_pixel(kpts_3d_cam.copy(), fx, fy, cx, - cy) + kpts_3d_image[..., :2] = camera_to_pixel(kpts_3d_cam.copy(), fx, fy, cx, cy) ratio = (br2d[..., 0] - tl2d[..., 0] + 0.001) / rectangle_3d_size factor = rectangle_3d_size / (br2d[..., 0] - tl2d[..., 0] + 0.001) - kpts_3d_depth = ratio[:, None] * ( - kpts_3d_cam[..., 2] - kpts_3d_cam[..., root_index:root_index + 1, 2]) + kpts_3d_depth = ratio[:, None] * (kpts_3d_cam[..., 2] - kpts_3d_cam[..., root_index : root_index + 1, 2]) kpts_3d_image[..., 2] = kpts_3d_depth return kpts_3d_image, factor -def camera_to_pixel(kpts_3d: np.ndarray, - fx: float, - fy: float, - cx: float, - cy: float, - shift: bool = False) -> np.ndarray: +def camera_to_pixel(kpts_3d: np.ndarray, fx: float, fy: float, cx: float, cy: float, shift: bool = False) -> np.ndarray: """Project keypoints from camera space to image space. Args: @@ -77,8 +69,7 @@ def camera_to_pixel(kpts_3d: np.ndarray, return pose_2d -def pixel_to_camera(kpts_3d: np.ndarray, fx: float, fy: float, cx: float, - cy: float) -> np.ndarray: +def pixel_to_camera(kpts_3d: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray: """Project keypoints from image (pixel) space to camera space.
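
The `camera_image_projection.py` hunk above is pure reformatting, but the functions are worth a gloss: `camera_to_pixel` is a standard pinhole projection, and `camera_to_image_coord` uses it to project a 2.0-unit reference rectangle around the root joint and derive a per-instance depth scaling factor. A minimal sketch of the projection step, assuming the conventional u = fx·X/Z + cx, v = fy·Y/Z + cy pinhole model (names and values below are illustrative, not the module's API):

```python
import numpy as np

def pinhole_project(kpts_3d: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Project (..., 3) camera-space points to (..., 2) pixel coordinates."""
    z = kpts_3d[..., 2:3]
    uv = kpts_3d[..., :2] / z              # perspective divide
    return uv * np.array([fx, fy]) + np.array([cx, cy])

# A point straight ahead of the camera lands on the principal point (cx, cy).
print(pinhole_project(np.array([[0.0, 0.0, 2.0]]), fx=1145.0, fy=1144.0, cx=512.5, cy=515.5))
```
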
Args: diff --git a/mmpose/codecs/utils/gaussian_heatmap.py b/mmpose/codecs/utils/gaussian_heatmap.py index a475a6ae12c968598fed7abdcf5a0ce5b0b38e74..8f25fd7709d40be90b205b95c908fcf07468464d 100644 --- a/mmpose/codecs/utils/gaussian_heatmap.py +++ b/mmpose/codecs/utils/gaussian_heatmap.py @@ -16,7 +16,7 @@ def generate_3d_gaussian_heatmaps( joint_indices: Optional[list] = None, max_bound: float = 1.0, use_different_joint_weights: bool = False, - dataset_keypoint_weights: Optional[np.ndarray] = None + dataset_keypoint_weights: Optional[np.ndarray] = None, ) -> Tuple[np.ndarray, np.ndarray]: """Generate 3d gaussian heatmaps of keypoints. @@ -60,7 +60,7 @@ def generate_3d_gaussian_heatmaps( keypoint_weights = keypoints_visible.copy() if isinstance(sigma, (int, float)): - sigma = (sigma, ) * N + sigma = (sigma,) * N for n in range(N): # 3-sigma rule @@ -71,11 +71,9 @@ def generate_3d_gaussian_heatmaps( mu_y = keypoints[n, :, 1] * H / image_size[1] mu_z = (keypoints[n, :, 2] / heatmap3d_depth_bound + 0.5) * D - keypoint_weights[n, ...] = keypoint_weights[n, ...] * (mu_z >= 0) * ( - mu_z < D) + keypoint_weights[n, ...] = keypoint_weights[n, ...] * (mu_z >= 0) * (mu_z < D) if use_different_joint_weights: - keypoint_weights[ - n] = keypoint_weights[n] * dataset_keypoint_weights + keypoint_weights[n] = keypoint_weights[n] * dataset_keypoint_weights # xy grid gaussian_size = 2 * radius + 1 @@ -99,19 +97,15 @@ def generate_3d_gaussian_heatmaps( zz = zz.round().clip(0, D - 1) # compute the target value near joints - gaussian = np.exp(-((xx - mu_x)**2 + (yy - mu_y)**2 + (zz - mu_z)**2) / - (2 * sigma[n]**2)) + gaussian = np.exp(-((xx - mu_x) ** 2 + (yy - mu_y) ** 2 + (zz - mu_z) ** 2) / (2 * sigma[n] ** 2)) # put the local target value to the full target heatmap - idx_joints = np.tile( - np.expand_dims(np.arange(K), axis=(-1, -2, -3)), - [1, local_size, local_size, local_size]) - idx = np.stack([idx_joints, zz, yy, xx], - axis=-1).astype(int).reshape(-1, 4) + idx_joints = np.tile(np.expand_dims(np.arange(K), axis=(-1, -2, -3)), [1, local_size, local_size, local_size]) + idx = np.stack([idx_joints, zz, yy, xx], axis=-1).astype(int).reshape(-1, 4) heatmaps[idx[:, 0], idx[:, 1], idx[:, 2], idx[:, 3]] = np.maximum( - heatmaps[idx[:, 0], idx[:, 1], idx[:, 2], idx[:, 3]], - gaussian.reshape(-1)) + heatmaps[idx[:, 0], idx[:, 1], idx[:, 2], idx[:, 3]], gaussian.reshape(-1) + ) heatmaps = (heatmaps * max_bound).reshape(-1, H, W) @@ -150,7 +144,7 @@ def generate_gaussian_heatmaps( keypoint_weights = keypoints_visible.copy() if isinstance(sigma, (int, float)): - sigma = (sigma, ) * N + sigma = (sigma,) * N for n in range(N): # 3-sigma rule @@ -180,7 +174,7 @@ def generate_gaussian_heatmaps( # The gaussian is not normalized, # we want the center value to equal 1 - gaussian = np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma[n]**2)) + gaussian = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma[n] ** 2)) # valid range in gaussian g_x1 = max(0, -left) @@ -197,8 +191,7 @@ def generate_gaussian_heatmaps( heatmap_region = heatmaps[k, h_y1:h_y2, h_x1:h_x2] gaussian_regsion = gaussian[g_y1:g_y2, g_x1:g_x2] - _ = np.maximum( - heatmap_region, gaussian_regsion, out=heatmap_region) + _ = np.maximum(heatmap_region, gaussian_regsion, out=heatmap_region) return heatmaps, keypoint_weights @@ -254,7 +247,7 @@ def generate_unbiased_gaussian_heatmaps( keypoint_weights[n, k] = 0 continue - gaussian = np.exp(-((x - mu[0])**2 + (y - mu[1])**2) / (2 * sigma**2)) + gaussian = np.exp(-((x - mu[0]) ** 2 + (y - mu[1]) ** 2) / (2 * 
sigma**2)) _ = np.maximum(gaussian, heatmaps[k], out=heatmaps[k]) @@ -315,12 +308,12 @@ def generate_udp_gaussian_heatmaps( if vis_kpts.size == 0: min_dists = np.ones(image_kpts.shape[0]) * diag else: - dists = cdist(image_kpts, vis_kpts, metric='euclidean') + dists = cdist(image_kpts, vis_kpts, metric="euclidean") min_dists = np.min(dists, axis=1) - scales = min_dists / diag * 2.0 # Maximum distance (diagonal) results in .0*sigma + scales = min_dists / diag * 2.0 # Maximum distance (diagonal) results in .0*sigma scales_arr[n, :] = scales - scaled_sigmas[n, :] = sigma * (1+scales) + scaled_sigmas[n, :] = sigma * (1 + scales) # print(scales_arr) # print(scaled_sigmas) @@ -330,7 +323,7 @@ def generate_udp_gaussian_heatmaps( # skip unlabled keypoints if keypoints_visible[n, k] < 0.5: continue - + # 3-sigma rule radius = scaled_sigma * 3 @@ -354,7 +347,7 @@ def generate_udp_gaussian_heatmaps( x0 = y0 = gaussian_size // 2 x0 += mu_ac[0] - mu[0] y0 += mu_ac[1] - mu[1] - gaussian = np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * scaled_sigma**2)) + gaussian = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * scaled_sigma**2)) # Normalize Gaussian such that scaled_sigma = sigma is the norm gaussian = gaussian / (scaled_sigma / sigmas[n, k]) @@ -420,10 +413,9 @@ def generate_onehot_heatmaps( for n, k in product(range(N), range(K)): # skip unlabled keypoints if keypoints_visible[n, k] < 0.5: - continue + continue mu = (keypoints[n, k] + 0.5).astype(np.int64) - if mu[0] < 0 or mu[0] >= W or mu[1] < 0 or mu[1] >= H: keypoint_weights[n, k] = 0 diff --git a/mmpose/codecs/utils/instance_property.py b/mmpose/codecs/utils/instance_property.py index 15ae30aef021939e2f0dbf276ce8b1c3cceaa40e..923637926ac7cd985f2f2ae66a41b2f0bd081be2 100644 --- a/mmpose/codecs/utils/instance_property.py +++ b/mmpose/codecs/utils/instance_property.py @@ -4,9 +4,7 @@ from typing import Optional import numpy as np -def get_instance_root(keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None, - root_type: str = 'kpt_center') -> np.ndarray: +def get_instance_root(keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None, root_type: str = "kpt_center") -> np.ndarray: """Calculate the coordinates and visibility of instance roots. Args: @@ -46,24 +44,19 @@ def get_instance_root(keypoints: np.ndarray, continue # compute the instance root with visible keypoints - if root_type == 'kpt_center': + if root_type == "kpt_center": roots_coordinate[i] = visible_keypoints.mean(axis=0) roots_visible[i] = 1 - elif root_type == 'bbox_center': - roots_coordinate[i] = (visible_keypoints.max(axis=0) + - visible_keypoints.min(axis=0)) / 2.0 + elif root_type == "bbox_center": + roots_coordinate[i] = (visible_keypoints.max(axis=0) + visible_keypoints.min(axis=0)) / 2.0 roots_visible[i] = 1 else: - raise ValueError( - f'the value of `root_type` must be \'kpt_center\' or ' - f'\'bbox_center\', but got \'{root_type}\'') + raise ValueError(f"the value of `root_type` must be 'kpt_center' or " f"'bbox_center', but got '{root_type}'") return roots_coordinate, roots_visible -def get_instance_bbox(keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None - ) -> np.ndarray: +def get_instance_bbox(keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> np.ndarray: """Calculate the pseudo instance bounding box from visible keypoints. The bounding boxes are in the xyxy format. 
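
The substantive logic in the `generate_udp_gaussian_heatmaps` hunk above (as opposed to the pure reformatting around it) is the distance-adaptive sigma: each keypoint's Gaussian target is widened in proportion to the distance to the nearest visible keypoint, normalized by the heatmap diagonal, so isolated keypoints get broad targets and keypoints in crowded regions stay sharp. A self-contained sketch of that scaling rule (array shapes and names are illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

def scaled_sigmas(image_kpts: np.ndarray, vis_kpts: np.ndarray, sigma: float, heatmap_size) -> np.ndarray:
    """Widen per-keypoint sigmas by the distance to the nearest visible keypoint."""
    W, H = heatmap_size
    diag = np.sqrt(W**2 + H**2)
    if vis_kpts.size == 0:
        min_dists = np.full(len(image_kpts), diag)     # nothing nearby: maximum widening
    else:
        min_dists = cdist(image_kpts, vis_kpts, metric="euclidean").min(axis=1)
    scales = min_dists / diag * 2.0                    # in [0, 2]
    return sigma * (1 + scales)                        # between 1x and 3x the base sigma

kpts = np.array([[10.0, 20.0], [40.0, 60.0]])
neighbours = np.array([[12.0, 21.0]])
print(scaled_sigmas(kpts, neighbours, sigma=2.0, heatmap_size=(48, 64)))
```
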
@@ -89,9 +82,7 @@ def get_instance_bbox(keypoints: np.ndarray, return bbox -def get_diagonal_lengths(keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None - ) -> np.ndarray: +def get_diagonal_lengths(keypoints: np.ndarray, keypoints_visible: Optional[np.ndarray] = None) -> np.ndarray: """Calculate the diagonal length of instance bounding box from visible keypoints. diff --git a/mmpose/codecs/utils/offset_heatmap.py b/mmpose/codecs/utils/offset_heatmap.py index c3c1c32ed391982fa0f8cd31b6240363b4fe1c52..90b8ce8bd4a91d4453f5306591cfbf415377de01 100644 --- a/mmpose/codecs/utils/offset_heatmap.py +++ b/mmpose/codecs/utils/offset_heatmap.py @@ -55,7 +55,7 @@ def generate_offset_heatmap( x_offset = (mu[0] - x) / radius y_offset = (mu[1] - y) / radius - heatmaps[k, 0] = np.where(x_offset**2 + y_offset**2 <= 1, 1., 0.) + heatmaps[k, 0] = np.where(x_offset**2 + y_offset**2 <= 1, 1.0, 0.0) heatmaps[k, 1] = x_offset heatmaps[k, 2] = y_offset @@ -108,16 +108,19 @@ def generate_displacement_heatmap( instance_size_map = np.zeros((H, W), dtype=np.float32) for n in range(N): - if (roots_visible[n] < 1 or (roots[n, 0] < 0 or roots[n, 1] < 0) - or (roots[n, 0] >= W or roots[n, 1] >= H)): + if roots_visible[n] < 1 or (roots[n, 0] < 0 or roots[n, 1] < 0) or (roots[n, 0] >= W or roots[n, 1] >= H): continue diagonal_length = diagonal_lengths[n] for k in range(K): - if keypoints_visible[n, k] < 1 or keypoints[n, k, 0] < 0 \ - or keypoints[n, k, 1] < 0 or keypoints[n, k, 0] >= W \ - or keypoints[n, k, 1] >= H: + if ( + keypoints_visible[n, k] < 1 + or keypoints[n, k, 0] < 0 + or keypoints[n, k, 1] < 0 + or keypoints[n, k, 0] >= W + or keypoints[n, k, 1] >= H + ): continue start_x = max(int(roots[n, 0] - radius), 0) @@ -127,17 +130,13 @@ def generate_displacement_heatmap( for x in range(start_x, end_x): for y in range(start_y, end_y): - if displacements[2 * k, y, - x] != 0 or displacements[2 * k + 1, y, - x] != 0: + if displacements[2 * k, y, x] != 0 or displacements[2 * k + 1, y, x] != 0: if diagonal_length > instance_size_map[y, x]: # keep the gt displacement of smaller instance continue - displacement_weights[2 * k:2 * k + 2, y, - x] = 1 / diagonal_length - displacements[2 * k:2 * k + 2, y, - x] = keypoints[n, k] - [x, y] + displacement_weights[2 * k : 2 * k + 2, y, x] = 1 / diagonal_length + displacements[2 * k : 2 * k + 2, y, x] = keypoints[n, k] - [x, y] instance_size_map[y, x] = diagonal_length return displacements, displacement_weights diff --git a/mmpose/codecs/utils/oks_map.py b/mmpose/codecs/utils/oks_map.py index f1d886e8d64e2d6214391cabd564302d447f1aed..1b060ef728155fc82eb10248c64661a7f1a5d817 100644 --- a/mmpose/codecs/utils/oks_map.py +++ b/mmpose/codecs/utils/oks_map.py @@ -1,4 +1,4 @@ -# Copyright (c) OpenMMLab. All rights reserved. +# Copyright (c) Miroslav Purkrabek, ProbPose. All rights reserved. from itertools import product from typing import Optional, Tuple, Union @@ -40,18 +40,17 @@ def generate_oks_maps( W, H = heatmap_size # The default sigmas are used for COCO dataset. 
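
In the `generate_displacement_heatmap` hunk above, the reflowed conditionals keep the original tie-breaking rule intact: when two instances write a displacement to the same pixel, the instance with the smaller diagonal wins, and the weight is set to the reciprocal of that diagonal. A toy sketch of the rule for a single pixel and keypoint channel (hypothetical names):

```python
import numpy as np

owner_diag = 0.0            # diagonal of the instance currently owning the pixel (0 = unowned)
disp = np.zeros(2)
weight = 0.0

for diag, target in [(120.0, np.array([3.0, -1.0])), (40.0, np.array([0.5, 2.0]))]:
    if weight > 0 and diag > owner_diag:
        continue            # keep the ground-truth displacement of the smaller instance
    owner_diag = diag
    disp = target
    weight = 1.0 / diag

print(disp, weight)         # the 40-px instance wins the contested pixel
```
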
- sigmas = np.array( - [2.6, 2.5, 2.5, 3.5, 3.5, 7.9, 7.9, 7.2, 7.2, 6.2, 6.2, 10.7, 10.7, 8.7, 8.7, 8.9, 8.9])/100 + sigmas = np.array([2.6, 2.5, 2.5, 3.5, 3.5, 7.9, 7.9, 7.2, 7.2, 6.2, 6.2, 10.7, 10.7, 8.7, 8.7, 8.9, 8.9]) / 100 # sigmas = sigmas * 2 / sigmas.mean() # sigmas = np.round(sigmas).astype(int) # sigmas = np.clip(sigmas, 1, 10) - + heatmaps = np.zeros((K, H, W), dtype=np.float32) keypoint_weights = keypoints_visible.copy() # bbox_area = W/1.25 * H/1.25 # bbox_area = W * H * 0.53 - bbox_area = np.sqrt(H/1.25 * W/1.25) + bbox_area = np.sqrt(H / 1.25 * W / 1.25) # print(scales_arr) # print(scaled_sigmas) @@ -68,16 +67,16 @@ def generate_oks_maps( dist = np.sqrt(dx**2 + dy**2) # e_map = (dx**2 + dy**2) / ((kpt_sigma*100)**2 * sigma) - vars = (kpt_sigma*2)**2 + vars = (kpt_sigma * 2) ** 2 s = vars * bbox_area * 2 s = np.clip(s, 0.55, 3.0) if sigma is not None and sigma > 0: s = sigma - e_map = dist**2 / (2*s) + e_map = dist**2 / (2 * s) oks_map = np.exp(-e_map) keypoint_weights[n, k] = (oks_map.max() > 0).astype(int) - + # Scale such that there is always 1 at the maximum if oks_map.max() > 1e-3: oks_map = oks_map / oks_map.max() @@ -86,12 +85,11 @@ def generate_oks_maps( # oks_map[oks_map < 0.5] = 0 # oks_map = 2 * oks_map - 1 - # oks_map[oks_map > 0.95] = 1 # print("{:.4f}, {:7.1f}, {:9.3f}, {:9.3f}, {:4.2f}".format(vars, bbox_area, vars * bbox_area* 2, s, oks_map.max())) # if np.all(oks_map < 0.1): # print("\t{:d} --> {:.4f}".format(k, s)) - heatmaps[k] = oks_map + heatmaps[k] = oks_map # breakpoint() return heatmaps, keypoint_weights diff --git a/mmpose/codecs/utils/post_processing.py b/mmpose/codecs/utils/post_processing.py index 054eaedd8f189860dccd28252b2a4046f6e40d8c..8f8c5ef416d9ea3aac28c7626fdcd6c4f0456e57 100644 --- a/mmpose/codecs/utils/post_processing.py +++ b/mmpose/codecs/utils/post_processing.py @@ -6,9 +6,8 @@ import cv2 import numpy as np import torch import torch.nn.functional as F -from torch import Tensor - from scipy.signal import convolve2d +from torch import Tensor def get_simcc_normalized(batch_pred_simcc, sigma=None): @@ -32,7 +31,7 @@ def get_simcc_normalized(batch_pred_simcc, sigma=None): mask = (batch_pred_simcc.amax(dim=-1) > 1).reshape(B, K, 1) # Normalize the tensor using the maximum value - norm = (batch_pred_simcc / batch_pred_simcc.amax(dim=-1).reshape(B, K, 1)) + norm = batch_pred_simcc / batch_pred_simcc.amax(dim=-1).reshape(B, K, 1) # Apply normalization batch_pred_simcc = torch.where(mask, norm, batch_pred_simcc) @@ -40,10 +39,7 @@ def get_simcc_normalized(batch_pred_simcc, sigma=None): return batch_pred_simcc -def get_simcc_maximum(simcc_x: np.ndarray, - simcc_y: np.ndarray, - apply_softmax: bool = False - ) -> Tuple[np.ndarray, np.ndarray]: +def get_simcc_maximum(simcc_x: np.ndarray, simcc_y: np.ndarray, apply_softmax: bool = False) -> Tuple[np.ndarray, np.ndarray]: """Get maximum response location and value from simcc representations. 
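
`generate_oks_maps` above replaces fixed-width Gaussian targets with OKS-shaped ones: the spread `s` comes from the keypoint's COCO sigma and a pseudo bbox area derived from the heatmap size, clipped to [0.55, 3.0], and each map is rescaled so its peak equals 1. A compact sketch of one such target map, following the formulas visible in the hunk (shapes are illustrative):

```python
import numpy as np

def oks_target_map(mu, kpt_sigma: float, heatmap_size=(48, 64)) -> np.ndarray:
    """OKS-style target for one keypoint at (x, y) = mu with COCO sigma kpt_sigma."""
    W, H = heatmap_size
    x, y = np.meshgrid(np.arange(W), np.arange(H))
    dist2 = (x - mu[0]) ** 2 + (y - mu[1]) ** 2
    bbox_area = np.sqrt(H / 1.25 * W / 1.25)           # pseudo area, as in the hunk
    s = np.clip((kpt_sigma * 2) ** 2 * bbox_area * 2, 0.55, 3.0)
    oks_map = np.exp(-dist2 / (2 * s))
    return oks_map / oks_map.max() if oks_map.max() > 1e-3 else oks_map

m = oks_target_map(mu=(24, 32), kpt_sigma=0.026)       # 0.026 is the nose sigma above
print(m.shape, m[32, 24])                              # peak value is exactly 1 at mu
```
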
Note: @@ -66,14 +62,11 @@ def get_simcc_maximum(simcc_x: np.ndarray, (K,) or (N, K) """ - assert isinstance(simcc_x, np.ndarray), ('simcc_x should be numpy.ndarray') - assert isinstance(simcc_y, np.ndarray), ('simcc_y should be numpy.ndarray') - assert simcc_x.ndim == 2 or simcc_x.ndim == 3, ( - f'Invalid shape {simcc_x.shape}') - assert simcc_y.ndim == 2 or simcc_y.ndim == 3, ( - f'Invalid shape {simcc_y.shape}') - assert simcc_x.ndim == simcc_y.ndim, ( - f'{simcc_x.shape} != {simcc_y.shape}') + assert isinstance(simcc_x, np.ndarray), "simcc_x should be numpy.ndarray" + assert isinstance(simcc_y, np.ndarray), "simcc_y should be numpy.ndarray" + assert simcc_x.ndim == 2 or simcc_x.ndim == 3, f"Invalid shape {simcc_x.shape}" + assert simcc_y.ndim == 2 or simcc_y.ndim == 3, f"Invalid shape {simcc_y.shape}" + assert simcc_x.ndim == simcc_y.ndim, f"{simcc_x.shape} != {simcc_y.shape}" if simcc_x.ndim == 3: N, K, Wx = simcc_x.shape @@ -98,7 +91,7 @@ def get_simcc_maximum(simcc_x: np.ndarray, mask = max_val_x > max_val_y max_val_x[mask] = max_val_y[mask] vals = max_val_x - locs[vals <= 0.] = -1 + locs[vals <= 0.0] = -1 if N: locs = locs.reshape(N, K, 2) @@ -107,8 +100,7 @@ def get_simcc_maximum(simcc_x: np.ndarray, return locs, vals -def get_heatmap_3d_maximum(heatmaps: np.ndarray - ) -> Tuple[np.ndarray, np.ndarray]: +def get_heatmap_3d_maximum(heatmaps: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: """Get maximum response location and value from heatmaps. Note: @@ -129,10 +121,8 @@ def get_heatmap_3d_maximum(heatmaps: np.ndarray - vals (np.ndarray): values of maximum heatmap responses in shape (K,) or (B, K) """ - assert isinstance(heatmaps, - np.ndarray), ('heatmaps should be numpy.ndarray') - assert heatmaps.ndim == 4 or heatmaps.ndim == 5, ( - f'Invalid shape {heatmaps.shape}') + assert isinstance(heatmaps, np.ndarray), "heatmaps should be numpy.ndarray" + assert heatmaps.ndim == 4 or heatmaps.ndim == 5, f"Invalid shape {heatmaps.shape}" if heatmaps.ndim == 4: K, D, H, W = heatmaps.shape @@ -142,11 +132,10 @@ def get_heatmap_3d_maximum(heatmaps: np.ndarray B, K, D, H, W = heatmaps.shape heatmaps_flatten = heatmaps.reshape(B * K, -1) - z_locs, y_locs, x_locs = np.unravel_index( - np.argmax(heatmaps_flatten, axis=1), shape=(D, H, W)) + z_locs, y_locs, x_locs = np.unravel_index(np.argmax(heatmaps_flatten, axis=1), shape=(D, H, W)) locs = np.stack((x_locs, y_locs, z_locs), axis=-1).astype(np.float32) vals = np.amax(heatmaps_flatten, axis=1) - locs[vals <= 0.] 
= -1 + locs[vals <= 0.0] = -1 if B: locs = locs.reshape(B, K, 3) @@ -174,10 +163,8 @@ def get_heatmap_maximum(heatmaps: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: - vals (np.ndarray): values of maximum heatmap responses in shape (K,) or (B, K) """ - assert isinstance(heatmaps, - np.ndarray), ('heatmaps should be numpy.ndarray') - assert heatmaps.ndim == 3 or heatmaps.ndim == 4, ( - f'Invalid shape {heatmaps.shape}') + assert isinstance(heatmaps, np.ndarray), "heatmaps should be numpy.ndarray" + assert heatmaps.ndim == 3 or heatmaps.ndim == 4, f"Invalid shape {heatmaps.shape}" if heatmaps.ndim == 3: K, H, W = heatmaps.shape @@ -187,11 +174,10 @@ def get_heatmap_maximum(heatmaps: np.ndarray) -> Tuple[np.ndarray, np.ndarray]: B, K, H, W = heatmaps.shape heatmaps_flatten = heatmaps.reshape(B * K, -1) - y_locs, x_locs = np.unravel_index( - np.argmax(heatmaps_flatten, axis=1), shape=(H, W)) + y_locs, x_locs = np.unravel_index(np.argmax(heatmaps_flatten, axis=1), shape=(H, W)) locs = np.stack((x_locs, y_locs), axis=-1).astype(np.float32) vals = np.amax(heatmaps_flatten, axis=1) - locs[vals <= 0.] = -1 + locs[vals <= 0.0] = -1 if B: locs = locs.reshape(B, K, 2) @@ -228,7 +214,7 @@ def gaussian_blur(heatmaps: np.ndarray, kernel: int = 11) -> np.ndarray: dr[border:-border, border:-border] = heatmaps[k].copy() dr = cv2.GaussianBlur(dr, (kernel, kernel), 0) heatmaps[k] = dr[border:-border, border:-border].copy() - heatmaps[k] *= origin_max / (np.max(heatmaps[k])+1e-12) + heatmaps[k] *= origin_max / (np.max(heatmaps[k]) + 1e-12) return heatmaps @@ -275,20 +261,20 @@ def batch_heatmap_nms(batch_heatmaps: Tensor, kernel_size: int = 5): Tensor: The batch heatmaps after NMS. """ - assert isinstance(kernel_size, int) and kernel_size % 2 == 1, \ - f'The kernel_size should be an odd integer, got {kernel_size}' + assert isinstance(kernel_size, int) and kernel_size % 2 == 1, f"The kernel_size should be an odd integer, got {kernel_size}" padding = (kernel_size - 1) // 2 - maximum = F.max_pool2d( - batch_heatmaps, kernel_size, stride=1, padding=padding) + maximum = F.max_pool2d(batch_heatmaps, kernel_size, stride=1, padding=padding) maximum_indicator = torch.eq(batch_heatmaps, maximum) batch_heatmaps = batch_heatmaps * maximum_indicator.float() return batch_heatmaps -def get_heatmap_expected_value(heatmaps: np.ndarray, parzen_size: float = 0.1, return_heatmap: bool = False) -> Tuple[np.ndarray, np.ndarray]: +def get_heatmap_expected_value( + heatmaps: np.ndarray, parzen_size: float = 0.1, return_heatmap: bool = False +) -> Tuple[np.ndarray, np.ndarray]: """Get maximum response location and value from heatmaps. 
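
`batch_heatmap_nms` above keeps only local maxima by comparing each heatmap against its own max-pooled version and zeroing everything that is not equal to the pooled maximum. A minimal PyTorch sketch of the same trick:

```python
import torch
import torch.nn.functional as F

def heatmap_nms(heatmaps: torch.Tensor, kernel_size: int = 5) -> torch.Tensor:
    """Suppress non-peak responses in (B, K, H, W) heatmaps; kernel must be odd."""
    padding = (kernel_size - 1) // 2
    pooled = F.max_pool2d(heatmaps, kernel_size, stride=1, padding=padding)
    return heatmaps * torch.eq(heatmaps, pooled).float()

hm = torch.rand(1, 17, 64, 48)
peaks = heatmap_nms(hm)
print(int((peaks > 0).sum()), "surviving local maxima")
```
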
Note: @@ -307,13 +293,10 @@ def get_heatmap_expected_value(heatmaps: np.ndarray, parzen_size: float = 0.1, r - vals (np.ndarray): values of maximum heatmap responses in shape (K,) or (B, K) """ - assert isinstance(heatmaps, - np.ndarray), ('heatmaps should be numpy.ndarray') - assert heatmaps.ndim == 3 or heatmaps.ndim == 4, ( - f'Invalid shape {heatmaps.shape}') - - assert parzen_size >= 0.0 and parzen_size <= 1.0, ( - f'Invalid parzen_size {parzen_size}') + assert isinstance(heatmaps, np.ndarray), "heatmaps should be numpy.ndarray" + assert heatmaps.ndim == 3 or heatmaps.ndim == 4, f"Invalid shape {heatmaps.shape}" + + assert parzen_size >= 0.0 and parzen_size <= 1.0, f"Invalid parzen_size {parzen_size}" if heatmaps.ndim == 3: K, H, W = heatmaps.shape @@ -322,12 +305,12 @@ def get_heatmap_expected_value(heatmaps: np.ndarray, parzen_size: float = 0.1, r heatmaps_flatten = heatmaps.reshape(1, K, H, W) else: B, K, H, W = heatmaps.shape - FIRST_DIM = K*B + FIRST_DIM = K * B heatmaps_flatten = heatmaps.reshape(B, K, H, W) # Blur heatmaps with Gaussian # heatmaps_flatten = gaussian_blur(heatmaps_flatten, kernel=9) - + # Zero out pixels far from the maximum for each heatmap # heatmaps_tmp = heatmaps_flatten.copy().reshape(B*K, H*W) # y_locs, x_locs = np.unravel_index( @@ -345,19 +328,17 @@ def get_heatmap_expected_value(heatmaps: np.ndarray, parzen_size: float = 0.1, r # heatmaps_flatten[i] = heatmaps_flatten[i] * mask # heatmaps_flatten = heatmaps_flatten.reshape(B, K, H, W) + bbox_area = np.sqrt(H / 1.25 * W / 1.25) - bbox_area = np.sqrt(H/1.25 * W/1.25) + kpt_sigmas = np.array([2.6, 2.5, 2.5, 3.5, 3.5, 7.9, 7.9, 7.2, 7.2, 6.2, 6.2, 10.7, 10.7, 8.7, 8.7, 8.9, 8.9]) / 100 - kpt_sigmas = np.array( - [2.6, 2.5, 2.5, 3.5, 3.5, 7.9, 7.9, 7.2, 7.2, 6.2, 6.2, 10.7, 10.7, 8.7, 8.7, 8.9, 8.9])/100 - heatmaps_covolved = np.zeros_like(heatmaps_flatten) for k in range(K): - vars = (kpt_sigmas[k]*2)**2 + vars = (kpt_sigmas[k] * 2) ** 2 s = vars * bbox_area * 2 s = np.clip(s, 0.55, 3.0) radius = np.ceil(s * 3).astype(int) - diameter = 2*radius + 1 + diameter = 2 * radius + 1 diameter = np.ceil(diameter).astype(int) # kernel_sizes[kernel_sizes % 2 == 0] += 1 center = diameter // 2 @@ -365,9 +346,9 @@ def get_heatmap_expected_value(heatmaps: np.ndarray, parzen_size: float = 0.1, r dist_y = np.arange(diameter) - center dist_x, dist_y = np.meshgrid(dist_x, dist_y) dist = np.sqrt(dist_x**2 + dist_y**2) - oks_kernel = np.exp(-dist**2 / (2 * s)) + oks_kernel = np.exp(-(dist**2) / (2 * s)) oks_kernel = oks_kernel / oks_kernel.sum() - + htm = heatmaps_flatten[:, k, :, :].reshape(-1, H, W) # htm = np.pad(htm, ((0, 0), (radius, radius), (radius, radius)), mode='symmetric') # htm = torch.from_numpy(htm).float() @@ -375,27 +356,23 @@ def get_heatmap_expected_value(heatmaps: np.ndarray, parzen_size: float = 0.1, r oks_kernel = oks_kernel.reshape(1, diameter, diameter) htm_conv = np.zeros_like(htm) for b in range(B): - htm_conv[b, :, :] = convolve2d(htm[b, :, :], oks_kernel[b, :, :], mode='same', boundary='symm') + htm_conv[b, :, :] = convolve2d(htm[b, :, :], oks_kernel[b, :, :], mode="same", boundary="symm") # htm_conv = F.conv2d(htm.unsqueeze(1), oks_kernel.unsqueeze(1), padding='same') # htm_conv = htm_conv[:, :, radius:-radius, radius:-radius] htm_conv = htm_conv.reshape(-1, 1, H, W) heatmaps_covolved[:, k, :, :] = htm_conv - - heatmaps_covolved = heatmaps_covolved.reshape(B*K, H*W) - y_locs, x_locs = np.unravel_index( - np.argmax(heatmaps_covolved, axis=1), shape=(H, W)) + heatmaps_covolved = 
heatmaps_covolved.reshape(B * K, H * W) + y_locs, x_locs = np.unravel_index(np.argmax(heatmaps_covolved, axis=1), shape=(H, W)) locs = np.stack((x_locs, y_locs), axis=-1).astype(np.float32) # Apply mean-shift to get sub-pixel locations - locs = _get_subpixel_maximums(heatmaps_covolved.reshape(B*K, H, W), locs) + locs = _get_subpixel_maximums(heatmaps_covolved.reshape(B * K, H, W), locs) # breakpoint() - # heatmaps_sums = heatmaps_flatten.sum(axis=(1, 2)) # norm_heatmaps = heatmaps_flatten.copy() # norm_heatmaps[heatmaps_sums > 0] = heatmaps_flatten[heatmaps_sums > 0] / heatmaps_sums[heatmaps_sums > 0, None, None] - # # Compute Parzen window with Gaussian blur along the edge instead of simple mirroring # x_pad = int(parzen_size * W + 0.5) @@ -408,12 +385,12 @@ def get_heatmap_expected_value(heatmaps: np.ndarray, parzen_size: float = 0.1, r # # norm_heatmaps_pad_blur = np.pad(norm_heatmaps, ((0, 0), (x_pad, x_pad), (y_pad, y_pad)), mode='symmetric') # norm_heatmaps_pad = np.pad(norm_heatmaps, ((0, 0), (y_pad, y_pad), (x_pad, x_pad)), mode='constant', constant_values=0) # norm_heatmaps_pad_blur = gaussian_blur(norm_heatmaps_pad, kernel=kernel_size) - + # # norm_heatmaps_pad_blur[:, x_pad:-x_pad, y_pad:-y_pad] = norm_heatmaps - + # norm_heatmaps_pad_sum = norm_heatmaps_pad_blur.sum(axis=(1, 2)) # norm_heatmaps_pad_blur[norm_heatmaps_pad_sum>0] = norm_heatmaps_pad_blur[norm_heatmaps_pad_sum>0] / norm_heatmaps_pad_sum[norm_heatmaps_pad_sum>0, None, None] - + # # # Save the blurred heatmaps # # for i in range(heatmaps.shape[0]): # # tmp_htm = norm_heatmaps_pad_blur[i].copy() @@ -439,7 +416,7 @@ def get_heatmap_expected_value(heatmaps: np.ndarray, parzen_size: float = 0.1, r # # breakpoint() # x_locs = np.sum(norm_heatmaps_pad_blur * x_idx, axis=(1, 2)) - x_pad # y_locs = np.sum(norm_heatmaps_pad_blur * y_idx, axis=(1, 2)) - y_pad - + # # mean_idx = np.argmax(heatmaps_flatten, axis=1) # # x_locs, y_locs = np.unravel_index(mean_idx, shape=(H, W)) # # locs = np.stack((x_locs, y_locs), axis=-1).astype(np.float32) @@ -450,15 +427,14 @@ def get_heatmap_expected_value(heatmaps: np.ndarray, parzen_size: float = 0.1, r # # mean_idx = np.argmax(norm_heatmaps, axis=1) # # y_locs, x_locs = np.unravel_index( # # mean_idx, shape=(H, W)) - + # locs = np.stack((x_locs, y_locs), axis=-1).astype(np.float32) # # vals = np.amax(heatmaps_flatten, axis=1) - - + x_locs_int = np.round(x_locs).astype(int) - x_locs_int = np.clip(x_locs_int, 0, W-1) + x_locs_int = np.clip(x_locs_int, 0, W - 1) y_locs_int = np.round(y_locs).astype(int) - y_locs_int = np.clip(y_locs_int, 0, H-1) + y_locs_int = np.clip(y_locs_int, 0, H - 1) vals = heatmaps_flatten[np.arange(B), np.arange(K), y_locs_int, x_locs_int] # breakpoint() # locs[vals <= 0.] 
= -1 @@ -481,8 +457,7 @@ def get_heatmap_expected_value(heatmaps: np.ndarray, parzen_size: float = 0.1, r if return_heatmap: return locs, vals, heatmaps_covolved else: - return locs, vals - + return locs, vals def _get_subpixel_maximums(heatmaps, locs): @@ -491,8 +466,7 @@ def _get_subpixel_maximums(heatmaps, locs): y_locs = locs[:, 1].astype(np.int32) # Ensure we are not near the boundaries (avoid boundary issues) - valid_mask = (x_locs > 0) & (x_locs < heatmaps.shape[2] - 1) & \ - (y_locs > 0) & (y_locs < heatmaps.shape[1] - 1) + valid_mask = (x_locs > 0) & (x_locs < heatmaps.shape[2] - 1) & (y_locs > 0) & (y_locs < heatmaps.shape[1] - 1) # Initialize the output array with the integer locations subpixel_locs = locs.copy() @@ -503,16 +477,18 @@ def _get_subpixel_maximums(heatmaps, locs): y_locs_valid = y_locs[valid_mask] # Compute gradients (dx, dy) and second derivatives (dxx, dyy) - dx = (heatmaps[valid_mask, y_locs_valid, x_locs_valid + 1] - - heatmaps[valid_mask, y_locs_valid, x_locs_valid - 1]) / 2.0 - dy = (heatmaps[valid_mask, y_locs_valid + 1, x_locs_valid] - - heatmaps[valid_mask, y_locs_valid - 1, x_locs_valid]) / 2.0 - dxx = heatmaps[valid_mask, y_locs_valid, x_locs_valid + 1] + \ - heatmaps[valid_mask, y_locs_valid, x_locs_valid - 1] - \ - 2 * heatmaps[valid_mask, y_locs_valid, x_locs_valid] - dyy = heatmaps[valid_mask, y_locs_valid + 1, x_locs_valid] + \ - heatmaps[valid_mask, y_locs_valid - 1, x_locs_valid] - \ - 2 * heatmaps[valid_mask, y_locs_valid, x_locs_valid] + dx = (heatmaps[valid_mask, y_locs_valid, x_locs_valid + 1] - heatmaps[valid_mask, y_locs_valid, x_locs_valid - 1]) / 2.0 + dy = (heatmaps[valid_mask, y_locs_valid + 1, x_locs_valid] - heatmaps[valid_mask, y_locs_valid - 1, x_locs_valid]) / 2.0 + dxx = ( + heatmaps[valid_mask, y_locs_valid, x_locs_valid + 1] + + heatmaps[valid_mask, y_locs_valid, x_locs_valid - 1] + - 2 * heatmaps[valid_mask, y_locs_valid, x_locs_valid] + ) + dyy = ( + heatmaps[valid_mask, y_locs_valid + 1, x_locs_valid] + + heatmaps[valid_mask, y_locs_valid - 1, x_locs_valid] + - 2 * heatmaps[valid_mask, y_locs_valid, x_locs_valid] + ) # Avoid division by zero by setting a minimum threshold for the second derivatives dxx = np.where(dxx != 0, dxx, 1e-6) @@ -527,4 +503,3 @@ def _get_subpixel_maximums(heatmaps, locs): subpixel_locs[valid_mask, 1] += subpixel_y_shift return subpixel_locs - diff --git a/mmpose/codecs/utils/refinement.py b/mmpose/codecs/utils/refinement.py index 13c79b4b4c1c8b774a84801ac2c03bac3417cf7d..55963b7d7368c95ca732c0c2a420702c6910c9ed 100644 --- a/mmpose/codecs/utils/refinement.py +++ b/mmpose/codecs/utils/refinement.py @@ -6,8 +6,7 @@ import numpy as np from .post_processing import gaussian_blur, gaussian_blur1d -def refine_keypoints(keypoints: np.ndarray, - heatmaps: np.ndarray) -> np.ndarray: +def refine_keypoints(keypoints: np.ndarray, heatmaps: np.ndarray) -> np.ndarray: """Refine keypoint predictions by moving from the maximum towards the second maximum by 0.25 pixel. The operation is in-place. @@ -34,20 +33,19 @@ def refine_keypoints(keypoints: np.ndarray, if 1 < x < W - 1 and 0 < y < H: dx = heatmaps[k, y, x + 1] - heatmaps[k, y, x - 1] else: - dx = 0. + dx = 0.0 if 1 < y < H - 1 and 0 < x < W: dy = heatmaps[k, y + 1, x] - heatmaps[k, y - 1, x] else: - dy = 0. 
+ dy = 0.0 keypoints[n, k] += np.sign([dx, dy], dtype=np.float32) * 0.25 return keypoints -def refine_keypoints_dark(keypoints: np.ndarray, heatmaps: np.ndarray, - blur_kernel_size: int) -> np.ndarray: +def refine_keypoints_dark(keypoints: np.ndarray, heatmaps: np.ndarray, blur_kernel_size: int) -> np.ndarray: """Refine keypoint predictions using distribution aware coordinate decoding. See `Dark Pose`_ for details. The operation is in-place. @@ -83,15 +81,9 @@ def refine_keypoints_dark(keypoints: np.ndarray, heatmaps: np.ndarray, dx = 0.5 * (heatmaps[k, y, x + 1] - heatmaps[k, y, x - 1]) dy = 0.5 * (heatmaps[k, y + 1, x] - heatmaps[k, y - 1, x]) - dxx = 0.25 * ( - heatmaps[k, y, x + 2] - 2 * heatmaps[k, y, x] + - heatmaps[k, y, x - 2]) - dxy = 0.25 * ( - heatmaps[k, y + 1, x + 1] - heatmaps[k, y - 1, x + 1] - - heatmaps[k, y + 1, x - 1] + heatmaps[k, y - 1, x - 1]) - dyy = 0.25 * ( - heatmaps[k, y + 2, x] - 2 * heatmaps[k, y, x] + - heatmaps[k, y - 2, x]) + dxx = 0.25 * (heatmaps[k, y, x + 2] - 2 * heatmaps[k, y, x] + heatmaps[k, y, x - 2]) + dxy = 0.25 * (heatmaps[k, y + 1, x + 1] - heatmaps[k, y - 1, x + 1] - heatmaps[k, y + 1, x - 1] + heatmaps[k, y - 1, x - 1]) + dyy = 0.25 * (heatmaps[k, y + 2, x] - 2 * heatmaps[k, y, x] + heatmaps[k, y - 2, x]) derivative = np.array([[dx], [dy]]) hessian = np.array([[dxx, dxy], [dxy, dyy]]) if dxx * dyy - dxy**2 != 0: @@ -102,8 +94,7 @@ def refine_keypoints_dark(keypoints: np.ndarray, heatmaps: np.ndarray, return keypoints -def refine_keypoints_dark_udp(keypoints: np.ndarray, heatmaps: np.ndarray, - blur_kernel_size: int) -> np.ndarray: +def refine_keypoints_dark_udp(keypoints: np.ndarray, heatmaps: np.ndarray, blur_kernel_size: int) -> np.ndarray: """Refine keypoint predictions using distribution aware coordinate decoding for UDP. See `UDP`_ for details. The operation is in-place. @@ -130,11 +121,10 @@ def refine_keypoints_dark_udp(keypoints: np.ndarray, heatmaps: np.ndarray, # modulate heatmaps heatmaps = gaussian_blur(heatmaps, blur_kernel_size) - np.clip(heatmaps, 1e-3, 50., heatmaps) + np.clip(heatmaps, 1e-3, 50.0, heatmaps) np.log(heatmaps, heatmaps) - heatmaps_pad = np.pad( - heatmaps, ((0, 0), (1, 1), (1, 1)), mode='edge').flatten() + heatmaps_pad = np.pad(heatmaps, ((0, 0), (1, 1), (1, 1)), mode="edge").flatten() for n in range(N): index = keypoints[n, :, 0] + 1 + (keypoints[n, :, 1] + 1) * (W + 2) @@ -159,14 +149,12 @@ def refine_keypoints_dark_udp(keypoints: np.ndarray, heatmaps: np.ndarray, hessian = np.concatenate([dxx, dxy, dxy, dyy], axis=1) hessian = hessian.reshape(K, 2, 2) hessian = np.linalg.pinv(hessian + np.finfo(np.float32).eps * np.eye(2)) - keypoints[n] -= np.einsum('imn,ink->imk', hessian, - derivative).squeeze() + keypoints[n] -= np.einsum("imn,ink->imk", hessian, derivative).squeeze() return keypoints -def refine_simcc_dark(keypoints: np.ndarray, simcc: np.ndarray, - blur_kernel_size: int) -> np.ndarray: +def refine_simcc_dark(keypoints: np.ndarray, simcc: np.ndarray, blur_kernel_size: int) -> np.ndarray: """SimCC version. Refine keypoint predictions using distribution aware coordinate decoding for UDP. See `UDP`_ for details. The operation is in- place. 
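
The decoding paths shown above share one second-order core. `get_heatmap_expected_value` convolves each heatmap with a per-keypoint OKS kernel, takes the argmax, and then refines it to sub-pixel precision; `refine_keypoints_dark` and `refine_keypoints_dark_udp` blur (and, for UDP, log-transform) the heatmap and apply the Newton step -H⁻¹∇ of a local quadratic fit. A single-keypoint numpy sketch of that Newton step, using the same central differences as `refine_keypoints_dark` (simplified: no blurring or boundary handling):

```python
import numpy as np

def newton_refine(hm: np.ndarray, x: int, y: int) -> np.ndarray:
    """Sub-pixel shift of an integer peak (x, y) on a (H, W) heatmap."""
    dx = 0.5 * (hm[y, x + 1] - hm[y, x - 1])
    dy = 0.5 * (hm[y + 1, x] - hm[y - 1, x])
    dxx = 0.25 * (hm[y, x + 2] - 2 * hm[y, x] + hm[y, x - 2])
    dxy = 0.25 * (hm[y + 1, x + 1] - hm[y - 1, x + 1] - hm[y + 1, x - 1] + hm[y - 1, x - 1])
    dyy = 0.25 * (hm[y + 2, x] - 2 * hm[y, x] + hm[y - 2, x])
    hessian = np.array([[dxx, dxy], [dxy, dyy]])
    if dxx * dyy - dxy**2 == 0:                        # degenerate fit: keep integer peak
        return np.array([x, y], dtype=np.float32)
    return np.array([x, y]) - np.linalg.solve(hessian, np.array([dx, dy]))

# The true sub-pixel peak (10.3, 7.6) is recovered from the integer argmax (10, 8).
ys, xs = np.mgrid[0:20, 0:20]
hm = np.exp(-((xs - 10.3) ** 2 + (ys - 7.6) ** 2) / 8.0)
print(newton_refine(hm, 10, 8))
```
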
@@ -192,10 +180,10 @@ def refine_simcc_dark(keypoints: np.ndarray, simcc: np.ndarray, # modulate simcc simcc = gaussian_blur1d(simcc, blur_kernel_size) - np.clip(simcc, 1e-3, 50., simcc) + np.clip(simcc, 1e-3, 50.0, simcc) np.log(simcc, simcc) - simcc = np.pad(simcc, ((0, 0), (0, 0), (2, 2)), 'edge') + simcc = np.pad(simcc, ((0, 0), (0, 0), (2, 2)), "edge") for n in range(N): px = (keypoints[n] + 2.5).astype(np.int64).reshape(-1, 1) # K, 1 diff --git a/mmpose/codecs/video_pose_lifting.py b/mmpose/codecs/video_pose_lifting.py index 5a5a7b1983381b9752d3bc9514fc28d6e1f73c43..f7f2827d8d693e49233249e84fd28df2b7730534 100644 --- a/mmpose/codecs/video_pose_lifting.py +++ b/mmpose/codecs/video_pose_lifting.py @@ -6,6 +6,7 @@ from typing import List, Optional, Tuple, Union import numpy as np from mmpose.registry import KEYPOINT_CODECS + from .base import BaseKeypointCodec @@ -39,28 +40,27 @@ class VideoPoseLifting(BaseKeypointCodec): Default: ``False``. """ - auxiliary_encode_keys = { - 'lifting_target', 'lifting_target_visible', 'camera_param' - } + auxiliary_encode_keys = {"lifting_target", "lifting_target_visible", "camera_param"} instance_mapping_table = dict( - lifting_target='lifting_target', - lifting_target_visible='lifting_target_visible', + lifting_target="lifting_target", + lifting_target_visible="lifting_target_visible", ) label_mapping_table = dict( - trajectory_weights='trajectory_weights', - lifting_target_label='lifting_target_label', - lifting_target_weight='lifting_target_weight') - - def __init__(self, - num_keypoints: int, - zero_center: bool = True, - root_index: Union[int, List] = 0, - remove_root: bool = False, - save_index: bool = False, - reshape_keypoints: bool = True, - concat_vis: bool = False, - normalize_camera: bool = False): + trajectory_weights="trajectory_weights", lifting_target_label="lifting_target_label", lifting_target_weight="lifting_target_weight" + ) + + def __init__( + self, + num_keypoints: int, + zero_center: bool = True, + root_index: Union[int, List] = 0, + remove_root: bool = False, + save_index: bool = False, + reshape_keypoints: bool = True, + concat_vis: bool = False, + normalize_camera: bool = False, + ): super().__init__() self.num_keypoints = num_keypoints @@ -74,12 +74,14 @@ class VideoPoseLifting(BaseKeypointCodec): self.concat_vis = concat_vis self.normalize_camera = normalize_camera - def encode(self, - keypoints: np.ndarray, - keypoints_visible: Optional[np.ndarray] = None, - lifting_target: Optional[np.ndarray] = None, - lifting_target_visible: Optional[np.ndarray] = None, - camera_param: Optional[dict] = None) -> dict: + def encode( + self, + keypoints: np.ndarray, + keypoints_visible: Optional[np.ndarray] = None, + lifting_target: Optional[np.ndarray] = None, + lifting_target_visible: Optional[np.ndarray] = None, + camera_param: Optional[dict] = None, + ) -> dict: """Encoding keypoints from input image space to normalized space. 
Args: @@ -128,13 +130,12 @@ class VideoPoseLifting(BaseKeypointCodec): # set initial value for `lifting_target_weight` # and `trajectory_weights` if lifting_target_visible is None: - lifting_target_visible = np.ones( - lifting_target.shape[:-1], dtype=np.float32) + lifting_target_visible = np.ones(lifting_target.shape[:-1], dtype=np.float32) lifting_target_weight = lifting_target_visible - trajectory_weights = (1 / lifting_target[:, 2]) + trajectory_weights = 1 / lifting_target[:, 2] else: valid = lifting_target_visible > 0.5 - lifting_target_weight = np.where(valid, 1., 0.).astype(np.float32) + lifting_target_weight = np.where(valid, 1.0, 0.0).astype(np.float32) trajectory_weights = lifting_target_weight if camera_param is None: @@ -145,83 +146,68 @@ class VideoPoseLifting(BaseKeypointCodec): lifting_target_label = lifting_target.copy() # Zero-center the target pose around a given root keypoint if self.zero_center: - assert (lifting_target.ndim >= 2 and - lifting_target.shape[-2] > max(self.root_index)), \ - f'Got invalid joint shape {lifting_target.shape}' + assert lifting_target.ndim >= 2 and lifting_target.shape[-2] > max( + self.root_index + ), f"Got invalid joint shape {lifting_target.shape}" root = np.mean(lifting_target[..., self.root_index, :], axis=-2) lifting_target_label -= root[..., np.newaxis, :] - encoded['target_root'] = root + encoded["target_root"] = root if self.remove_root and len(self.root_index) == 1: root_index = self.root_index[0] - lifting_target_label = np.delete( - lifting_target_label, root_index, axis=-2) - lifting_target_visible = np.delete( - lifting_target_visible, root_index, axis=-2) - assert lifting_target_weight.ndim in { - 2, 3 - }, (f'Got invalid lifting target weights shape ' - f'{lifting_target_weight.shape}') + lifting_target_label = np.delete(lifting_target_label, root_index, axis=-2) + lifting_target_visible = np.delete(lifting_target_visible, root_index, axis=-2) + assert lifting_target_weight.ndim in {2, 3}, f"Got invalid lifting target weights shape " f"{lifting_target_weight.shape}" axis_to_remove = -2 if lifting_target_weight.ndim == 3 else -1 - lifting_target_weight = np.delete( - lifting_target_weight, root_index, axis=axis_to_remove) + lifting_target_weight = np.delete(lifting_target_weight, root_index, axis=axis_to_remove) # Add a flag to avoid latter transforms that rely on the root # joint or the original joint index - encoded['target_root_removed'] = True + encoded["target_root_removed"] = True # Save the root index for restoring the global pose if self.save_index: - encoded['target_root_index'] = root_index + encoded["target_root_index"] = root_index # Normalize the 2D keypoint coordinate with image width and height _camera_param = deepcopy(camera_param) - assert 'w' in _camera_param and 'h' in _camera_param, ( - 'Camera parameter `w` and `h` should be provided.') + assert "w" in _camera_param and "h" in _camera_param, "Camera parameter `w` and `h` should be provided." 
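
The zero-centering branch of `VideoPoseLifting.encode` above subtracts the mean of the configured root joints from the lifting target and stashes that root under `target_root` so `decode` can add it back. A small round-trip sketch (illustrative data, single root):

```python
import numpy as np

target = np.random.rand(1, 17, 3).astype(np.float32)   # (T, K, 3) lifting target
root_index = [0]

# encode: center the pose around the (mean of the) root joint(s)
root = np.mean(target[..., root_index, :], axis=-2)    # (T, 3), saved as target_root
centered = target - root[..., np.newaxis, :]

# decode: restore the global position from the saved root
restored = centered + root[..., np.newaxis, :]
assert np.allclose(restored, target)
```
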
- center = np.array([0.5 * _camera_param['w'], 0.5 * _camera_param['h']], - dtype=np.float32) - scale = np.array(0.5 * _camera_param['w'], dtype=np.float32) + center = np.array([0.5 * _camera_param["w"], 0.5 * _camera_param["h"]], dtype=np.float32) + scale = np.array(0.5 * _camera_param["w"], dtype=np.float32) keypoint_labels = (keypoints - center) / scale - assert keypoint_labels.ndim in { - 2, 3 - }, (f'Got invalid keypoint labels shape {keypoint_labels.shape}') + assert keypoint_labels.ndim in {2, 3}, f"Got invalid keypoint labels shape {keypoint_labels.shape}" if keypoint_labels.ndim == 2: keypoint_labels = keypoint_labels[None, ...] if self.normalize_camera: - assert 'f' in _camera_param and 'c' in _camera_param, ( - 'Camera parameter `f` and `c` should be provided.') - _camera_param['f'] = _camera_param['f'] / scale - _camera_param['c'] = (_camera_param['c'] - center[:, None]) / scale - encoded['camera_param'] = _camera_param + assert "f" in _camera_param and "c" in _camera_param, "Camera parameter `f` and `c` should be provided." + _camera_param["f"] = _camera_param["f"] / scale + _camera_param["c"] = (_camera_param["c"] - center[:, None]) / scale + encoded["camera_param"] = _camera_param if self.concat_vis: keypoints_visible_ = keypoints_visible if keypoints_visible.ndim == 2: keypoints_visible_ = keypoints_visible[..., None] - keypoint_labels = np.concatenate( - (keypoint_labels, keypoints_visible_), axis=2) + keypoint_labels = np.concatenate((keypoint_labels, keypoints_visible_), axis=2) if self.reshape_keypoints: N = keypoint_labels.shape[0] keypoint_labels = keypoint_labels.transpose(1, 2, 0).reshape(-1, N) - encoded['keypoint_labels'] = keypoint_labels - encoded['keypoints_visible'] = keypoints_visible - encoded['lifting_target_label'] = lifting_target_label - encoded['lifting_target_weight'] = lifting_target_weight - encoded['trajectory_weights'] = trajectory_weights + encoded["keypoint_labels"] = keypoint_labels + encoded["keypoints_visible"] = keypoints_visible + encoded["lifting_target_label"] = lifting_target_label + encoded["lifting_target_weight"] = lifting_target_weight + encoded["trajectory_weights"] = trajectory_weights return encoded - def decode(self, - encoded: np.ndarray, - target_root: Optional[np.ndarray] = None - ) -> Tuple[np.ndarray, np.ndarray]: + def decode(self, encoded: np.ndarray, target_root: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray]: """Decode keypoint coordinates from normalized space to input image space. @@ -239,8 +225,7 @@ class VideoPoseLifting(BaseKeypointCodec): if target_root is not None and target_root.size > 0: keypoints = keypoints + target_root if self.remove_root and len(self.root_index) == 1: - keypoints = np.insert( - keypoints, self.root_index, target_root, axis=1) + keypoints = np.insert(keypoints, self.root_index, target_root, axis=1) scores = np.ones(keypoints.shape[:-1], dtype=np.float32) return keypoints, scores diff --git a/mmpose/configs/MaskPose/MaskPose-b-1.0.0.py b/mmpose/configs/MaskPose/MaskPose-b-1.0.0.py new file mode 100644 index 0000000000000000000000000000000000000000..d02a3eb096494653843ab147a7a83c935aefa6ce --- /dev/null +++ b/mmpose/configs/MaskPose/MaskPose-b-1.0.0.py @@ -0,0 +1,388 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
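
The other normalization in the `VideoPoseLifting` hunk above maps 2D pixel inputs into a frame centered at (w/2, h/2) and scaled by w/2; when `normalize_camera` is set, the intrinsics `f` and `c` are rescaled consistently. A tiny sketch of the coordinate part (illustrative values):

```python
import numpy as np

w, h = 1000, 1002                                  # from camera_param["w"], ["h"]
center = np.array([0.5 * w, 0.5 * h], dtype=np.float32)
scale = np.float32(0.5 * w)

keypoints = np.array([[[500.0, 501.0], [0.0, 0.0]]], dtype=np.float32)  # (T, K, 2)
labels = (keypoints - center) / scale
print(labels)  # image center -> (0, 0); top-left corner -> (-1, -h/w)
```
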
+ +COCO_ROOT = "path/to/COCO/" +MPII_ROOT = "path/to/MPII/" +AIC_ROOT = "path/to/AIC/" +OCHUMAN_ROOT = "path/to/OCHuman/" + +BATCH_SIZE = 64 +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "work_dirs/ViTb-multi/epoch_210.pth" + +# runtime +train_cfg = dict(max_epochs=210, val_interval=5) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=12, + layer_decay_rate=0.75, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch="base", + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.3, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="HeatmapHead", + in_channels=768, + out_channels=21, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + ), +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.2, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.2), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) +coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + 
data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=COCO_ROOT + "/detections/rtmdet-l-ins-mask.json", + filter_cfg=dict(bbox_score_thr=0.3), + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + # bbox_file=OCHUMAN_ROOT + "/detections/rtmdet-l-ins.json", + # filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII.py"), + datasets=[coco_val_dataset, mpii_val_dataset, aic_val_dataset, ochuman_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 20, + 7: 17, + 8: 18, + 9: 19, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 19, + 13: 17, + }, # AIC -> COCO and additional points + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 20, + 7: 17, + 8: 18, + 9: 19, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 19, + 13: 17, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=True, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, 
+ source_ratio=[1, 1, 1], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=128, + num_workers=8, + persistent_workers=True, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + nms_mode="none", + outfile_prefix="COCO_MaskPose", + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="PCKAccuracy", + prefix=MPII_NAME, + ), + dict( + type="PCKAccuracy", + prefix=AIC_NAME, + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "annotations/person_keypoints_val2017.json", + prefix=OCHUMAN_NAME, + outfile_prefix="ochuman", + nms_mode="none", + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/MaskPose/MaskPose-b-1.1.0.py b/mmpose/configs/MaskPose/MaskPose-b-1.1.0.py new file mode 100644 index 0000000000000000000000000000000000000000..ad9a2ea650a1f4898463b9b6eea6f618cd0cb4da --- /dev/null +++ b/mmpose/configs/MaskPose/MaskPose-b-1.1.0.py @@ -0,0 +1,386 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. + +COCO_ROOT = "path/to/COCO/original/" +MPII_ROOT = "path/to/MPII/" +AIC_ROOT = "path/to/AIC/" +OCHUMAN_ROOT = "path/to/OCHuman/" + +BATCH_SIZE = 128 +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/pretrained/vitpose-b.pth" + +# runtime +train_cfg = dict(max_epochs=210, val_interval=5) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=12, + layer_decay_rate=0.75, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch="base", + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.3, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # 
checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="HeatmapHead", + in_channels=768, + out_channels=23, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + ), +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), +] + +coco_train_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) +coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for 
COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/MaskPose/MaskPose-b-wb-1.1.0.py b/mmpose/configs/MaskPose/MaskPose-b-wb-1.1.0.py new file mode 100644 index 0000000000000000000000000000000000000000..91377a437ee10181b4f53204974f2ecc88bc2243 --- /dev/null +++ b/mmpose/configs/MaskPose/MaskPose-b-wb-1.1.0.py @@ -0,0 +1,405 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
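+
+# Whole-body variant of the ViT-Base MaskPose config. The heatmap head below
+# predicts 41 channels: the 23-keypoint merged skeleton (17 COCO points plus
+# 6 head/trunk points contributed by MPII and AIC, indices 17-22) extended by
+# 6 foot, 2x5 hand, and 2 mouth-corner keypoints taken from COCO-WholeBody,
+# exactly as enumerated in the training keypoints_mapping further down.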
+ +COCO_ROOT = "path/to/COCO/original/" +MPII_ROOT = "path/to/MPII/" +AIC_ROOT = "path/to/AIC/" +OCHUMAN_ROOT = "path/to/OCHuman/" + +BATCH_SIZE = 128 +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/pretrained/vitpose-b.pth" + +# runtime +train_cfg = dict(max_epochs=210, val_interval=5) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=12, + layer_decay_rate=0.75, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch="base", + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.3, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="HeatmapHead", + in_channels=768, + out_channels=41, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + ), +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), +] + +coco_train_dataset = dict( + type="CocoWholeBodyDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) 
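+# Note that every source dataset is declared with pipeline=[]: the datasets
+# only load raw annotations, and the shared train/val pipeline is applied
+# once by the wrapping CombinedDataset defined below. Training reads the
+# 133-keypoint COCO-WholeBody annotations (only the indices listed in the
+# mapping are kept), while validation stays on the 17-keypoint
+# person_keypoints split.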
+coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, # Identity mapping for COCO as merged is based on COCO + 17: 23, + 18: 24, + 19: 25, + 20: 26, + 21: 27, + 22: 28, # Feet kpts of COCO-wb + 95: 29, + 99: 30, + 103: 31, + 107: 32, + 111: 33, # Left hand kpts of COCO-wb + 116: 34, + 120: 35, + 124: 36, + 128: 37, + 132: 38, # Right hand kpts of COCO-wb + 71: 39, + 72: 40, + }, # Mouth corners of COCO-wb + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and 
additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/MaskPose/MaskPose-h-1.1.0.py b/mmpose/configs/MaskPose/MaskPose-h-1.1.0.py new file mode 100644 index 0000000000000000000000000000000000000000..76754d2a9fa3a98bf1c99abc0e26642b914203ac --- /dev/null +++ b/mmpose/configs/MaskPose/MaskPose-h-1.1.0.py @@ -0,0 +1,388 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
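+
+# ViT-Huge variant: 32 transformer blocks (hence num_layers=32 in the
+# layer-decay settings below) and a 1280-dim feature map into the heatmap
+# head. BATCH_SIZE drops to 32, presumably to fit memory, and the base
+# learning rate scales with it:
+#     lr = 5e-4 * BATCH_SIZE / 64 = 5e-4 * 32 / 64 = 2.5e-4
+# If launched with auto-scale-lr, mmengine rescales this once more by the
+# ratio of the effective cross-GPU batch size to base_batch_size=512.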
+ +COCO_ROOT = "path/to/COCO/original/" +MPII_ROOT = "path/to/MPII/" +AIC_ROOT = "path/to/AIC/" +OCHUMAN_ROOT = "path/to/OCHuman/" + +BATCH_SIZE = 32 +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/pretrained/vitpose-h.pth" + +# runtime +train_cfg = dict(max_epochs=210, val_interval=5) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=32, + layer_decay_rate=0.85, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch="huge", + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.55, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="HeatmapHead", + in_channels=1280, + out_channels=23, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + ), +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) 
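+# MPII and AIC below are read from mpii_sam_*.json / aic_sam_*.json rather
+# than the stock annotation files; these presumably carry the SAM-generated
+# instance masks that the MaskBackground transform needs, since neither
+# dataset ships segmentation masks of its own.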
+coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + 
persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/MaskPose/MaskPose-h-wb-1.1.0.py b/mmpose/configs/MaskPose/MaskPose-h-wb-1.1.0.py new file mode 100644 index 0000000000000000000000000000000000000000..3e893046b6743b413c9705f7407831dd1025d33d --- /dev/null +++ b/mmpose/configs/MaskPose/MaskPose-h-wb-1.1.0.py @@ -0,0 +1,407 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
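+
+# ViT-Huge whole-body variant. Relative to MaskPose-h-1.1.0.py the head
+# widens from 23 to 41 output channels, training switches to the
+# COCO-WholeBody annotation file, the merged metainfo comes from the
+# *_wLimbs variant, and BATCH_SIZE is 128 rather than 32.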
+ +COCO_ROOT = "/path/to/COCO/" +MPII_ROOT = "/path/to/MPII/" +AIC_ROOT = "/path/to/AIC/" +OCHUMAN_ROOT = "/path/to/OCHuman/" + + +BATCH_SIZE = 128 +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/pretrained/vitpose-h.pth" + +# runtime +train_cfg = dict(max_epochs=210, val_interval=5) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=32, + layer_decay_rate=0.85, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch="huge", + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.55, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="HeatmapHead", + in_channels=1280, + out_channels=41, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + ), +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoWholeBodyDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) 
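+# Validation runs two views of the same OCHuman annotation file:
+# OCHumanDataset uses ground-truth person boxes, while OCHumanDetDataset
+# (defined below) takes RTMDet-L detections from bbox_file and keeps only
+# boxes scoring above bbox_score_thr=0.3. The evaluator reports the two
+# separately under the OCHuman and OCHuman_det prefixes.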
+coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, # Identity mapping for COCO as merged is based on COCO + 17: 23, + 18: 24, + 19: 25, + 20: 26, + 21: 27, + 22: 28, # Feet kpts of COCO-wb + 95: 29, + 99: 30, + 103: 31, + 107: 32, + 111: 33, # Left hand kpts of COCO-wb + 116: 34, + 120: 35, + 124: 36, + 128: 37, + 132: 38, # Right hand kpts of COCO-wb + 71: 39, + 72: 40, + }, # Mouth corners of COCO-wb + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and 
additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/MaskPose/MaskPose-l-1.1.0.py b/mmpose/configs/MaskPose/MaskPose-l-1.1.0.py new file mode 100644 index 0000000000000000000000000000000000000000..6bf2ede300cc2af064bf125bdea080d393a2ed77 --- /dev/null +++ b/mmpose/configs/MaskPose/MaskPose-l-1.1.0.py @@ -0,0 +1,389 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
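+
+# ViT-Large variant: 24 transformer blocks, 1024-dim features, 23-keypoint
+# head. The layer-decay constructor below gives earlier blocks geometrically
+# smaller learning rates, roughly lr * 0.8 ** (depth - block_index), so the
+# pretrained low-level features move less than the head (the exact exponent
+# convention is the constructor's; this is only the intuition).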
+ +COCO_ROOT = "/path/to/COCO/" +MPII_ROOT = "/path/to/MPII/" +AIC_ROOT = "/path/to/AIC/" +OCHUMAN_ROOT = "/path/to/OCHuman/" + + +BATCH_SIZE = 128 +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/pretrained/vitpose-l.pth" + +# runtime +train_cfg = dict(max_epochs=210, val_interval=5) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=24, + layer_decay_rate=0.8, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch="large", + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.5, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="HeatmapHead", + in_channels=1024, + out_channels=23, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + ), +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) 
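+# MaskBackground (in the pipelines above) is what makes this MaskPose rather
+# than plain top-down ViTPose: the instance mask conditions the crop by
+# suppressing background pixels, with alpha=0.25 apparently retaining a faint
+# background instead of hard zeros. At train time the mask is also randomly
+# dilated or eroded (each with prob 0.5) using blob-shaped patches, so the
+# model tolerates the imperfect masks a detector or SAM produces at
+# inference; the val-pipeline masking is deterministic.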
+coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + 
persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/MaskPose/MaskPose-l-wb-1.1.0.py b/mmpose/configs/MaskPose/MaskPose-l-wb-1.1.0.py new file mode 100644 index 0000000000000000000000000000000000000000..a1b72d247b07203c5ff436ee92c52ec00f9c596f --- /dev/null +++ b/mmpose/configs/MaskPose/MaskPose-l-wb-1.1.0.py @@ -0,0 +1,407 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
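+
+# ViT-Large whole-body variant (41-channel head, COCO-WholeBody training
+# annotations, *_wLimbs metainfo). As in the sibling configs, validation
+# runs every 5 epochs and checkpointing keeps only the single best model by
+# the COCO/AP metric (save_best with max_keep_ckpts=1).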
+ +COCO_ROOT = "/path/to/COCO/" +MPII_ROOT = "/path/to/MPII/" +AIC_ROOT = "/path/to/AIC/" +OCHUMAN_ROOT = "/path/to/OCHuman/" + + +BATCH_SIZE = 128 +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/pretrained/vitpose-l.pth" + +# runtime +train_cfg = dict(max_epochs=210, val_interval=5) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=24, + layer_decay_rate=0.8, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch="large", + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.5, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="HeatmapHead", + in_channels=1024, + out_channels=41, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + ), +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoWholeBodyDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) 
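+# The keypoints_mapping dicts used by CombinedDataset below map a source
+# dataset's keypoint index to the merged-skeleton index. Conceptually (a
+# sketch of the idea, not the library's implementation):
+#
+#     for src_idx, dst_idx in mapping.items():
+#         merged_kpts[dst_idx] = source_kpts[src_idx]
+#         merged_visibility[dst_idx] = source_visibility[src_idx]
+#
+# Target indices a dataset never maps to simply stay invisible for its
+# samples, so each dataset supervises only the keypoints it annotates.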
+coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, # Identity mapping for COCO as merged is based on COCO + 17: 23, + 18: 24, + 19: 25, + 20: 26, + 21: 27, + 22: 28, # Feet kpts of COCO-wb + 95: 29, + 99: 30, + 103: 31, + 107: 32, + 111: 33, # Left hand kpts of COCO-wb + 116: 34, + 120: 35, + 124: 36, + 128: 37, + 132: 38, # Right hand kpts of COCO-wb + 71: 39, + 72: 40, + }, # Mouth corners of COCO-wb + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and 
additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/MaskPose/MaskPose-s-1.1.0.py b/mmpose/configs/MaskPose/MaskPose-s-1.1.0.py new file mode 100644 index 0000000000000000000000000000000000000000..44c37ef79bcabc80de0b6e89498886f14123863e --- /dev/null +++ b/mmpose/configs/MaskPose/MaskPose-s-1.1.0.py @@ -0,0 +1,388 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
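+
+# ViT-Small variant. The backbone arch is spelled out as an explicit dict
+# (384-dim embeddings, 12 layers, 12 heads, 4x MLP), matching ViTPose-S;
+# presumably written out because mmpretrain's built-in "small" preset is a
+# different architecture. drop_path_rate also shrinks to 0.1 for the
+# smaller model.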
+ +COCO_ROOT = "/path/to/COCO/" +MPII_ROOT = "/path/to/MPII/" +AIC_ROOT = "/path/to/AIC/" +OCHUMAN_ROOT = "/path/to/OCHuman/" + +BATCH_SIZE = 128 +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/pretrained/vitpose-s.pth" + +# runtime +train_cfg = dict(max_epochs=210, val_interval=5) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=12, + layer_decay_rate=0.8, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch={"embed_dims": 384, "num_layers": 12, "num_heads": 12, "feedforward_channels": 384 * 4}, + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.1, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="HeatmapHead", + in_channels=384, + out_channels=23, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + ), +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_train2017.json", + 
data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) +coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_test.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders 
+train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/MaskPose/MaskPose-s-wb-1.1.0.py b/mmpose/configs/MaskPose/MaskPose-s-wb-1.1.0.py new file mode 100644 index 0000000000000000000000000000000000000000..14bc706a045a1f39667b8d3e55d156325c7a8f56 --- /dev/null +++ b/mmpose/configs/MaskPose/MaskPose-s-wb-1.1.0.py @@ -0,0 +1,406 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
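+
+# ViT-Small whole-body variant. Unlike its siblings, this config sets
+# resume=True with load_from commented out, so training continues from the
+# latest checkpoint in the work directory instead of initializing from
+# pretrained ViTPose-S weights.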
+ +COCO_ROOT = "/path/to/COCO/" +MPII_ROOT = "/path/to/MPII/" +AIC_ROOT = "/path/to/AIC/" +OCHUMAN_ROOT = "/path/to/OCHuman/" + +BATCH_SIZE = 128 +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +resume = True +# load_from = "models/pretrained/vitpose-s.pth" + +# runtime +train_cfg = dict(max_epochs=210, val_interval=5) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=12, + layer_decay_rate=0.8, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch={"embed_dims": 384, "num_layers": 12, "num_heads": 12, "feedforward_channels": 384 * 4}, + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.1, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="HeatmapHead", + in_channels=384, + out_channels=41, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + ), +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoWholeBodyDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/coco_wholebody_train_v1.0.json", + 
data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) +coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, # Identity mapping for COCO as merged is based on COCO + 17: 23, + 18: 24, + 19: 25, + 20: 26, + 21: 27, + 22: 28, # Feet kpts of COCO-wb + 95: 29, + 99: 30, + 103: 31, + 107: 32, + 111: 33, # Left hand kpts of COCO-wb + 116: 34, + 120: 35, + 124: 36, + 128: 37, + 132: 38, # Right hand kpts of COCO-wb + 71: 39, + 72: 40, + }, # Mouth corners of COCO-wb + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 
10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/MaskPose/ViTb-multi_mask.py b/mmpose/configs/MaskPose/ViTb-multi_mask.py deleted file mode 100644 index 7431a3ef78ef8bcfe21e2b0c39a2c9db93f5e6ee..0000000000000000000000000000000000000000 --- a/mmpose/configs/MaskPose/ViTb-multi_mask.py +++ /dev/null @@ -1,291 +0,0 @@ -COCO_ROOT = "path/to/COCO/" -MPII_ROOT = "path/to/MPII/" -AIC_ROOT = "path/to/AIC/" -OCHUMAN_ROOT = "path/to/OCHuman/" - -BATCH_SIZE = 64 -COCO_NAME = "COCO" -MPII_NAME = "MPII" -AIC_NAME = "AIC" -OCHUMAN_NAME = "OCHuman" - -_base_ = ['../_base_/default_runtime.py'] - -# resume = True -load_from = "work_dirs/ViTb-multi/epoch_210.pth" - -# runtime -train_cfg = dict(max_epochs=210, val_interval=5) - -# optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) - -optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4*BATCH_SIZE/64, betas=(0.9, 0.999), weight_decay=0.1), - paramwise_cfg=dict( - num_layers=12, - layer_decay_rate=0.75, - custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), - }, - ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), -) - -# learning policy -param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) -] - -# automatically scaling LR based on the actual training batch size -auto_scale_lr = dict(base_batch_size=512) - -# hooks -default_hooks = dict( - 
checkpoint=dict(save_best='{}/AP'.format(COCO_NAME), rule='greater', max_keep_ckpts=1)) - -# codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) - -# model settings -model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict( - type='mmpretrain.VisionTransformer', - arch='base', - img_size=(256, 192), - patch_size=16, - qkv_bias=True, - drop_path_rate=0.3, - with_cls_token=False, - out_type='featmap', - patch_cfg=dict(padding=2), - init_cfg=None, - # init_cfg=dict( - # type='Pretrained', - # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), - ), - head=dict( - type='HeatmapHead', - in_channels=768, - out_channels=21, - deconv_out_channels=(256, 256), - deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), - test_cfg=dict( - flip_test=True, - flip_mode='heatmap', - shift_heatmap=False, - )) - -# pipelines -train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict( - type='MaskBackground', - prob=1.0, - continue_on_failure=False, - alpha=0.2, - dilate_prob=0.5, - dilate_amount=0.1, - erode_prob=0.5, - erode_amount=0.5, - ), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') -] -val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='MaskBackground', continue_on_failure=False, alpha=0.2), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') -] - -# # base dataset settings -# data_root = TRAIN_ROOT -# val_data_root = VAL_ROOT -# dataset_type = 'CocoDataset' -# data_mode = 'topdown' - -coco_train_dataset = dict( - type="CocoDataset", - data_root=COCO_ROOT, - data_mode="topdown", - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), - pipeline=[], - test_mode=False, -) -coco_val_dataset = dict( - type="CocoDataset", - data_root=COCO_ROOT, - data_mode="topdown", - ann_file="annotations/person_keypoints_val2017.json", - bbox_file=COCO_ROOT + "/detections/rtmdet-l-ins-mask.json", - filter_cfg=dict(bbox_score_thr=0.3), - data_prefix=dict(img='val2017/'), - pipeline=[], - test_mode=True, -) -mpii_train_dataset = dict( - type="MpiiDataset", - data_root=MPII_ROOT, - data_mode="topdown", - ann_file="annotations/mpii_sam_train.json", - data_prefix=dict(img='images/'), - pipeline=[], - test_mode=False, -) -mpii_val_dataset = dict( - type="MpiiDataset", - data_root=MPII_ROOT, - data_mode="topdown", - ann_file="annotations/mpii_sam_val.json", - data_prefix=dict(img='images/'), - pipeline=[], - test_mode=True, -) -aic_train_dataset = dict( - type="AicDataset", - data_root=AIC_ROOT, - data_mode="topdown", - ann_file="annotations/aic_sam_train.json", - data_prefix=dict(img='images/'), - pipeline=[], - test_mode=False, -) -aic_val_dataset = dict( - type="AicDataset", - data_root=AIC_ROOT, - data_mode="topdown", - ann_file="annotations/aic_sam_val.json", - data_prefix=dict(img='images/'), - pipeline=[], - test_mode=True, -) -ochuman_val_dataset = dict( - type="OCHumanDataset", - data_root=OCHUMAN_ROOT, - data_mode="topdown", - 
ann_file="annotations/person_keypoints_val2017.json", - data_prefix=dict(img='val2017/'), - # bbox_file=OCHUMAN_ROOT + "/detections/rtmdet-l-ins.json", - # filter_cfg=dict(bbox_score_thr=0.3), - pipeline=[], - test_mode=True, -) - -combined_val_dataset = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/merged_COCO_AIC_MPII.py'), - datasets=[coco_val_dataset, mpii_val_dataset, aic_val_dataset, ochuman_val_dataset], - pipeline=val_pipeline, - test_mode=True, - keypoints_mapping=[ - {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, - 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16}, # Identity mapping for COCO as merged is based on COCO - {0: 16, 1: 14, 2: 12, 3: 11, 4: 13, 5: 15, 6: 20, 7: 17, 8: 18, - 9: 19, 10: 10, 11: 8, 12: 6, 13: 5, 14: 7, 15: 9}, # MPII -> COCO and additional points - {0: 6, 1: 8, 2: 10, 3: 5, 4: 7, 5: 9, 6: 12, 7: 14, 8: 16, - 9: 11, 10: 13, 11: 15, 12: 19, 13: 17}, # AIC -> COCO and additional points - {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, - 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16}, # Identity mapping for OCHuman as merged is based on COCO - ], -) - -combined_train_dataset = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/merged_COCO_AIC_MPII.py'), - datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], - pipeline=train_pipeline, - test_mode=False, - keypoints_mapping=[ - {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, - 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16}, # Identity mapping for COCO as merged is based on COCO - {0: 16, 1: 14, 2: 12, 3: 11, 4: 13, 5: 15, 6: 20, 7: 17, 8: 18, - 9: 19, 10: 10, 11: 8, 12: 6, 13: 5, 14: 7, 15: 9}, # MPII -> COCO and additional points - {0: 6, 1: 8, 2: 10, 3: 5, 4: 7, 5: 9, 6: 12, 7: 14, 8: 16, - 9: 11, 10: 13, 11: 15, 12: 19, 13: 17}, # AIC -> COCO and additional points - ], -) - -# data loaders -train_dataloader = dict( - batch_size=BATCH_SIZE, - num_workers=8, - persistent_workers=True, - sampler=dict( - type='MultiSourceSampler', - batch_size=BATCH_SIZE, - source_ratio=[1, 1, 1], - shuffle=True, - ), - dataset=combined_train_dataset, -) -val_dataloader = dict( - batch_size=128, - num_workers=8, - persistent_workers=True, - drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), - dataset=combined_val_dataset, -) -test_dataloader = val_dataloader - -# evaluators -val_evaluator = dict( - type='MultiDatasetEvaluator', - metrics=[ - dict(type='CocoMetric', - ann_file=COCO_ROOT + 'annotations/person_keypoints_val2017.json', - prefix=COCO_NAME, - nms_mode='none', - outfile_prefix='COCO_MaskPose', - ignore_stats=['AP .5', 'AP .75', 'AR .5', 'AR .75', 'AR (M)', 'AR (L)'], - ), - dict(type='PCKAccuracy', - prefix=MPII_NAME, - ), - dict(type='PCKAccuracy', - prefix=AIC_NAME, - ), - dict(type='CocoMetric', - ann_file=OCHUMAN_ROOT + 'annotations/person_keypoints_val2017.json', - prefix=OCHUMAN_NAME, - outfile_prefix='ochuman', - nms_mode='none', - ignore_stats=['AP .5', 'AP .75', 'AR .5', 'AR .75', 'AR (M)', 'AR (L)'], - ), - ], - datasets=combined_val_dataset['datasets'], - ) -test_evaluator = val_evaluator diff --git a/mmpose/configs/ProbMaskPose/PMPose-b-1.0.0.py b/mmpose/configs/ProbMaskPose/PMPose-b-1.0.0.py new file mode 100644 index 0000000000000000000000000000000000000000..49768abd0a61fd58112b74394ef25493c41b75bc --- /dev/null +++ b/mmpose/configs/ProbMaskPose/PMPose-b-1.0.0.py @@ -0,0 +1,405 @@ +# Copyright (c) Miroslav Purkrabek, 
BMPv2. All rights reserved. + +COCO_ROOT = "/path/to/COCO/" +MPII_ROOT = "/path/to/MPII/" +AIC_ROOT = "/path/to/AIC/" +OCHUMAN_ROOT = "/path/to/OCHuman/" + + +BATCH_SIZE = 128 +INPUT_PADDING = 1.25 + +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/trained/MaskPose-b-EQ.pth" + + +# runtime +train_cfg = dict(max_epochs=210, val_interval=1) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=12, + layer_decay_rate=0.75, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="OKSArgMaxHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=-1) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch="base", + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.3, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="MultiHead", + in_channels=768, + out_channels=23, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + keypoint_loss=dict(type="OKSHeatmapLoss", use_target_weight=True, smoothing_weight=0.05), + probability_loss=dict(type="BCELoss", use_target_weight=True, use_sigmoid=True), + visibility_loss=dict(type="BCELoss", use_target_weight=True, use_sigmoid=True), + oks_loss=dict(type="MSELoss", use_target_weight=True), + error_loss=dict(type="L1LogLoss", use_target_weight=True), + detach_probability=True, + detach_visibility=True, + normalize=1.0, + freeze_error=True, + freeze_oks=False, + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + output_heatmaps=False, + ), + freeze_backbone=True, +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="RandomEdgesBlackout", prob=0.2, mask_ratio_range=(0.1, 0.3), context_size=1.5), + dict(type="TopdownAffine", 
input_size=codec["input_size"], use_udp=True, input_padding=INPUT_PADDING), + dict(type="RandomPatchesBlackout", prob=0.1, grid_size=(8, 6), mask_ratio=0.2), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True, input_padding=INPUT_PADDING), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) +coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_train_dataset, mpii_train_dataset, 
aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/ProbMaskPose/PMPose-h-1.0.0.py b/mmpose/configs/ProbMaskPose/PMPose-h-1.0.0.py new file mode 100644 index 0000000000000000000000000000000000000000..6a3b57c54b1b56c5c4bbe65fb3c1fc390a93d27c --- /dev/null +++ b/mmpose/configs/ProbMaskPose/PMPose-h-1.0.0.py @@ -0,0 +1,406 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
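+# PMPose-H: probabilistic pose head on a frozen ViT-Huge backbone.
+# Weights are initialised from MaskPose-h-EQ.pth and freeze_backbone=True,
+# so training fits only the MultiHead branches (keypoints, probability,
+# visibility, OKS) under the OKSArgMaxHeatmap codec; the error branch
+# stays frozen via freeze_error=True.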
+ +COCO_ROOT = "/path/to/COCO/" +MPII_ROOT = "/path/to/MPII/" +AIC_ROOT = "/path/to/AIC/" +OCHUMAN_ROOT = "/path/to/OCHuman/" + +BATCH_SIZE = 64 +INPUT_PADDING = 1.25 + +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/trained/MaskPose-h-EQ.pth" + +# runtime +train_cfg = dict(max_epochs=210, val_interval=1) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=32, + layer_decay_rate=0.85, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="OKSArgMaxHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=-1) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch="huge", + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.55, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_huge_20230913.pth'), + ), + head=dict( + type="MultiHead", + in_channels=1280, + out_channels=23, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + keypoint_loss=dict(type="OKSHeatmapLoss", use_target_weight=True, smoothing_weight=0.05), + probability_loss=dict(type="BCELoss", use_target_weight=True, use_sigmoid=True), + visibility_loss=dict(type="BCELoss", use_target_weight=True, use_sigmoid=True), + oks_loss=dict(type="MSELoss", use_target_weight=True), + error_loss=dict(type="L1LogLoss", use_target_weight=True), + detach_probability=True, + detach_visibility=True, + normalize=1.0, + freeze_error=True, + freeze_oks=False, + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + output_heatmaps=False, + ), + freeze_backbone=True, +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="RandomEdgesBlackout", prob=0.2, mask_ratio_range=(0.1, 0.3), context_size=1.5), + dict(type="TopdownAffine", input_size=codec["input_size"], 
use_udp=True, input_padding=INPUT_PADDING), + dict(type="RandomPatchesBlackout", prob=0.1, grid_size=(8, 6), mask_ratio=0.2), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True, input_padding=INPUT_PADDING), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) +coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + # bbox_file=OCHUMAN_ROOT + "/BMP_results/BMPx1/ochuman_all/rtmdet-l-ins_all.json", + # bbox_file=OCHUMAN_ROOT + "/detections/rtmdet-l-ins.json", + # bbox_file=OCHUMAN_ROOT + "/detections/rtmdet-l.json", + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( 
+ type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/ProbMaskPose/PMPose-l-1.0.0.py b/mmpose/configs/ProbMaskPose/PMPose-l-1.0.0.py new file mode 100644 index 0000000000000000000000000000000000000000..3e2b4ee69648655471fe1dd33cb2648537f9772f --- /dev/null +++ b/mmpose/configs/ProbMaskPose/PMPose-l-1.0.0.py @@ -0,0 +1,405 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
+ +COCO_ROOT = "/path/to/COCO/" +MPII_ROOT = "/path/to/MPII/" +AIC_ROOT = "/path/to/AIC/" +OCHUMAN_ROOT = "/path/to/OCHuman/" + + +BATCH_SIZE = 128 +INPUT_PADDING = 1.25 + +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/trained/MaskPose-l-EQ.pth" + + +# runtime +train_cfg = dict(max_epochs=210, val_interval=1) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=24, + layer_decay_rate=0.8, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="OKSArgMaxHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=-1) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch="large", + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.5, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="MultiHead", + in_channels=1024, + out_channels=23, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + keypoint_loss=dict(type="OKSHeatmapLoss", use_target_weight=True, smoothing_weight=0.05), + probability_loss=dict(type="BCELoss", use_target_weight=True, use_sigmoid=True), + visibility_loss=dict(type="BCELoss", use_target_weight=True, use_sigmoid=True), + oks_loss=dict(type="MSELoss", use_target_weight=True), + error_loss=dict(type="L1LogLoss", use_target_weight=True), + detach_probability=True, + detach_visibility=True, + normalize=1.0, + freeze_error=True, + freeze_oks=False, + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + output_heatmaps=False, + ), + freeze_backbone=True, +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="RandomEdgesBlackout", prob=0.2, mask_ratio_range=(0.1, 0.3), context_size=1.5), + dict(type="TopdownAffine", input_size=codec["input_size"], 
use_udp=True, input_padding=INPUT_PADDING), + dict(type="RandomPatchesBlackout", prob=0.1, grid_size=(8, 6), mask_ratio=0.2), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True, input_padding=INPUT_PADDING), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) +coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + 
pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/ProbMaskPose/PMPose-s-1.0.0.py b/mmpose/configs/ProbMaskPose/PMPose-s-1.0.0.py new file mode 100644 index 0000000000000000000000000000000000000000..45248b9515642f38b6c17075cdd32694f7017356 --- /dev/null +++ b/mmpose/configs/ProbMaskPose/PMPose-s-1.0.0.py @@ -0,0 +1,405 @@ +# Copyright (c) Miroslav Purkrabek, BMPv2. All rights reserved. 
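+# PMPose-S: the smallest variant. The ViT-S geometry (384-dim embeddings,
+# 12 layers, 12 heads) is passed as an explicit arch dict; the backbone is
+# initialised from MaskPose-s-EQ.pth and kept frozen while the probabilistic
+# MultiHead is trained.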
+ +COCO_ROOT = "/path/to/COCO/" +MPII_ROOT = "/path/to/MPII/" +AIC_ROOT = "/path/to/AIC/" +OCHUMAN_ROOT = "/path/to/OCHuman/" + + +BATCH_SIZE = 128 +INPUT_PADDING = 1.25 + +COCO_NAME = "COCO" +MPII_NAME = "MPII" +AIC_NAME = "AIC" +OCHUMAN_NAME = "OCHuman" + +_base_ = ["../_base_/default_runtime.py"] + +# resume = True +load_from = "models/trained/MaskPose-s-EQ.pth" + + +# runtime +train_cfg = dict(max_epochs=210, val_interval=1) + +# optimizer +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) + +optim_wrapper = dict( + optimizer=dict(type="AdamW", lr=5e-4 * BATCH_SIZE / 64, betas=(0.9, 0.999), weight_decay=0.1), + paramwise_cfg=dict( + num_layers=12, + layer_decay_rate=0.8, + custom_keys={ + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + }, + ), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), +) + +# learning policy +param_scheduler = [ + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), +] + +# automatically scaling LR based on the actual training batch size +auto_scale_lr = dict(base_batch_size=512) + +# hooks +default_hooks = dict(checkpoint=dict(save_best="{}/AP".format(COCO_NAME), rule="greater", max_keep_ckpts=1)) + +# codec settings +codec = dict(type="OKSArgMaxHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=-1) + +# model settings +model = dict( + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict( + type="mmpretrain.VisionTransformer", + arch={"embed_dims": 384, "num_layers": 12, "num_heads": 12, "feedforward_channels": 384 * 4}, + img_size=(256, 192), + patch_size=16, + qkv_bias=True, + drop_path_rate=0.1, + with_cls_token=False, + out_type="featmap", + patch_cfg=dict(padding=2), + init_cfg=None, + # init_cfg=dict( + # type='Pretrained', + # checkpoint='models/pretrained/mae_pretrain_vit_base_20230913.pth'), + ), + head=dict( + type="MultiHead", + in_channels=384, + out_channels=23, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + keypoint_loss=dict(type="OKSHeatmapLoss", use_target_weight=True, smoothing_weight=0.05), + probability_loss=dict(type="BCELoss", use_target_weight=True, use_sigmoid=True), + visibility_loss=dict(type="BCELoss", use_target_weight=True, use_sigmoid=True), + oks_loss=dict(type="MSELoss", use_target_weight=True), + error_loss=dict(type="L1LogLoss", use_target_weight=True), + detach_probability=True, + detach_visibility=True, + normalize=1.0, + freeze_error=True, + freeze_oks=False, + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + output_heatmaps=False, + ), + freeze_backbone=True, +) + +# pipelines +train_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict( + type="MaskBackground", + prob=1.0, + continue_on_failure=False, + alpha=0.25, + dilate_prob=0.5, + dilate_amount=0.1, + erode_prob=0.5, + erode_amount=0.5, + patches_computation_method="blob", # random_grid or blob + ), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="RandomEdgesBlackout", prob=0.2, mask_ratio_range=(0.1, 0.3), 
context_size=1.5), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True, input_padding=INPUT_PADDING), + dict(type="RandomPatchesBlackout", prob=0.1, grid_size=(8, 6), mask_ratio=0.2), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), +] +val_pipeline = [ + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="MaskBackground", continue_on_failure=False, alpha=0.25), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True, input_padding=INPUT_PADDING), + dict(type="PackPoseInputs"), +] + + +coco_train_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + test_mode=False, +) +coco_val_dataset = dict( + type="CocoDataset", + data_root=COCO_ROOT, + data_mode="topdown", + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) +mpii_train_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +mpii_val_dataset = dict( + type="MpiiDataset", + data_root=MPII_ROOT, + data_mode="topdown", + ann_file="annotations/mpii_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +aic_train_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_train.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=False, +) +aic_val_dataset = dict( + type="AicDataset", + data_root=AIC_ROOT, + data_mode="topdown", + ann_file="annotations/aic_sam_val.json", + data_prefix=dict(img="images/"), + pipeline=[], + test_mode=True, +) +ochuman_val_dataset = dict( + type="OCHumanDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + pipeline=[], + test_mode=True, +) + +ochuman_dt_val_dataset = dict( + type="OCHumanDetDataset", + data_root=OCHUMAN_ROOT, + data_mode="topdown", + ann_file="../ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="val2017/"), + bbox_file=OCHUMAN_ROOT + "../detections/rtmdet-l-ins_val.json", + filter_cfg=dict(bbox_score_thr=0.3), + pipeline=[], + test_mode=True, +) + +combined_val_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + datasets=[coco_val_dataset, ochuman_val_dataset, ochuman_dt_val_dataset], + pipeline=val_pipeline, + test_mode=True, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for OCHuman as merged is based on COCO + ], +) + +combined_train_dataset = dict( + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py"), + 
datasets=[coco_train_dataset, mpii_train_dataset, aic_train_dataset], + pipeline=train_pipeline, + test_mode=False, + keypoints_mapping=[ + { + 0: 0, + 1: 1, + 2: 2, + 3: 3, + 4: 4, + 5: 5, + 6: 6, + 7: 7, + 8: 8, + 9: 9, + 10: 10, + 11: 11, + 12: 12, + 13: 13, + 14: 14, + 15: 15, + 16: 16, + }, # Identity mapping for COCO as merged is based on COCO + { + 0: 16, + 1: 14, + 2: 12, + 3: 11, + 4: 13, + 5: 15, + 6: 18, + 7: 17, + 8: 19, + 9: 20, + 10: 10, + 11: 8, + 12: 6, + 13: 5, + 14: 7, + 15: 9, + }, # MPII -> COCO and additional points + { + 0: 6, + 1: 8, + 2: 10, + 3: 5, + 4: 7, + 5: 9, + 6: 12, + 7: 14, + 8: 16, + 9: 11, + 10: 13, + 11: 15, + 12: 22, + 13: 21, + }, # AIC -> COCO and additional points + ], +) + +# data loaders +train_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + sampler=dict( + type="MultiSourceSampler", + batch_size=BATCH_SIZE, + source_ratio=[149813, 22246, 378352], + shuffle=True, + ), + dataset=combined_train_dataset, +) +val_dataloader = dict( + batch_size=BATCH_SIZE, + num_workers=8, + persistent_workers=False, + drop_last=False, + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), + dataset=combined_val_dataset, +) +test_dataloader = val_dataloader + +# evaluators +val_evaluator = dict( + type="MultiDatasetEvaluator", + metrics=[ + dict( + type="CocoMetric", + ann_file=COCO_ROOT + "annotations/person_keypoints_val2017.json", + prefix=COCO_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + # dict(type='PCKAccuracy', + # prefix=MPII_NAME, + # ), + # dict(type='PCKAccuracy', + # prefix=AIC_NAME, + # ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME, + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + dict( + type="CocoMetric", + ann_file=OCHUMAN_ROOT + "../ochuman_coco_format_val_range_0.00_1.00.json", + prefix=OCHUMAN_NAME + "_det", + extended=[False], + match_by_bbox=[False], + ignore_stats=["AP .5", "AP .75", "AR .5", "AR .75", "AR (M)", "AR (L)"], + ), + ], + datasets=combined_val_dataset["datasets"], +) +test_evaluator = val_evaluator diff --git a/mmpose/configs/_base_/datasets/300w.py b/mmpose/configs/_base_/datasets/300w.py index 2c3728da1d1555c3526ccbfca182385961e8b667..75e459a76ec85c0eca8b91bbdef113e372872ab5 100644 --- a/mmpose/configs/_base_/datasets/300w.py +++ b/mmpose/configs/_base_/datasets/300w.py @@ -1,134 +1,83 @@ dataset_info = dict( - dataset_name='300w', + dataset_name="300w", paper_info=dict( - author='Sagonas, Christos and Antonakos, Epameinondas ' - 'and Tzimiropoulos, Georgios and Zafeiriou, Stefanos ' - 'and Pantic, Maja', - title='300 faces in-the-wild challenge: ' - 'Database and results', - container='Image and vision computing', - year='2016', - homepage='https://ibug.doc.ic.ac.uk/resources/300-W/', + author="Sagonas, Christos and Antonakos, Epameinondas " "and Tzimiropoulos, Georgios and Zafeiriou, Stefanos " "and Pantic, Maja", + title="300 faces in-the-wild challenge: " "Database and results", + container="Image and vision computing", + year="2016", + homepage="https://ibug.doc.ic.ac.uk/resources/300-W/", ), keypoint_info={ - 0: dict(name='kpt-0', id=0, color=[255, 0, 0], type='', swap='kpt-16'), - 1: dict(name='kpt-1', id=1, color=[255, 0, 0], type='', swap='kpt-15'), - 2: dict(name='kpt-2', id=2, color=[255, 0, 0], type='', 
swap='kpt-14'), - 3: dict(name='kpt-3', id=3, color=[255, 0, 0], type='', swap='kpt-13'), - 4: dict(name='kpt-4', id=4, color=[255, 0, 0], type='', swap='kpt-12'), - 5: dict(name='kpt-5', id=5, color=[255, 0, 0], type='', swap='kpt-11'), - 6: dict(name='kpt-6', id=6, color=[255, 0, 0], type='', swap='kpt-10'), - 7: dict(name='kpt-7', id=7, color=[255, 0, 0], type='', swap='kpt-9'), - 8: dict(name='kpt-8', id=8, color=[255, 0, 0], type='', swap=''), - 9: dict(name='kpt-9', id=9, color=[255, 0, 0], type='', swap='kpt-7'), - 10: - dict(name='kpt-10', id=10, color=[255, 0, 0], type='', swap='kpt-6'), - 11: - dict(name='kpt-11', id=11, color=[255, 0, 0], type='', swap='kpt-5'), - 12: - dict(name='kpt-12', id=12, color=[255, 0, 0], type='', swap='kpt-4'), - 13: - dict(name='kpt-13', id=13, color=[255, 0, 0], type='', swap='kpt-3'), - 14: - dict(name='kpt-14', id=14, color=[255, 0, 0], type='', swap='kpt-2'), - 15: - dict(name='kpt-15', id=15, color=[255, 0, 0], type='', swap='kpt-1'), - 16: - dict(name='kpt-16', id=16, color=[255, 0, 0], type='', swap='kpt-0'), - 17: - dict(name='kpt-17', id=17, color=[255, 0, 0], type='', swap='kpt-26'), - 18: - dict(name='kpt-18', id=18, color=[255, 0, 0], type='', swap='kpt-25'), - 19: - dict(name='kpt-19', id=19, color=[255, 0, 0], type='', swap='kpt-24'), - 20: - dict(name='kpt-20', id=20, color=[255, 0, 0], type='', swap='kpt-23'), - 21: - dict(name='kpt-21', id=21, color=[255, 0, 0], type='', swap='kpt-22'), - 22: - dict(name='kpt-22', id=22, color=[255, 0, 0], type='', swap='kpt-21'), - 23: - dict(name='kpt-23', id=23, color=[255, 0, 0], type='', swap='kpt-20'), - 24: - dict(name='kpt-24', id=24, color=[255, 0, 0], type='', swap='kpt-19'), - 25: - dict(name='kpt-25', id=25, color=[255, 0, 0], type='', swap='kpt-18'), - 26: - dict(name='kpt-26', id=26, color=[255, 0, 0], type='', swap='kpt-17'), - 27: dict(name='kpt-27', id=27, color=[255, 0, 0], type='', swap=''), - 28: dict(name='kpt-28', id=28, color=[255, 0, 0], type='', swap=''), - 29: dict(name='kpt-29', id=29, color=[255, 0, 0], type='', swap=''), - 30: dict(name='kpt-30', id=30, color=[255, 0, 0], type='', swap=''), - 31: - dict(name='kpt-31', id=31, color=[255, 0, 0], type='', swap='kpt-35'), - 32: - dict(name='kpt-32', id=32, color=[255, 0, 0], type='', swap='kpt-34'), - 33: dict(name='kpt-33', id=33, color=[255, 0, 0], type='', swap=''), - 34: - dict(name='kpt-34', id=34, color=[255, 0, 0], type='', swap='kpt-32'), - 35: - dict(name='kpt-35', id=35, color=[255, 0, 0], type='', swap='kpt-31'), - 36: - dict(name='kpt-36', id=36, color=[255, 0, 0], type='', swap='kpt-45'), - 37: - dict(name='kpt-37', id=37, color=[255, 0, 0], type='', swap='kpt-44'), - 38: - dict(name='kpt-38', id=38, color=[255, 0, 0], type='', swap='kpt-43'), - 39: - dict(name='kpt-39', id=39, color=[255, 0, 0], type='', swap='kpt-42'), - 40: - dict(name='kpt-40', id=40, color=[255, 0, 0], type='', swap='kpt-47'), - 41: dict( - name='kpt-41', id=41, color=[255, 0, 0], type='', swap='kpt-46'), - 42: dict( - name='kpt-42', id=42, color=[255, 0, 0], type='', swap='kpt-39'), - 43: dict( - name='kpt-43', id=43, color=[255, 0, 0], type='', swap='kpt-38'), - 44: dict( - name='kpt-44', id=44, color=[255, 0, 0], type='', swap='kpt-37'), - 45: dict( - name='kpt-45', id=45, color=[255, 0, 0], type='', swap='kpt-36'), - 46: dict( - name='kpt-46', id=46, color=[255, 0, 0], type='', swap='kpt-41'), - 47: dict( - name='kpt-47', id=47, color=[255, 0, 0], type='', swap='kpt-40'), - 48: dict( - name='kpt-48', id=48, color=[255, 0, 0], type='', 
swap='kpt-54'), - 49: dict( - name='kpt-49', id=49, color=[255, 0, 0], type='', swap='kpt-53'), - 50: dict( - name='kpt-50', id=50, color=[255, 0, 0], type='', swap='kpt-52'), - 51: dict(name='kpt-51', id=51, color=[255, 0, 0], type='', swap=''), - 52: dict( - name='kpt-52', id=52, color=[255, 0, 0], type='', swap='kpt-50'), - 53: dict( - name='kpt-53', id=53, color=[255, 0, 0], type='', swap='kpt-49'), - 54: dict( - name='kpt-54', id=54, color=[255, 0, 0], type='', swap='kpt-48'), - 55: dict( - name='kpt-55', id=55, color=[255, 0, 0], type='', swap='kpt-59'), - 56: dict( - name='kpt-56', id=56, color=[255, 0, 0], type='', swap='kpt-58'), - 57: dict(name='kpt-57', id=57, color=[255, 0, 0], type='', swap=''), - 58: dict( - name='kpt-58', id=58, color=[255, 0, 0], type='', swap='kpt-56'), - 59: dict( - name='kpt-59', id=59, color=[255, 0, 0], type='', swap='kpt-55'), - 60: dict( - name='kpt-60', id=60, color=[255, 0, 0], type='', swap='kpt-64'), - 61: dict( - name='kpt-61', id=61, color=[255, 0, 0], type='', swap='kpt-63'), - 62: dict(name='kpt-62', id=62, color=[255, 0, 0], type='', swap=''), - 63: dict( - name='kpt-63', id=63, color=[255, 0, 0], type='', swap='kpt-61'), - 64: dict( - name='kpt-64', id=64, color=[255, 0, 0], type='', swap='kpt-60'), - 65: dict( - name='kpt-65', id=65, color=[255, 0, 0], type='', swap='kpt-67'), - 66: dict(name='kpt-66', id=66, color=[255, 0, 0], type='', swap=''), - 67: dict( - name='kpt-67', id=67, color=[255, 0, 0], type='', swap='kpt-65'), + 0: dict(name="kpt-0", id=0, color=[255, 0, 0], type="", swap="kpt-16"), + 1: dict(name="kpt-1", id=1, color=[255, 0, 0], type="", swap="kpt-15"), + 2: dict(name="kpt-2", id=2, color=[255, 0, 0], type="", swap="kpt-14"), + 3: dict(name="kpt-3", id=3, color=[255, 0, 0], type="", swap="kpt-13"), + 4: dict(name="kpt-4", id=4, color=[255, 0, 0], type="", swap="kpt-12"), + 5: dict(name="kpt-5", id=5, color=[255, 0, 0], type="", swap="kpt-11"), + 6: dict(name="kpt-6", id=6, color=[255, 0, 0], type="", swap="kpt-10"), + 7: dict(name="kpt-7", id=7, color=[255, 0, 0], type="", swap="kpt-9"), + 8: dict(name="kpt-8", id=8, color=[255, 0, 0], type="", swap=""), + 9: dict(name="kpt-9", id=9, color=[255, 0, 0], type="", swap="kpt-7"), + 10: dict(name="kpt-10", id=10, color=[255, 0, 0], type="", swap="kpt-6"), + 11: dict(name="kpt-11", id=11, color=[255, 0, 0], type="", swap="kpt-5"), + 12: dict(name="kpt-12", id=12, color=[255, 0, 0], type="", swap="kpt-4"), + 13: dict(name="kpt-13", id=13, color=[255, 0, 0], type="", swap="kpt-3"), + 14: dict(name="kpt-14", id=14, color=[255, 0, 0], type="", swap="kpt-2"), + 15: dict(name="kpt-15", id=15, color=[255, 0, 0], type="", swap="kpt-1"), + 16: dict(name="kpt-16", id=16, color=[255, 0, 0], type="", swap="kpt-0"), + 17: dict(name="kpt-17", id=17, color=[255, 0, 0], type="", swap="kpt-26"), + 18: dict(name="kpt-18", id=18, color=[255, 0, 0], type="", swap="kpt-25"), + 19: dict(name="kpt-19", id=19, color=[255, 0, 0], type="", swap="kpt-24"), + 20: dict(name="kpt-20", id=20, color=[255, 0, 0], type="", swap="kpt-23"), + 21: dict(name="kpt-21", id=21, color=[255, 0, 0], type="", swap="kpt-22"), + 22: dict(name="kpt-22", id=22, color=[255, 0, 0], type="", swap="kpt-21"), + 23: dict(name="kpt-23", id=23, color=[255, 0, 0], type="", swap="kpt-20"), + 24: dict(name="kpt-24", id=24, color=[255, 0, 0], type="", swap="kpt-19"), + 25: dict(name="kpt-25", id=25, color=[255, 0, 0], type="", swap="kpt-18"), + 26: dict(name="kpt-26", id=26, color=[255, 0, 0], type="", swap="kpt-17"), + 27: 
dict(name="kpt-27", id=27, color=[255, 0, 0], type="", swap=""), + 28: dict(name="kpt-28", id=28, color=[255, 0, 0], type="", swap=""), + 29: dict(name="kpt-29", id=29, color=[255, 0, 0], type="", swap=""), + 30: dict(name="kpt-30", id=30, color=[255, 0, 0], type="", swap=""), + 31: dict(name="kpt-31", id=31, color=[255, 0, 0], type="", swap="kpt-35"), + 32: dict(name="kpt-32", id=32, color=[255, 0, 0], type="", swap="kpt-34"), + 33: dict(name="kpt-33", id=33, color=[255, 0, 0], type="", swap=""), + 34: dict(name="kpt-34", id=34, color=[255, 0, 0], type="", swap="kpt-32"), + 35: dict(name="kpt-35", id=35, color=[255, 0, 0], type="", swap="kpt-31"), + 36: dict(name="kpt-36", id=36, color=[255, 0, 0], type="", swap="kpt-45"), + 37: dict(name="kpt-37", id=37, color=[255, 0, 0], type="", swap="kpt-44"), + 38: dict(name="kpt-38", id=38, color=[255, 0, 0], type="", swap="kpt-43"), + 39: dict(name="kpt-39", id=39, color=[255, 0, 0], type="", swap="kpt-42"), + 40: dict(name="kpt-40", id=40, color=[255, 0, 0], type="", swap="kpt-47"), + 41: dict(name="kpt-41", id=41, color=[255, 0, 0], type="", swap="kpt-46"), + 42: dict(name="kpt-42", id=42, color=[255, 0, 0], type="", swap="kpt-39"), + 43: dict(name="kpt-43", id=43, color=[255, 0, 0], type="", swap="kpt-38"), + 44: dict(name="kpt-44", id=44, color=[255, 0, 0], type="", swap="kpt-37"), + 45: dict(name="kpt-45", id=45, color=[255, 0, 0], type="", swap="kpt-36"), + 46: dict(name="kpt-46", id=46, color=[255, 0, 0], type="", swap="kpt-41"), + 47: dict(name="kpt-47", id=47, color=[255, 0, 0], type="", swap="kpt-40"), + 48: dict(name="kpt-48", id=48, color=[255, 0, 0], type="", swap="kpt-54"), + 49: dict(name="kpt-49", id=49, color=[255, 0, 0], type="", swap="kpt-53"), + 50: dict(name="kpt-50", id=50, color=[255, 0, 0], type="", swap="kpt-52"), + 51: dict(name="kpt-51", id=51, color=[255, 0, 0], type="", swap=""), + 52: dict(name="kpt-52", id=52, color=[255, 0, 0], type="", swap="kpt-50"), + 53: dict(name="kpt-53", id=53, color=[255, 0, 0], type="", swap="kpt-49"), + 54: dict(name="kpt-54", id=54, color=[255, 0, 0], type="", swap="kpt-48"), + 55: dict(name="kpt-55", id=55, color=[255, 0, 0], type="", swap="kpt-59"), + 56: dict(name="kpt-56", id=56, color=[255, 0, 0], type="", swap="kpt-58"), + 57: dict(name="kpt-57", id=57, color=[255, 0, 0], type="", swap=""), + 58: dict(name="kpt-58", id=58, color=[255, 0, 0], type="", swap="kpt-56"), + 59: dict(name="kpt-59", id=59, color=[255, 0, 0], type="", swap="kpt-55"), + 60: dict(name="kpt-60", id=60, color=[255, 0, 0], type="", swap="kpt-64"), + 61: dict(name="kpt-61", id=61, color=[255, 0, 0], type="", swap="kpt-63"), + 62: dict(name="kpt-62", id=62, color=[255, 0, 0], type="", swap=""), + 63: dict(name="kpt-63", id=63, color=[255, 0, 0], type="", swap="kpt-61"), + 64: dict(name="kpt-64", id=64, color=[255, 0, 0], type="", swap="kpt-60"), + 65: dict(name="kpt-65", id=65, color=[255, 0, 0], type="", swap="kpt-67"), + 66: dict(name="kpt-66", id=66, color=[255, 0, 0], type="", swap=""), + 67: dict(name="kpt-67", id=67, color=[255, 0, 0], type="", swap="kpt-65"), }, skeleton_info={}, - joint_weights=[1.] 
* 68,
-    sigmas=[])
+    joint_weights=[1.0] * 68,
+    sigmas=[],
+)
diff --git a/mmpose/configs/_base_/datasets/300wlp.py b/mmpose/configs/_base_/datasets/300wlp.py
index 76eb4b70b1a342c17deeb65de79c3fc99ee09f8b..56362ab701aefce1d2ae216bbc48fa0287958589 100644
--- a/mmpose/configs/_base_/datasets/300wlp.py
+++ b/mmpose/configs/_base_/datasets/300wlp.py
@@ -1,86 +1,83 @@
 dataset_info = dict(
-    dataset_name='300wlp',
+    dataset_name="300wlp",
     paper_info=dict(
-        author='Xiangyu Zhu1, and Zhen Lei1 '
-        'and Xiaoming Liu2, and Hailin Shi1 '
-        'and Stan Z. Li1',
-        title='300 faces in-the-wild challenge: '
-        'Database and results',
-        container='Image and vision computing',
-        year='2016',
-        homepage='http://www.cbsr.ia.ac.cn/users/xiangyuzhu/'
-        'projects/3DDFA/main.htm',
+        author="Xiangyu Zhu and Zhen Lei and Xiaoming Liu and Hailin Shi and Stan Z. Li",
+        title="Face Alignment Across Large Poses: A 3D Solution",
+        container="IEEE Conference on Computer Vision and Pattern Recognition (CVPR)",
+        year="2016",
+        homepage="http://www.cbsr.ia.ac.cn/users/xiangyuzhu/" "projects/3DDFA/main.htm",
     ),
     keypoint_info={
-        0: dict(name='kpt-0', id=0, color=[255, 0, 0], type='', swap=''),
-        1: dict(name='kpt-1', id=1, color=[255, 0, 0], type='', swap=''),
-        2: dict(name='kpt-2', id=2, color=[255, 0, 0], type='', swap=''),
-        3: dict(name='kpt-3', id=3, color=[255, 0, 0], type='', swap=''),
-        4: dict(name='kpt-4', id=4, color=[255, 0, 0], type='', swap=''),
-        5: dict(name='kpt-5', id=5, color=[255, 0, 0], type='', swap=''),
-        6: dict(name='kpt-6', id=6, color=[255, 0, 0], type='', swap=''),
-        7: dict(name='kpt-7', id=7, color=[255, 0, 0], type='', swap=''),
-        8: dict(name='kpt-8', id=8, color=[255, 0, 0], type='', swap=''),
-        9: dict(name='kpt-9', id=9, color=[255, 0, 0], type='', swap=''),
-        10: dict(name='kpt-10', id=10, color=[255, 0, 0], type='', swap=''),
-        11: dict(name='kpt-11', id=11, color=[255, 0, 0], type='', swap=''),
-        12: dict(name='kpt-12', id=12, color=[255, 0, 0], type='', swap=''),
-        13: dict(name='kpt-13', id=13, color=[255, 0, 0], type='', swap=''),
-        14: dict(name='kpt-14', id=14, color=[255, 0, 0], type='', swap=''),
-        15: dict(name='kpt-15', id=15, color=[255, 0, 0], type='', swap=''),
-        16: dict(name='kpt-16', id=16, color=[255, 0, 0], type='', swap=''),
-        17: dict(name='kpt-17', id=17, color=[255, 0, 0], type='', swap=''),
-        18: dict(name='kpt-18', id=18, color=[255, 0, 0], type='', swap=''),
-        19: dict(name='kpt-19', id=19, color=[255, 0, 0], type='', swap=''),
-        20: dict(name='kpt-20', id=20, color=[255, 0, 0], type='', swap=''),
-        21: dict(name='kpt-21', id=21, color=[255, 0, 0], type='', swap=''),
-        22: dict(name='kpt-22', id=22, color=[255, 0, 0], type='', swap=''),
-        23: dict(name='kpt-23', id=23, color=[255, 0, 0], type='', swap=''),
-        24: dict(name='kpt-24', id=24, color=[255, 0, 0], type='', swap=''),
-        25: dict(name='kpt-25', id=25, color=[255, 0, 0], type='', swap=''),
-        26: dict(name='kpt-26', id=26, color=[255, 0, 0], type='', swap=''),
-        27: dict(name='kpt-27', id=27, color=[255, 0, 0], type='', swap=''),
-        28: dict(name='kpt-28', id=28, color=[255, 0, 0], type='', swap=''),
-        29: dict(name='kpt-29', id=29, color=[255, 0, 0], type='', swap=''),
-        30: dict(name='kpt-30', id=30, color=[255, 0, 0], type='', swap=''),
-        31: dict(name='kpt-31', id=31, color=[255, 0, 0], type='', swap=''),
-        32: dict(name='kpt-32', id=32, color=[255, 0, 0], type='', swap=''),
-        33: dict(name='kpt-33', id=33, color=[255, 0, 0], type='', swap=''),
-        34: dict(name='kpt-34', id=34, color=[255, 0, 0], type='', swap=''),
-        35:
dict(name='kpt-35', id=35, color=[255, 0, 0], type='', swap=''), - 36: dict(name='kpt-36', id=36, color=[255, 0, 0], type='', swap=''), - 37: dict(name='kpt-37', id=37, color=[255, 0, 0], type='', swap=''), - 38: dict(name='kpt-38', id=38, color=[255, 0, 0], type='', swap=''), - 39: dict(name='kpt-39', id=39, color=[255, 0, 0], type='', swap=''), - 40: dict(name='kpt-40', id=40, color=[255, 0, 0], type='', swap=''), - 41: dict(name='kpt-41', id=41, color=[255, 0, 0], type='', swap=''), - 42: dict(name='kpt-42', id=42, color=[255, 0, 0], type='', swap=''), - 43: dict(name='kpt-43', id=43, color=[255, 0, 0], type='', swap=''), - 44: dict(name='kpt-44', id=44, color=[255, 0, 0], type='', swap=''), - 45: dict(name='kpt-45', id=45, color=[255, 0, 0], type='', swap=''), - 46: dict(name='kpt-46', id=46, color=[255, 0, 0], type='', swap=''), - 47: dict(name='kpt-47', id=47, color=[255, 0, 0], type='', swap=''), - 48: dict(name='kpt-48', id=48, color=[255, 0, 0], type='', swap=''), - 49: dict(name='kpt-49', id=49, color=[255, 0, 0], type='', swap=''), - 50: dict(name='kpt-50', id=50, color=[255, 0, 0], type='', swap=''), - 51: dict(name='kpt-51', id=51, color=[255, 0, 0], type='', swap=''), - 52: dict(name='kpt-52', id=52, color=[255, 0, 0], type='', swap=''), - 53: dict(name='kpt-53', id=53, color=[255, 0, 0], type='', swap=''), - 54: dict(name='kpt-54', id=54, color=[255, 0, 0], type='', swap=''), - 55: dict(name='kpt-55', id=55, color=[255, 0, 0], type='', swap=''), - 56: dict(name='kpt-56', id=56, color=[255, 0, 0], type='', swap=''), - 57: dict(name='kpt-57', id=57, color=[255, 0, 0], type='', swap=''), - 58: dict(name='kpt-58', id=58, color=[255, 0, 0], type='', swap=''), - 59: dict(name='kpt-59', id=59, color=[255, 0, 0], type='', swap=''), - 60: dict(name='kpt-60', id=60, color=[255, 0, 0], type='', swap=''), - 61: dict(name='kpt-61', id=61, color=[255, 0, 0], type='', swap=''), - 62: dict(name='kpt-62', id=62, color=[255, 0, 0], type='', swap=''), - 63: dict(name='kpt-63', id=63, color=[255, 0, 0], type='', swap=''), - 64: dict(name='kpt-64', id=64, color=[255, 0, 0], type='', swap=''), - 65: dict(name='kpt-65', id=65, color=[255, 0, 0], type='', swap=''), - 66: dict(name='kpt-66', id=66, color=[255, 0, 0], type='', swap=''), - 67: dict(name='kpt-67', id=67, color=[255, 0, 0], type='', swap=''), + 0: dict(name="kpt-0", id=0, color=[255, 0, 0], type="", swap=""), + 1: dict(name="kpt-1", id=1, color=[255, 0, 0], type="", swap=""), + 2: dict(name="kpt-2", id=2, color=[255, 0, 0], type="", swap=""), + 3: dict(name="kpt-3", id=3, color=[255, 0, 0], type="", swap=""), + 4: dict(name="kpt-4", id=4, color=[255, 0, 0], type="", swap=""), + 5: dict(name="kpt-5", id=5, color=[255, 0, 0], type="", swap=""), + 6: dict(name="kpt-6", id=6, color=[255, 0, 0], type="", swap=""), + 7: dict(name="kpt-7", id=7, color=[255, 0, 0], type="", swap=""), + 8: dict(name="kpt-8", id=8, color=[255, 0, 0], type="", swap=""), + 9: dict(name="kpt-9", id=9, color=[255, 0, 0], type="", swap=""), + 10: dict(name="kpt-10", id=10, color=[255, 0, 0], type="", swap=""), + 11: dict(name="kpt-11", id=11, color=[255, 0, 0], type="", swap=""), + 12: dict(name="kpt-12", id=12, color=[255, 0, 0], type="", swap=""), + 13: dict(name="kpt-13", id=13, color=[255, 0, 0], type="", swap=""), + 14: dict(name="kpt-14", id=14, color=[255, 0, 0], type="", swap=""), + 15: dict(name="kpt-15", id=15, color=[255, 0, 0], type="", swap=""), + 16: dict(name="kpt-16", id=16, color=[255, 0, 0], type="", swap=""), + 17: dict(name="kpt-17", id=17, 
color=[255, 0, 0], type="", swap=""), + 18: dict(name="kpt-18", id=18, color=[255, 0, 0], type="", swap=""), + 19: dict(name="kpt-19", id=19, color=[255, 0, 0], type="", swap=""), + 20: dict(name="kpt-20", id=20, color=[255, 0, 0], type="", swap=""), + 21: dict(name="kpt-21", id=21, color=[255, 0, 0], type="", swap=""), + 22: dict(name="kpt-22", id=22, color=[255, 0, 0], type="", swap=""), + 23: dict(name="kpt-23", id=23, color=[255, 0, 0], type="", swap=""), + 24: dict(name="kpt-24", id=24, color=[255, 0, 0], type="", swap=""), + 25: dict(name="kpt-25", id=25, color=[255, 0, 0], type="", swap=""), + 26: dict(name="kpt-26", id=26, color=[255, 0, 0], type="", swap=""), + 27: dict(name="kpt-27", id=27, color=[255, 0, 0], type="", swap=""), + 28: dict(name="kpt-28", id=28, color=[255, 0, 0], type="", swap=""), + 29: dict(name="kpt-29", id=29, color=[255, 0, 0], type="", swap=""), + 30: dict(name="kpt-30", id=30, color=[255, 0, 0], type="", swap=""), + 31: dict(name="kpt-31", id=31, color=[255, 0, 0], type="", swap=""), + 32: dict(name="kpt-32", id=32, color=[255, 0, 0], type="", swap=""), + 33: dict(name="kpt-33", id=33, color=[255, 0, 0], type="", swap=""), + 34: dict(name="kpt-34", id=34, color=[255, 0, 0], type="", swap=""), + 35: dict(name="kpt-35", id=35, color=[255, 0, 0], type="", swap=""), + 36: dict(name="kpt-36", id=36, color=[255, 0, 0], type="", swap=""), + 37: dict(name="kpt-37", id=37, color=[255, 0, 0], type="", swap=""), + 38: dict(name="kpt-38", id=38, color=[255, 0, 0], type="", swap=""), + 39: dict(name="kpt-39", id=39, color=[255, 0, 0], type="", swap=""), + 40: dict(name="kpt-40", id=40, color=[255, 0, 0], type="", swap=""), + 41: dict(name="kpt-41", id=41, color=[255, 0, 0], type="", swap=""), + 42: dict(name="kpt-42", id=42, color=[255, 0, 0], type="", swap=""), + 43: dict(name="kpt-43", id=43, color=[255, 0, 0], type="", swap=""), + 44: dict(name="kpt-44", id=44, color=[255, 0, 0], type="", swap=""), + 45: dict(name="kpt-45", id=45, color=[255, 0, 0], type="", swap=""), + 46: dict(name="kpt-46", id=46, color=[255, 0, 0], type="", swap=""), + 47: dict(name="kpt-47", id=47, color=[255, 0, 0], type="", swap=""), + 48: dict(name="kpt-48", id=48, color=[255, 0, 0], type="", swap=""), + 49: dict(name="kpt-49", id=49, color=[255, 0, 0], type="", swap=""), + 50: dict(name="kpt-50", id=50, color=[255, 0, 0], type="", swap=""), + 51: dict(name="kpt-51", id=51, color=[255, 0, 0], type="", swap=""), + 52: dict(name="kpt-52", id=52, color=[255, 0, 0], type="", swap=""), + 53: dict(name="kpt-53", id=53, color=[255, 0, 0], type="", swap=""), + 54: dict(name="kpt-54", id=54, color=[255, 0, 0], type="", swap=""), + 55: dict(name="kpt-55", id=55, color=[255, 0, 0], type="", swap=""), + 56: dict(name="kpt-56", id=56, color=[255, 0, 0], type="", swap=""), + 57: dict(name="kpt-57", id=57, color=[255, 0, 0], type="", swap=""), + 58: dict(name="kpt-58", id=58, color=[255, 0, 0], type="", swap=""), + 59: dict(name="kpt-59", id=59, color=[255, 0, 0], type="", swap=""), + 60: dict(name="kpt-60", id=60, color=[255, 0, 0], type="", swap=""), + 61: dict(name="kpt-61", id=61, color=[255, 0, 0], type="", swap=""), + 62: dict(name="kpt-62", id=62, color=[255, 0, 0], type="", swap=""), + 63: dict(name="kpt-63", id=63, color=[255, 0, 0], type="", swap=""), + 64: dict(name="kpt-64", id=64, color=[255, 0, 0], type="", swap=""), + 65: dict(name="kpt-65", id=65, color=[255, 0, 0], type="", swap=""), + 66: dict(name="kpt-66", id=66, color=[255, 0, 0], type="", swap=""), + 67: dict(name="kpt-67", id=67, 
color=[255, 0, 0], type="", swap=""), }, skeleton_info={}, - joint_weights=[1.] * 68, - sigmas=[]) + joint_weights=[1.0] * 68, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/aflw.py b/mmpose/configs/_base_/datasets/aflw.py index cf5e10964da700415f3613ca43a0755f5015d8f0..cd7db63311c465e6c48fbb96f8bf8775d8fe9a60 100644 --- a/mmpose/configs/_base_/datasets/aflw.py +++ b/mmpose/configs/_base_/datasets/aflw.py @@ -1,44 +1,34 @@ dataset_info = dict( - dataset_name='aflw', + dataset_name="aflw", paper_info=dict( - author='Koestinger, Martin and Wohlhart, Paul and ' - 'Roth, Peter M and Bischof, Horst', - title='Annotated facial landmarks in the wild: ' - 'A large-scale, real-world database for facial ' - 'landmark localization', - container='2011 IEEE international conference on computer ' - 'vision workshops (ICCV workshops)', - year='2011', - homepage='https://www.tugraz.at/institute/icg/research/' - 'team-bischof/lrs/downloads/aflw/', + author="Koestinger, Martin and Wohlhart, Paul and " "Roth, Peter M and Bischof, Horst", + title="Annotated facial landmarks in the wild: " "A large-scale, real-world database for facial " "landmark localization", + container="2011 IEEE international conference on computer " "vision workshops (ICCV workshops)", + year="2011", + homepage="https://www.tugraz.at/institute/icg/research/" "team-bischof/lrs/downloads/aflw/", ), keypoint_info={ - 0: dict(name='kpt-0', id=0, color=[255, 0, 0], type='', swap='kpt-5'), - 1: dict(name='kpt-1', id=1, color=[255, 0, 0], type='', swap='kpt-4'), - 2: dict(name='kpt-2', id=2, color=[255, 0, 0], type='', swap='kpt-3'), - 3: dict(name='kpt-3', id=3, color=[255, 0, 0], type='', swap='kpt-2'), - 4: dict(name='kpt-4', id=4, color=[255, 0, 0], type='', swap='kpt-1'), - 5: dict(name='kpt-5', id=5, color=[255, 0, 0], type='', swap='kpt-0'), - 6: dict(name='kpt-6', id=6, color=[255, 0, 0], type='', swap='kpt-11'), - 7: dict(name='kpt-7', id=7, color=[255, 0, 0], type='', swap='kpt-10'), - 8: dict(name='kpt-8', id=8, color=[255, 0, 0], type='', swap='kpt-9'), - 9: dict(name='kpt-9', id=9, color=[255, 0, 0], type='', swap='kpt-8'), - 10: - dict(name='kpt-10', id=10, color=[255, 0, 0], type='', swap='kpt-7'), - 11: - dict(name='kpt-11', id=11, color=[255, 0, 0], type='', swap='kpt-6'), - 12: - dict(name='kpt-12', id=12, color=[255, 0, 0], type='', swap='kpt-14'), - 13: dict(name='kpt-13', id=13, color=[255, 0, 0], type='', swap=''), - 14: - dict(name='kpt-14', id=14, color=[255, 0, 0], type='', swap='kpt-12'), - 15: - dict(name='kpt-15', id=15, color=[255, 0, 0], type='', swap='kpt-17'), - 16: dict(name='kpt-16', id=16, color=[255, 0, 0], type='', swap=''), - 17: - dict(name='kpt-17', id=17, color=[255, 0, 0], type='', swap='kpt-15'), - 18: dict(name='kpt-18', id=18, color=[255, 0, 0], type='', swap='') + 0: dict(name="kpt-0", id=0, color=[255, 0, 0], type="", swap="kpt-5"), + 1: dict(name="kpt-1", id=1, color=[255, 0, 0], type="", swap="kpt-4"), + 2: dict(name="kpt-2", id=2, color=[255, 0, 0], type="", swap="kpt-3"), + 3: dict(name="kpt-3", id=3, color=[255, 0, 0], type="", swap="kpt-2"), + 4: dict(name="kpt-4", id=4, color=[255, 0, 0], type="", swap="kpt-1"), + 5: dict(name="kpt-5", id=5, color=[255, 0, 0], type="", swap="kpt-0"), + 6: dict(name="kpt-6", id=6, color=[255, 0, 0], type="", swap="kpt-11"), + 7: dict(name="kpt-7", id=7, color=[255, 0, 0], type="", swap="kpt-10"), + 8: dict(name="kpt-8", id=8, color=[255, 0, 0], type="", swap="kpt-9"), + 9: dict(name="kpt-9", id=9, color=[255, 0, 0], type="", swap="kpt-8"), + 10: 
dict(name="kpt-10", id=10, color=[255, 0, 0], type="", swap="kpt-7"), + 11: dict(name="kpt-11", id=11, color=[255, 0, 0], type="", swap="kpt-6"), + 12: dict(name="kpt-12", id=12, color=[255, 0, 0], type="", swap="kpt-14"), + 13: dict(name="kpt-13", id=13, color=[255, 0, 0], type="", swap=""), + 14: dict(name="kpt-14", id=14, color=[255, 0, 0], type="", swap="kpt-12"), + 15: dict(name="kpt-15", id=15, color=[255, 0, 0], type="", swap="kpt-17"), + 16: dict(name="kpt-16", id=16, color=[255, 0, 0], type="", swap=""), + 17: dict(name="kpt-17", id=17, color=[255, 0, 0], type="", swap="kpt-15"), + 18: dict(name="kpt-18", id=18, color=[255, 0, 0], type="", swap=""), }, skeleton_info={}, - joint_weights=[1.] * 19, - sigmas=[]) + joint_weights=[1.0] * 19, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/aic.py b/mmpose/configs/_base_/datasets/aic.py index 9ecdbe3f0afeb19dbb7aed42653ce5efd85cfda3..b89ea6aa47dfb6e4570c2e0eb676ff0c7902952b 100644 --- a/mmpose/configs/_base_/datasets/aic.py +++ b/mmpose/configs/_base_/datasets/aic.py @@ -1,140 +1,65 @@ dataset_info = dict( - dataset_name='aic', + dataset_name="aic", paper_info=dict( - author='Wu, Jiahong and Zheng, He and Zhao, Bo and ' - 'Li, Yixin and Yan, Baoming and Liang, Rui and ' - 'Wang, Wenjia and Zhou, Shipei and Lin, Guosen and ' - 'Fu, Yanwei and others', - title='Ai challenger: A large-scale dataset for going ' - 'deeper in image understanding', - container='arXiv', - year='2017', - homepage='https://github.com/AIChallenger/AI_Challenger_2017', + author="Wu, Jiahong and Zheng, He and Zhao, Bo and " + "Li, Yixin and Yan, Baoming and Liang, Rui and " + "Wang, Wenjia and Zhou, Shipei and Lin, Guosen and " + "Fu, Yanwei and others", + title="Ai challenger: A large-scale dataset for going " "deeper in image understanding", + container="arXiv", + year="2017", + homepage="https://github.com/AIChallenger/AI_Challenger_2017", ), keypoint_info={ - 0: - dict( - name='right_shoulder', - id=0, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 1: - dict( - name='right_elbow', - id=1, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 2: - dict( - name='right_wrist', - id=2, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 3: - dict( - name='left_shoulder', - id=3, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 4: - dict( - name='left_elbow', - id=4, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 5: - dict( - name='left_wrist', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 6: - dict( - name='right_hip', - id=6, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 7: - dict( - name='right_knee', - id=7, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 8: - dict( - name='right_ankle', - id=8, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 9: - dict( - name='left_hip', - id=9, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 10: - dict( - name='left_knee', - id=10, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 11: - dict( - name='left_ankle', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 12: - dict( - name='head_top', - id=12, - color=[51, 153, 255], - type='upper', - swap=''), - 13: - dict(name='neck', id=13, color=[51, 153, 255], type='upper', swap='') + 0: dict(name="right_shoulder", id=0, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 1: dict(name="right_elbow", id=1, color=[255, 128, 0], type="upper", swap="left_elbow"), + 2: 
dict(name="right_wrist", id=2, color=[255, 128, 0], type="upper", swap="left_wrist"), + 3: dict(name="left_shoulder", id=3, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 4: dict(name="left_elbow", id=4, color=[0, 255, 0], type="upper", swap="right_elbow"), + 5: dict(name="left_wrist", id=5, color=[0, 255, 0], type="upper", swap="right_wrist"), + 6: dict(name="right_hip", id=6, color=[255, 128, 0], type="lower", swap="left_hip"), + 7: dict(name="right_knee", id=7, color=[255, 128, 0], type="lower", swap="left_knee"), + 8: dict(name="right_ankle", id=8, color=[255, 128, 0], type="lower", swap="left_ankle"), + 9: dict(name="left_hip", id=9, color=[0, 255, 0], type="lower", swap="right_hip"), + 10: dict(name="left_knee", id=10, color=[0, 255, 0], type="lower", swap="right_knee"), + 11: dict(name="left_ankle", id=11, color=[0, 255, 0], type="lower", swap="right_ankle"), + 12: dict(name="head_top", id=12, color=[51, 153, 255], type="upper", swap=""), + 13: dict(name="neck", id=13, color=[51, 153, 255], type="upper", swap=""), }, skeleton_info={ - 0: - dict(link=('right_wrist', 'right_elbow'), id=0, color=[255, 128, 0]), - 1: dict( - link=('right_elbow', 'right_shoulder'), id=1, color=[255, 128, 0]), - 2: dict(link=('right_shoulder', 'neck'), id=2, color=[51, 153, 255]), - 3: dict(link=('neck', 'left_shoulder'), id=3, color=[51, 153, 255]), - 4: dict(link=('left_shoulder', 'left_elbow'), id=4, color=[0, 255, 0]), - 5: dict(link=('left_elbow', 'left_wrist'), id=5, color=[0, 255, 0]), - 6: dict(link=('right_ankle', 'right_knee'), id=6, color=[255, 128, 0]), - 7: dict(link=('right_knee', 'right_hip'), id=7, color=[255, 128, 0]), - 8: dict(link=('right_hip', 'left_hip'), id=8, color=[51, 153, 255]), - 9: dict(link=('left_hip', 'left_knee'), id=9, color=[0, 255, 0]), - 10: dict(link=('left_knee', 'left_ankle'), id=10, color=[0, 255, 0]), - 11: dict(link=('head_top', 'neck'), id=11, color=[51, 153, 255]), - 12: dict( - link=('right_shoulder', 'right_hip'), id=12, color=[51, 153, 255]), - 13: - dict(link=('left_shoulder', 'left_hip'), id=13, color=[51, 153, 255]) + 0: dict(link=("right_wrist", "right_elbow"), id=0, color=[255, 128, 0]), + 1: dict(link=("right_elbow", "right_shoulder"), id=1, color=[255, 128, 0]), + 2: dict(link=("right_shoulder", "neck"), id=2, color=[51, 153, 255]), + 3: dict(link=("neck", "left_shoulder"), id=3, color=[51, 153, 255]), + 4: dict(link=("left_shoulder", "left_elbow"), id=4, color=[0, 255, 0]), + 5: dict(link=("left_elbow", "left_wrist"), id=5, color=[0, 255, 0]), + 6: dict(link=("right_ankle", "right_knee"), id=6, color=[255, 128, 0]), + 7: dict(link=("right_knee", "right_hip"), id=7, color=[255, 128, 0]), + 8: dict(link=("right_hip", "left_hip"), id=8, color=[51, 153, 255]), + 9: dict(link=("left_hip", "left_knee"), id=9, color=[0, 255, 0]), + 10: dict(link=("left_knee", "left_ankle"), id=10, color=[0, 255, 0]), + 11: dict(link=("head_top", "neck"), id=11, color=[51, 153, 255]), + 12: dict(link=("right_shoulder", "right_hip"), id=12, color=[51, 153, 255]), + 13: dict(link=("left_shoulder", "left_hip"), id=13, color=[51, 153, 255]), }, - joint_weights=[ - 1., 1.2, 1.5, 1., 1.2, 1.5, 1., 1.2, 1.5, 1., 1.2, 1.5, 1., 1. 
- ], - + joint_weights=[1.0, 1.2, 1.5, 1.0, 1.2, 1.5, 1.0, 1.2, 1.5, 1.0, 1.2, 1.5, 1.0, 1.0], # 'https://github.com/AIChallenger/AI_Challenger_2017/blob/master/' # 'Evaluation/keypoint_eval/keypoint_eval.py#L50' # delta = 2 x sigma sigmas=[ - 0.01388152, 0.01515228, 0.01057665, 0.01417709, 0.01497891, 0.01402144, - 0.03909642, 0.03686941, 0.01981803, 0.03843971, 0.03412318, 0.02415081, - 0.01291456, 0.01236173 - ]) + 0.01388152, + 0.01515228, + 0.01057665, + 0.01417709, + 0.01497891, + 0.01402144, + 0.03909642, + 0.03686941, + 0.01981803, + 0.03843971, + 0.03412318, + 0.02415081, + 0.01291456, + 0.01236173, + ], +) diff --git a/mmpose/configs/_base_/datasets/ak.py b/mmpose/configs/_base_/datasets/ak.py index e8b12f5a3125a7eec549a483d70077361f215205..fbfd66e03d7d74c9d032762997f8a59185bfb491 100644 --- a/mmpose/configs/_base_/datasets/ak.py +++ b/mmpose/configs/_base_/datasets/ak.py @@ -1,267 +1,89 @@ dataset_info = dict( - dataset_name='Animal Kingdom', + dataset_name="Animal Kingdom", paper_info=dict( - author='Singapore University of Technology and Design, Singapore.' - ' Xun Long Ng, Kian Eng Ong, Qichen Zheng,' - ' Yun Ni, Si Yong Yeo, Jun Liu.', - title='Animal Kingdom: ' - 'A Large and Diverse Dataset for Animal Behavior Understanding', - container='Conference on Computer Vision ' - 'and Pattern Recognition (CVPR)', - year='2022', - homepage='https://sutdcv.github.io/Animal-Kingdom', - version='1.0 (2022-06)', - date_created='2022-06', + author="Singapore University of Technology and Design, Singapore." + " Xun Long Ng, Kian Eng Ong, Qichen Zheng," + " Yun Ni, Si Yong Yeo, Jun Liu.", + title="Animal Kingdom: " "A Large and Diverse Dataset for Animal Behavior Understanding", + container="Conference on Computer Vision " "and Pattern Recognition (CVPR)", + year="2022", + homepage="https://sutdcv.github.io/Animal-Kingdom", + version="1.0 (2022-06)", + date_created="2022-06", ), keypoint_info={ - 0: - dict( - name='Head_Mid_Top', - id=0, - color=(225, 0, 255), - type='upper', - swap=''), - 1: - dict( - name='Eye_Left', - id=1, - color=[220, 20, 60], - type='upper', - swap='Eye_Right'), - 2: - dict( - name='Eye_Right', - id=2, - color=[0, 255, 255], - type='upper', - swap='Eye_Left'), - 3: - dict( - name='Mouth_Front_Top', - id=3, - color=(0, 255, 42), - type='upper', - swap=''), - 4: - dict( - name='Mouth_Back_Left', - id=4, - color=[221, 160, 221], - type='upper', - swap='Mouth_Back_Right'), - 5: - dict( - name='Mouth_Back_Right', - id=5, - color=[135, 206, 250], - type='upper', - swap='Mouth_Back_Left'), - 6: - dict( - name='Mouth_Front_Bottom', - id=6, - color=[50, 205, 50], - type='upper', - swap=''), - 7: - dict( - name='Shoulder_Left', - id=7, - color=[255, 182, 193], - type='upper', - swap='Shoulder_Right'), - 8: - dict( - name='Shoulder_Right', - id=8, - color=[0, 191, 255], - type='upper', - swap='Shoulder_Left'), - 9: - dict( - name='Elbow_Left', - id=9, - color=[255, 105, 180], - type='upper', - swap='Elbow_Right'), - 10: - dict( - name='Elbow_Right', - id=10, - color=[30, 144, 255], - type='upper', - swap='Elbow_Left'), - 11: - dict( - name='Wrist_Left', - id=11, - color=[255, 20, 147], - type='upper', - swap='Wrist_Right'), - 12: - dict( - name='Wrist_Right', - id=12, - color=[0, 0, 255], - type='upper', - swap='Wrist_Left'), - 13: - dict( - name='Torso_Mid_Back', - id=13, - color=(185, 3, 221), - type='upper', - swap=''), - 14: - dict( - name='Hip_Left', - id=14, - color=[255, 215, 0], - type='lower', - swap='Hip_Right'), - 15: - dict( - name='Hip_Right', - id=15, - 
color=[147, 112, 219], - type='lower', - swap='Hip_Left'), - 16: - dict( - name='Knee_Left', - id=16, - color=[255, 165, 0], - type='lower', - swap='Knee_Right'), - 17: - dict( - name='Knee_Right', - id=17, - color=[138, 43, 226], - type='lower', - swap='Knee_Left'), - 18: - dict( - name='Ankle_Left', - id=18, - color=[255, 140, 0], - type='lower', - swap='Ankle_Right'), - 19: - dict( - name='Ankle_Right', - id=19, - color=[128, 0, 128], - type='lower', - swap='Ankle_Left'), - 20: - dict( - name='Tail_Top_Back', - id=20, - color=(0, 251, 255), - type='lower', - swap=''), - 21: - dict( - name='Tail_Mid_Back', - id=21, - color=[32, 178, 170], - type='lower', - swap=''), - 22: - dict( - name='Tail_End_Back', - id=22, - color=(0, 102, 102), - type='lower', - swap='') + 0: dict(name="Head_Mid_Top", id=0, color=(225, 0, 255), type="upper", swap=""), + 1: dict(name="Eye_Left", id=1, color=[220, 20, 60], type="upper", swap="Eye_Right"), + 2: dict(name="Eye_Right", id=2, color=[0, 255, 255], type="upper", swap="Eye_Left"), + 3: dict(name="Mouth_Front_Top", id=3, color=(0, 255, 42), type="upper", swap=""), + 4: dict(name="Mouth_Back_Left", id=4, color=[221, 160, 221], type="upper", swap="Mouth_Back_Right"), + 5: dict(name="Mouth_Back_Right", id=5, color=[135, 206, 250], type="upper", swap="Mouth_Back_Left"), + 6: dict(name="Mouth_Front_Bottom", id=6, color=[50, 205, 50], type="upper", swap=""), + 7: dict(name="Shoulder_Left", id=7, color=[255, 182, 193], type="upper", swap="Shoulder_Right"), + 8: dict(name="Shoulder_Right", id=8, color=[0, 191, 255], type="upper", swap="Shoulder_Left"), + 9: dict(name="Elbow_Left", id=9, color=[255, 105, 180], type="upper", swap="Elbow_Right"), + 10: dict(name="Elbow_Right", id=10, color=[30, 144, 255], type="upper", swap="Elbow_Left"), + 11: dict(name="Wrist_Left", id=11, color=[255, 20, 147], type="upper", swap="Wrist_Right"), + 12: dict(name="Wrist_Right", id=12, color=[0, 0, 255], type="upper", swap="Wrist_Left"), + 13: dict(name="Torso_Mid_Back", id=13, color=(185, 3, 221), type="upper", swap=""), + 14: dict(name="Hip_Left", id=14, color=[255, 215, 0], type="lower", swap="Hip_Right"), + 15: dict(name="Hip_Right", id=15, color=[147, 112, 219], type="lower", swap="Hip_Left"), + 16: dict(name="Knee_Left", id=16, color=[255, 165, 0], type="lower", swap="Knee_Right"), + 17: dict(name="Knee_Right", id=17, color=[138, 43, 226], type="lower", swap="Knee_Left"), + 18: dict(name="Ankle_Left", id=18, color=[255, 140, 0], type="lower", swap="Ankle_Right"), + 19: dict(name="Ankle_Right", id=19, color=[128, 0, 128], type="lower", swap="Ankle_Left"), + 20: dict(name="Tail_Top_Back", id=20, color=(0, 251, 255), type="lower", swap=""), + 21: dict(name="Tail_Mid_Back", id=21, color=[32, 178, 170], type="lower", swap=""), + 22: dict(name="Tail_End_Back", id=22, color=(0, 102, 102), type="lower", swap=""), }, skeleton_info={ - 0: - dict(link=('Eye_Left', 'Head_Mid_Top'), id=0, color=[220, 20, 60]), - 1: - dict(link=('Eye_Right', 'Head_Mid_Top'), id=1, color=[0, 255, 255]), - 2: - dict( - link=('Mouth_Front_Top', 'Mouth_Back_Left'), - id=2, - color=[221, 160, 221]), - 3: - dict( - link=('Mouth_Front_Top', 'Mouth_Back_Right'), - id=3, - color=[135, 206, 250]), - 4: - dict( - link=('Mouth_Front_Bottom', 'Mouth_Back_Left'), - id=4, - color=[221, 160, 221]), - 5: - dict( - link=('Mouth_Front_Bottom', 'Mouth_Back_Right'), - id=5, - color=[135, 206, 250]), - 6: - dict( - link=('Head_Mid_Top', 'Torso_Mid_Back'), id=6, - color=(225, 0, 255)), - 7: - dict( - link=('Torso_Mid_Back', 
'Tail_Top_Back'), - id=7, - color=(185, 3, 221)), - 8: - dict( - link=('Tail_Top_Back', 'Tail_Mid_Back'), id=8, - color=(0, 251, 255)), - 9: - dict( - link=('Tail_Mid_Back', 'Tail_End_Back'), - id=9, - color=[32, 178, 170]), - 10: - dict( - link=('Head_Mid_Top', 'Shoulder_Left'), - id=10, - color=[255, 182, 193]), - 11: - dict( - link=('Head_Mid_Top', 'Shoulder_Right'), - id=11, - color=[0, 191, 255]), - 12: - dict( - link=('Shoulder_Left', 'Elbow_Left'), id=12, color=[255, 105, - 180]), - 13: - dict( - link=('Shoulder_Right', 'Elbow_Right'), - id=13, - color=[30, 144, 255]), - 14: - dict(link=('Elbow_Left', 'Wrist_Left'), id=14, color=[255, 20, 147]), - 15: - dict(link=('Elbow_Right', 'Wrist_Right'), id=15, color=[0, 0, 255]), - 16: - dict(link=('Tail_Top_Back', 'Hip_Left'), id=16, color=[255, 215, 0]), - 17: - dict( - link=('Tail_Top_Back', 'Hip_Right'), id=17, color=[147, 112, 219]), - 18: - dict(link=('Hip_Left', 'Knee_Left'), id=18, color=[255, 165, 0]), - 19: - dict(link=('Hip_Right', 'Knee_Right'), id=19, color=[138, 43, 226]), - 20: - dict(link=('Knee_Left', 'Ankle_Left'), id=20, color=[255, 140, 0]), - 21: - dict(link=('Knee_Right', 'Ankle_Right'), id=21, color=[128, 0, 128]) + 0: dict(link=("Eye_Left", "Head_Mid_Top"), id=0, color=[220, 20, 60]), + 1: dict(link=("Eye_Right", "Head_Mid_Top"), id=1, color=[0, 255, 255]), + 2: dict(link=("Mouth_Front_Top", "Mouth_Back_Left"), id=2, color=[221, 160, 221]), + 3: dict(link=("Mouth_Front_Top", "Mouth_Back_Right"), id=3, color=[135, 206, 250]), + 4: dict(link=("Mouth_Front_Bottom", "Mouth_Back_Left"), id=4, color=[221, 160, 221]), + 5: dict(link=("Mouth_Front_Bottom", "Mouth_Back_Right"), id=5, color=[135, 206, 250]), + 6: dict(link=("Head_Mid_Top", "Torso_Mid_Back"), id=6, color=(225, 0, 255)), + 7: dict(link=("Torso_Mid_Back", "Tail_Top_Back"), id=7, color=(185, 3, 221)), + 8: dict(link=("Tail_Top_Back", "Tail_Mid_Back"), id=8, color=(0, 251, 255)), + 9: dict(link=("Tail_Mid_Back", "Tail_End_Back"), id=9, color=[32, 178, 170]), + 10: dict(link=("Head_Mid_Top", "Shoulder_Left"), id=10, color=[255, 182, 193]), + 11: dict(link=("Head_Mid_Top", "Shoulder_Right"), id=11, color=[0, 191, 255]), + 12: dict(link=("Shoulder_Left", "Elbow_Left"), id=12, color=[255, 105, 180]), + 13: dict(link=("Shoulder_Right", "Elbow_Right"), id=13, color=[30, 144, 255]), + 14: dict(link=("Elbow_Left", "Wrist_Left"), id=14, color=[255, 20, 147]), + 15: dict(link=("Elbow_Right", "Wrist_Right"), id=15, color=[0, 0, 255]), + 16: dict(link=("Tail_Top_Back", "Hip_Left"), id=16, color=[255, 215, 0]), + 17: dict(link=("Tail_Top_Back", "Hip_Right"), id=17, color=[147, 112, 219]), + 18: dict(link=("Hip_Left", "Knee_Left"), id=18, color=[255, 165, 0]), + 19: dict(link=("Hip_Right", "Knee_Right"), id=19, color=[138, 43, 226]), + 20: dict(link=("Knee_Left", "Ankle_Left"), id=20, color=[255, 140, 0]), + 21: dict(link=("Knee_Right", "Ankle_Right"), id=21, color=[128, 0, 128]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., - 1., 1., 1., 1., 1. 
- ], + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], sigmas=[ - 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, - 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, - 0.025, 0.025, 0.025 - ]) + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + 0.025, + ], +) diff --git a/mmpose/configs/_base_/datasets/animalpose.py b/mmpose/configs/_base_/datasets/animalpose.py index d5bb62d951b71da25e679bd755fe566216dc3f6f..de63f85a0b17d26507c30a92b2177c55e78e0c3b 100644 --- a/mmpose/configs/_base_/datasets/animalpose.py +++ b/mmpose/configs/_base_/datasets/animalpose.py @@ -1,166 +1,80 @@ dataset_info = dict( - dataset_name='animalpose', + dataset_name="animalpose", paper_info=dict( - author='Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and ' - 'Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing', - title='Cross-Domain Adaptation for Animal Pose Estimation', - container='The IEEE International Conference on ' - 'Computer Vision (ICCV)', - year='2019', - homepage='https://sites.google.com/view/animal-pose/', + author="Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and " "Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing", + title="Cross-Domain Adaptation for Animal Pose Estimation", + container="The IEEE International Conference on " "Computer Vision (ICCV)", + year="2019", + homepage="https://sites.google.com/view/animal-pose/", ), keypoint_info={ - 0: - dict( - name='L_Eye', id=0, color=[0, 255, 0], type='upper', swap='R_Eye'), - 1: - dict( - name='R_Eye', - id=1, - color=[255, 128, 0], - type='upper', - swap='L_Eye'), - 2: - dict( - name='L_EarBase', - id=2, - color=[0, 255, 0], - type='upper', - swap='R_EarBase'), - 3: - dict( - name='R_EarBase', - id=3, - color=[255, 128, 0], - type='upper', - swap='L_EarBase'), - 4: - dict(name='Nose', id=4, color=[51, 153, 255], type='upper', swap=''), - 5: - dict(name='Throat', id=5, color=[51, 153, 255], type='upper', swap=''), - 6: - dict( - name='TailBase', id=6, color=[51, 153, 255], type='lower', - swap=''), - 7: - dict( - name='Withers', id=7, color=[51, 153, 255], type='upper', swap=''), - 8: - dict( - name='L_F_Elbow', - id=8, - color=[0, 255, 0], - type='upper', - swap='R_F_Elbow'), - 9: - dict( - name='R_F_Elbow', - id=9, - color=[255, 128, 0], - type='upper', - swap='L_F_Elbow'), - 10: - dict( - name='L_B_Elbow', - id=10, - color=[0, 255, 0], - type='lower', - swap='R_B_Elbow'), - 11: - dict( - name='R_B_Elbow', - id=11, - color=[255, 128, 0], - type='lower', - swap='L_B_Elbow'), - 12: - dict( - name='L_F_Knee', - id=12, - color=[0, 255, 0], - type='upper', - swap='R_F_Knee'), - 13: - dict( - name='R_F_Knee', - id=13, - color=[255, 128, 0], - type='upper', - swap='L_F_Knee'), - 14: - dict( - name='L_B_Knee', - id=14, - color=[0, 255, 0], - type='lower', - swap='R_B_Knee'), - 15: - dict( - name='R_B_Knee', - id=15, - color=[255, 128, 0], - type='lower', - swap='L_B_Knee'), - 16: - dict( - name='L_F_Paw', - id=16, - color=[0, 255, 0], - type='upper', - swap='R_F_Paw'), - 17: - dict( - name='R_F_Paw', - id=17, - color=[255, 128, 0], - type='upper', - swap='L_F_Paw'), - 18: - dict( - name='L_B_Paw', - id=18, - color=[0, 255, 0], - type='lower', - swap='R_B_Paw'), - 19: - dict( - name='R_B_Paw', - id=19, - color=[255, 128, 0], - type='lower', - swap='L_B_Paw') + 0: dict(name="L_Eye", id=0, 
color=[0, 255, 0], type="upper", swap="R_Eye"), + 1: dict(name="R_Eye", id=1, color=[255, 128, 0], type="upper", swap="L_Eye"), + 2: dict(name="L_EarBase", id=2, color=[0, 255, 0], type="upper", swap="R_EarBase"), + 3: dict(name="R_EarBase", id=3, color=[255, 128, 0], type="upper", swap="L_EarBase"), + 4: dict(name="Nose", id=4, color=[51, 153, 255], type="upper", swap=""), + 5: dict(name="Throat", id=5, color=[51, 153, 255], type="upper", swap=""), + 6: dict(name="TailBase", id=6, color=[51, 153, 255], type="lower", swap=""), + 7: dict(name="Withers", id=7, color=[51, 153, 255], type="upper", swap=""), + 8: dict(name="L_F_Elbow", id=8, color=[0, 255, 0], type="upper", swap="R_F_Elbow"), + 9: dict(name="R_F_Elbow", id=9, color=[255, 128, 0], type="upper", swap="L_F_Elbow"), + 10: dict(name="L_B_Elbow", id=10, color=[0, 255, 0], type="lower", swap="R_B_Elbow"), + 11: dict(name="R_B_Elbow", id=11, color=[255, 128, 0], type="lower", swap="L_B_Elbow"), + 12: dict(name="L_F_Knee", id=12, color=[0, 255, 0], type="upper", swap="R_F_Knee"), + 13: dict(name="R_F_Knee", id=13, color=[255, 128, 0], type="upper", swap="L_F_Knee"), + 14: dict(name="L_B_Knee", id=14, color=[0, 255, 0], type="lower", swap="R_B_Knee"), + 15: dict(name="R_B_Knee", id=15, color=[255, 128, 0], type="lower", swap="L_B_Knee"), + 16: dict(name="L_F_Paw", id=16, color=[0, 255, 0], type="upper", swap="R_F_Paw"), + 17: dict(name="R_F_Paw", id=17, color=[255, 128, 0], type="upper", swap="L_F_Paw"), + 18: dict(name="L_B_Paw", id=18, color=[0, 255, 0], type="lower", swap="R_B_Paw"), + 19: dict(name="R_B_Paw", id=19, color=[255, 128, 0], type="lower", swap="L_B_Paw"), }, skeleton_info={ - 0: dict(link=('L_Eye', 'R_Eye'), id=0, color=[51, 153, 255]), - 1: dict(link=('L_Eye', 'L_EarBase'), id=1, color=[0, 255, 0]), - 2: dict(link=('R_Eye', 'R_EarBase'), id=2, color=[255, 128, 0]), - 3: dict(link=('L_Eye', 'Nose'), id=3, color=[0, 255, 0]), - 4: dict(link=('R_Eye', 'Nose'), id=4, color=[255, 128, 0]), - 5: dict(link=('Nose', 'Throat'), id=5, color=[51, 153, 255]), - 6: dict(link=('Throat', 'Withers'), id=6, color=[51, 153, 255]), - 7: dict(link=('TailBase', 'Withers'), id=7, color=[51, 153, 255]), - 8: dict(link=('Throat', 'L_F_Elbow'), id=8, color=[0, 255, 0]), - 9: dict(link=('L_F_Elbow', 'L_F_Knee'), id=9, color=[0, 255, 0]), - 10: dict(link=('L_F_Knee', 'L_F_Paw'), id=10, color=[0, 255, 0]), - 11: dict(link=('Throat', 'R_F_Elbow'), id=11, color=[255, 128, 0]), - 12: dict(link=('R_F_Elbow', 'R_F_Knee'), id=12, color=[255, 128, 0]), - 13: dict(link=('R_F_Knee', 'R_F_Paw'), id=13, color=[255, 128, 0]), - 14: dict(link=('TailBase', 'L_B_Elbow'), id=14, color=[0, 255, 0]), - 15: dict(link=('L_B_Elbow', 'L_B_Knee'), id=15, color=[0, 255, 0]), - 16: dict(link=('L_B_Knee', 'L_B_Paw'), id=16, color=[0, 255, 0]), - 17: dict(link=('TailBase', 'R_B_Elbow'), id=17, color=[255, 128, 0]), - 18: dict(link=('R_B_Elbow', 'R_B_Knee'), id=18, color=[255, 128, 0]), - 19: dict(link=('R_B_Knee', 'R_B_Paw'), id=19, color=[255, 128, 0]) + 0: dict(link=("L_Eye", "R_Eye"), id=0, color=[51, 153, 255]), + 1: dict(link=("L_Eye", "L_EarBase"), id=1, color=[0, 255, 0]), + 2: dict(link=("R_Eye", "R_EarBase"), id=2, color=[255, 128, 0]), + 3: dict(link=("L_Eye", "Nose"), id=3, color=[0, 255, 0]), + 4: dict(link=("R_Eye", "Nose"), id=4, color=[255, 128, 0]), + 5: dict(link=("Nose", "Throat"), id=5, color=[51, 153, 255]), + 6: dict(link=("Throat", "Withers"), id=6, color=[51, 153, 255]), + 7: dict(link=("TailBase", "Withers"), id=7, color=[51, 153, 255]), + 8: 
dict(link=("Throat", "L_F_Elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("L_F_Elbow", "L_F_Knee"), id=9, color=[0, 255, 0]), + 10: dict(link=("L_F_Knee", "L_F_Paw"), id=10, color=[0, 255, 0]), + 11: dict(link=("Throat", "R_F_Elbow"), id=11, color=[255, 128, 0]), + 12: dict(link=("R_F_Elbow", "R_F_Knee"), id=12, color=[255, 128, 0]), + 13: dict(link=("R_F_Knee", "R_F_Paw"), id=13, color=[255, 128, 0]), + 14: dict(link=("TailBase", "L_B_Elbow"), id=14, color=[0, 255, 0]), + 15: dict(link=("L_B_Elbow", "L_B_Knee"), id=15, color=[0, 255, 0]), + 16: dict(link=("L_B_Knee", "L_B_Paw"), id=16, color=[0, 255, 0]), + 17: dict(link=("TailBase", "R_B_Elbow"), id=17, color=[255, 128, 0]), + 18: dict(link=("R_B_Elbow", "R_B_Knee"), id=18, color=[255, 128, 0]), + 19: dict(link=("R_B_Knee", "R_B_Paw"), id=19, color=[255, 128, 0]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.2, 1.2, - 1.5, 1.5, 1.5, 1.5 - ], - + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.2, 1.2, 1.5, 1.5, 1.5, 1.5], # Note: The original paper did not provide enough information about # the sigmas. We modified from 'https://github.com/cocodataset/' # 'cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L523' sigmas=[ - 0.025, 0.025, 0.026, 0.035, 0.035, 0.10, 0.10, 0.10, 0.107, 0.107, - 0.107, 0.107, 0.087, 0.087, 0.087, 0.087, 0.089, 0.089, 0.089, 0.089 - ]) + 0.025, + 0.025, + 0.026, + 0.035, + 0.035, + 0.10, + 0.10, + 0.10, + 0.107, + 0.107, + 0.107, + 0.107, + 0.087, + 0.087, + 0.087, + 0.087, + 0.089, + 0.089, + 0.089, + 0.089, + ], +) diff --git a/mmpose/configs/_base_/datasets/ap10k.py b/mmpose/configs/_base_/datasets/ap10k.py index c0df579acbb8cf0de1ef62412ba865ee8710f0aa..54f510a83cefc22fb06d509281eef3a9c33820c8 100644 --- a/mmpose/configs/_base_/datasets/ap10k.py +++ b/mmpose/configs/_base_/datasets/ap10k.py @@ -1,142 +1,50 @@ dataset_info = dict( - dataset_name='ap10k', + dataset_name="ap10k", paper_info=dict( - author='Yu, Hang and Xu, Yufei and Zhang, Jing and ' - 'Zhao, Wei and Guan, Ziyu and Tao, Dacheng', - title='AP-10K: A Benchmark for Animal Pose Estimation in the Wild', - container='35th Conference on Neural Information Processing Systems ' - '(NeurIPS 2021) Track on Datasets and Bench-marks.', - year='2021', - homepage='https://github.com/AlexTheBad/AP-10K', + author="Yu, Hang and Xu, Yufei and Zhang, Jing and " "Zhao, Wei and Guan, Ziyu and Tao, Dacheng", + title="AP-10K: A Benchmark for Animal Pose Estimation in the Wild", + container="35th Conference on Neural Information Processing Systems " "(NeurIPS 2021) Track on Datasets and Bench-marks.", + year="2021", + homepage="https://github.com/AlexTheBad/AP-10K", ), keypoint_info={ - 0: - dict( - name='L_Eye', id=0, color=[0, 255, 0], type='upper', swap='R_Eye'), - 1: - dict( - name='R_Eye', - id=1, - color=[255, 128, 0], - type='upper', - swap='L_Eye'), - 2: - dict(name='Nose', id=2, color=[51, 153, 255], type='upper', swap=''), - 3: - dict(name='Neck', id=3, color=[51, 153, 255], type='upper', swap=''), - 4: - dict( - name='Root of tail', - id=4, - color=[51, 153, 255], - type='lower', - swap=''), - 5: - dict( - name='L_Shoulder', - id=5, - color=[51, 153, 255], - type='upper', - swap='R_Shoulder'), - 6: - dict( - name='L_Elbow', - id=6, - color=[51, 153, 255], - type='upper', - swap='R_Elbow'), - 7: - dict( - name='L_F_Paw', - id=7, - color=[0, 255, 0], - type='upper', - swap='R_F_Paw'), - 8: - dict( - name='R_Shoulder', - id=8, - color=[0, 255, 0], - type='upper', - 
swap='L_Shoulder'), - 9: - dict( - name='R_Elbow', - id=9, - color=[255, 128, 0], - type='upper', - swap='L_Elbow'), - 10: - dict( - name='R_F_Paw', - id=10, - color=[0, 255, 0], - type='lower', - swap='L_F_Paw'), - 11: - dict( - name='L_Hip', - id=11, - color=[255, 128, 0], - type='lower', - swap='R_Hip'), - 12: - dict( - name='L_Knee', - id=12, - color=[255, 128, 0], - type='lower', - swap='R_Knee'), - 13: - dict( - name='L_B_Paw', - id=13, - color=[0, 255, 0], - type='lower', - swap='R_B_Paw'), - 14: - dict( - name='R_Hip', id=14, color=[0, 255, 0], type='lower', - swap='L_Hip'), - 15: - dict( - name='R_Knee', - id=15, - color=[0, 255, 0], - type='lower', - swap='L_Knee'), - 16: - dict( - name='R_B_Paw', - id=16, - color=[0, 255, 0], - type='lower', - swap='L_B_Paw'), + 0: dict(name="L_Eye", id=0, color=[0, 255, 0], type="upper", swap="R_Eye"), + 1: dict(name="R_Eye", id=1, color=[255, 128, 0], type="upper", swap="L_Eye"), + 2: dict(name="Nose", id=2, color=[51, 153, 255], type="upper", swap=""), + 3: dict(name="Neck", id=3, color=[51, 153, 255], type="upper", swap=""), + 4: dict(name="Root of tail", id=4, color=[51, 153, 255], type="lower", swap=""), + 5: dict(name="L_Shoulder", id=5, color=[51, 153, 255], type="upper", swap="R_Shoulder"), + 6: dict(name="L_Elbow", id=6, color=[51, 153, 255], type="upper", swap="R_Elbow"), + 7: dict(name="L_F_Paw", id=7, color=[0, 255, 0], type="upper", swap="R_F_Paw"), + 8: dict(name="R_Shoulder", id=8, color=[0, 255, 0], type="upper", swap="L_Shoulder"), + 9: dict(name="R_Elbow", id=9, color=[255, 128, 0], type="upper", swap="L_Elbow"), + 10: dict(name="R_F_Paw", id=10, color=[0, 255, 0], type="lower", swap="L_F_Paw"), + 11: dict(name="L_Hip", id=11, color=[255, 128, 0], type="lower", swap="R_Hip"), + 12: dict(name="L_Knee", id=12, color=[255, 128, 0], type="lower", swap="R_Knee"), + 13: dict(name="L_B_Paw", id=13, color=[0, 255, 0], type="lower", swap="R_B_Paw"), + 14: dict(name="R_Hip", id=14, color=[0, 255, 0], type="lower", swap="L_Hip"), + 15: dict(name="R_Knee", id=15, color=[0, 255, 0], type="lower", swap="L_Knee"), + 16: dict(name="R_B_Paw", id=16, color=[0, 255, 0], type="lower", swap="L_B_Paw"), }, skeleton_info={ - 0: dict(link=('L_Eye', 'R_Eye'), id=0, color=[0, 0, 255]), - 1: dict(link=('L_Eye', 'Nose'), id=1, color=[0, 0, 255]), - 2: dict(link=('R_Eye', 'Nose'), id=2, color=[0, 0, 255]), - 3: dict(link=('Nose', 'Neck'), id=3, color=[0, 255, 0]), - 4: dict(link=('Neck', 'Root of tail'), id=4, color=[0, 255, 0]), - 5: dict(link=('Neck', 'L_Shoulder'), id=5, color=[0, 255, 255]), - 6: dict(link=('L_Shoulder', 'L_Elbow'), id=6, color=[0, 255, 255]), - 7: dict(link=('L_Elbow', 'L_F_Paw'), id=6, color=[0, 255, 255]), - 8: dict(link=('Neck', 'R_Shoulder'), id=7, color=[6, 156, 250]), - 9: dict(link=('R_Shoulder', 'R_Elbow'), id=8, color=[6, 156, 250]), - 10: dict(link=('R_Elbow', 'R_F_Paw'), id=9, color=[6, 156, 250]), - 11: dict(link=('Root of tail', 'L_Hip'), id=10, color=[0, 255, 255]), - 12: dict(link=('L_Hip', 'L_Knee'), id=11, color=[0, 255, 255]), - 13: dict(link=('L_Knee', 'L_B_Paw'), id=12, color=[0, 255, 255]), - 14: dict(link=('Root of tail', 'R_Hip'), id=13, color=[6, 156, 250]), - 15: dict(link=('R_Hip', 'R_Knee'), id=14, color=[6, 156, 250]), - 16: dict(link=('R_Knee', 'R_B_Paw'), id=15, color=[6, 156, 250]), + 0: dict(link=("L_Eye", "R_Eye"), id=0, color=[0, 0, 255]), + 1: dict(link=("L_Eye", "Nose"), id=1, color=[0, 0, 255]), + 2: dict(link=("R_Eye", "Nose"), id=2, color=[0, 0, 255]), + 3: dict(link=("Nose", "Neck"), id=3, 
color=[0, 255, 0]),
+        4: dict(link=("Neck", "Root of tail"), id=4, color=[0, 255, 0]),
+        5: dict(link=("Neck", "L_Shoulder"), id=5, color=[0, 255, 255]),
+        6: dict(link=("L_Shoulder", "L_Elbow"), id=6, color=[0, 255, 255]),
+        7: dict(link=("L_Elbow", "L_F_Paw"), id=7, color=[0, 255, 255]),
+        8: dict(link=("Neck", "R_Shoulder"), id=8, color=[6, 156, 250]),
+        9: dict(link=("R_Shoulder", "R_Elbow"), id=9, color=[6, 156, 250]),
+        10: dict(link=("R_Elbow", "R_F_Paw"), id=10, color=[6, 156, 250]),
+        11: dict(link=("Root of tail", "L_Hip"), id=11, color=[0, 255, 255]),
+        12: dict(link=("L_Hip", "L_Knee"), id=12, color=[0, 255, 255]),
+        13: dict(link=("L_Knee", "L_B_Paw"), id=13, color=[0, 255, 255]),
+        14: dict(link=("Root of tail", "R_Hip"), id=14, color=[6, 156, 250]),
+        15: dict(link=("R_Hip", "R_Knee"), id=15, color=[6, 156, 250]),
+        16: dict(link=("R_Knee", "R_B_Paw"), id=16, color=[6, 156, 250]),
     },
-    joint_weights=[
-        1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5,
-        1.5
-    ],
-    sigmas=[
-        0.025, 0.025, 0.026, 0.035, 0.035, 0.079, 0.072, 0.062, 0.079, 0.072,
-        0.062, 0.107, 0.087, 0.089, 0.107, 0.087, 0.089
-    ])
+    joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5],
+    sigmas=[0.025, 0.025, 0.026, 0.035, 0.035, 0.079, 0.072, 0.062, 0.079, 0.072, 0.062, 0.107, 0.087, 0.089, 0.107, 0.087, 0.089],
+)
diff --git a/mmpose/configs/_base_/datasets/atrw.py b/mmpose/configs/_base_/datasets/atrw.py
index 7ec71c8c508a0340139371a651ca2dd56eeae3cf..5fcf1a51fc82382d4c84f36920507ceb39cea4a5 100644
--- a/mmpose/configs/_base_/datasets/atrw.py
+++ b/mmpose/configs/_base_/datasets/atrw.py
@@ -1,144 +1,45 @@
 dataset_info = dict(
-    dataset_name='atrw',
+    dataset_name="atrw",
     paper_info=dict(
-        author='Li, Shuyuan and Li, Jianguo and Tang, Hanlin '
-        'and Qian, Rui and Lin, Weiyao',
-        title='ATRW: A Benchmark for Amur Tiger '
-        'Re-identification in the Wild',
-        container='Proceedings of the 28th ACM '
-        'International Conference on Multimedia',
-        year='2020',
-        homepage='https://cvwc2019.github.io/challenge.html',
+        author="Li, Shuyuan and Li, Jianguo and Tang, Hanlin " "and Qian, Rui and Lin, Weiyao",
+        title="ATRW: A Benchmark for Amur Tiger " "Re-identification in the Wild",
+        container="Proceedings of the 28th ACM " "International Conference on Multimedia",
+        year="2020",
+        homepage="https://cvwc2019.github.io/challenge.html",
     ),
     keypoint_info={
-        0:
-        dict(
-            name='left_ear',
-            id=0,
-            color=[51, 153, 255],
-            type='upper',
-            swap='right_ear'),
-        1:
-        dict(
-            name='right_ear',
-            id=1,
-            color=[51, 153, 255],
-            type='upper',
-            swap='left_ear'),
-        2:
-        dict(name='nose', id=2, color=[51, 153, 255], type='upper', swap=''),
-        3:
-        dict(
-            name='right_shoulder',
-            id=3,
-            color=[255, 128, 0],
-            type='upper',
-            swap='left_shoulder'),
-        4:
-        dict(
-            name='right_front_paw',
-            id=4,
-            color=[255, 128, 0],
-            type='upper',
-            swap='left_front_paw'),
-        5:
-        dict(
-            name='left_shoulder',
-            id=5,
-            color=[0, 255, 0],
-            type='upper',
-            swap='right_shoulder'),
-        6:
-        dict(
-            name='left_front_paw',
-            id=6,
-            color=[0, 255, 0],
-            type='upper',
-            swap='right_front_paw'),
-        7:
-        dict(
-            name='right_hip',
-            id=7,
-            color=[255, 128, 0],
-            type='lower',
-            swap='left_hip'),
-        8:
-        dict(
-            name='right_knee',
-            id=8,
-            color=[255, 128, 0],
-            type='lower',
-            swap='left_knee'),
-        9:
-        dict(
-            name='right_back_paw',
-            id=9,
-            color=[255, 128, 0],
-            type='lower',
-            swap='left_back_paw'),
-        10:
-        dict(
-            name='left_hip',
-            id=10,
-            color=[0, 255, 0],
-            type='lower',
-
swap='right_hip'), - 11: - dict( - name='left_knee', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 12: - dict( - name='left_back_paw', - id=12, - color=[0, 255, 0], - type='lower', - swap='right_back_paw'), - 13: - dict(name='tail', id=13, color=[51, 153, 255], type='lower', swap=''), - 14: - dict( - name='center', id=14, color=[51, 153, 255], type='lower', swap=''), + 0: dict(name="left_ear", id=0, color=[51, 153, 255], type="upper", swap="right_ear"), + 1: dict(name="right_ear", id=1, color=[51, 153, 255], type="upper", swap="left_ear"), + 2: dict(name="nose", id=2, color=[51, 153, 255], type="upper", swap=""), + 3: dict(name="right_shoulder", id=3, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 4: dict(name="right_front_paw", id=4, color=[255, 128, 0], type="upper", swap="left_front_paw"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="left_front_paw", id=6, color=[0, 255, 0], type="upper", swap="right_front_paw"), + 7: dict(name="right_hip", id=7, color=[255, 128, 0], type="lower", swap="left_hip"), + 8: dict(name="right_knee", id=8, color=[255, 128, 0], type="lower", swap="left_knee"), + 9: dict(name="right_back_paw", id=9, color=[255, 128, 0], type="lower", swap="left_back_paw"), + 10: dict(name="left_hip", id=10, color=[0, 255, 0], type="lower", swap="right_hip"), + 11: dict(name="left_knee", id=11, color=[0, 255, 0], type="lower", swap="right_knee"), + 12: dict(name="left_back_paw", id=12, color=[0, 255, 0], type="lower", swap="right_back_paw"), + 13: dict(name="tail", id=13, color=[51, 153, 255], type="lower", swap=""), + 14: dict(name="center", id=14, color=[51, 153, 255], type="lower", swap=""), }, skeleton_info={ - 0: - dict(link=('left_ear', 'nose'), id=0, color=[51, 153, 255]), - 1: - dict(link=('right_ear', 'nose'), id=1, color=[51, 153, 255]), - 2: - dict(link=('nose', 'center'), id=2, color=[51, 153, 255]), - 3: - dict( - link=('left_shoulder', 'left_front_paw'), id=3, color=[0, 255, 0]), - 4: - dict(link=('left_shoulder', 'center'), id=4, color=[0, 255, 0]), - 5: - dict( - link=('right_shoulder', 'right_front_paw'), - id=5, - color=[255, 128, 0]), - 6: - dict(link=('right_shoulder', 'center'), id=6, color=[255, 128, 0]), - 7: - dict(link=('tail', 'center'), id=7, color=[51, 153, 255]), - 8: - dict(link=('right_back_paw', 'right_knee'), id=8, color=[255, 128, 0]), - 9: - dict(link=('right_knee', 'right_hip'), id=9, color=[255, 128, 0]), - 10: - dict(link=('right_hip', 'tail'), id=10, color=[255, 128, 0]), - 11: - dict(link=('left_back_paw', 'left_knee'), id=11, color=[0, 255, 0]), - 12: - dict(link=('left_knee', 'left_hip'), id=12, color=[0, 255, 0]), - 13: - dict(link=('left_hip', 'tail'), id=13, color=[0, 255, 0]), + 0: dict(link=("left_ear", "nose"), id=0, color=[51, 153, 255]), + 1: dict(link=("right_ear", "nose"), id=1, color=[51, 153, 255]), + 2: dict(link=("nose", "center"), id=2, color=[51, 153, 255]), + 3: dict(link=("left_shoulder", "left_front_paw"), id=3, color=[0, 255, 0]), + 4: dict(link=("left_shoulder", "center"), id=4, color=[0, 255, 0]), + 5: dict(link=("right_shoulder", "right_front_paw"), id=5, color=[255, 128, 0]), + 6: dict(link=("right_shoulder", "center"), id=6, color=[255, 128, 0]), + 7: dict(link=("tail", "center"), id=7, color=[51, 153, 255]), + 8: dict(link=("right_back_paw", "right_knee"), id=8, color=[255, 128, 0]), + 9: dict(link=("right_knee", "right_hip"), id=9, color=[255, 128, 0]), + 10: dict(link=("right_hip", "tail"), id=10, color=[255, 
128, 0]), + 11: dict(link=("left_back_paw", "left_knee"), id=11, color=[0, 255, 0]), + 12: dict(link=("left_knee", "left_hip"), id=12, color=[0, 255, 0]), + 13: dict(link=("left_hip", "tail"), id=13, color=[0, 255, 0]), }, - joint_weights=[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], - sigmas=[ - 0.0277, 0.0823, 0.0831, 0.0202, 0.0716, 0.0263, 0.0646, 0.0302, 0.0440, - 0.0316, 0.0333, 0.0547, 0.0263, 0.0683, 0.0539 - ]) + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], + sigmas=[0.0277, 0.0823, 0.0831, 0.0202, 0.0716, 0.0263, 0.0646, 0.0302, 0.0440, 0.0316, 0.0333, 0.0547, 0.0263, 0.0683, 0.0539], +) diff --git a/mmpose/configs/_base_/datasets/campus.py b/mmpose/configs/_base_/datasets/campus.py index 334316e9c25282508767158d3fae30578ab3949d..a1a54f7fce22d924d28da56686f6c6278b162e28 100644 --- a/mmpose/configs/_base_/datasets/campus.py +++ b/mmpose/configs/_base_/datasets/campus.py @@ -1,151 +1,45 @@ dataset_info = dict( - dataset_name='campus', + dataset_name="campus", paper_info=dict( - author='Belagiannis, Vasileios and Amin, Sikandar and Andriluka, ' - 'Mykhaylo and Schiele, Bernt and Navab, Nassir and Ilic, Slobodan', - title='3D Pictorial Structures for Multiple Human Pose Estimation', - container='IEEE Computer Society Conference on Computer Vision and ' - 'Pattern Recognition (CVPR)', - year='2014', - homepage='http://campar.in.tum.de/Chair/MultiHumanPose', + author="Belagiannis, Vasileios and Amin, Sikandar and Andriluka, " + "Mykhaylo and Schiele, Bernt and Navab, Nassir and Ilic, Slobodan", + title="3D Pictorial Structures for Multiple Human Pose Estimation", + container="IEEE Computer Society Conference on Computer Vision and " "Pattern Recognition (CVPR)", + year="2014", + homepage="http://campar.in.tum.de/Chair/MultiHumanPose", ), keypoint_info={ - 0: - dict( - name='right_ankle', - id=0, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 1: - dict( - name='right_knee', - id=1, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 2: - dict( - name='right_hip', - id=2, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 3: - dict( - name='left_hip', - id=3, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 4: - dict( - name='left_knee', - id=4, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 5: - dict( - name='left_ankle', - id=5, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 6: - dict( - name='right_wrist', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 7: - dict( - name='right_elbow', - id=7, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 8: - dict( - name='right_shoulder', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 9: - dict( - name='left_shoulder', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 10: - dict( - name='left_elbow', - id=10, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 11: - dict( - name='left_wrist', - id=11, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 12: - dict( - name='bottom_head', - id=12, - color=[51, 153, 255], - type='upper', - swap=''), - 13: - dict( - name='top_head', - id=13, - color=[51, 153, 255], - type='upper', - swap=''), + 0: dict(name="right_ankle", id=0, color=[255, 128, 0], type="lower", swap="left_ankle"), + 1: dict(name="right_knee", id=1, color=[255, 128, 0], type="lower", swap="left_knee"), + 2: dict(name="right_hip", id=2, color=[255, 128, 0], type="lower", swap="left_hip"), + 3: 
dict(name="left_hip", id=3, color=[0, 255, 0], type="lower", swap="right_hip"), + 4: dict(name="left_knee", id=4, color=[0, 255, 0], type="lower", swap="right_knee"), + 5: dict(name="left_ankle", id=5, color=[0, 255, 0], type="lower", swap="right_ankle"), + 6: dict(name="right_wrist", id=6, color=[255, 128, 0], type="upper", swap="left_wrist"), + 7: dict(name="right_elbow", id=7, color=[255, 128, 0], type="upper", swap="left_elbow"), + 8: dict(name="right_shoulder", id=8, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 9: dict(name="left_shoulder", id=9, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 10: dict(name="left_elbow", id=10, color=[0, 255, 0], type="upper", swap="right_elbow"), + 11: dict(name="left_wrist", id=11, color=[0, 255, 0], type="upper", swap="right_wrist"), + 12: dict(name="bottom_head", id=12, color=[51, 153, 255], type="upper", swap=""), + 13: dict(name="top_head", id=13, color=[51, 153, 255], type="upper", swap=""), }, skeleton_info={ - 0: - dict(link=('right_ankle', 'right_knee'), id=0, color=[255, 128, 0]), - 1: - dict(link=('right_knee', 'right_hip'), id=1, color=[255, 128, 0]), - 2: - dict(link=('left_hip', 'left_knee'), id=2, color=[0, 255, 0]), - 3: - dict(link=('left_knee', 'left_ankle'), id=3, color=[0, 255, 0]), - 4: - dict(link=('right_hip', 'left_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('right_wrist', 'right_elbow'), id=5, color=[255, 128, 0]), - 6: - dict( - link=('right_elbow', 'right_shoulder'), id=6, color=[255, 128, 0]), - 7: - dict(link=('left_shoulder', 'left_elbow'), id=7, color=[0, 255, 0]), - 8: - dict(link=('left_elbow', 'left_wrist'), id=8, color=[0, 255, 0]), - 9: - dict(link=('right_hip', 'right_shoulder'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_hip', 'left_shoulder'), id=10, color=[0, 255, 0]), - 11: - dict( - link=('right_shoulder', 'bottom_head'), id=11, color=[255, 128, - 0]), - 12: - dict(link=('left_shoulder', 'bottom_head'), id=12, color=[0, 255, 0]), - 13: - dict(link=('bottom_head', 'top_head'), id=13, color=[51, 153, 255]), + 0: dict(link=("right_ankle", "right_knee"), id=0, color=[255, 128, 0]), + 1: dict(link=("right_knee", "right_hip"), id=1, color=[255, 128, 0]), + 2: dict(link=("left_hip", "left_knee"), id=2, color=[0, 255, 0]), + 3: dict(link=("left_knee", "left_ankle"), id=3, color=[0, 255, 0]), + 4: dict(link=("right_hip", "left_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("right_wrist", "right_elbow"), id=5, color=[255, 128, 0]), + 6: dict(link=("right_elbow", "right_shoulder"), id=6, color=[255, 128, 0]), + 7: dict(link=("left_shoulder", "left_elbow"), id=7, color=[0, 255, 0]), + 8: dict(link=("left_elbow", "left_wrist"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_hip", "right_shoulder"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_hip", "left_shoulder"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_shoulder", "bottom_head"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_shoulder", "bottom_head"), id=12, color=[0, 255, 0]), + 13: dict(link=("bottom_head", "top_head"), id=13, color=[51, 153, 255]), }, - joint_weights=[ - 1.5, 1.2, 1.0, 1.0, 1.2, 1.5, 1.5, 1.2, 1.0, 1.0, 1.2, 1.5, 1.0, 1.0 - ], - sigmas=[ - 0.089, 0.087, 0.107, 0.107, 0.087, 0.089, 0.062, 0.072, 0.079, 0.079, - 0.072, 0.062, 0.026, 0.026 - ]) + joint_weights=[1.5, 1.2, 1.0, 1.0, 1.2, 1.5, 1.5, 1.2, 1.0, 1.0, 1.2, 1.5, 1.0, 1.0], + sigmas=[0.089, 0.087, 0.107, 0.107, 0.087, 0.089, 0.062, 0.072, 0.079, 0.079, 0.072, 0.062, 0.026, 0.026], +) diff --git 
a/mmpose/configs/_base_/datasets/coco.py b/mmpose/configs/_base_/datasets/coco.py index 865a95bc02fedd318f32d2e7aa8397147d78fdb5..3ca47f6dbf1f19d1ecac0d31ea1575f445bc72a3 100644 --- a/mmpose/configs/_base_/datasets/coco.py +++ b/mmpose/configs/_base_/datasets/coco.py @@ -1,181 +1,55 @@ dataset_info = dict( - dataset_name='coco', + dataset_name="coco", paper_info=dict( - author='Lin, Tsung-Yi and Maire, Michael and ' - 'Belongie, Serge and Hays, James and ' - 'Perona, Pietro and Ramanan, Deva and ' - r'Doll{\'a}r, Piotr and Zitnick, C Lawrence', - title='Microsoft coco: Common objects in context', - container='European conference on computer vision', - year='2014', - homepage='http://cocodataset.org/', + author="Lin, Tsung-Yi and Maire, Michael and " + "Belongie, Serge and Hays, James and " + "Perona, Pietro and Ramanan, Deva and " + r"Doll{\'a}r, Piotr and Zitnick, C Lawrence", + title="Microsoft coco: Common objects in context", + container="European conference on computer vision", + year="2014", + homepage="http://cocodataset.org/", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, 
color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'right_shoulder'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]) + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, 
color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5 - ], - sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089 - ]) + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5], + sigmas=[0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089], +) diff --git a/mmpose/configs/_base_/datasets/coco_aic.py b/mmpose/configs/_base_/datasets/coco_aic.py index a084247468dac1b766cbcf756b750aa3d3680b9d..18976a1c1ac9b1814ee54196e01012ad330016f4 100644 --- a/mmpose/configs/_base_/datasets/coco_aic.py +++ b/mmpose/configs/_base_/datasets/coco_aic.py @@ -1,205 +1,90 @@ dataset_info = dict( - dataset_name='coco', + dataset_name="coco", paper_info=[ dict( - author='Lin, Tsung-Yi and Maire, Michael and ' - 'Belongie, Serge and Hays, James and ' - 'Perona, Pietro and Ramanan, Deva and ' - r'Doll{\'a}r, Piotr and Zitnick, C Lawrence', - title='Microsoft coco: Common objects in context', - container='European conference on computer vision', - year='2014', - homepage='http://cocodataset.org/', + author="Lin, Tsung-Yi and Maire, Michael and " + "Belongie, Serge and Hays, James and " + "Perona, Pietro and Ramanan, Deva and " + r"Doll{\'a}r, Piotr and Zitnick, C Lawrence", + title="Microsoft coco: Common objects in context", + container="European conference on computer vision", + year="2014", + homepage="http://cocodataset.org/", ), dict( - author='Wu, Jiahong and Zheng, He and Zhao, Bo and ' - 'Li, Yixin and Yan, Baoming and Liang, Rui and ' - 'Wang, Wenjia and Zhou, Shipei and Lin, Guosen and ' - 'Fu, Yanwei and others', - title='Ai challenger: A large-scale dataset for going ' - 'deeper in image understanding', - container='arXiv', - year='2017', - homepage='https://github.com/AIChallenger/AI_Challenger_2017', + author="Wu, Jiahong and Zheng, He and Zhao, Bo and " + "Li, Yixin and Yan, Baoming and Liang, Rui and " + "Wang, Wenjia and Zhou, Shipei and Lin, Guosen and " + "Fu, Yanwei and others", + title="Ai challenger: A large-scale dataset for going " "deeper in image understanding", + container="arXiv", + year="2017", + homepage="https://github.com/AIChallenger/AI_Challenger_2017", ), ], keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - 
swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 17: - dict( - name='head_top', - id=17, - color=[51, 153, 255], - type='upper', - swap=''), - 18: - dict(name='neck', id=18, color=[51, 153, 255], type='upper', swap='') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="head_top", id=17, color=[51, 153, 255], type="upper", swap=""), + 18: dict(name="neck", id=18, color=[51, 153, 255], type="upper", swap=""), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'right_shoulder'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, 
color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]), - 19: - dict(link=('head_top', 'neck'), id=11, color=[51, 153, 255]), + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), + 19: dict(link=("head_top", "neck"), id=19, color=[51, 153, 255]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5, 1.5 - ], + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.5], sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.026, 0.026 - ]) + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.026, + 0.026, + ], +) diff --git a/mmpose/configs/_base_/datasets/coco_crop.py b/mmpose/configs/_base_/datasets/coco_crop.py index 8c465b2b8033073c6f1deed93830554262afba26..ac61fc7f3a017c898ff4543cafdbb3ddc8aaeb23 100644 --- a/mmpose/configs/_base_/datasets/coco_crop.py +++ b/mmpose/configs/_base_/datasets/coco_crop.py @@ -1,181 +1,55 @@ dataset_info = dict( - dataset_name='coco_crop', + dataset_name="coco_crop", paper_info=dict( - author='Lin, Tsung-Yi and Maire, Michael and ' - 'Belongie, Serge and Hays, James and ' - 'Perona, Pietro and Ramanan, Deva and ' - r'Doll{\'a}r, Piotr and Zitnick, C Lawrence', - title='Microsoft coco: Common objects in context', - container='European conference on computer vision', - year='2014', - homepage='http://cocodataset.org/', + author="Lin, Tsung-Yi and Maire, Michael and " + "Belongie, Serge and Hays, James and " + "Perona, Pietro and Ramanan, Deva and " + r"Doll{\'a}r, Piotr and Zitnick, C Lawrence", + title="Microsoft coco: Common objects in context", + container="European conference on computer vision", + year="2014", +
homepage="http://cocodataset.org/", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - 
dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'right_shoulder'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]) + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5 - ], - sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089 - ]) + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5], + sigmas=[0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089], +) diff --git a/mmpose/configs/_base_/datasets/coco_openpose.py b/mmpose/configs/_base_/datasets/coco_openpose.py index cce11b27f16b480facf8717055500d3e60c6ec4f..c856ce943b8dcbc3eda7778b88db40409fe5fa9d 100644 --- a/mmpose/configs/_base_/datasets/coco_openpose.py +++ b/mmpose/configs/_base_/datasets/coco_openpose.py @@ -1,157 +1,51 @@ dataset_info = dict( - dataset_name='coco_openpose', + dataset_name="coco_openpose", paper_info=dict( - 
author='Zhe, Cao and Tomas, Simon and ' - 'Shih-En, Wei and Yaser, Sheikh', - title='OpenPose: Realtime Multi-Person 2D Pose ' - 'Estimation using Part Affinity Fields', - container='IEEE Transactions on Pattern Analysis ' - 'and Machine Intelligence', - year='2019', - homepage='https://github.com/CMU-Perceptual-Computing-Lab/openpose/', + author="Zhe, Cao and Tomas, Simon and " "Shih-En, Wei and Yaser, Sheikh", + title="OpenPose: Realtime Multi-Person 2D Pose " "Estimation using Part Affinity Fields", + container="IEEE Transactions on Pattern Analysis " "and Machine Intelligence", + year="2019", + homepage="https://github.com/CMU-Perceptual-Computing-Lab/openpose/", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[255, 0, 0], type='upper', swap=''), - 1: - dict(name='neck', id=1, color=[255, 85, 0], type='upper', swap=''), - 2: - dict( - name='right_shoulder', - id=2, - color=[255, 170, 0], - type='upper', - swap='left_shoulder'), - 3: - dict( - name='right_elbow', - id=3, - color=[255, 255, 0], - type='upper', - swap='left_elbow'), - 4: - dict( - name='right_wrist', - id=4, - color=[170, 255, 0], - type='upper', - swap='left_wrist'), - 5: - dict( - name='left_shoulder', - id=5, - color=[85, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='left_elbow', - id=6, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 7: - dict( - name='left_wrist', - id=7, - color=[0, 255, 85], - type='upper', - swap='right_wrist'), - 8: - dict( - name='right_hip', - id=8, - color=[0, 255, 170], - type='lower', - swap='left_hip'), - 9: - dict( - name='right_knee', - id=9, - color=[0, 255, 255], - type='lower', - swap='left_knee'), - 10: - dict( - name='right_ankle', - id=10, - color=[0, 170, 255], - type='lower', - swap='left_ankle'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 85, 255], - type='lower', - swap='right_hip'), - 12: - dict( - name='left_knee', - id=12, - color=[0, 0, 255], - type='lower', - swap='right_knee'), - 13: - dict( - name='left_ankle', - id=13, - color=[85, 0, 255], - type='lower', - swap='right_ankle'), - 14: - dict( - name='right_eye', - id=14, - color=[170, 0, 255], - type='upper', - swap='left_eye'), - 15: - dict( - name='left_eye', - id=15, - color=[255, 0, 255], - type='upper', - swap='right_eye'), - 16: - dict( - name='right_ear', - id=16, - color=[255, 0, 170], - type='upper', - swap='left_ear'), - 17: - dict( - name='left_ear', - id=17, - color=[255, 0, 85], - type='upper', - swap='right_ear'), + 0: dict(name="nose", id=0, color=[255, 0, 0], type="upper", swap=""), + 1: dict(name="neck", id=1, color=[255, 85, 0], type="upper", swap=""), + 2: dict(name="right_shoulder", id=2, color=[255, 170, 0], type="upper", swap="left_shoulder"), + 3: dict(name="right_elbow", id=3, color=[255, 255, 0], type="upper", swap="left_elbow"), + 4: dict(name="right_wrist", id=4, color=[170, 255, 0], type="upper", swap="left_wrist"), + 5: dict(name="left_shoulder", id=5, color=[85, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="left_elbow", id=6, color=[0, 255, 0], type="upper", swap="right_elbow"), + 7: dict(name="left_wrist", id=7, color=[0, 255, 85], type="upper", swap="right_wrist"), + 8: dict(name="right_hip", id=8, color=[0, 255, 170], type="lower", swap="left_hip"), + 9: dict(name="right_knee", id=9, color=[0, 255, 255], type="lower", swap="left_knee"), + 10: dict(name="right_ankle", id=10, color=[0, 170, 255], type="lower", swap="left_ankle"), + 11: dict(name="left_hip", id=11, color=[0, 85, 255], type="lower", swap="right_hip"), 
+ 12: dict(name="left_knee", id=12, color=[0, 0, 255], type="lower", swap="right_knee"), + 13: dict(name="left_ankle", id=13, color=[85, 0, 255], type="lower", swap="right_ankle"), + 14: dict(name="right_eye", id=14, color=[170, 0, 255], type="upper", swap="left_eye"), + 15: dict(name="left_eye", id=15, color=[255, 0, 255], type="upper", swap="right_eye"), + 16: dict(name="right_ear", id=16, color=[255, 0, 170], type="upper", swap="left_ear"), + 17: dict(name="left_ear", id=17, color=[255, 0, 85], type="upper", swap="right_ear"), }, skeleton_info={ - 0: dict(link=('neck', 'right_shoulder'), id=0, color=[255, 0, 0]), - 1: dict(link=('neck', 'left_shoulder'), id=1, color=[255, 85, 0]), - 2: dict( - link=('right_shoulder', 'right_elbow'), id=2, color=[255, 170, 0]), - 3: - dict(link=('right_elbow', 'right_wrist'), id=3, color=[255, 255, 0]), - 4: - dict(link=('left_shoulder', 'left_elbow'), id=4, color=[170, 255, 0]), - 5: dict(link=('left_elbow', 'left_wrist'), id=5, color=[85, 255, 0]), - 6: dict(link=('neck', 'right_hip'), id=6, color=[0, 255, 0]), - 7: dict(link=('right_hip', 'right_knee'), id=7, color=[0, 255, 85]), - 8: dict(link=('right_knee', 'right_ankle'), id=8, color=[0, 255, 170]), - 9: dict(link=('neck', 'left_hip'), id=9, color=[0, 255, 225]), - 10: dict(link=('left_hip', 'left_knee'), id=10, color=[0, 170, 255]), - 11: dict(link=('left_knee', 'left_ankle'), id=11, color=[0, 85, 255]), - 12: dict(link=('neck', 'nose'), id=12, color=[0, 0, 255]), - 13: dict(link=('nose', 'right_eye'), id=13, color=[255, 0, 170]), - 14: dict(link=('right_eye', 'right_ear'), id=14, color=[170, 0, 255]), - 15: dict(link=('nose', 'left_eye'), id=15, color=[255, 0, 255]), - 16: dict(link=('left_eye', 'left_ear'), id=16, color=[255, 0, 170]), + 0: dict(link=("neck", "right_shoulder"), id=0, color=[255, 0, 0]), + 1: dict(link=("neck", "left_shoulder"), id=1, color=[255, 85, 0]), + 2: dict(link=("right_shoulder", "right_elbow"), id=2, color=[255, 170, 0]), + 3: dict(link=("right_elbow", "right_wrist"), id=3, color=[255, 255, 0]), + 4: dict(link=("left_shoulder", "left_elbow"), id=4, color=[170, 255, 0]), + 5: dict(link=("left_elbow", "left_wrist"), id=5, color=[85, 255, 0]), + 6: dict(link=("neck", "right_hip"), id=6, color=[0, 255, 0]), + 7: dict(link=("right_hip", "right_knee"), id=7, color=[0, 255, 85]), + 8: dict(link=("right_knee", "right_ankle"), id=8, color=[0, 255, 170]), + 9: dict(link=("neck", "left_hip"), id=9, color=[0, 255, 225]), + 10: dict(link=("left_hip", "left_knee"), id=10, color=[0, 170, 255]), + 11: dict(link=("left_knee", "left_ankle"), id=11, color=[0, 85, 255]), + 12: dict(link=("neck", "nose"), id=12, color=[0, 0, 255]), + 13: dict(link=("nose", "right_eye"), id=13, color=[255, 0, 170]), + 14: dict(link=("right_eye", "right_ear"), id=14, color=[170, 0, 255]), + 15: dict(link=("nose", "left_eye"), id=15, color=[255, 0, 255]), + 16: dict(link=("left_eye", "left_ear"), id=16, color=[255, 0, 170]), }, - joint_weights=[1.] 
* 18, - sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.082 - ]) + joint_weights=[1.0] * 18, + sigmas=[0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.082], +) diff --git a/mmpose/configs/_base_/datasets/coco_wholebody.py b/mmpose/configs/_base_/datasets/coco_wholebody.py index ef9b707017a24a1a133bb28566d212c618fee694..e94f178f2298be7694ff54b4134f9eb372fbfda9 100644 --- a/mmpose/configs/_base_/datasets/coco_wholebody.py +++ b/mmpose/configs/_base_/datasets/coco_wholebody.py @@ -1,1154 +1,350 @@ dataset_info = dict( - dataset_name='coco_wholebody', + dataset_name="coco_wholebody", paper_info=dict( - author='Jin, Sheng and Xu, Lumin and Xu, Jin and ' - 'Wang, Can and Liu, Wentao and ' - 'Qian, Chen and Ouyang, Wanli and Luo, Ping', - title='Whole-Body Human Pose Estimation in the Wild', - container='Proceedings of the European ' - 'Conference on Computer Vision (ECCV)', - year='2020', - homepage='https://github.com/jin-s13/COCO-WholeBody/', + author="Jin, Sheng and Xu, Lumin and Xu, Jin and " "Wang, Can and Liu, Wentao and " "Qian, Chen and Ouyang, Wanli and Luo, Ping", + title="Whole-Body Human Pose Estimation in the Wild", + container="Proceedings of the European " "Conference on Computer Vision (ECCV)", + year="2020", + homepage="https://github.com/jin-s13/COCO-WholeBody/", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 17: - dict( - name='left_big_toe', - id=17, - color=[255, 128, 0], - type='lower', - swap='right_big_toe'), - 18: - dict( - name='left_small_toe', - id=18, - color=[255, 128, 0], - type='lower', - swap='right_small_toe'), - 19: - dict( - name='left_heel', - id=19, - color=[255, 128, 0], - type='lower', - swap='right_heel'), - 20: - dict( - 
name='right_big_toe', - id=20, - color=[255, 128, 0], - type='lower', - swap='left_big_toe'), - 21: - dict( - name='right_small_toe', - id=21, - color=[255, 128, 0], - type='lower', - swap='left_small_toe'), - 22: - dict( - name='right_heel', - id=22, - color=[255, 128, 0], - type='lower', - swap='left_heel'), - 23: - dict( - name='face-0', - id=23, - color=[255, 255, 255], - type='', - swap='face-16'), - 24: - dict( - name='face-1', - id=24, - color=[255, 255, 255], - type='', - swap='face-15'), - 25: - dict( - name='face-2', - id=25, - color=[255, 255, 255], - type='', - swap='face-14'), - 26: - dict( - name='face-3', - id=26, - color=[255, 255, 255], - type='', - swap='face-13'), - 27: - dict( - name='face-4', - id=27, - color=[255, 255, 255], - type='', - swap='face-12'), - 28: - dict( - name='face-5', - id=28, - color=[255, 255, 255], - type='', - swap='face-11'), - 29: - dict( - name='face-6', - id=29, - color=[255, 255, 255], - type='', - swap='face-10'), - 30: - dict( - name='face-7', - id=30, - color=[255, 255, 255], - type='', - swap='face-9'), - 31: - dict(name='face-8', id=31, color=[255, 255, 255], type='', swap=''), - 32: - dict( - name='face-9', - id=32, - color=[255, 255, 255], - type='', - swap='face-7'), - 33: - dict( - name='face-10', - id=33, - color=[255, 255, 255], - type='', - swap='face-6'), - 34: - dict( - name='face-11', - id=34, - color=[255, 255, 255], - type='', - swap='face-5'), - 35: - dict( - name='face-12', - id=35, - color=[255, 255, 255], - type='', - swap='face-4'), - 36: - dict( - name='face-13', - id=36, - color=[255, 255, 255], - type='', - swap='face-3'), - 37: - dict( - name='face-14', - id=37, - color=[255, 255, 255], - type='', - swap='face-2'), - 38: - dict( - name='face-15', - id=38, - color=[255, 255, 255], - type='', - swap='face-1'), - 39: - dict( - name='face-16', - id=39, - color=[255, 255, 255], - type='', - swap='face-0'), - 40: - dict( - name='face-17', - id=40, - color=[255, 255, 255], - type='', - swap='face-26'), - 41: - dict( - name='face-18', - id=41, - color=[255, 255, 255], - type='', - swap='face-25'), - 42: - dict( - name='face-19', - id=42, - color=[255, 255, 255], - type='', - swap='face-24'), - 43: - dict( - name='face-20', - id=43, - color=[255, 255, 255], - type='', - swap='face-23'), - 44: - dict( - name='face-21', - id=44, - color=[255, 255, 255], - type='', - swap='face-22'), - 45: - dict( - name='face-22', - id=45, - color=[255, 255, 255], - type='', - swap='face-21'), - 46: - dict( - name='face-23', - id=46, - color=[255, 255, 255], - type='', - swap='face-20'), - 47: - dict( - name='face-24', - id=47, - color=[255, 255, 255], - type='', - swap='face-19'), - 48: - dict( - name='face-25', - id=48, - color=[255, 255, 255], - type='', - swap='face-18'), - 49: - dict( - name='face-26', - id=49, - color=[255, 255, 255], - type='', - swap='face-17'), - 50: - dict(name='face-27', id=50, color=[255, 255, 255], type='', swap=''), - 51: - dict(name='face-28', id=51, color=[255, 255, 255], type='', swap=''), - 52: - dict(name='face-29', id=52, color=[255, 255, 255], type='', swap=''), - 53: - dict(name='face-30', id=53, color=[255, 255, 255], type='', swap=''), - 54: - dict( - name='face-31', - id=54, - color=[255, 255, 255], - type='', - swap='face-35'), - 55: - dict( - name='face-32', - id=55, - color=[255, 255, 255], - type='', - swap='face-34'), - 56: - dict(name='face-33', id=56, color=[255, 255, 255], type='', swap=''), - 57: - dict( - name='face-34', - id=57, - color=[255, 255, 255], - type='', - swap='face-32'), - 58: - 
dict( - name='face-35', - id=58, - color=[255, 255, 255], - type='', - swap='face-31'), - 59: - dict( - name='face-36', - id=59, - color=[255, 255, 255], - type='', - swap='face-45'), - 60: - dict( - name='face-37', - id=60, - color=[255, 255, 255], - type='', - swap='face-44'), - 61: - dict( - name='face-38', - id=61, - color=[255, 255, 255], - type='', - swap='face-43'), - 62: - dict( - name='face-39', - id=62, - color=[255, 255, 255], - type='', - swap='face-42'), - 63: - dict( - name='face-40', - id=63, - color=[255, 255, 255], - type='', - swap='face-47'), - 64: - dict( - name='face-41', - id=64, - color=[255, 255, 255], - type='', - swap='face-46'), - 65: - dict( - name='face-42', - id=65, - color=[255, 255, 255], - type='', - swap='face-39'), - 66: - dict( - name='face-43', - id=66, - color=[255, 255, 255], - type='', - swap='face-38'), - 67: - dict( - name='face-44', - id=67, - color=[255, 255, 255], - type='', - swap='face-37'), - 68: - dict( - name='face-45', - id=68, - color=[255, 255, 255], - type='', - swap='face-36'), - 69: - dict( - name='face-46', - id=69, - color=[255, 255, 255], - type='', - swap='face-41'), - 70: - dict( - name='face-47', - id=70, - color=[255, 255, 255], - type='', - swap='face-40'), - 71: - dict( - name='face-48', - id=71, - color=[255, 255, 255], - type='', - swap='face-54'), - 72: - dict( - name='face-49', - id=72, - color=[255, 255, 255], - type='', - swap='face-53'), - 73: - dict( - name='face-50', - id=73, - color=[255, 255, 255], - type='', - swap='face-52'), - 74: - dict(name='face-51', id=74, color=[255, 255, 255], type='', swap=''), - 75: - dict( - name='face-52', - id=75, - color=[255, 255, 255], - type='', - swap='face-50'), - 76: - dict( - name='face-53', - id=76, - color=[255, 255, 255], - type='', - swap='face-49'), - 77: - dict( - name='face-54', - id=77, - color=[255, 255, 255], - type='', - swap='face-48'), - 78: - dict( - name='face-55', - id=78, - color=[255, 255, 255], - type='', - swap='face-59'), - 79: - dict( - name='face-56', - id=79, - color=[255, 255, 255], - type='', - swap='face-58'), - 80: - dict(name='face-57', id=80, color=[255, 255, 255], type='', swap=''), - 81: - dict( - name='face-58', - id=81, - color=[255, 255, 255], - type='', - swap='face-56'), - 82: - dict( - name='face-59', - id=82, - color=[255, 255, 255], - type='', - swap='face-55'), - 83: - dict( - name='face-60', - id=83, - color=[255, 255, 255], - type='', - swap='face-64'), - 84: - dict( - name='face-61', - id=84, - color=[255, 255, 255], - type='', - swap='face-63'), - 85: - dict(name='face-62', id=85, color=[255, 255, 255], type='', swap=''), - 86: - dict( - name='face-63', - id=86, - color=[255, 255, 255], - type='', - swap='face-61'), - 87: - dict( - name='face-64', - id=87, - color=[255, 255, 255], - type='', - swap='face-60'), - 88: - dict( - name='face-65', - id=88, - color=[255, 255, 255], - type='', - swap='face-67'), - 89: - dict(name='face-66', id=89, color=[255, 255, 255], type='', swap=''), - 90: - dict( - name='face-67', - id=90, - color=[255, 255, 255], - type='', - swap='face-65'), - 91: - dict( - name='left_hand_root', - id=91, - color=[255, 255, 255], - type='', - swap='right_hand_root'), - 92: - dict( - name='left_thumb1', - id=92, - color=[255, 128, 0], - type='', - swap='right_thumb1'), - 93: - dict( - name='left_thumb2', - id=93, - color=[255, 128, 0], - type='', - swap='right_thumb2'), - 94: - dict( - name='left_thumb3', - id=94, - color=[255, 128, 0], - type='', - swap='right_thumb3'), - 95: - dict( - name='left_thumb4', - id=95, - 
color=[255, 128, 0], - type='', - swap='right_thumb4'), - 96: - dict( - name='left_forefinger1', - id=96, - color=[255, 153, 255], - type='', - swap='right_forefinger1'), - 97: - dict( - name='left_forefinger2', - id=97, - color=[255, 153, 255], - type='', - swap='right_forefinger2'), - 98: - dict( - name='left_forefinger3', - id=98, - color=[255, 153, 255], - type='', - swap='right_forefinger3'), - 99: - dict( - name='left_forefinger4', - id=99, - color=[255, 153, 255], - type='', - swap='right_forefinger4'), - 100: - dict( - name='left_middle_finger1', - id=100, - color=[102, 178, 255], - type='', - swap='right_middle_finger1'), - 101: - dict( - name='left_middle_finger2', - id=101, - color=[102, 178, 255], - type='', - swap='right_middle_finger2'), - 102: - dict( - name='left_middle_finger3', - id=102, - color=[102, 178, 255], - type='', - swap='right_middle_finger3'), - 103: - dict( - name='left_middle_finger4', - id=103, - color=[102, 178, 255], - type='', - swap='right_middle_finger4'), - 104: - dict( - name='left_ring_finger1', - id=104, - color=[255, 51, 51], - type='', - swap='right_ring_finger1'), - 105: - dict( - name='left_ring_finger2', - id=105, - color=[255, 51, 51], - type='', - swap='right_ring_finger2'), - 106: - dict( - name='left_ring_finger3', - id=106, - color=[255, 51, 51], - type='', - swap='right_ring_finger3'), - 107: - dict( - name='left_ring_finger4', - id=107, - color=[255, 51, 51], - type='', - swap='right_ring_finger4'), - 108: - dict( - name='left_pinky_finger1', - id=108, - color=[0, 255, 0], - type='', - swap='right_pinky_finger1'), - 109: - dict( - name='left_pinky_finger2', - id=109, - color=[0, 255, 0], - type='', - swap='right_pinky_finger2'), - 110: - dict( - name='left_pinky_finger3', - id=110, - color=[0, 255, 0], - type='', - swap='right_pinky_finger3'), - 111: - dict( - name='left_pinky_finger4', - id=111, - color=[0, 255, 0], - type='', - swap='right_pinky_finger4'), - 112: - dict( - name='right_hand_root', - id=112, - color=[255, 255, 255], - type='', - swap='left_hand_root'), - 113: - dict( - name='right_thumb1', - id=113, - color=[255, 128, 0], - type='', - swap='left_thumb1'), - 114: - dict( - name='right_thumb2', - id=114, - color=[255, 128, 0], - type='', - swap='left_thumb2'), - 115: - dict( - name='right_thumb3', - id=115, - color=[255, 128, 0], - type='', - swap='left_thumb3'), - 116: - dict( - name='right_thumb4', - id=116, - color=[255, 128, 0], - type='', - swap='left_thumb4'), - 117: - dict( - name='right_forefinger1', - id=117, - color=[255, 153, 255], - type='', - swap='left_forefinger1'), - 118: - dict( - name='right_forefinger2', - id=118, - color=[255, 153, 255], - type='', - swap='left_forefinger2'), - 119: - dict( - name='right_forefinger3', - id=119, - color=[255, 153, 255], - type='', - swap='left_forefinger3'), - 120: - dict( - name='right_forefinger4', - id=120, - color=[255, 153, 255], - type='', - swap='left_forefinger4'), - 121: - dict( - name='right_middle_finger1', - id=121, - color=[102, 178, 255], - type='', - swap='left_middle_finger1'), - 122: - dict( - name='right_middle_finger2', - id=122, - color=[102, 178, 255], - type='', - swap='left_middle_finger2'), - 123: - dict( - name='right_middle_finger3', - id=123, - color=[102, 178, 255], - type='', - swap='left_middle_finger3'), - 124: - dict( - name='right_middle_finger4', - id=124, - color=[102, 178, 255], - type='', - swap='left_middle_finger4'), - 125: - dict( - name='right_ring_finger1', - id=125, - color=[255, 51, 51], - type='', - swap='left_ring_finger1'), 
- 126: - dict( - name='right_ring_finger2', - id=126, - color=[255, 51, 51], - type='', - swap='left_ring_finger2'), - 127: - dict( - name='right_ring_finger3', - id=127, - color=[255, 51, 51], - type='', - swap='left_ring_finger3'), - 128: - dict( - name='right_ring_finger4', - id=128, - color=[255, 51, 51], - type='', - swap='left_ring_finger4'), - 129: - dict( - name='right_pinky_finger1', - id=129, - color=[0, 255, 0], - type='', - swap='left_pinky_finger1'), - 130: - dict( - name='right_pinky_finger2', - id=130, - color=[0, 255, 0], - type='', - swap='left_pinky_finger2'), - 131: - dict( - name='right_pinky_finger3', - id=131, - color=[0, 255, 0], - type='', - swap='left_pinky_finger3'), - 132: - dict( - name='right_pinky_finger4', - id=132, - color=[0, 255, 0], - type='', - swap='left_pinky_finger4') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="left_big_toe", id=17, color=[255, 128, 0], type="lower", swap="right_big_toe"), + 18: dict(name="left_small_toe", id=18, color=[255, 128, 0], type="lower", swap="right_small_toe"), + 19: dict(name="left_heel", id=19, color=[255, 128, 0], type="lower", swap="right_heel"), + 20: dict(name="right_big_toe", id=20, color=[255, 128, 0], type="lower", swap="left_big_toe"), + 21: dict(name="right_small_toe", id=21, color=[255, 128, 0], type="lower", swap="left_small_toe"), + 22: dict(name="right_heel", id=22, color=[255, 128, 0], type="lower", swap="left_heel"), + 23: dict(name="face-0", id=23, color=[255, 255, 255], type="", swap="face-16"), + 24: dict(name="face-1", id=24, color=[255, 255, 255], type="", swap="face-15"), + 25: dict(name="face-2", id=25, color=[255, 255, 255], type="", swap="face-14"), + 26: dict(name="face-3", id=26, color=[255, 255, 255], type="", swap="face-13"), + 27: dict(name="face-4", id=27, color=[255, 255, 255], type="", swap="face-12"), + 28: dict(name="face-5", id=28, color=[255, 255, 255], type="", swap="face-11"), + 29: dict(name="face-6", id=29, color=[255, 255, 255], type="", swap="face-10"), + 30: dict(name="face-7", id=30, color=[255, 255, 255], 
type="", swap="face-9"), + 31: dict(name="face-8", id=31, color=[255, 255, 255], type="", swap=""), + 32: dict(name="face-9", id=32, color=[255, 255, 255], type="", swap="face-7"), + 33: dict(name="face-10", id=33, color=[255, 255, 255], type="", swap="face-6"), + 34: dict(name="face-11", id=34, color=[255, 255, 255], type="", swap="face-5"), + 35: dict(name="face-12", id=35, color=[255, 255, 255], type="", swap="face-4"), + 36: dict(name="face-13", id=36, color=[255, 255, 255], type="", swap="face-3"), + 37: dict(name="face-14", id=37, color=[255, 255, 255], type="", swap="face-2"), + 38: dict(name="face-15", id=38, color=[255, 255, 255], type="", swap="face-1"), + 39: dict(name="face-16", id=39, color=[255, 255, 255], type="", swap="face-0"), + 40: dict(name="face-17", id=40, color=[255, 255, 255], type="", swap="face-26"), + 41: dict(name="face-18", id=41, color=[255, 255, 255], type="", swap="face-25"), + 42: dict(name="face-19", id=42, color=[255, 255, 255], type="", swap="face-24"), + 43: dict(name="face-20", id=43, color=[255, 255, 255], type="", swap="face-23"), + 44: dict(name="face-21", id=44, color=[255, 255, 255], type="", swap="face-22"), + 45: dict(name="face-22", id=45, color=[255, 255, 255], type="", swap="face-21"), + 46: dict(name="face-23", id=46, color=[255, 255, 255], type="", swap="face-20"), + 47: dict(name="face-24", id=47, color=[255, 255, 255], type="", swap="face-19"), + 48: dict(name="face-25", id=48, color=[255, 255, 255], type="", swap="face-18"), + 49: dict(name="face-26", id=49, color=[255, 255, 255], type="", swap="face-17"), + 50: dict(name="face-27", id=50, color=[255, 255, 255], type="", swap=""), + 51: dict(name="face-28", id=51, color=[255, 255, 255], type="", swap=""), + 52: dict(name="face-29", id=52, color=[255, 255, 255], type="", swap=""), + 53: dict(name="face-30", id=53, color=[255, 255, 255], type="", swap=""), + 54: dict(name="face-31", id=54, color=[255, 255, 255], type="", swap="face-35"), + 55: dict(name="face-32", id=55, color=[255, 255, 255], type="", swap="face-34"), + 56: dict(name="face-33", id=56, color=[255, 255, 255], type="", swap=""), + 57: dict(name="face-34", id=57, color=[255, 255, 255], type="", swap="face-32"), + 58: dict(name="face-35", id=58, color=[255, 255, 255], type="", swap="face-31"), + 59: dict(name="face-36", id=59, color=[255, 255, 255], type="", swap="face-45"), + 60: dict(name="face-37", id=60, color=[255, 255, 255], type="", swap="face-44"), + 61: dict(name="face-38", id=61, color=[255, 255, 255], type="", swap="face-43"), + 62: dict(name="face-39", id=62, color=[255, 255, 255], type="", swap="face-42"), + 63: dict(name="face-40", id=63, color=[255, 255, 255], type="", swap="face-47"), + 64: dict(name="face-41", id=64, color=[255, 255, 255], type="", swap="face-46"), + 65: dict(name="face-42", id=65, color=[255, 255, 255], type="", swap="face-39"), + 66: dict(name="face-43", id=66, color=[255, 255, 255], type="", swap="face-38"), + 67: dict(name="face-44", id=67, color=[255, 255, 255], type="", swap="face-37"), + 68: dict(name="face-45", id=68, color=[255, 255, 255], type="", swap="face-36"), + 69: dict(name="face-46", id=69, color=[255, 255, 255], type="", swap="face-41"), + 70: dict(name="face-47", id=70, color=[255, 255, 255], type="", swap="face-40"), + 71: dict(name="face-48", id=71, color=[255, 255, 255], type="", swap="face-54"), + 72: dict(name="face-49", id=72, color=[255, 255, 255], type="", swap="face-53"), + 73: dict(name="face-50", id=73, color=[255, 255, 255], type="", swap="face-52"), + 74: 
dict(name="face-51", id=74, color=[255, 255, 255], type="", swap=""), + 75: dict(name="face-52", id=75, color=[255, 255, 255], type="", swap="face-50"), + 76: dict(name="face-53", id=76, color=[255, 255, 255], type="", swap="face-49"), + 77: dict(name="face-54", id=77, color=[255, 255, 255], type="", swap="face-48"), + 78: dict(name="face-55", id=78, color=[255, 255, 255], type="", swap="face-59"), + 79: dict(name="face-56", id=79, color=[255, 255, 255], type="", swap="face-58"), + 80: dict(name="face-57", id=80, color=[255, 255, 255], type="", swap=""), + 81: dict(name="face-58", id=81, color=[255, 255, 255], type="", swap="face-56"), + 82: dict(name="face-59", id=82, color=[255, 255, 255], type="", swap="face-55"), + 83: dict(name="face-60", id=83, color=[255, 255, 255], type="", swap="face-64"), + 84: dict(name="face-61", id=84, color=[255, 255, 255], type="", swap="face-63"), + 85: dict(name="face-62", id=85, color=[255, 255, 255], type="", swap=""), + 86: dict(name="face-63", id=86, color=[255, 255, 255], type="", swap="face-61"), + 87: dict(name="face-64", id=87, color=[255, 255, 255], type="", swap="face-60"), + 88: dict(name="face-65", id=88, color=[255, 255, 255], type="", swap="face-67"), + 89: dict(name="face-66", id=89, color=[255, 255, 255], type="", swap=""), + 90: dict(name="face-67", id=90, color=[255, 255, 255], type="", swap="face-65"), + 91: dict(name="left_hand_root", id=91, color=[255, 255, 255], type="", swap="right_hand_root"), + 92: dict(name="left_thumb1", id=92, color=[255, 128, 0], type="", swap="right_thumb1"), + 93: dict(name="left_thumb2", id=93, color=[255, 128, 0], type="", swap="right_thumb2"), + 94: dict(name="left_thumb3", id=94, color=[255, 128, 0], type="", swap="right_thumb3"), + 95: dict(name="left_thumb4", id=95, color=[255, 128, 0], type="", swap="right_thumb4"), + 96: dict(name="left_forefinger1", id=96, color=[255, 153, 255], type="", swap="right_forefinger1"), + 97: dict(name="left_forefinger2", id=97, color=[255, 153, 255], type="", swap="right_forefinger2"), + 98: dict(name="left_forefinger3", id=98, color=[255, 153, 255], type="", swap="right_forefinger3"), + 99: dict(name="left_forefinger4", id=99, color=[255, 153, 255], type="", swap="right_forefinger4"), + 100: dict(name="left_middle_finger1", id=100, color=[102, 178, 255], type="", swap="right_middle_finger1"), + 101: dict(name="left_middle_finger2", id=101, color=[102, 178, 255], type="", swap="right_middle_finger2"), + 102: dict(name="left_middle_finger3", id=102, color=[102, 178, 255], type="", swap="right_middle_finger3"), + 103: dict(name="left_middle_finger4", id=103, color=[102, 178, 255], type="", swap="right_middle_finger4"), + 104: dict(name="left_ring_finger1", id=104, color=[255, 51, 51], type="", swap="right_ring_finger1"), + 105: dict(name="left_ring_finger2", id=105, color=[255, 51, 51], type="", swap="right_ring_finger2"), + 106: dict(name="left_ring_finger3", id=106, color=[255, 51, 51], type="", swap="right_ring_finger3"), + 107: dict(name="left_ring_finger4", id=107, color=[255, 51, 51], type="", swap="right_ring_finger4"), + 108: dict(name="left_pinky_finger1", id=108, color=[0, 255, 0], type="", swap="right_pinky_finger1"), + 109: dict(name="left_pinky_finger2", id=109, color=[0, 255, 0], type="", swap="right_pinky_finger2"), + 110: dict(name="left_pinky_finger3", id=110, color=[0, 255, 0], type="", swap="right_pinky_finger3"), + 111: dict(name="left_pinky_finger4", id=111, color=[0, 255, 0], type="", swap="right_pinky_finger4"), + 112: dict(name="right_hand_root", 
id=112, color=[255, 255, 255], type="", swap="left_hand_root"), + 113: dict(name="right_thumb1", id=113, color=[255, 128, 0], type="", swap="left_thumb1"), + 114: dict(name="right_thumb2", id=114, color=[255, 128, 0], type="", swap="left_thumb2"), + 115: dict(name="right_thumb3", id=115, color=[255, 128, 0], type="", swap="left_thumb3"), + 116: dict(name="right_thumb4", id=116, color=[255, 128, 0], type="", swap="left_thumb4"), + 117: dict(name="right_forefinger1", id=117, color=[255, 153, 255], type="", swap="left_forefinger1"), + 118: dict(name="right_forefinger2", id=118, color=[255, 153, 255], type="", swap="left_forefinger2"), + 119: dict(name="right_forefinger3", id=119, color=[255, 153, 255], type="", swap="left_forefinger3"), + 120: dict(name="right_forefinger4", id=120, color=[255, 153, 255], type="", swap="left_forefinger4"), + 121: dict(name="right_middle_finger1", id=121, color=[102, 178, 255], type="", swap="left_middle_finger1"), + 122: dict(name="right_middle_finger2", id=122, color=[102, 178, 255], type="", swap="left_middle_finger2"), + 123: dict(name="right_middle_finger3", id=123, color=[102, 178, 255], type="", swap="left_middle_finger3"), + 124: dict(name="right_middle_finger4", id=124, color=[102, 178, 255], type="", swap="left_middle_finger4"), + 125: dict(name="right_ring_finger1", id=125, color=[255, 51, 51], type="", swap="left_ring_finger1"), + 126: dict(name="right_ring_finger2", id=126, color=[255, 51, 51], type="", swap="left_ring_finger2"), + 127: dict(name="right_ring_finger3", id=127, color=[255, 51, 51], type="", swap="left_ring_finger3"), + 128: dict(name="right_ring_finger4", id=128, color=[255, 51, 51], type="", swap="left_ring_finger4"), + 129: dict(name="right_pinky_finger1", id=129, color=[0, 255, 0], type="", swap="left_pinky_finger1"), + 130: dict(name="right_pinky_finger2", id=130, color=[0, 255, 0], type="", swap="left_pinky_finger2"), + 131: dict(name="right_pinky_finger3", id=131, color=[0, 255, 0], type="", swap="left_pinky_finger3"), + 132: dict(name="right_pinky_finger4", id=132, color=[0, 255, 0], type="", swap="left_pinky_finger4"), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'right_shoulder'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 
'right_shoulder'), id=18, color=[51, 153, 255]), - 19: - dict(link=('left_ankle', 'left_big_toe'), id=19, color=[0, 255, 0]), - 20: - dict(link=('left_ankle', 'left_small_toe'), id=20, color=[0, 255, 0]), - 21: - dict(link=('left_ankle', 'left_heel'), id=21, color=[0, 255, 0]), - 22: - dict( - link=('right_ankle', 'right_big_toe'), id=22, color=[255, 128, 0]), - 23: - dict( - link=('right_ankle', 'right_small_toe'), - id=23, - color=[255, 128, 0]), - 24: - dict(link=('right_ankle', 'right_heel'), id=24, color=[255, 128, 0]), - 25: - dict( - link=('left_hand_root', 'left_thumb1'), id=25, color=[255, 128, - 0]), - 26: - dict(link=('left_thumb1', 'left_thumb2'), id=26, color=[255, 128, 0]), - 27: - dict(link=('left_thumb2', 'left_thumb3'), id=27, color=[255, 128, 0]), - 28: - dict(link=('left_thumb3', 'left_thumb4'), id=28, color=[255, 128, 0]), - 29: - dict( - link=('left_hand_root', 'left_forefinger1'), - id=29, - color=[255, 153, 255]), - 30: - dict( - link=('left_forefinger1', 'left_forefinger2'), - id=30, - color=[255, 153, 255]), - 31: - dict( - link=('left_forefinger2', 'left_forefinger3'), - id=31, - color=[255, 153, 255]), - 32: - dict( - link=('left_forefinger3', 'left_forefinger4'), - id=32, - color=[255, 153, 255]), - 33: - dict( - link=('left_hand_root', 'left_middle_finger1'), - id=33, - color=[102, 178, 255]), - 34: - dict( - link=('left_middle_finger1', 'left_middle_finger2'), - id=34, - color=[102, 178, 255]), - 35: - dict( - link=('left_middle_finger2', 'left_middle_finger3'), - id=35, - color=[102, 178, 255]), - 36: - dict( - link=('left_middle_finger3', 'left_middle_finger4'), - id=36, - color=[102, 178, 255]), - 37: - dict( - link=('left_hand_root', 'left_ring_finger1'), - id=37, - color=[255, 51, 51]), - 38: - dict( - link=('left_ring_finger1', 'left_ring_finger2'), - id=38, - color=[255, 51, 51]), - 39: - dict( - link=('left_ring_finger2', 'left_ring_finger3'), - id=39, - color=[255, 51, 51]), - 40: - dict( - link=('left_ring_finger3', 'left_ring_finger4'), - id=40, - color=[255, 51, 51]), - 41: - dict( - link=('left_hand_root', 'left_pinky_finger1'), - id=41, - color=[0, 255, 0]), - 42: - dict( - link=('left_pinky_finger1', 'left_pinky_finger2'), - id=42, - color=[0, 255, 0]), - 43: - dict( - link=('left_pinky_finger2', 'left_pinky_finger3'), - id=43, - color=[0, 255, 0]), - 44: - dict( - link=('left_pinky_finger3', 'left_pinky_finger4'), - id=44, - color=[0, 255, 0]), - 45: - dict( - link=('right_hand_root', 'right_thumb1'), - id=45, - color=[255, 128, 0]), - 46: - dict( - link=('right_thumb1', 'right_thumb2'), id=46, color=[255, 128, 0]), - 47: - dict( - link=('right_thumb2', 'right_thumb3'), id=47, color=[255, 128, 0]), - 48: - dict( - link=('right_thumb3', 'right_thumb4'), id=48, color=[255, 128, 0]), - 49: - dict( - link=('right_hand_root', 'right_forefinger1'), - id=49, - color=[255, 153, 255]), - 50: - dict( - link=('right_forefinger1', 'right_forefinger2'), - id=50, - color=[255, 153, 255]), - 51: - dict( - link=('right_forefinger2', 'right_forefinger3'), - id=51, - color=[255, 153, 255]), - 52: - dict( - link=('right_forefinger3', 'right_forefinger4'), - id=52, - color=[255, 153, 255]), - 53: - dict( - link=('right_hand_root', 'right_middle_finger1'), - id=53, - color=[102, 178, 255]), - 54: - dict( - link=('right_middle_finger1', 'right_middle_finger2'), - id=54, - color=[102, 178, 255]), - 55: - dict( - link=('right_middle_finger2', 'right_middle_finger3'), - id=55, - color=[102, 178, 255]), - 56: - dict( - link=('right_middle_finger3', 
'right_middle_finger4'), - id=56, - color=[102, 178, 255]), - 57: - dict( - link=('right_hand_root', 'right_ring_finger1'), - id=57, - color=[255, 51, 51]), - 58: - dict( - link=('right_ring_finger1', 'right_ring_finger2'), - id=58, - color=[255, 51, 51]), - 59: - dict( - link=('right_ring_finger2', 'right_ring_finger3'), - id=59, - color=[255, 51, 51]), - 60: - dict( - link=('right_ring_finger3', 'right_ring_finger4'), - id=60, - color=[255, 51, 51]), - 61: - dict( - link=('right_hand_root', 'right_pinky_finger1'), - id=61, - color=[0, 255, 0]), - 62: - dict( - link=('right_pinky_finger1', 'right_pinky_finger2'), - id=62, - color=[0, 255, 0]), - 63: - dict( - link=('right_pinky_finger2', 'right_pinky_finger3'), - id=63, - color=[0, 255, 0]), - 64: - dict( - link=('right_pinky_finger3', 'right_pinky_finger4'), - id=64, - color=[0, 255, 0]) + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), + 19: dict(link=("left_ankle", "left_big_toe"), id=19, color=[0, 255, 0]), + 20: dict(link=("left_ankle", "left_small_toe"), id=20, color=[0, 255, 0]), + 21: dict(link=("left_ankle", "left_heel"), id=21, color=[0, 255, 0]), + 22: dict(link=("right_ankle", "right_big_toe"), id=22, color=[255, 128, 0]), + 23: dict(link=("right_ankle", "right_small_toe"), id=23, color=[255, 128, 0]), + 24: dict(link=("right_ankle", "right_heel"), id=24, color=[255, 128, 0]), + 25: dict(link=("left_hand_root", "left_thumb1"), id=25, color=[255, 128, 0]), + 26: dict(link=("left_thumb1", "left_thumb2"), id=26, color=[255, 128, 0]), + 27: dict(link=("left_thumb2", "left_thumb3"), id=27, color=[255, 128, 0]), + 28: dict(link=("left_thumb3", "left_thumb4"), id=28, color=[255, 128, 0]), + 29: dict(link=("left_hand_root", "left_forefinger1"), id=29, color=[255, 153, 255]), + 30: dict(link=("left_forefinger1", "left_forefinger2"), id=30, color=[255, 153, 255]), + 31: dict(link=("left_forefinger2", "left_forefinger3"), id=31, color=[255, 153, 255]), + 32: dict(link=("left_forefinger3", "left_forefinger4"), id=32, color=[255, 153, 255]), + 33: dict(link=("left_hand_root", "left_middle_finger1"), id=33, color=[102, 178, 255]), + 34: dict(link=("left_middle_finger1", "left_middle_finger2"), id=34, color=[102, 
178, 255]), + 35: dict(link=("left_middle_finger2", "left_middle_finger3"), id=35, color=[102, 178, 255]), + 36: dict(link=("left_middle_finger3", "left_middle_finger4"), id=36, color=[102, 178, 255]), + 37: dict(link=("left_hand_root", "left_ring_finger1"), id=37, color=[255, 51, 51]), + 38: dict(link=("left_ring_finger1", "left_ring_finger2"), id=38, color=[255, 51, 51]), + 39: dict(link=("left_ring_finger2", "left_ring_finger3"), id=39, color=[255, 51, 51]), + 40: dict(link=("left_ring_finger3", "left_ring_finger4"), id=40, color=[255, 51, 51]), + 41: dict(link=("left_hand_root", "left_pinky_finger1"), id=41, color=[0, 255, 0]), + 42: dict(link=("left_pinky_finger1", "left_pinky_finger2"), id=42, color=[0, 255, 0]), + 43: dict(link=("left_pinky_finger2", "left_pinky_finger3"), id=43, color=[0, 255, 0]), + 44: dict(link=("left_pinky_finger3", "left_pinky_finger4"), id=44, color=[0, 255, 0]), + 45: dict(link=("right_hand_root", "right_thumb1"), id=45, color=[255, 128, 0]), + 46: dict(link=("right_thumb1", "right_thumb2"), id=46, color=[255, 128, 0]), + 47: dict(link=("right_thumb2", "right_thumb3"), id=47, color=[255, 128, 0]), + 48: dict(link=("right_thumb3", "right_thumb4"), id=48, color=[255, 128, 0]), + 49: dict(link=("right_hand_root", "right_forefinger1"), id=49, color=[255, 153, 255]), + 50: dict(link=("right_forefinger1", "right_forefinger2"), id=50, color=[255, 153, 255]), + 51: dict(link=("right_forefinger2", "right_forefinger3"), id=51, color=[255, 153, 255]), + 52: dict(link=("right_forefinger3", "right_forefinger4"), id=52, color=[255, 153, 255]), + 53: dict(link=("right_hand_root", "right_middle_finger1"), id=53, color=[102, 178, 255]), + 54: dict(link=("right_middle_finger1", "right_middle_finger2"), id=54, color=[102, 178, 255]), + 55: dict(link=("right_middle_finger2", "right_middle_finger3"), id=55, color=[102, 178, 255]), + 56: dict(link=("right_middle_finger3", "right_middle_finger4"), id=56, color=[102, 178, 255]), + 57: dict(link=("right_hand_root", "right_ring_finger1"), id=57, color=[255, 51, 51]), + 58: dict(link=("right_ring_finger1", "right_ring_finger2"), id=58, color=[255, 51, 51]), + 59: dict(link=("right_ring_finger2", "right_ring_finger3"), id=59, color=[255, 51, 51]), + 60: dict(link=("right_ring_finger3", "right_ring_finger4"), id=60, color=[255, 51, 51]), + 61: dict(link=("right_hand_root", "right_pinky_finger1"), id=61, color=[0, 255, 0]), + 62: dict(link=("right_pinky_finger1", "right_pinky_finger2"), id=62, color=[0, 255, 0]), + 63: dict(link=("right_pinky_finger2", "right_pinky_finger3"), id=63, color=[0, 255, 0]), + 64: dict(link=("right_pinky_finger3", "right_pinky_finger4"), id=64, color=[0, 255, 0]), }, - joint_weights=[1.] 
* 133, + joint_weights=[1.0] * 133, # 'https://github.com/jin-s13/COCO-WholeBody/blob/master/' # 'evaluation/myeval_wholebody.py#L175' sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.068, 0.066, 0.066, - 0.092, 0.094, 0.094, 0.042, 0.043, 0.044, 0.043, 0.040, 0.035, 0.031, - 0.025, 0.020, 0.023, 0.029, 0.032, 0.037, 0.038, 0.043, 0.041, 0.045, - 0.013, 0.012, 0.011, 0.011, 0.012, 0.012, 0.011, 0.011, 0.013, 0.015, - 0.009, 0.007, 0.007, 0.007, 0.012, 0.009, 0.008, 0.016, 0.010, 0.017, - 0.011, 0.009, 0.011, 0.009, 0.007, 0.013, 0.008, 0.011, 0.012, 0.010, - 0.034, 0.008, 0.008, 0.009, 0.008, 0.008, 0.007, 0.010, 0.008, 0.009, - 0.009, 0.009, 0.007, 0.007, 0.008, 0.011, 0.008, 0.008, 0.008, 0.01, - 0.008, 0.029, 0.022, 0.035, 0.037, 0.047, 0.026, 0.025, 0.024, 0.035, - 0.018, 0.024, 0.022, 0.026, 0.017, 0.021, 0.021, 0.032, 0.02, 0.019, - 0.022, 0.031, 0.029, 0.022, 0.035, 0.037, 0.047, 0.026, 0.025, 0.024, - 0.035, 0.018, 0.024, 0.022, 0.026, 0.017, 0.021, 0.021, 0.032, 0.02, - 0.019, 0.022, 0.031 - ]) + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.068, + 0.066, + 0.066, + 0.092, + 0.094, + 0.094, + 0.042, + 0.043, + 0.044, + 0.043, + 0.040, + 0.035, + 0.031, + 0.025, + 0.020, + 0.023, + 0.029, + 0.032, + 0.037, + 0.038, + 0.043, + 0.041, + 0.045, + 0.013, + 0.012, + 0.011, + 0.011, + 0.012, + 0.012, + 0.011, + 0.011, + 0.013, + 0.015, + 0.009, + 0.007, + 0.007, + 0.007, + 0.012, + 0.009, + 0.008, + 0.016, + 0.010, + 0.017, + 0.011, + 0.009, + 0.011, + 0.009, + 0.007, + 0.013, + 0.008, + 0.011, + 0.012, + 0.010, + 0.034, + 0.008, + 0.008, + 0.009, + 0.008, + 0.008, + 0.007, + 0.010, + 0.008, + 0.009, + 0.009, + 0.009, + 0.007, + 0.007, + 0.008, + 0.011, + 0.008, + 0.008, + 0.008, + 0.01, + 0.008, + 0.029, + 0.022, + 0.035, + 0.037, + 0.047, + 0.026, + 0.025, + 0.024, + 0.035, + 0.018, + 0.024, + 0.022, + 0.026, + 0.017, + 0.021, + 0.021, + 0.032, + 0.02, + 0.019, + 0.022, + 0.031, + 0.029, + 0.022, + 0.035, + 0.037, + 0.047, + 0.026, + 0.025, + 0.024, + 0.035, + 0.018, + 0.024, + 0.022, + 0.026, + 0.017, + 0.021, + 0.021, + 0.032, + 0.02, + 0.019, + 0.022, + 0.031, + ], +) diff --git a/mmpose/configs/_base_/datasets/coco_wholebody_face.py b/mmpose/configs/_base_/datasets/coco_wholebody_face.py index a3fe1e5b336d8ddd668d47123f5c0ceeff580914..df302948c2841678045974a0e593c6bc6150621c 100644 --- a/mmpose/configs/_base_/datasets/coco_wholebody_face.py +++ b/mmpose/configs/_base_/datasets/coco_wholebody_face.py @@ -1,154 +1,154 @@ dataset_info = dict( - dataset_name='coco_wholebody_face', + dataset_name="coco_wholebody_face", paper_info=dict( - author='Jin, Sheng and Xu, Lumin and Xu, Jin and ' - 'Wang, Can and Liu, Wentao and ' - 'Qian, Chen and Ouyang, Wanli and Luo, Ping', - title='Whole-Body Human Pose Estimation in the Wild', - container='Proceedings of the European ' - 'Conference on Computer Vision (ECCV)', - year='2020', - homepage='https://github.com/jin-s13/COCO-WholeBody/', + author="Jin, Sheng and Xu, Lumin and Xu, Jin and " "Wang, Can and Liu, Wentao and " "Qian, Chen and Ouyang, Wanli and Luo, Ping", + title="Whole-Body Human Pose Estimation in the Wild", + container="Proceedings of the European " "Conference on Computer Vision (ECCV)", + year="2020", + homepage="https://github.com/jin-s13/COCO-WholeBody/", ), keypoint_info={ - 0: - dict(name='face-0', id=0, color=[255, 0, 0], type='', 
swap='face-16'), - 1: - dict(name='face-1', id=1, color=[255, 0, 0], type='', swap='face-15'), - 2: - dict(name='face-2', id=2, color=[255, 0, 0], type='', swap='face-14'), - 3: - dict(name='face-3', id=3, color=[255, 0, 0], type='', swap='face-13'), - 4: - dict(name='face-4', id=4, color=[255, 0, 0], type='', swap='face-12'), - 5: - dict(name='face-5', id=5, color=[255, 0, 0], type='', swap='face-11'), - 6: - dict(name='face-6', id=6, color=[255, 0, 0], type='', swap='face-10'), - 7: - dict(name='face-7', id=7, color=[255, 0, 0], type='', swap='face-9'), - 8: dict(name='face-8', id=8, color=[255, 0, 0], type='', swap=''), - 9: - dict(name='face-9', id=9, color=[255, 0, 0], type='', swap='face-7'), - 10: - dict(name='face-10', id=10, color=[255, 0, 0], type='', swap='face-6'), - 11: - dict(name='face-11', id=11, color=[255, 0, 0], type='', swap='face-5'), - 12: - dict(name='face-12', id=12, color=[255, 0, 0], type='', swap='face-4'), - 13: - dict(name='face-13', id=13, color=[255, 0, 0], type='', swap='face-3'), - 14: - dict(name='face-14', id=14, color=[255, 0, 0], type='', swap='face-2'), - 15: - dict(name='face-15', id=15, color=[255, 0, 0], type='', swap='face-1'), - 16: - dict(name='face-16', id=16, color=[255, 0, 0], type='', swap='face-0'), - 17: dict( - name='face-17', id=17, color=[255, 0, 0], type='', swap='face-26'), - 18: dict( - name='face-18', id=18, color=[255, 0, 0], type='', swap='face-25'), - 19: dict( - name='face-19', id=19, color=[255, 0, 0], type='', swap='face-24'), - 20: dict( - name='face-20', id=20, color=[255, 0, 0], type='', swap='face-23'), - 21: dict( - name='face-21', id=21, color=[255, 0, 0], type='', swap='face-22'), - 22: dict( - name='face-22', id=22, color=[255, 0, 0], type='', swap='face-21'), - 23: dict( - name='face-23', id=23, color=[255, 0, 0], type='', swap='face-20'), - 24: dict( - name='face-24', id=24, color=[255, 0, 0], type='', swap='face-19'), - 25: dict( - name='face-25', id=25, color=[255, 0, 0], type='', swap='face-18'), - 26: dict( - name='face-26', id=26, color=[255, 0, 0], type='', swap='face-17'), - 27: dict(name='face-27', id=27, color=[255, 0, 0], type='', swap=''), - 28: dict(name='face-28', id=28, color=[255, 0, 0], type='', swap=''), - 29: dict(name='face-29', id=29, color=[255, 0, 0], type='', swap=''), - 30: dict(name='face-30', id=30, color=[255, 0, 0], type='', swap=''), - 31: dict( - name='face-31', id=31, color=[255, 0, 0], type='', swap='face-35'), - 32: dict( - name='face-32', id=32, color=[255, 0, 0], type='', swap='face-34'), - 33: dict(name='face-33', id=33, color=[255, 0, 0], type='', swap=''), - 34: dict( - name='face-34', id=34, color=[255, 0, 0], type='', swap='face-32'), - 35: dict( - name='face-35', id=35, color=[255, 0, 0], type='', swap='face-31'), - 36: dict( - name='face-36', id=36, color=[255, 0, 0], type='', swap='face-45'), - 37: dict( - name='face-37', id=37, color=[255, 0, 0], type='', swap='face-44'), - 38: dict( - name='face-38', id=38, color=[255, 0, 0], type='', swap='face-43'), - 39: dict( - name='face-39', id=39, color=[255, 0, 0], type='', swap='face-42'), - 40: dict( - name='face-40', id=40, color=[255, 0, 0], type='', swap='face-47'), - 41: dict( - name='face-41', id=41, color=[255, 0, 0], type='', swap='face-46'), - 42: dict( - name='face-42', id=42, color=[255, 0, 0], type='', swap='face-39'), - 43: dict( - name='face-43', id=43, color=[255, 0, 0], type='', swap='face-38'), - 44: dict( - name='face-44', id=44, color=[255, 0, 0], type='', swap='face-37'), - 45: dict( - name='face-45', id=45, 
color=[255, 0, 0], type='', swap='face-36'), - 46: dict( - name='face-46', id=46, color=[255, 0, 0], type='', swap='face-41'), - 47: dict( - name='face-47', id=47, color=[255, 0, 0], type='', swap='face-40'), - 48: dict( - name='face-48', id=48, color=[255, 0, 0], type='', swap='face-54'), - 49: dict( - name='face-49', id=49, color=[255, 0, 0], type='', swap='face-53'), - 50: dict( - name='face-50', id=50, color=[255, 0, 0], type='', swap='face-52'), - 51: dict(name='face-51', id=52, color=[255, 0, 0], type='', swap=''), - 52: dict( - name='face-52', id=52, color=[255, 0, 0], type='', swap='face-50'), - 53: dict( - name='face-53', id=53, color=[255, 0, 0], type='', swap='face-49'), - 54: dict( - name='face-54', id=54, color=[255, 0, 0], type='', swap='face-48'), - 55: dict( - name='face-55', id=55, color=[255, 0, 0], type='', swap='face-59'), - 56: dict( - name='face-56', id=56, color=[255, 0, 0], type='', swap='face-58'), - 57: dict(name='face-57', id=57, color=[255, 0, 0], type='', swap=''), - 58: dict( - name='face-58', id=58, color=[255, 0, 0], type='', swap='face-56'), - 59: dict( - name='face-59', id=59, color=[255, 0, 0], type='', swap='face-55'), - 60: dict( - name='face-60', id=60, color=[255, 0, 0], type='', swap='face-64'), - 61: dict( - name='face-61', id=61, color=[255, 0, 0], type='', swap='face-63'), - 62: dict(name='face-62', id=62, color=[255, 0, 0], type='', swap=''), - 63: dict( - name='face-63', id=63, color=[255, 0, 0], type='', swap='face-61'), - 64: dict( - name='face-64', id=64, color=[255, 0, 0], type='', swap='face-60'), - 65: dict( - name='face-65', id=65, color=[255, 0, 0], type='', swap='face-67'), - 66: dict(name='face-66', id=66, color=[255, 0, 0], type='', swap=''), - 67: dict( - name='face-67', id=67, color=[255, 0, 0], type='', swap='face-65') + 0: dict(name="face-0", id=0, color=[255, 0, 0], type="", swap="face-16"), + 1: dict(name="face-1", id=1, color=[255, 0, 0], type="", swap="face-15"), + 2: dict(name="face-2", id=2, color=[255, 0, 0], type="", swap="face-14"), + 3: dict(name="face-3", id=3, color=[255, 0, 0], type="", swap="face-13"), + 4: dict(name="face-4", id=4, color=[255, 0, 0], type="", swap="face-12"), + 5: dict(name="face-5", id=5, color=[255, 0, 0], type="", swap="face-11"), + 6: dict(name="face-6", id=6, color=[255, 0, 0], type="", swap="face-10"), + 7: dict(name="face-7", id=7, color=[255, 0, 0], type="", swap="face-9"), + 8: dict(name="face-8", id=8, color=[255, 0, 0], type="", swap=""), + 9: dict(name="face-9", id=9, color=[255, 0, 0], type="", swap="face-7"), + 10: dict(name="face-10", id=10, color=[255, 0, 0], type="", swap="face-6"), + 11: dict(name="face-11", id=11, color=[255, 0, 0], type="", swap="face-5"), + 12: dict(name="face-12", id=12, color=[255, 0, 0], type="", swap="face-4"), + 13: dict(name="face-13", id=13, color=[255, 0, 0], type="", swap="face-3"), + 14: dict(name="face-14", id=14, color=[255, 0, 0], type="", swap="face-2"), + 15: dict(name="face-15", id=15, color=[255, 0, 0], type="", swap="face-1"), + 16: dict(name="face-16", id=16, color=[255, 0, 0], type="", swap="face-0"), + 17: dict(name="face-17", id=17, color=[255, 0, 0], type="", swap="face-26"), + 18: dict(name="face-18", id=18, color=[255, 0, 0], type="", swap="face-25"), + 19: dict(name="face-19", id=19, color=[255, 0, 0], type="", swap="face-24"), + 20: dict(name="face-20", id=20, color=[255, 0, 0], type="", swap="face-23"), + 21: dict(name="face-21", id=21, color=[255, 0, 0], type="", swap="face-22"), + 22: dict(name="face-22", id=22, color=[255, 0, 0], 
type="", swap="face-21"), + 23: dict(name="face-23", id=23, color=[255, 0, 0], type="", swap="face-20"), + 24: dict(name="face-24", id=24, color=[255, 0, 0], type="", swap="face-19"), + 25: dict(name="face-25", id=25, color=[255, 0, 0], type="", swap="face-18"), + 26: dict(name="face-26", id=26, color=[255, 0, 0], type="", swap="face-17"), + 27: dict(name="face-27", id=27, color=[255, 0, 0], type="", swap=""), + 28: dict(name="face-28", id=28, color=[255, 0, 0], type="", swap=""), + 29: dict(name="face-29", id=29, color=[255, 0, 0], type="", swap=""), + 30: dict(name="face-30", id=30, color=[255, 0, 0], type="", swap=""), + 31: dict(name="face-31", id=31, color=[255, 0, 0], type="", swap="face-35"), + 32: dict(name="face-32", id=32, color=[255, 0, 0], type="", swap="face-34"), + 33: dict(name="face-33", id=33, color=[255, 0, 0], type="", swap=""), + 34: dict(name="face-34", id=34, color=[255, 0, 0], type="", swap="face-32"), + 35: dict(name="face-35", id=35, color=[255, 0, 0], type="", swap="face-31"), + 36: dict(name="face-36", id=36, color=[255, 0, 0], type="", swap="face-45"), + 37: dict(name="face-37", id=37, color=[255, 0, 0], type="", swap="face-44"), + 38: dict(name="face-38", id=38, color=[255, 0, 0], type="", swap="face-43"), + 39: dict(name="face-39", id=39, color=[255, 0, 0], type="", swap="face-42"), + 40: dict(name="face-40", id=40, color=[255, 0, 0], type="", swap="face-47"), + 41: dict(name="face-41", id=41, color=[255, 0, 0], type="", swap="face-46"), + 42: dict(name="face-42", id=42, color=[255, 0, 0], type="", swap="face-39"), + 43: dict(name="face-43", id=43, color=[255, 0, 0], type="", swap="face-38"), + 44: dict(name="face-44", id=44, color=[255, 0, 0], type="", swap="face-37"), + 45: dict(name="face-45", id=45, color=[255, 0, 0], type="", swap="face-36"), + 46: dict(name="face-46", id=46, color=[255, 0, 0], type="", swap="face-41"), + 47: dict(name="face-47", id=47, color=[255, 0, 0], type="", swap="face-40"), + 48: dict(name="face-48", id=48, color=[255, 0, 0], type="", swap="face-54"), + 49: dict(name="face-49", id=49, color=[255, 0, 0], type="", swap="face-53"), + 50: dict(name="face-50", id=50, color=[255, 0, 0], type="", swap="face-52"), + 51: dict(name="face-51", id=52, color=[255, 0, 0], type="", swap=""), + 52: dict(name="face-52", id=52, color=[255, 0, 0], type="", swap="face-50"), + 53: dict(name="face-53", id=53, color=[255, 0, 0], type="", swap="face-49"), + 54: dict(name="face-54", id=54, color=[255, 0, 0], type="", swap="face-48"), + 55: dict(name="face-55", id=55, color=[255, 0, 0], type="", swap="face-59"), + 56: dict(name="face-56", id=56, color=[255, 0, 0], type="", swap="face-58"), + 57: dict(name="face-57", id=57, color=[255, 0, 0], type="", swap=""), + 58: dict(name="face-58", id=58, color=[255, 0, 0], type="", swap="face-56"), + 59: dict(name="face-59", id=59, color=[255, 0, 0], type="", swap="face-55"), + 60: dict(name="face-60", id=60, color=[255, 0, 0], type="", swap="face-64"), + 61: dict(name="face-61", id=61, color=[255, 0, 0], type="", swap="face-63"), + 62: dict(name="face-62", id=62, color=[255, 0, 0], type="", swap=""), + 63: dict(name="face-63", id=63, color=[255, 0, 0], type="", swap="face-61"), + 64: dict(name="face-64", id=64, color=[255, 0, 0], type="", swap="face-60"), + 65: dict(name="face-65", id=65, color=[255, 0, 0], type="", swap="face-67"), + 66: dict(name="face-66", id=66, color=[255, 0, 0], type="", swap=""), + 67: dict(name="face-67", id=67, color=[255, 0, 0], type="", swap="face-65"), }, skeleton_info={}, - 
+    joint_weights=[1.0] * 68,
    # 'https://github.com/jin-s13/COCO-WholeBody/blob/master/'
    # 'evaluation/myeval_wholebody.py#L177'
    sigmas=[
-        0.042, 0.043, 0.044, 0.043, 0.040, 0.035, 0.031, 0.025, 0.020, 0.023,
-        0.029, 0.032, 0.037, 0.038, 0.043, 0.041, 0.045, 0.013, 0.012, 0.011,
-        0.011, 0.012, 0.012, 0.011, 0.011, 0.013, 0.015, 0.009, 0.007, 0.007,
-        0.007, 0.012, 0.009, 0.008, 0.016, 0.010, 0.017, 0.011, 0.009, 0.011,
-        0.009, 0.007, 0.013, 0.008, 0.011, 0.012, 0.010, 0.034, 0.008, 0.008,
-        0.009, 0.008, 0.008, 0.007, 0.010, 0.008, 0.009, 0.009, 0.009, 0.007,
-        0.007, 0.008, 0.011, 0.008, 0.008, 0.008, 0.01, 0.008
-    ])
+        0.042,
+        0.043,
+        0.044,
+        0.043,
+        0.040,
+        0.035,
+        0.031,
+        0.025,
+        0.020,
+        0.023,
+        0.029,
+        0.032,
+        0.037,
+        0.038,
+        0.043,
+        0.041,
+        0.045,
+        0.013,
+        0.012,
+        0.011,
+        0.011,
+        0.012,
+        0.012,
+        0.011,
+        0.011,
+        0.013,
+        0.015,
+        0.009,
+        0.007,
+        0.007,
+        0.007,
+        0.012,
+        0.009,
+        0.008,
+        0.016,
+        0.010,
+        0.017,
+        0.011,
+        0.009,
+        0.011,
+        0.009,
+        0.007,
+        0.013,
+        0.008,
+        0.011,
+        0.012,
+        0.010,
+        0.034,
+        0.008,
+        0.008,
+        0.009,
+        0.008,
+        0.008,
+        0.007,
+        0.010,
+        0.008,
+        0.009,
+        0.009,
+        0.009,
+        0.007,
+        0.007,
+        0.008,
+        0.011,
+        0.008,
+        0.008,
+        0.008,
+        0.01,
+        0.008,
+    ],
+)
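Across all of these `keypoint_info` dicts, the `swap` field names each keypoint's horizontal-flip counterpart (empty for points on the symmetry axis, such as `face-8` on the chin line). A minimal, hypothetical sketch of how such a table can be turned into flip indices and sanity-checked; this is illustration only, not the helper mmpose itself uses:

```python
# Illustrative only: derive horizontal-flip indices from a `keypoint_info`
# dict shaped like the ones in these configs, and check basic consistency.
def flip_indices(keypoint_info):
    name_to_key = {v["name"]: k for k, v in keypoint_info.items()}
    indices = []
    for key in sorted(keypoint_info):
        info = keypoint_info[key]
        # Each entry's `id` should match its dict key.
        assert info["id"] == key, f"id mismatch at key {key}"
        swap = info["swap"]
        # Unswapped keypoints map to themselves under flipping.
        indices.append(name_to_key[swap] if swap else key)
    return indices

# Toy three-point layout (hypothetical) to show the mapping:
toy = {
    0: dict(name="nose", id=0, swap=""),
    1: dict(name="left_eye", id=1, swap="right_eye"),
    2: dict(name="right_eye", id=2, swap="left_eye"),
}
assert flip_indices(toy) == [0, 2, 1]
```

Run against a full config, a check like this also catches asymmetric `swap` pairs or `id` fields that drift out of step with their keys.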
diff --git a/mmpose/configs/_base_/datasets/coco_wholebody_hand.py b/mmpose/configs/_base_/datasets/coco_wholebody_hand.py
index 1910b2ced5a8b31cd6f83911e41cae9f1a580222..d98bb94c367c2ccb76c3df1c1ee5d4a72b3e8110 100644
--- a/mmpose/configs/_base_/datasets/coco_wholebody_hand.py
+++ b/mmpose/configs/_base_/datasets/coco_wholebody_hand.py
@@ -1,147 +1,79 @@
 dataset_info = dict(
-    dataset_name='coco_wholebody_hand',
+    dataset_name="coco_wholebody_hand",
     paper_info=dict(
-        author='Jin, Sheng and Xu, Lumin and Xu, Jin and '
-        'Wang, Can and Liu, Wentao and '
-        'Qian, Chen and Ouyang, Wanli and Luo, Ping',
-        title='Whole-Body Human Pose Estimation in the Wild',
-        container='Proceedings of the European '
-        'Conference on Computer Vision (ECCV)',
-        year='2020',
-        homepage='https://github.com/jin-s13/COCO-WholeBody/',
+        author="Jin, Sheng and Xu, Lumin and Xu, Jin and " "Wang, Can and Liu, Wentao and " "Qian, Chen and Ouyang, Wanli and Luo, Ping",
+        title="Whole-Body Human Pose Estimation in the Wild",
+        container="Proceedings of the European " "Conference on Computer Vision (ECCV)",
+        year="2020",
+        homepage="https://github.com/jin-s13/COCO-WholeBody/",
    ),
    keypoint_info={
-        0:
-        dict(name='wrist', id=0, color=[255, 255, 255], type='', swap=''),
-        1:
-        dict(name='thumb1', id=1, color=[255, 128, 0], type='', swap=''),
-        2:
-        dict(name='thumb2', id=2, color=[255, 128, 0], type='', swap=''),
-        3:
-        dict(name='thumb3', id=3, color=[255, 128, 0], type='', swap=''),
-        4:
-        dict(name='thumb4', id=4, color=[255, 128, 0], type='', swap=''),
-        5:
-        dict(
-            name='forefinger1', id=5, color=[255, 153, 255], type='', swap=''),
-        6:
-        dict(
-            name='forefinger2', id=6, color=[255, 153, 255], type='', swap=''),
-        7:
-        dict(
-            name='forefinger3', id=7, color=[255, 153, 255], type='', swap=''),
-        8:
-        dict(
-            name='forefinger4', id=8, color=[255, 153, 255], type='', swap=''),
-        9:
-        dict(
-            name='middle_finger1',
-            id=9,
-            color=[102, 178, 255],
-            type='',
-            swap=''),
-        10:
-        dict(
-            name='middle_finger2',
-            id=10,
-            color=[102, 178, 255],
-            type='',
-            swap=''),
-        11:
-        dict(
-            name='middle_finger3',
-            id=11,
-            color=[102, 178, 255],
-            type='',
-            swap=''),
-        12:
-        dict(
-            name='middle_finger4',
-            id=12,
-            color=[102, 178, 255],
-            type='',
-            swap=''),
-        13:
-        dict(
-
name='ring_finger1', id=13, color=[255, 51, 51], type='', swap=''), - 14: - dict( - name='ring_finger2', id=14, color=[255, 51, 51], type='', swap=''), - 15: - dict( - name='ring_finger3', id=15, color=[255, 51, 51], type='', swap=''), - 16: - dict( - name='ring_finger4', id=16, color=[255, 51, 51], type='', swap=''), - 17: - dict(name='pinky_finger1', id=17, color=[0, 255, 0], type='', swap=''), - 18: - dict(name='pinky_finger2', id=18, color=[0, 255, 0], type='', swap=''), - 19: - dict(name='pinky_finger3', id=19, color=[0, 255, 0], type='', swap=''), - 20: - dict(name='pinky_finger4', id=20, color=[0, 255, 0], type='', swap='') + 0: dict(name="wrist", id=0, color=[255, 255, 255], type="", swap=""), + 1: dict(name="thumb1", id=1, color=[255, 128, 0], type="", swap=""), + 2: dict(name="thumb2", id=2, color=[255, 128, 0], type="", swap=""), + 3: dict(name="thumb3", id=3, color=[255, 128, 0], type="", swap=""), + 4: dict(name="thumb4", id=4, color=[255, 128, 0], type="", swap=""), + 5: dict(name="forefinger1", id=5, color=[255, 153, 255], type="", swap=""), + 6: dict(name="forefinger2", id=6, color=[255, 153, 255], type="", swap=""), + 7: dict(name="forefinger3", id=7, color=[255, 153, 255], type="", swap=""), + 8: dict(name="forefinger4", id=8, color=[255, 153, 255], type="", swap=""), + 9: dict(name="middle_finger1", id=9, color=[102, 178, 255], type="", swap=""), + 10: dict(name="middle_finger2", id=10, color=[102, 178, 255], type="", swap=""), + 11: dict(name="middle_finger3", id=11, color=[102, 178, 255], type="", swap=""), + 12: dict(name="middle_finger4", id=12, color=[102, 178, 255], type="", swap=""), + 13: dict(name="ring_finger1", id=13, color=[255, 51, 51], type="", swap=""), + 14: dict(name="ring_finger2", id=14, color=[255, 51, 51], type="", swap=""), + 15: dict(name="ring_finger3", id=15, color=[255, 51, 51], type="", swap=""), + 16: dict(name="ring_finger4", id=16, color=[255, 51, 51], type="", swap=""), + 17: dict(name="pinky_finger1", id=17, color=[0, 255, 0], type="", swap=""), + 18: dict(name="pinky_finger2", id=18, color=[0, 255, 0], type="", swap=""), + 19: dict(name="pinky_finger3", id=19, color=[0, 255, 0], type="", swap=""), + 20: dict(name="pinky_finger4", id=20, color=[0, 255, 0], type="", swap=""), }, skeleton_info={ - 0: - dict(link=('wrist', 'thumb1'), id=0, color=[255, 128, 0]), - 1: - dict(link=('thumb1', 'thumb2'), id=1, color=[255, 128, 0]), - 2: - dict(link=('thumb2', 'thumb3'), id=2, color=[255, 128, 0]), - 3: - dict(link=('thumb3', 'thumb4'), id=3, color=[255, 128, 0]), - 4: - dict(link=('wrist', 'forefinger1'), id=4, color=[255, 153, 255]), - 5: - dict(link=('forefinger1', 'forefinger2'), id=5, color=[255, 153, 255]), - 6: - dict(link=('forefinger2', 'forefinger3'), id=6, color=[255, 153, 255]), - 7: - dict(link=('forefinger3', 'forefinger4'), id=7, color=[255, 153, 255]), - 8: - dict(link=('wrist', 'middle_finger1'), id=8, color=[102, 178, 255]), - 9: - dict( - link=('middle_finger1', 'middle_finger2'), - id=9, - color=[102, 178, 255]), - 10: - dict( - link=('middle_finger2', 'middle_finger3'), - id=10, - color=[102, 178, 255]), - 11: - dict( - link=('middle_finger3', 'middle_finger4'), - id=11, - color=[102, 178, 255]), - 12: - dict(link=('wrist', 'ring_finger1'), id=12, color=[255, 51, 51]), - 13: - dict( - link=('ring_finger1', 'ring_finger2'), id=13, color=[255, 51, 51]), - 14: - dict( - link=('ring_finger2', 'ring_finger3'), id=14, color=[255, 51, 51]), - 15: - dict( - link=('ring_finger3', 'ring_finger4'), id=15, color=[255, 51, 51]), - 16: - 
dict(link=('wrist', 'pinky_finger1'), id=16, color=[0, 255, 0]), - 17: - dict( - link=('pinky_finger1', 'pinky_finger2'), id=17, color=[0, 255, 0]), - 18: - dict( - link=('pinky_finger2', 'pinky_finger3'), id=18, color=[0, 255, 0]), - 19: - dict( - link=('pinky_finger3', 'pinky_finger4'), id=19, color=[0, 255, 0]) + 0: dict(link=("wrist", "thumb1"), id=0, color=[255, 128, 0]), + 1: dict(link=("thumb1", "thumb2"), id=1, color=[255, 128, 0]), + 2: dict(link=("thumb2", "thumb3"), id=2, color=[255, 128, 0]), + 3: dict(link=("thumb3", "thumb4"), id=3, color=[255, 128, 0]), + 4: dict(link=("wrist", "forefinger1"), id=4, color=[255, 153, 255]), + 5: dict(link=("forefinger1", "forefinger2"), id=5, color=[255, 153, 255]), + 6: dict(link=("forefinger2", "forefinger3"), id=6, color=[255, 153, 255]), + 7: dict(link=("forefinger3", "forefinger4"), id=7, color=[255, 153, 255]), + 8: dict(link=("wrist", "middle_finger1"), id=8, color=[102, 178, 255]), + 9: dict(link=("middle_finger1", "middle_finger2"), id=9, color=[102, 178, 255]), + 10: dict(link=("middle_finger2", "middle_finger3"), id=10, color=[102, 178, 255]), + 11: dict(link=("middle_finger3", "middle_finger4"), id=11, color=[102, 178, 255]), + 12: dict(link=("wrist", "ring_finger1"), id=12, color=[255, 51, 51]), + 13: dict(link=("ring_finger1", "ring_finger2"), id=13, color=[255, 51, 51]), + 14: dict(link=("ring_finger2", "ring_finger3"), id=14, color=[255, 51, 51]), + 15: dict(link=("ring_finger3", "ring_finger4"), id=15, color=[255, 51, 51]), + 16: dict(link=("wrist", "pinky_finger1"), id=16, color=[0, 255, 0]), + 17: dict(link=("pinky_finger1", "pinky_finger2"), id=17, color=[0, 255, 0]), + 18: dict(link=("pinky_finger2", "pinky_finger3"), id=18, color=[0, 255, 0]), + 19: dict(link=("pinky_finger3", "pinky_finger4"), id=19, color=[0, 255, 0]), }, - joint_weights=[1.] 
* 21,
+    joint_weights=[1.0] * 21,
    sigmas=[
-        0.029, 0.022, 0.035, 0.037, 0.047, 0.026, 0.025, 0.024, 0.035, 0.018,
-        0.024, 0.022, 0.026, 0.017, 0.021, 0.021, 0.032, 0.02, 0.019, 0.022,
-        0.031
-    ])
+        0.029,
+        0.022,
+        0.035,
+        0.037,
+        0.047,
+        0.026,
+        0.025,
+        0.024,
+        0.035,
+        0.018,
+        0.024,
+        0.022,
+        0.026,
+        0.017,
+        0.021,
+        0.021,
+        0.032,
+        0.02,
+        0.019,
+        0.022,
+        0.031,
+    ],
+)
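The `sigmas` lists are the per-keypoint falloff constants used in COCO-WholeBody's Object Keypoint Similarity (OKS) metric; the `myeval_wholebody.py` links in the comments above point to the reference evaluation. A rough NumPy sketch of how such constants enter OKS, shown for orientation only (the real scoring lives in the linked evaluation code):

```python
import numpy as np

# Rough OKS sketch (not the repository's evaluation code): a larger sigma
# means a keypoint is scored more leniently for the same pixel error.
def oks(pred, gt, visible, area, sigmas):
    pred = np.asarray(pred, dtype=float)   # (K, 2) predicted coordinates
    gt = np.asarray(gt, dtype=float)       # (K, 2) ground-truth coordinates
    sigmas = np.asarray(sigmas, dtype=float)
    d2 = ((pred - gt) ** 2).sum(axis=-1)   # squared distance per keypoint
    var = (2.0 * sigmas) ** 2              # per-keypoint variance scale
    e = d2 / (2.0 * var * (area + np.spacing(1)))
    vis = np.asarray(visible) > 0
    # Average similarity over labeled keypoints only.
    return float(np.exp(-e[vis]).sum() / max(vis.sum(), 1))
```

With the 21-entry hand list above, for example, `thumb4` (0.047) tolerates noticeably more localization error than the tightly constrained `thumb1` (0.022).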
diff --git a/mmpose/configs/_base_/datasets/coco_wholebody_openpose.py b/mmpose/configs/_base_/datasets/coco_wholebody_openpose.py
index f05dda18abc4f3b02020d5ad4fc19154e715f97d..1f8c38c48f24c326107d62875363c7e6b39a4526 100644
--- a/mmpose/configs/_base_/datasets/coco_wholebody_openpose.py
+++ b/mmpose/configs/_base_/datasets/coco_wholebody_openpose.py
@@ -1,1128 +1,344 @@
 dataset_info = dict(
-    dataset_name='coco_wholebody_openpose',
+    dataset_name="coco_wholebody_openpose",
     paper_info=dict(
-        author='Jin, Sheng and Xu, Lumin and Xu, Jin and '
-        'Wang, Can and Liu, Wentao and '
-        'Qian, Chen and Ouyang, Wanli and Luo, Ping',
-        title='Whole-Body Human Pose Estimation in the Wild',
-        container='Proceedings of the European '
-        'Conference on Computer Vision (ECCV)',
-        year='2020',
-        homepage='https://github.com/jin-s13/COCO-WholeBody/',
+        author="Jin, Sheng and Xu, Lumin and Xu, Jin and " "Wang, Can and Liu, Wentao and " "Qian, Chen and Ouyang, Wanli and Luo, Ping",
+        title="Whole-Body Human Pose Estimation in the Wild",
+        container="Proceedings of the European " "Conference on Computer Vision (ECCV)",
+        year="2020",
+        homepage="https://github.com/jin-s13/COCO-WholeBody/",
     ),
     keypoint_info={
-        0:
-        dict(name='nose', id=0, color=[255, 0, 0], type='upper', swap=''),
-        1:
-        dict(name='neck', id=1, color=[255, 85, 0], type='upper', swap=''),
-        2:
-        dict(
-            name='right_shoulder',
-            id=2,
-            color=[255, 170, 0],
-            type='upper',
-            swap='left_shoulder'),
-        3:
-        dict(
-            name='right_elbow',
-            id=3,
-            color=[255, 255, 0],
-            type='upper',
-            swap='left_elbow'),
-        4:
-        dict(
-            name='right_wrist',
-            id=4,
-            color=[170, 255, 0],
-            type='upper',
-            swap='left_wrist'),
-        5:
-        dict(
-            name='left_shoulder',
-            id=5,
-            color=[85, 255, 0],
-            type='upper',
-            swap='right_shoulder'),
-        6:
-        dict(
-            name='left_elbow',
-            id=6,
-            color=[0, 255, 0],
-            type='upper',
-            swap='right_elbow'),
-        7:
-        dict(
-            name='left_wrist',
-            id=7,
-            color=[0, 255, 85],
-            type='upper',
-            swap='right_wrist'),
-        8:
-        dict(
-            name='right_hip',
-            id=8,
-            color=[0, 255, 170],
-            type='lower',
-            swap='left_hip'),
-        9:
-        dict(
-            name='right_knee',
-            id=9,
-            color=[0, 255, 255],
-            type='lower',
-            swap='left_knee'),
-        10:
-        dict(
-            name='right_ankle',
-            id=10,
-            color=[0, 170, 255],
-            type='lower',
-            swap='left_ankle'),
-        11:
-        dict(
-            name='left_hip',
-            id=11,
-            color=[0, 85, 255],
-            type='lower',
-            swap='right_hip'),
-        12:
-        dict(
-            name='left_knee',
-            id=12,
-            color=[0, 0, 255],
-            type='lower',
-            swap='right_knee'),
-        13:
-        dict(
-            name='left_ankle',
-            id=13,
-            color=[85, 0, 255],
-            type='lower',
-            swap='right_ankle'),
-        14:
-        dict(
-            name='right_eye',
-            id=14,
-            color=[170, 0, 255],
-            type='upper',
-            swap='left_eye'),
-        15:
-        dict(
-            name='left_eye',
-            id=15,
-            color=[255, 0, 255],
-            type='upper',
-            swap='right_eye'),
-        16:
-        dict(
-            name='right_ear',
-            id=16,
-            color=[255, 0, 170],
-            type='upper',
-            swap='left_ear'),
-        17:
-        dict(
-            name='left_ear',
-            id=17,
-            color=[255, 0, 85],
-            type='upper',
-            swap='right_ear'),
-        18:
-        dict(
-            name='left_big_toe',
-            id=17,
-            color=[0, 0, 0],
-            type='lower',
-            swap='right_big_toe'),
-        19:
-        dict(
-            name='left_small_toe',
-
id=18, - color=[0, 0, 0], - type='lower', - swap='right_small_toe'), - 20: - dict( - name='left_heel', - id=19, - color=[0, 0, 0], - type='lower', - swap='right_heel'), - 21: - dict( - name='right_big_toe', - id=20, - color=[0, 0, 0], - type='lower', - swap='left_big_toe'), - 22: - dict( - name='right_small_toe', - id=21, - color=[0, 0, 0], - type='lower', - swap='left_small_toe'), - 23: - dict( - name='right_heel', - id=22, - color=[0, 0, 0], - type='lower', - swap='left_heel'), - 24: - dict( - name='face-0', - id=23, - color=[255, 255, 255], - type='', - swap='face-16'), - 25: - dict( - name='face-1', - id=24, - color=[255, 255, 255], - type='', - swap='face-15'), - 26: - dict( - name='face-2', - id=25, - color=[255, 255, 255], - type='', - swap='face-14'), - 27: - dict( - name='face-3', - id=26, - color=[255, 255, 255], - type='', - swap='face-13'), - 28: - dict( - name='face-4', - id=27, - color=[255, 255, 255], - type='', - swap='face-12'), - 29: - dict( - name='face-5', - id=28, - color=[255, 255, 255], - type='', - swap='face-11'), - 30: - dict( - name='face-6', - id=29, - color=[255, 255, 255], - type='', - swap='face-10'), - 31: - dict( - name='face-7', - id=30, - color=[255, 255, 255], - type='', - swap='face-9'), - 32: - dict(name='face-8', id=31, color=[255, 255, 255], type='', swap=''), - 33: - dict( - name='face-9', - id=32, - color=[255, 255, 255], - type='', - swap='face-7'), - 34: - dict( - name='face-10', - id=33, - color=[255, 255, 255], - type='', - swap='face-6'), - 35: - dict( - name='face-11', - id=34, - color=[255, 255, 255], - type='', - swap='face-5'), - 36: - dict( - name='face-12', - id=35, - color=[255, 255, 255], - type='', - swap='face-4'), - 37: - dict( - name='face-13', - id=36, - color=[255, 255, 255], - type='', - swap='face-3'), - 38: - dict( - name='face-14', - id=37, - color=[255, 255, 255], - type='', - swap='face-2'), - 39: - dict( - name='face-15', - id=38, - color=[255, 255, 255], - type='', - swap='face-1'), - 40: - dict( - name='face-16', - id=39, - color=[255, 255, 255], - type='', - swap='face-0'), - 41: - dict( - name='face-17', - id=40, - color=[255, 255, 255], - type='', - swap='face-26'), - 42: - dict( - name='face-18', - id=41, - color=[255, 255, 255], - type='', - swap='face-25'), - 43: - dict( - name='face-19', - id=42, - color=[255, 255, 255], - type='', - swap='face-24'), - 44: - dict( - name='face-20', - id=43, - color=[255, 255, 255], - type='', - swap='face-23'), - 45: - dict( - name='face-21', - id=44, - color=[255, 255, 255], - type='', - swap='face-22'), - 46: - dict( - name='face-22', - id=45, - color=[255, 255, 255], - type='', - swap='face-21'), - 47: - dict( - name='face-23', - id=46, - color=[255, 255, 255], - type='', - swap='face-20'), - 48: - dict( - name='face-24', - id=47, - color=[255, 255, 255], - type='', - swap='face-19'), - 49: - dict( - name='face-25', - id=48, - color=[255, 255, 255], - type='', - swap='face-18'), - 50: - dict( - name='face-26', - id=49, - color=[255, 255, 255], - type='', - swap='face-17'), - 51: - dict(name='face-27', id=50, color=[255, 255, 255], type='', swap=''), - 52: - dict(name='face-28', id=51, color=[255, 255, 255], type='', swap=''), - 53: - dict(name='face-29', id=52, color=[255, 255, 255], type='', swap=''), - 54: - dict(name='face-30', id=53, color=[255, 255, 255], type='', swap=''), - 55: - dict( - name='face-31', - id=54, - color=[255, 255, 255], - type='', - swap='face-35'), - 56: - dict( - name='face-32', - id=55, - color=[255, 255, 255], - type='', - swap='face-34'), - 57: - 
dict(name='face-33', id=56, color=[255, 255, 255], type='', swap=''), - 58: - dict( - name='face-34', - id=57, - color=[255, 255, 255], - type='', - swap='face-32'), - 59: - dict( - name='face-35', - id=58, - color=[255, 255, 255], - type='', - swap='face-31'), - 60: - dict( - name='face-36', - id=59, - color=[255, 255, 255], - type='', - swap='face-45'), - 61: - dict( - name='face-37', - id=60, - color=[255, 255, 255], - type='', - swap='face-44'), - 62: - dict( - name='face-38', - id=61, - color=[255, 255, 255], - type='', - swap='face-43'), - 63: - dict( - name='face-39', - id=62, - color=[255, 255, 255], - type='', - swap='face-42'), - 64: - dict( - name='face-40', - id=63, - color=[255, 255, 255], - type='', - swap='face-47'), - 65: - dict( - name='face-41', - id=64, - color=[255, 255, 255], - type='', - swap='face-46'), - 66: - dict( - name='face-42', - id=65, - color=[255, 255, 255], - type='', - swap='face-39'), - 67: - dict( - name='face-43', - id=66, - color=[255, 255, 255], - type='', - swap='face-38'), - 68: - dict( - name='face-44', - id=67, - color=[255, 255, 255], - type='', - swap='face-37'), - 69: - dict( - name='face-45', - id=68, - color=[255, 255, 255], - type='', - swap='face-36'), - 70: - dict( - name='face-46', - id=69, - color=[255, 255, 255], - type='', - swap='face-41'), - 71: - dict( - name='face-47', - id=70, - color=[255, 255, 255], - type='', - swap='face-40'), - 72: - dict( - name='face-48', - id=71, - color=[255, 255, 255], - type='', - swap='face-54'), - 73: - dict( - name='face-49', - id=72, - color=[255, 255, 255], - type='', - swap='face-53'), - 74: - dict( - name='face-50', - id=73, - color=[255, 255, 255], - type='', - swap='face-52'), - 75: - dict(name='face-51', id=74, color=[255, 255, 255], type='', swap=''), - 76: - dict( - name='face-52', - id=75, - color=[255, 255, 255], - type='', - swap='face-50'), - 77: - dict( - name='face-53', - id=76, - color=[255, 255, 255], - type='', - swap='face-49'), - 78: - dict( - name='face-54', - id=77, - color=[255, 255, 255], - type='', - swap='face-48'), - 79: - dict( - name='face-55', - id=78, - color=[255, 255, 255], - type='', - swap='face-59'), - 80: - dict( - name='face-56', - id=79, - color=[255, 255, 255], - type='', - swap='face-58'), - 81: - dict(name='face-57', id=80, color=[255, 255, 255], type='', swap=''), - 82: - dict( - name='face-58', - id=81, - color=[255, 255, 255], - type='', - swap='face-56'), - 83: - dict( - name='face-59', - id=82, - color=[255, 255, 255], - type='', - swap='face-55'), - 84: - dict( - name='face-60', - id=83, - color=[255, 255, 255], - type='', - swap='face-64'), - 85: - dict( - name='face-61', - id=84, - color=[255, 255, 255], - type='', - swap='face-63'), - 86: - dict(name='face-62', id=85, color=[255, 255, 255], type='', swap=''), - 87: - dict( - name='face-63', - id=86, - color=[255, 255, 255], - type='', - swap='face-61'), - 88: - dict( - name='face-64', - id=87, - color=[255, 255, 255], - type='', - swap='face-60'), - 89: - dict( - name='face-65', - id=88, - color=[255, 255, 255], - type='', - swap='face-67'), - 90: - dict(name='face-66', id=89, color=[255, 255, 255], type='', swap=''), - 91: - dict( - name='face-67', - id=90, - color=[255, 255, 255], - type='', - swap='face-65'), - 92: - dict( - name='left_hand_root', - id=92, - color=[0, 0, 255], - type='', - swap='right_hand_root'), - 93: - dict( - name='left_thumb1', - id=93, - color=[0, 0, 255], - type='', - swap='right_thumb1'), - 94: - dict( - name='left_thumb2', - id=94, - color=[0, 0, 255], - type='', - 
swap='right_thumb2'), - 95: - dict( - name='left_thumb3', - id=95, - color=[0, 0, 255], - type='', - swap='right_thumb3'), - 96: - dict( - name='left_thumb4', - id=96, - color=[0, 0, 255], - type='', - swap='right_thumb4'), - 97: - dict( - name='left_forefinger1', - id=97, - color=[0, 0, 255], - type='', - swap='right_forefinger1'), - 98: - dict( - name='left_forefinger2', - id=98, - color=[0, 0, 255], - type='', - swap='right_forefinger2'), - 99: - dict( - name='left_forefinger3', - id=99, - color=[0, 0, 255], - type='', - swap='right_forefinger3'), - 100: - dict( - name='left_forefinger4', - id=100, - color=[0, 0, 255], - type='', - swap='right_forefinger4'), - 101: - dict( - name='left_middle_finger1', - id=101, - color=[0, 0, 255], - type='', - swap='right_middle_finger1'), - 102: - dict( - name='left_middle_finger2', - id=102, - color=[0, 0, 255], - type='', - swap='right_middle_finger2'), - 103: - dict( - name='left_middle_finger3', - id=103, - color=[0, 0, 255], - type='', - swap='right_middle_finger3'), - 104: - dict( - name='left_middle_finger4', - id=104, - color=[0, 0, 255], - type='', - swap='right_middle_finger4'), - 105: - dict( - name='left_ring_finger1', - id=105, - color=[0, 0, 255], - type='', - swap='right_ring_finger1'), - 106: - dict( - name='left_ring_finger2', - id=106, - color=[0, 0, 255], - type='', - swap='right_ring_finger2'), - 107: - dict( - name='left_ring_finger3', - id=107, - color=[0, 0, 255], - type='', - swap='right_ring_finger3'), - 108: - dict( - name='left_ring_finger4', - id=108, - color=[0, 0, 255], - type='', - swap='right_ring_finger4'), - 109: - dict( - name='left_pinky_finger1', - id=109, - color=[0, 0, 255], - type='', - swap='right_pinky_finger1'), - 110: - dict( - name='left_pinky_finger2', - id=110, - color=[0, 0, 255], - type='', - swap='right_pinky_finger2'), - 111: - dict( - name='left_pinky_finger3', - id=111, - color=[0, 0, 255], - type='', - swap='right_pinky_finger3'), - 112: - dict( - name='left_pinky_finger4', - id=112, - color=[0, 0, 255], - type='', - swap='right_pinky_finger4'), - 113: - dict( - name='right_hand_root', - id=113, - color=[0, 0, 255], - type='', - swap='left_hand_root'), - 114: - dict( - name='right_thumb1', - id=114, - color=[0, 0, 255], - type='', - swap='left_thumb1'), - 115: - dict( - name='right_thumb2', - id=115, - color=[0, 0, 255], - type='', - swap='left_thumb2'), - 116: - dict( - name='right_thumb3', - id=116, - color=[0, 0, 255], - type='', - swap='left_thumb3'), - 117: - dict( - name='right_thumb4', - id=117, - color=[0, 0, 255], - type='', - swap='left_thumb4'), - 118: - dict( - name='right_forefinger1', - id=118, - color=[0, 0, 255], - type='', - swap='left_forefinger1'), - 119: - dict( - name='right_forefinger2', - id=119, - color=[0, 0, 255], - type='', - swap='left_forefinger2'), - 120: - dict( - name='right_forefinger3', - id=120, - color=[0, 0, 255], - type='', - swap='left_forefinger3'), - 121: - dict( - name='right_forefinger4', - id=121, - color=[0, 0, 255], - type='', - swap='left_forefinger4'), - 122: - dict( - name='right_middle_finger1', - id=122, - color=[0, 0, 255], - type='', - swap='left_middle_finger1'), - 123: - dict( - name='right_middle_finger2', - id=123, - color=[0, 0, 255], - type='', - swap='left_middle_finger2'), - 124: - dict( - name='right_middle_finger3', - id=124, - color=[0, 0, 255], - type='', - swap='left_middle_finger3'), - 125: - dict( - name='right_middle_finger4', - id=125, - color=[0, 0, 255], - type='', - swap='left_middle_finger4'), - 126: - dict( - 
name='right_ring_finger1', - id=126, - color=[0, 0, 255], - type='', - swap='left_ring_finger1'), - 127: - dict( - name='right_ring_finger2', - id=127, - color=[0, 0, 255], - type='', - swap='left_ring_finger2'), - 128: - dict( - name='right_ring_finger3', - id=128, - color=[0, 0, 255], - type='', - swap='left_ring_finger3'), - 129: - dict( - name='right_ring_finger4', - id=129, - color=[0, 0, 255], - type='', - swap='left_ring_finger4'), - 130: - dict( - name='right_pinky_finger1', - id=130, - color=[0, 0, 255], - type='', - swap='left_pinky_finger1'), - 131: - dict( - name='right_pinky_finger2', - id=131, - color=[0, 0, 255], - type='', - swap='left_pinky_finger2'), - 132: - dict( - name='right_pinky_finger3', - id=132, - color=[0, 0, 255], - type='', - swap='left_pinky_finger3'), - 133: - dict( - name='right_pinky_finger4', - id=133, - color=[0, 0, 255], - type='', - swap='left_pinky_finger4') + 0: dict(name="nose", id=0, color=[255, 0, 0], type="upper", swap=""), + 1: dict(name="neck", id=1, color=[255, 85, 0], type="upper", swap=""), + 2: dict(name="right_shoulder", id=2, color=[255, 170, 0], type="upper", swap="left_shoulder"), + 3: dict(name="right_elbow", id=3, color=[255, 255, 0], type="upper", swap="left_elbow"), + 4: dict(name="right_wrist", id=4, color=[170, 255, 0], type="upper", swap="left_wrist"), + 5: dict(name="left_shoulder", id=5, color=[85, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="left_elbow", id=6, color=[0, 255, 0], type="upper", swap="right_elbow"), + 7: dict(name="left_wrist", id=7, color=[0, 255, 85], type="upper", swap="right_wrist"), + 8: dict(name="right_hip", id=8, color=[0, 255, 170], type="lower", swap="left_hip"), + 9: dict(name="right_knee", id=9, color=[0, 255, 255], type="lower", swap="left_knee"), + 10: dict(name="right_ankle", id=10, color=[0, 170, 255], type="lower", swap="left_ankle"), + 11: dict(name="left_hip", id=11, color=[0, 85, 255], type="lower", swap="right_hip"), + 12: dict(name="left_knee", id=12, color=[0, 0, 255], type="lower", swap="right_knee"), + 13: dict(name="left_ankle", id=13, color=[85, 0, 255], type="lower", swap="right_ankle"), + 14: dict(name="right_eye", id=14, color=[170, 0, 255], type="upper", swap="left_eye"), + 15: dict(name="left_eye", id=15, color=[255, 0, 255], type="upper", swap="right_eye"), + 16: dict(name="right_ear", id=16, color=[255, 0, 170], type="upper", swap="left_ear"), + 17: dict(name="left_ear", id=17, color=[255, 0, 85], type="upper", swap="right_ear"), + 18: dict(name="left_big_toe", id=17, color=[0, 0, 0], type="lower", swap="right_big_toe"), + 19: dict(name="left_small_toe", id=18, color=[0, 0, 0], type="lower", swap="right_small_toe"), + 20: dict(name="left_heel", id=19, color=[0, 0, 0], type="lower", swap="right_heel"), + 21: dict(name="right_big_toe", id=20, color=[0, 0, 0], type="lower", swap="left_big_toe"), + 22: dict(name="right_small_toe", id=21, color=[0, 0, 0], type="lower", swap="left_small_toe"), + 23: dict(name="right_heel", id=22, color=[0, 0, 0], type="lower", swap="left_heel"), + 24: dict(name="face-0", id=23, color=[255, 255, 255], type="", swap="face-16"), + 25: dict(name="face-1", id=24, color=[255, 255, 255], type="", swap="face-15"), + 26: dict(name="face-2", id=25, color=[255, 255, 255], type="", swap="face-14"), + 27: dict(name="face-3", id=26, color=[255, 255, 255], type="", swap="face-13"), + 28: dict(name="face-4", id=27, color=[255, 255, 255], type="", swap="face-12"), + 29: dict(name="face-5", id=28, color=[255, 255, 255], type="", swap="face-11"), + 30: 
dict(name="face-6", id=29, color=[255, 255, 255], type="", swap="face-10"), + 31: dict(name="face-7", id=30, color=[255, 255, 255], type="", swap="face-9"), + 32: dict(name="face-8", id=31, color=[255, 255, 255], type="", swap=""), + 33: dict(name="face-9", id=32, color=[255, 255, 255], type="", swap="face-7"), + 34: dict(name="face-10", id=33, color=[255, 255, 255], type="", swap="face-6"), + 35: dict(name="face-11", id=34, color=[255, 255, 255], type="", swap="face-5"), + 36: dict(name="face-12", id=35, color=[255, 255, 255], type="", swap="face-4"), + 37: dict(name="face-13", id=36, color=[255, 255, 255], type="", swap="face-3"), + 38: dict(name="face-14", id=37, color=[255, 255, 255], type="", swap="face-2"), + 39: dict(name="face-15", id=38, color=[255, 255, 255], type="", swap="face-1"), + 40: dict(name="face-16", id=39, color=[255, 255, 255], type="", swap="face-0"), + 41: dict(name="face-17", id=40, color=[255, 255, 255], type="", swap="face-26"), + 42: dict(name="face-18", id=41, color=[255, 255, 255], type="", swap="face-25"), + 43: dict(name="face-19", id=42, color=[255, 255, 255], type="", swap="face-24"), + 44: dict(name="face-20", id=43, color=[255, 255, 255], type="", swap="face-23"), + 45: dict(name="face-21", id=44, color=[255, 255, 255], type="", swap="face-22"), + 46: dict(name="face-22", id=45, color=[255, 255, 255], type="", swap="face-21"), + 47: dict(name="face-23", id=46, color=[255, 255, 255], type="", swap="face-20"), + 48: dict(name="face-24", id=47, color=[255, 255, 255], type="", swap="face-19"), + 49: dict(name="face-25", id=48, color=[255, 255, 255], type="", swap="face-18"), + 50: dict(name="face-26", id=49, color=[255, 255, 255], type="", swap="face-17"), + 51: dict(name="face-27", id=50, color=[255, 255, 255], type="", swap=""), + 52: dict(name="face-28", id=51, color=[255, 255, 255], type="", swap=""), + 53: dict(name="face-29", id=52, color=[255, 255, 255], type="", swap=""), + 54: dict(name="face-30", id=53, color=[255, 255, 255], type="", swap=""), + 55: dict(name="face-31", id=54, color=[255, 255, 255], type="", swap="face-35"), + 56: dict(name="face-32", id=55, color=[255, 255, 255], type="", swap="face-34"), + 57: dict(name="face-33", id=56, color=[255, 255, 255], type="", swap=""), + 58: dict(name="face-34", id=57, color=[255, 255, 255], type="", swap="face-32"), + 59: dict(name="face-35", id=58, color=[255, 255, 255], type="", swap="face-31"), + 60: dict(name="face-36", id=59, color=[255, 255, 255], type="", swap="face-45"), + 61: dict(name="face-37", id=60, color=[255, 255, 255], type="", swap="face-44"), + 62: dict(name="face-38", id=61, color=[255, 255, 255], type="", swap="face-43"), + 63: dict(name="face-39", id=62, color=[255, 255, 255], type="", swap="face-42"), + 64: dict(name="face-40", id=63, color=[255, 255, 255], type="", swap="face-47"), + 65: dict(name="face-41", id=64, color=[255, 255, 255], type="", swap="face-46"), + 66: dict(name="face-42", id=65, color=[255, 255, 255], type="", swap="face-39"), + 67: dict(name="face-43", id=66, color=[255, 255, 255], type="", swap="face-38"), + 68: dict(name="face-44", id=67, color=[255, 255, 255], type="", swap="face-37"), + 69: dict(name="face-45", id=68, color=[255, 255, 255], type="", swap="face-36"), + 70: dict(name="face-46", id=69, color=[255, 255, 255], type="", swap="face-41"), + 71: dict(name="face-47", id=70, color=[255, 255, 255], type="", swap="face-40"), + 72: dict(name="face-48", id=71, color=[255, 255, 255], type="", swap="face-54"), + 73: dict(name="face-49", id=72, color=[255, 
255, 255], type="", swap="face-53"), + 74: dict(name="face-50", id=73, color=[255, 255, 255], type="", swap="face-52"), + 75: dict(name="face-51", id=74, color=[255, 255, 255], type="", swap=""), + 76: dict(name="face-52", id=75, color=[255, 255, 255], type="", swap="face-50"), + 77: dict(name="face-53", id=76, color=[255, 255, 255], type="", swap="face-49"), + 78: dict(name="face-54", id=77, color=[255, 255, 255], type="", swap="face-48"), + 79: dict(name="face-55", id=78, color=[255, 255, 255], type="", swap="face-59"), + 80: dict(name="face-56", id=79, color=[255, 255, 255], type="", swap="face-58"), + 81: dict(name="face-57", id=80, color=[255, 255, 255], type="", swap=""), + 82: dict(name="face-58", id=81, color=[255, 255, 255], type="", swap="face-56"), + 83: dict(name="face-59", id=82, color=[255, 255, 255], type="", swap="face-55"), + 84: dict(name="face-60", id=83, color=[255, 255, 255], type="", swap="face-64"), + 85: dict(name="face-61", id=84, color=[255, 255, 255], type="", swap="face-63"), + 86: dict(name="face-62", id=85, color=[255, 255, 255], type="", swap=""), + 87: dict(name="face-63", id=86, color=[255, 255, 255], type="", swap="face-61"), + 88: dict(name="face-64", id=87, color=[255, 255, 255], type="", swap="face-60"), + 89: dict(name="face-65", id=88, color=[255, 255, 255], type="", swap="face-67"), + 90: dict(name="face-66", id=89, color=[255, 255, 255], type="", swap=""), + 91: dict(name="face-67", id=90, color=[255, 255, 255], type="", swap="face-65"), + 92: dict(name="left_hand_root", id=92, color=[0, 0, 255], type="", swap="right_hand_root"), + 93: dict(name="left_thumb1", id=93, color=[0, 0, 255], type="", swap="right_thumb1"), + 94: dict(name="left_thumb2", id=94, color=[0, 0, 255], type="", swap="right_thumb2"), + 95: dict(name="left_thumb3", id=95, color=[0, 0, 255], type="", swap="right_thumb3"), + 96: dict(name="left_thumb4", id=96, color=[0, 0, 255], type="", swap="right_thumb4"), + 97: dict(name="left_forefinger1", id=97, color=[0, 0, 255], type="", swap="right_forefinger1"), + 98: dict(name="left_forefinger2", id=98, color=[0, 0, 255], type="", swap="right_forefinger2"), + 99: dict(name="left_forefinger3", id=99, color=[0, 0, 255], type="", swap="right_forefinger3"), + 100: dict(name="left_forefinger4", id=100, color=[0, 0, 255], type="", swap="right_forefinger4"), + 101: dict(name="left_middle_finger1", id=101, color=[0, 0, 255], type="", swap="right_middle_finger1"), + 102: dict(name="left_middle_finger2", id=102, color=[0, 0, 255], type="", swap="right_middle_finger2"), + 103: dict(name="left_middle_finger3", id=103, color=[0, 0, 255], type="", swap="right_middle_finger3"), + 104: dict(name="left_middle_finger4", id=104, color=[0, 0, 255], type="", swap="right_middle_finger4"), + 105: dict(name="left_ring_finger1", id=105, color=[0, 0, 255], type="", swap="right_ring_finger1"), + 106: dict(name="left_ring_finger2", id=106, color=[0, 0, 255], type="", swap="right_ring_finger2"), + 107: dict(name="left_ring_finger3", id=107, color=[0, 0, 255], type="", swap="right_ring_finger3"), + 108: dict(name="left_ring_finger4", id=108, color=[0, 0, 255], type="", swap="right_ring_finger4"), + 109: dict(name="left_pinky_finger1", id=109, color=[0, 0, 255], type="", swap="right_pinky_finger1"), + 110: dict(name="left_pinky_finger2", id=110, color=[0, 0, 255], type="", swap="right_pinky_finger2"), + 111: dict(name="left_pinky_finger3", id=111, color=[0, 0, 255], type="", swap="right_pinky_finger3"), + 112: dict(name="left_pinky_finger4", id=112, color=[0, 0, 255], 
type="", swap="right_pinky_finger4"), + 113: dict(name="right_hand_root", id=113, color=[0, 0, 255], type="", swap="left_hand_root"), + 114: dict(name="right_thumb1", id=114, color=[0, 0, 255], type="", swap="left_thumb1"), + 115: dict(name="right_thumb2", id=115, color=[0, 0, 255], type="", swap="left_thumb2"), + 116: dict(name="right_thumb3", id=116, color=[0, 0, 255], type="", swap="left_thumb3"), + 117: dict(name="right_thumb4", id=117, color=[0, 0, 255], type="", swap="left_thumb4"), + 118: dict(name="right_forefinger1", id=118, color=[0, 0, 255], type="", swap="left_forefinger1"), + 119: dict(name="right_forefinger2", id=119, color=[0, 0, 255], type="", swap="left_forefinger2"), + 120: dict(name="right_forefinger3", id=120, color=[0, 0, 255], type="", swap="left_forefinger3"), + 121: dict(name="right_forefinger4", id=121, color=[0, 0, 255], type="", swap="left_forefinger4"), + 122: dict(name="right_middle_finger1", id=122, color=[0, 0, 255], type="", swap="left_middle_finger1"), + 123: dict(name="right_middle_finger2", id=123, color=[0, 0, 255], type="", swap="left_middle_finger2"), + 124: dict(name="right_middle_finger3", id=124, color=[0, 0, 255], type="", swap="left_middle_finger3"), + 125: dict(name="right_middle_finger4", id=125, color=[0, 0, 255], type="", swap="left_middle_finger4"), + 126: dict(name="right_ring_finger1", id=126, color=[0, 0, 255], type="", swap="left_ring_finger1"), + 127: dict(name="right_ring_finger2", id=127, color=[0, 0, 255], type="", swap="left_ring_finger2"), + 128: dict(name="right_ring_finger3", id=128, color=[0, 0, 255], type="", swap="left_ring_finger3"), + 129: dict(name="right_ring_finger4", id=129, color=[0, 0, 255], type="", swap="left_ring_finger4"), + 130: dict(name="right_pinky_finger1", id=130, color=[0, 0, 255], type="", swap="left_pinky_finger1"), + 131: dict(name="right_pinky_finger2", id=131, color=[0, 0, 255], type="", swap="left_pinky_finger2"), + 132: dict(name="right_pinky_finger3", id=132, color=[0, 0, 255], type="", swap="left_pinky_finger3"), + 133: dict(name="right_pinky_finger4", id=133, color=[0, 0, 255], type="", swap="left_pinky_finger4"), }, skeleton_info={ - 0: - dict(link=('neck', 'right_shoulder'), id=0, color=[255, 0, 0]), - 1: - dict(link=('neck', 'left_shoulder'), id=1, color=[255, 85, 0]), - 2: - dict( - link=('right_shoulder', 'right_elbow'), id=2, color=[255, 170, 0]), - 3: - dict(link=('right_elbow', 'right_wrist'), id=3, color=[255, 255, 0]), - 4: - dict(link=('left_shoulder', 'left_elbow'), id=4, color=[170, 255, 0]), - 5: - dict(link=('left_elbow', 'left_wrist'), id=5, color=[85, 255, 0]), - 6: - dict(link=('neck', 'right_hip'), id=6, color=[0, 255, 0]), - 7: - dict(link=('right_hip', 'right_knee'), id=7, color=[0, 255, 85]), - 8: - dict(link=('right_knee', 'right_ankle'), id=8, color=[0, 255, 170]), - 9: - dict(link=('neck', 'left_hip'), id=9, color=[0, 255, 225]), - 10: - dict(link=('left_hip', 'left_knee'), id=10, color=[0, 170, 255]), - 11: - dict(link=('left_knee', 'left_ankle'), id=11, color=[0, 85, 255]), - 12: - dict(link=('neck', 'nose'), id=12, color=[0, 0, 255]), - 13: - dict(link=('nose', 'right_eye'), id=13, color=[255, 0, 170]), - 14: - dict(link=('right_eye', 'right_ear'), id=14, color=[170, 0, 255]), - 15: - dict(link=('nose', 'left_eye'), id=15, color=[255, 0, 255]), - 16: - dict(link=('left_eye', 'left_ear'), id=16, color=[255, 0, 170]), - 17: - dict(link=('left_hand_root', 'left_thumb1'), id=17, color=[255, 0, 0]), - 18: - dict(link=('left_thumb1', 'left_thumb2'), id=18, color=[255, 76, 0]), 
- 19: - dict(link=('left_thumb2', 'left_thumb3'), id=19, color=[255, 153, 0]), - 20: - dict(link=('left_thumb3', 'left_thumb4'), id=20, color=[255, 230, 0]), - 21: - dict( - link=('left_hand_root', 'left_forefinger1'), - id=21, - color=[204, 255, 0]), - 22: - dict( - link=('left_forefinger1', 'left_forefinger2'), - id=22, - color=[128, 255, 0]), - 23: - dict( - link=('left_forefinger2', 'left_forefinger3'), - id=23, - color=[51, 255, 0]), - 24: - dict( - link=('left_forefinger3', 'left_forefinger4'), - id=24, - color=[0, 255, 26]), - 25: - dict( - link=('left_hand_root', 'left_middle_finger1'), - id=25, - color=[0, 255, 102]), - 26: - dict( - link=('left_middle_finger1', 'left_middle_finger2'), - id=26, - color=[0, 255, 178]), - 27: - dict( - link=('left_middle_finger2', 'left_middle_finger3'), - id=27, - color=[0, 255, 255]), - 28: - dict( - link=('left_middle_finger3', 'left_middle_finger4'), - id=28, - color=[0, 178, 255]), - 29: - dict( - link=('left_hand_root', 'left_ring_finger1'), - id=29, - color=[0, 102, 255]), - 30: - dict( - link=('left_ring_finger1', 'left_ring_finger2'), - id=30, - color=[0, 26, 255]), - 31: - dict( - link=('left_ring_finger2', 'left_ring_finger3'), - id=31, - color=[51, 0, 255]), - 32: - dict( - link=('left_ring_finger3', 'left_ring_finger4'), - id=32, - color=[128, 0, 255]), - 33: - dict( - link=('left_hand_root', 'left_pinky_finger1'), - id=33, - color=[204, 0, 255]), - 34: - dict( - link=('left_pinky_finger1', 'left_pinky_finger2'), - id=34, - color=[255, 0, 230]), - 35: - dict( - link=('left_pinky_finger2', 'left_pinky_finger3'), - id=35, - color=[255, 0, 153]), - 36: - dict( - link=('left_pinky_finger3', 'left_pinky_finger4'), - id=36, - color=[255, 0, 76]), - 37: - dict( - link=('right_hand_root', 'right_thumb1'), id=37, color=[255, 0, - 0]), - 38: - dict(link=('right_thumb1', 'right_thumb2'), id=38, color=[255, 76, 0]), - 39: - dict( - link=('right_thumb2', 'right_thumb3'), id=39, color=[255, 153, 0]), - 40: - dict( - link=('right_thumb3', 'right_thumb4'), id=40, color=[255, 230, 0]), - 41: - dict( - link=('right_hand_root', 'right_forefinger1'), - id=41, - color=[204, 255, 0]), - 42: - dict( - link=('right_forefinger1', 'right_forefinger2'), - id=42, - color=[128, 255, 0]), - 43: - dict( - link=('right_forefinger2', 'right_forefinger3'), - id=43, - color=[51, 255, 0]), - 44: - dict( - link=('right_forefinger3', 'right_forefinger4'), - id=44, - color=[0, 255, 26]), - 45: - dict( - link=('right_hand_root', 'right_middle_finger1'), - id=45, - color=[0, 255, 102]), - 46: - dict( - link=('right_middle_finger1', 'right_middle_finger2'), - id=46, - color=[0, 255, 178]), - 47: - dict( - link=('right_middle_finger2', 'right_middle_finger3'), - id=47, - color=[255, 255, 255]), - 48: - dict( - link=('right_middle_finger3', 'right_middle_finger4'), - id=48, - color=[0, 178, 255]), - 49: - dict( - link=('right_hand_root', 'right_ring_finger1'), - id=49, - color=[0, 102, 255]), - 50: - dict( - link=('right_ring_finger1', 'right_ring_finger2'), - id=50, - color=[0, 26, 255]), - 51: - dict( - link=('right_ring_finger2', 'right_ring_finger3'), - id=51, - color=[51, 0, 255]), - 52: - dict( - link=('right_ring_finger3', 'right_ring_finger4'), - id=52, - color=[128, 0, 255]), - 53: - dict( - link=('right_hand_root', 'right_pinky_finger1'), - id=53, - color=[204, 0, 255]), - 54: - dict( - link=('right_pinky_finger1', 'right_pinky_finger2'), - id=54, - color=[255, 0, 230]), - 55: - dict( - link=('right_pinky_finger2', 'right_pinky_finger3'), - id=55, - color=[255, 0, 
153]), - 56: - dict( - link=('right_pinky_finger3', 'right_pinky_finger4'), - id=56, - color=[255, 0, 76]) + 0: dict(link=("neck", "right_shoulder"), id=0, color=[255, 0, 0]), + 1: dict(link=("neck", "left_shoulder"), id=1, color=[255, 85, 0]), + 2: dict(link=("right_shoulder", "right_elbow"), id=2, color=[255, 170, 0]), + 3: dict(link=("right_elbow", "right_wrist"), id=3, color=[255, 255, 0]), + 4: dict(link=("left_shoulder", "left_elbow"), id=4, color=[170, 255, 0]), + 5: dict(link=("left_elbow", "left_wrist"), id=5, color=[85, 255, 0]), + 6: dict(link=("neck", "right_hip"), id=6, color=[0, 255, 0]), + 7: dict(link=("right_hip", "right_knee"), id=7, color=[0, 255, 85]), + 8: dict(link=("right_knee", "right_ankle"), id=8, color=[0, 255, 170]), + 9: dict(link=("neck", "left_hip"), id=9, color=[0, 255, 225]), + 10: dict(link=("left_hip", "left_knee"), id=10, color=[0, 170, 255]), + 11: dict(link=("left_knee", "left_ankle"), id=11, color=[0, 85, 255]), + 12: dict(link=("neck", "nose"), id=12, color=[0, 0, 255]), + 13: dict(link=("nose", "right_eye"), id=13, color=[255, 0, 170]), + 14: dict(link=("right_eye", "right_ear"), id=14, color=[170, 0, 255]), + 15: dict(link=("nose", "left_eye"), id=15, color=[255, 0, 255]), + 16: dict(link=("left_eye", "left_ear"), id=16, color=[255, 0, 170]), + 17: dict(link=("left_hand_root", "left_thumb1"), id=17, color=[255, 0, 0]), + 18: dict(link=("left_thumb1", "left_thumb2"), id=18, color=[255, 76, 0]), + 19: dict(link=("left_thumb2", "left_thumb3"), id=19, color=[255, 153, 0]), + 20: dict(link=("left_thumb3", "left_thumb4"), id=20, color=[255, 230, 0]), + 21: dict(link=("left_hand_root", "left_forefinger1"), id=21, color=[204, 255, 0]), + 22: dict(link=("left_forefinger1", "left_forefinger2"), id=22, color=[128, 255, 0]), + 23: dict(link=("left_forefinger2", "left_forefinger3"), id=23, color=[51, 255, 0]), + 24: dict(link=("left_forefinger3", "left_forefinger4"), id=24, color=[0, 255, 26]), + 25: dict(link=("left_hand_root", "left_middle_finger1"), id=25, color=[0, 255, 102]), + 26: dict(link=("left_middle_finger1", "left_middle_finger2"), id=26, color=[0, 255, 178]), + 27: dict(link=("left_middle_finger2", "left_middle_finger3"), id=27, color=[0, 255, 255]), + 28: dict(link=("left_middle_finger3", "left_middle_finger4"), id=28, color=[0, 178, 255]), + 29: dict(link=("left_hand_root", "left_ring_finger1"), id=29, color=[0, 102, 255]), + 30: dict(link=("left_ring_finger1", "left_ring_finger2"), id=30, color=[0, 26, 255]), + 31: dict(link=("left_ring_finger2", "left_ring_finger3"), id=31, color=[51, 0, 255]), + 32: dict(link=("left_ring_finger3", "left_ring_finger4"), id=32, color=[128, 0, 255]), + 33: dict(link=("left_hand_root", "left_pinky_finger1"), id=33, color=[204, 0, 255]), + 34: dict(link=("left_pinky_finger1", "left_pinky_finger2"), id=34, color=[255, 0, 230]), + 35: dict(link=("left_pinky_finger2", "left_pinky_finger3"), id=35, color=[255, 0, 153]), + 36: dict(link=("left_pinky_finger3", "left_pinky_finger4"), id=36, color=[255, 0, 76]), + 37: dict(link=("right_hand_root", "right_thumb1"), id=37, color=[255, 0, 0]), + 38: dict(link=("right_thumb1", "right_thumb2"), id=38, color=[255, 76, 0]), + 39: dict(link=("right_thumb2", "right_thumb3"), id=39, color=[255, 153, 0]), + 40: dict(link=("right_thumb3", "right_thumb4"), id=40, color=[255, 230, 0]), + 41: dict(link=("right_hand_root", "right_forefinger1"), id=41, color=[204, 255, 0]), + 42: dict(link=("right_forefinger1", "right_forefinger2"), id=42, color=[128, 255, 0]), + 43: 
dict(link=("right_forefinger2", "right_forefinger3"), id=43, color=[51, 255, 0]), + 44: dict(link=("right_forefinger3", "right_forefinger4"), id=44, color=[0, 255, 26]), + 45: dict(link=("right_hand_root", "right_middle_finger1"), id=45, color=[0, 255, 102]), + 46: dict(link=("right_middle_finger1", "right_middle_finger2"), id=46, color=[0, 255, 178]), + 47: dict(link=("right_middle_finger2", "right_middle_finger3"), id=47, color=[255, 255, 255]), + 48: dict(link=("right_middle_finger3", "right_middle_finger4"), id=48, color=[0, 178, 255]), + 49: dict(link=("right_hand_root", "right_ring_finger1"), id=49, color=[0, 102, 255]), + 50: dict(link=("right_ring_finger1", "right_ring_finger2"), id=50, color=[0, 26, 255]), + 51: dict(link=("right_ring_finger2", "right_ring_finger3"), id=51, color=[51, 0, 255]), + 52: dict(link=("right_ring_finger3", "right_ring_finger4"), id=52, color=[128, 0, 255]), + 53: dict(link=("right_hand_root", "right_pinky_finger1"), id=53, color=[204, 0, 255]), + 54: dict(link=("right_pinky_finger1", "right_pinky_finger2"), id=54, color=[255, 0, 230]), + 55: dict(link=("right_pinky_finger2", "right_pinky_finger3"), id=55, color=[255, 0, 153]), + 56: dict(link=("right_pinky_finger3", "right_pinky_finger4"), id=56, color=[255, 0, 76]), }, - joint_weights=[1.] * 134, + joint_weights=[1.0] * 134, # 'https://github.com/jin-s13/COCO-WholeBody/blob/master/' # 'evaluation/myeval_wholebody.py#L175' sigmas=[ - 0.026, 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, - 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.068, 0.066, - 0.066, 0.092, 0.094, 0.094, 0.042, 0.043, 0.044, 0.043, 0.040, 0.035, - 0.031, 0.025, 0.020, 0.023, 0.029, 0.032, 0.037, 0.038, 0.043, 0.041, - 0.045, 0.013, 0.012, 0.011, 0.011, 0.012, 0.012, 0.011, 0.011, 0.013, - 0.015, 0.009, 0.007, 0.007, 0.007, 0.012, 0.009, 0.008, 0.016, 0.010, - 0.017, 0.011, 0.009, 0.011, 0.009, 0.007, 0.013, 0.008, 0.011, 0.012, - 0.010, 0.034, 0.008, 0.008, 0.009, 0.008, 0.008, 0.007, 0.010, 0.008, - 0.009, 0.009, 0.009, 0.007, 0.007, 0.008, 0.011, 0.008, 0.008, 0.008, - 0.01, 0.008, 0.029, 0.022, 0.035, 0.037, 0.047, 0.026, 0.025, 0.024, - 0.035, 0.018, 0.024, 0.022, 0.026, 0.017, 0.021, 0.021, 0.032, 0.02, - 0.019, 0.022, 0.031, 0.029, 0.022, 0.035, 0.037, 0.047, 0.026, 0.025, - 0.024, 0.035, 0.018, 0.024, 0.022, 0.026, 0.017, 0.021, 0.021, 0.032, - 0.02, 0.019, 0.022, 0.031 - ]) + 0.026, + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.068, + 0.066, + 0.066, + 0.092, + 0.094, + 0.094, + 0.042, + 0.043, + 0.044, + 0.043, + 0.040, + 0.035, + 0.031, + 0.025, + 0.020, + 0.023, + 0.029, + 0.032, + 0.037, + 0.038, + 0.043, + 0.041, + 0.045, + 0.013, + 0.012, + 0.011, + 0.011, + 0.012, + 0.012, + 0.011, + 0.011, + 0.013, + 0.015, + 0.009, + 0.007, + 0.007, + 0.007, + 0.012, + 0.009, + 0.008, + 0.016, + 0.010, + 0.017, + 0.011, + 0.009, + 0.011, + 0.009, + 0.007, + 0.013, + 0.008, + 0.011, + 0.012, + 0.010, + 0.034, + 0.008, + 0.008, + 0.009, + 0.008, + 0.008, + 0.007, + 0.010, + 0.008, + 0.009, + 0.009, + 0.009, + 0.007, + 0.007, + 0.008, + 0.011, + 0.008, + 0.008, + 0.008, + 0.01, + 0.008, + 0.029, + 0.022, + 0.035, + 0.037, + 0.047, + 0.026, + 0.025, + 0.024, + 0.035, + 0.018, + 0.024, + 0.022, + 0.026, + 0.017, + 0.021, + 0.021, + 0.032, + 0.02, + 0.019, + 0.022, + 0.031, + 0.029, + 0.022, + 0.035, + 0.037, + 0.047, + 0.026, + 0.025, + 0.024, + 0.035, + 0.018, + 0.024, + 0.022, + 0.026, + 0.017, + 
0.021,
+        0.021,
+        0.032,
+        0.02,
+        0.019,
+        0.022,
+        0.031,
+    ],
+)
diff --git a/mmpose/configs/_base_/datasets/cofw.py b/mmpose/configs/_base_/datasets/cofw.py
index d528bf2f2f7e63adbff3ed56e18bca8b02165e42..c48368a1bc93958b430e5863e72bacdff4f09773 100644
--- a/mmpose/configs/_base_/datasets/cofw.py
+++ b/mmpose/configs/_base_/datasets/cofw.py
@@ -1,57 +1,44 @@
 dataset_info = dict(
-    dataset_name='cofw',
+    dataset_name="cofw",
     paper_info=dict(
-        author='Burgos-Artizzu, Xavier P and Perona, '
-        r'Pietro and Doll{\'a}r, Piotr',
-        title='Robust face landmark estimation under occlusion',
-        container='Proceedings of the IEEE international '
-        'conference on computer vision',
-        year='2013',
-        homepage='http://www.vision.caltech.edu/xpburgos/ICCV13/',
+        author="Burgos-Artizzu, Xavier P and Perona, " r"Pietro and Doll{\'a}r, Piotr",
+        title="Robust face landmark estimation under occlusion",
+        container="Proceedings of the IEEE international " "conference on computer vision",
+        year="2013",
+        homepage="http://www.vision.caltech.edu/xpburgos/ICCV13/",
     ),
     keypoint_info={
-        0: dict(name='kpt-0', id=0, color=[255, 0, 0], type='', swap='kpt-1'),
-        1: dict(name='kpt-1', id=1, color=[255, 0, 0], type='', swap='kpt-0'),
-        2: dict(name='kpt-2', id=2, color=[255, 0, 0], type='', swap='kpt-3'),
-        3: dict(name='kpt-3', id=3, color=[255, 0, 0], type='', swap='kpt-2'),
-        4: dict(name='kpt-4', id=4, color=[255, 0, 0], type='', swap='kpt-6'),
-        5: dict(name='kpt-5', id=5, color=[255, 0, 0], type='', swap='kpt-7'),
-        6: dict(name='kpt-6', id=6, color=[255, 0, 0], type='', swap='kpt-4'),
-        7: dict(name='kpt-7', id=7, color=[255, 0, 0], type='', swap='kpt-5'),
-        8: dict(name='kpt-8', id=8, color=[255, 0, 0], type='', swap='kpt-9'),
-        9: dict(name='kpt-9', id=9, color=[255, 0, 0], type='', swap='kpt-8'),
-        10:
-        dict(name='kpt-10', id=10, color=[255, 0, 0], type='', swap='kpt-11'),
-        11:
-        dict(name='kpt-11', id=11, color=[255, 0, 0], type='', swap='kpt-10'),
-        12:
-        dict(name='kpt-12', id=12, color=[255, 0, 0], type='', swap='kpt-14'),
-        13:
-        dict(name='kpt-13', id=13, color=[255, 0, 0], type='', swap='kpt-15'),
-        14:
-        dict(name='kpt-14', id=14, color=[255, 0, 0], type='', swap='kpt-12'),
-        15:
-        dict(name='kpt-15', id=15, color=[255, 0, 0], type='', swap='kpt-13'),
-        16:
-        dict(name='kpt-16', id=16, color=[255, 0, 0], type='', swap='kpt-17'),
-        17:
-        dict(name='kpt-17', id=17, color=[255, 0, 0], type='', swap='kpt-16'),
-        18:
-        dict(name='kpt-18', id=18, color=[255, 0, 0], type='', swap='kpt-19'),
-        19:
-        dict(name='kpt-19', id=19, color=[255, 0, 0], type='', swap='kpt-18'),
-        20: dict(name='kpt-20', id=20, color=[255, 0, 0], type='', swap=''),
-        21: dict(name='kpt-21', id=21, color=[255, 0, 0], type='', swap=''),
-        22:
-        dict(name='kpt-22', id=22, color=[255, 0, 0], type='', swap='kpt-23'),
-        23:
-        dict(name='kpt-23', id=23, color=[255, 0, 0], type='', swap='kpt-22'),
-        24: dict(name='kpt-24', id=24, color=[255, 0, 0], type='', swap=''),
-        25: dict(name='kpt-25', id=25, color=[255, 0, 0], type='', swap=''),
-        26: dict(name='kpt-26', id=26, color=[255, 0, 0], type='', swap=''),
-        27: dict(name='kpt-27', id=27, color=[255, 0, 0], type='', swap=''),
-        28: dict(name='kpt-28', id=28, color=[255, 0, 0], type='', swap='')
+        0: dict(name="kpt-0", id=0, color=[255, 0, 0], type="", swap="kpt-1"),
+        1: dict(name="kpt-1", id=1, color=[255, 0, 0], type="", swap="kpt-0"),
+        2: dict(name="kpt-2", id=2, color=[255, 0, 0], type="", swap="kpt-3"),
+        3: dict(name="kpt-3", id=3, color=[255, 0, 0], type="", swap="kpt-2"),
+        4: dict(name="kpt-4", id=4, color=[255, 0, 0], type="", swap="kpt-6"),
+        5: dict(name="kpt-5", id=5, color=[255, 0, 0], type="", swap="kpt-7"),
+        6: dict(name="kpt-6", id=6, color=[255, 0, 0], type="", swap="kpt-4"),
+        7: dict(name="kpt-7", id=7, color=[255, 0, 0], type="", swap="kpt-5"),
+        8: dict(name="kpt-8", id=8, color=[255, 0, 0], type="", swap="kpt-9"),
+        9: dict(name="kpt-9", id=9, color=[255, 0, 0], type="", swap="kpt-8"),
+        10: dict(name="kpt-10", id=10, color=[255, 0, 0], type="", swap="kpt-11"),
+        11: dict(name="kpt-11", id=11, color=[255, 0, 0], type="", swap="kpt-10"),
+        12: dict(name="kpt-12", id=12, color=[255, 0, 0], type="", swap="kpt-14"),
+        13: dict(name="kpt-13", id=13, color=[255, 0, 0], type="", swap="kpt-15"),
+        14: dict(name="kpt-14", id=14, color=[255, 0, 0], type="", swap="kpt-12"),
+        15: dict(name="kpt-15", id=15, color=[255, 0, 0], type="", swap="kpt-13"),
+        16: dict(name="kpt-16", id=16, color=[255, 0, 0], type="", swap="kpt-17"),
+        17: dict(name="kpt-17", id=17, color=[255, 0, 0], type="", swap="kpt-16"),
+        18: dict(name="kpt-18", id=18, color=[255, 0, 0], type="", swap="kpt-19"),
+        19: dict(name="kpt-19", id=19, color=[255, 0, 0], type="", swap="kpt-18"),
+        20: dict(name="kpt-20", id=20, color=[255, 0, 0], type="", swap=""),
+        21: dict(name="kpt-21", id=21, color=[255, 0, 0], type="", swap=""),
+        22: dict(name="kpt-22", id=22, color=[255, 0, 0], type="", swap="kpt-23"),
+        23: dict(name="kpt-23", id=23, color=[255, 0, 0], type="", swap="kpt-22"),
+        24: dict(name="kpt-24", id=24, color=[255, 0, 0], type="", swap=""),
+        25: dict(name="kpt-25", id=25, color=[255, 0, 0], type="", swap=""),
+        26: dict(name="kpt-26", id=26, color=[255, 0, 0], type="", swap=""),
+        27: dict(name="kpt-27", id=27, color=[255, 0, 0], type="", swap=""),
+        28: dict(name="kpt-28", id=28, color=[255, 0, 0], type="", swap=""),
     },
     skeleton_info={},
-    joint_weights=[1.] * 29,
-    sigmas=[])
+    joint_weights=[1.0] * 29,
+    sigmas=[],
+)
diff --git a/mmpose/configs/_base_/datasets/crowdpose.py b/mmpose/configs/_base_/datasets/crowdpose.py
index 45086531a601870716eed15a32c5413c0e24b7ae..d1fdf79b60b82e6ef59bb7b2bd904e015fc619e5 100644
--- a/mmpose/configs/_base_/datasets/crowdpose.py
+++ b/mmpose/configs/_base_/datasets/crowdpose.py
@@ -1,147 +1,45 @@
 dataset_info = dict(
-    dataset_name='crowdpose',
+    dataset_name="crowdpose",
     paper_info=dict(
-        author='Li, Jiefeng and Wang, Can and Zhu, Hao and '
-        'Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu',
-        title='CrowdPose: Efficient Crowded Scenes Pose Estimation '
-        'and A New Benchmark',
-        container='Proceedings of IEEE Conference on Computer '
-        'Vision and Pattern Recognition (CVPR)',
-        year='2019',
-        homepage='https://github.com/Jeff-sjtu/CrowdPose',
+        author="Li, Jiefeng and Wang, Can and Zhu, Hao and " "Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu",
+        title="CrowdPose: Efficient Crowded Scenes Pose Estimation " "and A New Benchmark",
+        container="Proceedings of IEEE Conference on Computer " "Vision and Pattern Recognition (CVPR)",
+        year="2019",
+        homepage="https://github.com/Jeff-sjtu/CrowdPose",
     ),
     keypoint_info={
-        0:
-        dict(
-            name='left_shoulder',
-            id=0,
-            color=[51, 153, 255],
-            type='upper',
-            swap='right_shoulder'),
-        1:
-        dict(
-            name='right_shoulder',
-            id=1,
-            color=[51, 153, 255],
-            type='upper',
-            swap='left_shoulder'),
-        2:
-        dict(
-            name='left_elbow',
-            id=2,
-            color=[51, 153, 255],
-            type='upper',
-            swap='right_elbow'),
-        3:
-        dict(
-            name='right_elbow',
-            id=3,
-            color=[51, 153, 255],
-            type='upper',
-            swap='left_elbow'),
-        4:
-        dict(
-            name='left_wrist',
-            id=4,
-            color=[51, 153, 255],
-            type='upper',
-            swap='right_wrist'),
-        5:
-        dict(
-            name='right_wrist',
-            id=5,
-            color=[0, 255, 0],
-            type='upper',
-            swap='left_wrist'),
-        6:
-        dict(
-            name='left_hip',
-            id=6,
-            color=[255, 128, 0],
-            type='lower',
-            swap='right_hip'),
-        7:
-        dict(
-            name='right_hip',
-            id=7,
-            color=[0, 255, 0],
-            type='lower',
-            swap='left_hip'),
-        8:
-        dict(
-            name='left_knee',
-            id=8,
-            color=[255, 128, 0],
-            type='lower',
-            swap='right_knee'),
-        9:
-        dict(
-            name='right_knee',
-            id=9,
-            color=[0, 255, 0],
-            type='lower',
-            swap='left_knee'),
-        10:
-        dict(
-            name='left_ankle',
-            id=10,
-            color=[255, 128, 0],
-            type='lower',
-            swap='right_ankle'),
-        11:
-        dict(
-            name='right_ankle',
-            id=11,
-            color=[0, 255, 0],
-            type='lower',
-            swap='left_ankle'),
-        12:
-        dict(
-            name='top_head', id=12, color=[255, 128, 0], type='upper',
-            swap=''),
-        13:
-        dict(name='neck', id=13, color=[0, 255, 0], type='upper', swap='')
+        0: dict(name="left_shoulder", id=0, color=[51, 153, 255], type="upper", swap="right_shoulder"),
+        1: dict(name="right_shoulder", id=1, color=[51, 153, 255], type="upper", swap="left_shoulder"),
+        2: dict(name="left_elbow", id=2, color=[51, 153, 255], type="upper", swap="right_elbow"),
+        3: dict(name="right_elbow", id=3, color=[51, 153, 255], type="upper", swap="left_elbow"),
+        4: dict(name="left_wrist", id=4, color=[51, 153, 255], type="upper", swap="right_wrist"),
+        5: dict(name="right_wrist", id=5, color=[0, 255, 0], type="upper", swap="left_wrist"),
+        6: dict(name="left_hip", id=6, color=[255, 128, 0], type="lower", swap="right_hip"),
+        7: dict(name="right_hip", id=7, color=[0, 255, 0], type="lower", swap="left_hip"),
+        8: dict(name="left_knee", id=8, color=[255, 128, 0], type="lower", swap="right_knee"),
+        9: dict(name="right_knee", id=9, color=[0, 255, 0], type="lower", swap="left_knee"),
+        10: dict(name="left_ankle", id=10, color=[255, 128, 0], type="lower", swap="right_ankle"),
+        11: dict(name="right_ankle", id=11, color=[0, 255, 0], type="lower", swap="left_ankle"),
+        12: dict(name="top_head", id=12, color=[255, 128, 0], type="upper", swap=""),
+        13: dict(name="neck", id=13, color=[0, 255, 0], type="upper", swap=""),
     },
     skeleton_info={
-        0:
-        dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]),
-        1:
-        dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]),
-        2:
-        dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]),
-        3:
-        dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]),
-        4:
-        dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]),
-        5:
-        dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]),
-        6:
-        dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]),
-        7:
-        dict(
-            link=('left_shoulder', 'right_shoulder'),
-            id=7,
-            color=[51, 153, 255]),
-        8:
-        dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]),
-        9:
-        dict(
-            link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]),
-        10:
-        dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]),
-        11:
-        dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]),
-        12:
-        dict(link=('top_head', 'neck'), id=12, color=[51, 153, 255]),
-        13:
-        dict(link=('right_shoulder', 'neck'), id=13, color=[51, 153, 255]),
-        14:
-        dict(link=('left_shoulder', 'neck'), id=14, color=[51, 153, 255])
+        0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]),
+        1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]),
+        2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]),
+        3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]),
+        4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]),
+        5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]),
+        6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]),
+        7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]),
+        8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]),
+        9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]),
+        10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]),
+        11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]),
+        12: dict(link=("top_head", "neck"), id=12, color=[51, 153, 255]),
+        13: dict(link=("right_shoulder", "neck"), id=13, color=[51, 153, 255]),
+        14: dict(link=("left_shoulder", "neck"), id=14, color=[51, 153, 255]),
     },
-    joint_weights=[
-        0.2, 0.2, 0.2, 1.3, 1.5, 0.2, 1.3, 1.5, 0.2, 0.2, 0.5, 0.2, 0.2, 0.5
-    ],
-    sigmas=[
-        0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087,
-        0.089, 0.089, 0.079, 0.079
-    ])
+    joint_weights=[0.2, 0.2, 0.2, 1.3, 1.5, 0.2, 1.3, 1.5, 0.2, 0.2, 0.5, 0.2, 0.2, 0.5],
+    sigmas=[0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.079, 0.079],
+)
diff --git a/mmpose/configs/_base_/datasets/deepfashion2.py b/mmpose/configs/_base_/datasets/deepfashion2.py
index f65d1bb591fab8f06a79b5d595478a282acd8b3e..595c4467b179f7e84932df15b084eac42dbf486c 100644
--- a/mmpose/configs/_base_/datasets/deepfashion2.py
+++ b/mmpose/configs/_base_/datasets/deepfashion2.py
@@ -14,2647 +14,644 @@ colors = dict(
     sd=[128, 64, 0],  # sling_dress
 )
 dataset_info = dict(
-    dataset_name='deepfashion2',
+    dataset_name="deepfashion2",
     paper_info=dict(
-        author='Yuying Ge and Ruimao Zhang and Lingyun Wu '
-
'and Xiaogang Wang and Xiaoou Tang and Ping Luo', - title='DeepFashion2: A Versatile Benchmark for ' - 'Detection, Pose Estimation, Segmentation and ' - 'Re-Identification of Clothing Images', - container='Proceedings of IEEE Conference on Computer ' - 'Vision and Pattern Recognition (CVPR)', - year='2019', - homepage='https://github.com/switchablenorms/DeepFashion2', + author="Yuying Ge and Ruimao Zhang and Lingyun Wu " "and Xiaogang Wang and Xiaoou Tang and Ping Luo", + title="DeepFashion2: A Versatile Benchmark for " + "Detection, Pose Estimation, Segmentation and " + "Re-Identification of Clothing Images", + container="Proceedings of IEEE Conference on Computer " "Vision and Pattern Recognition (CVPR)", + year="2019", + homepage="https://github.com/switchablenorms/DeepFashion2", ), keypoint_info={ # short_sleeved_shirt - 0: - dict(name='sss_kpt1', id=0, color=colors['sss'], type='', swap=''), - 1: - dict( - name='sss_kpt2', - id=1, - color=colors['sss'], - type='', - swap='sss_kpt6'), - 2: - dict( - name='sss_kpt3', - id=2, - color=colors['sss'], - type='', - swap='sss_kpt5'), - 3: - dict(name='sss_kpt4', id=3, color=colors['sss'], type='', swap=''), - 4: - dict( - name='sss_kpt5', - id=4, - color=colors['sss'], - type='', - swap='sss_kpt3'), - 5: - dict( - name='sss_kpt6', - id=5, - color=colors['sss'], - type='', - swap='sss_kpt2'), - 6: - dict( - name='sss_kpt7', - id=6, - color=colors['sss'], - type='', - swap='sss_kpt25'), - 7: - dict( - name='sss_kpt8', - id=7, - color=colors['sss'], - type='', - swap='sss_kpt24'), - 8: - dict( - name='sss_kpt9', - id=8, - color=colors['sss'], - type='', - swap='sss_kpt23'), - 9: - dict( - name='sss_kpt10', - id=9, - color=colors['sss'], - type='', - swap='sss_kpt22'), - 10: - dict( - name='sss_kpt11', - id=10, - color=colors['sss'], - type='', - swap='sss_kpt21'), - 11: - dict( - name='sss_kpt12', - id=11, - color=colors['sss'], - type='', - swap='sss_kpt20'), - 12: - dict( - name='sss_kpt13', - id=12, - color=colors['sss'], - type='', - swap='sss_kpt19'), - 13: - dict( - name='sss_kpt14', - id=13, - color=colors['sss'], - type='', - swap='sss_kpt18'), - 14: - dict( - name='sss_kpt15', - id=14, - color=colors['sss'], - type='', - swap='sss_kpt17'), - 15: - dict(name='sss_kpt16', id=15, color=colors['sss'], type='', swap=''), - 16: - dict( - name='sss_kpt17', - id=16, - color=colors['sss'], - type='', - swap='sss_kpt15'), - 17: - dict( - name='sss_kpt18', - id=17, - color=colors['sss'], - type='', - swap='sss_kpt14'), - 18: - dict( - name='sss_kpt19', - id=18, - color=colors['sss'], - type='', - swap='sss_kpt13'), - 19: - dict( - name='sss_kpt20', - id=19, - color=colors['sss'], - type='', - swap='sss_kpt12'), - 20: - dict( - name='sss_kpt21', - id=20, - color=colors['sss'], - type='', - swap='sss_kpt11'), - 21: - dict( - name='sss_kpt22', - id=21, - color=colors['sss'], - type='', - swap='sss_kpt10'), - 22: - dict( - name='sss_kpt23', - id=22, - color=colors['sss'], - type='', - swap='sss_kpt9'), - 23: - dict( - name='sss_kpt24', - id=23, - color=colors['sss'], - type='', - swap='sss_kpt8'), - 24: - dict( - name='sss_kpt25', - id=24, - color=colors['sss'], - type='', - swap='sss_kpt7'), + 0: dict(name="sss_kpt1", id=0, color=colors["sss"], type="", swap=""), + 1: dict(name="sss_kpt2", id=1, color=colors["sss"], type="", swap="sss_kpt6"), + 2: dict(name="sss_kpt3", id=2, color=colors["sss"], type="", swap="sss_kpt5"), + 3: dict(name="sss_kpt4", id=3, color=colors["sss"], type="", swap=""), + 4: dict(name="sss_kpt5", id=4, color=colors["sss"], 
type="", swap="sss_kpt3"), + 5: dict(name="sss_kpt6", id=5, color=colors["sss"], type="", swap="sss_kpt2"), + 6: dict(name="sss_kpt7", id=6, color=colors["sss"], type="", swap="sss_kpt25"), + 7: dict(name="sss_kpt8", id=7, color=colors["sss"], type="", swap="sss_kpt24"), + 8: dict(name="sss_kpt9", id=8, color=colors["sss"], type="", swap="sss_kpt23"), + 9: dict(name="sss_kpt10", id=9, color=colors["sss"], type="", swap="sss_kpt22"), + 10: dict(name="sss_kpt11", id=10, color=colors["sss"], type="", swap="sss_kpt21"), + 11: dict(name="sss_kpt12", id=11, color=colors["sss"], type="", swap="sss_kpt20"), + 12: dict(name="sss_kpt13", id=12, color=colors["sss"], type="", swap="sss_kpt19"), + 13: dict(name="sss_kpt14", id=13, color=colors["sss"], type="", swap="sss_kpt18"), + 14: dict(name="sss_kpt15", id=14, color=colors["sss"], type="", swap="sss_kpt17"), + 15: dict(name="sss_kpt16", id=15, color=colors["sss"], type="", swap=""), + 16: dict(name="sss_kpt17", id=16, color=colors["sss"], type="", swap="sss_kpt15"), + 17: dict(name="sss_kpt18", id=17, color=colors["sss"], type="", swap="sss_kpt14"), + 18: dict(name="sss_kpt19", id=18, color=colors["sss"], type="", swap="sss_kpt13"), + 19: dict(name="sss_kpt20", id=19, color=colors["sss"], type="", swap="sss_kpt12"), + 20: dict(name="sss_kpt21", id=20, color=colors["sss"], type="", swap="sss_kpt11"), + 21: dict(name="sss_kpt22", id=21, color=colors["sss"], type="", swap="sss_kpt10"), + 22: dict(name="sss_kpt23", id=22, color=colors["sss"], type="", swap="sss_kpt9"), + 23: dict(name="sss_kpt24", id=23, color=colors["sss"], type="", swap="sss_kpt8"), + 24: dict(name="sss_kpt25", id=24, color=colors["sss"], type="", swap="sss_kpt7"), # long_sleeved_shirt - 25: - dict(name='lss_kpt1', id=25, color=colors['lss'], type='', swap=''), - 26: - dict( - name='lss_kpt2', - id=26, - color=colors['lss'], - type='', - swap='lss_kpt6'), - 27: - dict( - name='lss_kpt3', - id=27, - color=colors['lss'], - type='', - swap='lss_kpt5'), - 28: - dict(name='lss_kpt4', id=28, color=colors['lss'], type='', swap=''), - 29: - dict( - name='lss_kpt5', - id=29, - color=colors['lss'], - type='', - swap='lss_kpt3'), - 30: - dict( - name='lss_kpt6', - id=30, - color=colors['lss'], - type='', - swap='lss_kpt2'), - 31: - dict( - name='lss_kpt7', - id=31, - color=colors['lss'], - type='', - swap='lss_kpt33'), - 32: - dict( - name='lss_kpt8', - id=32, - color=colors['lss'], - type='', - swap='lss_kpt32'), - 33: - dict( - name='lss_kpt9', - id=33, - color=colors['lss'], - type='', - swap='lss_kpt31'), - 34: - dict( - name='lss_kpt10', - id=34, - color=colors['lss'], - type='', - swap='lss_kpt30'), - 35: - dict( - name='lss_kpt11', - id=35, - color=colors['lss'], - type='', - swap='lss_kpt29'), - 36: - dict( - name='lss_kpt12', - id=36, - color=colors['lss'], - type='', - swap='lss_kpt28'), - 37: - dict( - name='lss_kpt13', - id=37, - color=colors['lss'], - type='', - swap='lss_kpt27'), - 38: - dict( - name='lss_kpt14', - id=38, - color=colors['lss'], - type='', - swap='lss_kpt26'), - 39: - dict( - name='lss_kpt15', - id=39, - color=colors['lss'], - type='', - swap='lss_kpt25'), - 40: - dict( - name='lss_kpt16', - id=40, - color=colors['lss'], - type='', - swap='lss_kpt24'), - 41: - dict( - name='lss_kpt17', - id=41, - color=colors['lss'], - type='', - swap='lss_kpt23'), - 42: - dict( - name='lss_kpt18', - id=42, - color=colors['lss'], - type='', - swap='lss_kpt22'), - 43: - dict( - name='lss_kpt19', - id=43, - color=colors['lss'], - type='', - swap='lss_kpt21'), - 44: - 
dict(name='lss_kpt20', id=44, color=colors['lss'], type='', swap=''), - 45: - dict( - name='lss_kpt21', - id=45, - color=colors['lss'], - type='', - swap='lss_kpt19'), - 46: - dict( - name='lss_kpt22', - id=46, - color=colors['lss'], - type='', - swap='lss_kpt18'), - 47: - dict( - name='lss_kpt23', - id=47, - color=colors['lss'], - type='', - swap='lss_kpt17'), - 48: - dict( - name='lss_kpt24', - id=48, - color=colors['lss'], - type='', - swap='lss_kpt16'), - 49: - dict( - name='lss_kpt25', - id=49, - color=colors['lss'], - type='', - swap='lss_kpt15'), - 50: - dict( - name='lss_kpt26', - id=50, - color=colors['lss'], - type='', - swap='lss_kpt14'), - 51: - dict( - name='lss_kpt27', - id=51, - color=colors['lss'], - type='', - swap='lss_kpt13'), - 52: - dict( - name='lss_kpt28', - id=52, - color=colors['lss'], - type='', - swap='lss_kpt12'), - 53: - dict( - name='lss_kpt29', - id=53, - color=colors['lss'], - type='', - swap='lss_kpt11'), - 54: - dict( - name='lss_kpt30', - id=54, - color=colors['lss'], - type='', - swap='lss_kpt10'), - 55: - dict( - name='lss_kpt31', - id=55, - color=colors['lss'], - type='', - swap='lss_kpt9'), - 56: - dict( - name='lss_kpt32', - id=56, - color=colors['lss'], - type='', - swap='lss_kpt8'), - 57: - dict( - name='lss_kpt33', - id=57, - color=colors['lss'], - type='', - swap='lss_kpt7'), + 25: dict(name="lss_kpt1", id=25, color=colors["lss"], type="", swap=""), + 26: dict(name="lss_kpt2", id=26, color=colors["lss"], type="", swap="lss_kpt6"), + 27: dict(name="lss_kpt3", id=27, color=colors["lss"], type="", swap="lss_kpt5"), + 28: dict(name="lss_kpt4", id=28, color=colors["lss"], type="", swap=""), + 29: dict(name="lss_kpt5", id=29, color=colors["lss"], type="", swap="lss_kpt3"), + 30: dict(name="lss_kpt6", id=30, color=colors["lss"], type="", swap="lss_kpt2"), + 31: dict(name="lss_kpt7", id=31, color=colors["lss"], type="", swap="lss_kpt33"), + 32: dict(name="lss_kpt8", id=32, color=colors["lss"], type="", swap="lss_kpt32"), + 33: dict(name="lss_kpt9", id=33, color=colors["lss"], type="", swap="lss_kpt31"), + 34: dict(name="lss_kpt10", id=34, color=colors["lss"], type="", swap="lss_kpt30"), + 35: dict(name="lss_kpt11", id=35, color=colors["lss"], type="", swap="lss_kpt29"), + 36: dict(name="lss_kpt12", id=36, color=colors["lss"], type="", swap="lss_kpt28"), + 37: dict(name="lss_kpt13", id=37, color=colors["lss"], type="", swap="lss_kpt27"), + 38: dict(name="lss_kpt14", id=38, color=colors["lss"], type="", swap="lss_kpt26"), + 39: dict(name="lss_kpt15", id=39, color=colors["lss"], type="", swap="lss_kpt25"), + 40: dict(name="lss_kpt16", id=40, color=colors["lss"], type="", swap="lss_kpt24"), + 41: dict(name="lss_kpt17", id=41, color=colors["lss"], type="", swap="lss_kpt23"), + 42: dict(name="lss_kpt18", id=42, color=colors["lss"], type="", swap="lss_kpt22"), + 43: dict(name="lss_kpt19", id=43, color=colors["lss"], type="", swap="lss_kpt21"), + 44: dict(name="lss_kpt20", id=44, color=colors["lss"], type="", swap=""), + 45: dict(name="lss_kpt21", id=45, color=colors["lss"], type="", swap="lss_kpt19"), + 46: dict(name="lss_kpt22", id=46, color=colors["lss"], type="", swap="lss_kpt18"), + 47: dict(name="lss_kpt23", id=47, color=colors["lss"], type="", swap="lss_kpt17"), + 48: dict(name="lss_kpt24", id=48, color=colors["lss"], type="", swap="lss_kpt16"), + 49: dict(name="lss_kpt25", id=49, color=colors["lss"], type="", swap="lss_kpt15"), + 50: dict(name="lss_kpt26", id=50, color=colors["lss"], type="", swap="lss_kpt14"), + 51: dict(name="lss_kpt27", id=51, 
color=colors["lss"], type="", swap="lss_kpt13"), + 52: dict(name="lss_kpt28", id=52, color=colors["lss"], type="", swap="lss_kpt12"), + 53: dict(name="lss_kpt29", id=53, color=colors["lss"], type="", swap="lss_kpt11"), + 54: dict(name="lss_kpt30", id=54, color=colors["lss"], type="", swap="lss_kpt10"), + 55: dict(name="lss_kpt31", id=55, color=colors["lss"], type="", swap="lss_kpt9"), + 56: dict(name="lss_kpt32", id=56, color=colors["lss"], type="", swap="lss_kpt8"), + 57: dict(name="lss_kpt33", id=57, color=colors["lss"], type="", swap="lss_kpt7"), # short_sleeved_outwear - 58: - dict(name='sso_kpt1', id=58, color=colors['sso'], type='', swap=''), - 59: - dict( - name='sso_kpt2', - id=59, - color=colors['sso'], - type='', - swap='sso_kpt26'), - 60: - dict( - name='sso_kpt3', - id=60, - color=colors['sso'], - type='', - swap='sso_kpt5'), - 61: - dict( - name='sso_kpt4', - id=61, - color=colors['sso'], - type='', - swap='sso_kpt6'), - 62: - dict( - name='sso_kpt5', - id=62, - color=colors['sso'], - type='', - swap='sso_kpt3'), - 63: - dict( - name='sso_kpt6', - id=63, - color=colors['sso'], - type='', - swap='sso_kpt4'), - 64: - dict( - name='sso_kpt7', - id=64, - color=colors['sso'], - type='', - swap='sso_kpt25'), - 65: - dict( - name='sso_kpt8', - id=65, - color=colors['sso'], - type='', - swap='sso_kpt24'), - 66: - dict( - name='sso_kpt9', - id=66, - color=colors['sso'], - type='', - swap='sso_kpt23'), - 67: - dict( - name='sso_kpt10', - id=67, - color=colors['sso'], - type='', - swap='sso_kpt22'), - 68: - dict( - name='sso_kpt11', - id=68, - color=colors['sso'], - type='', - swap='sso_kpt21'), - 69: - dict( - name='sso_kpt12', - id=69, - color=colors['sso'], - type='', - swap='sso_kpt20'), - 70: - dict( - name='sso_kpt13', - id=70, - color=colors['sso'], - type='', - swap='sso_kpt19'), - 71: - dict( - name='sso_kpt14', - id=71, - color=colors['sso'], - type='', - swap='sso_kpt18'), - 72: - dict( - name='sso_kpt15', - id=72, - color=colors['sso'], - type='', - swap='sso_kpt17'), - 73: - dict( - name='sso_kpt16', - id=73, - color=colors['sso'], - type='', - swap='sso_kpt29'), - 74: - dict( - name='sso_kpt17', - id=74, - color=colors['sso'], - type='', - swap='sso_kpt15'), - 75: - dict( - name='sso_kpt18', - id=75, - color=colors['sso'], - type='', - swap='sso_kpt14'), - 76: - dict( - name='sso_kpt19', - id=76, - color=colors['sso'], - type='', - swap='sso_kpt13'), - 77: - dict( - name='sso_kpt20', - id=77, - color=colors['sso'], - type='', - swap='sso_kpt12'), - 78: - dict( - name='sso_kpt21', - id=78, - color=colors['sso'], - type='', - swap='sso_kpt11'), - 79: - dict( - name='sso_kpt22', - id=79, - color=colors['sso'], - type='', - swap='sso_kpt10'), - 80: - dict( - name='sso_kpt23', - id=80, - color=colors['sso'], - type='', - swap='sso_kpt9'), - 81: - dict( - name='sso_kpt24', - id=81, - color=colors['sso'], - type='', - swap='sso_kpt8'), - 82: - dict( - name='sso_kpt25', - id=82, - color=colors['sso'], - type='', - swap='sso_kpt7'), - 83: - dict( - name='sso_kpt26', - id=83, - color=colors['sso'], - type='', - swap='sso_kpt2'), - 84: - dict( - name='sso_kpt27', - id=84, - color=colors['sso'], - type='', - swap='sso_kpt30'), - 85: - dict( - name='sso_kpt28', - id=85, - color=colors['sso'], - type='', - swap='sso_kpt31'), - 86: - dict( - name='sso_kpt29', - id=86, - color=colors['sso'], - type='', - swap='sso_kpt16'), - 87: - dict( - name='sso_kpt30', - id=87, - color=colors['sso'], - type='', - swap='sso_kpt27'), - 88: - dict( - name='sso_kpt31', - id=88, - color=colors['sso'], - 
+ 58: dict(name="sso_kpt1", id=58, color=colors["sso"], type="", swap=""),
+ 59: dict(name="sso_kpt2", id=59, color=colors["sso"], type="", swap="sso_kpt26"),
+ 60: dict(name="sso_kpt3", id=60, color=colors["sso"], type="", swap="sso_kpt5"),
+ 61: dict(name="sso_kpt4", id=61, color=colors["sso"], type="", swap="sso_kpt6"),
+ 62: dict(name="sso_kpt5", id=62, color=colors["sso"], type="", swap="sso_kpt3"),
+ 63: dict(name="sso_kpt6", id=63, color=colors["sso"], type="", swap="sso_kpt4"),
+ 64: dict(name="sso_kpt7", id=64, color=colors["sso"], type="", swap="sso_kpt25"),
+ 65: dict(name="sso_kpt8", id=65, color=colors["sso"], type="", swap="sso_kpt24"),
+ 66: dict(name="sso_kpt9", id=66, color=colors["sso"], type="", swap="sso_kpt23"),
+ 67: dict(name="sso_kpt10", id=67, color=colors["sso"], type="", swap="sso_kpt22"),
+ 68: dict(name="sso_kpt11", id=68, color=colors["sso"], type="", swap="sso_kpt21"),
+ 69: dict(name="sso_kpt12", id=69, color=colors["sso"], type="", swap="sso_kpt20"),
+ 70: dict(name="sso_kpt13", id=70, color=colors["sso"], type="", swap="sso_kpt19"),
+ 71: dict(name="sso_kpt14", id=71, color=colors["sso"], type="", swap="sso_kpt18"),
+ 72: dict(name="sso_kpt15", id=72, color=colors["sso"], type="", swap="sso_kpt17"),
+ 73: dict(name="sso_kpt16", id=73, color=colors["sso"], type="", swap="sso_kpt29"),
+ 74: dict(name="sso_kpt17", id=74, color=colors["sso"], type="", swap="sso_kpt15"),
+ 75: dict(name="sso_kpt18", id=75, color=colors["sso"], type="", swap="sso_kpt14"),
+ 76: dict(name="sso_kpt19", id=76, color=colors["sso"], type="", swap="sso_kpt13"),
+ 77: dict(name="sso_kpt20", id=77, color=colors["sso"], type="", swap="sso_kpt12"),
+ 78: dict(name="sso_kpt21", id=78, color=colors["sso"], type="", swap="sso_kpt11"),
+ 79: dict(name="sso_kpt22", id=79, color=colors["sso"], type="", swap="sso_kpt10"),
+ 80: dict(name="sso_kpt23", id=80, color=colors["sso"], type="", swap="sso_kpt9"),
+ 81: dict(name="sso_kpt24", id=81, color=colors["sso"], type="", swap="sso_kpt8"),
+ 82: dict(name="sso_kpt25", id=82, color=colors["sso"], type="", swap="sso_kpt7"),
+ 83: dict(name="sso_kpt26", id=83, color=colors["sso"], type="", swap="sso_kpt2"),
+ 84: dict(name="sso_kpt27", id=84, color=colors["sso"], type="", swap="sso_kpt30"),
+ 85: dict(name="sso_kpt28", id=85, color=colors["sso"], type="", swap="sso_kpt31"),
+ 86: dict(name="sso_kpt29", id=86, color=colors["sso"], type="", swap="sso_kpt16"),
+ 87: dict(name="sso_kpt30", id=87, color=colors["sso"], type="", swap="sso_kpt27"),
+ 88: dict(name="sso_kpt31", id=88, color=colors["sso"], type="", swap="sso_kpt28"),
# long_sleeved_outwear
+ 89: dict(name="lso_kpt1", id=89, color=colors["lso"], type="", swap=""),
+ 90: dict(name="lso_kpt2", id=90, color=colors["lso"], type="", swap="lso_kpt6"),
+ 91: dict(name="lso_kpt3", id=91, color=colors["lso"], type="", swap="lso_kpt5"),
+ 92: dict(name="lso_kpt4", id=92, color=colors["lso"], type="", swap="lso_kpt34"),
+ 93: dict(name="lso_kpt5", id=93, color=colors["lso"], type="", swap="lso_kpt3"),
+ 94: dict(name="lso_kpt6", id=94, color=colors["lso"], type="", swap="lso_kpt2"),
+ 95: dict(name="lso_kpt7", id=95, color=colors["lso"], type="", swap="lso_kpt33"),
+ 96: dict(name="lso_kpt8", id=96, color=colors["lso"], type="", swap="lso_kpt32"),
+ 97: dict(name="lso_kpt9", id=97, color=colors["lso"], type="", swap="lso_kpt31"),
+ 98: dict(name="lso_kpt10", id=98, color=colors["lso"], type="", swap="lso_kpt30"),
+ 99: dict(name="lso_kpt11", id=99, color=colors["lso"], type="", swap="lso_kpt29"),
+ 100: dict(name="lso_kpt12", id=100, color=colors["lso"], type="", swap="lso_kpt28"),
+ 101: dict(name="lso_kpt13", id=101, color=colors["lso"], type="", swap="lso_kpt27"),
+ 102: dict(name="lso_kpt14", id=102, color=colors["lso"], type="", swap="lso_kpt26"),
+ 103: dict(name="lso_kpt15", id=103, color=colors["lso"], type="", swap="lso_kpt25"),
+ 104: dict(name="lso_kpt16", id=104, color=colors["lso"], type="", swap="lso_kpt24"),
+ 105: dict(name="lso_kpt17", id=105, color=colors["lso"], type="", swap="lso_kpt23"),
+ 106: dict(name="lso_kpt18", id=106, color=colors["lso"], type="", swap="lso_kpt22"),
+ 107: dict(name="lso_kpt19", id=107, color=colors["lso"], type="", swap="lso_kpt21"),
+ 108: dict(name="lso_kpt20", id=108, color=colors["lso"], type="", swap="lso_kpt37"),
+ 109: dict(name="lso_kpt21", id=109, color=colors["lso"], type="", swap="lso_kpt19"),
+ 110: dict(name="lso_kpt22", id=110, color=colors["lso"], type="", swap="lso_kpt18"),
+ 111: dict(name="lso_kpt23", id=111, color=colors["lso"], type="", swap="lso_kpt17"),
+ 112: dict(name="lso_kpt24", id=112, color=colors["lso"], type="", swap="lso_kpt16"),
+ 113: dict(name="lso_kpt25", id=113, color=colors["lso"], type="", swap="lso_kpt15"),
+ 114: dict(name="lso_kpt26", id=114, color=colors["lso"], type="", swap="lso_kpt14"),
+ 115: dict(name="lso_kpt27", id=115, color=colors["lso"], type="", swap="lso_kpt13"),
+ 116: dict(name="lso_kpt28", id=116, color=colors["lso"], type="", swap="lso_kpt12"),
+ 117: dict(name="lso_kpt29", id=117, color=colors["lso"], type="", swap="lso_kpt11"),
+ 118: dict(name="lso_kpt30", id=118, color=colors["lso"], type="", swap="lso_kpt10"),
+ 119: dict(name="lso_kpt31", id=119, color=colors["lso"], type="", swap="lso_kpt9"),
+ 120: dict(name="lso_kpt32", id=120, color=colors["lso"], type="", swap="lso_kpt8"),
+ 121: dict(name="lso_kpt33", id=121, color=colors["lso"], type="", swap="lso_kpt7"),
+ 122: dict(name="lso_kpt34", id=122, color=colors["lso"], type="", swap="lso_kpt4"),
+ 123: dict(name="lso_kpt35", id=123, color=colors["lso"], type="", swap="lso_kpt38"),
+ 124: dict(name="lso_kpt36", id=124, color=colors["lso"], type="", swap="lso_kpt39"),
+ 125: dict(name="lso_kpt37", id=125, color=colors["lso"], type="", swap="lso_kpt20"),
+ 126: dict(name="lso_kpt38", id=126, color=colors["lso"], type="", swap="lso_kpt35"),
+ 127: dict(name="lso_kpt39", id=127, color=colors["lso"], type="", swap="lso_kpt36"),
# vest
+ 128: dict(name="vest_kpt1", id=128, color=colors["vest"], type="", swap=""),
+ 129: dict(name="vest_kpt2", id=129, color=colors["vest"], type="", swap="vest_kpt6"),
+ 130: dict(name="vest_kpt3", id=130, color=colors["vest"], type="", swap="vest_kpt5"),
+ 131: dict(name="vest_kpt4", id=131, color=colors["vest"], type="", swap=""),
+ 132: dict(name="vest_kpt5", id=132, color=colors["vest"], type="", swap="vest_kpt3"),
+ 133: dict(name="vest_kpt6", id=133, color=colors["vest"], type="", swap="vest_kpt2"),
+ 134: dict(name="vest_kpt7", id=134, color=colors["vest"], type="", swap="vest_kpt15"),
+ 135: dict(name="vest_kpt8", id=135, color=colors["vest"], type="", swap="vest_kpt14"),
+ 136: dict(name="vest_kpt9", id=136, color=colors["vest"], type="", swap="vest_kpt13"),
+ 137: dict(name="vest_kpt10", id=137, color=colors["vest"], type="", swap="vest_kpt12"),
+ 138: dict(name="vest_kpt11", id=138, color=colors["vest"], type="", swap=""),
+ 139: dict(name="vest_kpt12", id=139, color=colors["vest"], type="", swap="vest_kpt10"),
+ 140: dict(name="vest_kpt13", id=140, color=colors["vest"], type="", swap=""),
+ 141: dict(name="vest_kpt14", id=141, color=colors["vest"], type="", swap="vest_kpt8"),
+ 142: dict(name="vest_kpt15", id=142, color=colors["vest"], type="", swap="vest_kpt7"),
# sling
+ 143: dict(name="sling_kpt1", id=143, color=colors["sling"], type="", swap=""),
+ 144: dict(name="sling_kpt2", id=144, color=colors["sling"], type="", swap="sling_kpt6"),
+ 145: dict(name="sling_kpt3", id=145, color=colors["sling"], type="", swap="sling_kpt5"),
+ 146: dict(name="sling_kpt4", id=146, color=colors["sling"], type="", swap=""),
+ 147: dict(name="sling_kpt5", id=147, color=colors["sling"], type="", swap="sling_kpt3"),
+ 148: dict(name="sling_kpt6", id=148, color=colors["sling"], type="", swap="sling_kpt2"),
+ 149: dict(name="sling_kpt7", id=149, color=colors["sling"], type="", swap="sling_kpt15"),
+ 150: dict(name="sling_kpt8", id=150, color=colors["sling"], type="", swap="sling_kpt14"),
+ 151: dict(name="sling_kpt9", id=151, color=colors["sling"], type="", swap="sling_kpt13"),
+ 152: dict(name="sling_kpt10", id=152, color=colors["sling"], type="", swap="sling_kpt12"),
+ 153: dict(name="sling_kpt11", id=153, color=colors["sling"], type="", swap=""),
+ 154: dict(name="sling_kpt12", id=154, color=colors["sling"], type="", swap="sling_kpt10"),
+ 155: dict(name="sling_kpt13", id=155, color=colors["sling"], type="", swap="sling_kpt9"),
+ 156: dict(name="sling_kpt14", id=156, color=colors["sling"], type="", swap="sling_kpt8"),
+ 157: dict(name="sling_kpt15", id=157, color=colors["sling"], type="", swap="sling_kpt7"),
# shorts
+ 158: dict(name="shorts_kpt1", id=158, color=colors["shorts"], type="", swap="shorts_kpt3"),
+ 159: dict(name="shorts_kpt2", id=159, color=colors["shorts"], type="", swap=""),
+ 160: dict(name="shorts_kpt3", id=160, color=colors["shorts"], type="", swap="shorts_kpt1"),
+ 161: dict(name="shorts_kpt4", id=161, color=colors["shorts"], type="", swap="shorts_kpt10"),
+ 162: dict(name="shorts_kpt5", id=162, color=colors["shorts"], type="", swap="shorts_kpt9"),
+ 163: dict(name="shorts_kpt6", id=163, color=colors["shorts"], type="", swap="shorts_kpt8"),
+ 164: dict(name="shorts_kpt7", id=164, color=colors["shorts"], type="", swap=""),
+ 165: dict(name="shorts_kpt8", id=165, color=colors["shorts"], type="", swap="shorts_kpt6"),
+ 166: dict(name="shorts_kpt9", id=166, color=colors["shorts"], type="", swap="shorts_kpt5"),
+ 167: dict(name="shorts_kpt10", id=167, color=colors["shorts"], type="", swap="shorts_kpt4"),
# trousers
+ 168: dict(name="trousers_kpt1", id=168, color=colors["trousers"], type="", swap="trousers_kpt3"),
+ 169: dict(name="trousers_kpt2", id=169, color=colors["trousers"], type="", swap=""),
+ 170: dict(name="trousers_kpt3", id=170, color=colors["trousers"], type="", swap="trousers_kpt1"),
+ 171: dict(name="trousers_kpt4", id=171, color=colors["trousers"], type="", swap="trousers_kpt14"),
+ 172: dict(name="trousers_kpt5", id=172, color=colors["trousers"], type="", swap="trousers_kpt13"),
+ 173: dict(name="trousers_kpt6", id=173, color=colors["trousers"], type="", swap="trousers_kpt12"),
+ 174: dict(name="trousers_kpt7", id=174, color=colors["trousers"], type="", swap="trousers_kpt11"),
+ 175: dict(name="trousers_kpt8", id=175, color=colors["trousers"], type="", swap="trousers_kpt10"),
+ 176: dict(name="trousers_kpt9", id=176, color=colors["trousers"], type="", swap=""),
+ 177: dict(name="trousers_kpt10", id=177, color=colors["trousers"], type="", swap="trousers_kpt8"),
+ 178: dict(name="trousers_kpt11", id=178, color=colors["trousers"], type="", swap="trousers_kpt7"),
+ 179: dict(name="trousers_kpt12", id=179, color=colors["trousers"], type="", swap="trousers_kpt6"),
+ 180: dict(name="trousers_kpt13", id=180, color=colors["trousers"], type="", swap="trousers_kpt5"),
+ 181: dict(name="trousers_kpt14", id=181, color=colors["trousers"], type="", swap="trousers_kpt4"),
# skirt
+ 182: dict(name="skirt_kpt1", id=182, color=colors["skirt"], type="", swap="skirt_kpt3"),
+ 183: dict(name="skirt_kpt2", id=183, color=colors["skirt"], type="", swap=""),
+ 184: dict(name="skirt_kpt3", id=184, color=colors["skirt"], type="", swap="skirt_kpt1"),
+ 185: dict(name="skirt_kpt4", id=185, color=colors["skirt"], type="", swap="skirt_kpt8"),
+ 186: dict(name="skirt_kpt5", id=186, color=colors["skirt"], type="", swap="skirt_kpt7"),
+ 187: dict(name="skirt_kpt6", id=187, color=colors["skirt"], type="", swap=""),
+ 188: dict(name="skirt_kpt7", id=188, color=colors["skirt"], type="", swap="skirt_kpt5"),
+ 189: dict(name="skirt_kpt8", id=189, color=colors["skirt"], type="", swap="skirt_kpt4"),
# short_sleeved_dress
+ 190: dict(name="ssd_kpt1", id=190, color=colors["ssd"], type="", swap=""),
+ 191: dict(name="ssd_kpt2", id=191, color=colors["ssd"], type="", swap="ssd_kpt6"),
+ 192: dict(name="ssd_kpt3", id=192, color=colors["ssd"], type="", swap="ssd_kpt5"),
+ 193: dict(name="ssd_kpt4", id=193, color=colors["ssd"], type="", swap=""),
+ 194: dict(name="ssd_kpt5", id=194, color=colors["ssd"], type="", swap="ssd_kpt3"),
+ 195: dict(name="ssd_kpt6", id=195, color=colors["ssd"], type="", swap="ssd_kpt2"),
+ 196: dict(name="ssd_kpt7", id=196, color=colors["ssd"], type="", swap="ssd_kpt29"),
+ 197: dict(name="ssd_kpt8", id=197, color=colors["ssd"], type="", swap="ssd_kpt28"),
+ 198: dict(name="ssd_kpt9", id=198, color=colors["ssd"], type="", swap="ssd_kpt27"),
+ 199: dict(name="ssd_kpt10", id=199, color=colors["ssd"], type="", swap="ssd_kpt26"),
+ 200: dict(name="ssd_kpt11", id=200, color=colors["ssd"], type="", swap="ssd_kpt25"),
+ 201: dict(name="ssd_kpt12", id=201, color=colors["ssd"], type="", swap="ssd_kpt24"),
+ 202: dict(name="ssd_kpt13", id=202, color=colors["ssd"], type="", swap="ssd_kpt23"),
+ 203: dict(name="ssd_kpt14", id=203, color=colors["ssd"], type="", swap="ssd_kpt22"),
+ 204: dict(name="ssd_kpt15", id=204, color=colors["ssd"], type="", swap="ssd_kpt21"),
+ 205: dict(name="ssd_kpt16", id=205, color=colors["ssd"], type="", swap="ssd_kpt20"),
+ 206: dict(name="ssd_kpt17", id=206, color=colors["ssd"], type="", swap="ssd_kpt19"),
+ 207: dict(name="ssd_kpt18", id=207, color=colors["ssd"], type="", swap=""),
+ 208: dict(name="ssd_kpt19", id=208, color=colors["ssd"], type="", swap="ssd_kpt17"),
+ 209: dict(name="ssd_kpt20", id=209, color=colors["ssd"], type="", swap="ssd_kpt16"),
+ 210: dict(name="ssd_kpt21", id=210, color=colors["ssd"], type="", swap="ssd_kpt15"),
+ 211: dict(name="ssd_kpt22", id=211, color=colors["ssd"], type="", swap="ssd_kpt14"),
+ 212: dict(name="ssd_kpt23", id=212, color=colors["ssd"], type="", swap="ssd_kpt13"),
+ 213: dict(name="ssd_kpt24", id=213, color=colors["ssd"], type="", swap="ssd_kpt12"),
+ 214: dict(name="ssd_kpt25", id=214, color=colors["ssd"], type="", swap="ssd_kpt11"),
+ 215: dict(name="ssd_kpt26", id=215, color=colors["ssd"], type="", swap="ssd_kpt10"),
+ 216: dict(name="ssd_kpt27", id=216, color=colors["ssd"], type="", swap="ssd_kpt9"),
+ 217: dict(name="ssd_kpt28", id=217, color=colors["ssd"], type="", swap="ssd_kpt8"),
+ 218: dict(name="ssd_kpt29", id=218, color=colors["ssd"], type="", swap="ssd_kpt7"),
# long_sleeved_dress
+ 219: dict(name="lsd_kpt1", id=219, color=colors["lsd"], type="", swap=""),
+ 220: dict(name="lsd_kpt2", id=220, color=colors["lsd"], type="", swap="lsd_kpt6"),
+ 221: dict(name="lsd_kpt3", id=221, color=colors["lsd"], type="", swap="lsd_kpt5"),
+ 222: dict(name="lsd_kpt4", id=222, color=colors["lsd"], type="", swap=""),
+ 223: dict(name="lsd_kpt5", id=223, color=colors["lsd"], type="", swap="lsd_kpt3"),
+ 224: dict(name="lsd_kpt6", id=224, color=colors["lsd"], type="", swap="lsd_kpt2"),
+ 225: dict(name="lsd_kpt7", id=225, color=colors["lsd"], type="", swap="lsd_kpt37"),
+ 226: dict(name="lsd_kpt8", id=226, color=colors["lsd"], type="", swap="lsd_kpt36"),
+ 227: dict(name="lsd_kpt9", id=227, color=colors["lsd"], type="", swap="lsd_kpt35"),
+ 228: dict(name="lsd_kpt10", id=228, color=colors["lsd"], type="", swap="lsd_kpt34"),
+ 229: dict(name="lsd_kpt11", id=229, color=colors["lsd"], type="", swap="lsd_kpt33"),
+ 230: dict(name="lsd_kpt12", id=230, color=colors["lsd"], type="", swap="lsd_kpt32"),
+ 231: dict(name="lsd_kpt13", id=231, color=colors["lsd"], type="", swap="lsd_kpt31"),
+ 232: dict(name="lsd_kpt14", id=232, color=colors["lsd"], type="", swap="lsd_kpt30"),
+ 233: dict(name="lsd_kpt15", id=233, color=colors["lsd"], type="", swap="lsd_kpt29"),
+ 234: dict(name="lsd_kpt16", id=234, color=colors["lsd"], type="", swap="lsd_kpt28"),
+ 235: dict(name="lsd_kpt17", id=235, color=colors["lsd"], type="", swap="lsd_kpt27"),
+ 236: dict(name="lsd_kpt18", id=236, color=colors["lsd"], type="", swap="lsd_kpt26"),
+ 237: dict(name="lsd_kpt19", id=237, color=colors["lsd"], type="", swap="lsd_kpt25"),
+ 238: dict(name="lsd_kpt20", id=238, color=colors["lsd"], type="", swap="lsd_kpt24"),
+ 239: dict(name="lsd_kpt21", id=239, color=colors["lsd"], type="", swap="lsd_kpt23"),
+ 240: dict(name="lsd_kpt22", id=240, color=colors["lsd"], type="", swap=""),
+ 241: dict(name="lsd_kpt23", id=241, color=colors["lsd"], type="", swap="lsd_kpt21"),
+ 242: dict(name="lsd_kpt24", id=242, color=colors["lsd"], type="", swap="lsd_kpt20"),
+ 243: dict(name="lsd_kpt25", id=243, color=colors["lsd"], type="", swap="lsd_kpt19"),
+ 244: dict(name="lsd_kpt26", id=244, color=colors["lsd"], type="", swap="lsd_kpt18"),
+ 245: dict(name="lsd_kpt27", id=245, color=colors["lsd"], type="", swap="lsd_kpt17"),
+ 246: dict(name="lsd_kpt28", id=246, color=colors["lsd"], type="", swap="lsd_kpt16"),
+ 247: dict(name="lsd_kpt29", id=247, color=colors["lsd"], type="", swap="lsd_kpt15"),
+ 248: dict(name="lsd_kpt30", id=248, color=colors["lsd"], type="", swap="lsd_kpt14"),
+ 249: dict(name="lsd_kpt31", id=249, color=colors["lsd"], type="", swap="lsd_kpt13"),
+ 250: dict(name="lsd_kpt32", id=250, color=colors["lsd"], type="", swap="lsd_kpt12"),
+ 251: dict(name="lsd_kpt33", id=251, color=colors["lsd"], type="", swap="lsd_kpt11"),
+ 252: dict(name="lsd_kpt34", id=252, color=colors["lsd"], type="", swap="lsd_kpt10"),
+ 253: dict(name="lsd_kpt35", id=253, color=colors["lsd"], type="", swap="lsd_kpt9"),
+ 254: dict(name="lsd_kpt36", id=254, color=colors["lsd"], type="", swap="lsd_kpt8"),
+ 255: dict(name="lsd_kpt37", id=255, color=colors["lsd"], type="", swap="lsd_kpt7"),
# vest_dress
+ 256: dict(name="vd_kpt1", id=256, color=colors["vd"], type="", swap=""),
+ 257: dict(name="vd_kpt2", id=257, color=colors["vd"], type="", swap="vd_kpt6"),
+ 258: dict(name="vd_kpt3", id=258, color=colors["vd"], type="", swap="vd_kpt5"),
+ 259: dict(name="vd_kpt4", id=259, color=colors["vd"], type="", swap=""),
+ 260: dict(name="vd_kpt5", id=260, color=colors["vd"], type="", swap="vd_kpt3"),
+ 261: dict(name="vd_kpt6", id=261, color=colors["vd"], type="", swap="vd_kpt2"),
+ 262: dict(name="vd_kpt7", id=262, color=colors["vd"], type="", swap="vd_kpt19"),
+ 263: dict(name="vd_kpt8", id=263, color=colors["vd"], type="", swap="vd_kpt18"),
+ 264: dict(name="vd_kpt9", id=264, color=colors["vd"], type="", swap="vd_kpt17"),
+ 265: dict(name="vd_kpt10", id=265, color=colors["vd"], type="", swap="vd_kpt16"),
+ 266: dict(name="vd_kpt11", id=266, color=colors["vd"], type="", swap="vd_kpt15"),
+ 267: dict(name="vd_kpt12", id=267, color=colors["vd"], type="", swap="vd_kpt14"),
+ 268: dict(name="vd_kpt13", id=268, color=colors["vd"], type="", swap=""),
+ 269: dict(name="vd_kpt14", id=269, color=colors["vd"], type="", swap="vd_kpt12"),
+ 270: dict(name="vd_kpt15", id=270, color=colors["vd"], type="", swap="vd_kpt11"),
+ 271: dict(name="vd_kpt16", id=271, color=colors["vd"], type="", swap="vd_kpt10"),
+ 272: dict(name="vd_kpt17", id=272, color=colors["vd"], type="", swap="vd_kpt9"),
+ 273: dict(name="vd_kpt18", id=273, color=colors["vd"], type="", swap="vd_kpt8"),
+ 274: dict(name="vd_kpt19", id=274, color=colors["vd"], type="", swap="vd_kpt7"),
# sling_dress
+ 275: dict(name="sd_kpt1", id=275, color=colors["sd"], type="", swap=""),
+ 276: dict(name="sd_kpt2", id=276, color=colors["sd"], type="", swap="sd_kpt6"),
+ 277: dict(name="sd_kpt3", id=277, color=colors["sd"], type="", swap="sd_kpt5"),
+ 278: dict(name="sd_kpt4", id=278, color=colors["sd"], type="", swap=""),
+ 279: dict(name="sd_kpt5", id=279, color=colors["sd"], type="", swap="sd_kpt3"),
+ 280: dict(name="sd_kpt6", id=280, color=colors["sd"], type="", swap="sd_kpt2"),
+ 281: dict(name="sd_kpt7", id=281, color=colors["sd"], type="", swap="sd_kpt19"),
+ 282: dict(name="sd_kpt8", id=282, color=colors["sd"], type="", swap="sd_kpt18"),
+ 283: dict(name="sd_kpt9", id=283, color=colors["sd"], type="", swap="sd_kpt17"),
+ 284: dict(name="sd_kpt10", id=284, color=colors["sd"], type="", swap="sd_kpt16"),
+ 285: dict(name="sd_kpt11", id=285, color=colors["sd"], type="", swap="sd_kpt15"),
+ 286: dict(name="sd_kpt12", id=286, color=colors["sd"], type="", swap="sd_kpt14"),
+ 287: dict(name="sd_kpt13", id=287, color=colors["sd"], type="", swap=""),
+ 288: dict(name="sd_kpt14", id=288, color=colors["sd"], type="", swap="sd_kpt12"),
+ 289: dict(name="sd_kpt15", id=289, color=colors["sd"], type="", swap="sd_kpt11"),
+ 290: dict(name="sd_kpt16", id=290, color=colors["sd"], type="", swap="sd_kpt10"),
+ 291: dict(name="sd_kpt17", id=291, color=colors["sd"], type="", swap="sd_kpt9"),
+ 292: dict(name="sd_kpt18", id=292, color=colors["sd"], type="", swap="sd_kpt8"),
+ 293: dict(name="sd_kpt19", id=293, color=colors["sd"], type="", swap="sd_kpt7"),
},
skeleton_info={
# short_sleeved_shirt
+ 0: dict(link=("sss_kpt1", "sss_kpt2"), id=0, color=[255, 128, 0]),
+ 1: dict(link=("sss_kpt2", "sss_kpt7"), id=1, color=[255, 128, 0]),
+ 2: dict(link=("sss_kpt7", "sss_kpt8"), id=2, color=[255, 128, 0]),
+ 3: dict(link=("sss_kpt8", "sss_kpt9"), id=3, color=[255, 128, 0]),
+ 4: dict(link=("sss_kpt9", "sss_kpt10"), id=4, color=[255, 128, 0]),
+ 5: dict(link=("sss_kpt10", "sss_kpt11"), id=5, color=[255, 128, 0]),
+ 6: dict(link=("sss_kpt11", "sss_kpt12"), id=6, color=[255, 128, 0]),
+ 7: dict(link=("sss_kpt12", "sss_kpt13"), id=7, color=[255, 128, 0]),
+ 8: dict(link=("sss_kpt13", "sss_kpt14"), id=8, color=[255, 128, 0]),
+ 9: dict(link=("sss_kpt14", "sss_kpt15"), id=9, color=[255, 128, 0]),
+ 10: dict(link=("sss_kpt15", "sss_kpt16"), id=10, color=[255, 128, 0]),
+ 11: dict(link=("sss_kpt16", "sss_kpt17"), id=11, color=[255, 128, 0]),
+ 12: dict(link=("sss_kpt17", "sss_kpt18"), id=12, color=[255, 128, 0]),
+ 13: dict(link=("sss_kpt18", "sss_kpt19"), id=13, color=[255, 128, 0]),
+ 14: dict(link=("sss_kpt19", "sss_kpt20"), id=14, color=[255, 128, 0]),
+ 15: dict(link=("sss_kpt20", "sss_kpt21"), id=15, color=[255, 128, 0]),
+ 16: dict(link=("sss_kpt21", "sss_kpt22"), id=16, color=[255, 128, 0]),
+ 17: dict(link=("sss_kpt22", "sss_kpt23"), id=17, color=[255, 128, 0]),
+ 18: dict(link=("sss_kpt23", "sss_kpt24"), id=18, color=[255, 128, 0]),
+ 19: dict(link=("sss_kpt24", "sss_kpt25"), id=19, color=[255, 128, 0]),
+ 20: dict(link=("sss_kpt25", "sss_kpt6"), id=20, color=[255, 128, 0]),
+ 21: dict(link=("sss_kpt6", "sss_kpt1"), id=21, color=[255, 128, 0]),
+ 22: dict(link=("sss_kpt2", "sss_kpt3"), id=22, color=[255, 128, 0]),
+ 23: dict(link=("sss_kpt3", "sss_kpt4"), id=23, color=[255, 128, 0]),
+ 24: dict(link=("sss_kpt4", "sss_kpt5"), id=24, color=[255, 128, 0]),
+ 25: dict(link=("sss_kpt5", "sss_kpt6"), id=25, color=[255, 128, 0]),
# long_sleeve_shirt
+ 26: dict(link=("lss_kpt1", "lss_kpt2"), id=26, color=[255, 0, 128]),
+ 27: dict(link=("lss_kpt2", "lss_kpt7"), id=27, color=[255, 0, 128]),
+ 28: dict(link=("lss_kpt7", "lss_kpt8"), id=28, color=[255, 0, 128]),
+ 29: dict(link=("lss_kpt8", "lss_kpt9"), id=29, color=[255, 0, 128]),
+ 30: dict(link=("lss_kpt9", "lss_kpt10"), id=30, color=[255, 0, 128]),
+ 31: dict(link=("lss_kpt10", "lss_kpt11"), id=31, color=[255, 0, 128]),
+ 32: dict(link=("lss_kpt11", "lss_kpt12"), id=32, color=[255, 0, 128]),
+ 33: dict(link=("lss_kpt12", "lss_kpt13"), id=33, color=[255, 0, 128]),
+ 34: dict(link=("lss_kpt13", "lss_kpt14"), id=34, color=[255, 0, 128]),
+ 35: dict(link=("lss_kpt14", "lss_kpt15"), id=35, color=[255, 0, 128]),
+ 36: dict(link=("lss_kpt15", "lss_kpt16"), id=36, color=[255, 0, 128]),
+ 37: dict(link=("lss_kpt16", "lss_kpt17"), id=37, color=[255, 0, 128]),
+ 38: dict(link=("lss_kpt17", "lss_kpt18"), id=38, color=[255, 0, 128]),
+ 39: dict(link=("lss_kpt18", "lss_kpt19"), id=39, color=[255, 0, 128]),
+ 40: dict(link=("lss_kpt19", "lss_kpt20"), id=40, color=[255, 0, 128]),
+ 41: dict(link=("lss_kpt20", "lss_kpt21"), id=41, color=[255, 0, 128]),
+ 42: dict(link=("lss_kpt21", "lss_kpt22"), id=42, color=[255, 0, 128]),
+ 43: dict(link=("lss_kpt22", "lss_kpt23"), id=43, color=[255, 0, 128]),
+ 44: dict(link=("lss_kpt23", "lss_kpt24"), id=44, color=[255, 0, 128]),
+ 45: dict(link=("lss_kpt24", "lss_kpt25"), id=45, color=[255, 0, 128]),
+ 46: dict(link=("lss_kpt25", "lss_kpt26"), id=46, color=[255, 0, 128]),
+ 47: dict(link=("lss_kpt26", "lss_kpt27"), id=47, color=[255, 0, 128]),
+ 48: dict(link=("lss_kpt27", "lss_kpt28"), id=48, color=[255, 0, 128]),
+ 49: dict(link=("lss_kpt28", "lss_kpt29"), id=49, color=[255, 0, 128]),
+ 50: dict(link=("lss_kpt29", "lss_kpt30"), id=50, color=[255, 0, 128]),
+ 51: dict(link=("lss_kpt30", "lss_kpt31"), id=51, color=[255, 0, 128]),
+ 52: dict(link=("lss_kpt31", "lss_kpt32"), id=52, color=[255, 0, 128]),
+ 53: dict(link=("lss_kpt32", "lss_kpt33"), id=53, color=[255, 0, 128]),
+ 54: dict(link=("lss_kpt33", "lss_kpt6"), id=54, color=[255, 0, 128]),
+ 55: dict(link=("lss_kpt6", "lss_kpt5"), id=55, color=[255, 0, 128]),
+ 56: dict(link=("lss_kpt5", "lss_kpt4"), id=56, color=[255, 0, 128]),
+ 57: dict(link=("lss_kpt4", "lss_kpt3"), id=57, color=[255, 0, 128]),
+ 58: dict(link=("lss_kpt3", "lss_kpt2"), id=58, color=[255, 0, 128]),
+ 59: dict(link=("lss_kpt6", "lss_kpt1"), id=59, color=[255, 0, 128]),
# short_sleeved_outwear
+ 60: dict(link=("sso_kpt1", "sso_kpt4"), id=60, color=[128, 0, 255]),
+ 61: dict(link=("sso_kpt4", "sso_kpt7"), id=61, color=[128, 0, 255]),
+ 62: dict(link=("sso_kpt7", "sso_kpt8"), id=62, color=[128, 0, 255]),
+ 63: dict(link=("sso_kpt8", "sso_kpt9"), id=63, color=[128, 0, 255]),
+ 64: dict(link=("sso_kpt9", "sso_kpt10"), id=64, color=[128, 0, 255]),
+ 65: dict(link=("sso_kpt10", "sso_kpt11"), id=65, color=[128, 0, 255]),
+ 66: dict(link=("sso_kpt11", "sso_kpt12"), id=66, color=[128, 0, 255]),
+ 67: dict(link=("sso_kpt12", "sso_kpt13"), id=67, color=[128, 0, 255]),
+ 68: dict(link=("sso_kpt13", "sso_kpt14"), id=68, color=[128, 0, 255]),
+ 69: dict(link=("sso_kpt14", "sso_kpt15"), id=69, color=[128, 0, 255]),
+ 70: dict(link=("sso_kpt15", "sso_kpt16"), id=70, color=[128, 0, 255]),
+ 71: dict(link=("sso_kpt16", "sso_kpt31"), id=71, color=[128, 0, 255]),
+ 72: dict(link=("sso_kpt31", "sso_kpt30"), id=72, color=[128, 0, 255]),
+ 73: dict(link=("sso_kpt30", "sso_kpt2"), id=73, color=[128, 0, 255]),
+ 74: dict(link=("sso_kpt2", "sso_kpt3"), id=74, color=[128, 0, 255]),
+ 75: dict(link=("sso_kpt3", "sso_kpt4"), id=75, color=[128, 0, 255]),
+ 76: dict(link=("sso_kpt1", "sso_kpt6"), id=76, color=[128, 0, 255]),
+ 77: dict(link=("sso_kpt6", "sso_kpt25"), id=77, color=[128, 0, 255]),
+ 78: dict(link=("sso_kpt25", "sso_kpt24"), id=78, color=[128, 0, 255]),
+ 79: dict(link=("sso_kpt24", "sso_kpt23"), id=79, color=[128, 0, 255]),
+ 80: dict(link=("sso_kpt23", "sso_kpt22"), id=80, color=[128, 0, 255]),
+ 81: dict(link=("sso_kpt22", "sso_kpt21"), id=81, color=[128, 0, 255]),
+ 82: dict(link=("sso_kpt21", "sso_kpt20"), id=82, color=[128, 0, 255]),
+ 83: dict(link=("sso_kpt20", "sso_kpt19"), id=83, color=[128, 0, 255]),
+ 84: dict(link=("sso_kpt19", "sso_kpt18"), id=84, color=[128, 0, 255]),
+ 85: dict(link=("sso_kpt18", "sso_kpt17"), id=85, color=[128, 0, 255]),
+ 86: dict(link=("sso_kpt17", "sso_kpt29"), id=86, color=[128, 0, 255]),
+ 87: dict(link=("sso_kpt29", "sso_kpt28"), id=87, color=[128, 0, 255]),
+ 88: dict(link=("sso_kpt28", "sso_kpt27"), id=88, color=[128, 0, 255]),
+ 89: dict(link=("sso_kpt27", "sso_kpt26"), id=89, color=[128, 0, 255]),
+ 90: dict(link=("sso_kpt26", "sso_kpt5"), id=90, color=[128, 0, 255]),
+ 91: dict(link=("sso_kpt5", "sso_kpt6"), id=91, color=[128, 0, 255]),
# long_sleeved_outwear
+ 92: dict(link=("lso_kpt1", "lso_kpt2"), id=92, color=[0, 128, 255]),
+ 93: dict(link=("lso_kpt2", "lso_kpt7"), id=93, color=[0, 128, 255]),
+ 94: dict(link=("lso_kpt7", "lso_kpt8"), id=94, color=[0, 128, 255]),
+ 95: dict(link=("lso_kpt8", "lso_kpt9"), id=95, color=[0, 128, 255]),
+ 96: dict(link=("lso_kpt9", "lso_kpt10"), id=96, color=[0, 128, 255]),
+ 97: dict(link=("lso_kpt10", "lso_kpt11"), id=97, color=[0, 128, 255]),
+ 98: dict(link=("lso_kpt11", "lso_kpt12"), id=98, color=[0, 128, 255]),
+ 99: dict(link=("lso_kpt12", "lso_kpt13"), id=99, color=[0, 128, 255]),
+ 100: dict(link=("lso_kpt13", "lso_kpt14"), id=100, color=[0, 128, 255]),
+ 101: dict(link=("lso_kpt14", "lso_kpt15"), id=101, color=[0, 128, 255]),
+ 102: dict(link=("lso_kpt15", "lso_kpt16"), id=102, color=[0, 128, 255]),
+ 103: dict(link=("lso_kpt16", "lso_kpt17"), id=103, color=[0, 128, 255]),
+ 104: dict(link=("lso_kpt17", "lso_kpt18"), id=104, color=[0, 128, 255]),
+ 105: dict(link=("lso_kpt18", "lso_kpt19"), id=105, color=[0, 128, 255]),
+ 106: dict(link=("lso_kpt19", "lso_kpt20"), id=106, color=[0, 128, 255]),
+ 107: dict(link=("lso_kpt20", "lso_kpt39"), id=107, color=[0, 128, 255]),
+ 108: dict(link=("lso_kpt39", "lso_kpt38"), id=108, color=[0, 128, 255]),
+ 109: dict(link=("lso_kpt38", "lso_kpt4"), id=109, color=[0, 128, 255]),
+ 110: dict(link=("lso_kpt4", "lso_kpt3"), id=110, color=[0, 128, 255]),
+ 111: dict(link=("lso_kpt3", "lso_kpt2"), id=111, color=[0, 128, 255]),
+ 112: dict(link=("lso_kpt1", "lso_kpt6"), id=112, color=[0, 128, 255]),
+ 113: dict(link=("lso_kpt6", "lso_kpt33"), id=113, color=[0, 128, 255]),
+ 114: dict(link=("lso_kpt33", "lso_kpt32"), id=114, color=[0, 128, 255]),
+ 115: dict(link=("lso_kpt32", "lso_kpt31"), id=115, color=[0, 128, 255]),
+ 116: dict(link=("lso_kpt31", "lso_kpt30"), id=116, color=[0, 128, 255]),
+ 117: dict(link=("lso_kpt30", "lso_kpt29"), id=117, color=[0, 128, 255]),
+ 118: dict(link=("lso_kpt29", "lso_kpt28"), id=118, color=[0, 128, 255]),
+ 119: dict(link=("lso_kpt28", "lso_kpt27"), id=119, color=[0, 128, 255]),
+ 120: dict(link=("lso_kpt27", "lso_kpt26"), id=120, color=[0, 128, 255]),
+ 121: dict(link=("lso_kpt26", "lso_kpt25"), id=121, color=[0, 128, 255]),
+ 122: dict(link=("lso_kpt25", "lso_kpt24"), id=122, color=[0, 128, 255]),
+ 123: dict(link=("lso_kpt24", "lso_kpt23"), id=123, color=[0, 128, 255]),
+ 124: dict(link=("lso_kpt23", "lso_kpt22"), id=124, color=[0, 128, 255]),
+ 125: dict(link=("lso_kpt22", "lso_kpt21"), id=125, color=[0, 128, 255]),
+ 126: dict(link=("lso_kpt21", "lso_kpt37"), id=126, color=[0, 128, 255]),
+ 127: dict(link=("lso_kpt37", "lso_kpt36"), id=127, color=[0, 128, 255]),
+ 128: dict(link=("lso_kpt36", "lso_kpt35"), id=128, color=[0, 128, 255]),
+ 129: dict(link=("lso_kpt35", "lso_kpt34"), id=129, color=[0, 128, 255]),
+ 130: dict(link=("lso_kpt34", "lso_kpt5"), id=130, color=[0, 128, 255]),
+ 131: dict(link=("lso_kpt5", "lso_kpt6"), id=131, color=[0, 128, 255]),
# vest
+ 132: dict(link=("vest_kpt1", "vest_kpt2"), id=132, color=[0, 128, 128]),
+ 133: dict(link=("vest_kpt2", "vest_kpt7"), id=133, color=[0, 128, 128]),
+ 134: dict(link=("vest_kpt7", "vest_kpt8"), id=134, color=[0, 128, 128]),
+ 135: dict(link=("vest_kpt8", "vest_kpt9"), id=135, color=[0, 128, 128]),
+ 136: dict(link=("vest_kpt9", "vest_kpt10"), id=136, color=[0, 128, 128]),
+ 137: dict(link=("vest_kpt10", "vest_kpt11"), id=137, color=[0, 128, 128]),
+ 138: dict(link=("vest_kpt11", "vest_kpt12"), id=138, color=[0, 128, 128]),
+ 139: dict(link=("vest_kpt12", "vest_kpt13"), id=139, color=[0, 128, 128]),
+ 140: dict(link=("vest_kpt13", "vest_kpt14"), id=140, color=[0, 128, 128]),
+ 141: dict(link=("vest_kpt14", "vest_kpt15"), id=141, color=[0, 128, 128]),
+ 142: dict(link=("vest_kpt15", "vest_kpt6"), id=142, color=[0, 128, 128]),
+ 143: dict(link=("vest_kpt6", "vest_kpt1"), id=143, color=[0, 128, 128]),
+ 144: dict(link=("vest_kpt2", "vest_kpt3"), id=144, color=[0, 128, 128]),
+ 145: dict(link=("vest_kpt3", "vest_kpt4"), id=145, color=[0, 128, 128]),
+ 146: dict(link=("vest_kpt4", "vest_kpt5"), id=146, color=[0, 128, 128]),
+ 147: dict(link=("vest_kpt5", "vest_kpt6"), id=147, color=[0, 128, 128]),
# sling
+ 148: dict(link=("sling_kpt1", "sling_kpt2"), id=148, color=[0, 0, 128]),
+ 149: dict(link=("sling_kpt2", "sling_kpt8"), id=149, color=[0, 0, 128]),
+ 150: dict(link=("sling_kpt8", "sling_kpt9"), id=150, color=[0, 0, 128]),
+ 151: dict(link=("sling_kpt9", "sling_kpt10"), id=151, color=[0, 0, 128]),
+ 152: dict(link=("sling_kpt10", "sling_kpt11"), id=152, color=[0, 0, 128]),
+ 153: dict(link=("sling_kpt11", "sling_kpt12"), id=153, color=[0, 0, 128]),
+ 154: dict(link=("sling_kpt12", "sling_kpt13"), id=154, color=[0, 0, 128]),
+ 155: dict(link=("sling_kpt13", "sling_kpt14"), id=155, color=[0, 0, 128]),
+ 156: dict(link=("sling_kpt14", "sling_kpt6"), id=156, color=[0, 0, 128]),
+ 157: dict(link=("sling_kpt2", "sling_kpt7"), id=157, color=[0, 0, 128]),
+ 158: dict(link=("sling_kpt6", "sling_kpt15"), id=158, color=[0, 0, 128]),
+ 159: dict(link=("sling_kpt2", "sling_kpt3"), id=159, color=[0, 0, 128]),
+ 160: dict(link=("sling_kpt3", "sling_kpt4"), id=160, color=[0, 0, 128]),
+ 161: dict(link=("sling_kpt4", "sling_kpt5"), id=161, color=[0, 0, 128]),
+ 162: dict(link=("sling_kpt5", "sling_kpt6"), id=162, color=[0, 0, 128]),
+ 163: dict(link=("sling_kpt1", "sling_kpt6"), id=163, color=[0, 0, 128]),
# shorts
+ 164: dict(link=("shorts_kpt1", "shorts_kpt4"), id=164, color=[128, 128, 128]),
+ 165: dict(link=("shorts_kpt4", "shorts_kpt5"), id=165, color=[128, 128, 128]),
+ 166: dict(link=("shorts_kpt5", "shorts_kpt6"), id=166, color=[128, 128, 128]),
+ 167: dict(link=("shorts_kpt6", "shorts_kpt7"), id=167, color=[128, 128, 128]),
+ 168: dict(link=("shorts_kpt7", "shorts_kpt8"), id=168, color=[128, 128, 128]),
+ 169: dict(link=("shorts_kpt8", "shorts_kpt9"), id=169, color=[128, 128, 128]),
+ 170: dict(link=("shorts_kpt9", "shorts_kpt10"), id=170, color=[128, 128, 128]),
+ 171: dict(link=("shorts_kpt10", "shorts_kpt3"), id=171, color=[128, 128, 128]),
+ 172: dict(link=("shorts_kpt3", "shorts_kpt2"), id=172, color=[128, 128, 128]),
+ 173: dict(link=("shorts_kpt2", "shorts_kpt1"), id=173, color=[128, 128, 128]),
# trousers
+ 174: dict(link=("trousers_kpt1", "trousers_kpt4"), id=174, color=[128, 0, 128]),
+ 175: dict(link=("trousers_kpt4", "trousers_kpt5"), id=175, color=[128, 0, 128]),
+ 176: dict(link=("trousers_kpt5", "trousers_kpt6"), id=176, color=[128, 0, 128]),
+ 177: dict(link=("trousers_kpt6", "trousers_kpt7"), id=177, color=[128, 0, 128]),
+ 178: dict(link=("trousers_kpt7", "trousers_kpt8"), id=178, color=[128, 0, 128]),
+ 179: dict(link=("trousers_kpt8", "trousers_kpt9"), id=179, color=[128, 0, 128]),
+ 180: dict(link=("trousers_kpt9", "trousers_kpt10"), id=180, color=[128, 0, 128]),
+ 181: dict(link=("trousers_kpt10", "trousers_kpt11"), id=181, color=[128, 0, 128]),
+ 182: dict(link=("trousers_kpt11", "trousers_kpt12"), id=182, color=[128, 0, 128]),
+ 183: dict(link=("trousers_kpt12", "trousers_kpt13"), id=183, color=[128, 0, 128]),
+ 184: dict(link=("trousers_kpt13", "trousers_kpt14"), id=184, color=[128, 0, 128]),
+ 185: dict(link=("trousers_kpt14", "trousers_kpt3"), id=185, color=[128, 0, 128]),
+ 186: dict(link=("trousers_kpt3", "trousers_kpt2"), id=186, color=[128, 0, 128]),
+ 187: dict(link=("trousers_kpt2", "trousers_kpt1"), id=187, color=[128, 0, 128]),
# skirt
+ 188: dict(link=("skirt_kpt1", "skirt_kpt4"), id=188, color=[64, 128, 128]),
+ 189: dict(link=("skirt_kpt4", "skirt_kpt5"), id=189, color=[64, 128, 128]),
+ 190: dict(link=("skirt_kpt5", "skirt_kpt6"), id=190, color=[64, 128, 128]),
+ 191: dict(link=("skirt_kpt6", "skirt_kpt7"), id=191, color=[64, 128, 128]),
+ 192: dict(link=("skirt_kpt7", "skirt_kpt8"), id=192, color=[64, 128, 128]),
+ 193: dict(link=("skirt_kpt8", "skirt_kpt3"), id=193, color=[64, 128, 128]),
+ 194: dict(link=("skirt_kpt3", "skirt_kpt2"), id=194, color=[64, 128, 128]),
+ 195: dict(link=("skirt_kpt2", "skirt_kpt1"), id=195, color=[64, 128, 128]),
# short_sleeved_dress
+ 196: dict(link=("ssd_kpt1", "ssd_kpt2"), id=196, color=[64, 64, 128]),
+ 197: dict(link=("ssd_kpt2", "ssd_kpt7"), id=197, color=[64, 64, 128]),
+ 198: dict(link=("ssd_kpt7", "ssd_kpt8"), id=198, color=[64, 64, 128]),
+ 199: dict(link=("ssd_kpt8", "ssd_kpt9"), id=199, color=[64, 64, 128]),
+ 200: dict(link=("ssd_kpt9", "ssd_kpt10"), id=200, color=[64, 64, 128]),
+ 201: dict(link=("ssd_kpt10", "ssd_kpt11"), id=201, color=[64, 64, 128]),
+ 202: dict(link=("ssd_kpt11", "ssd_kpt12"), id=202, color=[64, 64, 128]),
+ 203: dict(link=("ssd_kpt12", "ssd_kpt13"), id=203, color=[64, 64, 128]),
+ 204: dict(link=("ssd_kpt13", "ssd_kpt14"), id=204, color=[64, 64, 128]),
+ 205: dict(link=("ssd_kpt14", "ssd_kpt15"), id=205, color=[64, 64, 128]),
+ 206: dict(link=("ssd_kpt15", "ssd_kpt16"), id=206, color=[64, 64, 128]),
+ 207: dict(link=("ssd_kpt16", "ssd_kpt17"), id=207, color=[64, 64, 128]),
+ 208: dict(link=("ssd_kpt17", "ssd_kpt18"), id=208, color=[64, 64, 128]),
+ 209: dict(link=("ssd_kpt18", "ssd_kpt19"), id=209, color=[64, 64, 128]),
+ 210: dict(link=("ssd_kpt19", "ssd_kpt20"), id=210, color=[64, 64, 128]),
+ 211: dict(link=("ssd_kpt20", "ssd_kpt21"), id=211, color=[64, 64, 128]),
+ 212: dict(link=("ssd_kpt21", "ssd_kpt22"), id=212, color=[64, 64, 128]),
+ 213: dict(link=("ssd_kpt22", "ssd_kpt23"), id=213, color=[64, 64, 128]),
+ 214: dict(link=("ssd_kpt23", "ssd_kpt24"), id=214, color=[64, 64, 128]),
+ 215: dict(link=("ssd_kpt24", "ssd_kpt25"), id=215, color=[64, 64, 128]),
+ 216: dict(link=("ssd_kpt25", "ssd_kpt26"), id=216, color=[64, 64, 128]),
"ssd_kpt26"), id=216, color=[64, 64, 128]), + 217: dict(link=("ssd_kpt26", "ssd_kpt27"), id=217, color=[64, 64, 128]), + 218: dict(link=("ssd_kpt27", "ssd_kpt28"), id=218, color=[64, 64, 128]), + 219: dict(link=("ssd_kpt28", "ssd_kpt29"), id=219, color=[64, 64, 128]), + 220: dict(link=("ssd_kpt29", "ssd_kpt6"), id=220, color=[64, 64, 128]), + 221: dict(link=("ssd_kpt6", "ssd_kpt5"), id=221, color=[64, 64, 128]), + 222: dict(link=("ssd_kpt5", "ssd_kpt4"), id=222, color=[64, 64, 128]), + 223: dict(link=("ssd_kpt4", "ssd_kpt3"), id=223, color=[64, 64, 128]), + 224: dict(link=("ssd_kpt3", "ssd_kpt2"), id=224, color=[64, 64, 128]), + 225: dict(link=("ssd_kpt6", "ssd_kpt1"), id=225, color=[64, 64, 128]), # long_sleeved_dress - 226: - dict(link=('lsd_kpt1', 'lsd_kpt2'), id=226, color=[128, 64, 0]), - 227: - dict(link=('lsd_kpt2', 'lsd_kpt7'), id=228, color=[128, 64, 0]), - 228: - dict(link=('lsd_kpt7', 'lsd_kpt8'), id=228, color=[128, 64, 0]), - 229: - dict(link=('lsd_kpt8', 'lsd_kpt9'), id=229, color=[128, 64, 0]), - 230: - dict(link=('lsd_kpt9', 'lsd_kpt10'), id=230, color=[128, 64, 0]), - 231: - dict(link=('lsd_kpt10', 'lsd_kpt11'), id=231, color=[128, 64, 0]), - 232: - dict(link=('lsd_kpt11', 'lsd_kpt12'), id=232, color=[128, 64, 0]), - 233: - dict(link=('lsd_kpt12', 'lsd_kpt13'), id=233, color=[128, 64, 0]), - 234: - dict(link=('lsd_kpt13', 'lsd_kpt14'), id=234, color=[128, 64, 0]), - 235: - dict(link=('lsd_kpt14', 'lsd_kpt15'), id=235, color=[128, 64, 0]), - 236: - dict(link=('lsd_kpt15', 'lsd_kpt16'), id=236, color=[128, 64, 0]), - 237: - dict(link=('lsd_kpt16', 'lsd_kpt17'), id=237, color=[128, 64, 0]), - 238: - dict(link=('lsd_kpt17', 'lsd_kpt18'), id=238, color=[128, 64, 0]), - 239: - dict(link=('lsd_kpt18', 'lsd_kpt19'), id=239, color=[128, 64, 0]), - 240: - dict(link=('lsd_kpt19', 'lsd_kpt20'), id=240, color=[128, 64, 0]), - 241: - dict(link=('lsd_kpt20', 'lsd_kpt21'), id=241, color=[128, 64, 0]), - 242: - dict(link=('lsd_kpt21', 'lsd_kpt22'), id=242, color=[128, 64, 0]), - 243: - dict(link=('lsd_kpt22', 'lsd_kpt23'), id=243, color=[128, 64, 0]), - 244: - dict(link=('lsd_kpt23', 'lsd_kpt24'), id=244, color=[128, 64, 0]), - 245: - dict(link=('lsd_kpt24', 'lsd_kpt25'), id=245, color=[128, 64, 0]), - 246: - dict(link=('lsd_kpt25', 'lsd_kpt26'), id=246, color=[128, 64, 0]), - 247: - dict(link=('lsd_kpt26', 'lsd_kpt27'), id=247, color=[128, 64, 0]), - 248: - dict(link=('lsd_kpt27', 'lsd_kpt28'), id=248, color=[128, 64, 0]), - 249: - dict(link=('lsd_kpt28', 'lsd_kpt29'), id=249, color=[128, 64, 0]), - 250: - dict(link=('lsd_kpt29', 'lsd_kpt30'), id=250, color=[128, 64, 0]), - 251: - dict(link=('lsd_kpt30', 'lsd_kpt31'), id=251, color=[128, 64, 0]), - 252: - dict(link=('lsd_kpt31', 'lsd_kpt32'), id=252, color=[128, 64, 0]), - 253: - dict(link=('lsd_kpt32', 'lsd_kpt33'), id=253, color=[128, 64, 0]), - 254: - dict(link=('lsd_kpt33', 'lsd_kpt34'), id=254, color=[128, 64, 0]), - 255: - dict(link=('lsd_kpt34', 'lsd_kpt35'), id=255, color=[128, 64, 0]), - 256: - dict(link=('lsd_kpt35', 'lsd_kpt36'), id=256, color=[128, 64, 0]), - 257: - dict(link=('lsd_kpt36', 'lsd_kpt37'), id=257, color=[128, 64, 0]), - 258: - dict(link=('lsd_kpt37', 'lsd_kpt6'), id=258, color=[128, 64, 0]), - 259: - dict(link=('lsd_kpt6', 'lsd_kpt5'), id=259, color=[128, 64, 0]), - 260: - dict(link=('lsd_kpt5', 'lsd_kpt4'), id=260, color=[128, 64, 0]), - 261: - dict(link=('lsd_kpt4', 'lsd_kpt3'), id=261, color=[128, 64, 0]), - 262: - dict(link=('lsd_kpt3', 'lsd_kpt2'), id=262, color=[128, 64, 0]), - 263: - dict(link=('lsd_kpt6', 
'lsd_kpt1'), id=263, color=[128, 64, 0]), + 226: dict(link=("lsd_kpt1", "lsd_kpt2"), id=226, color=[128, 64, 0]), + 227: dict(link=("lsd_kpt2", "lsd_kpt7"), id=227, color=[128, 64, 0]), + 228: dict(link=("lsd_kpt7", "lsd_kpt8"), id=228, color=[128, 64, 0]), + 229: dict(link=("lsd_kpt8", "lsd_kpt9"), id=229, color=[128, 64, 0]), + 230: dict(link=("lsd_kpt9", "lsd_kpt10"), id=230, color=[128, 64, 0]), + 231: dict(link=("lsd_kpt10", "lsd_kpt11"), id=231, color=[128, 64, 0]), + 232: dict(link=("lsd_kpt11", "lsd_kpt12"), id=232, color=[128, 64, 0]), + 233: dict(link=("lsd_kpt12", "lsd_kpt13"), id=233, color=[128, 64, 0]), + 234: dict(link=("lsd_kpt13", "lsd_kpt14"), id=234, color=[128, 64, 0]), + 235: dict(link=("lsd_kpt14", "lsd_kpt15"), id=235, color=[128, 64, 0]), + 236: dict(link=("lsd_kpt15", "lsd_kpt16"), id=236, color=[128, 64, 0]), + 237: dict(link=("lsd_kpt16", "lsd_kpt17"), id=237, color=[128, 64, 0]), + 238: dict(link=("lsd_kpt17", "lsd_kpt18"), id=238, color=[128, 64, 0]), + 239: dict(link=("lsd_kpt18", "lsd_kpt19"), id=239, color=[128, 64, 0]), + 240: dict(link=("lsd_kpt19", "lsd_kpt20"), id=240, color=[128, 64, 0]), + 241: dict(link=("lsd_kpt20", "lsd_kpt21"), id=241, color=[128, 64, 0]), + 242: dict(link=("lsd_kpt21", "lsd_kpt22"), id=242, color=[128, 64, 0]), + 243: dict(link=("lsd_kpt22", "lsd_kpt23"), id=243, color=[128, 64, 0]), + 244: dict(link=("lsd_kpt23", "lsd_kpt24"), id=244, color=[128, 64, 0]), + 245: dict(link=("lsd_kpt24", "lsd_kpt25"), id=245, color=[128, 64, 0]), + 246: dict(link=("lsd_kpt25", "lsd_kpt26"), id=246, color=[128, 64, 0]), + 247: dict(link=("lsd_kpt26", "lsd_kpt27"), id=247, color=[128, 64, 0]), + 248: dict(link=("lsd_kpt27", "lsd_kpt28"), id=248, color=[128, 64, 0]), + 249: dict(link=("lsd_kpt28", "lsd_kpt29"), id=249, color=[128, 64, 0]), + 250: dict(link=("lsd_kpt29", "lsd_kpt30"), id=250, color=[128, 64, 0]), + 251: dict(link=("lsd_kpt30", "lsd_kpt31"), id=251, color=[128, 64, 0]), + 252: dict(link=("lsd_kpt31", "lsd_kpt32"), id=252, color=[128, 64, 0]), + 253: dict(link=("lsd_kpt32", "lsd_kpt33"), id=253, color=[128, 64, 0]), + 254: dict(link=("lsd_kpt33", "lsd_kpt34"), id=254, color=[128, 64, 0]), + 255: dict(link=("lsd_kpt34", "lsd_kpt35"), id=255, color=[128, 64, 0]), + 256: dict(link=("lsd_kpt35", "lsd_kpt36"), id=256, color=[128, 64, 0]), + 257: dict(link=("lsd_kpt36", "lsd_kpt37"), id=257, color=[128, 64, 0]), + 258: dict(link=("lsd_kpt37", "lsd_kpt6"), id=258, color=[128, 64, 0]), + 259: dict(link=("lsd_kpt6", "lsd_kpt5"), id=259, color=[128, 64, 0]), + 260: dict(link=("lsd_kpt5", "lsd_kpt4"), id=260, color=[128, 64, 0]), + 261: dict(link=("lsd_kpt4", "lsd_kpt3"), id=261, color=[128, 64, 0]), + 262: dict(link=("lsd_kpt3", "lsd_kpt2"), id=262, color=[128, 64, 0]), + 263: dict(link=("lsd_kpt6", "lsd_kpt1"), id=263, color=[128, 64, 0]), # vest_dress - 264: - dict(link=('vd_kpt1', 'vd_kpt2'), id=264, color=[128, 64, 255]), - 265: - dict(link=('vd_kpt2', 'vd_kpt7'), id=265, color=[128, 64, 255]), - 266: - dict(link=('vd_kpt7', 'vd_kpt8'), id=266, color=[128, 64, 255]), - 267: - dict(link=('vd_kpt8', 'vd_kpt9'), id=267, color=[128, 64, 255]), - 268: - dict(link=('vd_kpt9', 'vd_kpt10'), id=268, color=[128, 64, 255]), - 269: - dict(link=('vd_kpt10', 'vd_kpt11'), id=269, color=[128, 64, 255]), - 270: - dict(link=('vd_kpt11', 'vd_kpt12'), id=270, color=[128, 64, 255]), - 271: - dict(link=('vd_kpt12', 'vd_kpt13'), id=271, color=[128, 64, 255]), - 272: - dict(link=('vd_kpt13', 'vd_kpt14'), id=272, color=[128, 64, 255]), - 273: - dict(link=('vd_kpt14',
'vd_kpt15'), id=273, color=[128, 64, 255]), - 274: - dict(link=('vd_kpt15', 'vd_kpt16'), id=274, color=[128, 64, 255]), - 275: - dict(link=('vd_kpt16', 'vd_kpt17'), id=275, color=[128, 64, 255]), - 276: - dict(link=('vd_kpt17', 'vd_kpt18'), id=276, color=[128, 64, 255]), - 277: - dict(link=('vd_kpt18', 'vd_kpt19'), id=277, color=[128, 64, 255]), - 278: - dict(link=('vd_kpt19', 'vd_kpt6'), id=278, color=[128, 64, 255]), - 279: - dict(link=('vd_kpt6', 'vd_kpt5'), id=279, color=[128, 64, 255]), - 280: - dict(link=('vd_kpt5', 'vd_kpt4'), id=280, color=[128, 64, 255]), - 281: - dict(link=('vd_kpt4', 'vd_kpt3'), id=281, color=[128, 64, 255]), - 282: - dict(link=('vd_kpt3', 'vd_kpt2'), id=282, color=[128, 64, 255]), - 283: - dict(link=('vd_kpt6', 'vd_kpt1'), id=283, color=[128, 64, 255]), + 264: dict(link=("vd_kpt1", "vd_kpt2"), id=264, color=[128, 64, 255]), + 265: dict(link=("vd_kpt2", "vd_kpt7"), id=265, color=[128, 64, 255]), + 266: dict(link=("vd_kpt7", "vd_kpt8"), id=266, color=[128, 64, 255]), + 267: dict(link=("vd_kpt8", "vd_kpt9"), id=267, color=[128, 64, 255]), + 268: dict(link=("vd_kpt9", "vd_kpt10"), id=268, color=[128, 64, 255]), + 269: dict(link=("vd_kpt10", "vd_kpt11"), id=269, color=[128, 64, 255]), + 270: dict(link=("vd_kpt11", "vd_kpt12"), id=270, color=[128, 64, 255]), + 271: dict(link=("vd_kpt12", "vd_kpt13"), id=271, color=[128, 64, 255]), + 272: dict(link=("vd_kpt13", "vd_kpt14"), id=272, color=[128, 64, 255]), + 273: dict(link=("vd_kpt14", "vd_kpt15"), id=273, color=[128, 64, 255]), + 274: dict(link=("vd_kpt15", "vd_kpt16"), id=274, color=[128, 64, 255]), + 275: dict(link=("vd_kpt16", "vd_kpt17"), id=275, color=[128, 64, 255]), + 276: dict(link=("vd_kpt17", "vd_kpt18"), id=276, color=[128, 64, 255]), + 277: dict(link=("vd_kpt18", "vd_kpt19"), id=277, color=[128, 64, 255]), + 278: dict(link=("vd_kpt19", "vd_kpt6"), id=278, color=[128, 64, 255]), + 279: dict(link=("vd_kpt6", "vd_kpt5"), id=279, color=[128, 64, 255]), + 280: dict(link=("vd_kpt5", "vd_kpt4"), id=280, color=[128, 64, 255]), + 281: dict(link=("vd_kpt4", "vd_kpt3"), id=281, color=[128, 64, 255]), + 282: dict(link=("vd_kpt3", "vd_kpt2"), id=282, color=[128, 64, 255]), + 283: dict(link=("vd_kpt6", "vd_kpt1"), id=283, color=[128, 64, 255]), # sling_dress - 284: - dict(link=('sd_kpt1', 'sd_kpt2'), id=284, color=[128, 64, 0]), - 285: - dict(link=('sd_kpt2', 'sd_kpt8'), id=285, color=[128, 64, 0]), - 286: - dict(link=('sd_kpt8', 'sd_kpt9'), id=286, color=[128, 64, 0]), - 287: - dict(link=('sd_kpt9', 'sd_kpt10'), id=287, color=[128, 64, 0]), - 288: - dict(link=('sd_kpt10', 'sd_kpt11'), id=288, color=[128, 64, 0]), - 289: - dict(link=('sd_kpt11', 'sd_kpt12'), id=289, color=[128, 64, 0]), - 290: - dict(link=('sd_kpt12', 'sd_kpt13'), id=290, color=[128, 64, 0]), - 291: - dict(link=('sd_kpt13', 'sd_kpt14'), id=291, color=[128, 64, 0]), - 292: - dict(link=('sd_kpt14', 'sd_kpt15'), id=292, color=[128, 64, 0]), - 293: - dict(link=('sd_kpt15', 'sd_kpt16'), id=293, color=[128, 64, 0]), - 294: - dict(link=('sd_kpt16', 'sd_kpt17'), id=294, color=[128, 64, 0]), - 295: - dict(link=('sd_kpt17', 'sd_kpt18'), id=295, color=[128, 64, 0]), - 296: - dict(link=('sd_kpt18', 'sd_kpt6'), id=296, color=[128, 64, 0]), - 297: - dict(link=('sd_kpt6', 'sd_kpt5'), id=297, color=[128, 64, 0]), - 298: - dict(link=('sd_kpt5', 'sd_kpt4'), id=298, color=[128, 64, 0]), - 299: - dict(link=('sd_kpt4', 'sd_kpt3'), id=299, color=[128, 64, 0]), - 300: - dict(link=('sd_kpt3', 'sd_kpt2'), id=300, color=[128, 64, 0]), - 301: - dict(link=('sd_kpt2', 'sd_kpt7'), 
id=301, color=[128, 64, 0]), - 302: - dict(link=('sd_kpt6', 'sd_kpt19'), id=302, color=[128, 64, 0]), - 303: - dict(link=('sd_kpt6', 'sd_kpt1'), id=303, color=[128, 64, 0]), + 284: dict(link=("sd_kpt1", "sd_kpt2"), id=284, color=[128, 64, 0]), + 285: dict(link=("sd_kpt2", "sd_kpt8"), id=285, color=[128, 64, 0]), + 286: dict(link=("sd_kpt8", "sd_kpt9"), id=286, color=[128, 64, 0]), + 287: dict(link=("sd_kpt9", "sd_kpt10"), id=287, color=[128, 64, 0]), + 288: dict(link=("sd_kpt10", "sd_kpt11"), id=288, color=[128, 64, 0]), + 289: dict(link=("sd_kpt11", "sd_kpt12"), id=289, color=[128, 64, 0]), + 290: dict(link=("sd_kpt12", "sd_kpt13"), id=290, color=[128, 64, 0]), + 291: dict(link=("sd_kpt13", "sd_kpt14"), id=291, color=[128, 64, 0]), + 292: dict(link=("sd_kpt14", "sd_kpt15"), id=292, color=[128, 64, 0]), + 293: dict(link=("sd_kpt15", "sd_kpt16"), id=293, color=[128, 64, 0]), + 294: dict(link=("sd_kpt16", "sd_kpt17"), id=294, color=[128, 64, 0]), + 295: dict(link=("sd_kpt17", "sd_kpt18"), id=295, color=[128, 64, 0]), + 296: dict(link=("sd_kpt18", "sd_kpt6"), id=296, color=[128, 64, 0]), + 297: dict(link=("sd_kpt6", "sd_kpt5"), id=297, color=[128, 64, 0]), + 298: dict(link=("sd_kpt5", "sd_kpt4"), id=298, color=[128, 64, 0]), + 299: dict(link=("sd_kpt4", "sd_kpt3"), id=299, color=[128, 64, 0]), + 300: dict(link=("sd_kpt3", "sd_kpt2"), id=300, color=[128, 64, 0]), + 301: dict(link=("sd_kpt2", "sd_kpt7"), id=301, color=[128, 64, 0]), + 302: dict(link=("sd_kpt6", "sd_kpt19"), id=302, color=[128, 64, 0]), + 303: dict(link=("sd_kpt6", "sd_kpt1"), id=303, color=[128, 64, 0]), }, - joint_weights=[1.] * 294, - sigmas=[]) + joint_weights=[1.0] * 294, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/deepfashion_full.py b/mmpose/configs/_base_/datasets/deepfashion_full.py index 4d989069ee7253d3a5b5f01c81135b1a472cd4b2..5f22ac1af6a48a90c50b7c5593dd3740980ea6cb 100644 --- a/mmpose/configs/_base_/datasets/deepfashion_full.py +++ b/mmpose/configs/_base_/datasets/deepfashion_full.py @@ -1,74 +1,23 @@ dataset_info = dict( - dataset_name='deepfashion_full', + dataset_name="deepfashion_full", paper_info=dict( - author='Liu, Ziwei and Luo, Ping and Qiu, Shi ' - 'and Wang, Xiaogang and Tang, Xiaoou', - title='DeepFashion: Powering Robust Clothes Recognition ' - 'and Retrieval with Rich Annotations', - container='Proceedings of IEEE Conference on Computer ' - 'Vision and Pattern Recognition (CVPR)', - year='2016', - homepage='http://mmlab.ie.cuhk.edu.hk/projects/' - 'DeepFashion/LandmarkDetection.html', + author="Liu, Ziwei and Luo, Ping and Qiu, Shi " "and Wang, Xiaogang and Tang, Xiaoou", + title="DeepFashion: Powering Robust Clothes Recognition " "and Retrieval with Rich Annotations", + container="Proceedings of IEEE Conference on Computer " "Vision and Pattern Recognition (CVPR)", + year="2016", + homepage="http://mmlab.ie.cuhk.edu.hk/projects/" "DeepFashion/LandmarkDetection.html", ), keypoint_info={ - 0: - dict( - name='left collar', - id=0, - color=[255, 255, 255], - type='', - swap='right collar'), - 1: - dict( - name='right collar', - id=1, - color=[255, 255, 255], - type='', - swap='left collar'), - 2: - dict( - name='left sleeve', - id=2, - color=[255, 255, 255], - type='', - swap='right sleeve'), - 3: - dict( - name='right sleeve', - id=3, - color=[255, 255, 255], - type='', - swap='left sleeve'), - 4: - dict( - name='left waistline', - id=0, - color=[255, 255, 255], - type='', - swap='right waistline'), - 5: - dict( - name='right waistline', - id=1, - color=[255, 255, 255], - type='', - 
swap='left waistline'), - 6: - dict( - name='left hem', - id=2, - color=[255, 255, 255], - type='', - swap='right hem'), - 7: - dict( - name='right hem', - id=3, - color=[255, 255, 255], - type='', - swap='left hem'), + 0: dict(name="left collar", id=0, color=[255, 255, 255], type="", swap="right collar"), + 1: dict(name="right collar", id=1, color=[255, 255, 255], type="", swap="left collar"), + 2: dict(name="left sleeve", id=2, color=[255, 255, 255], type="", swap="right sleeve"), + 3: dict(name="right sleeve", id=3, color=[255, 255, 255], type="", swap="left sleeve"), + 4: dict(name="left waistline", id=4, color=[255, 255, 255], type="", swap="right waistline"), + 5: dict(name="right waistline", id=5, color=[255, 255, 255], type="", swap="left waistline"), + 6: dict(name="left hem", id=6, color=[255, 255, 255], type="", swap="right hem"), + 7: dict(name="right hem", id=7, color=[255, 255, 255], type="", swap="left hem"), }, skeleton_info={}, - joint_weights=[1.] * 8, - sigmas=[]) + joint_weights=[1.0] * 8, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/deepfashion_lower.py b/mmpose/configs/_base_/datasets/deepfashion_lower.py index db014a1747ca618f93a7d092d29027015b48ae3c..9af70ffdd2cbc46ceb4134d30e4781130ef10857 100644 --- a/mmpose/configs/_base_/datasets/deepfashion_lower.py +++ b/mmpose/configs/_base_/datasets/deepfashion_lower.py @@ -1,46 +1,19 @@ dataset_info = dict( - dataset_name='deepfashion_lower', + dataset_name="deepfashion_lower", paper_info=dict( - author='Liu, Ziwei and Luo, Ping and Qiu, Shi ' - 'and Wang, Xiaogang and Tang, Xiaoou', - title='DeepFashion: Powering Robust Clothes Recognition ' - 'and Retrieval with Rich Annotations', - container='Proceedings of IEEE Conference on Computer ' - 'Vision and Pattern Recognition (CVPR)', - year='2016', - homepage='http://mmlab.ie.cuhk.edu.hk/projects/' - 'DeepFashion/LandmarkDetection.html', + author="Liu, Ziwei and Luo, Ping and Qiu, Shi " "and Wang, Xiaogang and Tang, Xiaoou", + title="DeepFashion: Powering Robust Clothes Recognition " "and Retrieval with Rich Annotations", + container="Proceedings of IEEE Conference on Computer " "Vision and Pattern Recognition (CVPR)", + year="2016", + homepage="http://mmlab.ie.cuhk.edu.hk/projects/" "DeepFashion/LandmarkDetection.html", ), keypoint_info={ - 0: - dict( - name='left waistline', - id=0, - color=[255, 255, 255], - type='', - swap='right waistline'), - 1: - dict( - name='right waistline', - id=1, - color=[255, 255, 255], - type='', - swap='left waistline'), - 2: - dict( - name='left hem', - id=2, - color=[255, 255, 255], - type='', - swap='right hem'), - 3: - dict( - name='right hem', - id=3, - color=[255, 255, 255], - type='', - swap='left hem'), + 0: dict(name="left waistline", id=0, color=[255, 255, 255], type="", swap="right waistline"), + 1: dict(name="right waistline", id=1, color=[255, 255, 255], type="", swap="left waistline"), + 2: dict(name="left hem", id=2, color=[255, 255, 255], type="", swap="right hem"), + 3: dict(name="right hem", id=3, color=[255, 255, 255], type="", swap="left hem"), }, skeleton_info={}, - joint_weights=[1.]
* 4, - sigmas=[]) + joint_weights=[1.0] * 4, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/deepfashion_upper.py b/mmpose/configs/_base_/datasets/deepfashion_upper.py index f0b012fd37bee1ba5ed956a7a5465a8623bf0894..71cfab55952c1255b917e8022e8c63b850b1b0ef 100644 --- a/mmpose/configs/_base_/datasets/deepfashion_upper.py +++ b/mmpose/configs/_base_/datasets/deepfashion_upper.py @@ -1,60 +1,21 @@ dataset_info = dict( - dataset_name='deepfashion_upper', + dataset_name="deepfashion_upper", paper_info=dict( - author='Liu, Ziwei and Luo, Ping and Qiu, Shi ' - 'and Wang, Xiaogang and Tang, Xiaoou', - title='DeepFashion: Powering Robust Clothes Recognition ' - 'and Retrieval with Rich Annotations', - container='Proceedings of IEEE Conference on Computer ' - 'Vision and Pattern Recognition (CVPR)', - year='2016', - homepage='http://mmlab.ie.cuhk.edu.hk/projects/' - 'DeepFashion/LandmarkDetection.html', + author="Liu, Ziwei and Luo, Ping and Qiu, Shi " "and Wang, Xiaogang and Tang, Xiaoou", + title="DeepFashion: Powering Robust Clothes Recognition " "and Retrieval with Rich Annotations", + container="Proceedings of IEEE Conference on Computer " "Vision and Pattern Recognition (CVPR)", + year="2016", + homepage="http://mmlab.ie.cuhk.edu.hk/projects/" "DeepFashion/LandmarkDetection.html", ), keypoint_info={ - 0: - dict( - name='left collar', - id=0, - color=[255, 255, 255], - type='', - swap='right collar'), - 1: - dict( - name='right collar', - id=1, - color=[255, 255, 255], - type='', - swap='left collar'), - 2: - dict( - name='left sleeve', - id=2, - color=[255, 255, 255], - type='', - swap='right sleeve'), - 3: - dict( - name='right sleeve', - id=3, - color=[255, 255, 255], - type='', - swap='left sleeve'), - 4: - dict( - name='left hem', - id=4, - color=[255, 255, 255], - type='', - swap='right hem'), - 5: - dict( - name='right hem', - id=5, - color=[255, 255, 255], - type='', - swap='left hem'), + 0: dict(name="left collar", id=0, color=[255, 255, 255], type="", swap="right collar"), + 1: dict(name="right collar", id=1, color=[255, 255, 255], type="", swap="left collar"), + 2: dict(name="left sleeve", id=2, color=[255, 255, 255], type="", swap="right sleeve"), + 3: dict(name="right sleeve", id=3, color=[255, 255, 255], type="", swap="left sleeve"), + 4: dict(name="left hem", id=4, color=[255, 255, 255], type="", swap="right hem"), + 5: dict(name="right hem", id=5, color=[255, 255, 255], type="", swap="left hem"), }, skeleton_info={}, - joint_weights=[1.] 
* 6, - sigmas=[]) + joint_weights=[1.0] * 6, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/exlpose.py b/mmpose/configs/_base_/datasets/exlpose.py index 29b758aa21117bb71766373d3eabc8633b372354..c6e294ed20a99ca34e7881b023a515df88a20803 100644 --- a/mmpose/configs/_base_/datasets/exlpose.py +++ b/mmpose/configs/_base_/datasets/exlpose.py @@ -1,125 +1,43 @@ dataset_info = dict( - dataset_name='exlpose', + dataset_name="exlpose", paper_info=dict( - author='Sohyun Lee, Jaesung Rim, Boseung Jeong, Geonu Kim,' - 'ByungJu Woo, Haechan Lee, Sunghyun Cho, Suha Kwak', - title='Human Pose Estimation in Extremely Low-Light Conditions', - container='arXiv', - year='2023', - homepage='https://arxiv.org/abs/2303.15410', + author="Sohyun Lee, Jaesung Rim, Boseung Jeong, Geonu Kim, " "ByungJu Woo, Haechan Lee, Sunghyun Cho, Suha Kwak", + title="Human Pose Estimation in Extremely Low-Light Conditions", + container="arXiv", + year="2023", + homepage="https://arxiv.org/abs/2303.15410", ), keypoint_info={ - 0: - dict( - name='left_shoulder', - id=0, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 1: - dict( - name='right_shoulder', - id=1, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 2: - dict( - name='left_elbow', - id=2, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 3: - dict( - name='right_elbow', - id=3, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 4: - dict( - name='left_wrist', - id=4, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 5: - dict( - name='right_wrist', - id=5, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 6: - dict( - name='left_hip', - id=6, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 7: - dict( - name='right_hip', - id=7, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 8: - dict( - name='left_knee', - id=8, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 9: - dict( - name='right_knee', - id=9, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 10: - dict( - name='left_ankle', - id=10, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 11: - dict( - name='right_ankle', - id=11, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 12: - dict(name='head', id=12, color=[51, 153, 255], type='upper', swap=''), - 13: - dict(name='neck', id=13, color=[51, 153, 255], type='upper', swap='') + 0: dict(name="left_shoulder", id=0, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 1: dict(name="right_shoulder", id=1, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 2: dict(name="left_elbow", id=2, color=[0, 255, 0], type="upper", swap="right_elbow"), + 3: dict(name="right_elbow", id=3, color=[255, 128, 0], type="upper", swap="left_elbow"), + 4: dict(name="left_wrist", id=4, color=[0, 255, 0], type="upper", swap="right_wrist"), + 5: dict(name="right_wrist", id=5, color=[255, 128, 0], type="upper", swap="left_wrist"), + 6: dict(name="left_hip", id=6, color=[0, 255, 0], type="lower", swap="right_hip"), + 7: dict(name="right_hip", id=7, color=[255, 128, 0], type="lower", swap="left_hip"), + 8: dict(name="left_knee", id=8, color=[0, 255, 0], type="lower", swap="right_knee"), + 9: dict(name="right_knee", id=9, color=[255, 128, 0], type="lower", swap="left_knee"), + 10: dict(name="left_ankle", id=10, color=[0, 255, 0], type="lower", swap="right_ankle"), + 11: dict(name="right_ankle", id=11, color=[255, 128, 0], type="lower", swap="left_ankle"), + 12: dict(name="head", id=12, color=[51, 153, 255],
type="upper", swap=""), + 13: dict(name="neck", id=13, color=[51, 153, 255], type="upper", swap=""), }, skeleton_info={ - 0: dict(link=('head', 'neck'), id=0, color=[51, 153, 255]), - 1: dict(link=('neck', 'left_shoulder'), id=1, color=[51, 153, 255]), - 2: dict(link=('neck', 'right_shoulder'), id=2, color=[51, 153, 255]), - 3: dict(link=('left_shoulder', 'left_elbow'), id=3, color=[0, 255, 0]), - 4: dict(link=('left_elbow', 'left_wrist'), id=4, color=[0, 255, 0]), - 5: dict( - link=('right_shoulder', 'right_elbow'), id=5, color=[255, 128, 0]), - 6: - dict(link=('right_elbow', 'right_wrist'), id=6, color=[255, 128, 0]), - 7: dict(link=('neck', 'right_hip'), id=7, color=[51, 153, 255]), - 8: dict(link=('neck', 'left_hip'), id=8, color=[51, 153, 255]), - 9: dict(link=('right_hip', 'right_knee'), id=9, color=[255, 128, 0]), - 10: - dict(link=('right_knee', 'right_ankle'), id=10, color=[255, 128, 0]), - 11: dict(link=('left_hip', 'left_knee'), id=11, color=[0, 255, 0]), - 12: dict(link=('left_knee', 'left_ankle'), id=12, color=[0, 255, 0]), + 0: dict(link=("head", "neck"), id=0, color=[51, 153, 255]), + 1: dict(link=("neck", "left_shoulder"), id=1, color=[51, 153, 255]), + 2: dict(link=("neck", "right_shoulder"), id=2, color=[51, 153, 255]), + 3: dict(link=("left_shoulder", "left_elbow"), id=3, color=[0, 255, 0]), + 4: dict(link=("left_elbow", "left_wrist"), id=4, color=[0, 255, 0]), + 5: dict(link=("right_shoulder", "right_elbow"), id=5, color=[255, 128, 0]), + 6: dict(link=("right_elbow", "right_wrist"), id=6, color=[255, 128, 0]), + 7: dict(link=("neck", "right_hip"), id=7, color=[51, 153, 255]), + 8: dict(link=("neck", "left_hip"), id=8, color=[51, 153, 255]), + 9: dict(link=("right_hip", "right_knee"), id=9, color=[255, 128, 0]), + 10: dict(link=("right_knee", "right_ankle"), id=10, color=[255, 128, 0]), + 11: dict(link=("left_hip", "left_knee"), id=11, color=[0, 255, 0]), + 12: dict(link=("left_knee", "left_ankle"), id=12, color=[0, 255, 0]), }, - joint_weights=[ - 0.2, 0.2, 0.2, 1.3, 1.5, 0.2, 1.3, 1.5, 0.2, 0.2, 0.5, 0.2, 0.2, 0.5 - ], - sigmas=[ - 0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, - 0.089, 0.089, 0.079, 0.079 - ]) + joint_weights=[0.2, 0.2, 0.2, 1.3, 1.5, 0.2, 1.3, 1.5, 0.2, 0.2, 0.5, 0.2, 0.2, 0.5], + sigmas=[0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.079, 0.079], +) diff --git a/mmpose/configs/_base_/datasets/fly.py b/mmpose/configs/_base_/datasets/fly.py index 5f94ff57ca93d8f562b6a61b9a67198abdcde217..26f45f34431b55715906506cdb8ad6af203cec70 100644 --- a/mmpose/configs/_base_/datasets/fly.py +++ b/mmpose/configs/_base_/datasets/fly.py @@ -1,237 +1,75 @@ dataset_info = dict( - dataset_name='fly', + dataset_name="fly", paper_info=dict( - author='Pereira, Talmo D and Aldarondo, Diego E and ' - 'Willmore, Lindsay and Kislin, Mikhail and ' - 'Wang, Samuel S-H and Murthy, Mala and Shaevitz, Joshua W', - title='Fast animal pose estimation using deep neural networks', - container='Nature methods', - year='2019', - homepage='https://github.com/jgraving/DeepPoseKit-Data', + author="Pereira, Talmo D and Aldarondo, Diego E and " + "Willmore, Lindsay and Kislin, Mikhail and " + "Wang, Samuel S-H and Murthy, Mala and Shaevitz, Joshua W", + title="Fast animal pose estimation using deep neural networks", + container="Nature methods", + year="2019", + homepage="https://github.com/jgraving/DeepPoseKit-Data", ), keypoint_info={ - 0: - dict(name='head', id=0, color=[255, 255, 255], type='', swap=''), - 1: - 
dict(name='eyeL', id=1, color=[255, 255, 255], type='', swap='eyeR'), - 2: - dict(name='eyeR', id=2, color=[255, 255, 255], type='', swap='eyeL'), - 3: - dict(name='neck', id=3, color=[255, 255, 255], type='', swap=''), - 4: - dict(name='thorax', id=4, color=[255, 255, 255], type='', swap=''), - 5: - dict(name='abdomen', id=5, color=[255, 255, 255], type='', swap=''), - 6: - dict( - name='forelegR1', - id=6, - color=[255, 255, 255], - type='', - swap='forelegL1'), - 7: - dict( - name='forelegR2', - id=7, - color=[255, 255, 255], - type='', - swap='forelegL2'), - 8: - dict( - name='forelegR3', - id=8, - color=[255, 255, 255], - type='', - swap='forelegL3'), - 9: - dict( - name='forelegR4', - id=9, - color=[255, 255, 255], - type='', - swap='forelegL4'), - 10: - dict( - name='midlegR1', - id=10, - color=[255, 255, 255], - type='', - swap='midlegL1'), - 11: - dict( - name='midlegR2', - id=11, - color=[255, 255, 255], - type='', - swap='midlegL2'), - 12: - dict( - name='midlegR3', - id=12, - color=[255, 255, 255], - type='', - swap='midlegL3'), - 13: - dict( - name='midlegR4', - id=13, - color=[255, 255, 255], - type='', - swap='midlegL4'), - 14: - dict( - name='hindlegR1', - id=14, - color=[255, 255, 255], - type='', - swap='hindlegL1'), - 15: - dict( - name='hindlegR2', - id=15, - color=[255, 255, 255], - type='', - swap='hindlegL2'), - 16: - dict( - name='hindlegR3', - id=16, - color=[255, 255, 255], - type='', - swap='hindlegL3'), - 17: - dict( - name='hindlegR4', - id=17, - color=[255, 255, 255], - type='', - swap='hindlegL4'), - 18: - dict( - name='forelegL1', - id=18, - color=[255, 255, 255], - type='', - swap='forelegR1'), - 19: - dict( - name='forelegL2', - id=19, - color=[255, 255, 255], - type='', - swap='forelegR2'), - 20: - dict( - name='forelegL3', - id=20, - color=[255, 255, 255], - type='', - swap='forelegR3'), - 21: - dict( - name='forelegL4', - id=21, - color=[255, 255, 255], - type='', - swap='forelegR4'), - 22: - dict( - name='midlegL1', - id=22, - color=[255, 255, 255], - type='', - swap='midlegR1'), - 23: - dict( - name='midlegL2', - id=23, - color=[255, 255, 255], - type='', - swap='midlegR2'), - 24: - dict( - name='midlegL3', - id=24, - color=[255, 255, 255], - type='', - swap='midlegR3'), - 25: - dict( - name='midlegL4', - id=25, - color=[255, 255, 255], - type='', - swap='midlegR4'), - 26: - dict( - name='hindlegL1', - id=26, - color=[255, 255, 255], - type='', - swap='hindlegR1'), - 27: - dict( - name='hindlegL2', - id=27, - color=[255, 255, 255], - type='', - swap='hindlegR2'), - 28: - dict( - name='hindlegL3', - id=28, - color=[255, 255, 255], - type='', - swap='hindlegR3'), - 29: - dict( - name='hindlegL4', - id=29, - color=[255, 255, 255], - type='', - swap='hindlegR4'), - 30: - dict( - name='wingL', id=30, color=[255, 255, 255], type='', swap='wingR'), - 31: - dict( - name='wingR', id=31, color=[255, 255, 255], type='', swap='wingL'), + 0: dict(name="head", id=0, color=[255, 255, 255], type="", swap=""), + 1: dict(name="eyeL", id=1, color=[255, 255, 255], type="", swap="eyeR"), + 2: dict(name="eyeR", id=2, color=[255, 255, 255], type="", swap="eyeL"), + 3: dict(name="neck", id=3, color=[255, 255, 255], type="", swap=""), + 4: dict(name="thorax", id=4, color=[255, 255, 255], type="", swap=""), + 5: dict(name="abdomen", id=5, color=[255, 255, 255], type="", swap=""), + 6: dict(name="forelegR1", id=6, color=[255, 255, 255], type="", swap="forelegL1"), + 7: dict(name="forelegR2", id=7, color=[255, 255, 255], type="", swap="forelegL2"), + 8: dict(name="forelegR3", 
id=8, color=[255, 255, 255], type="", swap="forelegL3"), + 9: dict(name="forelegR4", id=9, color=[255, 255, 255], type="", swap="forelegL4"), + 10: dict(name="midlegR1", id=10, color=[255, 255, 255], type="", swap="midlegL1"), + 11: dict(name="midlegR2", id=11, color=[255, 255, 255], type="", swap="midlegL2"), + 12: dict(name="midlegR3", id=12, color=[255, 255, 255], type="", swap="midlegL3"), + 13: dict(name="midlegR4", id=13, color=[255, 255, 255], type="", swap="midlegL4"), + 14: dict(name="hindlegR1", id=14, color=[255, 255, 255], type="", swap="hindlegL1"), + 15: dict(name="hindlegR2", id=15, color=[255, 255, 255], type="", swap="hindlegL2"), + 16: dict(name="hindlegR3", id=16, color=[255, 255, 255], type="", swap="hindlegL3"), + 17: dict(name="hindlegR4", id=17, color=[255, 255, 255], type="", swap="hindlegL4"), + 18: dict(name="forelegL1", id=18, color=[255, 255, 255], type="", swap="forelegR1"), + 19: dict(name="forelegL2", id=19, color=[255, 255, 255], type="", swap="forelegR2"), + 20: dict(name="forelegL3", id=20, color=[255, 255, 255], type="", swap="forelegR3"), + 21: dict(name="forelegL4", id=21, color=[255, 255, 255], type="", swap="forelegR4"), + 22: dict(name="midlegL1", id=22, color=[255, 255, 255], type="", swap="midlegR1"), + 23: dict(name="midlegL2", id=23, color=[255, 255, 255], type="", swap="midlegR2"), + 24: dict(name="midlegL3", id=24, color=[255, 255, 255], type="", swap="midlegR3"), + 25: dict(name="midlegL4", id=25, color=[255, 255, 255], type="", swap="midlegR4"), + 26: dict(name="hindlegL1", id=26, color=[255, 255, 255], type="", swap="hindlegR1"), + 27: dict(name="hindlegL2", id=27, color=[255, 255, 255], type="", swap="hindlegR2"), + 28: dict(name="hindlegL3", id=28, color=[255, 255, 255], type="", swap="hindlegR3"), + 29: dict(name="hindlegL4", id=29, color=[255, 255, 255], type="", swap="hindlegR4"), + 30: dict(name="wingL", id=30, color=[255, 255, 255], type="", swap="wingR"), + 31: dict(name="wingR", id=31, color=[255, 255, 255], type="", swap="wingL"), }, skeleton_info={ - 0: dict(link=('eyeL', 'head'), id=0, color=[255, 255, 255]), - 1: dict(link=('eyeR', 'head'), id=1, color=[255, 255, 255]), - 2: dict(link=('neck', 'head'), id=2, color=[255, 255, 255]), - 3: dict(link=('thorax', 'neck'), id=3, color=[255, 255, 255]), - 4: dict(link=('abdomen', 'thorax'), id=4, color=[255, 255, 255]), - 5: dict(link=('forelegR2', 'forelegR1'), id=5, color=[255, 255, 255]), - 6: dict(link=('forelegR3', 'forelegR2'), id=6, color=[255, 255, 255]), - 7: dict(link=('forelegR4', 'forelegR3'), id=7, color=[255, 255, 255]), - 8: dict(link=('midlegR2', 'midlegR1'), id=8, color=[255, 255, 255]), - 9: dict(link=('midlegR3', 'midlegR2'), id=9, color=[255, 255, 255]), - 10: dict(link=('midlegR4', 'midlegR3'), id=10, color=[255, 255, 255]), - 11: - dict(link=('hindlegR2', 'hindlegR1'), id=11, color=[255, 255, 255]), - 12: - dict(link=('hindlegR3', 'hindlegR2'), id=12, color=[255, 255, 255]), - 13: - dict(link=('hindlegR4', 'hindlegR3'), id=13, color=[255, 255, 255]), - 14: - dict(link=('forelegL2', 'forelegL1'), id=14, color=[255, 255, 255]), - 15: - dict(link=('forelegL3', 'forelegL2'), id=15, color=[255, 255, 255]), - 16: - dict(link=('forelegL4', 'forelegL3'), id=16, color=[255, 255, 255]), - 17: dict(link=('midlegL2', 'midlegL1'), id=17, color=[255, 255, 255]), - 18: dict(link=('midlegL3', 'midlegL2'), id=18, color=[255, 255, 255]), - 19: dict(link=('midlegL4', 'midlegL3'), id=19, color=[255, 255, 255]), - 20: - dict(link=('hindlegL2', 'hindlegL1'), id=20, color=[255, 255, 
255]), - 21: - dict(link=('hindlegL3', 'hindlegL2'), id=21, color=[255, 255, 255]), - 22: - dict(link=('hindlegL4', 'hindlegL3'), id=22, color=[255, 255, 255]), - 23: dict(link=('wingL', 'neck'), id=23, color=[255, 255, 255]), - 24: dict(link=('wingR', 'neck'), id=24, color=[255, 255, 255]) + 0: dict(link=("eyeL", "head"), id=0, color=[255, 255, 255]), + 1: dict(link=("eyeR", "head"), id=1, color=[255, 255, 255]), + 2: dict(link=("neck", "head"), id=2, color=[255, 255, 255]), + 3: dict(link=("thorax", "neck"), id=3, color=[255, 255, 255]), + 4: dict(link=("abdomen", "thorax"), id=4, color=[255, 255, 255]), + 5: dict(link=("forelegR2", "forelegR1"), id=5, color=[255, 255, 255]), + 6: dict(link=("forelegR3", "forelegR2"), id=6, color=[255, 255, 255]), + 7: dict(link=("forelegR4", "forelegR3"), id=7, color=[255, 255, 255]), + 8: dict(link=("midlegR2", "midlegR1"), id=8, color=[255, 255, 255]), + 9: dict(link=("midlegR3", "midlegR2"), id=9, color=[255, 255, 255]), + 10: dict(link=("midlegR4", "midlegR3"), id=10, color=[255, 255, 255]), + 11: dict(link=("hindlegR2", "hindlegR1"), id=11, color=[255, 255, 255]), + 12: dict(link=("hindlegR3", "hindlegR2"), id=12, color=[255, 255, 255]), + 13: dict(link=("hindlegR4", "hindlegR3"), id=13, color=[255, 255, 255]), + 14: dict(link=("forelegL2", "forelegL1"), id=14, color=[255, 255, 255]), + 15: dict(link=("forelegL3", "forelegL2"), id=15, color=[255, 255, 255]), + 16: dict(link=("forelegL4", "forelegL3"), id=16, color=[255, 255, 255]), + 17: dict(link=("midlegL2", "midlegL1"), id=17, color=[255, 255, 255]), + 18: dict(link=("midlegL3", "midlegL2"), id=18, color=[255, 255, 255]), + 19: dict(link=("midlegL4", "midlegL3"), id=19, color=[255, 255, 255]), + 20: dict(link=("hindlegL2", "hindlegL1"), id=20, color=[255, 255, 255]), + 21: dict(link=("hindlegL3", "hindlegL2"), id=21, color=[255, 255, 255]), + 22: dict(link=("hindlegL4", "hindlegL3"), id=22, color=[255, 255, 255]), + 23: dict(link=("wingL", "neck"), id=23, color=[255, 255, 255]), + 24: dict(link=("wingR", "neck"), id=24, color=[255, 255, 255]), }, - joint_weights=[1.] 
* 32, - sigmas=[]) + joint_weights=[1.0] * 32, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/freihand2d.py b/mmpose/configs/_base_/datasets/freihand2d.py index 8b960d10f3538801531dbccdd67aeac6e73ac572..3f5dc061562abb38790306435571fedc87170f2f 100644 --- a/mmpose/configs/_base_/datasets/freihand2d.py +++ b/mmpose/configs/_base_/datasets/freihand2d.py @@ -1,144 +1,57 @@ dataset_info = dict( - dataset_name='freihand', + dataset_name="freihand", paper_info=dict( - author='Zimmermann, Christian and Ceylan, Duygu and ' - 'Yang, Jimei and Russell, Bryan and ' - 'Argus, Max and Brox, Thomas', - title='Freihand: A dataset for markerless capture of hand pose ' - 'and shape from single rgb images', - container='Proceedings of the IEEE International ' - 'Conference on Computer Vision', - year='2019', - homepage='https://lmb.informatik.uni-freiburg.de/projects/freihand/', + author="Zimmermann, Christian and Ceylan, Duygu and " "Yang, Jimei and Russell, Bryan and " "Argus, Max and Brox, Thomas", + title="Freihand: A dataset for markerless capture of hand pose " "and shape from single rgb images", + container="Proceedings of the IEEE International " "Conference on Computer Vision", + year="2019", + homepage="https://lmb.informatik.uni-freiburg.de/projects/freihand/", ), keypoint_info={ - 0: - dict(name='wrist', id=0, color=[255, 255, 255], type='', swap=''), - 1: - dict(name='thumb1', id=1, color=[255, 128, 0], type='', swap=''), - 2: - dict(name='thumb2', id=2, color=[255, 128, 0], type='', swap=''), - 3: - dict(name='thumb3', id=3, color=[255, 128, 0], type='', swap=''), - 4: - dict(name='thumb4', id=4, color=[255, 128, 0], type='', swap=''), - 5: - dict( - name='forefinger1', id=5, color=[255, 153, 255], type='', swap=''), - 6: - dict( - name='forefinger2', id=6, color=[255, 153, 255], type='', swap=''), - 7: - dict( - name='forefinger3', id=7, color=[255, 153, 255], type='', swap=''), - 8: - dict( - name='forefinger4', id=8, color=[255, 153, 255], type='', swap=''), - 9: - dict( - name='middle_finger1', - id=9, - color=[102, 178, 255], - type='', - swap=''), - 10: - dict( - name='middle_finger2', - id=10, - color=[102, 178, 255], - type='', - swap=''), - 11: - dict( - name='middle_finger3', - id=11, - color=[102, 178, 255], - type='', - swap=''), - 12: - dict( - name='middle_finger4', - id=12, - color=[102, 178, 255], - type='', - swap=''), - 13: - dict( - name='ring_finger1', id=13, color=[255, 51, 51], type='', swap=''), - 14: - dict( - name='ring_finger2', id=14, color=[255, 51, 51], type='', swap=''), - 15: - dict( - name='ring_finger3', id=15, color=[255, 51, 51], type='', swap=''), - 16: - dict( - name='ring_finger4', id=16, color=[255, 51, 51], type='', swap=''), - 17: - dict(name='pinky_finger1', id=17, color=[0, 255, 0], type='', swap=''), - 18: - dict(name='pinky_finger2', id=18, color=[0, 255, 0], type='', swap=''), - 19: - dict(name='pinky_finger3', id=19, color=[0, 255, 0], type='', swap=''), - 20: - dict(name='pinky_finger4', id=20, color=[0, 255, 0], type='', swap='') + 0: dict(name="wrist", id=0, color=[255, 255, 255], type="", swap=""), + 1: dict(name="thumb1", id=1, color=[255, 128, 0], type="", swap=""), + 2: dict(name="thumb2", id=2, color=[255, 128, 0], type="", swap=""), + 3: dict(name="thumb3", id=3, color=[255, 128, 0], type="", swap=""), + 4: dict(name="thumb4", id=4, color=[255, 128, 0], type="", swap=""), + 5: dict(name="forefinger1", id=5, color=[255, 153, 255], type="", swap=""), + 6: dict(name="forefinger2", id=6, color=[255, 153, 255], type="", swap=""), + 
7: dict(name="forefinger3", id=7, color=[255, 153, 255], type="", swap=""), + 8: dict(name="forefinger4", id=8, color=[255, 153, 255], type="", swap=""), + 9: dict(name="middle_finger1", id=9, color=[102, 178, 255], type="", swap=""), + 10: dict(name="middle_finger2", id=10, color=[102, 178, 255], type="", swap=""), + 11: dict(name="middle_finger3", id=11, color=[102, 178, 255], type="", swap=""), + 12: dict(name="middle_finger4", id=12, color=[102, 178, 255], type="", swap=""), + 13: dict(name="ring_finger1", id=13, color=[255, 51, 51], type="", swap=""), + 14: dict(name="ring_finger2", id=14, color=[255, 51, 51], type="", swap=""), + 15: dict(name="ring_finger3", id=15, color=[255, 51, 51], type="", swap=""), + 16: dict(name="ring_finger4", id=16, color=[255, 51, 51], type="", swap=""), + 17: dict(name="pinky_finger1", id=17, color=[0, 255, 0], type="", swap=""), + 18: dict(name="pinky_finger2", id=18, color=[0, 255, 0], type="", swap=""), + 19: dict(name="pinky_finger3", id=19, color=[0, 255, 0], type="", swap=""), + 20: dict(name="pinky_finger4", id=20, color=[0, 255, 0], type="", swap=""), }, skeleton_info={ - 0: - dict(link=('wrist', 'thumb1'), id=0, color=[255, 128, 0]), - 1: - dict(link=('thumb1', 'thumb2'), id=1, color=[255, 128, 0]), - 2: - dict(link=('thumb2', 'thumb3'), id=2, color=[255, 128, 0]), - 3: - dict(link=('thumb3', 'thumb4'), id=3, color=[255, 128, 0]), - 4: - dict(link=('wrist', 'forefinger1'), id=4, color=[255, 153, 255]), - 5: - dict(link=('forefinger1', 'forefinger2'), id=5, color=[255, 153, 255]), - 6: - dict(link=('forefinger2', 'forefinger3'), id=6, color=[255, 153, 255]), - 7: - dict(link=('forefinger3', 'forefinger4'), id=7, color=[255, 153, 255]), - 8: - dict(link=('wrist', 'middle_finger1'), id=8, color=[102, 178, 255]), - 9: - dict( - link=('middle_finger1', 'middle_finger2'), - id=9, - color=[102, 178, 255]), - 10: - dict( - link=('middle_finger2', 'middle_finger3'), - id=10, - color=[102, 178, 255]), - 11: - dict( - link=('middle_finger3', 'middle_finger4'), - id=11, - color=[102, 178, 255]), - 12: - dict(link=('wrist', 'ring_finger1'), id=12, color=[255, 51, 51]), - 13: - dict( - link=('ring_finger1', 'ring_finger2'), id=13, color=[255, 51, 51]), - 14: - dict( - link=('ring_finger2', 'ring_finger3'), id=14, color=[255, 51, 51]), - 15: - dict( - link=('ring_finger3', 'ring_finger4'), id=15, color=[255, 51, 51]), - 16: - dict(link=('wrist', 'pinky_finger1'), id=16, color=[0, 255, 0]), - 17: - dict( - link=('pinky_finger1', 'pinky_finger2'), id=17, color=[0, 255, 0]), - 18: - dict( - link=('pinky_finger2', 'pinky_finger3'), id=18, color=[0, 255, 0]), - 19: - dict( - link=('pinky_finger3', 'pinky_finger4'), id=19, color=[0, 255, 0]) + 0: dict(link=("wrist", "thumb1"), id=0, color=[255, 128, 0]), + 1: dict(link=("thumb1", "thumb2"), id=1, color=[255, 128, 0]), + 2: dict(link=("thumb2", "thumb3"), id=2, color=[255, 128, 0]), + 3: dict(link=("thumb3", "thumb4"), id=3, color=[255, 128, 0]), + 4: dict(link=("wrist", "forefinger1"), id=4, color=[255, 153, 255]), + 5: dict(link=("forefinger1", "forefinger2"), id=5, color=[255, 153, 255]), + 6: dict(link=("forefinger2", "forefinger3"), id=6, color=[255, 153, 255]), + 7: dict(link=("forefinger3", "forefinger4"), id=7, color=[255, 153, 255]), + 8: dict(link=("wrist", "middle_finger1"), id=8, color=[102, 178, 255]), + 9: dict(link=("middle_finger1", "middle_finger2"), id=9, color=[102, 178, 255]), + 10: dict(link=("middle_finger2", "middle_finger3"), id=10, color=[102, 178, 255]), + 11: dict(link=("middle_finger3", 
"middle_finger4"), id=11, color=[102, 178, 255]), + 12: dict(link=("wrist", "ring_finger1"), id=12, color=[255, 51, 51]), + 13: dict(link=("ring_finger1", "ring_finger2"), id=13, color=[255, 51, 51]), + 14: dict(link=("ring_finger2", "ring_finger3"), id=14, color=[255, 51, 51]), + 15: dict(link=("ring_finger3", "ring_finger4"), id=15, color=[255, 51, 51]), + 16: dict(link=("wrist", "pinky_finger1"), id=16, color=[0, 255, 0]), + 17: dict(link=("pinky_finger1", "pinky_finger2"), id=17, color=[0, 255, 0]), + 18: dict(link=("pinky_finger2", "pinky_finger3"), id=18, color=[0, 255, 0]), + 19: dict(link=("pinky_finger3", "pinky_finger4"), id=19, color=[0, 255, 0]), }, - joint_weights=[1.] * 21, - sigmas=[]) + joint_weights=[1.0] * 21, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/h36m.py b/mmpose/configs/_base_/datasets/h36m.py index 00a719d8b19f9ff3c5ef98476d73216055bf9186..36f4463dc3f0dd5ce537ba57d5825ce2b86edb94 100644 --- a/mmpose/configs/_base_/datasets/h36m.py +++ b/mmpose/configs/_base_/datasets/h36m.py @@ -1,152 +1,50 @@ dataset_info = dict( - dataset_name='h36m', + dataset_name="h36m", paper_info=dict( - author='Ionescu, Catalin and Papava, Dragos and ' - 'Olaru, Vlad and Sminchisescu, Cristian', - title='Human3.6M: Large Scale Datasets and Predictive ' - 'Methods for 3D Human Sensing in Natural Environments', - container='IEEE Transactions on Pattern Analysis and ' - 'Machine Intelligence', - year='2014', - homepage='http://vision.imar.ro/human3.6m/description.php', + author="Ionescu, Catalin and Papava, Dragos and " "Olaru, Vlad and Sminchisescu, Cristian", + title="Human3.6M: Large Scale Datasets and Predictive " "Methods for 3D Human Sensing in Natural Environments", + container="IEEE Transactions on Pattern Analysis and " "Machine Intelligence", + year="2014", + homepage="http://vision.imar.ro/human3.6m/description.php", ), keypoint_info={ - 0: - dict(name='root', id=0, color=[51, 153, 255], type='lower', swap=''), - 1: - dict( - name='right_hip', - id=1, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 2: - dict( - name='right_knee', - id=2, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 3: - dict( - name='right_foot', - id=3, - color=[255, 128, 0], - type='lower', - swap='left_foot'), - 4: - dict( - name='left_hip', - id=4, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 5: - dict( - name='left_knee', - id=5, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 6: - dict( - name='left_foot', - id=6, - color=[0, 255, 0], - type='lower', - swap='right_foot'), - 7: - dict(name='spine', id=7, color=[51, 153, 255], type='upper', swap=''), - 8: - dict(name='thorax', id=8, color=[51, 153, 255], type='upper', swap=''), - 9: - dict( - name='neck_base', - id=9, - color=[51, 153, 255], - type='upper', - swap=''), - 10: - dict(name='head', id=10, color=[51, 153, 255], type='upper', swap=''), - 11: - dict( - name='left_shoulder', - id=11, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 12: - dict( - name='left_elbow', - id=12, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 13: - dict( - name='left_wrist', - id=13, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 14: - dict( - name='right_shoulder', - id=14, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 15: - dict( - name='right_elbow', - id=15, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 16: - dict( - name='right_wrist', - id=16, - color=[255, 128, 0], - type='upper', - swap='left_wrist') + 0: 
dict(name="root", id=0, color=[51, 153, 255], type="lower", swap=""), + 1: dict(name="right_hip", id=1, color=[255, 128, 0], type="lower", swap="left_hip"), + 2: dict(name="right_knee", id=2, color=[255, 128, 0], type="lower", swap="left_knee"), + 3: dict(name="right_foot", id=3, color=[255, 128, 0], type="lower", swap="left_foot"), + 4: dict(name="left_hip", id=4, color=[0, 255, 0], type="lower", swap="right_hip"), + 5: dict(name="left_knee", id=5, color=[0, 255, 0], type="lower", swap="right_knee"), + 6: dict(name="left_foot", id=6, color=[0, 255, 0], type="lower", swap="right_foot"), + 7: dict(name="spine", id=7, color=[51, 153, 255], type="upper", swap=""), + 8: dict(name="thorax", id=8, color=[51, 153, 255], type="upper", swap=""), + 9: dict(name="neck_base", id=9, color=[51, 153, 255], type="upper", swap=""), + 10: dict(name="head", id=10, color=[51, 153, 255], type="upper", swap=""), + 11: dict(name="left_shoulder", id=11, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 12: dict(name="left_elbow", id=12, color=[0, 255, 0], type="upper", swap="right_elbow"), + 13: dict(name="left_wrist", id=13, color=[0, 255, 0], type="upper", swap="right_wrist"), + 14: dict(name="right_shoulder", id=14, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 15: dict(name="right_elbow", id=15, color=[255, 128, 0], type="upper", swap="left_elbow"), + 16: dict(name="right_wrist", id=16, color=[255, 128, 0], type="upper", swap="left_wrist"), }, skeleton_info={ - 0: - dict(link=('root', 'left_hip'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_hip', 'left_knee'), id=1, color=[0, 255, 0]), - 2: - dict(link=('left_knee', 'left_foot'), id=2, color=[0, 255, 0]), - 3: - dict(link=('root', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('right_hip', 'right_knee'), id=4, color=[255, 128, 0]), - 5: - dict(link=('right_knee', 'right_foot'), id=5, color=[255, 128, 0]), - 6: - dict(link=('root', 'spine'), id=6, color=[51, 153, 255]), - 7: - dict(link=('spine', 'thorax'), id=7, color=[51, 153, 255]), - 8: - dict(link=('thorax', 'neck_base'), id=8, color=[51, 153, 255]), - 9: - dict(link=('neck_base', 'head'), id=9, color=[51, 153, 255]), - 10: - dict(link=('thorax', 'left_shoulder'), id=10, color=[0, 255, 0]), - 11: - dict(link=('left_shoulder', 'left_elbow'), id=11, color=[0, 255, 0]), - 12: - dict(link=('left_elbow', 'left_wrist'), id=12, color=[0, 255, 0]), - 13: - dict(link=('thorax', 'right_shoulder'), id=13, color=[255, 128, 0]), - 14: - dict( - link=('right_shoulder', 'right_elbow'), id=14, color=[255, 128, - 0]), - 15: - dict(link=('right_elbow', 'right_wrist'), id=15, color=[255, 128, 0]) + 0: dict(link=("root", "left_hip"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_hip", "left_knee"), id=1, color=[0, 255, 0]), + 2: dict(link=("left_knee", "left_foot"), id=2, color=[0, 255, 0]), + 3: dict(link=("root", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("right_hip", "right_knee"), id=4, color=[255, 128, 0]), + 5: dict(link=("right_knee", "right_foot"), id=5, color=[255, 128, 0]), + 6: dict(link=("root", "spine"), id=6, color=[51, 153, 255]), + 7: dict(link=("spine", "thorax"), id=7, color=[51, 153, 255]), + 8: dict(link=("thorax", "neck_base"), id=8, color=[51, 153, 255]), + 9: dict(link=("neck_base", "head"), id=9, color=[51, 153, 255]), + 10: dict(link=("thorax", "left_shoulder"), id=10, color=[0, 255, 0]), + 11: dict(link=("left_shoulder", "left_elbow"), id=11, color=[0, 255, 0]), + 12: dict(link=("left_elbow", "left_wrist"), id=12, color=[0, 255, 0]), + 13: 
dict(link=("thorax", "right_shoulder"), id=13, color=[255, 128, 0]), + 14: dict(link=("right_shoulder", "right_elbow"), id=14, color=[255, 128, 0]), + 15: dict(link=("right_elbow", "right_wrist"), id=15, color=[255, 128, 0]), }, - joint_weights=[1.] * 17, + joint_weights=[1.0] * 17, sigmas=[], - stats_info=dict(bbox_center=(528., 427.), bbox_scale=400.)) + stats_info=dict(bbox_center=(528.0, 427.0), bbox_scale=400.0), +) diff --git a/mmpose/configs/_base_/datasets/h3wb.py b/mmpose/configs/_base_/datasets/h3wb.py index bb47a1b3f5809d7b2c6429e0c8520f7141e4ca3d..2c556fbd9a78789a2359644991204c97b5b142fb 100644 --- a/mmpose/configs/_base_/datasets/h3wb.py +++ b/mmpose/configs/_base_/datasets/h3wb.py @@ -1,1151 +1,350 @@ dataset_info = dict( - dataset_name='h3wb', + dataset_name="h3wb", paper_info=dict( - author='Yue Zhu, Nermin Samet, David Picard', - title='H3WB: Human3.6M 3D WholeBody Dataset and Benchmark', - container='International Conf. on Computer Vision (ICCV)', - year='2023', - homepage='https://github.com/wholebody3d/wholebody3d', + author="Yue Zhu, Nermin Samet, David Picard", + title="H3WB: Human3.6M 3D WholeBody Dataset and Benchmark", + container="International Conf. on Computer Vision (ICCV)", + year="2023", + homepage="https://github.com/wholebody3d/wholebody3d", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 17: - dict( - name='left_big_toe', - id=17, - color=[255, 128, 0], - type='lower', - swap='right_big_toe'), - 18: - dict( - name='left_small_toe', - id=18, - color=[255, 128, 0], - type='lower', - swap='right_small_toe'), - 19: - dict( - name='left_heel', - id=19, - color=[255, 128, 0], - type='lower', - swap='right_heel'), - 20: - dict( - name='right_big_toe', - id=20, - color=[255, 128, 0], - type='lower', - swap='left_big_toe'), - 21: - dict( - name='right_small_toe', - id=21, - color=[255, 128, 0], 
- type='lower', - swap='left_small_toe'), - 22: - dict( - name='right_heel', - id=22, - color=[255, 128, 0], - type='lower', - swap='left_heel'), - 23: - dict( - name='face-0', - id=23, - color=[255, 255, 255], - type='', - swap='face-16'), - 24: - dict( - name='face-1', - id=24, - color=[255, 255, 255], - type='', - swap='face-15'), - 25: - dict( - name='face-2', - id=25, - color=[255, 255, 255], - type='', - swap='face-14'), - 26: - dict( - name='face-3', - id=26, - color=[255, 255, 255], - type='', - swap='face-13'), - 27: - dict( - name='face-4', - id=27, - color=[255, 255, 255], - type='', - swap='face-12'), - 28: - dict( - name='face-5', - id=28, - color=[255, 255, 255], - type='', - swap='face-11'), - 29: - dict( - name='face-6', - id=29, - color=[255, 255, 255], - type='', - swap='face-10'), - 30: - dict( - name='face-7', - id=30, - color=[255, 255, 255], - type='', - swap='face-9'), - 31: - dict(name='face-8', id=31, color=[255, 255, 255], type='', swap=''), - 32: - dict( - name='face-9', - id=32, - color=[255, 255, 255], - type='', - swap='face-7'), - 33: - dict( - name='face-10', - id=33, - color=[255, 255, 255], - type='', - swap='face-6'), - 34: - dict( - name='face-11', - id=34, - color=[255, 255, 255], - type='', - swap='face-5'), - 35: - dict( - name='face-12', - id=35, - color=[255, 255, 255], - type='', - swap='face-4'), - 36: - dict( - name='face-13', - id=36, - color=[255, 255, 255], - type='', - swap='face-3'), - 37: - dict( - name='face-14', - id=37, - color=[255, 255, 255], - type='', - swap='face-2'), - 38: - dict( - name='face-15', - id=38, - color=[255, 255, 255], - type='', - swap='face-1'), - 39: - dict( - name='face-16', - id=39, - color=[255, 255, 255], - type='', - swap='face-0'), - 40: - dict( - name='face-17', - id=40, - color=[255, 255, 255], - type='', - swap='face-26'), - 41: - dict( - name='face-18', - id=41, - color=[255, 255, 255], - type='', - swap='face-25'), - 42: - dict( - name='face-19', - id=42, - color=[255, 255, 255], - type='', - swap='face-24'), - 43: - dict( - name='face-20', - id=43, - color=[255, 255, 255], - type='', - swap='face-23'), - 44: - dict( - name='face-21', - id=44, - color=[255, 255, 255], - type='', - swap='face-22'), - 45: - dict( - name='face-22', - id=45, - color=[255, 255, 255], - type='', - swap='face-21'), - 46: - dict( - name='face-23', - id=46, - color=[255, 255, 255], - type='', - swap='face-20'), - 47: - dict( - name='face-24', - id=47, - color=[255, 255, 255], - type='', - swap='face-19'), - 48: - dict( - name='face-25', - id=48, - color=[255, 255, 255], - type='', - swap='face-18'), - 49: - dict( - name='face-26', - id=49, - color=[255, 255, 255], - type='', - swap='face-17'), - 50: - dict(name='face-27', id=50, color=[255, 255, 255], type='', swap=''), - 51: - dict(name='face-28', id=51, color=[255, 255, 255], type='', swap=''), - 52: - dict(name='face-29', id=52, color=[255, 255, 255], type='', swap=''), - 53: - dict(name='face-30', id=53, color=[255, 255, 255], type='', swap=''), - 54: - dict( - name='face-31', - id=54, - color=[255, 255, 255], - type='', - swap='face-35'), - 55: - dict( - name='face-32', - id=55, - color=[255, 255, 255], - type='', - swap='face-34'), - 56: - dict(name='face-33', id=56, color=[255, 255, 255], type='', swap=''), - 57: - dict( - name='face-34', - id=57, - color=[255, 255, 255], - type='', - swap='face-32'), - 58: - dict( - name='face-35', - id=58, - color=[255, 255, 255], - type='', - swap='face-31'), - 59: - dict( - name='face-36', - id=59, - color=[255, 255, 255], - type='', - 
swap='face-45'), - 60: - dict( - name='face-37', - id=60, - color=[255, 255, 255], - type='', - swap='face-44'), - 61: - dict( - name='face-38', - id=61, - color=[255, 255, 255], - type='', - swap='face-43'), - 62: - dict( - name='face-39', - id=62, - color=[255, 255, 255], - type='', - swap='face-42'), - 63: - dict( - name='face-40', - id=63, - color=[255, 255, 255], - type='', - swap='face-47'), - 64: - dict( - name='face-41', - id=64, - color=[255, 255, 255], - type='', - swap='face-46'), - 65: - dict( - name='face-42', - id=65, - color=[255, 255, 255], - type='', - swap='face-39'), - 66: - dict( - name='face-43', - id=66, - color=[255, 255, 255], - type='', - swap='face-38'), - 67: - dict( - name='face-44', - id=67, - color=[255, 255, 255], - type='', - swap='face-37'), - 68: - dict( - name='face-45', - id=68, - color=[255, 255, 255], - type='', - swap='face-36'), - 69: - dict( - name='face-46', - id=69, - color=[255, 255, 255], - type='', - swap='face-41'), - 70: - dict( - name='face-47', - id=70, - color=[255, 255, 255], - type='', - swap='face-40'), - 71: - dict( - name='face-48', - id=71, - color=[255, 255, 255], - type='', - swap='face-54'), - 72: - dict( - name='face-49', - id=72, - color=[255, 255, 255], - type='', - swap='face-53'), - 73: - dict( - name='face-50', - id=73, - color=[255, 255, 255], - type='', - swap='face-52'), - 74: - dict(name='face-51', id=74, color=[255, 255, 255], type='', swap=''), - 75: - dict( - name='face-52', - id=75, - color=[255, 255, 255], - type='', - swap='face-50'), - 76: - dict( - name='face-53', - id=76, - color=[255, 255, 255], - type='', - swap='face-49'), - 77: - dict( - name='face-54', - id=77, - color=[255, 255, 255], - type='', - swap='face-48'), - 78: - dict( - name='face-55', - id=78, - color=[255, 255, 255], - type='', - swap='face-59'), - 79: - dict( - name='face-56', - id=79, - color=[255, 255, 255], - type='', - swap='face-58'), - 80: - dict(name='face-57', id=80, color=[255, 255, 255], type='', swap=''), - 81: - dict( - name='face-58', - id=81, - color=[255, 255, 255], - type='', - swap='face-56'), - 82: - dict( - name='face-59', - id=82, - color=[255, 255, 255], - type='', - swap='face-55'), - 83: - dict( - name='face-60', - id=83, - color=[255, 255, 255], - type='', - swap='face-64'), - 84: - dict( - name='face-61', - id=84, - color=[255, 255, 255], - type='', - swap='face-63'), - 85: - dict(name='face-62', id=85, color=[255, 255, 255], type='', swap=''), - 86: - dict( - name='face-63', - id=86, - color=[255, 255, 255], - type='', - swap='face-61'), - 87: - dict( - name='face-64', - id=87, - color=[255, 255, 255], - type='', - swap='face-60'), - 88: - dict( - name='face-65', - id=88, - color=[255, 255, 255], - type='', - swap='face-67'), - 89: - dict(name='face-66', id=89, color=[255, 255, 255], type='', swap=''), - 90: - dict( - name='face-67', - id=90, - color=[255, 255, 255], - type='', - swap='face-65'), - 91: - dict( - name='left_hand_root', - id=91, - color=[255, 255, 255], - type='', - swap='right_hand_root'), - 92: - dict( - name='left_thumb1', - id=92, - color=[255, 128, 0], - type='', - swap='right_thumb1'), - 93: - dict( - name='left_thumb2', - id=93, - color=[255, 128, 0], - type='', - swap='right_thumb2'), - 94: - dict( - name='left_thumb3', - id=94, - color=[255, 128, 0], - type='', - swap='right_thumb3'), - 95: - dict( - name='left_thumb4', - id=95, - color=[255, 128, 0], - type='', - swap='right_thumb4'), - 96: - dict( - name='left_forefinger1', - id=96, - color=[255, 153, 255], - type='', - 
swap='right_forefinger1'), - 97: - dict( - name='left_forefinger2', - id=97, - color=[255, 153, 255], - type='', - swap='right_forefinger2'), - 98: - dict( - name='left_forefinger3', - id=98, - color=[255, 153, 255], - type='', - swap='right_forefinger3'), - 99: - dict( - name='left_forefinger4', - id=99, - color=[255, 153, 255], - type='', - swap='right_forefinger4'), - 100: - dict( - name='left_middle_finger1', - id=100, - color=[102, 178, 255], - type='', - swap='right_middle_finger1'), - 101: - dict( - name='left_middle_finger2', - id=101, - color=[102, 178, 255], - type='', - swap='right_middle_finger2'), - 102: - dict( - name='left_middle_finger3', - id=102, - color=[102, 178, 255], - type='', - swap='right_middle_finger3'), - 103: - dict( - name='left_middle_finger4', - id=103, - color=[102, 178, 255], - type='', - swap='right_middle_finger4'), - 104: - dict( - name='left_ring_finger1', - id=104, - color=[255, 51, 51], - type='', - swap='right_ring_finger1'), - 105: - dict( - name='left_ring_finger2', - id=105, - color=[255, 51, 51], - type='', - swap='right_ring_finger2'), - 106: - dict( - name='left_ring_finger3', - id=106, - color=[255, 51, 51], - type='', - swap='right_ring_finger3'), - 107: - dict( - name='left_ring_finger4', - id=107, - color=[255, 51, 51], - type='', - swap='right_ring_finger4'), - 108: - dict( - name='left_pinky_finger1', - id=108, - color=[0, 255, 0], - type='', - swap='right_pinky_finger1'), - 109: - dict( - name='left_pinky_finger2', - id=109, - color=[0, 255, 0], - type='', - swap='right_pinky_finger2'), - 110: - dict( - name='left_pinky_finger3', - id=110, - color=[0, 255, 0], - type='', - swap='right_pinky_finger3'), - 111: - dict( - name='left_pinky_finger4', - id=111, - color=[0, 255, 0], - type='', - swap='right_pinky_finger4'), - 112: - dict( - name='right_hand_root', - id=112, - color=[255, 255, 255], - type='', - swap='left_hand_root'), - 113: - dict( - name='right_thumb1', - id=113, - color=[255, 128, 0], - type='', - swap='left_thumb1'), - 114: - dict( - name='right_thumb2', - id=114, - color=[255, 128, 0], - type='', - swap='left_thumb2'), - 115: - dict( - name='right_thumb3', - id=115, - color=[255, 128, 0], - type='', - swap='left_thumb3'), - 116: - dict( - name='right_thumb4', - id=116, - color=[255, 128, 0], - type='', - swap='left_thumb4'), - 117: - dict( - name='right_forefinger1', - id=117, - color=[255, 153, 255], - type='', - swap='left_forefinger1'), - 118: - dict( - name='right_forefinger2', - id=118, - color=[255, 153, 255], - type='', - swap='left_forefinger2'), - 119: - dict( - name='right_forefinger3', - id=119, - color=[255, 153, 255], - type='', - swap='left_forefinger3'), - 120: - dict( - name='right_forefinger4', - id=120, - color=[255, 153, 255], - type='', - swap='left_forefinger4'), - 121: - dict( - name='right_middle_finger1', - id=121, - color=[102, 178, 255], - type='', - swap='left_middle_finger1'), - 122: - dict( - name='right_middle_finger2', - id=122, - color=[102, 178, 255], - type='', - swap='left_middle_finger2'), - 123: - dict( - name='right_middle_finger3', - id=123, - color=[102, 178, 255], - type='', - swap='left_middle_finger3'), - 124: - dict( - name='right_middle_finger4', - id=124, - color=[102, 178, 255], - type='', - swap='left_middle_finger4'), - 125: - dict( - name='right_ring_finger1', - id=125, - color=[255, 51, 51], - type='', - swap='left_ring_finger1'), - 126: - dict( - name='right_ring_finger2', - id=126, - color=[255, 51, 51], - type='', - swap='left_ring_finger2'), - 127: - dict( - 
name='right_ring_finger3', - id=127, - color=[255, 51, 51], - type='', - swap='left_ring_finger3'), - 128: - dict( - name='right_ring_finger4', - id=128, - color=[255, 51, 51], - type='', - swap='left_ring_finger4'), - 129: - dict( - name='right_pinky_finger1', - id=129, - color=[0, 255, 0], - type='', - swap='left_pinky_finger1'), - 130: - dict( - name='right_pinky_finger2', - id=130, - color=[0, 255, 0], - type='', - swap='left_pinky_finger2'), - 131: - dict( - name='right_pinky_finger3', - id=131, - color=[0, 255, 0], - type='', - swap='left_pinky_finger3'), - 132: - dict( - name='right_pinky_finger4', - id=132, - color=[0, 255, 0], - type='', - swap='left_pinky_finger4') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="left_big_toe", id=17, color=[255, 128, 0], type="lower", swap="right_big_toe"), + 18: dict(name="left_small_toe", id=18, color=[255, 128, 0], type="lower", swap="right_small_toe"), + 19: dict(name="left_heel", id=19, color=[255, 128, 0], type="lower", swap="right_heel"), + 20: dict(name="right_big_toe", id=20, color=[255, 128, 0], type="lower", swap="left_big_toe"), + 21: dict(name="right_small_toe", id=21, color=[255, 128, 0], type="lower", swap="left_small_toe"), + 22: dict(name="right_heel", id=22, color=[255, 128, 0], type="lower", swap="left_heel"), + 23: dict(name="face-0", id=23, color=[255, 255, 255], type="", swap="face-16"), + 24: dict(name="face-1", id=24, color=[255, 255, 255], type="", swap="face-15"), + 25: dict(name="face-2", id=25, color=[255, 255, 255], type="", swap="face-14"), + 26: dict(name="face-3", id=26, color=[255, 255, 255], type="", swap="face-13"), + 27: dict(name="face-4", id=27, color=[255, 255, 255], type="", swap="face-12"), + 28: dict(name="face-5", id=28, color=[255, 255, 255], type="", swap="face-11"), + 29: dict(name="face-6", id=29, color=[255, 255, 255], type="", swap="face-10"), + 30: dict(name="face-7", id=30, color=[255, 255, 255], type="", swap="face-9"), + 31: dict(name="face-8", id=31, color=[255, 255, 255], type="", swap=""), + 32: dict(name="face-9", id=32, 
color=[255, 255, 255], type="", swap="face-7"), + 33: dict(name="face-10", id=33, color=[255, 255, 255], type="", swap="face-6"), + 34: dict(name="face-11", id=34, color=[255, 255, 255], type="", swap="face-5"), + 35: dict(name="face-12", id=35, color=[255, 255, 255], type="", swap="face-4"), + 36: dict(name="face-13", id=36, color=[255, 255, 255], type="", swap="face-3"), + 37: dict(name="face-14", id=37, color=[255, 255, 255], type="", swap="face-2"), + 38: dict(name="face-15", id=38, color=[255, 255, 255], type="", swap="face-1"), + 39: dict(name="face-16", id=39, color=[255, 255, 255], type="", swap="face-0"), + 40: dict(name="face-17", id=40, color=[255, 255, 255], type="", swap="face-26"), + 41: dict(name="face-18", id=41, color=[255, 255, 255], type="", swap="face-25"), + 42: dict(name="face-19", id=42, color=[255, 255, 255], type="", swap="face-24"), + 43: dict(name="face-20", id=43, color=[255, 255, 255], type="", swap="face-23"), + 44: dict(name="face-21", id=44, color=[255, 255, 255], type="", swap="face-22"), + 45: dict(name="face-22", id=45, color=[255, 255, 255], type="", swap="face-21"), + 46: dict(name="face-23", id=46, color=[255, 255, 255], type="", swap="face-20"), + 47: dict(name="face-24", id=47, color=[255, 255, 255], type="", swap="face-19"), + 48: dict(name="face-25", id=48, color=[255, 255, 255], type="", swap="face-18"), + 49: dict(name="face-26", id=49, color=[255, 255, 255], type="", swap="face-17"), + 50: dict(name="face-27", id=50, color=[255, 255, 255], type="", swap=""), + 51: dict(name="face-28", id=51, color=[255, 255, 255], type="", swap=""), + 52: dict(name="face-29", id=52, color=[255, 255, 255], type="", swap=""), + 53: dict(name="face-30", id=53, color=[255, 255, 255], type="", swap=""), + 54: dict(name="face-31", id=54, color=[255, 255, 255], type="", swap="face-35"), + 55: dict(name="face-32", id=55, color=[255, 255, 255], type="", swap="face-34"), + 56: dict(name="face-33", id=56, color=[255, 255, 255], type="", swap=""), + 57: dict(name="face-34", id=57, color=[255, 255, 255], type="", swap="face-32"), + 58: dict(name="face-35", id=58, color=[255, 255, 255], type="", swap="face-31"), + 59: dict(name="face-36", id=59, color=[255, 255, 255], type="", swap="face-45"), + 60: dict(name="face-37", id=60, color=[255, 255, 255], type="", swap="face-44"), + 61: dict(name="face-38", id=61, color=[255, 255, 255], type="", swap="face-43"), + 62: dict(name="face-39", id=62, color=[255, 255, 255], type="", swap="face-42"), + 63: dict(name="face-40", id=63, color=[255, 255, 255], type="", swap="face-47"), + 64: dict(name="face-41", id=64, color=[255, 255, 255], type="", swap="face-46"), + 65: dict(name="face-42", id=65, color=[255, 255, 255], type="", swap="face-39"), + 66: dict(name="face-43", id=66, color=[255, 255, 255], type="", swap="face-38"), + 67: dict(name="face-44", id=67, color=[255, 255, 255], type="", swap="face-37"), + 68: dict(name="face-45", id=68, color=[255, 255, 255], type="", swap="face-36"), + 69: dict(name="face-46", id=69, color=[255, 255, 255], type="", swap="face-41"), + 70: dict(name="face-47", id=70, color=[255, 255, 255], type="", swap="face-40"), + 71: dict(name="face-48", id=71, color=[255, 255, 255], type="", swap="face-54"), + 72: dict(name="face-49", id=72, color=[255, 255, 255], type="", swap="face-53"), + 73: dict(name="face-50", id=73, color=[255, 255, 255], type="", swap="face-52"), + 74: dict(name="face-51", id=74, color=[255, 255, 255], type="", swap=""), + 75: dict(name="face-52", id=75, color=[255, 255, 255], type="", 
swap="face-50"), + 76: dict(name="face-53", id=76, color=[255, 255, 255], type="", swap="face-49"), + 77: dict(name="face-54", id=77, color=[255, 255, 255], type="", swap="face-48"), + 78: dict(name="face-55", id=78, color=[255, 255, 255], type="", swap="face-59"), + 79: dict(name="face-56", id=79, color=[255, 255, 255], type="", swap="face-58"), + 80: dict(name="face-57", id=80, color=[255, 255, 255], type="", swap=""), + 81: dict(name="face-58", id=81, color=[255, 255, 255], type="", swap="face-56"), + 82: dict(name="face-59", id=82, color=[255, 255, 255], type="", swap="face-55"), + 83: dict(name="face-60", id=83, color=[255, 255, 255], type="", swap="face-64"), + 84: dict(name="face-61", id=84, color=[255, 255, 255], type="", swap="face-63"), + 85: dict(name="face-62", id=85, color=[255, 255, 255], type="", swap=""), + 86: dict(name="face-63", id=86, color=[255, 255, 255], type="", swap="face-61"), + 87: dict(name="face-64", id=87, color=[255, 255, 255], type="", swap="face-60"), + 88: dict(name="face-65", id=88, color=[255, 255, 255], type="", swap="face-67"), + 89: dict(name="face-66", id=89, color=[255, 255, 255], type="", swap=""), + 90: dict(name="face-67", id=90, color=[255, 255, 255], type="", swap="face-65"), + 91: dict(name="left_hand_root", id=91, color=[255, 255, 255], type="", swap="right_hand_root"), + 92: dict(name="left_thumb1", id=92, color=[255, 128, 0], type="", swap="right_thumb1"), + 93: dict(name="left_thumb2", id=93, color=[255, 128, 0], type="", swap="right_thumb2"), + 94: dict(name="left_thumb3", id=94, color=[255, 128, 0], type="", swap="right_thumb3"), + 95: dict(name="left_thumb4", id=95, color=[255, 128, 0], type="", swap="right_thumb4"), + 96: dict(name="left_forefinger1", id=96, color=[255, 153, 255], type="", swap="right_forefinger1"), + 97: dict(name="left_forefinger2", id=97, color=[255, 153, 255], type="", swap="right_forefinger2"), + 98: dict(name="left_forefinger3", id=98, color=[255, 153, 255], type="", swap="right_forefinger3"), + 99: dict(name="left_forefinger4", id=99, color=[255, 153, 255], type="", swap="right_forefinger4"), + 100: dict(name="left_middle_finger1", id=100, color=[102, 178, 255], type="", swap="right_middle_finger1"), + 101: dict(name="left_middle_finger2", id=101, color=[102, 178, 255], type="", swap="right_middle_finger2"), + 102: dict(name="left_middle_finger3", id=102, color=[102, 178, 255], type="", swap="right_middle_finger3"), + 103: dict(name="left_middle_finger4", id=103, color=[102, 178, 255], type="", swap="right_middle_finger4"), + 104: dict(name="left_ring_finger1", id=104, color=[255, 51, 51], type="", swap="right_ring_finger1"), + 105: dict(name="left_ring_finger2", id=105, color=[255, 51, 51], type="", swap="right_ring_finger2"), + 106: dict(name="left_ring_finger3", id=106, color=[255, 51, 51], type="", swap="right_ring_finger3"), + 107: dict(name="left_ring_finger4", id=107, color=[255, 51, 51], type="", swap="right_ring_finger4"), + 108: dict(name="left_pinky_finger1", id=108, color=[0, 255, 0], type="", swap="right_pinky_finger1"), + 109: dict(name="left_pinky_finger2", id=109, color=[0, 255, 0], type="", swap="right_pinky_finger2"), + 110: dict(name="left_pinky_finger3", id=110, color=[0, 255, 0], type="", swap="right_pinky_finger3"), + 111: dict(name="left_pinky_finger4", id=111, color=[0, 255, 0], type="", swap="right_pinky_finger4"), + 112: dict(name="right_hand_root", id=112, color=[255, 255, 255], type="", swap="left_hand_root"), + 113: dict(name="right_thumb1", id=113, color=[255, 128, 0], type="", 
swap="left_thumb1"), + 114: dict(name="right_thumb2", id=114, color=[255, 128, 0], type="", swap="left_thumb2"), + 115: dict(name="right_thumb3", id=115, color=[255, 128, 0], type="", swap="left_thumb3"), + 116: dict(name="right_thumb4", id=116, color=[255, 128, 0], type="", swap="left_thumb4"), + 117: dict(name="right_forefinger1", id=117, color=[255, 153, 255], type="", swap="left_forefinger1"), + 118: dict(name="right_forefinger2", id=118, color=[255, 153, 255], type="", swap="left_forefinger2"), + 119: dict(name="right_forefinger3", id=119, color=[255, 153, 255], type="", swap="left_forefinger3"), + 120: dict(name="right_forefinger4", id=120, color=[255, 153, 255], type="", swap="left_forefinger4"), + 121: dict(name="right_middle_finger1", id=121, color=[102, 178, 255], type="", swap="left_middle_finger1"), + 122: dict(name="right_middle_finger2", id=122, color=[102, 178, 255], type="", swap="left_middle_finger2"), + 123: dict(name="right_middle_finger3", id=123, color=[102, 178, 255], type="", swap="left_middle_finger3"), + 124: dict(name="right_middle_finger4", id=124, color=[102, 178, 255], type="", swap="left_middle_finger4"), + 125: dict(name="right_ring_finger1", id=125, color=[255, 51, 51], type="", swap="left_ring_finger1"), + 126: dict(name="right_ring_finger2", id=126, color=[255, 51, 51], type="", swap="left_ring_finger2"), + 127: dict(name="right_ring_finger3", id=127, color=[255, 51, 51], type="", swap="left_ring_finger3"), + 128: dict(name="right_ring_finger4", id=128, color=[255, 51, 51], type="", swap="left_ring_finger4"), + 129: dict(name="right_pinky_finger1", id=129, color=[0, 255, 0], type="", swap="left_pinky_finger1"), + 130: dict(name="right_pinky_finger2", id=130, color=[0, 255, 0], type="", swap="left_pinky_finger2"), + 131: dict(name="right_pinky_finger3", id=131, color=[0, 255, 0], type="", swap="left_pinky_finger3"), + 132: dict(name="right_pinky_finger4", id=132, color=[0, 255, 0], type="", swap="left_pinky_finger4"), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'right_shoulder'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]), - 19: - dict(link=('left_ankle', 'left_big_toe'), id=19, color=[0, 255, 0]), - 20: - 
dict(link=('left_ankle', 'left_small_toe'), id=20, color=[0, 255, 0]), - 21: - dict(link=('left_ankle', 'left_heel'), id=21, color=[0, 255, 0]), - 22: - dict( - link=('right_ankle', 'right_big_toe'), id=22, color=[255, 128, 0]), - 23: - dict( - link=('right_ankle', 'right_small_toe'), - id=23, - color=[255, 128, 0]), - 24: - dict(link=('right_ankle', 'right_heel'), id=24, color=[255, 128, 0]), - 25: - dict( - link=('left_hand_root', 'left_thumb1'), id=25, color=[255, 128, - 0]), - 26: - dict(link=('left_thumb1', 'left_thumb2'), id=26, color=[255, 128, 0]), - 27: - dict(link=('left_thumb2', 'left_thumb3'), id=27, color=[255, 128, 0]), - 28: - dict(link=('left_thumb3', 'left_thumb4'), id=28, color=[255, 128, 0]), - 29: - dict( - link=('left_hand_root', 'left_forefinger1'), - id=29, - color=[255, 153, 255]), - 30: - dict( - link=('left_forefinger1', 'left_forefinger2'), - id=30, - color=[255, 153, 255]), - 31: - dict( - link=('left_forefinger2', 'left_forefinger3'), - id=31, - color=[255, 153, 255]), - 32: - dict( - link=('left_forefinger3', 'left_forefinger4'), - id=32, - color=[255, 153, 255]), - 33: - dict( - link=('left_hand_root', 'left_middle_finger1'), - id=33, - color=[102, 178, 255]), - 34: - dict( - link=('left_middle_finger1', 'left_middle_finger2'), - id=34, - color=[102, 178, 255]), - 35: - dict( - link=('left_middle_finger2', 'left_middle_finger3'), - id=35, - color=[102, 178, 255]), - 36: - dict( - link=('left_middle_finger3', 'left_middle_finger4'), - id=36, - color=[102, 178, 255]), - 37: - dict( - link=('left_hand_root', 'left_ring_finger1'), - id=37, - color=[255, 51, 51]), - 38: - dict( - link=('left_ring_finger1', 'left_ring_finger2'), - id=38, - color=[255, 51, 51]), - 39: - dict( - link=('left_ring_finger2', 'left_ring_finger3'), - id=39, - color=[255, 51, 51]), - 40: - dict( - link=('left_ring_finger3', 'left_ring_finger4'), - id=40, - color=[255, 51, 51]), - 41: - dict( - link=('left_hand_root', 'left_pinky_finger1'), - id=41, - color=[0, 255, 0]), - 42: - dict( - link=('left_pinky_finger1', 'left_pinky_finger2'), - id=42, - color=[0, 255, 0]), - 43: - dict( - link=('left_pinky_finger2', 'left_pinky_finger3'), - id=43, - color=[0, 255, 0]), - 44: - dict( - link=('left_pinky_finger3', 'left_pinky_finger4'), - id=44, - color=[0, 255, 0]), - 45: - dict( - link=('right_hand_root', 'right_thumb1'), - id=45, - color=[255, 128, 0]), - 46: - dict( - link=('right_thumb1', 'right_thumb2'), id=46, color=[255, 128, 0]), - 47: - dict( - link=('right_thumb2', 'right_thumb3'), id=47, color=[255, 128, 0]), - 48: - dict( - link=('right_thumb3', 'right_thumb4'), id=48, color=[255, 128, 0]), - 49: - dict( - link=('right_hand_root', 'right_forefinger1'), - id=49, - color=[255, 153, 255]), - 50: - dict( - link=('right_forefinger1', 'right_forefinger2'), - id=50, - color=[255, 153, 255]), - 51: - dict( - link=('right_forefinger2', 'right_forefinger3'), - id=51, - color=[255, 153, 255]), - 52: - dict( - link=('right_forefinger3', 'right_forefinger4'), - id=52, - color=[255, 153, 255]), - 53: - dict( - link=('right_hand_root', 'right_middle_finger1'), - id=53, - color=[102, 178, 255]), - 54: - dict( - link=('right_middle_finger1', 'right_middle_finger2'), - id=54, - color=[102, 178, 255]), - 55: - dict( - link=('right_middle_finger2', 'right_middle_finger3'), - id=55, - color=[102, 178, 255]), - 56: - dict( - link=('right_middle_finger3', 'right_middle_finger4'), - id=56, - color=[102, 178, 255]), - 57: - dict( - link=('right_hand_root', 'right_ring_finger1'), - id=57, - color=[255, 51, 
51]), - 58: - dict( - link=('right_ring_finger1', 'right_ring_finger2'), - id=58, - color=[255, 51, 51]), - 59: - dict( - link=('right_ring_finger2', 'right_ring_finger3'), - id=59, - color=[255, 51, 51]), - 60: - dict( - link=('right_ring_finger3', 'right_ring_finger4'), - id=60, - color=[255, 51, 51]), - 61: - dict( - link=('right_hand_root', 'right_pinky_finger1'), - id=61, - color=[0, 255, 0]), - 62: - dict( - link=('right_pinky_finger1', 'right_pinky_finger2'), - id=62, - color=[0, 255, 0]), - 63: - dict( - link=('right_pinky_finger2', 'right_pinky_finger3'), - id=63, - color=[0, 255, 0]), - 64: - dict( - link=('right_pinky_finger3', 'right_pinky_finger4'), - id=64, - color=[0, 255, 0]) + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), + 19: dict(link=("left_ankle", "left_big_toe"), id=19, color=[0, 255, 0]), + 20: dict(link=("left_ankle", "left_small_toe"), id=20, color=[0, 255, 0]), + 21: dict(link=("left_ankle", "left_heel"), id=21, color=[0, 255, 0]), + 22: dict(link=("right_ankle", "right_big_toe"), id=22, color=[255, 128, 0]), + 23: dict(link=("right_ankle", "right_small_toe"), id=23, color=[255, 128, 0]), + 24: dict(link=("right_ankle", "right_heel"), id=24, color=[255, 128, 0]), + 25: dict(link=("left_hand_root", "left_thumb1"), id=25, color=[255, 128, 0]), + 26: dict(link=("left_thumb1", "left_thumb2"), id=26, color=[255, 128, 0]), + 27: dict(link=("left_thumb2", "left_thumb3"), id=27, color=[255, 128, 0]), + 28: dict(link=("left_thumb3", "left_thumb4"), id=28, color=[255, 128, 0]), + 29: dict(link=("left_hand_root", "left_forefinger1"), id=29, color=[255, 153, 255]), + 30: dict(link=("left_forefinger1", "left_forefinger2"), id=30, color=[255, 153, 255]), + 31: dict(link=("left_forefinger2", "left_forefinger3"), id=31, color=[255, 153, 255]), + 32: dict(link=("left_forefinger3", "left_forefinger4"), id=32, color=[255, 153, 255]), + 33: dict(link=("left_hand_root", "left_middle_finger1"), id=33, color=[102, 178, 255]), + 34: dict(link=("left_middle_finger1", "left_middle_finger2"), id=34, color=[102, 178, 255]), + 35: dict(link=("left_middle_finger2", "left_middle_finger3"), id=35, color=[102, 178, 255]), + 36: dict(link=("left_middle_finger3", 
"left_middle_finger4"), id=36, color=[102, 178, 255]), + 37: dict(link=("left_hand_root", "left_ring_finger1"), id=37, color=[255, 51, 51]), + 38: dict(link=("left_ring_finger1", "left_ring_finger2"), id=38, color=[255, 51, 51]), + 39: dict(link=("left_ring_finger2", "left_ring_finger3"), id=39, color=[255, 51, 51]), + 40: dict(link=("left_ring_finger3", "left_ring_finger4"), id=40, color=[255, 51, 51]), + 41: dict(link=("left_hand_root", "left_pinky_finger1"), id=41, color=[0, 255, 0]), + 42: dict(link=("left_pinky_finger1", "left_pinky_finger2"), id=42, color=[0, 255, 0]), + 43: dict(link=("left_pinky_finger2", "left_pinky_finger3"), id=43, color=[0, 255, 0]), + 44: dict(link=("left_pinky_finger3", "left_pinky_finger4"), id=44, color=[0, 255, 0]), + 45: dict(link=("right_hand_root", "right_thumb1"), id=45, color=[255, 128, 0]), + 46: dict(link=("right_thumb1", "right_thumb2"), id=46, color=[255, 128, 0]), + 47: dict(link=("right_thumb2", "right_thumb3"), id=47, color=[255, 128, 0]), + 48: dict(link=("right_thumb3", "right_thumb4"), id=48, color=[255, 128, 0]), + 49: dict(link=("right_hand_root", "right_forefinger1"), id=49, color=[255, 153, 255]), + 50: dict(link=("right_forefinger1", "right_forefinger2"), id=50, color=[255, 153, 255]), + 51: dict(link=("right_forefinger2", "right_forefinger3"), id=51, color=[255, 153, 255]), + 52: dict(link=("right_forefinger3", "right_forefinger4"), id=52, color=[255, 153, 255]), + 53: dict(link=("right_hand_root", "right_middle_finger1"), id=53, color=[102, 178, 255]), + 54: dict(link=("right_middle_finger1", "right_middle_finger2"), id=54, color=[102, 178, 255]), + 55: dict(link=("right_middle_finger2", "right_middle_finger3"), id=55, color=[102, 178, 255]), + 56: dict(link=("right_middle_finger3", "right_middle_finger4"), id=56, color=[102, 178, 255]), + 57: dict(link=("right_hand_root", "right_ring_finger1"), id=57, color=[255, 51, 51]), + 58: dict(link=("right_ring_finger1", "right_ring_finger2"), id=58, color=[255, 51, 51]), + 59: dict(link=("right_ring_finger2", "right_ring_finger3"), id=59, color=[255, 51, 51]), + 60: dict(link=("right_ring_finger3", "right_ring_finger4"), id=60, color=[255, 51, 51]), + 61: dict(link=("right_hand_root", "right_pinky_finger1"), id=61, color=[0, 255, 0]), + 62: dict(link=("right_pinky_finger1", "right_pinky_finger2"), id=62, color=[0, 255, 0]), + 63: dict(link=("right_pinky_finger2", "right_pinky_finger3"), id=63, color=[0, 255, 0]), + 64: dict(link=("right_pinky_finger3", "right_pinky_finger4"), id=64, color=[0, 255, 0]), }, - joint_weights=[1.] 
* 133, + joint_weights=[1.0] * 133, # 'https://github.com/jin-s13/COCO-WholeBody/blob/master/' # 'evaluation/myeval_wholebody.py#L175' sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.068, 0.066, 0.066, - 0.092, 0.094, 0.094, 0.042, 0.043, 0.044, 0.043, 0.040, 0.035, 0.031, - 0.025, 0.020, 0.023, 0.029, 0.032, 0.037, 0.038, 0.043, 0.041, 0.045, - 0.013, 0.012, 0.011, 0.011, 0.012, 0.012, 0.011, 0.011, 0.013, 0.015, - 0.009, 0.007, 0.007, 0.007, 0.012, 0.009, 0.008, 0.016, 0.010, 0.017, - 0.011, 0.009, 0.011, 0.009, 0.007, 0.013, 0.008, 0.011, 0.012, 0.010, - 0.034, 0.008, 0.008, 0.009, 0.008, 0.008, 0.007, 0.010, 0.008, 0.009, - 0.009, 0.009, 0.007, 0.007, 0.008, 0.011, 0.008, 0.008, 0.008, 0.01, - 0.008, 0.029, 0.022, 0.035, 0.037, 0.047, 0.026, 0.025, 0.024, 0.035, - 0.018, 0.024, 0.022, 0.026, 0.017, 0.021, 0.021, 0.032, 0.02, 0.019, - 0.022, 0.031, 0.029, 0.022, 0.035, 0.037, 0.047, 0.026, 0.025, 0.024, - 0.035, 0.018, 0.024, 0.022, 0.026, 0.017, 0.021, 0.021, 0.032, 0.02, - 0.019, 0.022, 0.031 - ]) + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.068, + 0.066, + 0.066, + 0.092, + 0.094, + 0.094, + 0.042, + 0.043, + 0.044, + 0.043, + 0.040, + 0.035, + 0.031, + 0.025, + 0.020, + 0.023, + 0.029, + 0.032, + 0.037, + 0.038, + 0.043, + 0.041, + 0.045, + 0.013, + 0.012, + 0.011, + 0.011, + 0.012, + 0.012, + 0.011, + 0.011, + 0.013, + 0.015, + 0.009, + 0.007, + 0.007, + 0.007, + 0.012, + 0.009, + 0.008, + 0.016, + 0.010, + 0.017, + 0.011, + 0.009, + 0.011, + 0.009, + 0.007, + 0.013, + 0.008, + 0.011, + 0.012, + 0.010, + 0.034, + 0.008, + 0.008, + 0.009, + 0.008, + 0.008, + 0.007, + 0.010, + 0.008, + 0.009, + 0.009, + 0.009, + 0.007, + 0.007, + 0.008, + 0.011, + 0.008, + 0.008, + 0.008, + 0.01, + 0.008, + 0.029, + 0.022, + 0.035, + 0.037, + 0.047, + 0.026, + 0.025, + 0.024, + 0.035, + 0.018, + 0.024, + 0.022, + 0.026, + 0.017, + 0.021, + 0.021, + 0.032, + 0.02, + 0.019, + 0.022, + 0.031, + 0.029, + 0.022, + 0.035, + 0.037, + 0.047, + 0.026, + 0.025, + 0.024, + 0.035, + 0.018, + 0.024, + 0.022, + 0.026, + 0.017, + 0.021, + 0.021, + 0.032, + 0.02, + 0.019, + 0.022, + 0.031, + ], +) diff --git a/mmpose/configs/_base_/datasets/halpe.py b/mmpose/configs/_base_/datasets/halpe.py index 1385fe81dc2190684f2142449c0f288f2cb74c1a..2a6ec565022a9d8a9b3a61f7563d9a28489d46bd 100644 --- a/mmpose/configs/_base_/datasets/halpe.py +++ b/mmpose/configs/_base_/datasets/halpe.py @@ -1,1157 +1,360 @@ dataset_info = dict( - dataset_name='halpe', + dataset_name="halpe", paper_info=dict( - author='Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie' - ' and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu' - ' and Ma, Ze and Chen, Mingyang and Lu, Cewu', - title='PaStaNet: Toward Human Activity Knowledge Engine', - container='CVPR', - year='2020', - homepage='https://github.com/Fang-Haoshu/Halpe-FullBody/', + author="Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie" + " and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu" + " and Ma, Ze and Chen, Mingyang and Lu, Cewu", + title="PaStaNet: Toward Human Activity Knowledge Engine", + container="CVPR", + year="2020", + homepage="https://github.com/Fang-Haoshu/Halpe-FullBody/", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - 
dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 17: - dict(name='head', id=17, color=[255, 128, 0], type='upper', swap=''), - 18: - dict(name='neck', id=18, color=[255, 128, 0], type='upper', swap=''), - 19: - dict(name='hip', id=19, color=[255, 128, 0], type='lower', swap=''), - 20: - dict( - name='left_big_toe', - id=20, - color=[255, 128, 0], - type='lower', - swap='right_big_toe'), - 21: - dict( - name='right_big_toe', - id=21, - color=[255, 128, 0], - type='lower', - swap='left_big_toe'), - 22: - dict( - name='left_small_toe', - id=22, - color=[255, 128, 0], - type='lower', - swap='right_small_toe'), - 23: - dict( - name='right_small_toe', - id=23, - color=[255, 128, 0], - type='lower', - swap='left_small_toe'), - 24: - dict( - name='left_heel', - id=24, - color=[255, 128, 0], - type='lower', - swap='right_heel'), - 25: - dict( - name='right_heel', - id=25, - color=[255, 128, 0], - type='lower', - swap='left_heel'), - 26: - dict( - name='face-0', - id=26, - color=[255, 255, 255], - type='', - swap='face-16'), - 27: - dict( - name='face-1', - id=27, - color=[255, 255, 255], - type='', - swap='face-15'), - 28: - dict( - name='face-2', - id=28, - color=[255, 255, 255], - type='', - swap='face-14'), - 29: - dict( - name='face-3', - id=29, - color=[255, 255, 255], - type='', - swap='face-13'), - 30: - dict( - name='face-4', - id=30, - color=[255, 255, 255], - type='', - swap='face-12'), - 31: - dict( - name='face-5', - id=31, - color=[255, 255, 255], - type='', - swap='face-11'), - 32: - dict( - name='face-6', - id=32, - color=[255, 255, 255], - type='', - swap='face-10'), - 33: - dict( - name='face-7', - id=33, - color=[255, 255, 255], - type='', - swap='face-9'), - 34: - dict(name='face-8', id=34, color=[255, 255, 255], type='', swap=''), - 35: - dict( - name='face-9', - id=35, - color=[255, 255, 255], - type='', - swap='face-7'), - 36: - dict( - name='face-10', - id=36, - color=[255, 255, 255], - type='', - swap='face-6'), - 37: - dict( - name='face-11', - id=37, - color=[255, 255, 255], - type='', - swap='face-5'), 
- 38: - dict( - name='face-12', - id=38, - color=[255, 255, 255], - type='', - swap='face-4'), - 39: - dict( - name='face-13', - id=39, - color=[255, 255, 255], - type='', - swap='face-3'), - 40: - dict( - name='face-14', - id=40, - color=[255, 255, 255], - type='', - swap='face-2'), - 41: - dict( - name='face-15', - id=41, - color=[255, 255, 255], - type='', - swap='face-1'), - 42: - dict( - name='face-16', - id=42, - color=[255, 255, 255], - type='', - swap='face-0'), - 43: - dict( - name='face-17', - id=43, - color=[255, 255, 255], - type='', - swap='face-26'), - 44: - dict( - name='face-18', - id=44, - color=[255, 255, 255], - type='', - swap='face-25'), - 45: - dict( - name='face-19', - id=45, - color=[255, 255, 255], - type='', - swap='face-24'), - 46: - dict( - name='face-20', - id=46, - color=[255, 255, 255], - type='', - swap='face-23'), - 47: - dict( - name='face-21', - id=47, - color=[255, 255, 255], - type='', - swap='face-22'), - 48: - dict( - name='face-22', - id=48, - color=[255, 255, 255], - type='', - swap='face-21'), - 49: - dict( - name='face-23', - id=49, - color=[255, 255, 255], - type='', - swap='face-20'), - 50: - dict( - name='face-24', - id=50, - color=[255, 255, 255], - type='', - swap='face-19'), - 51: - dict( - name='face-25', - id=51, - color=[255, 255, 255], - type='', - swap='face-18'), - 52: - dict( - name='face-26', - id=52, - color=[255, 255, 255], - type='', - swap='face-17'), - 53: - dict(name='face-27', id=53, color=[255, 255, 255], type='', swap=''), - 54: - dict(name='face-28', id=54, color=[255, 255, 255], type='', swap=''), - 55: - dict(name='face-29', id=55, color=[255, 255, 255], type='', swap=''), - 56: - dict(name='face-30', id=56, color=[255, 255, 255], type='', swap=''), - 57: - dict( - name='face-31', - id=57, - color=[255, 255, 255], - type='', - swap='face-35'), - 58: - dict( - name='face-32', - id=58, - color=[255, 255, 255], - type='', - swap='face-34'), - 59: - dict(name='face-33', id=59, color=[255, 255, 255], type='', swap=''), - 60: - dict( - name='face-34', - id=60, - color=[255, 255, 255], - type='', - swap='face-32'), - 61: - dict( - name='face-35', - id=61, - color=[255, 255, 255], - type='', - swap='face-31'), - 62: - dict( - name='face-36', - id=62, - color=[255, 255, 255], - type='', - swap='face-45'), - 63: - dict( - name='face-37', - id=63, - color=[255, 255, 255], - type='', - swap='face-44'), - 64: - dict( - name='face-38', - id=64, - color=[255, 255, 255], - type='', - swap='face-43'), - 65: - dict( - name='face-39', - id=65, - color=[255, 255, 255], - type='', - swap='face-42'), - 66: - dict( - name='face-40', - id=66, - color=[255, 255, 255], - type='', - swap='face-47'), - 67: - dict( - name='face-41', - id=67, - color=[255, 255, 255], - type='', - swap='face-46'), - 68: - dict( - name='face-42', - id=68, - color=[255, 255, 255], - type='', - swap='face-39'), - 69: - dict( - name='face-43', - id=69, - color=[255, 255, 255], - type='', - swap='face-38'), - 70: - dict( - name='face-44', - id=70, - color=[255, 255, 255], - type='', - swap='face-37'), - 71: - dict( - name='face-45', - id=71, - color=[255, 255, 255], - type='', - swap='face-36'), - 72: - dict( - name='face-46', - id=72, - color=[255, 255, 255], - type='', - swap='face-41'), - 73: - dict( - name='face-47', - id=73, - color=[255, 255, 255], - type='', - swap='face-40'), - 74: - dict( - name='face-48', - id=74, - color=[255, 255, 255], - type='', - swap='face-54'), - 75: - dict( - name='face-49', - id=75, - color=[255, 255, 255], - type='', - swap='face-53'), - 
76: - dict( - name='face-50', - id=76, - color=[255, 255, 255], - type='', - swap='face-52'), - 77: - dict(name='face-51', id=77, color=[255, 255, 255], type='', swap=''), - 78: - dict( - name='face-52', - id=78, - color=[255, 255, 255], - type='', - swap='face-50'), - 79: - dict( - name='face-53', - id=79, - color=[255, 255, 255], - type='', - swap='face-49'), - 80: - dict( - name='face-54', - id=80, - color=[255, 255, 255], - type='', - swap='face-48'), - 81: - dict( - name='face-55', - id=81, - color=[255, 255, 255], - type='', - swap='face-59'), - 82: - dict( - name='face-56', - id=82, - color=[255, 255, 255], - type='', - swap='face-58'), - 83: - dict(name='face-57', id=83, color=[255, 255, 255], type='', swap=''), - 84: - dict( - name='face-58', - id=84, - color=[255, 255, 255], - type='', - swap='face-56'), - 85: - dict( - name='face-59', - id=85, - color=[255, 255, 255], - type='', - swap='face-55'), - 86: - dict( - name='face-60', - id=86, - color=[255, 255, 255], - type='', - swap='face-64'), - 87: - dict( - name='face-61', - id=87, - color=[255, 255, 255], - type='', - swap='face-63'), - 88: - dict(name='face-62', id=88, color=[255, 255, 255], type='', swap=''), - 89: - dict( - name='face-63', - id=89, - color=[255, 255, 255], - type='', - swap='face-61'), - 90: - dict( - name='face-64', - id=90, - color=[255, 255, 255], - type='', - swap='face-60'), - 91: - dict( - name='face-65', - id=91, - color=[255, 255, 255], - type='', - swap='face-67'), - 92: - dict(name='face-66', id=92, color=[255, 255, 255], type='', swap=''), - 93: - dict( - name='face-67', - id=93, - color=[255, 255, 255], - type='', - swap='face-65'), - 94: - dict( - name='left_hand_root', - id=94, - color=[255, 255, 255], - type='', - swap='right_hand_root'), - 95: - dict( - name='left_thumb1', - id=95, - color=[255, 128, 0], - type='', - swap='right_thumb1'), - 96: - dict( - name='left_thumb2', - id=96, - color=[255, 128, 0], - type='', - swap='right_thumb2'), - 97: - dict( - name='left_thumb3', - id=97, - color=[255, 128, 0], - type='', - swap='right_thumb3'), - 98: - dict( - name='left_thumb4', - id=98, - color=[255, 128, 0], - type='', - swap='right_thumb4'), - 99: - dict( - name='left_forefinger1', - id=99, - color=[255, 153, 255], - type='', - swap='right_forefinger1'), - 100: - dict( - name='left_forefinger2', - id=100, - color=[255, 153, 255], - type='', - swap='right_forefinger2'), - 101: - dict( - name='left_forefinger3', - id=101, - color=[255, 153, 255], - type='', - swap='right_forefinger3'), - 102: - dict( - name='left_forefinger4', - id=102, - color=[255, 153, 255], - type='', - swap='right_forefinger4'), - 103: - dict( - name='left_middle_finger1', - id=103, - color=[102, 178, 255], - type='', - swap='right_middle_finger1'), - 104: - dict( - name='left_middle_finger2', - id=104, - color=[102, 178, 255], - type='', - swap='right_middle_finger2'), - 105: - dict( - name='left_middle_finger3', - id=105, - color=[102, 178, 255], - type='', - swap='right_middle_finger3'), - 106: - dict( - name='left_middle_finger4', - id=106, - color=[102, 178, 255], - type='', - swap='right_middle_finger4'), - 107: - dict( - name='left_ring_finger1', - id=107, - color=[255, 51, 51], - type='', - swap='right_ring_finger1'), - 108: - dict( - name='left_ring_finger2', - id=108, - color=[255, 51, 51], - type='', - swap='right_ring_finger2'), - 109: - dict( - name='left_ring_finger3', - id=109, - color=[255, 51, 51], - type='', - swap='right_ring_finger3'), - 110: - dict( - name='left_ring_finger4', - id=110, - color=[255, 
51, 51], - type='', - swap='right_ring_finger4'), - 111: - dict( - name='left_pinky_finger1', - id=111, - color=[0, 255, 0], - type='', - swap='right_pinky_finger1'), - 112: - dict( - name='left_pinky_finger2', - id=112, - color=[0, 255, 0], - type='', - swap='right_pinky_finger2'), - 113: - dict( - name='left_pinky_finger3', - id=113, - color=[0, 255, 0], - type='', - swap='right_pinky_finger3'), - 114: - dict( - name='left_pinky_finger4', - id=114, - color=[0, 255, 0], - type='', - swap='right_pinky_finger4'), - 115: - dict( - name='right_hand_root', - id=115, - color=[255, 255, 255], - type='', - swap='left_hand_root'), - 116: - dict( - name='right_thumb1', - id=116, - color=[255, 128, 0], - type='', - swap='left_thumb1'), - 117: - dict( - name='right_thumb2', - id=117, - color=[255, 128, 0], - type='', - swap='left_thumb2'), - 118: - dict( - name='right_thumb3', - id=118, - color=[255, 128, 0], - type='', - swap='left_thumb3'), - 119: - dict( - name='right_thumb4', - id=119, - color=[255, 128, 0], - type='', - swap='left_thumb4'), - 120: - dict( - name='right_forefinger1', - id=120, - color=[255, 153, 255], - type='', - swap='left_forefinger1'), - 121: - dict( - name='right_forefinger2', - id=121, - color=[255, 153, 255], - type='', - swap='left_forefinger2'), - 122: - dict( - name='right_forefinger3', - id=122, - color=[255, 153, 255], - type='', - swap='left_forefinger3'), - 123: - dict( - name='right_forefinger4', - id=123, - color=[255, 153, 255], - type='', - swap='left_forefinger4'), - 124: - dict( - name='right_middle_finger1', - id=124, - color=[102, 178, 255], - type='', - swap='left_middle_finger1'), - 125: - dict( - name='right_middle_finger2', - id=125, - color=[102, 178, 255], - type='', - swap='left_middle_finger2'), - 126: - dict( - name='right_middle_finger3', - id=126, - color=[102, 178, 255], - type='', - swap='left_middle_finger3'), - 127: - dict( - name='right_middle_finger4', - id=127, - color=[102, 178, 255], - type='', - swap='left_middle_finger4'), - 128: - dict( - name='right_ring_finger1', - id=128, - color=[255, 51, 51], - type='', - swap='left_ring_finger1'), - 129: - dict( - name='right_ring_finger2', - id=129, - color=[255, 51, 51], - type='', - swap='left_ring_finger2'), - 130: - dict( - name='right_ring_finger3', - id=130, - color=[255, 51, 51], - type='', - swap='left_ring_finger3'), - 131: - dict( - name='right_ring_finger4', - id=131, - color=[255, 51, 51], - type='', - swap='left_ring_finger4'), - 132: - dict( - name='right_pinky_finger1', - id=132, - color=[0, 255, 0], - type='', - swap='left_pinky_finger1'), - 133: - dict( - name='right_pinky_finger2', - id=133, - color=[0, 255, 0], - type='', - swap='left_pinky_finger2'), - 134: - dict( - name='right_pinky_finger3', - id=134, - color=[0, 255, 0], - type='', - swap='left_pinky_finger3'), - 135: - dict( - name='right_pinky_finger4', - id=135, - color=[0, 255, 0], - type='', - swap='left_pinky_finger4') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", 
swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="head", id=17, color=[255, 128, 0], type="upper", swap=""), + 18: dict(name="neck", id=18, color=[255, 128, 0], type="upper", swap=""), + 19: dict(name="hip", id=19, color=[255, 128, 0], type="lower", swap=""), + 20: dict(name="left_big_toe", id=20, color=[255, 128, 0], type="lower", swap="right_big_toe"), + 21: dict(name="right_big_toe", id=21, color=[255, 128, 0], type="lower", swap="left_big_toe"), + 22: dict(name="left_small_toe", id=22, color=[255, 128, 0], type="lower", swap="right_small_toe"), + 23: dict(name="right_small_toe", id=23, color=[255, 128, 0], type="lower", swap="left_small_toe"), + 24: dict(name="left_heel", id=24, color=[255, 128, 0], type="lower", swap="right_heel"), + 25: dict(name="right_heel", id=25, color=[255, 128, 0], type="lower", swap="left_heel"), + 26: dict(name="face-0", id=26, color=[255, 255, 255], type="", swap="face-16"), + 27: dict(name="face-1", id=27, color=[255, 255, 255], type="", swap="face-15"), + 28: dict(name="face-2", id=28, color=[255, 255, 255], type="", swap="face-14"), + 29: dict(name="face-3", id=29, color=[255, 255, 255], type="", swap="face-13"), + 30: dict(name="face-4", id=30, color=[255, 255, 255], type="", swap="face-12"), + 31: dict(name="face-5", id=31, color=[255, 255, 255], type="", swap="face-11"), + 32: dict(name="face-6", id=32, color=[255, 255, 255], type="", swap="face-10"), + 33: dict(name="face-7", id=33, color=[255, 255, 255], type="", swap="face-9"), + 34: dict(name="face-8", id=34, color=[255, 255, 255], type="", swap=""), + 35: dict(name="face-9", id=35, color=[255, 255, 255], type="", swap="face-7"), + 36: dict(name="face-10", id=36, color=[255, 255, 255], type="", swap="face-6"), + 37: dict(name="face-11", id=37, color=[255, 255, 255], type="", swap="face-5"), + 38: dict(name="face-12", id=38, color=[255, 255, 255], type="", swap="face-4"), + 39: dict(name="face-13", id=39, color=[255, 255, 255], type="", swap="face-3"), + 40: dict(name="face-14", id=40, color=[255, 255, 255], type="", swap="face-2"), + 41: dict(name="face-15", id=41, color=[255, 255, 255], type="", swap="face-1"), + 42: dict(name="face-16", id=42, color=[255, 255, 255], type="", swap="face-0"), + 43: dict(name="face-17", id=43, color=[255, 255, 255], type="", swap="face-26"), + 44: dict(name="face-18", id=44, color=[255, 255, 255], type="", swap="face-25"), + 45: dict(name="face-19", id=45, color=[255, 255, 255], type="", swap="face-24"), + 46: dict(name="face-20", id=46, color=[255, 255, 255], type="", swap="face-23"), + 47: dict(name="face-21", id=47, color=[255, 255, 255], type="", swap="face-22"), + 48: 
dict(name="face-22", id=48, color=[255, 255, 255], type="", swap="face-21"), + 49: dict(name="face-23", id=49, color=[255, 255, 255], type="", swap="face-20"), + 50: dict(name="face-24", id=50, color=[255, 255, 255], type="", swap="face-19"), + 51: dict(name="face-25", id=51, color=[255, 255, 255], type="", swap="face-18"), + 52: dict(name="face-26", id=52, color=[255, 255, 255], type="", swap="face-17"), + 53: dict(name="face-27", id=53, color=[255, 255, 255], type="", swap=""), + 54: dict(name="face-28", id=54, color=[255, 255, 255], type="", swap=""), + 55: dict(name="face-29", id=55, color=[255, 255, 255], type="", swap=""), + 56: dict(name="face-30", id=56, color=[255, 255, 255], type="", swap=""), + 57: dict(name="face-31", id=57, color=[255, 255, 255], type="", swap="face-35"), + 58: dict(name="face-32", id=58, color=[255, 255, 255], type="", swap="face-34"), + 59: dict(name="face-33", id=59, color=[255, 255, 255], type="", swap=""), + 60: dict(name="face-34", id=60, color=[255, 255, 255], type="", swap="face-32"), + 61: dict(name="face-35", id=61, color=[255, 255, 255], type="", swap="face-31"), + 62: dict(name="face-36", id=62, color=[255, 255, 255], type="", swap="face-45"), + 63: dict(name="face-37", id=63, color=[255, 255, 255], type="", swap="face-44"), + 64: dict(name="face-38", id=64, color=[255, 255, 255], type="", swap="face-43"), + 65: dict(name="face-39", id=65, color=[255, 255, 255], type="", swap="face-42"), + 66: dict(name="face-40", id=66, color=[255, 255, 255], type="", swap="face-47"), + 67: dict(name="face-41", id=67, color=[255, 255, 255], type="", swap="face-46"), + 68: dict(name="face-42", id=68, color=[255, 255, 255], type="", swap="face-39"), + 69: dict(name="face-43", id=69, color=[255, 255, 255], type="", swap="face-38"), + 70: dict(name="face-44", id=70, color=[255, 255, 255], type="", swap="face-37"), + 71: dict(name="face-45", id=71, color=[255, 255, 255], type="", swap="face-36"), + 72: dict(name="face-46", id=72, color=[255, 255, 255], type="", swap="face-41"), + 73: dict(name="face-47", id=73, color=[255, 255, 255], type="", swap="face-40"), + 74: dict(name="face-48", id=74, color=[255, 255, 255], type="", swap="face-54"), + 75: dict(name="face-49", id=75, color=[255, 255, 255], type="", swap="face-53"), + 76: dict(name="face-50", id=76, color=[255, 255, 255], type="", swap="face-52"), + 77: dict(name="face-51", id=77, color=[255, 255, 255], type="", swap=""), + 78: dict(name="face-52", id=78, color=[255, 255, 255], type="", swap="face-50"), + 79: dict(name="face-53", id=79, color=[255, 255, 255], type="", swap="face-49"), + 80: dict(name="face-54", id=80, color=[255, 255, 255], type="", swap="face-48"), + 81: dict(name="face-55", id=81, color=[255, 255, 255], type="", swap="face-59"), + 82: dict(name="face-56", id=82, color=[255, 255, 255], type="", swap="face-58"), + 83: dict(name="face-57", id=83, color=[255, 255, 255], type="", swap=""), + 84: dict(name="face-58", id=84, color=[255, 255, 255], type="", swap="face-56"), + 85: dict(name="face-59", id=85, color=[255, 255, 255], type="", swap="face-55"), + 86: dict(name="face-60", id=86, color=[255, 255, 255], type="", swap="face-64"), + 87: dict(name="face-61", id=87, color=[255, 255, 255], type="", swap="face-63"), + 88: dict(name="face-62", id=88, color=[255, 255, 255], type="", swap=""), + 89: dict(name="face-63", id=89, color=[255, 255, 255], type="", swap="face-61"), + 90: dict(name="face-64", id=90, color=[255, 255, 255], type="", swap="face-60"), + 91: dict(name="face-65", id=91, color=[255, 
255, 255], type="", swap="face-67"), + 92: dict(name="face-66", id=92, color=[255, 255, 255], type="", swap=""), + 93: dict(name="face-67", id=93, color=[255, 255, 255], type="", swap="face-65"), + 94: dict(name="left_hand_root", id=94, color=[255, 255, 255], type="", swap="right_hand_root"), + 95: dict(name="left_thumb1", id=95, color=[255, 128, 0], type="", swap="right_thumb1"), + 96: dict(name="left_thumb2", id=96, color=[255, 128, 0], type="", swap="right_thumb2"), + 97: dict(name="left_thumb3", id=97, color=[255, 128, 0], type="", swap="right_thumb3"), + 98: dict(name="left_thumb4", id=98, color=[255, 128, 0], type="", swap="right_thumb4"), + 99: dict(name="left_forefinger1", id=99, color=[255, 153, 255], type="", swap="right_forefinger1"), + 100: dict(name="left_forefinger2", id=100, color=[255, 153, 255], type="", swap="right_forefinger2"), + 101: dict(name="left_forefinger3", id=101, color=[255, 153, 255], type="", swap="right_forefinger3"), + 102: dict(name="left_forefinger4", id=102, color=[255, 153, 255], type="", swap="right_forefinger4"), + 103: dict(name="left_middle_finger1", id=103, color=[102, 178, 255], type="", swap="right_middle_finger1"), + 104: dict(name="left_middle_finger2", id=104, color=[102, 178, 255], type="", swap="right_middle_finger2"), + 105: dict(name="left_middle_finger3", id=105, color=[102, 178, 255], type="", swap="right_middle_finger3"), + 106: dict(name="left_middle_finger4", id=106, color=[102, 178, 255], type="", swap="right_middle_finger4"), + 107: dict(name="left_ring_finger1", id=107, color=[255, 51, 51], type="", swap="right_ring_finger1"), + 108: dict(name="left_ring_finger2", id=108, color=[255, 51, 51], type="", swap="right_ring_finger2"), + 109: dict(name="left_ring_finger3", id=109, color=[255, 51, 51], type="", swap="right_ring_finger3"), + 110: dict(name="left_ring_finger4", id=110, color=[255, 51, 51], type="", swap="right_ring_finger4"), + 111: dict(name="left_pinky_finger1", id=111, color=[0, 255, 0], type="", swap="right_pinky_finger1"), + 112: dict(name="left_pinky_finger2", id=112, color=[0, 255, 0], type="", swap="right_pinky_finger2"), + 113: dict(name="left_pinky_finger3", id=113, color=[0, 255, 0], type="", swap="right_pinky_finger3"), + 114: dict(name="left_pinky_finger4", id=114, color=[0, 255, 0], type="", swap="right_pinky_finger4"), + 115: dict(name="right_hand_root", id=115, color=[255, 255, 255], type="", swap="left_hand_root"), + 116: dict(name="right_thumb1", id=116, color=[255, 128, 0], type="", swap="left_thumb1"), + 117: dict(name="right_thumb2", id=117, color=[255, 128, 0], type="", swap="left_thumb2"), + 118: dict(name="right_thumb3", id=118, color=[255, 128, 0], type="", swap="left_thumb3"), + 119: dict(name="right_thumb4", id=119, color=[255, 128, 0], type="", swap="left_thumb4"), + 120: dict(name="right_forefinger1", id=120, color=[255, 153, 255], type="", swap="left_forefinger1"), + 121: dict(name="right_forefinger2", id=121, color=[255, 153, 255], type="", swap="left_forefinger2"), + 122: dict(name="right_forefinger3", id=122, color=[255, 153, 255], type="", swap="left_forefinger3"), + 123: dict(name="right_forefinger4", id=123, color=[255, 153, 255], type="", swap="left_forefinger4"), + 124: dict(name="right_middle_finger1", id=124, color=[102, 178, 255], type="", swap="left_middle_finger1"), + 125: dict(name="right_middle_finger2", id=125, color=[102, 178, 255], type="", swap="left_middle_finger2"), + 126: dict(name="right_middle_finger3", id=126, color=[102, 178, 255], type="", swap="left_middle_finger3"), 
+ 127: dict(name="right_middle_finger4", id=127, color=[102, 178, 255], type="", swap="left_middle_finger4"), + 128: dict(name="right_ring_finger1", id=128, color=[255, 51, 51], type="", swap="left_ring_finger1"), + 129: dict(name="right_ring_finger2", id=129, color=[255, 51, 51], type="", swap="left_ring_finger2"), + 130: dict(name="right_ring_finger3", id=130, color=[255, 51, 51], type="", swap="left_ring_finger3"), + 131: dict(name="right_ring_finger4", id=131, color=[255, 51, 51], type="", swap="left_ring_finger4"), + 132: dict(name="right_pinky_finger1", id=132, color=[0, 255, 0], type="", swap="left_pinky_finger1"), + 133: dict(name="right_pinky_finger2", id=133, color=[0, 255, 0], type="", swap="left_pinky_finger2"), + 134: dict(name="right_pinky_finger3", id=134, color=[0, 255, 0], type="", swap="left_pinky_finger3"), + 135: dict(name="right_pinky_finger4", id=135, color=[0, 255, 0], type="", swap="left_pinky_finger4"), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('left_hip', 'hip'), id=2, color=[0, 255, 0]), - 3: - dict(link=('right_ankle', 'right_knee'), id=3, color=[255, 128, 0]), - 4: - dict(link=('right_knee', 'right_hip'), id=4, color=[255, 128, 0]), - 5: - dict(link=('right_hip', 'hip'), id=5, color=[255, 128, 0]), - 6: - dict(link=('head', 'neck'), id=6, color=[51, 153, 255]), - 7: - dict(link=('neck', 'hip'), id=7, color=[51, 153, 255]), - 8: - dict(link=('neck', 'left_shoulder'), id=8, color=[0, 255, 0]), - 9: - dict(link=('left_shoulder', 'left_elbow'), id=9, color=[0, 255, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('neck', 'right_shoulder'), id=11, color=[255, 128, 0]), - 12: - dict( - link=('right_shoulder', 'right_elbow'), id=12, color=[255, 128, - 0]), - 13: - dict(link=('right_elbow', 'right_wrist'), id=13, color=[255, 128, 0]), - 14: - dict(link=('left_eye', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('nose', 'left_eye'), id=15, color=[51, 153, 255]), - 16: - dict(link=('nose', 'right_eye'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_eye', 'left_ear'), id=17, color=[51, 153, 255]), - 18: - dict(link=('right_eye', 'right_ear'), id=18, color=[51, 153, 255]), - 19: - dict(link=('left_ear', 'left_shoulder'), id=19, color=[51, 153, 255]), - 20: - dict( - link=('right_ear', 'right_shoulder'), id=20, color=[51, 153, 255]), - 21: - dict(link=('left_ankle', 'left_big_toe'), id=21, color=[0, 255, 0]), - 22: - dict(link=('left_ankle', 'left_small_toe'), id=22, color=[0, 255, 0]), - 23: - dict(link=('left_ankle', 'left_heel'), id=23, color=[0, 255, 0]), - 24: - dict( - link=('right_ankle', 'right_big_toe'), id=24, color=[255, 128, 0]), - 25: - dict( - link=('right_ankle', 'right_small_toe'), - id=25, - color=[255, 128, 0]), - 26: - dict(link=('right_ankle', 'right_heel'), id=26, color=[255, 128, 0]), - 27: - dict(link=('left_wrist', 'left_thumb1'), id=27, color=[255, 128, 0]), - 28: - dict(link=('left_thumb1', 'left_thumb2'), id=28, color=[255, 128, 0]), - 29: - dict(link=('left_thumb2', 'left_thumb3'), id=29, color=[255, 128, 0]), - 30: - dict(link=('left_thumb3', 'left_thumb4'), id=30, color=[255, 128, 0]), - 31: - dict( - link=('left_wrist', 'left_forefinger1'), - id=31, - color=[255, 153, 255]), - 32: - dict( - link=('left_forefinger1', 'left_forefinger2'), - id=32, - color=[255, 153, 255]), - 33: - dict( - link=('left_forefinger2', 'left_forefinger3'), - 
id=33, - color=[255, 153, 255]), - 34: - dict( - link=('left_forefinger3', 'left_forefinger4'), - id=34, - color=[255, 153, 255]), - 35: - dict( - link=('left_wrist', 'left_middle_finger1'), - id=35, - color=[102, 178, 255]), - 36: - dict( - link=('left_middle_finger1', 'left_middle_finger2'), - id=36, - color=[102, 178, 255]), - 37: - dict( - link=('left_middle_finger2', 'left_middle_finger3'), - id=37, - color=[102, 178, 255]), - 38: - dict( - link=('left_middle_finger3', 'left_middle_finger4'), - id=38, - color=[102, 178, 255]), - 39: - dict( - link=('left_wrist', 'left_ring_finger1'), - id=39, - color=[255, 51, 51]), - 40: - dict( - link=('left_ring_finger1', 'left_ring_finger2'), - id=40, - color=[255, 51, 51]), - 41: - dict( - link=('left_ring_finger2', 'left_ring_finger3'), - id=41, - color=[255, 51, 51]), - 42: - dict( - link=('left_ring_finger3', 'left_ring_finger4'), - id=42, - color=[255, 51, 51]), - 43: - dict( - link=('left_wrist', 'left_pinky_finger1'), - id=43, - color=[0, 255, 0]), - 44: - dict( - link=('left_pinky_finger1', 'left_pinky_finger2'), - id=44, - color=[0, 255, 0]), - 45: - dict( - link=('left_pinky_finger2', 'left_pinky_finger3'), - id=45, - color=[0, 255, 0]), - 46: - dict( - link=('left_pinky_finger3', 'left_pinky_finger4'), - id=46, - color=[0, 255, 0]), - 47: - dict(link=('right_wrist', 'right_thumb1'), id=47, color=[255, 128, 0]), - 48: - dict( - link=('right_thumb1', 'right_thumb2'), id=48, color=[255, 128, 0]), - 49: - dict( - link=('right_thumb2', 'right_thumb3'), id=49, color=[255, 128, 0]), - 50: - dict( - link=('right_thumb3', 'right_thumb4'), id=50, color=[255, 128, 0]), - 51: - dict( - link=('right_wrist', 'right_forefinger1'), - id=51, - color=[255, 153, 255]), - 52: - dict( - link=('right_forefinger1', 'right_forefinger2'), - id=52, - color=[255, 153, 255]), - 53: - dict( - link=('right_forefinger2', 'right_forefinger3'), - id=53, - color=[255, 153, 255]), - 54: - dict( - link=('right_forefinger3', 'right_forefinger4'), - id=54, - color=[255, 153, 255]), - 55: - dict( - link=('right_wrist', 'right_middle_finger1'), - id=55, - color=[102, 178, 255]), - 56: - dict( - link=('right_middle_finger1', 'right_middle_finger2'), - id=56, - color=[102, 178, 255]), - 57: - dict( - link=('right_middle_finger2', 'right_middle_finger3'), - id=57, - color=[102, 178, 255]), - 58: - dict( - link=('right_middle_finger3', 'right_middle_finger4'), - id=58, - color=[102, 178, 255]), - 59: - dict( - link=('right_wrist', 'right_ring_finger1'), - id=59, - color=[255, 51, 51]), - 60: - dict( - link=('right_ring_finger1', 'right_ring_finger2'), - id=60, - color=[255, 51, 51]), - 61: - dict( - link=('right_ring_finger2', 'right_ring_finger3'), - id=61, - color=[255, 51, 51]), - 62: - dict( - link=('right_ring_finger3', 'right_ring_finger4'), - id=62, - color=[255, 51, 51]), - 63: - dict( - link=('right_wrist', 'right_pinky_finger1'), - id=63, - color=[0, 255, 0]), - 64: - dict( - link=('right_pinky_finger1', 'right_pinky_finger2'), - id=64, - color=[0, 255, 0]), - 65: - dict( - link=('right_pinky_finger2', 'right_pinky_finger3'), - id=65, - color=[0, 255, 0]), - 66: - dict( - link=('right_pinky_finger3', 'right_pinky_finger4'), - id=66, - color=[0, 255, 0]) + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("left_hip", "hip"), id=2, color=[0, 255, 0]), + 3: dict(link=("right_ankle", "right_knee"), id=3, color=[255, 128, 0]), + 4: dict(link=("right_knee", "right_hip"), 
id=4, color=[255, 128, 0]), + 5: dict(link=("right_hip", "hip"), id=5, color=[255, 128, 0]), + 6: dict(link=("head", "neck"), id=6, color=[51, 153, 255]), + 7: dict(link=("neck", "hip"), id=7, color=[51, 153, 255]), + 8: dict(link=("neck", "left_shoulder"), id=8, color=[0, 255, 0]), + 9: dict(link=("left_shoulder", "left_elbow"), id=9, color=[0, 255, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("neck", "right_shoulder"), id=11, color=[255, 128, 0]), + 12: dict(link=("right_shoulder", "right_elbow"), id=12, color=[255, 128, 0]), + 13: dict(link=("right_elbow", "right_wrist"), id=13, color=[255, 128, 0]), + 14: dict(link=("left_eye", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("nose", "left_eye"), id=15, color=[51, 153, 255]), + 16: dict(link=("nose", "right_eye"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_eye", "left_ear"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_eye", "right_ear"), id=18, color=[51, 153, 255]), + 19: dict(link=("left_ear", "left_shoulder"), id=19, color=[51, 153, 255]), + 20: dict(link=("right_ear", "right_shoulder"), id=20, color=[51, 153, 255]), + 21: dict(link=("left_ankle", "left_big_toe"), id=21, color=[0, 255, 0]), + 22: dict(link=("left_ankle", "left_small_toe"), id=22, color=[0, 255, 0]), + 23: dict(link=("left_ankle", "left_heel"), id=23, color=[0, 255, 0]), + 24: dict(link=("right_ankle", "right_big_toe"), id=24, color=[255, 128, 0]), + 25: dict(link=("right_ankle", "right_small_toe"), id=25, color=[255, 128, 0]), + 26: dict(link=("right_ankle", "right_heel"), id=26, color=[255, 128, 0]), + 27: dict(link=("left_wrist", "left_thumb1"), id=27, color=[255, 128, 0]), + 28: dict(link=("left_thumb1", "left_thumb2"), id=28, color=[255, 128, 0]), + 29: dict(link=("left_thumb2", "left_thumb3"), id=29, color=[255, 128, 0]), + 30: dict(link=("left_thumb3", "left_thumb4"), id=30, color=[255, 128, 0]), + 31: dict(link=("left_wrist", "left_forefinger1"), id=31, color=[255, 153, 255]), + 32: dict(link=("left_forefinger1", "left_forefinger2"), id=32, color=[255, 153, 255]), + 33: dict(link=("left_forefinger2", "left_forefinger3"), id=33, color=[255, 153, 255]), + 34: dict(link=("left_forefinger3", "left_forefinger4"), id=34, color=[255, 153, 255]), + 35: dict(link=("left_wrist", "left_middle_finger1"), id=35, color=[102, 178, 255]), + 36: dict(link=("left_middle_finger1", "left_middle_finger2"), id=36, color=[102, 178, 255]), + 37: dict(link=("left_middle_finger2", "left_middle_finger3"), id=37, color=[102, 178, 255]), + 38: dict(link=("left_middle_finger3", "left_middle_finger4"), id=38, color=[102, 178, 255]), + 39: dict(link=("left_wrist", "left_ring_finger1"), id=39, color=[255, 51, 51]), + 40: dict(link=("left_ring_finger1", "left_ring_finger2"), id=40, color=[255, 51, 51]), + 41: dict(link=("left_ring_finger2", "left_ring_finger3"), id=41, color=[255, 51, 51]), + 42: dict(link=("left_ring_finger3", "left_ring_finger4"), id=42, color=[255, 51, 51]), + 43: dict(link=("left_wrist", "left_pinky_finger1"), id=43, color=[0, 255, 0]), + 44: dict(link=("left_pinky_finger1", "left_pinky_finger2"), id=44, color=[0, 255, 0]), + 45: dict(link=("left_pinky_finger2", "left_pinky_finger3"), id=45, color=[0, 255, 0]), + 46: dict(link=("left_pinky_finger3", "left_pinky_finger4"), id=46, color=[0, 255, 0]), + 47: dict(link=("right_wrist", "right_thumb1"), id=47, color=[255, 128, 0]), + 48: dict(link=("right_thumb1", "right_thumb2"), id=48, color=[255, 128, 0]), + 49: dict(link=("right_thumb2", 
"right_thumb3"), id=49, color=[255, 128, 0]), + 50: dict(link=("right_thumb3", "right_thumb4"), id=50, color=[255, 128, 0]), + 51: dict(link=("right_wrist", "right_forefinger1"), id=51, color=[255, 153, 255]), + 52: dict(link=("right_forefinger1", "right_forefinger2"), id=52, color=[255, 153, 255]), + 53: dict(link=("right_forefinger2", "right_forefinger3"), id=53, color=[255, 153, 255]), + 54: dict(link=("right_forefinger3", "right_forefinger4"), id=54, color=[255, 153, 255]), + 55: dict(link=("right_wrist", "right_middle_finger1"), id=55, color=[102, 178, 255]), + 56: dict(link=("right_middle_finger1", "right_middle_finger2"), id=56, color=[102, 178, 255]), + 57: dict(link=("right_middle_finger2", "right_middle_finger3"), id=57, color=[102, 178, 255]), + 58: dict(link=("right_middle_finger3", "right_middle_finger4"), id=58, color=[102, 178, 255]), + 59: dict(link=("right_wrist", "right_ring_finger1"), id=59, color=[255, 51, 51]), + 60: dict(link=("right_ring_finger1", "right_ring_finger2"), id=60, color=[255, 51, 51]), + 61: dict(link=("right_ring_finger2", "right_ring_finger3"), id=61, color=[255, 51, 51]), + 62: dict(link=("right_ring_finger3", "right_ring_finger4"), id=62, color=[255, 51, 51]), + 63: dict(link=("right_wrist", "right_pinky_finger1"), id=63, color=[0, 255, 0]), + 64: dict(link=("right_pinky_finger1", "right_pinky_finger2"), id=64, color=[0, 255, 0]), + 65: dict(link=("right_pinky_finger2", "right_pinky_finger3"), id=65, color=[0, 255, 0]), + 66: dict(link=("right_pinky_finger3", "right_pinky_finger4"), id=66, color=[0, 255, 0]), }, - joint_weights=[1.] * 136, - + joint_weights=[1.0] * 136, # 'https://github.com/Fang-Haoshu/Halpe-FullBody/blob/master/' # 'HalpeCOCOAPI/PythonAPI/halpecocotools/cocoeval.py#L245' sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.08, 0.08, 0.08, - 0.089, 0.089, 0.089, 0.089, 0.089, 0.089, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, 0.015, - 0.015, 0.015, 0.015, 0.015, 0.015, 0.015 - ]) + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.08, + 0.08, + 0.08, + 0.089, + 0.089, + 0.089, + 0.089, + 0.089, + 0.089, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 
0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + 0.015, + ], +) diff --git a/mmpose/configs/_base_/datasets/halpe26.py b/mmpose/configs/_base_/datasets/halpe26.py index cb4df83874c08ee7169aed251b266a03e411ccc9..c09a9663f26c6460bf31f8ea9c6d4e08c30f21d1 100644 --- a/mmpose/configs/_base_/datasets/halpe26.py +++ b/mmpose/configs/_base_/datasets/halpe26.py @@ -1,247 +1,73 @@ dataset_info = dict( - dataset_name='halpe26', + dataset_name="halpe26", paper_info=dict( - author='Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie' - ' and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu' - ' and Ma, Ze and Chen, Mingyang and Lu, Cewu', - title='PaStaNet: Toward Human Activity Knowledge Engine', - container='CVPR', - year='2020', - homepage='https://github.com/Fang-Haoshu/Halpe-FullBody/', + author="Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie" + " and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu" + " and Ma, Ze and Chen, Mingyang and Lu, Cewu", + title="PaStaNet: Toward Human Activity Knowledge Engine", + container="CVPR", + year="2020", + homepage="https://github.com/Fang-Haoshu/Halpe-FullBody/", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 17: - dict(name='head', id=17, color=[255, 128, 0], type='upper', swap=''), - 18: - dict(name='neck', id=18, color=[255, 128, 0], type='upper', swap=''), - 19: - dict(name='hip', id=19, color=[255, 128, 0], type='lower', swap=''), - 20: - dict( - name='left_big_toe', - id=20, - color=[255, 128, 0], - type='lower', - 
swap='right_big_toe'), - 21: - dict( - name='right_big_toe', - id=21, - color=[255, 128, 0], - type='lower', - swap='left_big_toe'), - 22: - dict( - name='left_small_toe', - id=22, - color=[255, 128, 0], - type='lower', - swap='right_small_toe'), - 23: - dict( - name='right_small_toe', - id=23, - color=[255, 128, 0], - type='lower', - swap='left_small_toe'), - 24: - dict( - name='left_heel', - id=24, - color=[255, 128, 0], - type='lower', - swap='right_heel'), - 25: - dict( - name='right_heel', - id=25, - color=[255, 128, 0], - type='lower', - swap='left_heel') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="head", id=17, color=[255, 128, 0], type="upper", swap=""), + 18: dict(name="neck", id=18, color=[255, 128, 0], type="upper", swap=""), + 19: dict(name="hip", id=19, color=[255, 128, 0], type="lower", swap=""), + 20: dict(name="left_big_toe", id=20, color=[255, 128, 0], type="lower", swap="right_big_toe"), + 21: dict(name="right_big_toe", id=21, color=[255, 128, 0], type="lower", swap="left_big_toe"), + 22: dict(name="left_small_toe", id=22, color=[255, 128, 0], type="lower", swap="right_small_toe"), + 23: dict(name="right_small_toe", id=23, color=[255, 128, 0], type="lower", swap="left_small_toe"), + 24: dict(name="left_heel", id=24, color=[255, 128, 0], type="lower", swap="right_heel"), + 25: dict(name="right_heel", id=25, color=[255, 128, 0], type="lower", swap="left_heel"), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('left_hip', 'hip'), id=2, color=[0, 255, 0]), - 3: - dict(link=('right_ankle', 'right_knee'), id=3, color=[255, 128, 0]), - 4: - dict(link=('right_knee', 'right_hip'), id=4, color=[255, 128, 0]), - 5: - dict(link=('right_hip', 'hip'), id=5, color=[255, 128, 0]), - 6: - dict(link=('head', 'neck'), id=6, color=[51, 153, 255]), - 7: - dict(link=('neck', 'hip'), id=7, color=[51, 153, 255]), - 8: - dict(link=('neck', 'left_shoulder'), id=8, color=[0, 255, 0]), - 9: - 
dict(link=('left_shoulder', 'left_elbow'), id=9, color=[0, 255, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('neck', 'right_shoulder'), id=11, color=[255, 128, 0]), - 12: - dict( - link=('right_shoulder', 'right_elbow'), id=12, color=[255, 128, - 0]), - 13: - dict(link=('right_elbow', 'right_wrist'), id=13, color=[255, 128, 0]), - 14: - dict(link=('left_eye', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('nose', 'left_eye'), id=15, color=[51, 153, 255]), - 16: - dict(link=('nose', 'right_eye'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_eye', 'left_ear'), id=17, color=[51, 153, 255]), - 18: - dict(link=('right_eye', 'right_ear'), id=18, color=[51, 153, 255]), - 19: - dict(link=('left_ear', 'left_shoulder'), id=19, color=[51, 153, 255]), - 20: - dict( - link=('right_ear', 'right_shoulder'), id=20, color=[51, 153, 255]), - 21: - dict(link=('left_ankle', 'left_big_toe'), id=21, color=[0, 255, 0]), - 22: - dict(link=('left_ankle', 'left_small_toe'), id=22, color=[0, 255, 0]), - 23: - dict(link=('left_ankle', 'left_heel'), id=23, color=[0, 255, 0]), - 24: - dict( - link=('right_ankle', 'right_big_toe'), id=24, color=[255, 128, 0]), - 25: - dict( - link=('right_ankle', 'right_small_toe'), - id=25, - color=[255, 128, 0]), - 26: - dict(link=('right_ankle', 'right_heel'), id=26, color=[255, 128, 0]), + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("left_hip", "hip"), id=2, color=[0, 255, 0]), + 3: dict(link=("right_ankle", "right_knee"), id=3, color=[255, 128, 0]), + 4: dict(link=("right_knee", "right_hip"), id=4, color=[255, 128, 0]), + 5: dict(link=("right_hip", "hip"), id=5, color=[255, 128, 0]), + 6: dict(link=("head", "neck"), id=6, color=[51, 153, 255]), + 7: dict(link=("neck", "hip"), id=7, color=[51, 153, 255]), + 8: dict(link=("neck", "left_shoulder"), id=8, color=[0, 255, 0]), + 9: dict(link=("left_shoulder", "left_elbow"), id=9, color=[0, 255, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("neck", "right_shoulder"), id=11, color=[255, 128, 0]), + 12: dict(link=("right_shoulder", "right_elbow"), id=12, color=[255, 128, 0]), + 13: dict(link=("right_elbow", "right_wrist"), id=13, color=[255, 128, 0]), + 14: dict(link=("left_eye", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("nose", "left_eye"), id=15, color=[51, 153, 255]), + 16: dict(link=("nose", "right_eye"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_eye", "left_ear"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_eye", "right_ear"), id=18, color=[51, 153, 255]), + 19: dict(link=("left_ear", "left_shoulder"), id=19, color=[51, 153, 255]), + 20: dict(link=("right_ear", "right_shoulder"), id=20, color=[51, 153, 255]), + 21: dict(link=("left_ankle", "left_big_toe"), id=21, color=[0, 255, 0]), + 22: dict(link=("left_ankle", "left_small_toe"), id=22, color=[0, 255, 0]), + 23: dict(link=("left_ankle", "left_heel"), id=23, color=[0, 255, 0]), + 24: dict(link=("right_ankle", "right_big_toe"), id=24, color=[255, 128, 0]), + 25: dict(link=("right_ankle", "right_small_toe"), id=25, color=[255, 128, 0]), + 26: dict(link=("right_ankle", "right_heel"), id=26, color=[255, 128, 0]), }, # the joint_weights is modified by MMPose Team - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5 - ] + [1., 1., 1.2] + [1.5] * 6, - + joint_weights=[1.0, 
1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5] + [1.0, 1.0, 1.2] + [1.5] * 6, # 'https://github.com/Fang-Haoshu/Halpe-FullBody/blob/master/' # 'HalpeCOCOAPI/PythonAPI/halpecocotools/cocoeval.py#L245' sigmas=[ @@ -271,4 +97,5 @@ dataset_info = dict( 0.079, 0.079, 0.079, - ]) + ], +) diff --git a/mmpose/configs/_base_/datasets/horse10.py b/mmpose/configs/_base_/datasets/horse10.py index a485bf191bc151b0d76e48f3e55eb8e2dda6c506..813d3aa584478c6e254118161598e784ec35a0b8 100644 --- a/mmpose/configs/_base_/datasets/horse10.py +++ b/mmpose/configs/_base_/datasets/horse10.py @@ -1,201 +1,60 @@ dataset_info = dict( - dataset_name='horse10', + dataset_name="horse10", paper_info=dict( - author='Mathis, Alexander and Biasi, Thomas and ' - 'Schneider, Steffen and ' - 'Yuksekgonul, Mert and Rogers, Byron and ' - 'Bethge, Matthias and ' - 'Mathis, Mackenzie W', - title='Pretraining boosts out-of-domain robustness ' - 'for pose estimation', - container='Proceedings of the IEEE/CVF Winter Conference on ' - 'Applications of Computer Vision', - year='2021', - homepage='http://www.mackenziemathislab.org/horse10', + author="Mathis, Alexander and Biasi, Thomas and " + "Schneider, Steffen and " + "Yuksekgonul, Mert and Rogers, Byron and " + "Bethge, Matthias and " + "Mathis, Mackenzie W", + title="Pretraining boosts out-of-domain robustness " "for pose estimation", + container="Proceedings of the IEEE/CVF Winter Conference on " "Applications of Computer Vision", + year="2021", + homepage="http://www.mackenziemathislab.org/horse10", ), keypoint_info={ - 0: - dict(name='Nose', id=0, color=[255, 153, 255], type='upper', swap=''), - 1: - dict(name='Eye', id=1, color=[255, 153, 255], type='upper', swap=''), - 2: - dict( - name='Nearknee', - id=2, - color=[255, 102, 255], - type='upper', - swap=''), - 3: - dict( - name='Nearfrontfetlock', - id=3, - color=[255, 102, 255], - type='upper', - swap=''), - 4: - dict( - name='Nearfrontfoot', - id=4, - color=[255, 102, 255], - type='upper', - swap=''), - 5: - dict( - name='Offknee', id=5, color=[255, 102, 255], type='upper', - swap=''), - 6: - dict( - name='Offfrontfetlock', - id=6, - color=[255, 102, 255], - type='upper', - swap=''), - 7: - dict( - name='Offfrontfoot', - id=7, - color=[255, 102, 255], - type='upper', - swap=''), - 8: - dict( - name='Shoulder', - id=8, - color=[255, 153, 255], - type='upper', - swap=''), - 9: - dict( - name='Midshoulder', - id=9, - color=[255, 153, 255], - type='upper', - swap=''), - 10: - dict( - name='Elbow', id=10, color=[255, 153, 255], type='upper', swap=''), - 11: - dict( - name='Girth', id=11, color=[255, 153, 255], type='upper', swap=''), - 12: - dict( - name='Wither', id=12, color=[255, 153, 255], type='upper', - swap=''), - 13: - dict( - name='Nearhindhock', - id=13, - color=[255, 51, 255], - type='lower', - swap=''), - 14: - dict( - name='Nearhindfetlock', - id=14, - color=[255, 51, 255], - type='lower', - swap=''), - 15: - dict( - name='Nearhindfoot', - id=15, - color=[255, 51, 255], - type='lower', - swap=''), - 16: - dict(name='Hip', id=16, color=[255, 153, 255], type='lower', swap=''), - 17: - dict( - name='Stifle', id=17, color=[255, 153, 255], type='lower', - swap=''), - 18: - dict( - name='Offhindhock', - id=18, - color=[255, 51, 255], - type='lower', - swap=''), - 19: - dict( - name='Offhindfetlock', - id=19, - color=[255, 51, 255], - type='lower', - swap=''), - 20: - dict( - name='Offhindfoot', - id=20, - color=[255, 51, 255], - type='lower', - swap=''), - 21: - dict( - name='Ischium', - 
id=21, - color=[255, 153, 255], - type='lower', - swap='') + 0: dict(name="Nose", id=0, color=[255, 153, 255], type="upper", swap=""), + 1: dict(name="Eye", id=1, color=[255, 153, 255], type="upper", swap=""), + 2: dict(name="Nearknee", id=2, color=[255, 102, 255], type="upper", swap=""), + 3: dict(name="Nearfrontfetlock", id=3, color=[255, 102, 255], type="upper", swap=""), + 4: dict(name="Nearfrontfoot", id=4, color=[255, 102, 255], type="upper", swap=""), + 5: dict(name="Offknee", id=5, color=[255, 102, 255], type="upper", swap=""), + 6: dict(name="Offfrontfetlock", id=6, color=[255, 102, 255], type="upper", swap=""), + 7: dict(name="Offfrontfoot", id=7, color=[255, 102, 255], type="upper", swap=""), + 8: dict(name="Shoulder", id=8, color=[255, 153, 255], type="upper", swap=""), + 9: dict(name="Midshoulder", id=9, color=[255, 153, 255], type="upper", swap=""), + 10: dict(name="Elbow", id=10, color=[255, 153, 255], type="upper", swap=""), + 11: dict(name="Girth", id=11, color=[255, 153, 255], type="upper", swap=""), + 12: dict(name="Wither", id=12, color=[255, 153, 255], type="upper", swap=""), + 13: dict(name="Nearhindhock", id=13, color=[255, 51, 255], type="lower", swap=""), + 14: dict(name="Nearhindfetlock", id=14, color=[255, 51, 255], type="lower", swap=""), + 15: dict(name="Nearhindfoot", id=15, color=[255, 51, 255], type="lower", swap=""), + 16: dict(name="Hip", id=16, color=[255, 153, 255], type="lower", swap=""), + 17: dict(name="Stifle", id=17, color=[255, 153, 255], type="lower", swap=""), + 18: dict(name="Offhindhock", id=18, color=[255, 51, 255], type="lower", swap=""), + 19: dict(name="Offhindfetlock", id=19, color=[255, 51, 255], type="lower", swap=""), + 20: dict(name="Offhindfoot", id=20, color=[255, 51, 255], type="lower", swap=""), + 21: dict(name="Ischium", id=21, color=[255, 153, 255], type="lower", swap=""), }, skeleton_info={ - 0: - dict(link=('Nose', 'Eye'), id=0, color=[255, 153, 255]), - 1: - dict(link=('Eye', 'Wither'), id=1, color=[255, 153, 255]), - 2: - dict(link=('Wither', 'Hip'), id=2, color=[255, 153, 255]), - 3: - dict(link=('Hip', 'Ischium'), id=3, color=[255, 153, 255]), - 4: - dict(link=('Ischium', 'Stifle'), id=4, color=[255, 153, 255]), - 5: - dict(link=('Stifle', 'Girth'), id=5, color=[255, 153, 255]), - 6: - dict(link=('Girth', 'Elbow'), id=6, color=[255, 153, 255]), - 7: - dict(link=('Elbow', 'Shoulder'), id=7, color=[255, 153, 255]), - 8: - dict(link=('Shoulder', 'Midshoulder'), id=8, color=[255, 153, 255]), - 9: - dict(link=('Midshoulder', 'Wither'), id=9, color=[255, 153, 255]), - 10: - dict( - link=('Nearknee', 'Nearfrontfetlock'), - id=10, - color=[255, 102, 255]), - 11: - dict( - link=('Nearfrontfetlock', 'Nearfrontfoot'), - id=11, - color=[255, 102, 255]), - 12: - dict( - link=('Offknee', 'Offfrontfetlock'), id=12, color=[255, 102, 255]), - 13: - dict( - link=('Offfrontfetlock', 'Offfrontfoot'), - id=13, - color=[255, 102, 255]), - 14: - dict( - link=('Nearhindhock', 'Nearhindfetlock'), - id=14, - color=[255, 51, 255]), - 15: - dict( - link=('Nearhindfetlock', 'Nearhindfoot'), - id=15, - color=[255, 51, 255]), - 16: - dict( - link=('Offhindhock', 'Offhindfetlock'), - id=16, - color=[255, 51, 255]), - 17: - dict( - link=('Offhindfetlock', 'Offhindfoot'), - id=17, - color=[255, 51, 255]) + 0: dict(link=("Nose", "Eye"), id=0, color=[255, 153, 255]), + 1: dict(link=("Eye", "Wither"), id=1, color=[255, 153, 255]), + 2: dict(link=("Wither", "Hip"), id=2, color=[255, 153, 255]), + 3: dict(link=("Hip", "Ischium"), id=3, color=[255, 153, 255]), + 
4: dict(link=("Ischium", "Stifle"), id=4, color=[255, 153, 255]), + 5: dict(link=("Stifle", "Girth"), id=5, color=[255, 153, 255]), + 6: dict(link=("Girth", "Elbow"), id=6, color=[255, 153, 255]), + 7: dict(link=("Elbow", "Shoulder"), id=7, color=[255, 153, 255]), + 8: dict(link=("Shoulder", "Midshoulder"), id=8, color=[255, 153, 255]), + 9: dict(link=("Midshoulder", "Wither"), id=9, color=[255, 153, 255]), + 10: dict(link=("Nearknee", "Nearfrontfetlock"), id=10, color=[255, 102, 255]), + 11: dict(link=("Nearfrontfetlock", "Nearfrontfoot"), id=11, color=[255, 102, 255]), + 12: dict(link=("Offknee", "Offfrontfetlock"), id=12, color=[255, 102, 255]), + 13: dict(link=("Offfrontfetlock", "Offfrontfoot"), id=13, color=[255, 102, 255]), + 14: dict(link=("Nearhindhock", "Nearhindfetlock"), id=14, color=[255, 51, 255]), + 15: dict(link=("Nearhindfetlock", "Nearhindfoot"), id=15, color=[255, 51, 255]), + 16: dict(link=("Offhindhock", "Offhindfetlock"), id=16, color=[255, 51, 255]), + 17: dict(link=("Offhindfetlock", "Offhindfoot"), id=17, color=[255, 51, 255]), }, - joint_weights=[1.] * 22, - sigmas=[]) + joint_weights=[1.0] * 22, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/humanart.py b/mmpose/configs/_base_/datasets/humanart.py index b549269b692682f5dd0350e6f557edeb25730126..37ce56b6519c17d9a92647303234cee9d44b033c 100644 --- a/mmpose/configs/_base_/datasets/humanart.py +++ b/mmpose/configs/_base_/datasets/humanart.py @@ -1,181 +1,52 @@ dataset_info = dict( - dataset_name='Human-Art', + dataset_name="Human-Art", paper_info=dict( - author='Ju, Xuan and Zeng, Ailing and ' - 'Wang, Jianan and Xu, Qiang and Zhang, Lei', - title='Human-Art: A Versatile Human-Centric Dataset ' - 'Bridging Natural and Artificial Scenes', - container='Proceedings of the IEEE/CVF Conference on ' - 'Computer Vision and Pattern Recognition', - year='2023', - homepage='https://idea-research.github.io/HumanArt/', + author="Ju, Xuan and Zeng, Ailing and " "Wang, Jianan and Xu, Qiang and Zhang, Lei", + title="Human-Art: A Versatile Human-Centric Dataset " "Bridging Natural and Artificial Scenes", + container="Proceedings of the IEEE/CVF Conference on " "Computer Vision and Pattern Recognition", + year="2023", + homepage="https://idea-research.github.io/HumanArt/", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - 
id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'right_shoulder'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]) + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: 
dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5 - ], - sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089 - ]) + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5], + sigmas=[0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089], +) diff --git a/mmpose/configs/_base_/datasets/humanart21.py b/mmpose/configs/_base_/datasets/humanart21.py index e6d935d1a97e00aac7830a01f587840321b68625..5713836b3e6d932b19ac001071fe1f73be8352f2 100644 --- a/mmpose/configs/_base_/datasets/humanart21.py +++ b/mmpose/configs/_base_/datasets/humanart21.py @@ -1,218 +1,82 @@ dataset_info = dict( - dataset_name='Human-Art', + dataset_name="Human-Art", paper_info=dict( - author='Ju, Xuan and Zeng, Ailing and ' - 'Wang, Jianan and Xu, Qiang and Zhang, Lei', - title='Human-Art: A Versatile Human-Centric Dataset ' - 'Bridging Natural and Artificial Scenes', - container='Proceedings of the IEEE/CVF Conference on ' - 'Computer Vision and Pattern Recognition', - year='2023', - homepage='https://idea-research.github.io/HumanArt/', + author="Ju, Xuan and Zeng, Ailing and " "Wang, Jianan and Xu, Qiang and Zhang, Lei", + title="Human-Art: A Versatile Human-Centric Dataset " "Bridging Natural and Artificial Scenes", + container="Proceedings of the IEEE/CVF Conference on " "Computer Vision and Pattern Recognition", + year="2023", + homepage="https://idea-research.github.io/HumanArt/", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - 
dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 17: - dict( - name='left_finger', - id=17, - color=[0, 255, 0], - type='lower', - swap='right_finger'), - 18: - dict( - name='right_finger', - id=18, - color=[255, 128, 0], - type='lower', - swap='left_finger'), - 19: - dict( - name='left_toe', - id=19, - color=[0, 255, 0], - type='lower', - swap='right_toe'), - 20: - dict( - name='right_toe', - id=20, - color=[255, 128, 0], - type='lower', - swap='left_toe'), + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="left_finger", id=17, color=[0, 255, 0], type="lower", swap="right_finger"), + 18: dict(name="right_finger", id=18, color=[255, 128, 0], type="lower", swap="left_finger"), + 19: dict(name="left_toe", id=19, color=[0, 255, 0], type="lower", swap="right_toe"), + 20: dict(name="right_toe", id=20, color=[255, 128, 0], type="lower", swap="left_toe"), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - 
dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'right_shoulder'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]), - 19: - dict(link=('left_ankle', 'left_toe'), id=19, color=[0, 255, 0]), - 20: - dict(link=('right_ankle', 'right_toe'), id=20, color=[255, 128, 0]), - 21: - dict(link=('left_wrist', 'left_finger'), id=21, color=[0, 255, 0]), - 22: - dict(link=('right_wrist', 'right_finger'), id=22, color=[255, 128, 0]), + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), + 19: dict(link=("left_ankle", "left_toe"), id=19, color=[0, 255, 0]), + 20: dict(link=("right_ankle", "right_toe"), id=20, color=[255, 128, 0]), + 21: dict(link=("left_wrist", "left_finger"), id=21, color=[0, 255, 0]), + 22: dict(link=("right_wrist", "right_finger"), id=22, color=[255, 128, 0]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5, 1., 1., 1., 1. 
- ], + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.0, 1.0], sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.089, 0.089, 0.089, - 0.089 - ]) + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.089, + 0.089, + 0.089, + 0.089, + ], +) diff --git a/mmpose/configs/_base_/datasets/humanart_aic.py b/mmpose/configs/_base_/datasets/humanart_aic.py index e99942753606346d24d19ece5a52b55dff72840f..581c8625390e6010e73f2752563b2574dfa1e112 100644 --- a/mmpose/configs/_base_/datasets/humanart_aic.py +++ b/mmpose/configs/_base_/datasets/humanart_aic.py @@ -1,205 +1,87 @@ dataset_info = dict( - dataset_name='humanart', + dataset_name="humanart", paper_info=[ dict( - author='Ju, Xuan and Zeng, Ailing and ' - 'Wang, Jianan and Xu, Qiang and Zhang, ' - 'Lei', - title='Human-Art: A Versatile Human-Centric Dataset ' - 'Bridging Natural and Artificial Scenes', - container='CVPR', - year='2023', - homepage='https://idea-research.github.io/HumanArt/', + author="Ju, Xuan and Zeng, Ailing and " "Wang, Jianan and Xu, Qiang and Zhang, " "Lei", + title="Human-Art: A Versatile Human-Centric Dataset " "Bridging Natural and Artificial Scenes", + container="CVPR", + year="2023", + homepage="https://idea-research.github.io/HumanArt/", ), dict( - author='Wu, Jiahong and Zheng, He and Zhao, Bo and ' - 'Li, Yixin and Yan, Baoming and Liang, Rui and ' - 'Wang, Wenjia and Zhou, Shipei and Lin, Guosen and ' - 'Fu, Yanwei and others', - title='Ai challenger: A large-scale dataset for going ' - 'deeper in image understanding', - container='arXiv', - year='2017', - homepage='https://github.com/AIChallenger/AI_Challenger_2017', + author="Wu, Jiahong and Zheng, He and Zhao, Bo and " + "Li, Yixin and Yan, Baoming and Liang, Rui and " + "Wang, Wenjia and Zhou, Shipei and Lin, Guosen and " + "Fu, Yanwei and others", + title="Ai challenger: A large-scale dataset for going " "deeper in image understanding", + container="arXiv", + year="2017", + homepage="https://github.com/AIChallenger/AI_Challenger_2017", ), ], keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - 
type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 17: - dict( - name='head_top', - id=17, - color=[51, 153, 255], - type='upper', - swap=''), - 18: - dict(name='neck', id=18, color=[51, 153, 255], type='upper', swap='') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="head_top", id=17, color=[51, 153, 255], type="upper", swap=""), + 18: dict(name="neck", id=18, color=[51, 153, 255], type="upper", swap=""), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'right_shoulder'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - 
dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]), - 19: - dict(link=('head_top', 'neck'), id=11, color=[51, 153, 255]), + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), + 19: dict(link=("head_top", "neck"), id=19, color=[51, 153, 255]),  # id must match the dict key }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5, 1.5 - ], + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.5, 1.5],  # one weight per keypoint (19); the 19th value of 1.5 is assumed, as the upstream list stops at 18 entries sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.026, 0.026 - ]) + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.026, + 0.026, + ], +) diff --git a/mmpose/configs/_base_/datasets/interhand2d.py b/mmpose/configs/_base_/datasets/interhand2d.py index 0134f07de5bf536eaffbf71155a7e6eb33b24f0a..1649564b5d3571cbe8bb013fb92408956376682e 100644 --- a/mmpose/configs/_base_/datasets/interhand2d.py +++ b/mmpose/configs/_base_/datasets/interhand2d.py @@ -1,142 +1,57 @@ dataset_info = dict( - dataset_name='interhand2d', + dataset_name="interhand2d", paper_info=dict( - author='Moon, Gyeongsik and Yu, Shoou-I and Wen, He and ' - 'Shiratori, Takaaki and Lee, Kyoung Mu', - title='InterHand2.6M: A dataset and baseline for 3D ' - 'interacting hand pose estimation from a single RGB image', - container='arXiv', - year='2020', - homepage='https://mks0601.github.io/InterHand2.6M/', + author="Moon, Gyeongsik and Yu, Shoou-I and Wen, He and " "Shiratori, Takaaki and Lee, Kyoung Mu", + title="InterHand2.6M: A dataset and baseline for 3D " "interacting hand pose estimation from a single RGB image", + container="arXiv", + year="2020", + homepage="https://mks0601.github.io/InterHand2.6M/", ), keypoint_info={ - 0: - dict(name='thumb4', id=0, color=[255, 128, 0], type='', swap=''), - 1: - dict(name='thumb3', id=1, color=[255, 128, 0], type='', swap=''), - 2: - dict(name='thumb2',
id=2, color=[255, 128, 0], type='', swap=''), - 3: - dict(name='thumb1', id=3, color=[255, 128, 0], type='', swap=''), - 4: - dict( - name='forefinger4', id=4, color=[255, 153, 255], type='', swap=''), - 5: - dict( - name='forefinger3', id=5, color=[255, 153, 255], type='', swap=''), - 6: - dict( - name='forefinger2', id=6, color=[255, 153, 255], type='', swap=''), - 7: - dict( - name='forefinger1', id=7, color=[255, 153, 255], type='', swap=''), - 8: - dict( - name='middle_finger4', - id=8, - color=[102, 178, 255], - type='', - swap=''), - 9: - dict( - name='middle_finger3', - id=9, - color=[102, 178, 255], - type='', - swap=''), - 10: - dict( - name='middle_finger2', - id=10, - color=[102, 178, 255], - type='', - swap=''), - 11: - dict( - name='middle_finger1', - id=11, - color=[102, 178, 255], - type='', - swap=''), - 12: - dict( - name='ring_finger4', id=12, color=[255, 51, 51], type='', swap=''), - 13: - dict( - name='ring_finger3', id=13, color=[255, 51, 51], type='', swap=''), - 14: - dict( - name='ring_finger2', id=14, color=[255, 51, 51], type='', swap=''), - 15: - dict( - name='ring_finger1', id=15, color=[255, 51, 51], type='', swap=''), - 16: - dict(name='pinky_finger4', id=16, color=[0, 255, 0], type='', swap=''), - 17: - dict(name='pinky_finger3', id=17, color=[0, 255, 0], type='', swap=''), - 18: - dict(name='pinky_finger2', id=18, color=[0, 255, 0], type='', swap=''), - 19: - dict(name='pinky_finger1', id=19, color=[0, 255, 0], type='', swap=''), - 20: - dict(name='wrist', id=20, color=[255, 255, 255], type='', swap='') + 0: dict(name="thumb4", id=0, color=[255, 128, 0], type="", swap=""), + 1: dict(name="thumb3", id=1, color=[255, 128, 0], type="", swap=""), + 2: dict(name="thumb2", id=2, color=[255, 128, 0], type="", swap=""), + 3: dict(name="thumb1", id=3, color=[255, 128, 0], type="", swap=""), + 4: dict(name="forefinger4", id=4, color=[255, 153, 255], type="", swap=""), + 5: dict(name="forefinger3", id=5, color=[255, 153, 255], type="", swap=""), + 6: dict(name="forefinger2", id=6, color=[255, 153, 255], type="", swap=""), + 7: dict(name="forefinger1", id=7, color=[255, 153, 255], type="", swap=""), + 8: dict(name="middle_finger4", id=8, color=[102, 178, 255], type="", swap=""), + 9: dict(name="middle_finger3", id=9, color=[102, 178, 255], type="", swap=""), + 10: dict(name="middle_finger2", id=10, color=[102, 178, 255], type="", swap=""), + 11: dict(name="middle_finger1", id=11, color=[102, 178, 255], type="", swap=""), + 12: dict(name="ring_finger4", id=12, color=[255, 51, 51], type="", swap=""), + 13: dict(name="ring_finger3", id=13, color=[255, 51, 51], type="", swap=""), + 14: dict(name="ring_finger2", id=14, color=[255, 51, 51], type="", swap=""), + 15: dict(name="ring_finger1", id=15, color=[255, 51, 51], type="", swap=""), + 16: dict(name="pinky_finger4", id=16, color=[0, 255, 0], type="", swap=""), + 17: dict(name="pinky_finger3", id=17, color=[0, 255, 0], type="", swap=""), + 18: dict(name="pinky_finger2", id=18, color=[0, 255, 0], type="", swap=""), + 19: dict(name="pinky_finger1", id=19, color=[0, 255, 0], type="", swap=""), + 20: dict(name="wrist", id=20, color=[255, 255, 255], type="", swap=""), }, skeleton_info={ - 0: - dict(link=('wrist', 'thumb1'), id=0, color=[255, 128, 0]), - 1: - dict(link=('thumb1', 'thumb2'), id=1, color=[255, 128, 0]), - 2: - dict(link=('thumb2', 'thumb3'), id=2, color=[255, 128, 0]), - 3: - dict(link=('thumb3', 'thumb4'), id=3, color=[255, 128, 0]), - 4: - dict(link=('wrist', 'forefinger1'), id=4, color=[255, 153, 255]), - 5: - 
dict(link=('forefinger1', 'forefinger2'), id=5, color=[255, 153, 255]), - 6: - dict(link=('forefinger2', 'forefinger3'), id=6, color=[255, 153, 255]), - 7: - dict(link=('forefinger3', 'forefinger4'), id=7, color=[255, 153, 255]), - 8: - dict(link=('wrist', 'middle_finger1'), id=8, color=[102, 178, 255]), - 9: - dict( - link=('middle_finger1', 'middle_finger2'), - id=9, - color=[102, 178, 255]), - 10: - dict( - link=('middle_finger2', 'middle_finger3'), - id=10, - color=[102, 178, 255]), - 11: - dict( - link=('middle_finger3', 'middle_finger4'), - id=11, - color=[102, 178, 255]), - 12: - dict(link=('wrist', 'ring_finger1'), id=12, color=[255, 51, 51]), - 13: - dict( - link=('ring_finger1', 'ring_finger2'), id=13, color=[255, 51, 51]), - 14: - dict( - link=('ring_finger2', 'ring_finger3'), id=14, color=[255, 51, 51]), - 15: - dict( - link=('ring_finger3', 'ring_finger4'), id=15, color=[255, 51, 51]), - 16: - dict(link=('wrist', 'pinky_finger1'), id=16, color=[0, 255, 0]), - 17: - dict( - link=('pinky_finger1', 'pinky_finger2'), id=17, color=[0, 255, 0]), - 18: - dict( - link=('pinky_finger2', 'pinky_finger3'), id=18, color=[0, 255, 0]), - 19: - dict( - link=('pinky_finger3', 'pinky_finger4'), id=19, color=[0, 255, 0]) + 0: dict(link=("wrist", "thumb1"), id=0, color=[255, 128, 0]), + 1: dict(link=("thumb1", "thumb2"), id=1, color=[255, 128, 0]), + 2: dict(link=("thumb2", "thumb3"), id=2, color=[255, 128, 0]), + 3: dict(link=("thumb3", "thumb4"), id=3, color=[255, 128, 0]), + 4: dict(link=("wrist", "forefinger1"), id=4, color=[255, 153, 255]), + 5: dict(link=("forefinger1", "forefinger2"), id=5, color=[255, 153, 255]), + 6: dict(link=("forefinger2", "forefinger3"), id=6, color=[255, 153, 255]), + 7: dict(link=("forefinger3", "forefinger4"), id=7, color=[255, 153, 255]), + 8: dict(link=("wrist", "middle_finger1"), id=8, color=[102, 178, 255]), + 9: dict(link=("middle_finger1", "middle_finger2"), id=9, color=[102, 178, 255]), + 10: dict(link=("middle_finger2", "middle_finger3"), id=10, color=[102, 178, 255]), + 11: dict(link=("middle_finger3", "middle_finger4"), id=11, color=[102, 178, 255]), + 12: dict(link=("wrist", "ring_finger1"), id=12, color=[255, 51, 51]), + 13: dict(link=("ring_finger1", "ring_finger2"), id=13, color=[255, 51, 51]), + 14: dict(link=("ring_finger2", "ring_finger3"), id=14, color=[255, 51, 51]), + 15: dict(link=("ring_finger3", "ring_finger4"), id=15, color=[255, 51, 51]), + 16: dict(link=("wrist", "pinky_finger1"), id=16, color=[0, 255, 0]), + 17: dict(link=("pinky_finger1", "pinky_finger2"), id=17, color=[0, 255, 0]), + 18: dict(link=("pinky_finger2", "pinky_finger3"), id=18, color=[0, 255, 0]), + 19: dict(link=("pinky_finger3", "pinky_finger4"), id=19, color=[0, 255, 0]), }, - joint_weights=[1.] 
* 21, - sigmas=[]) + joint_weights=[1.0] * 21, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/interhand3d.py b/mmpose/configs/_base_/datasets/interhand3d.py index e2bd8121c281c741ec9b980c7570ebef8a632993..09c5ae3256a38a9ff1bf3335672d74d1cb153a02 100644 --- a/mmpose/configs/_base_/datasets/interhand3d.py +++ b/mmpose/configs/_base_/datasets/interhand3d.py @@ -1,487 +1,98 @@ dataset_info = dict( - dataset_name='interhand3d', + dataset_name="interhand3d", paper_info=dict( - author='Moon, Gyeongsik and Yu, Shoou-I and Wen, He and ' - 'Shiratori, Takaaki and Lee, Kyoung Mu', - title='InterHand2.6M: A dataset and baseline for 3D ' - 'interacting hand pose estimation from a single RGB image', - container='arXiv', - year='2020', - homepage='https://mks0601.github.io/InterHand2.6M/', + author="Moon, Gyeongsik and Yu, Shoou-I and Wen, He and " "Shiratori, Takaaki and Lee, Kyoung Mu", + title="InterHand2.6M: A dataset and baseline for 3D " "interacting hand pose estimation from a single RGB image", + container="arXiv", + year="2020", + homepage="https://mks0601.github.io/InterHand2.6M/", ), keypoint_info={ - 0: - dict( - name='right_thumb4', - id=0, - color=[255, 128, 0], - type='', - swap='left_thumb4'), - 1: - dict( - name='right_thumb3', - id=1, - color=[255, 128, 0], - type='', - swap='left_thumb3'), - 2: - dict( - name='right_thumb2', - id=2, - color=[255, 128, 0], - type='', - swap='left_thumb2'), - 3: - dict( - name='right_thumb1', - id=3, - color=[255, 128, 0], - type='', - swap='left_thumb1'), - 4: - dict( - name='right_forefinger4', - id=4, - color=[255, 153, 255], - type='', - swap='left_forefinger4'), - 5: - dict( - name='right_forefinger3', - id=5, - color=[255, 153, 255], - type='', - swap='left_forefinger3'), - 6: - dict( - name='right_forefinger2', - id=6, - color=[255, 153, 255], - type='', - swap='left_forefinger2'), - 7: - dict( - name='right_forefinger1', - id=7, - color=[255, 153, 255], - type='', - swap='left_forefinger1'), - 8: - dict( - name='right_middle_finger4', - id=8, - color=[102, 178, 255], - type='', - swap='left_middle_finger4'), - 9: - dict( - name='right_middle_finger3', - id=9, - color=[102, 178, 255], - type='', - swap='left_middle_finger3'), - 10: - dict( - name='right_middle_finger2', - id=10, - color=[102, 178, 255], - type='', - swap='left_middle_finger2'), - 11: - dict( - name='right_middle_finger1', - id=11, - color=[102, 178, 255], - type='', - swap='left_middle_finger1'), - 12: - dict( - name='right_ring_finger4', - id=12, - color=[255, 51, 51], - type='', - swap='left_ring_finger4'), - 13: - dict( - name='right_ring_finger3', - id=13, - color=[255, 51, 51], - type='', - swap='left_ring_finger3'), - 14: - dict( - name='right_ring_finger2', - id=14, - color=[255, 51, 51], - type='', - swap='left_ring_finger2'), - 15: - dict( - name='right_ring_finger1', - id=15, - color=[255, 51, 51], - type='', - swap='left_ring_finger1'), - 16: - dict( - name='right_pinky_finger4', - id=16, - color=[0, 255, 0], - type='', - swap='left_pinky_finger4'), - 17: - dict( - name='right_pinky_finger3', - id=17, - color=[0, 255, 0], - type='', - swap='left_pinky_finger3'), - 18: - dict( - name='right_pinky_finger2', - id=18, - color=[0, 255, 0], - type='', - swap='left_pinky_finger2'), - 19: - dict( - name='right_pinky_finger1', - id=19, - color=[0, 255, 0], - type='', - swap='left_pinky_finger1'), - 20: - dict( - name='right_wrist', - id=20, - color=[255, 255, 255], - type='', - swap='left_wrist'), - 21: - dict( - name='left_thumb4', - id=21, - color=[255, 128, 0], - 
type='', - swap='right_thumb4'), - 22: - dict( - name='left_thumb3', - id=22, - color=[255, 128, 0], - type='', - swap='right_thumb3'), - 23: - dict( - name='left_thumb2', - id=23, - color=[255, 128, 0], - type='', - swap='right_thumb2'), - 24: - dict( - name='left_thumb1', - id=24, - color=[255, 128, 0], - type='', - swap='right_thumb1'), - 25: - dict( - name='left_forefinger4', - id=25, - color=[255, 153, 255], - type='', - swap='right_forefinger4'), - 26: - dict( - name='left_forefinger3', - id=26, - color=[255, 153, 255], - type='', - swap='right_forefinger3'), - 27: - dict( - name='left_forefinger2', - id=27, - color=[255, 153, 255], - type='', - swap='right_forefinger2'), - 28: - dict( - name='left_forefinger1', - id=28, - color=[255, 153, 255], - type='', - swap='right_forefinger1'), - 29: - dict( - name='left_middle_finger4', - id=29, - color=[102, 178, 255], - type='', - swap='right_middle_finger4'), - 30: - dict( - name='left_middle_finger3', - id=30, - color=[102, 178, 255], - type='', - swap='right_middle_finger3'), - 31: - dict( - name='left_middle_finger2', - id=31, - color=[102, 178, 255], - type='', - swap='right_middle_finger2'), - 32: - dict( - name='left_middle_finger1', - id=32, - color=[102, 178, 255], - type='', - swap='right_middle_finger1'), - 33: - dict( - name='left_ring_finger4', - id=33, - color=[255, 51, 51], - type='', - swap='right_ring_finger4'), - 34: - dict( - name='left_ring_finger3', - id=34, - color=[255, 51, 51], - type='', - swap='right_ring_finger3'), - 35: - dict( - name='left_ring_finger2', - id=35, - color=[255, 51, 51], - type='', - swap='right_ring_finger2'), - 36: - dict( - name='left_ring_finger1', - id=36, - color=[255, 51, 51], - type='', - swap='right_ring_finger1'), - 37: - dict( - name='left_pinky_finger4', - id=37, - color=[0, 255, 0], - type='', - swap='right_pinky_finger4'), - 38: - dict( - name='left_pinky_finger3', - id=38, - color=[0, 255, 0], - type='', - swap='right_pinky_finger3'), - 39: - dict( - name='left_pinky_finger2', - id=39, - color=[0, 255, 0], - type='', - swap='right_pinky_finger2'), - 40: - dict( - name='left_pinky_finger1', - id=40, - color=[0, 255, 0], - type='', - swap='right_pinky_finger1'), - 41: - dict( - name='left_wrist', - id=41, - color=[255, 255, 255], - type='', - swap='right_wrist'), + 0: dict(name="right_thumb4", id=0, color=[255, 128, 0], type="", swap="left_thumb4"), + 1: dict(name="right_thumb3", id=1, color=[255, 128, 0], type="", swap="left_thumb3"), + 2: dict(name="right_thumb2", id=2, color=[255, 128, 0], type="", swap="left_thumb2"), + 3: dict(name="right_thumb1", id=3, color=[255, 128, 0], type="", swap="left_thumb1"), + 4: dict(name="right_forefinger4", id=4, color=[255, 153, 255], type="", swap="left_forefinger4"), + 5: dict(name="right_forefinger3", id=5, color=[255, 153, 255], type="", swap="left_forefinger3"), + 6: dict(name="right_forefinger2", id=6, color=[255, 153, 255], type="", swap="left_forefinger2"), + 7: dict(name="right_forefinger1", id=7, color=[255, 153, 255], type="", swap="left_forefinger1"), + 8: dict(name="right_middle_finger4", id=8, color=[102, 178, 255], type="", swap="left_middle_finger4"), + 9: dict(name="right_middle_finger3", id=9, color=[102, 178, 255], type="", swap="left_middle_finger3"), + 10: dict(name="right_middle_finger2", id=10, color=[102, 178, 255], type="", swap="left_middle_finger2"), + 11: dict(name="right_middle_finger1", id=11, color=[102, 178, 255], type="", swap="left_middle_finger1"), + 12: dict(name="right_ring_finger4", id=12, color=[255, 51, 51], 
type="", swap="left_ring_finger4"), + 13: dict(name="right_ring_finger3", id=13, color=[255, 51, 51], type="", swap="left_ring_finger3"), + 14: dict(name="right_ring_finger2", id=14, color=[255, 51, 51], type="", swap="left_ring_finger2"), + 15: dict(name="right_ring_finger1", id=15, color=[255, 51, 51], type="", swap="left_ring_finger1"), + 16: dict(name="right_pinky_finger4", id=16, color=[0, 255, 0], type="", swap="left_pinky_finger4"), + 17: dict(name="right_pinky_finger3", id=17, color=[0, 255, 0], type="", swap="left_pinky_finger3"), + 18: dict(name="right_pinky_finger2", id=18, color=[0, 255, 0], type="", swap="left_pinky_finger2"), + 19: dict(name="right_pinky_finger1", id=19, color=[0, 255, 0], type="", swap="left_pinky_finger1"), + 20: dict(name="right_wrist", id=20, color=[255, 255, 255], type="", swap="left_wrist"), + 21: dict(name="left_thumb4", id=21, color=[255, 128, 0], type="", swap="right_thumb4"), + 22: dict(name="left_thumb3", id=22, color=[255, 128, 0], type="", swap="right_thumb3"), + 23: dict(name="left_thumb2", id=23, color=[255, 128, 0], type="", swap="right_thumb2"), + 24: dict(name="left_thumb1", id=24, color=[255, 128, 0], type="", swap="right_thumb1"), + 25: dict(name="left_forefinger4", id=25, color=[255, 153, 255], type="", swap="right_forefinger4"), + 26: dict(name="left_forefinger3", id=26, color=[255, 153, 255], type="", swap="right_forefinger3"), + 27: dict(name="left_forefinger2", id=27, color=[255, 153, 255], type="", swap="right_forefinger2"), + 28: dict(name="left_forefinger1", id=28, color=[255, 153, 255], type="", swap="right_forefinger1"), + 29: dict(name="left_middle_finger4", id=29, color=[102, 178, 255], type="", swap="right_middle_finger4"), + 30: dict(name="left_middle_finger3", id=30, color=[102, 178, 255], type="", swap="right_middle_finger3"), + 31: dict(name="left_middle_finger2", id=31, color=[102, 178, 255], type="", swap="right_middle_finger2"), + 32: dict(name="left_middle_finger1", id=32, color=[102, 178, 255], type="", swap="right_middle_finger1"), + 33: dict(name="left_ring_finger4", id=33, color=[255, 51, 51], type="", swap="right_ring_finger4"), + 34: dict(name="left_ring_finger3", id=34, color=[255, 51, 51], type="", swap="right_ring_finger3"), + 35: dict(name="left_ring_finger2", id=35, color=[255, 51, 51], type="", swap="right_ring_finger2"), + 36: dict(name="left_ring_finger1", id=36, color=[255, 51, 51], type="", swap="right_ring_finger1"), + 37: dict(name="left_pinky_finger4", id=37, color=[0, 255, 0], type="", swap="right_pinky_finger4"), + 38: dict(name="left_pinky_finger3", id=38, color=[0, 255, 0], type="", swap="right_pinky_finger3"), + 39: dict(name="left_pinky_finger2", id=39, color=[0, 255, 0], type="", swap="right_pinky_finger2"), + 40: dict(name="left_pinky_finger1", id=40, color=[0, 255, 0], type="", swap="right_pinky_finger1"), + 41: dict(name="left_wrist", id=41, color=[255, 255, 255], type="", swap="right_wrist"), }, skeleton_info={ - 0: - dict(link=('right_wrist', 'right_thumb1'), id=0, color=[255, 128, 0]), - 1: - dict(link=('right_thumb1', 'right_thumb2'), id=1, color=[255, 128, 0]), - 2: - dict(link=('right_thumb2', 'right_thumb3'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_thumb3', 'right_thumb4'), id=3, color=[255, 128, 0]), - 4: - dict( - link=('right_wrist', 'right_forefinger1'), - id=4, - color=[255, 153, 255]), - 5: - dict( - link=('right_forefinger1', 'right_forefinger2'), - id=5, - color=[255, 153, 255]), - 6: - dict( - link=('right_forefinger2', 'right_forefinger3'), - id=6, - 
color=[255, 153, 255]), - 7: - dict( - link=('right_forefinger3', 'right_forefinger4'), - id=7, - color=[255, 153, 255]), - 8: - dict( - link=('right_wrist', 'right_middle_finger1'), - id=8, - color=[102, 178, 255]), - 9: - dict( - link=('right_middle_finger1', 'right_middle_finger2'), - id=9, - color=[102, 178, 255]), - 10: - dict( - link=('right_middle_finger2', 'right_middle_finger3'), - id=10, - color=[102, 178, 255]), - 11: - dict( - link=('right_middle_finger3', 'right_middle_finger4'), - id=11, - color=[102, 178, 255]), - 12: - dict( - link=('right_wrist', 'right_ring_finger1'), - id=12, - color=[255, 51, 51]), - 13: - dict( - link=('right_ring_finger1', 'right_ring_finger2'), - id=13, - color=[255, 51, 51]), - 14: - dict( - link=('right_ring_finger2', 'right_ring_finger3'), - id=14, - color=[255, 51, 51]), - 15: - dict( - link=('right_ring_finger3', 'right_ring_finger4'), - id=15, - color=[255, 51, 51]), - 16: - dict( - link=('right_wrist', 'right_pinky_finger1'), - id=16, - color=[0, 255, 0]), - 17: - dict( - link=('right_pinky_finger1', 'right_pinky_finger2'), - id=17, - color=[0, 255, 0]), - 18: - dict( - link=('right_pinky_finger2', 'right_pinky_finger3'), - id=18, - color=[0, 255, 0]), - 19: - dict( - link=('right_pinky_finger3', 'right_pinky_finger4'), - id=19, - color=[0, 255, 0]), - 20: - dict(link=('left_wrist', 'left_thumb1'), id=20, color=[255, 128, 0]), - 21: - dict(link=('left_thumb1', 'left_thumb2'), id=21, color=[255, 128, 0]), - 22: - dict(link=('left_thumb2', 'left_thumb3'), id=22, color=[255, 128, 0]), - 23: - dict(link=('left_thumb3', 'left_thumb4'), id=23, color=[255, 128, 0]), - 24: - dict( - link=('left_wrist', 'left_forefinger1'), - id=24, - color=[255, 153, 255]), - 25: - dict( - link=('left_forefinger1', 'left_forefinger2'), - id=25, - color=[255, 153, 255]), - 26: - dict( - link=('left_forefinger2', 'left_forefinger3'), - id=26, - color=[255, 153, 255]), - 27: - dict( - link=('left_forefinger3', 'left_forefinger4'), - id=27, - color=[255, 153, 255]), - 28: - dict( - link=('left_wrist', 'left_middle_finger1'), - id=28, - color=[102, 178, 255]), - 29: - dict( - link=('left_middle_finger1', 'left_middle_finger2'), - id=29, - color=[102, 178, 255]), - 30: - dict( - link=('left_middle_finger2', 'left_middle_finger3'), - id=30, - color=[102, 178, 255]), - 31: - dict( - link=('left_middle_finger3', 'left_middle_finger4'), - id=31, - color=[102, 178, 255]), - 32: - dict( - link=('left_wrist', 'left_ring_finger1'), - id=32, - color=[255, 51, 51]), - 33: - dict( - link=('left_ring_finger1', 'left_ring_finger2'), - id=33, - color=[255, 51, 51]), - 34: - dict( - link=('left_ring_finger2', 'left_ring_finger3'), - id=34, - color=[255, 51, 51]), - 35: - dict( - link=('left_ring_finger3', 'left_ring_finger4'), - id=35, - color=[255, 51, 51]), - 36: - dict( - link=('left_wrist', 'left_pinky_finger1'), - id=36, - color=[0, 255, 0]), - 37: - dict( - link=('left_pinky_finger1', 'left_pinky_finger2'), - id=37, - color=[0, 255, 0]), - 38: - dict( - link=('left_pinky_finger2', 'left_pinky_finger3'), - id=38, - color=[0, 255, 0]), - 39: - dict( - link=('left_pinky_finger3', 'left_pinky_finger4'), - id=39, - color=[0, 255, 0]), + 0: dict(link=("right_wrist", "right_thumb1"), id=0, color=[255, 128, 0]), + 1: dict(link=("right_thumb1", "right_thumb2"), id=1, color=[255, 128, 0]), + 2: dict(link=("right_thumb2", "right_thumb3"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_thumb3", "right_thumb4"), id=3, color=[255, 128, 0]), + 4: dict(link=("right_wrist", "right_forefinger1"), 
id=4, color=[255, 153, 255]), + 5: dict(link=("right_forefinger1", "right_forefinger2"), id=5, color=[255, 153, 255]), + 6: dict(link=("right_forefinger2", "right_forefinger3"), id=6, color=[255, 153, 255]), + 7: dict(link=("right_forefinger3", "right_forefinger4"), id=7, color=[255, 153, 255]), + 8: dict(link=("right_wrist", "right_middle_finger1"), id=8, color=[102, 178, 255]), + 9: dict(link=("right_middle_finger1", "right_middle_finger2"), id=9, color=[102, 178, 255]), + 10: dict(link=("right_middle_finger2", "right_middle_finger3"), id=10, color=[102, 178, 255]), + 11: dict(link=("right_middle_finger3", "right_middle_finger4"), id=11, color=[102, 178, 255]), + 12: dict(link=("right_wrist", "right_ring_finger1"), id=12, color=[255, 51, 51]), + 13: dict(link=("right_ring_finger1", "right_ring_finger2"), id=13, color=[255, 51, 51]), + 14: dict(link=("right_ring_finger2", "right_ring_finger3"), id=14, color=[255, 51, 51]), + 15: dict(link=("right_ring_finger3", "right_ring_finger4"), id=15, color=[255, 51, 51]), + 16: dict(link=("right_wrist", "right_pinky_finger1"), id=16, color=[0, 255, 0]), + 17: dict(link=("right_pinky_finger1", "right_pinky_finger2"), id=17, color=[0, 255, 0]), + 18: dict(link=("right_pinky_finger2", "right_pinky_finger3"), id=18, color=[0, 255, 0]), + 19: dict(link=("right_pinky_finger3", "right_pinky_finger4"), id=19, color=[0, 255, 0]), + 20: dict(link=("left_wrist", "left_thumb1"), id=20, color=[255, 128, 0]), + 21: dict(link=("left_thumb1", "left_thumb2"), id=21, color=[255, 128, 0]), + 22: dict(link=("left_thumb2", "left_thumb3"), id=22, color=[255, 128, 0]), + 23: dict(link=("left_thumb3", "left_thumb4"), id=23, color=[255, 128, 0]), + 24: dict(link=("left_wrist", "left_forefinger1"), id=24, color=[255, 153, 255]), + 25: dict(link=("left_forefinger1", "left_forefinger2"), id=25, color=[255, 153, 255]), + 26: dict(link=("left_forefinger2", "left_forefinger3"), id=26, color=[255, 153, 255]), + 27: dict(link=("left_forefinger3", "left_forefinger4"), id=27, color=[255, 153, 255]), + 28: dict(link=("left_wrist", "left_middle_finger1"), id=28, color=[102, 178, 255]), + 29: dict(link=("left_middle_finger1", "left_middle_finger2"), id=29, color=[102, 178, 255]), + 30: dict(link=("left_middle_finger2", "left_middle_finger3"), id=30, color=[102, 178, 255]), + 31: dict(link=("left_middle_finger3", "left_middle_finger4"), id=31, color=[102, 178, 255]), + 32: dict(link=("left_wrist", "left_ring_finger1"), id=32, color=[255, 51, 51]), + 33: dict(link=("left_ring_finger1", "left_ring_finger2"), id=33, color=[255, 51, 51]), + 34: dict(link=("left_ring_finger2", "left_ring_finger3"), id=34, color=[255, 51, 51]), + 35: dict(link=("left_ring_finger3", "left_ring_finger4"), id=35, color=[255, 51, 51]), + 36: dict(link=("left_wrist", "left_pinky_finger1"), id=36, color=[0, 255, 0]), + 37: dict(link=("left_pinky_finger1", "left_pinky_finger2"), id=37, color=[0, 255, 0]), + 38: dict(link=("left_pinky_finger2", "left_pinky_finger3"), id=38, color=[0, 255, 0]), + 39: dict(link=("left_pinky_finger3", "left_pinky_finger4"), id=39, color=[0, 255, 0]), }, - joint_weights=[1.] 
* 42, - sigmas=[]) + joint_weights=[1.0] * 42, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/jhmdb.py b/mmpose/configs/_base_/datasets/jhmdb.py index 1b37488498a2bade1fa6f2ff6532fcd219071803..ab73be2964dfd876c0feaa278c60f211a86a6b39 100644 --- a/mmpose/configs/_base_/datasets/jhmdb.py +++ b/mmpose/configs/_base_/datasets/jhmdb.py @@ -1,129 +1,46 @@ dataset_info = dict( - dataset_name='jhmdb', + dataset_name="jhmdb", paper_info=dict( - author='H. Jhuang and J. Gall and S. Zuffi and ' - 'C. Schmid and M. J. Black', - title='Towards understanding action recognition', - container='International Conf. on Computer Vision (ICCV)', - year='2013', - homepage='http://jhmdb.is.tue.mpg.de/dataset', + author="H. Jhuang and J. Gall and S. Zuffi and " "C. Schmid and M. J. Black", + title="Towards understanding action recognition", + container="International Conf. on Computer Vision (ICCV)", + year="2013", + homepage="http://jhmdb.is.tue.mpg.de/dataset", ), keypoint_info={ - 0: - dict(name='neck', id=0, color=[255, 128, 0], type='upper', swap=''), - 1: - dict(name='belly', id=1, color=[255, 128, 0], type='upper', swap=''), - 2: - dict(name='head', id=2, color=[255, 128, 0], type='upper', swap=''), - 3: - dict( - name='right_shoulder', - id=3, - color=[0, 255, 0], - type='upper', - swap='left_shoulder'), - 4: - dict( - name='left_shoulder', - id=4, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 5: - dict( - name='right_hip', - id=5, - color=[0, 255, 0], - type='lower', - swap='left_hip'), - 6: - dict( - name='left_hip', - id=6, - color=[51, 153, 255], - type='lower', - swap='right_hip'), - 7: - dict( - name='right_elbow', - id=7, - color=[51, 153, 255], - type='upper', - swap='left_elbow'), - 8: - dict( - name='left_elbow', - id=8, - color=[51, 153, 255], - type='upper', - swap='right_elbow'), - 9: - dict( - name='right_knee', - id=9, - color=[51, 153, 255], - type='lower', - swap='left_knee'), - 10: - dict( - name='left_knee', - id=10, - color=[255, 128, 0], - type='lower', - swap='right_knee'), - 11: - dict( - name='right_wrist', - id=11, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 12: - dict( - name='left_wrist', - id=12, - color=[255, 128, 0], - type='upper', - swap='right_wrist'), - 13: - dict( - name='right_ankle', - id=13, - color=[0, 255, 0], - type='lower', - swap='left_ankle'), - 14: - dict( - name='left_ankle', - id=14, - color=[0, 255, 0], - type='lower', - swap='right_ankle') + 0: dict(name="neck", id=0, color=[255, 128, 0], type="upper", swap=""), + 1: dict(name="belly", id=1, color=[255, 128, 0], type="upper", swap=""), + 2: dict(name="head", id=2, color=[255, 128, 0], type="upper", swap=""), + 3: dict(name="right_shoulder", id=3, color=[0, 255, 0], type="upper", swap="left_shoulder"), + 4: dict(name="left_shoulder", id=4, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 5: dict(name="right_hip", id=5, color=[0, 255, 0], type="lower", swap="left_hip"), + 6: dict(name="left_hip", id=6, color=[51, 153, 255], type="lower", swap="right_hip"), + 7: dict(name="right_elbow", id=7, color=[51, 153, 255], type="upper", swap="left_elbow"), + 8: dict(name="left_elbow", id=8, color=[51, 153, 255], type="upper", swap="right_elbow"), + 9: dict(name="right_knee", id=9, color=[51, 153, 255], type="lower", swap="left_knee"), + 10: dict(name="left_knee", id=10, color=[255, 128, 0], type="lower", swap="right_knee"), + 11: dict(name="right_wrist", id=11, color=[255, 128, 0], type="upper", swap="left_wrist"), + 12: dict(name="left_wrist", id=12, 
color=[255, 128, 0], type="upper", swap="right_wrist"), + 13: dict(name="right_ankle", id=13, color=[0, 255, 0], type="lower", swap="left_ankle"), + 14: dict(name="left_ankle", id=14, color=[0, 255, 0], type="lower", swap="right_ankle"), }, skeleton_info={ - 0: dict(link=('right_ankle', 'right_knee'), id=0, color=[255, 128, 0]), - 1: dict(link=('right_knee', 'right_hip'), id=1, color=[255, 128, 0]), - 2: dict(link=('right_hip', 'belly'), id=2, color=[255, 128, 0]), - 3: dict(link=('belly', 'left_hip'), id=3, color=[0, 255, 0]), - 4: dict(link=('left_hip', 'left_knee'), id=4, color=[0, 255, 0]), - 5: dict(link=('left_knee', 'left_ankle'), id=5, color=[0, 255, 0]), - 6: dict(link=('belly', 'neck'), id=6, color=[51, 153, 255]), - 7: dict(link=('neck', 'head'), id=7, color=[51, 153, 255]), - 8: dict(link=('neck', 'right_shoulder'), id=8, color=[255, 128, 0]), - 9: dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('right_elbow', 'right_wrist'), id=10, color=[255, 128, 0]), - 11: dict(link=('neck', 'left_shoulder'), id=11, color=[0, 255, 0]), - 12: - dict(link=('left_shoulder', 'left_elbow'), id=12, color=[0, 255, 0]), - 13: dict(link=('left_elbow', 'left_wrist'), id=13, color=[0, 255, 0]) + 0: dict(link=("right_ankle", "right_knee"), id=0, color=[255, 128, 0]), + 1: dict(link=("right_knee", "right_hip"), id=1, color=[255, 128, 0]), + 2: dict(link=("right_hip", "belly"), id=2, color=[255, 128, 0]), + 3: dict(link=("belly", "left_hip"), id=3, color=[0, 255, 0]), + 4: dict(link=("left_hip", "left_knee"), id=4, color=[0, 255, 0]), + 5: dict(link=("left_knee", "left_ankle"), id=5, color=[0, 255, 0]), + 6: dict(link=("belly", "neck"), id=6, color=[51, 153, 255]), + 7: dict(link=("neck", "head"), id=7, color=[51, 153, 255]), + 8: dict(link=("neck", "right_shoulder"), id=8, color=[255, 128, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("right_elbow", "right_wrist"), id=10, color=[255, 128, 0]), + 11: dict(link=("neck", "left_shoulder"), id=11, color=[0, 255, 0]), + 12: dict(link=("left_shoulder", "left_elbow"), id=12, color=[0, 255, 0]), + 13: dict(link=("left_elbow", "left_wrist"), id=13, color=[0, 255, 0]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.2, 1.2, 1.5, 1.5, 1.5, 1.5 - ], + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.2, 1.2, 1.5, 1.5, 1.5, 1.5], # Adapted from COCO dataset. 
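+    # The sigmas below are the per-keypoint OKS standard deviations used in evaluation:
+    # OKS_i = exp(-d_i^2 / (2 * s^2 * k_i^2)), where d_i is the prediction error, s the object scale, and k_i the sigma.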
- sigmas=[ - 0.025, 0.107, 0.025, 0.079, 0.079, 0.107, 0.107, 0.072, 0.072, 0.087, - 0.087, 0.062, 0.062, 0.089, 0.089 - ]) + sigmas=[0.025, 0.107, 0.025, 0.079, 0.079, 0.107, 0.107, 0.072, 0.072, 0.087, 0.087, 0.062, 0.062, 0.089, 0.089], +) diff --git a/mmpose/configs/_base_/datasets/lapa.py b/mmpose/configs/_base_/datasets/lapa.py index 3929edd90ed94b8717fdac62017277d1435a74f2..95175c6ad4b92be07331c765e682df478cb23cda 100644 --- a/mmpose/configs/_base_/datasets/lapa.py +++ b/mmpose/configs/_base_/datasets/lapa.py @@ -1,246 +1,228 @@ dataset_info = dict( - dataset_name='lapa', + dataset_name="lapa", paper_info=dict( - author='Liu, Yinglu and Shi, Hailin and Shen, Hao and Si, ' - 'Yue and Wang, Xiaobo and Mei, Tao', - title='A New Dataset and Boundary-Attention Semantic ' - 'Segmentation for Face Parsing.', - container='Proceedings of the AAAI Conference on ' - 'Artificial Intelligence 2020', - year='2020', - homepage='https://github.com/JDAI-CV/lapa-dataset', + author="Liu, Yinglu and Shi, Hailin and Shen, Hao and Si, " "Yue and Wang, Xiaobo and Mei, Tao", + title="A New Dataset and Boundary-Attention Semantic " "Segmentation for Face Parsing.", + container="Proceedings of the AAAI Conference on " "Artificial Intelligence 2020", + year="2020", + homepage="https://github.com/JDAI-CV/lapa-dataset", ), keypoint_info={ - 0: - dict(name='kpt-0', id=0, color=[255, 0, 0], type='', swap='kpt-32'), - 1: - dict(name='kpt-1', id=1, color=[255, 0, 0], type='', swap='kpt-31'), - 2: - dict(name='kpt-2', id=2, color=[255, 0, 0], type='', swap='kpt-30'), - 3: - dict(name='kpt-3', id=3, color=[255, 0, 0], type='', swap='kpt-29'), - 4: - dict(name='kpt-4', id=4, color=[255, 0, 0], type='', swap='kpt-28'), - 5: - dict(name='kpt-5', id=5, color=[255, 0, 0], type='', swap='kpt-27'), - 6: - dict(name='kpt-6', id=6, color=[255, 0, 0], type='', swap='kpt-26'), - 7: - dict(name='kpt-7', id=7, color=[255, 0, 0], type='', swap='kpt-25'), - 8: - dict(name='kpt-8', id=8, color=[255, 0, 0], type='', swap='kpt-24'), - 9: - dict(name='kpt-9', id=9, color=[255, 0, 0], type='', swap='kpt-23'), - 10: - dict(name='kpt-10', id=10, color=[255, 0, 0], type='', swap='kpt-22'), - 11: - dict(name='kpt-11', id=11, color=[255, 0, 0], type='', swap='kpt-21'), - 12: - dict(name='kpt-12', id=12, color=[255, 0, 0], type='', swap='kpt-20'), - 13: - dict(name='kpt-13', id=13, color=[255, 0, 0], type='', swap='kpt-19'), - 14: - dict(name='kpt-14', id=14, color=[255, 0, 0], type='', swap='kpt-18'), - 15: - dict(name='kpt-15', id=15, color=[255, 0, 0], type='', swap='kpt-17'), - 16: - dict(name='kpt-16', id=16, color=[255, 0, 0], type='', swap=''), - 17: - dict(name='kpt-17', id=17, color=[255, 0, 0], type='', swap='kpt-15'), - 18: - dict(name='kpt-18', id=18, color=[255, 0, 0], type='', swap='kpt-14'), - 19: - dict(name='kpt-19', id=19, color=[255, 0, 0], type='', swap='kpt-13'), - 20: - dict(name='kpt-20', id=20, color=[255, 0, 0], type='', swap='kpt-12'), - 21: - dict(name='kpt-21', id=21, color=[255, 0, 0], type='', swap='kpt-11'), - 22: - dict(name='kpt-22', id=22, color=[255, 0, 0], type='', swap='kpt-10'), - 23: - dict(name='kpt-23', id=23, color=[255, 0, 0], type='', swap='kpt-9'), - 24: - dict(name='kpt-24', id=24, color=[255, 0, 0], type='', swap='kpt-8'), - 25: - dict(name='kpt-25', id=25, color=[255, 0, 0], type='', swap='kpt-7'), - 26: - dict(name='kpt-26', id=26, color=[255, 0, 0], type='', swap='kpt-6'), - 27: - dict(name='kpt-27', id=27, color=[255, 0, 0], type='', swap='kpt-5'), - 28: - dict(name='kpt-28', id=28, 
color=[255, 0, 0], type='', swap='kpt-4'), - 29: - dict(name='kpt-29', id=29, color=[255, 0, 0], type='', swap='kpt-3'), - 30: - dict(name='kpt-30', id=30, color=[255, 0, 0], type='', swap='kpt-2'), - 31: - dict(name='kpt-31', id=31, color=[255, 0, 0], type='', swap='kpt-1'), - 32: - dict(name='kpt-32', id=32, color=[255, 0, 0], type='', swap='kpt-0'), - 33: - dict(name='kpt-33', id=33, color=[255, 0, 0], type='', swap='kpt-46'), - 34: - dict(name='kpt-34', id=34, color=[255, 0, 0], type='', swap='kpt-45'), - 35: - dict(name='kpt-35', id=35, color=[255, 0, 0], type='', swap='kpt-44'), - 36: - dict(name='kpt-36', id=36, color=[255, 0, 0], type='', swap='kpt-43'), - 37: - dict(name='kpt-37', id=37, color=[255, 0, 0], type='', swap='kpt-42'), - 38: - dict(name='kpt-38', id=38, color=[255, 0, 0], type='', swap='kpt-50'), - 39: - dict(name='kpt-39', id=39, color=[255, 0, 0], type='', swap='kpt-49'), - 40: - dict(name='kpt-40', id=40, color=[255, 0, 0], type='', swap='kpt-48'), - 41: - dict(name='kpt-41', id=41, color=[255, 0, 0], type='', swap='kpt-47'), - 42: - dict(name='kpt-42', id=42, color=[255, 0, 0], type='', swap='kpt-37'), - 43: - dict(name='kpt-43', id=43, color=[255, 0, 0], type='', swap='kpt-36'), - 44: - dict(name='kpt-44', id=44, color=[255, 0, 0], type='', swap='kpt-35'), - 45: - dict(name='kpt-45', id=45, color=[255, 0, 0], type='', swap='kpt-34'), - 46: - dict(name='kpt-46', id=46, color=[255, 0, 0], type='', swap='kpt-33'), - 47: - dict(name='kpt-47', id=47, color=[255, 0, 0], type='', swap='kpt-41'), - 48: - dict(name='kpt-48', id=48, color=[255, 0, 0], type='', swap='kpt-40'), - 49: - dict(name='kpt-49', id=49, color=[255, 0, 0], type='', swap='kpt-39'), - 50: - dict(name='kpt-50', id=50, color=[255, 0, 0], type='', swap='kpt-38'), - 51: - dict(name='kpt-51', id=51, color=[255, 0, 0], type='', swap=''), - 52: - dict(name='kpt-52', id=52, color=[255, 0, 0], type='', swap=''), - 53: - dict(name='kpt-53', id=53, color=[255, 0, 0], type='', swap=''), - 54: - dict(name='kpt-54', id=54, color=[255, 0, 0], type='', swap=''), - 55: - dict(name='kpt-55', id=55, color=[255, 0, 0], type='', swap='kpt-65'), - 56: - dict(name='kpt-56', id=56, color=[255, 0, 0], type='', swap='kpt-64'), - 57: - dict(name='kpt-57', id=57, color=[255, 0, 0], type='', swap='kpt-63'), - 58: - dict(name='kpt-58', id=58, color=[255, 0, 0], type='', swap='kpt-62'), - 59: - dict(name='kpt-59', id=59, color=[255, 0, 0], type='', swap='kpt-61'), - 60: - dict(name='kpt-60', id=60, color=[255, 0, 0], type='', swap=''), - 61: - dict(name='kpt-61', id=61, color=[255, 0, 0], type='', swap='kpt-59'), - 62: - dict(name='kpt-62', id=62, color=[255, 0, 0], type='', swap='kpt-58'), - 63: - dict(name='kpt-63', id=63, color=[255, 0, 0], type='', swap='kpt-57'), - 64: - dict(name='kpt-64', id=64, color=[255, 0, 0], type='', swap='kpt-56'), - 65: - dict(name='kpt-65', id=65, color=[255, 0, 0], type='', swap='kpt-55'), - 66: - dict(name='kpt-66', id=66, color=[255, 0, 0], type='', swap='kpt-79'), - 67: - dict(name='kpt-67', id=67, color=[255, 0, 0], type='', swap='kpt-78'), - 68: - dict(name='kpt-68', id=68, color=[255, 0, 0], type='', swap='kpt-77'), - 69: - dict(name='kpt-69', id=69, color=[255, 0, 0], type='', swap='kpt-76'), - 70: - dict(name='kpt-70', id=70, color=[255, 0, 0], type='', swap='kpt-75'), - 71: - dict(name='kpt-71', id=71, color=[255, 0, 0], type='', swap='kpt-82'), - 72: - dict(name='kpt-72', id=72, color=[255, 0, 0], type='', swap='kpt-81'), - 73: - dict(name='kpt-73', id=73, color=[255, 0, 0], type='', 
swap='kpt-80'), - 74: - dict(name='kpt-74', id=74, color=[255, 0, 0], type='', swap='kpt-83'), - 75: - dict(name='kpt-75', id=75, color=[255, 0, 0], type='', swap='kpt-70'), - 76: - dict(name='kpt-76', id=76, color=[255, 0, 0], type='', swap='kpt-69'), - 77: - dict(name='kpt-77', id=77, color=[255, 0, 0], type='', swap='kpt-68'), - 78: - dict(name='kpt-78', id=78, color=[255, 0, 0], type='', swap='kpt-67'), - 79: - dict(name='kpt-79', id=79, color=[255, 0, 0], type='', swap='kpt-66'), - 80: - dict(name='kpt-80', id=80, color=[255, 0, 0], type='', swap='kpt-73'), - 81: - dict(name='kpt-81', id=81, color=[255, 0, 0], type='', swap='kpt-72'), - 82: - dict(name='kpt-82', id=82, color=[255, 0, 0], type='', swap='kpt-71'), - 83: - dict(name='kpt-83', id=83, color=[255, 0, 0], type='', swap='kpt-74'), - 84: - dict(name='kpt-84', id=84, color=[255, 0, 0], type='', swap='kpt-90'), - 85: - dict(name='kpt-85', id=85, color=[255, 0, 0], type='', swap='kpt-89'), - 86: - dict(name='kpt-86', id=86, color=[255, 0, 0], type='', swap='kpt-88'), - 87: - dict(name='kpt-87', id=87, color=[255, 0, 0], type='', swap=''), - 88: - dict(name='kpt-88', id=88, color=[255, 0, 0], type='', swap='kpt-86'), - 89: - dict(name='kpt-89', id=89, color=[255, 0, 0], type='', swap='kpt-85'), - 90: - dict(name='kpt-90', id=90, color=[255, 0, 0], type='', swap='kpt-84'), - 91: - dict(name='kpt-91', id=91, color=[255, 0, 0], type='', swap='kpt-95'), - 92: - dict(name='kpt-92', id=92, color=[255, 0, 0], type='', swap='kpt-94'), - 93: - dict(name='kpt-93', id=93, color=[255, 0, 0], type='', swap=''), - 94: - dict(name='kpt-94', id=94, color=[255, 0, 0], type='', swap='kpt-92'), - 95: - dict(name='kpt-95', id=95, color=[255, 0, 0], type='', swap='kpt-91'), - 96: - dict(name='kpt-96', id=96, color=[255, 0, 0], type='', swap='kpt-100'), - 97: - dict(name='kpt-97', id=97, color=[255, 0, 0], type='', swap='kpt-99'), - 98: - dict(name='kpt-98', id=98, color=[255, 0, 0], type='', swap=''), - 99: - dict(name='kpt-99', id=99, color=[255, 0, 0], type='', swap='kpt-97'), - 100: - dict( - name='kpt-100', id=100, color=[255, 0, 0], type='', swap='kpt-96'), - 101: - dict( - name='kpt-101', id=101, color=[255, 0, 0], type='', - swap='kpt-103'), - 102: - dict(name='kpt-102', id=102, color=[255, 0, 0], type='', swap=''), - 103: - dict( - name='kpt-103', id=103, color=[255, 0, 0], type='', - swap='kpt-101'), - 104: - dict( - name='kpt-104', id=104, color=[255, 0, 0], type='', - swap='kpt-105'), - 105: - dict( - name='kpt-105', id=105, color=[255, 0, 0], type='', swap='kpt-104') + 0: dict(name="kpt-0", id=0, color=[255, 0, 0], type="", swap="kpt-32"), + 1: dict(name="kpt-1", id=1, color=[255, 0, 0], type="", swap="kpt-31"), + 2: dict(name="kpt-2", id=2, color=[255, 0, 0], type="", swap="kpt-30"), + 3: dict(name="kpt-3", id=3, color=[255, 0, 0], type="", swap="kpt-29"), + 4: dict(name="kpt-4", id=4, color=[255, 0, 0], type="", swap="kpt-28"), + 5: dict(name="kpt-5", id=5, color=[255, 0, 0], type="", swap="kpt-27"), + 6: dict(name="kpt-6", id=6, color=[255, 0, 0], type="", swap="kpt-26"), + 7: dict(name="kpt-7", id=7, color=[255, 0, 0], type="", swap="kpt-25"), + 8: dict(name="kpt-8", id=8, color=[255, 0, 0], type="", swap="kpt-24"), + 9: dict(name="kpt-9", id=9, color=[255, 0, 0], type="", swap="kpt-23"), + 10: dict(name="kpt-10", id=10, color=[255, 0, 0], type="", swap="kpt-22"), + 11: dict(name="kpt-11", id=11, color=[255, 0, 0], type="", swap="kpt-21"), + 12: dict(name="kpt-12", id=12, color=[255, 0, 0], type="", swap="kpt-20"), + 13: 
dict(name="kpt-13", id=13, color=[255, 0, 0], type="", swap="kpt-19"), + 14: dict(name="kpt-14", id=14, color=[255, 0, 0], type="", swap="kpt-18"), + 15: dict(name="kpt-15", id=15, color=[255, 0, 0], type="", swap="kpt-17"), + 16: dict(name="kpt-16", id=16, color=[255, 0, 0], type="", swap=""), + 17: dict(name="kpt-17", id=17, color=[255, 0, 0], type="", swap="kpt-15"), + 18: dict(name="kpt-18", id=18, color=[255, 0, 0], type="", swap="kpt-14"), + 19: dict(name="kpt-19", id=19, color=[255, 0, 0], type="", swap="kpt-13"), + 20: dict(name="kpt-20", id=20, color=[255, 0, 0], type="", swap="kpt-12"), + 21: dict(name="kpt-21", id=21, color=[255, 0, 0], type="", swap="kpt-11"), + 22: dict(name="kpt-22", id=22, color=[255, 0, 0], type="", swap="kpt-10"), + 23: dict(name="kpt-23", id=23, color=[255, 0, 0], type="", swap="kpt-9"), + 24: dict(name="kpt-24", id=24, color=[255, 0, 0], type="", swap="kpt-8"), + 25: dict(name="kpt-25", id=25, color=[255, 0, 0], type="", swap="kpt-7"), + 26: dict(name="kpt-26", id=26, color=[255, 0, 0], type="", swap="kpt-6"), + 27: dict(name="kpt-27", id=27, color=[255, 0, 0], type="", swap="kpt-5"), + 28: dict(name="kpt-28", id=28, color=[255, 0, 0], type="", swap="kpt-4"), + 29: dict(name="kpt-29", id=29, color=[255, 0, 0], type="", swap="kpt-3"), + 30: dict(name="kpt-30", id=30, color=[255, 0, 0], type="", swap="kpt-2"), + 31: dict(name="kpt-31", id=31, color=[255, 0, 0], type="", swap="kpt-1"), + 32: dict(name="kpt-32", id=32, color=[255, 0, 0], type="", swap="kpt-0"), + 33: dict(name="kpt-33", id=33, color=[255, 0, 0], type="", swap="kpt-46"), + 34: dict(name="kpt-34", id=34, color=[255, 0, 0], type="", swap="kpt-45"), + 35: dict(name="kpt-35", id=35, color=[255, 0, 0], type="", swap="kpt-44"), + 36: dict(name="kpt-36", id=36, color=[255, 0, 0], type="", swap="kpt-43"), + 37: dict(name="kpt-37", id=37, color=[255, 0, 0], type="", swap="kpt-42"), + 38: dict(name="kpt-38", id=38, color=[255, 0, 0], type="", swap="kpt-50"), + 39: dict(name="kpt-39", id=39, color=[255, 0, 0], type="", swap="kpt-49"), + 40: dict(name="kpt-40", id=40, color=[255, 0, 0], type="", swap="kpt-48"), + 41: dict(name="kpt-41", id=41, color=[255, 0, 0], type="", swap="kpt-47"), + 42: dict(name="kpt-42", id=42, color=[255, 0, 0], type="", swap="kpt-37"), + 43: dict(name="kpt-43", id=43, color=[255, 0, 0], type="", swap="kpt-36"), + 44: dict(name="kpt-44", id=44, color=[255, 0, 0], type="", swap="kpt-35"), + 45: dict(name="kpt-45", id=45, color=[255, 0, 0], type="", swap="kpt-34"), + 46: dict(name="kpt-46", id=46, color=[255, 0, 0], type="", swap="kpt-33"), + 47: dict(name="kpt-47", id=47, color=[255, 0, 0], type="", swap="kpt-41"), + 48: dict(name="kpt-48", id=48, color=[255, 0, 0], type="", swap="kpt-40"), + 49: dict(name="kpt-49", id=49, color=[255, 0, 0], type="", swap="kpt-39"), + 50: dict(name="kpt-50", id=50, color=[255, 0, 0], type="", swap="kpt-38"), + 51: dict(name="kpt-51", id=51, color=[255, 0, 0], type="", swap=""), + 52: dict(name="kpt-52", id=52, color=[255, 0, 0], type="", swap=""), + 53: dict(name="kpt-53", id=53, color=[255, 0, 0], type="", swap=""), + 54: dict(name="kpt-54", id=54, color=[255, 0, 0], type="", swap=""), + 55: dict(name="kpt-55", id=55, color=[255, 0, 0], type="", swap="kpt-65"), + 56: dict(name="kpt-56", id=56, color=[255, 0, 0], type="", swap="kpt-64"), + 57: dict(name="kpt-57", id=57, color=[255, 0, 0], type="", swap="kpt-63"), + 58: dict(name="kpt-58", id=58, color=[255, 0, 0], type="", swap="kpt-62"), + 59: dict(name="kpt-59", id=59, color=[255, 0, 0], 
type="", swap="kpt-61"), + 60: dict(name="kpt-60", id=60, color=[255, 0, 0], type="", swap=""), + 61: dict(name="kpt-61", id=61, color=[255, 0, 0], type="", swap="kpt-59"), + 62: dict(name="kpt-62", id=62, color=[255, 0, 0], type="", swap="kpt-58"), + 63: dict(name="kpt-63", id=63, color=[255, 0, 0], type="", swap="kpt-57"), + 64: dict(name="kpt-64", id=64, color=[255, 0, 0], type="", swap="kpt-56"), + 65: dict(name="kpt-65", id=65, color=[255, 0, 0], type="", swap="kpt-55"), + 66: dict(name="kpt-66", id=66, color=[255, 0, 0], type="", swap="kpt-79"), + 67: dict(name="kpt-67", id=67, color=[255, 0, 0], type="", swap="kpt-78"), + 68: dict(name="kpt-68", id=68, color=[255, 0, 0], type="", swap="kpt-77"), + 69: dict(name="kpt-69", id=69, color=[255, 0, 0], type="", swap="kpt-76"), + 70: dict(name="kpt-70", id=70, color=[255, 0, 0], type="", swap="kpt-75"), + 71: dict(name="kpt-71", id=71, color=[255, 0, 0], type="", swap="kpt-82"), + 72: dict(name="kpt-72", id=72, color=[255, 0, 0], type="", swap="kpt-81"), + 73: dict(name="kpt-73", id=73, color=[255, 0, 0], type="", swap="kpt-80"), + 74: dict(name="kpt-74", id=74, color=[255, 0, 0], type="", swap="kpt-83"), + 75: dict(name="kpt-75", id=75, color=[255, 0, 0], type="", swap="kpt-70"), + 76: dict(name="kpt-76", id=76, color=[255, 0, 0], type="", swap="kpt-69"), + 77: dict(name="kpt-77", id=77, color=[255, 0, 0], type="", swap="kpt-68"), + 78: dict(name="kpt-78", id=78, color=[255, 0, 0], type="", swap="kpt-67"), + 79: dict(name="kpt-79", id=79, color=[255, 0, 0], type="", swap="kpt-66"), + 80: dict(name="kpt-80", id=80, color=[255, 0, 0], type="", swap="kpt-73"), + 81: dict(name="kpt-81", id=81, color=[255, 0, 0], type="", swap="kpt-72"), + 82: dict(name="kpt-82", id=82, color=[255, 0, 0], type="", swap="kpt-71"), + 83: dict(name="kpt-83", id=83, color=[255, 0, 0], type="", swap="kpt-74"), + 84: dict(name="kpt-84", id=84, color=[255, 0, 0], type="", swap="kpt-90"), + 85: dict(name="kpt-85", id=85, color=[255, 0, 0], type="", swap="kpt-89"), + 86: dict(name="kpt-86", id=86, color=[255, 0, 0], type="", swap="kpt-88"), + 87: dict(name="kpt-87", id=87, color=[255, 0, 0], type="", swap=""), + 88: dict(name="kpt-88", id=88, color=[255, 0, 0], type="", swap="kpt-86"), + 89: dict(name="kpt-89", id=89, color=[255, 0, 0], type="", swap="kpt-85"), + 90: dict(name="kpt-90", id=90, color=[255, 0, 0], type="", swap="kpt-84"), + 91: dict(name="kpt-91", id=91, color=[255, 0, 0], type="", swap="kpt-95"), + 92: dict(name="kpt-92", id=92, color=[255, 0, 0], type="", swap="kpt-94"), + 93: dict(name="kpt-93", id=93, color=[255, 0, 0], type="", swap=""), + 94: dict(name="kpt-94", id=94, color=[255, 0, 0], type="", swap="kpt-92"), + 95: dict(name="kpt-95", id=95, color=[255, 0, 0], type="", swap="kpt-91"), + 96: dict(name="kpt-96", id=96, color=[255, 0, 0], type="", swap="kpt-100"), + 97: dict(name="kpt-97", id=97, color=[255, 0, 0], type="", swap="kpt-99"), + 98: dict(name="kpt-98", id=98, color=[255, 0, 0], type="", swap=""), + 99: dict(name="kpt-99", id=99, color=[255, 0, 0], type="", swap="kpt-97"), + 100: dict(name="kpt-100", id=100, color=[255, 0, 0], type="", swap="kpt-96"), + 101: dict(name="kpt-101", id=101, color=[255, 0, 0], type="", swap="kpt-103"), + 102: dict(name="kpt-102", id=102, color=[255, 0, 0], type="", swap=""), + 103: dict(name="kpt-103", id=103, color=[255, 0, 0], type="", swap="kpt-101"), + 104: dict(name="kpt-104", id=104, color=[255, 0, 0], type="", swap="kpt-105"), + 105: dict(name="kpt-105", id=105, color=[255, 0, 0], type="", 
swap="kpt-104"), }, skeleton_info={}, joint_weights=[ - 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, - 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, - 0.8, 0.8, 0.8, 0.8, 0.8, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, - 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, - 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, - 2.0, 2.0, 2.0, 2.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, - 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, - 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.0, 1.0 + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 2.0, + 2.0, + 2.0, + 2.0, + 2.0, + 2.0, + 2.0, + 2.0, + 1.0, + 2.0, + 2.0, + 2.0, + 2.0, + 2.0, + 2.0, + 2.0, + 2.0, + 1.0, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.5, + 1.0, + 1.0, ], - sigmas=[]) + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/locust.py b/mmpose/configs/_base_/datasets/locust.py index db3fa15aa060b5806faae7a21f65460f77be2745..305ba78828d80f76a4d154dd70ff0f6626fb9b31 100644 --- a/mmpose/configs/_base_/datasets/locust.py +++ b/mmpose/configs/_base_/datasets/locust.py @@ -1,263 +1,79 @@ dataset_info = dict( - dataset_name='locust', + dataset_name="locust", paper_info=dict( - author='Graving, Jacob M and Chae, Daniel and Naik, Hemal and ' - 'Li, Liang and Koger, Benjamin and Costelloe, Blair R and ' - 'Couzin, Iain D', - title='DeepPoseKit, a software toolkit for fast and robust ' - 'animal pose estimation using deep learning', - container='Elife', - year='2019', - homepage='https://github.com/jgraving/DeepPoseKit-Data', + author="Graving, Jacob M and Chae, Daniel and Naik, Hemal and " + "Li, Liang and Koger, Benjamin and Costelloe, Blair R and " + "Couzin, Iain D", + title="DeepPoseKit, a software toolkit for fast and robust " "animal pose estimation using deep learning", + container="Elife", + year="2019", + homepage="https://github.com/jgraving/DeepPoseKit-Data", ), keypoint_info={ - 0: - dict(name='head', id=0, color=[255, 255, 255], type='', swap=''), - 1: - dict(name='neck', id=1, color=[255, 255, 255], type='', swap=''), - 2: - dict(name='thorax', id=2, color=[255, 255, 255], type='', swap=''), - 3: - dict(name='abdomen1', id=3, color=[255, 255, 255], type='', swap=''), - 4: - dict(name='abdomen2', id=4, color=[255, 255, 255], type='', swap=''), - 5: - dict( - name='anttipL', - id=5, - color=[255, 255, 255], - type='', - swap='anttipR'), - 6: - dict( - name='antbaseL', - id=6, - color=[255, 255, 255], - type='', - swap='antbaseR'), - 7: - dict(name='eyeL', id=7, color=[255, 255, 255], type='', swap='eyeR'), - 8: - dict( - name='forelegL1', - id=8, - color=[255, 255, 255], - type='', - swap='forelegR1'), - 9: - dict( - name='forelegL2', - id=9, - color=[255, 255, 255], - type='', - swap='forelegR2'), - 10: - dict( - name='forelegL3', - id=10, - color=[255, 255, 255], - type='', - swap='forelegR3'), - 11: - dict( - name='forelegL4', - id=11, - color=[255, 255, 255], - type='', - swap='forelegR4'), - 12: - dict( - 
name='midlegL1', - id=12, - color=[255, 255, 255], - type='', - swap='midlegR1'), - 13: - dict( - name='midlegL2', - id=13, - color=[255, 255, 255], - type='', - swap='midlegR2'), - 14: - dict( - name='midlegL3', - id=14, - color=[255, 255, 255], - type='', - swap='midlegR3'), - 15: - dict( - name='midlegL4', - id=15, - color=[255, 255, 255], - type='', - swap='midlegR4'), - 16: - dict( - name='hindlegL1', - id=16, - color=[255, 255, 255], - type='', - swap='hindlegR1'), - 17: - dict( - name='hindlegL2', - id=17, - color=[255, 255, 255], - type='', - swap='hindlegR2'), - 18: - dict( - name='hindlegL3', - id=18, - color=[255, 255, 255], - type='', - swap='hindlegR3'), - 19: - dict( - name='hindlegL4', - id=19, - color=[255, 255, 255], - type='', - swap='hindlegR4'), - 20: - dict( - name='anttipR', - id=20, - color=[255, 255, 255], - type='', - swap='anttipL'), - 21: - dict( - name='antbaseR', - id=21, - color=[255, 255, 255], - type='', - swap='antbaseL'), - 22: - dict(name='eyeR', id=22, color=[255, 255, 255], type='', swap='eyeL'), - 23: - dict( - name='forelegR1', - id=23, - color=[255, 255, 255], - type='', - swap='forelegL1'), - 24: - dict( - name='forelegR2', - id=24, - color=[255, 255, 255], - type='', - swap='forelegL2'), - 25: - dict( - name='forelegR3', - id=25, - color=[255, 255, 255], - type='', - swap='forelegL3'), - 26: - dict( - name='forelegR4', - id=26, - color=[255, 255, 255], - type='', - swap='forelegL4'), - 27: - dict( - name='midlegR1', - id=27, - color=[255, 255, 255], - type='', - swap='midlegL1'), - 28: - dict( - name='midlegR2', - id=28, - color=[255, 255, 255], - type='', - swap='midlegL2'), - 29: - dict( - name='midlegR3', - id=29, - color=[255, 255, 255], - type='', - swap='midlegL3'), - 30: - dict( - name='midlegR4', - id=30, - color=[255, 255, 255], - type='', - swap='midlegL4'), - 31: - dict( - name='hindlegR1', - id=31, - color=[255, 255, 255], - type='', - swap='hindlegL1'), - 32: - dict( - name='hindlegR2', - id=32, - color=[255, 255, 255], - type='', - swap='hindlegL2'), - 33: - dict( - name='hindlegR3', - id=33, - color=[255, 255, 255], - type='', - swap='hindlegL3'), - 34: - dict( - name='hindlegR4', - id=34, - color=[255, 255, 255], - type='', - swap='hindlegL4') + 0: dict(name="head", id=0, color=[255, 255, 255], type="", swap=""), + 1: dict(name="neck", id=1, color=[255, 255, 255], type="", swap=""), + 2: dict(name="thorax", id=2, color=[255, 255, 255], type="", swap=""), + 3: dict(name="abdomen1", id=3, color=[255, 255, 255], type="", swap=""), + 4: dict(name="abdomen2", id=4, color=[255, 255, 255], type="", swap=""), + 5: dict(name="anttipL", id=5, color=[255, 255, 255], type="", swap="anttipR"), + 6: dict(name="antbaseL", id=6, color=[255, 255, 255], type="", swap="antbaseR"), + 7: dict(name="eyeL", id=7, color=[255, 255, 255], type="", swap="eyeR"), + 8: dict(name="forelegL1", id=8, color=[255, 255, 255], type="", swap="forelegR1"), + 9: dict(name="forelegL2", id=9, color=[255, 255, 255], type="", swap="forelegR2"), + 10: dict(name="forelegL3", id=10, color=[255, 255, 255], type="", swap="forelegR3"), + 11: dict(name="forelegL4", id=11, color=[255, 255, 255], type="", swap="forelegR4"), + 12: dict(name="midlegL1", id=12, color=[255, 255, 255], type="", swap="midlegR1"), + 13: dict(name="midlegL2", id=13, color=[255, 255, 255], type="", swap="midlegR2"), + 14: dict(name="midlegL3", id=14, color=[255, 255, 255], type="", swap="midlegR3"), + 15: dict(name="midlegL4", id=15, color=[255, 255, 255], type="", swap="midlegR4"), + 16: 
dict(name="hindlegL1", id=16, color=[255, 255, 255], type="", swap="hindlegR1"), + 17: dict(name="hindlegL2", id=17, color=[255, 255, 255], type="", swap="hindlegR2"), + 18: dict(name="hindlegL3", id=18, color=[255, 255, 255], type="", swap="hindlegR3"), + 19: dict(name="hindlegL4", id=19, color=[255, 255, 255], type="", swap="hindlegR4"), + 20: dict(name="anttipR", id=20, color=[255, 255, 255], type="", swap="anttipL"), + 21: dict(name="antbaseR", id=21, color=[255, 255, 255], type="", swap="antbaseL"), + 22: dict(name="eyeR", id=22, color=[255, 255, 255], type="", swap="eyeL"), + 23: dict(name="forelegR1", id=23, color=[255, 255, 255], type="", swap="forelegL1"), + 24: dict(name="forelegR2", id=24, color=[255, 255, 255], type="", swap="forelegL2"), + 25: dict(name="forelegR3", id=25, color=[255, 255, 255], type="", swap="forelegL3"), + 26: dict(name="forelegR4", id=26, color=[255, 255, 255], type="", swap="forelegL4"), + 27: dict(name="midlegR1", id=27, color=[255, 255, 255], type="", swap="midlegL1"), + 28: dict(name="midlegR2", id=28, color=[255, 255, 255], type="", swap="midlegL2"), + 29: dict(name="midlegR3", id=29, color=[255, 255, 255], type="", swap="midlegL3"), + 30: dict(name="midlegR4", id=30, color=[255, 255, 255], type="", swap="midlegL4"), + 31: dict(name="hindlegR1", id=31, color=[255, 255, 255], type="", swap="hindlegL1"), + 32: dict(name="hindlegR2", id=32, color=[255, 255, 255], type="", swap="hindlegL2"), + 33: dict(name="hindlegR3", id=33, color=[255, 255, 255], type="", swap="hindlegL3"), + 34: dict(name="hindlegR4", id=34, color=[255, 255, 255], type="", swap="hindlegL4"), }, skeleton_info={ - 0: dict(link=('neck', 'head'), id=0, color=[255, 255, 255]), - 1: dict(link=('thorax', 'neck'), id=1, color=[255, 255, 255]), - 2: dict(link=('abdomen1', 'thorax'), id=2, color=[255, 255, 255]), - 3: dict(link=('abdomen2', 'abdomen1'), id=3, color=[255, 255, 255]), - 4: dict(link=('antbaseL', 'anttipL'), id=4, color=[255, 255, 255]), - 5: dict(link=('eyeL', 'antbaseL'), id=5, color=[255, 255, 255]), - 6: dict(link=('forelegL2', 'forelegL1'), id=6, color=[255, 255, 255]), - 7: dict(link=('forelegL3', 'forelegL2'), id=7, color=[255, 255, 255]), - 8: dict(link=('forelegL4', 'forelegL3'), id=8, color=[255, 255, 255]), - 9: dict(link=('midlegL2', 'midlegL1'), id=9, color=[255, 255, 255]), - 10: dict(link=('midlegL3', 'midlegL2'), id=10, color=[255, 255, 255]), - 11: dict(link=('midlegL4', 'midlegL3'), id=11, color=[255, 255, 255]), - 12: - dict(link=('hindlegL2', 'hindlegL1'), id=12, color=[255, 255, 255]), - 13: - dict(link=('hindlegL3', 'hindlegL2'), id=13, color=[255, 255, 255]), - 14: - dict(link=('hindlegL4', 'hindlegL3'), id=14, color=[255, 255, 255]), - 15: dict(link=('antbaseR', 'anttipR'), id=15, color=[255, 255, 255]), - 16: dict(link=('eyeR', 'antbaseR'), id=16, color=[255, 255, 255]), - 17: - dict(link=('forelegR2', 'forelegR1'), id=17, color=[255, 255, 255]), - 18: - dict(link=('forelegR3', 'forelegR2'), id=18, color=[255, 255, 255]), - 19: - dict(link=('forelegR4', 'forelegR3'), id=19, color=[255, 255, 255]), - 20: dict(link=('midlegR2', 'midlegR1'), id=20, color=[255, 255, 255]), - 21: dict(link=('midlegR3', 'midlegR2'), id=21, color=[255, 255, 255]), - 22: dict(link=('midlegR4', 'midlegR3'), id=22, color=[255, 255, 255]), - 23: - dict(link=('hindlegR2', 'hindlegR1'), id=23, color=[255, 255, 255]), - 24: - dict(link=('hindlegR3', 'hindlegR2'), id=24, color=[255, 255, 255]), - 25: - dict(link=('hindlegR4', 'hindlegR3'), id=25, color=[255, 255, 255]) + 0: 
dict(link=("neck", "head"), id=0, color=[255, 255, 255]), + 1: dict(link=("thorax", "neck"), id=1, color=[255, 255, 255]), + 2: dict(link=("abdomen1", "thorax"), id=2, color=[255, 255, 255]), + 3: dict(link=("abdomen2", "abdomen1"), id=3, color=[255, 255, 255]), + 4: dict(link=("antbaseL", "anttipL"), id=4, color=[255, 255, 255]), + 5: dict(link=("eyeL", "antbaseL"), id=5, color=[255, 255, 255]), + 6: dict(link=("forelegL2", "forelegL1"), id=6, color=[255, 255, 255]), + 7: dict(link=("forelegL3", "forelegL2"), id=7, color=[255, 255, 255]), + 8: dict(link=("forelegL4", "forelegL3"), id=8, color=[255, 255, 255]), + 9: dict(link=("midlegL2", "midlegL1"), id=9, color=[255, 255, 255]), + 10: dict(link=("midlegL3", "midlegL2"), id=10, color=[255, 255, 255]), + 11: dict(link=("midlegL4", "midlegL3"), id=11, color=[255, 255, 255]), + 12: dict(link=("hindlegL2", "hindlegL1"), id=12, color=[255, 255, 255]), + 13: dict(link=("hindlegL3", "hindlegL2"), id=13, color=[255, 255, 255]), + 14: dict(link=("hindlegL4", "hindlegL3"), id=14, color=[255, 255, 255]), + 15: dict(link=("antbaseR", "anttipR"), id=15, color=[255, 255, 255]), + 16: dict(link=("eyeR", "antbaseR"), id=16, color=[255, 255, 255]), + 17: dict(link=("forelegR2", "forelegR1"), id=17, color=[255, 255, 255]), + 18: dict(link=("forelegR3", "forelegR2"), id=18, color=[255, 255, 255]), + 19: dict(link=("forelegR4", "forelegR3"), id=19, color=[255, 255, 255]), + 20: dict(link=("midlegR2", "midlegR1"), id=20, color=[255, 255, 255]), + 21: dict(link=("midlegR3", "midlegR2"), id=21, color=[255, 255, 255]), + 22: dict(link=("midlegR4", "midlegR3"), id=22, color=[255, 255, 255]), + 23: dict(link=("hindlegR2", "hindlegR1"), id=23, color=[255, 255, 255]), + 24: dict(link=("hindlegR3", "hindlegR2"), id=24, color=[255, 255, 255]), + 25: dict(link=("hindlegR4", "hindlegR3"), id=25, color=[255, 255, 255]), }, - joint_weights=[1.] 
* 35, - sigmas=[]) + joint_weights=[1.0] * 35, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/macaque.py b/mmpose/configs/_base_/datasets/macaque.py index ea8dac297ea2f0e36dabccccc021d953216a6ac8..ea86652d77a07c64ac024a22ff4d1b6558464ce5 100644 --- a/mmpose/configs/_base_/datasets/macaque.py +++ b/mmpose/configs/_base_/datasets/macaque.py @@ -1,183 +1,55 @@ dataset_info = dict( - dataset_name='macaque', + dataset_name="macaque", paper_info=dict( - author='Labuguen, Rollyn and Matsumoto, Jumpei and ' - 'Negrete, Salvador and Nishimaru, Hiroshi and ' - 'Nishijo, Hisao and Takada, Masahiko and ' - 'Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro', - title='MacaquePose: A novel "in the wild" macaque monkey pose dataset ' - 'for markerless motion capture', - container='bioRxiv', - year='2020', - homepage='http://www.pri.kyoto-u.ac.jp/datasets/' - 'macaquepose/index.html', + author="Labuguen, Rollyn and Matsumoto, Jumpei and " + "Negrete, Salvador and Nishimaru, Hiroshi and " + "Nishijo, Hisao and Takada, Masahiko and " + "Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro", + title='MacaquePose: A novel "in the wild" macaque monkey pose dataset ' "for markerless motion capture", + container="bioRxiv", + year="2020", + homepage="http://www.pri.kyoto-u.ac.jp/datasets/" "macaquepose/index.html", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, 
color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'right_shoulder'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]) + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 
255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5 - ], - sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089 - ]) + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5], + sigmas=[0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089], +) diff --git a/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII.py b/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII.py index acfff665d2ee53890be41ee73deb8a4e939ea15d..88f05abeb4a40f6f31a3199ff0ec954f6d8db25b 100644 --- a/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII.py +++ b/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII.py @@ -1,250 +1,93 @@ dataset_info = dict( - dataset_name='merged_COCO_AIC_MPII', + dataset_name="merged_COCO_AIC_MPII", paper_info=dict( - author='Miroslav Purkrabek', - title='Merged Pose Estimation Dataset', - container='', - year='2024', - homepage='', + author="Miroslav Purkrabek", + title="Merged Pose Estimation Dataset", + container="", + year="2024", + homepage="", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 17: - dict( - name='thorax_mpii', - id=17, - color=[255, 128, 0], - type='upper', - swap=''), - 18: - dict( - name='neck_mpii', - id=18, - color=[255, 128, 0], - type='upper', - swap=''), - 19: - dict( - name='neck_aic', - id=19, 
- color=[255, 128, 0], - type='upper', - swap=''), - 20: - dict( - name='top_head', - id=20, - color=[255, 128, 0], - type='upper', - swap=''), - 21: - dict( - name='pelvis', - id=21, - color=[255, 128, 0], - type='lower', - swap=''), + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="thorax_mpii", id=17, color=[255, 128, 0], type="upper", swap=""), + 18: dict(name="neck_mpii", id=18, color=[255, 128, 0], type="upper", swap=""), + 19: dict(name="neck_aic", id=19, color=[255, 128, 0], type="upper", swap=""), + 20: dict(name="top_head", id=20, color=[255, 128, 0], type="upper", swap=""), + 21: dict(name="pelvis", id=21, color=[255, 128, 0], type="lower", swap=""), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), # 4: # dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 4: - dict(link=('left_hip', 'pelvis'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), + 4: dict(link=("left_hip", "pelvis"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), # 7: # dict( # link=('left_shoulder', 'right_shoulder'), # id=7, # color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'thorax_mpii'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - 
link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]), - 19: - dict(link=('pelvis', 'right_hip'), id=19, color=[51, 153, 255]), - 20: - dict( - link=('right_shoulder', 'thorax_mpii'), - id=20, - color=[51, 153, 255]), - 21: - dict( - link=('thorax_mpii', 'neck_mpii'), - id=21, - color=[51, 153, 255]), - 22: - dict( - link=('thorax_mpii', 'neck_aic'), - id=22, - color=[51, 153, 255]), - 23: - dict( - link=('left_ear', 'top_head'), - id=23, - color=[51, 153, 255]), - 24: - dict( - link=('right_ear', 'top_head'), - id=24, - color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "thorax_mpii"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), + 19: dict(link=("pelvis", "right_hip"), id=19, color=[51, 153, 255]), + 20: dict(link=("right_shoulder", "thorax_mpii"), id=20, color=[51, 153, 255]), + 21: dict(link=("thorax_mpii", "neck_mpii"), id=21, color=[51, 153, 255]), + 22: dict(link=("thorax_mpii", "neck_aic"), id=22, color=[51, 153, 255]), + 23: dict(link=("left_ear", "top_head"), id=23, color=[51, 153, 255]), + 24: dict(link=("right_ear", "top_head"), id=24, color=[51, 153, 255]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5, 1., 1., 1., 1., 1., 1. 
- ], + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.0, 1.0, 1.0], sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, - 0.079, 0.079, 0.079, # Thorax and neck has the same as shoulders - 0.035, # Top of head has the same as ears - 0.107, # Pelvis has the same as hips - ]) + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.079, + 0.079, + 0.079, # Thorax and neck have the same sigmas as the shoulders + 0.035, # Top of head has the same sigma as the ears + 0.107, # Pelvis has the same sigma as the hips + ], +) diff --git a/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII_21.py b/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII_21.py index 5ede684d16ab557f7917a9bfa3f90c8a016cc06c..915589cdccc19163d10f195db0645b4e6007c06b 100644 --- a/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII_21.py +++ b/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII_21.py @@ -1,238 +1,88 @@ dataset_info = dict( - dataset_name='merged_COCO_AIC_MPII_21', + dataset_name="merged_COCO_AIC_MPII_21", paper_info=dict( - author='Miroslav Purkrabek', - title='Merged Pose Estimation Dataset', - container='', - year='2024', - homepage='', + author="Miroslav Purkrabek", + title="Merged Pose Estimation Dataset", + container="", + year="2024", + homepage="", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 17: - dict( - name='thorax_mpii', - id=17, - color=[255, 128, 0], - type='upper', - swap=''), - 18: - dict( - name='neck_mpii', - id=18, - color=[255, 128, 0], - type='upper', - swap=''), - 19: - dict( - name='neck_aic', - id=19, - color=[255, 128, 0], - type='upper', - swap=''), - 20: - dict( - name='top_head', - id=20, -
color=[255, 128, 0], - type='upper', - swap=''), + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="thorax_mpii", id=17, color=[255, 128, 0], type="upper", swap=""), + 18: dict(name="neck_mpii", id=18, color=[255, 128, 0], type="upper", swap=""), + 19: dict(name="neck_aic", id=19, color=[255, 128, 0], type="upper", swap=""), + 20: dict(name="top_head", id=20, color=[255, 128, 0], type="upper", swap=""), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), # 7: # dict( # link=('left_shoulder', 'right_shoulder'), # id=7, # color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'thorax_mpii'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, 
color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]), - 19: - dict( - link=('right_shoulder', 'thorax_mpii'), - id=19, - color=[51, 153, 255]), - 20: - dict( - link=('thorax_mpii', 'neck_mpii'), - id=20, - color=[51, 153, 255]), - 21: - dict( - link=('thorax_mpii', 'neck_aic'), - id=21, - color=[51, 153, 255]), - 22: - dict( - link=('left_ear', 'top_head'), - id=22, - color=[51, 153, 255]), - 23: - dict( - link=('right_ear', 'top_head'), - id=23, - color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "thorax_mpii"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), + 19: dict(link=("right_shoulder", "thorax_mpii"), id=19, color=[51, 153, 255]), + 20: dict(link=("thorax_mpii", "neck_mpii"), id=20, color=[51, 153, 255]), + 21: dict(link=("thorax_mpii", "neck_aic"), id=21, color=[51, 153, 255]), + 22: dict(link=("left_ear", "top_head"), id=22, color=[51, 153, 255]), + 23: dict(link=("right_ear", "top_head"), id=23, color=[51, 153, 255]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5, 1., 1., 1., 1., 1., 1. 
- ], + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.0, 1.0], sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, - 0.079, 0.079, 0.079, # Thorax and neck has the same as shoulders - 0.035, # Top of head has the same as ears - ]) + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.079, + 0.079, + 0.079, # Thorax and neck have the same sigmas as the shoulders + 0.035, # Top of head has the same sigma as the ears + ], +) diff --git a/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py b/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py new file mode 100644 index 0000000000000000000000000000000000000000..b5e9d1cb5b0104969c8359e072f19183afcf5a86 --- /dev/null +++ b/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII_mergable.py @@ -0,0 +1,154 @@ +dataset_info = dict( + dataset_name="merged_COCO_AIC_MPII_mergable", + paper_info=dict( + author="Miroslav Purkrabek", + title="Merged Pose Estimation Dataset", + container="", + year="2024", + homepage="", + ), + # COCO - 17 keypoints: + # -> nose, left_eye, right_eye, left_ear, right_ear, + # -> left_shoulder, right_shoulder, left_elbow, right_elbow, left_wrist, right_wrist + # -> left_hip, right_hip, left_knee, right_knee, left_ankle, right_ankle + # AIC - 14 keypoints: + # -> right_shoulder, right_elbow, right_wrist, left_shoulder, left_elbow, left_wrist + # -> right_hip, right_knee, right_ankle, left_hip, left_knee, left_ankle + # -> head_top, thorax + # MPII - 16 keypoints: + # -> right_ankle, right_knee, right_hip, left_hip, left_knee, left_ankle + # -> pelvis, thorax, lower_neck, head_top + # -> right_wrist, right_elbow, right_shoulder, left_shoulder, left_elbow, left_wrist + # --> Altogether (COCO: 12+5, AIC: 12+2, MPII: 12+4) = 12+5+2+4 = 23 keypoints + # Merged - 21 keypoints: + # ---- + # Merged - 23 keypoints: + # 0. Nose (COCO 0, AIC -, MPII -) + # 1. Left Eye (COCO 1, AIC -, MPII -) + # 2. Right Eye (COCO 2, AIC -, MPII -) + # 3. Left Ear (COCO 3, AIC -, MPII -) + # 4. Right Ear (COCO 4, AIC -, MPII -) + # 5. Left Shoulder (COCO 5, AIC 3, MPII 13) + # 6. Right Shoulder (COCO 6, AIC 0, MPII 12) + # 7. Left Elbow (COCO 7, AIC 4, MPII 14) + # 8. Right Elbow (COCO 8, AIC 1, MPII 11) + # 9. Left Wrist (COCO 9, AIC 5, MPII 15) + # 10. Right Wrist (COCO 10, AIC 2, MPII 10) + # 11. Left Hip (COCO 11, AIC 6, MPII 3) + # 12. Right Hip (COCO 12, AIC 9, MPII 2) + # 13. Left Knee (COCO 13, AIC 10, MPII 4) + # 14. Right Knee (COCO 14, AIC 7, MPII 1) + # 15. Left Ankle (COCO 15, AIC 11, MPII 5) + # 16. Right Ankle (COCO 16, AIC 8, MPII 0) + # 17. MPII Thorax (COCO -, AIC -, MPII 7) + # 18. MPII Pelvis (COCO -, AIC -, MPII 6) + # 19. MPII Neck (COCO -, AIC -, MPII 8) + # 20. MPII Head Top (COCO -, AIC -, MPII 9) + # 21. AIC Neck (COCO -, AIC 13, MPII -) + # 22.
AIC Head Top (COCO -, AIC 12, MPII -) + keypoint_info={ + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="thorax_mpii", id=17, color=[51, 153, 255], type="upper", swap=""), + 18: dict(name="pelvis_mpii", id=18, color=[51, 153, 255], type="lower", swap=""), + 19: dict(name="neck_mpii", id=19, color=[51, 153, 255], type="upper", swap=""), + 20: dict(name="head_top_mpii", id=20, color=[51, 153, 255], type="upper", swap=""), + 21: dict(name="neck_aic", id=21, color=[51, 153, 255], type="upper", swap=""), + 22: dict(name="head_top_aic", id=22, color=[51, 153, 255], type="upper", swap=""), + }, + skeleton_info={ + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), + }, + joint_weights=[ + 1.0, + 1.0, + 1.0, + 1.0, 
+ 1.0, + 1.0, + 1.0, + 1.2, + 1.2, + 1.5, + 1.5, + 1.0, + 1.0, + 1.2, + 1.2, + 1.5, + 1.5, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + ], + sigmas=[ + # Face - 0.025 + # Shoulders, thorax, neck, top of head - 0.079 + # Elbows - 0.072 + # Wrists - 0.062 + # Hips, pelvis - 0.107 + # Knees - 0.087 + # Ankles - 0.089 + # COCO + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + # Additional + 0.079, + 0.107, + 0.079, + 0.079, + 0.079, + 0.079, + ], +) diff --git a/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py b/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py new file mode 100644 index 0000000000000000000000000000000000000000..d56665dd66d169b53a093857a984de9bae5c8abf --- /dev/null +++ b/mmpose/configs/_base_/datasets/merged_COCO_AIC_MPII_wLimbs.py @@ -0,0 +1,213 @@ +dataset_info = dict( + dataset_name="merged_COCO_AIC_MPII_wLimbs", + paper_info=dict( + author="Miroslav Purkrabek", + title="Merged Pose Estimation Dataset", + container="", + year="2024", + homepage="", + ), + # COCO - 17 keypoints: + # -> nose, left_eye, right_eye, left_ear, right_ear, + # -> left_shoulder, right_shoulder, left_elbow, right_elbow, left_wrist, right_wrist + # -> left_hip, right_hip, left_knee, right_knee, left_ankle, right_ankle + # AIC - 14 keypoints: + # -> right_shoulder, right_elbow, right_wrist, left_shoulder, left_elbow, left_wrist + # -> right_hip, right_knee, right_ankle, left_hip, left_knee, left_ankle + # -> head_top, thorax + # MPII - 16 keypoints: + # -> right_ankle, right_knee, right_hip, left_hip, left_knee, left_ankle + # -> pelvis, thorax, lower_neck, head_top + # -> right_wrist, right_elbow, right_shoulder, left_shoulder, left_elbow, left_wrist + # --> Altogether (COCO: 12+5, AIC: 12+2, MPII: 12+4) = 12+5+2+4 = 23 keypoints + # Merged - 21 keypoints: + # ---- + # Merged - 23 keypoints: + # 0. Nose (COCO 0, AIC -, MPII -) + # 1. Left Eye (COCO 1, AIC -, MPII -) + # 2. Right Eye (COCO 2, AIC -, MPII -) + # 3. Left Ear (COCO 3, AIC -, MPII -) + # 4. Right Ear (COCO 4, AIC -, MPII -) + # 5. Left Shoulder (COCO 5, AIC 3, MPII 13) + # 6. Right Shoulder (COCO 6, AIC 0, MPII 12) + # 7. Left Elbow (COCO 7, AIC 4, MPII 14) + # 8. Right Elbow (COCO 8, AIC 1, MPII 11) + # 9. Left Wrist (COCO 9, AIC 5, MPII 15) + # 10. Right Wrist (COCO 10, AIC 2, MPII 10) + # 11. Left Hip (COCO 11, AIC 6, MPII 3) + # 12. Right Hip (COCO 12, AIC 9, MPII 2) + # 13. Left Knee (COCO 13, AIC 10, MPII 4) + # 14. Right Knee (COCO 14, AIC 7, MPII 1) + # 15. Left Ankle (COCO 15, AIC 11, MPII 5) + # 16. Right Ankle (COCO 16, AIC 8, MPII 0) + # 17. MPII Thorax (COCO -, AIC -, MPII 7) + # 18. MPII Pelvis (COCO -, AIC -, MPII 6) + # 19. MPII Neck (COCO -, AIC -, MPII 8) + # 20. MPII Head Top (COCO -, AIC -, MPII 9) + # 21. AIC Neck (COCO -, AIC 13, MPII -) + # 22.
AIC Head Top (COCO -, AIC 12, MPII -) + keypoint_info={ + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="thorax_mpii", id=17, color=[51, 153, 255], type="upper", swap=""), + 18: dict(name="pelvis_mpii", id=18, color=[51, 153, 255], type="lower", swap=""), + 19: dict(name="neck_mpii", id=19, color=[51, 153, 255], type="upper", swap=""), + 20: dict(name="head_top_mpii", id=20, color=[51, 153, 255], type="upper", swap=""), + 21: dict(name="neck_aic", id=21, color=[51, 153, 255], type="upper", swap=""), + 22: dict(name="head_top_aic", id=22, color=[51, 153, 255], type="upper", swap=""), + ################################################################ + 23: dict(name="left_big_toe", id=23, color=[51, 153, 255], type="lower", swap="right_big_toe"), + 24: dict(name="left_little_toe", id=24, color=[51, 153, 255], type="lower", swap="right_little_toe"), + 25: dict(name="left_heel", id=25, color=[51, 153, 255], type="lower", swap="right_heel"), + 26: dict(name="right_big_toe", id=26, color=[51, 153, 255], type="lower", swap="left_big_toe"), + 27: dict(name="right_little_toe", id=27, color=[51, 153, 255], type="lower", swap="left_little_toe"), + 28: dict(name="right_heel", id=28, color=[51, 153, 255], type="lower", swap="left_heel"), + 29: dict(name="left_thumb_tip", id=29, color=[51, 153, 255], type="upper", swap="right_thumb_tip"), + 30: dict(name="left_index_tip", id=30, color=[51, 153, 255], type="upper", swap="right_index_tip"), + 31: dict(name="left_middle_tip", id=31, color=[51, 153, 255], type="upper", swap="right_middle_tip"), + 32: dict(name="left_ring_tip", id=32, color=[51, 153, 255], type="upper", swap="right_ring_tip"), + 33: dict(name="left_little_tip", id=33, color=[51, 153, 255], type="upper", swap="right_little_tip"), + 34: dict(name="right_thumb_tip", id=34, color=[51, 153, 255], type="upper", swap="left_thumb_tip"), + 35: dict(name="right_index_tip", id=35, color=[51, 153, 255], type="upper", swap="left_index_tip"), + 36: dict(name="right_middle_tip", id=36, color=[51, 153, 255], type="upper", swap="left_middle_tip"), 
+ 37: dict(name="right_ring_tip", id=37, color=[51, 153, 255], type="upper", swap="left_ring_tip"), + 38: dict(name="right_little_tip", id=38, color=[51, 153, 255], type="upper", swap="left_little_tip"), + 39: dict(name="right_mouth_corner", id=39, color=[51, 153, 255], type="upper", swap="left_mouth_corner"), + 40: dict(name="left_mouth_corner", id=40, color=[51, 153, 255], type="upper", swap="right_mouth_corner"), + }, + skeleton_info={ + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), + }, + joint_weights=[ + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.2, + 1.2, + 1.5, + 1.5, + 1.0, + 1.0, + 1.2, + 1.2, + 1.5, + 1.5, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + 1.0, + # Down-weight toes, heels, fingers and mouth corners as the resolution might be low + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.8, + 0.6, + 0.6, + ], + sigmas=[ + # Face - 0.025 + # Shoulders, thorax, neck, top of head - 0.079 + # Elbows - 0.072 + # Wrists - 0.062 + # Hips, pelvis - 0.107 + # Knees - 0.087 + # Ankles - 0.089 + # COCO + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + # Additional + 0.079, + 0.107, + 0.079, + 0.079, + 0.079, + 0.079, + # Toes and heels (taken from COCO-wholebody; values for left feet as the original COCO-wb eval is not symmetric) + 0.068, + 0.066, + 0.066, + 0.068, + 0.066, + 0.066, + # Fingers (taken from COCO-wholebody) + 0.047, + 0.035, + 0.026, + 0.032, + 0.031, + 0.047, + 0.035, + 0.026, + 0.032, + 0.031, + # Mouth corners (taken from COCO-wholebody) + 0.008, + 0.008, + ], +) diff --git a/mmpose/configs/_base_/datasets/mhp.py b/mmpose/configs/_base_/datasets/mhp.py index e16e37c79cb63c4352c48bb4e45602b8408f534b..63eaa3e6d0d3cd5ebba5c2b0e3beae0b5f8910ec 100644 --- a/mmpose/configs/_base_/datasets/mhp.py +++ b/mmpose/configs/_base_/datasets/mhp.py @@ -1,156 +1,48 @@ dataset_info = dict( - dataset_name='mhp', + dataset_name="mhp", paper_info=dict( - author='Zhao, Jian and Li, Jianshu and Cheng, Yu and ' - 'Sim, Terence and Yan, Shuicheng and Feng, Jiashi', - title='Understanding humans in crowded 
scenes: ' - 'Deep nested adversarial learning and a ' - 'new benchmark for multi-human parsing', - container='Proceedings of the 26th ACM ' - 'international conference on Multimedia', - year='2018', - homepage='https://lv-mhp.github.io/dataset', + author="Zhao, Jian and Li, Jianshu and Cheng, Yu and " "Sim, Terence and Yan, Shuicheng and Feng, Jiashi", + title="Understanding humans in crowded scenes: " "Deep nested adversarial learning and a " "new benchmark for multi-human parsing", + container="Proceedings of the 26th ACM " "international conference on Multimedia", + year="2018", + homepage="https://lv-mhp.github.io/dataset", ), keypoint_info={ - 0: - dict( - name='right_ankle', - id=0, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 1: - dict( - name='right_knee', - id=1, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 2: - dict( - name='right_hip', - id=2, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 3: - dict( - name='left_hip', - id=3, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 4: - dict( - name='left_knee', - id=4, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 5: - dict( - name='left_ankle', - id=5, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 6: - dict(name='pelvis', id=6, color=[51, 153, 255], type='lower', swap=''), - 7: - dict(name='thorax', id=7, color=[51, 153, 255], type='upper', swap=''), - 8: - dict( - name='upper_neck', - id=8, - color=[51, 153, 255], - type='upper', - swap=''), - 9: - dict( - name='head_top', id=9, color=[51, 153, 255], type='upper', - swap=''), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='right_elbow', - id=11, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 12: - dict( - name='right_shoulder', - id=12, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 13: - dict( - name='left_shoulder', - id=13, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 14: - dict( - name='left_elbow', - id=14, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 15: - dict( - name='left_wrist', - id=15, - color=[0, 255, 0], - type='upper', - swap='right_wrist') + 0: dict(name="right_ankle", id=0, color=[255, 128, 0], type="lower", swap="left_ankle"), + 1: dict(name="right_knee", id=1, color=[255, 128, 0], type="lower", swap="left_knee"), + 2: dict(name="right_hip", id=2, color=[255, 128, 0], type="lower", swap="left_hip"), + 3: dict(name="left_hip", id=3, color=[0, 255, 0], type="lower", swap="right_hip"), + 4: dict(name="left_knee", id=4, color=[0, 255, 0], type="lower", swap="right_knee"), + 5: dict(name="left_ankle", id=5, color=[0, 255, 0], type="lower", swap="right_ankle"), + 6: dict(name="pelvis", id=6, color=[51, 153, 255], type="lower", swap=""), + 7: dict(name="thorax", id=7, color=[51, 153, 255], type="upper", swap=""), + 8: dict(name="upper_neck", id=8, color=[51, 153, 255], type="upper", swap=""), + 9: dict(name="head_top", id=9, color=[51, 153, 255], type="upper", swap=""), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="right_elbow", id=11, color=[255, 128, 0], type="upper", swap="left_elbow"), + 12: dict(name="right_shoulder", id=12, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 13: dict(name="left_shoulder", id=13, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 14: dict(name="left_elbow", id=14, color=[0, 255, 0], type="upper", swap="right_elbow"), 
+ 15: dict(name="left_wrist", id=15, color=[0, 255, 0], type="upper", swap="right_wrist"), }, skeleton_info={ - 0: - dict(link=('right_ankle', 'right_knee'), id=0, color=[255, 128, 0]), - 1: - dict(link=('right_knee', 'right_hip'), id=1, color=[255, 128, 0]), - 2: - dict(link=('right_hip', 'pelvis'), id=2, color=[255, 128, 0]), - 3: - dict(link=('pelvis', 'left_hip'), id=3, color=[0, 255, 0]), - 4: - dict(link=('left_hip', 'left_knee'), id=4, color=[0, 255, 0]), - 5: - dict(link=('left_knee', 'left_ankle'), id=5, color=[0, 255, 0]), - 6: - dict(link=('pelvis', 'thorax'), id=6, color=[51, 153, 255]), - 7: - dict(link=('thorax', 'upper_neck'), id=7, color=[51, 153, 255]), - 8: - dict(link=('upper_neck', 'head_top'), id=8, color=[51, 153, 255]), - 9: - dict(link=('upper_neck', 'right_shoulder'), id=9, color=[255, 128, 0]), - 10: - dict( - link=('right_shoulder', 'right_elbow'), id=10, color=[255, 128, - 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('upper_neck', 'left_shoulder'), id=12, color=[0, 255, 0]), - 13: - dict(link=('left_shoulder', 'left_elbow'), id=13, color=[0, 255, 0]), - 14: - dict(link=('left_elbow', 'left_wrist'), id=14, color=[0, 255, 0]) + 0: dict(link=("right_ankle", "right_knee"), id=0, color=[255, 128, 0]), + 1: dict(link=("right_knee", "right_hip"), id=1, color=[255, 128, 0]), + 2: dict(link=("right_hip", "pelvis"), id=2, color=[255, 128, 0]), + 3: dict(link=("pelvis", "left_hip"), id=3, color=[0, 255, 0]), + 4: dict(link=("left_hip", "left_knee"), id=4, color=[0, 255, 0]), + 5: dict(link=("left_knee", "left_ankle"), id=5, color=[0, 255, 0]), + 6: dict(link=("pelvis", "thorax"), id=6, color=[51, 153, 255]), + 7: dict(link=("thorax", "upper_neck"), id=7, color=[51, 153, 255]), + 8: dict(link=("upper_neck", "head_top"), id=8, color=[51, 153, 255]), + 9: dict(link=("upper_neck", "right_shoulder"), id=9, color=[255, 128, 0]), + 10: dict(link=("right_shoulder", "right_elbow"), id=10, color=[255, 128, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("upper_neck", "left_shoulder"), id=12, color=[0, 255, 0]), + 13: dict(link=("left_shoulder", "left_elbow"), id=13, color=[0, 255, 0]), + 14: dict(link=("left_elbow", "left_wrist"), id=14, color=[0, 255, 0]), }, - joint_weights=[ - 1.5, 1.2, 1., 1., 1.2, 1.5, 1., 1., 1., 1., 1.5, 1.2, 1., 1., 1.2, 1.5 - ], + joint_weights=[1.5, 1.2, 1.0, 1.0, 1.2, 1.5, 1.0, 1.0, 1.0, 1.0, 1.5, 1.2, 1.0, 1.0, 1.2, 1.5], # Adapted from COCO dataset. 
- sigmas=[ - 0.089, 0.083, 0.107, 0.107, 0.083, 0.089, 0.026, 0.026, 0.026, 0.026, - 0.062, 0.072, 0.179, 0.179, 0.072, 0.062 - ]) + sigmas=[0.089, 0.083, 0.107, 0.107, 0.083, 0.089, 0.026, 0.026, 0.026, 0.026, 0.062, 0.072, 0.179, 0.179, 0.072, 0.062], +) diff --git a/mmpose/configs/_base_/datasets/mpi_inf_3dhp.py b/mmpose/configs/_base_/datasets/mpi_inf_3dhp.py index ffd0a70297b24456ea38566ac205bb585aa47e5d..842ee8419ce80c81b1029b628b618a7ba6d695df 100644 --- a/mmpose/configs/_base_/datasets/mpi_inf_3dhp.py +++ b/mmpose/configs/_base_/datasets/mpi_inf_3dhp.py @@ -1,132 +1,51 @@ dataset_info = dict( - dataset_name='mpi_inf_3dhp', + dataset_name="mpi_inf_3dhp", paper_info=dict( - author='ehta, Dushyant and Rhodin, Helge and Casas, Dan and ' - 'Fua, Pascal and Sotnychenko, Oleksandr and Xu, Weipeng and ' - 'Theobalt, Christian', - title='Monocular 3D Human Pose Estimation In The Wild Using Improved ' - 'CNN Supervision', - container='2017 international conference on 3D vision (3DV)', - year='2017', - homepage='http://gvv.mpi-inf.mpg.de/3dhp-dataset', + author="Mehta, Dushyant and Rhodin, Helge and Casas, Dan and " + "Fua, Pascal and Sotnychenko, Oleksandr and Xu, Weipeng and " + "Theobalt, Christian", + title="Monocular 3D Human Pose Estimation In The Wild Using Improved " "CNN Supervision", + container="2017 international conference on 3D vision (3DV)", + year="2017", + homepage="http://gvv.mpi-inf.mpg.de/3dhp-dataset", ), keypoint_info={ - 0: - dict( - name='head_top', id=0, color=[51, 153, 255], type='upper', - swap=''), - 1: - dict(name='neck', id=1, color=[51, 153, 255], type='upper', swap=''), - 2: - dict( - name='right_shoulder', - id=2, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 3: - dict( - name='right_elbow', - id=3, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 4: - dict( - name='right_wrist', - id=4, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='left_elbow', - id=6, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 7: - dict( - name='left_wrist', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 8: - dict( - name='right_hip', - id=8, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 9: - dict( - name='right_knee', - id=9, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 10: - dict( - name='right_ankle', - id=10, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='left_knee', - id=12, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 13: - dict( - name='left_ankle', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 14: - dict(name='root', id=14, color=[51, 153, 255], type='lower', swap=''), - 15: - dict(name='spine', id=15, color=[51, 153, 255], type='upper', swap=''), - 16: - dict(name='head', id=16, color=[51, 153, 255], type='upper', swap='') + 0: dict(name="head_top", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="neck", id=1, color=[51, 153, 255], type="upper", swap=""), + 2: dict(name="right_shoulder", id=2, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 3: dict(name="right_elbow", id=3, color=[255, 128, 0], type="upper", swap="left_elbow"), + 4: dict(name="right_wrist", id=4, color=[255, 128, 0], type="upper", swap="left_wrist"), + 5:
dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="left_elbow", id=6, color=[0, 255, 0], type="upper", swap="right_elbow"), + 7: dict(name="left_wrist", id=7, color=[0, 255, 0], type="upper", swap="right_wrist"), + 8: dict(name="right_hip", id=8, color=[255, 128, 0], type="lower", swap="left_hip"), + 9: dict(name="right_knee", id=9, color=[255, 128, 0], type="lower", swap="left_knee"), + 10: dict(name="right_ankle", id=10, color=[255, 128, 0], type="lower", swap="left_ankle"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="left_knee", id=12, color=[0, 255, 0], type="lower", swap="right_knee"), + 13: dict(name="left_ankle", id=13, color=[0, 255, 0], type="lower", swap="right_ankle"), + 14: dict(name="root", id=14, color=[51, 153, 255], type="lower", swap=""), + 15: dict(name="spine", id=15, color=[51, 153, 255], type="upper", swap=""), + 16: dict(name="head", id=16, color=[51, 153, 255], type="upper", swap=""), }, skeleton_info={ - 0: dict(link=('neck', 'right_shoulder'), id=0, color=[255, 128, 0]), - 1: dict( - link=('right_shoulder', 'right_elbow'), id=1, color=[255, 128, 0]), - 2: - dict(link=('right_elbow', 'right_wrist'), id=2, color=[255, 128, 0]), - 3: dict(link=('neck', 'left_shoulder'), id=3, color=[0, 255, 0]), - 4: dict(link=('left_shoulder', 'left_elbow'), id=4, color=[0, 255, 0]), - 5: dict(link=('left_elbow', 'left_wrist'), id=5, color=[0, 255, 0]), - 6: dict(link=('root', 'right_hip'), id=6, color=[255, 128, 0]), - 7: dict(link=('right_hip', 'right_knee'), id=7, color=[255, 128, 0]), - 8: dict(link=('right_knee', 'right_ankle'), id=8, color=[255, 128, 0]), - 9: dict(link=('root', 'left_hip'), id=9, color=[0, 255, 0]), - 10: dict(link=('left_hip', 'left_knee'), id=10, color=[0, 255, 0]), - 11: dict(link=('left_knee', 'left_ankle'), id=11, color=[0, 255, 0]), - 12: dict(link=('head_top', 'head'), id=12, color=[51, 153, 255]), - 13: dict(link=('head', 'neck'), id=13, color=[51, 153, 255]), - 14: dict(link=('neck', 'spine'), id=14, color=[51, 153, 255]), - 15: dict(link=('spine', 'root'), id=15, color=[51, 153, 255]) + 0: dict(link=("neck", "right_shoulder"), id=0, color=[255, 128, 0]), + 1: dict(link=("right_shoulder", "right_elbow"), id=1, color=[255, 128, 0]), + 2: dict(link=("right_elbow", "right_wrist"), id=2, color=[255, 128, 0]), + 3: dict(link=("neck", "left_shoulder"), id=3, color=[0, 255, 0]), + 4: dict(link=("left_shoulder", "left_elbow"), id=4, color=[0, 255, 0]), + 5: dict(link=("left_elbow", "left_wrist"), id=5, color=[0, 255, 0]), + 6: dict(link=("root", "right_hip"), id=6, color=[255, 128, 0]), + 7: dict(link=("right_hip", "right_knee"), id=7, color=[255, 128, 0]), + 8: dict(link=("right_knee", "right_ankle"), id=8, color=[255, 128, 0]), + 9: dict(link=("root", "left_hip"), id=9, color=[0, 255, 0]), + 10: dict(link=("left_hip", "left_knee"), id=10, color=[0, 255, 0]), + 11: dict(link=("left_knee", "left_ankle"), id=11, color=[0, 255, 0]), + 12: dict(link=("head_top", "head"), id=12, color=[51, 153, 255]), + 13: dict(link=("head", "neck"), id=13, color=[51, 153, 255]), + 14: dict(link=("neck", "spine"), id=14, color=[51, 153, 255]), + 15: dict(link=("spine", "root"), id=15, color=[51, 153, 255]), }, - joint_weights=[1.] 
* 17, - sigmas=[]) + joint_weights=[1.0] * 17, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/mpii.py b/mmpose/configs/_base_/datasets/mpii.py index 6c2a491c7b58bc3eaa5c0056d3d7184bdd1d1cc7..f6a58b172cf5202b920db03ee0f5a98b100ddfd2 100644 --- a/mmpose/configs/_base_/datasets/mpii.py +++ b/mmpose/configs/_base_/datasets/mpii.py @@ -1,155 +1,48 @@ dataset_info = dict( - dataset_name='mpii', + dataset_name="mpii", paper_info=dict( - author='Mykhaylo Andriluka and Leonid Pishchulin and ' - 'Peter Gehler and Schiele, Bernt', - title='2D Human Pose Estimation: New Benchmark and ' - 'State of the Art Analysis', - container='IEEE Conference on Computer Vision and ' - 'Pattern Recognition (CVPR)', - year='2014', - homepage='http://human-pose.mpi-inf.mpg.de/', + author="Mykhaylo Andriluka and Leonid Pishchulin and " "Peter Gehler and Schiele, Bernt", + title="2D Human Pose Estimation: New Benchmark and " "State of the Art Analysis", + container="IEEE Conference on Computer Vision and " "Pattern Recognition (CVPR)", + year="2014", + homepage="http://human-pose.mpi-inf.mpg.de/", ), keypoint_info={ - 0: - dict( - name='right_ankle', - id=0, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 1: - dict( - name='right_knee', - id=1, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 2: - dict( - name='right_hip', - id=2, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 3: - dict( - name='left_hip', - id=3, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 4: - dict( - name='left_knee', - id=4, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 5: - dict( - name='left_ankle', - id=5, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 6: - dict(name='pelvis', id=6, color=[51, 153, 255], type='lower', swap=''), - 7: - dict(name='thorax', id=7, color=[51, 153, 255], type='upper', swap=''), - 8: - dict( - name='upper_neck', - id=8, - color=[51, 153, 255], - type='upper', - swap=''), - 9: - dict( - name='head_top', id=9, color=[51, 153, 255], type='upper', - swap=''), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='right_elbow', - id=11, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 12: - dict( - name='right_shoulder', - id=12, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 13: - dict( - name='left_shoulder', - id=13, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 14: - dict( - name='left_elbow', - id=14, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 15: - dict( - name='left_wrist', - id=15, - color=[0, 255, 0], - type='upper', - swap='right_wrist') + 0: dict(name="right_ankle", id=0, color=[255, 128, 0], type="lower", swap="left_ankle"), + 1: dict(name="right_knee", id=1, color=[255, 128, 0], type="lower", swap="left_knee"), + 2: dict(name="right_hip", id=2, color=[255, 128, 0], type="lower", swap="left_hip"), + 3: dict(name="left_hip", id=3, color=[0, 255, 0], type="lower", swap="right_hip"), + 4: dict(name="left_knee", id=4, color=[0, 255, 0], type="lower", swap="right_knee"), + 5: dict(name="left_ankle", id=5, color=[0, 255, 0], type="lower", swap="right_ankle"), + 6: dict(name="pelvis", id=6, color=[51, 153, 255], type="lower", swap=""), + 7: dict(name="thorax", id=7, color=[51, 153, 255], type="upper", swap=""), + 8: dict(name="upper_neck", id=8, color=[51, 153, 255], type="upper", swap=""), + 9: dict(name="head_top", id=9, color=[51, 153, 255], type="upper", swap=""), + 10: 
dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="right_elbow", id=11, color=[255, 128, 0], type="upper", swap="left_elbow"), + 12: dict(name="right_shoulder", id=12, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 13: dict(name="left_shoulder", id=13, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 14: dict(name="left_elbow", id=14, color=[0, 255, 0], type="upper", swap="right_elbow"), + 15: dict(name="left_wrist", id=15, color=[0, 255, 0], type="upper", swap="right_wrist"), }, skeleton_info={ - 0: - dict(link=('right_ankle', 'right_knee'), id=0, color=[255, 128, 0]), - 1: - dict(link=('right_knee', 'right_hip'), id=1, color=[255, 128, 0]), - 2: - dict(link=('right_hip', 'pelvis'), id=2, color=[255, 128, 0]), - 3: - dict(link=('pelvis', 'left_hip'), id=3, color=[0, 255, 0]), - 4: - dict(link=('left_hip', 'left_knee'), id=4, color=[0, 255, 0]), - 5: - dict(link=('left_knee', 'left_ankle'), id=5, color=[0, 255, 0]), - 6: - dict(link=('pelvis', 'thorax'), id=6, color=[51, 153, 255]), - 7: - dict(link=('thorax', 'upper_neck'), id=7, color=[51, 153, 255]), - 8: - dict(link=('upper_neck', 'head_top'), id=8, color=[51, 153, 255]), - 9: - dict(link=('upper_neck', 'right_shoulder'), id=9, color=[255, 128, 0]), - 10: - dict( - link=('right_shoulder', 'right_elbow'), id=10, color=[255, 128, - 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('upper_neck', 'left_shoulder'), id=12, color=[0, 255, 0]), - 13: - dict(link=('left_shoulder', 'left_elbow'), id=13, color=[0, 255, 0]), - 14: - dict(link=('left_elbow', 'left_wrist'), id=14, color=[0, 255, 0]) + 0: dict(link=("right_ankle", "right_knee"), id=0, color=[255, 128, 0]), + 1: dict(link=("right_knee", "right_hip"), id=1, color=[255, 128, 0]), + 2: dict(link=("right_hip", "pelvis"), id=2, color=[255, 128, 0]), + 3: dict(link=("pelvis", "left_hip"), id=3, color=[0, 255, 0]), + 4: dict(link=("left_hip", "left_knee"), id=4, color=[0, 255, 0]), + 5: dict(link=("left_knee", "left_ankle"), id=5, color=[0, 255, 0]), + 6: dict(link=("pelvis", "thorax"), id=6, color=[51, 153, 255]), + 7: dict(link=("thorax", "upper_neck"), id=7, color=[51, 153, 255]), + 8: dict(link=("upper_neck", "head_top"), id=8, color=[51, 153, 255]), + 9: dict(link=("upper_neck", "right_shoulder"), id=9, color=[255, 128, 0]), + 10: dict(link=("right_shoulder", "right_elbow"), id=10, color=[255, 128, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("upper_neck", "left_shoulder"), id=12, color=[0, 255, 0]), + 13: dict(link=("left_shoulder", "left_elbow"), id=13, color=[0, 255, 0]), + 14: dict(link=("left_elbow", "left_wrist"), id=14, color=[0, 255, 0]), }, - joint_weights=[ - 1.5, 1.2, 1., 1., 1.2, 1.5, 1., 1., 1., 1., 1.5, 1.2, 1., 1., 1.2, 1.5 - ], + joint_weights=[1.5, 1.2, 1.0, 1.0, 1.2, 1.5, 1.0, 1.0, 1.0, 1.0, 1.5, 1.2, 1.0, 1.0, 1.2, 1.5], # Adapted from COCO dataset. 
- sigmas=[ - 0.089, 0.083, 0.107, 0.107, 0.083, 0.089, 0.026, 0.026, 0.026, 0.026, - 0.062, 0.072, 0.179, 0.179, 0.072, 0.062 - ]) + sigmas=[0.089, 0.083, 0.107, 0.107, 0.083, 0.089, 0.026, 0.026, 0.026, 0.026, 0.062, 0.072, 0.179, 0.179, 0.072, 0.062], +) diff --git a/mmpose/configs/_base_/datasets/mpii_trb.py b/mmpose/configs/_base_/datasets/mpii_trb.py index 73940d4b4827f8e08343c3b517360db788e4820d..d13c598b1e1214f385af133e2c1d9cce954e14f0 100644 --- a/mmpose/configs/_base_/datasets/mpii_trb.py +++ b/mmpose/configs/_base_/datasets/mpii_trb.py @@ -1,380 +1,83 @@ dataset_info = dict( - dataset_name='mpii_trb', + dataset_name="mpii_trb", paper_info=dict( - author='Duan, Haodong and Lin, Kwan-Yee and Jin, Sheng and ' - 'Liu, Wentao and Qian, Chen and Ouyang, Wanli', - title='TRB: A Novel Triplet Representation for ' - 'Understanding 2D Human Body', - container='Proceedings of the IEEE International ' - 'Conference on Computer Vision', - year='2019', - homepage='https://github.com/kennymckormick/' - 'Triplet-Representation-of-human-Body', + author="Duan, Haodong and Lin, Kwan-Yee and Jin, Sheng and " "Liu, Wentao and Qian, Chen and Ouyang, Wanli", + title="TRB: A Novel Triplet Representation for " "Understanding 2D Human Body", + container="Proceedings of the IEEE International " "Conference on Computer Vision", + year="2019", + homepage="https://github.com/kennymckormick/" "Triplet-Representation-of-human-Body", ), keypoint_info={ - 0: - dict( - name='left_shoulder', - id=0, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 1: - dict( - name='right_shoulder', - id=1, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 2: - dict( - name='left_elbow', - id=2, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 3: - dict( - name='right_elbow', - id=3, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 4: - dict( - name='left_wrist', - id=4, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 5: - dict( - name='right_wrist', - id=5, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 6: - dict( - name='left_hip', - id=6, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 7: - dict( - name='right_hip', - id=7, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 8: - dict( - name='left_knee', - id=8, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 9: - dict( - name='right_knee', - id=9, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 10: - dict( - name='left_ankle', - id=10, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 11: - dict( - name='right_ankle', - id=11, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 12: - dict(name='head', id=12, color=[51, 153, 255], type='upper', swap=''), - 13: - dict(name='neck', id=13, color=[51, 153, 255], type='upper', swap=''), - 14: - dict( - name='right_neck', - id=14, - color=[255, 255, 255], - type='upper', - swap='left_neck'), - 15: - dict( - name='left_neck', - id=15, - color=[255, 255, 255], - type='upper', - swap='right_neck'), - 16: - dict( - name='medial_right_shoulder', - id=16, - color=[255, 255, 255], - type='upper', - swap='medial_left_shoulder'), - 17: - dict( - name='lateral_right_shoulder', - id=17, - color=[255, 255, 255], - type='upper', - swap='lateral_left_shoulder'), - 18: - dict( - name='medial_right_bow', - id=18, - color=[255, 255, 255], - type='upper', - swap='medial_left_bow'), - 19: - dict( - name='lateral_right_bow', - id=19, - color=[255, 255, 255], - type='upper', - 
swap='lateral_left_bow'), - 20: - dict( - name='medial_right_wrist', - id=20, - color=[255, 255, 255], - type='upper', - swap='medial_left_wrist'), - 21: - dict( - name='lateral_right_wrist', - id=21, - color=[255, 255, 255], - type='upper', - swap='lateral_left_wrist'), - 22: - dict( - name='medial_left_shoulder', - id=22, - color=[255, 255, 255], - type='upper', - swap='medial_right_shoulder'), - 23: - dict( - name='lateral_left_shoulder', - id=23, - color=[255, 255, 255], - type='upper', - swap='lateral_right_shoulder'), - 24: - dict( - name='medial_left_bow', - id=24, - color=[255, 255, 255], - type='upper', - swap='medial_right_bow'), - 25: - dict( - name='lateral_left_bow', - id=25, - color=[255, 255, 255], - type='upper', - swap='lateral_right_bow'), - 26: - dict( - name='medial_left_wrist', - id=26, - color=[255, 255, 255], - type='upper', - swap='medial_right_wrist'), - 27: - dict( - name='lateral_left_wrist', - id=27, - color=[255, 255, 255], - type='upper', - swap='lateral_right_wrist'), - 28: - dict( - name='medial_right_hip', - id=28, - color=[255, 255, 255], - type='lower', - swap='medial_left_hip'), - 29: - dict( - name='lateral_right_hip', - id=29, - color=[255, 255, 255], - type='lower', - swap='lateral_left_hip'), - 30: - dict( - name='medial_right_knee', - id=30, - color=[255, 255, 255], - type='lower', - swap='medial_left_knee'), - 31: - dict( - name='lateral_right_knee', - id=31, - color=[255, 255, 255], - type='lower', - swap='lateral_left_knee'), - 32: - dict( - name='medial_right_ankle', - id=32, - color=[255, 255, 255], - type='lower', - swap='medial_left_ankle'), - 33: - dict( - name='lateral_right_ankle', - id=33, - color=[255, 255, 255], - type='lower', - swap='lateral_left_ankle'), - 34: - dict( - name='medial_left_hip', - id=34, - color=[255, 255, 255], - type='lower', - swap='medial_right_hip'), - 35: - dict( - name='lateral_left_hip', - id=35, - color=[255, 255, 255], - type='lower', - swap='lateral_right_hip'), - 36: - dict( - name='medial_left_knee', - id=36, - color=[255, 255, 255], - type='lower', - swap='medial_right_knee'), - 37: - dict( - name='lateral_left_knee', - id=37, - color=[255, 255, 255], - type='lower', - swap='lateral_right_knee'), - 38: - dict( - name='medial_left_ankle', - id=38, - color=[255, 255, 255], - type='lower', - swap='medial_right_ankle'), - 39: - dict( - name='lateral_left_ankle', - id=39, - color=[255, 255, 255], - type='lower', - swap='lateral_right_ankle'), + 0: dict(name="left_shoulder", id=0, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 1: dict(name="right_shoulder", id=1, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 2: dict(name="left_elbow", id=2, color=[0, 255, 0], type="upper", swap="right_elbow"), + 3: dict(name="right_elbow", id=3, color=[255, 128, 0], type="upper", swap="left_elbow"), + 4: dict(name="left_wrist", id=4, color=[0, 255, 0], type="upper", swap="right_wrist"), + 5: dict(name="right_wrist", id=5, color=[255, 128, 0], type="upper", swap="left_wrist"), + 6: dict(name="left_hip", id=6, color=[0, 255, 0], type="lower", swap="right_hip"), + 7: dict(name="right_hip", id=7, color=[255, 128, 0], type="lower", swap="left_hip"), + 8: dict(name="left_knee", id=8, color=[0, 255, 0], type="lower", swap="right_knee"), + 9: dict(name="right_knee", id=9, color=[255, 128, 0], type="lower", swap="left_knee"), + 10: dict(name="left_ankle", id=10, color=[0, 255, 0], type="lower", swap="right_ankle"), + 11: dict(name="right_ankle", id=11, color=[255, 128, 0], type="lower", swap="left_ankle"), + 
12: dict(name="head", id=12, color=[51, 153, 255], type="upper", swap=""), + 13: dict(name="neck", id=13, color=[51, 153, 255], type="upper", swap=""), + 14: dict(name="right_neck", id=14, color=[255, 255, 255], type="upper", swap="left_neck"), + 15: dict(name="left_neck", id=15, color=[255, 255, 255], type="upper", swap="right_neck"), + 16: dict(name="medial_right_shoulder", id=16, color=[255, 255, 255], type="upper", swap="medial_left_shoulder"), + 17: dict(name="lateral_right_shoulder", id=17, color=[255, 255, 255], type="upper", swap="lateral_left_shoulder"), + 18: dict(name="medial_right_bow", id=18, color=[255, 255, 255], type="upper", swap="medial_left_bow"), + 19: dict(name="lateral_right_bow", id=19, color=[255, 255, 255], type="upper", swap="lateral_left_bow"), + 20: dict(name="medial_right_wrist", id=20, color=[255, 255, 255], type="upper", swap="medial_left_wrist"), + 21: dict(name="lateral_right_wrist", id=21, color=[255, 255, 255], type="upper", swap="lateral_left_wrist"), + 22: dict(name="medial_left_shoulder", id=22, color=[255, 255, 255], type="upper", swap="medial_right_shoulder"), + 23: dict(name="lateral_left_shoulder", id=23, color=[255, 255, 255], type="upper", swap="lateral_right_shoulder"), + 24: dict(name="medial_left_bow", id=24, color=[255, 255, 255], type="upper", swap="medial_right_bow"), + 25: dict(name="lateral_left_bow", id=25, color=[255, 255, 255], type="upper", swap="lateral_right_bow"), + 26: dict(name="medial_left_wrist", id=26, color=[255, 255, 255], type="upper", swap="medial_right_wrist"), + 27: dict(name="lateral_left_wrist", id=27, color=[255, 255, 255], type="upper", swap="lateral_right_wrist"), + 28: dict(name="medial_right_hip", id=28, color=[255, 255, 255], type="lower", swap="medial_left_hip"), + 29: dict(name="lateral_right_hip", id=29, color=[255, 255, 255], type="lower", swap="lateral_left_hip"), + 30: dict(name="medial_right_knee", id=30, color=[255, 255, 255], type="lower", swap="medial_left_knee"), + 31: dict(name="lateral_right_knee", id=31, color=[255, 255, 255], type="lower", swap="lateral_left_knee"), + 32: dict(name="medial_right_ankle", id=32, color=[255, 255, 255], type="lower", swap="medial_left_ankle"), + 33: dict(name="lateral_right_ankle", id=33, color=[255, 255, 255], type="lower", swap="lateral_left_ankle"), + 34: dict(name="medial_left_hip", id=34, color=[255, 255, 255], type="lower", swap="medial_right_hip"), + 35: dict(name="lateral_left_hip", id=35, color=[255, 255, 255], type="lower", swap="lateral_right_hip"), + 36: dict(name="medial_left_knee", id=36, color=[255, 255, 255], type="lower", swap="medial_right_knee"), + 37: dict(name="lateral_left_knee", id=37, color=[255, 255, 255], type="lower", swap="lateral_right_knee"), + 38: dict(name="medial_left_ankle", id=38, color=[255, 255, 255], type="lower", swap="medial_right_ankle"), + 39: dict(name="lateral_left_ankle", id=39, color=[255, 255, 255], type="lower", swap="lateral_right_ankle"), }, skeleton_info={ - 0: - dict(link=('head', 'neck'), id=0, color=[51, 153, 255]), - 1: - dict(link=('neck', 'left_shoulder'), id=1, color=[51, 153, 255]), - 2: - dict(link=('neck', 'right_shoulder'), id=2, color=[51, 153, 255]), - 3: - dict(link=('left_shoulder', 'left_elbow'), id=3, color=[0, 255, 0]), - 4: - dict( - link=('right_shoulder', 'right_elbow'), id=4, color=[255, 128, 0]), - 5: - dict(link=('left_elbow', 'left_wrist'), id=5, color=[0, 255, 0]), - 6: - dict(link=('right_elbow', 'right_wrist'), id=6, color=[255, 128, 0]), - 7: - dict(link=('left_shoulder', 'left_hip'), id=7, 
color=[51, 153, 255]), - 8: - dict(link=('right_shoulder', 'right_hip'), id=8, color=[51, 153, 255]), - 9: - dict(link=('left_hip', 'right_hip'), id=9, color=[51, 153, 255]), - 10: - dict(link=('left_hip', 'left_knee'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_hip', 'right_knee'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_knee', 'left_ankle'), id=12, color=[0, 255, 0]), - 13: - dict(link=('right_knee', 'right_ankle'), id=13, color=[255, 128, 0]), - 14: - dict(link=('right_neck', 'left_neck'), id=14, color=[255, 255, 255]), - 15: - dict( - link=('medial_right_shoulder', 'lateral_right_shoulder'), - id=15, - color=[255, 255, 255]), - 16: - dict( - link=('medial_right_bow', 'lateral_right_bow'), - id=16, - color=[255, 255, 255]), - 17: - dict( - link=('medial_right_wrist', 'lateral_right_wrist'), - id=17, - color=[255, 255, 255]), - 18: - dict( - link=('medial_left_shoulder', 'lateral_left_shoulder'), - id=18, - color=[255, 255, 255]), - 19: - dict( - link=('medial_left_bow', 'lateral_left_bow'), - id=19, - color=[255, 255, 255]), - 20: - dict( - link=('medial_left_wrist', 'lateral_left_wrist'), - id=20, - color=[255, 255, 255]), - 21: - dict( - link=('medial_right_hip', 'lateral_right_hip'), - id=21, - color=[255, 255, 255]), - 22: - dict( - link=('medial_right_knee', 'lateral_right_knee'), - id=22, - color=[255, 255, 255]), - 23: - dict( - link=('medial_right_ankle', 'lateral_right_ankle'), - id=23, - color=[255, 255, 255]), - 24: - dict( - link=('medial_left_hip', 'lateral_left_hip'), - id=24, - color=[255, 255, 255]), - 25: - dict( - link=('medial_left_knee', 'lateral_left_knee'), - id=25, - color=[255, 255, 255]), - 26: - dict( - link=('medial_left_ankle', 'lateral_left_ankle'), - id=26, - color=[255, 255, 255]) + 0: dict(link=("head", "neck"), id=0, color=[51, 153, 255]), + 1: dict(link=("neck", "left_shoulder"), id=1, color=[51, 153, 255]), + 2: dict(link=("neck", "right_shoulder"), id=2, color=[51, 153, 255]), + 3: dict(link=("left_shoulder", "left_elbow"), id=3, color=[0, 255, 0]), + 4: dict(link=("right_shoulder", "right_elbow"), id=4, color=[255, 128, 0]), + 5: dict(link=("left_elbow", "left_wrist"), id=5, color=[0, 255, 0]), + 6: dict(link=("right_elbow", "right_wrist"), id=6, color=[255, 128, 0]), + 7: dict(link=("left_shoulder", "left_hip"), id=7, color=[51, 153, 255]), + 8: dict(link=("right_shoulder", "right_hip"), id=8, color=[51, 153, 255]), + 9: dict(link=("left_hip", "right_hip"), id=9, color=[51, 153, 255]), + 10: dict(link=("left_hip", "left_knee"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_hip", "right_knee"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_knee", "left_ankle"), id=12, color=[0, 255, 0]), + 13: dict(link=("right_knee", "right_ankle"), id=13, color=[255, 128, 0]), + 14: dict(link=("right_neck", "left_neck"), id=14, color=[255, 255, 255]), + 15: dict(link=("medial_right_shoulder", "lateral_right_shoulder"), id=15, color=[255, 255, 255]), + 16: dict(link=("medial_right_bow", "lateral_right_bow"), id=16, color=[255, 255, 255]), + 17: dict(link=("medial_right_wrist", "lateral_right_wrist"), id=17, color=[255, 255, 255]), + 18: dict(link=("medial_left_shoulder", "lateral_left_shoulder"), id=18, color=[255, 255, 255]), + 19: dict(link=("medial_left_bow", "lateral_left_bow"), id=19, color=[255, 255, 255]), + 20: dict(link=("medial_left_wrist", "lateral_left_wrist"), id=20, color=[255, 255, 255]), + 21: dict(link=("medial_right_hip", "lateral_right_hip"), id=21, color=[255, 255, 255]), + 22: dict(link=("medial_right_knee", 
"lateral_right_knee"), id=22, color=[255, 255, 255]), + 23: dict(link=("medial_right_ankle", "lateral_right_ankle"), id=23, color=[255, 255, 255]), + 24: dict(link=("medial_left_hip", "lateral_left_hip"), id=24, color=[255, 255, 255]), + 25: dict(link=("medial_left_knee", "lateral_left_knee"), id=25, color=[255, 255, 255]), + 26: dict(link=("medial_left_ankle", "lateral_left_ankle"), id=26, color=[255, 255, 255]), }, - joint_weights=[1.] * 40, - sigmas=[]) + joint_weights=[1.0] * 40, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/ochuman.py b/mmpose/configs/_base_/datasets/ochuman.py index 2ef20838fe583fde133a97e688d30e91ae562746..7a8f791dedc59877c4310d70994cd9cf5b7d529c 100644 --- a/mmpose/configs/_base_/datasets/ochuman.py +++ b/mmpose/configs/_base_/datasets/ochuman.py @@ -1,181 +1,54 @@ dataset_info = dict( - dataset_name='ochuman', + dataset_name="ochuman", paper_info=dict( - author='Zhang, Song-Hai and Li, Ruilong and Dong, Xin and ' - 'Rosin, Paul and Cai, Zixi and Han, Xi and ' - 'Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min', - title='Pose2seg: Detection free human instance segmentation', - container='Proceedings of the IEEE conference on computer ' - 'vision and pattern recognition', - year='2019', - homepage='https://github.com/liruilong940607/OCHumanApi', + author="Zhang, Song-Hai and Li, Ruilong and Dong, Xin and " + "Rosin, Paul and Cai, Zixi and Han, Xi and " + "Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min", + title="Pose2seg: Detection free human instance segmentation", + container="Proceedings of the IEEE conference on computer " "vision and pattern recognition", + year="2019", + homepage="https://github.com/liruilong940607/OCHumanApi", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle') + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: 
dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), - 4: - dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'right_shoulder'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]) + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), + 4: dict(link=("left_hip", "right_hip"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "right_shoulder"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, 
color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5 - ], - sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089 - ]) + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5], + sigmas=[0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089], +) diff --git a/mmpose/configs/_base_/datasets/onehand10k.py b/mmpose/configs/_base_/datasets/onehand10k.py index 016770f14f3075dfa7d59389524a0c11a4feb802..59f4ad307f71ff7dd755ee70054777d4401d1c05 100644 --- a/mmpose/configs/_base_/datasets/onehand10k.py +++ b/mmpose/configs/_base_/datasets/onehand10k.py @@ -1,142 +1,57 @@ dataset_info = dict( - dataset_name='onehand10k', + dataset_name="onehand10k", paper_info=dict( - author='Wang, Yangang and Peng, Cong and Liu, Yebin', - title='Mask-pose cascaded cnn for 2d hand pose estimation ' - 'from single color image', - container='IEEE Transactions on Circuits and Systems ' - 'for Video Technology', - year='2018', - homepage='https://www.yangangwang.com/papers/WANG-MCC-2018-10.html', + author="Wang, Yangang and Peng, Cong and Liu, Yebin", + title="Mask-pose cascaded cnn for 2d hand pose estimation " "from single color image", + container="IEEE Transactions on Circuits and Systems " "for Video Technology", + year="2018", + homepage="https://www.yangangwang.com/papers/WANG-MCC-2018-10.html", ), keypoint_info={ - 0: - dict(name='wrist', id=0, color=[255, 255, 255], type='', swap=''), - 1: - dict(name='thumb1', id=1, color=[255, 128, 0], type='', swap=''), - 2: - dict(name='thumb2', id=2, color=[255, 128, 0], type='', swap=''), - 3: - dict(name='thumb3', id=3, color=[255, 128, 0], type='', swap=''), - 4: - dict(name='thumb4', id=4, color=[255, 128, 0], type='', swap=''), - 5: - dict( - name='forefinger1', id=5, color=[255, 153, 255], type='', swap=''), - 6: - dict( - name='forefinger2', id=6, color=[255, 153, 255], type='', swap=''), - 7: - dict( - name='forefinger3', id=7, color=[255, 153, 255], type='', swap=''), - 8: - dict( - name='forefinger4', id=8, color=[255, 153, 255], type='', swap=''), - 9: - dict( - name='middle_finger1', - id=9, - color=[102, 178, 255], - type='', - swap=''), - 10: - dict( - name='middle_finger2', - id=10, - color=[102, 178, 255], - type='', - swap=''), - 11: - dict( - name='middle_finger3', - id=11, - color=[102, 178, 255], - type='', - swap=''), - 12: - dict( - name='middle_finger4', - id=12, - color=[102, 178, 255], - type='', - swap=''), - 13: - dict( - name='ring_finger1', id=13, color=[255, 51, 51], type='', swap=''), - 14: - dict( - name='ring_finger2', id=14, color=[255, 51, 51], type='', swap=''), - 
15: - dict( - name='ring_finger3', id=15, color=[255, 51, 51], type='', swap=''), - 16: - dict( - name='ring_finger4', id=16, color=[255, 51, 51], type='', swap=''), - 17: - dict(name='pinky_finger1', id=17, color=[0, 255, 0], type='', swap=''), - 18: - dict(name='pinky_finger2', id=18, color=[0, 255, 0], type='', swap=''), - 19: - dict(name='pinky_finger3', id=19, color=[0, 255, 0], type='', swap=''), - 20: - dict(name='pinky_finger4', id=20, color=[0, 255, 0], type='', swap='') + 0: dict(name="wrist", id=0, color=[255, 255, 255], type="", swap=""), + 1: dict(name="thumb1", id=1, color=[255, 128, 0], type="", swap=""), + 2: dict(name="thumb2", id=2, color=[255, 128, 0], type="", swap=""), + 3: dict(name="thumb3", id=3, color=[255, 128, 0], type="", swap=""), + 4: dict(name="thumb4", id=4, color=[255, 128, 0], type="", swap=""), + 5: dict(name="forefinger1", id=5, color=[255, 153, 255], type="", swap=""), + 6: dict(name="forefinger2", id=6, color=[255, 153, 255], type="", swap=""), + 7: dict(name="forefinger3", id=7, color=[255, 153, 255], type="", swap=""), + 8: dict(name="forefinger4", id=8, color=[255, 153, 255], type="", swap=""), + 9: dict(name="middle_finger1", id=9, color=[102, 178, 255], type="", swap=""), + 10: dict(name="middle_finger2", id=10, color=[102, 178, 255], type="", swap=""), + 11: dict(name="middle_finger3", id=11, color=[102, 178, 255], type="", swap=""), + 12: dict(name="middle_finger4", id=12, color=[102, 178, 255], type="", swap=""), + 13: dict(name="ring_finger1", id=13, color=[255, 51, 51], type="", swap=""), + 14: dict(name="ring_finger2", id=14, color=[255, 51, 51], type="", swap=""), + 15: dict(name="ring_finger3", id=15, color=[255, 51, 51], type="", swap=""), + 16: dict(name="ring_finger4", id=16, color=[255, 51, 51], type="", swap=""), + 17: dict(name="pinky_finger1", id=17, color=[0, 255, 0], type="", swap=""), + 18: dict(name="pinky_finger2", id=18, color=[0, 255, 0], type="", swap=""), + 19: dict(name="pinky_finger3", id=19, color=[0, 255, 0], type="", swap=""), + 20: dict(name="pinky_finger4", id=20, color=[0, 255, 0], type="", swap=""), }, skeleton_info={ - 0: - dict(link=('wrist', 'thumb1'), id=0, color=[255, 128, 0]), - 1: - dict(link=('thumb1', 'thumb2'), id=1, color=[255, 128, 0]), - 2: - dict(link=('thumb2', 'thumb3'), id=2, color=[255, 128, 0]), - 3: - dict(link=('thumb3', 'thumb4'), id=3, color=[255, 128, 0]), - 4: - dict(link=('wrist', 'forefinger1'), id=4, color=[255, 153, 255]), - 5: - dict(link=('forefinger1', 'forefinger2'), id=5, color=[255, 153, 255]), - 6: - dict(link=('forefinger2', 'forefinger3'), id=6, color=[255, 153, 255]), - 7: - dict(link=('forefinger3', 'forefinger4'), id=7, color=[255, 153, 255]), - 8: - dict(link=('wrist', 'middle_finger1'), id=8, color=[102, 178, 255]), - 9: - dict( - link=('middle_finger1', 'middle_finger2'), - id=9, - color=[102, 178, 255]), - 10: - dict( - link=('middle_finger2', 'middle_finger3'), - id=10, - color=[102, 178, 255]), - 11: - dict( - link=('middle_finger3', 'middle_finger4'), - id=11, - color=[102, 178, 255]), - 12: - dict(link=('wrist', 'ring_finger1'), id=12, color=[255, 51, 51]), - 13: - dict( - link=('ring_finger1', 'ring_finger2'), id=13, color=[255, 51, 51]), - 14: - dict( - link=('ring_finger2', 'ring_finger3'), id=14, color=[255, 51, 51]), - 15: - dict( - link=('ring_finger3', 'ring_finger4'), id=15, color=[255, 51, 51]), - 16: - dict(link=('wrist', 'pinky_finger1'), id=16, color=[0, 255, 0]), - 17: - dict( - link=('pinky_finger1', 'pinky_finger2'), id=17, color=[0, 255, 0]), - 18: - 
dict( - link=('pinky_finger2', 'pinky_finger3'), id=18, color=[0, 255, 0]), - 19: - dict( - link=('pinky_finger3', 'pinky_finger4'), id=19, color=[0, 255, 0]) + 0: dict(link=("wrist", "thumb1"), id=0, color=[255, 128, 0]), + 1: dict(link=("thumb1", "thumb2"), id=1, color=[255, 128, 0]), + 2: dict(link=("thumb2", "thumb3"), id=2, color=[255, 128, 0]), + 3: dict(link=("thumb3", "thumb4"), id=3, color=[255, 128, 0]), + 4: dict(link=("wrist", "forefinger1"), id=4, color=[255, 153, 255]), + 5: dict(link=("forefinger1", "forefinger2"), id=5, color=[255, 153, 255]), + 6: dict(link=("forefinger2", "forefinger3"), id=6, color=[255, 153, 255]), + 7: dict(link=("forefinger3", "forefinger4"), id=7, color=[255, 153, 255]), + 8: dict(link=("wrist", "middle_finger1"), id=8, color=[102, 178, 255]), + 9: dict(link=("middle_finger1", "middle_finger2"), id=9, color=[102, 178, 255]), + 10: dict(link=("middle_finger2", "middle_finger3"), id=10, color=[102, 178, 255]), + 11: dict(link=("middle_finger3", "middle_finger4"), id=11, color=[102, 178, 255]), + 12: dict(link=("wrist", "ring_finger1"), id=12, color=[255, 51, 51]), + 13: dict(link=("ring_finger1", "ring_finger2"), id=13, color=[255, 51, 51]), + 14: dict(link=("ring_finger2", "ring_finger3"), id=14, color=[255, 51, 51]), + 15: dict(link=("ring_finger3", "ring_finger4"), id=15, color=[255, 51, 51]), + 16: dict(link=("wrist", "pinky_finger1"), id=16, color=[0, 255, 0]), + 17: dict(link=("pinky_finger1", "pinky_finger2"), id=17, color=[0, 255, 0]), + 18: dict(link=("pinky_finger2", "pinky_finger3"), id=18, color=[0, 255, 0]), + 19: dict(link=("pinky_finger3", "pinky_finger4"), id=19, color=[0, 255, 0]), }, - joint_weights=[1.] * 21, - sigmas=[]) + joint_weights=[1.0] * 21, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/panoptic_body3d.py b/mmpose/configs/_base_/datasets/panoptic_body3d.py index e3b19ac462415a840ca2e0b9e214bdb35d91b5e4..e021382dded584f893f90bf4e44ca77e2b2807b3 100644 --- a/mmpose/configs/_base_/datasets/panoptic_body3d.py +++ b/mmpose/configs/_base_/datasets/panoptic_body3d.py @@ -1,160 +1,72 @@ dataset_info = dict( - dataset_name='panoptic_pose_3d', + dataset_name="panoptic_pose_3d", paper_info=dict( - author='Joo, Hanbyul and Simon, Tomas and Li, Xulong' - 'and Liu, Hao and Tan, Lei and Gui, Lin and Banerjee, Sean' - 'and Godisart, Timothy and Nabbe, Bart and Matthews, Iain' - 'and Kanade, Takeo and Nobuhara, Shohei and Sheikh, Yaser', - title='Panoptic Studio: A Massively Multiview System ' - 'for Interaction Motion Capture', - container='IEEE Transactions on Pattern Analysis' - ' and Machine Intelligence', - year='2017', - homepage='http://domedb.perception.cs.cmu.edu', + author="Joo, Hanbyul and Simon, Tomas and Li, Xulong " + "and Liu, Hao and Tan, Lei and Gui, Lin and Banerjee, Sean " + "and Godisart, Timothy and Nabbe, Bart and Matthews, Iain " + "and Kanade, Takeo and Nobuhara, Shohei and Sheikh, Yaser", + title="Panoptic Studio: A Massively Multiview System " "for Interaction Motion Capture", + container="IEEE Transactions on Pattern Analysis" " and Machine Intelligence", + year="2017", + homepage="http://domedb.perception.cs.cmu.edu", ), keypoint_info={ - 0: - dict(name='neck', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict(name='nose', id=1, color=[51, 153, 255], type='upper', swap=''), - 2: - dict(name='mid_hip', id=2, color=[0, 255, 0], type='lower', swap=''), - 3: - dict( - name='left_shoulder', - id=3, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 4: - dict( - name='left_elbow', -
id=4, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 5: - dict( - name='left_wrist', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 6: - dict( - name='left_hip', - id=6, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 7: - dict( - name='left_knee', - id=7, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 8: - dict( - name='left_ankle', - id=8, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 9: - dict( - name='right_shoulder', - id=9, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 10: - dict( - name='right_elbow', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 11: - dict( - name='right_wrist', - id=11, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='right_knee', - id=13, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 14: - dict( - name='right_ankle', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 15: - dict( - name='left_eye', - id=15, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 16: - dict( - name='left_ear', - id=16, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 17: - dict( - name='right_eye', - id=17, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 18: - dict( - name='right_ear', - id=18, - color=[51, 153, 255], - type='upper', - swap='left_ear') + 0: dict(name="neck", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="nose", id=1, color=[51, 153, 255], type="upper", swap=""), + 2: dict(name="mid_hip", id=2, color=[0, 255, 0], type="lower", swap=""), + 3: dict(name="left_shoulder", id=3, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 4: dict(name="left_elbow", id=4, color=[0, 255, 0], type="upper", swap="right_elbow"), + 5: dict(name="left_wrist", id=5, color=[0, 255, 0], type="upper", swap="right_wrist"), + 6: dict(name="left_hip", id=6, color=[0, 255, 0], type="lower", swap="right_hip"), + 7: dict(name="left_knee", id=7, color=[0, 255, 0], type="lower", swap="right_knee"), + 8: dict(name="left_ankle", id=8, color=[0, 255, 0], type="lower", swap="right_ankle"), + 9: dict(name="right_shoulder", id=9, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 10: dict(name="right_elbow", id=10, color=[255, 128, 0], type="upper", swap="left_elbow"), + 11: dict(name="right_wrist", id=11, color=[255, 128, 0], type="upper", swap="left_wrist"), + 12: dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="right_knee", id=13, color=[255, 128, 0], type="lower", swap="left_knee"), + 14: dict(name="right_ankle", id=14, color=[255, 128, 0], type="lower", swap="left_ankle"), + 15: dict(name="left_eye", id=15, color=[51, 153, 255], type="upper", swap="right_eye"), + 16: dict(name="left_ear", id=16, color=[51, 153, 255], type="upper", swap="right_ear"), + 17: dict(name="right_eye", id=17, color=[51, 153, 255], type="upper", swap="left_eye"), + 18: dict(name="right_ear", id=18, color=[51, 153, 255], type="upper", swap="left_ear"), }, skeleton_info={ - 0: dict(link=('nose', 'neck'), id=0, color=[51, 153, 255]), - 1: dict(link=('neck', 'left_shoulder'), id=1, color=[0, 255, 0]), - 2: dict(link=('neck', 'right_shoulder'), id=2, color=[255, 128, 0]), - 3: dict(link=('left_shoulder', 'left_elbow'), id=3, color=[0, 255, 0]), - 4: dict( - link=('right_shoulder', 'right_elbow'), id=4, color=[255, 
128, 0]), - 5: dict(link=('left_elbow', 'left_wrist'), id=5, color=[0, 255, 0]), - 6: - dict(link=('right_elbow', 'right_wrist'), id=6, color=[255, 128, 0]), - 7: dict(link=('left_ankle', 'left_knee'), id=7, color=[0, 255, 0]), - 8: dict(link=('left_knee', 'left_hip'), id=8, color=[0, 255, 0]), - 9: dict(link=('right_ankle', 'right_knee'), id=9, color=[255, 128, 0]), - 10: dict(link=('right_knee', 'right_hip'), id=10, color=[255, 128, 0]), - 11: dict(link=('mid_hip', 'left_hip'), id=11, color=[0, 255, 0]), - 12: dict(link=('mid_hip', 'right_hip'), id=12, color=[255, 128, 0]), - 13: dict(link=('mid_hip', 'neck'), id=13, color=[51, 153, 255]), + 0: dict(link=("nose", "neck"), id=0, color=[51, 153, 255]), + 1: dict(link=("neck", "left_shoulder"), id=1, color=[0, 255, 0]), + 2: dict(link=("neck", "right_shoulder"), id=2, color=[255, 128, 0]), + 3: dict(link=("left_shoulder", "left_elbow"), id=3, color=[0, 255, 0]), + 4: dict(link=("right_shoulder", "right_elbow"), id=4, color=[255, 128, 0]), + 5: dict(link=("left_elbow", "left_wrist"), id=5, color=[0, 255, 0]), + 6: dict(link=("right_elbow", "right_wrist"), id=6, color=[255, 128, 0]), + 7: dict(link=("left_ankle", "left_knee"), id=7, color=[0, 255, 0]), + 8: dict(link=("left_knee", "left_hip"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_ankle", "right_knee"), id=9, color=[255, 128, 0]), + 10: dict(link=("right_knee", "right_hip"), id=10, color=[255, 128, 0]), + 11: dict(link=("mid_hip", "left_hip"), id=11, color=[0, 255, 0]), + 12: dict(link=("mid_hip", "right_hip"), id=12, color=[255, 128, 0]), + 13: dict(link=("mid_hip", "neck"), id=13, color=[51, 153, 255]), }, - joint_weights=[ - 1.0, 1.0, 1.0, 1.0, 1.2, 1.5, 1.0, 1.2, 1.5, 1.0, 1.2, 1.5, 1.0, 1.2, - 1.5, 1.0, 1.0, 1.0, 1.0 - ], + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.2, 1.5, 1.0, 1.2, 1.5, 1.0, 1.2, 1.5, 1.0, 1.2, 1.5, 1.0, 1.0, 1.0, 1.0], sigmas=[ - 0.026, 0.026, 0.107, 0.079, 0.072, 0.062, 0.107, 0.087, 0.089, 0.079, - 0.072, 0.062, 0.107, 0.087, 0.089, 0.025, 0.035, 0.025, 0.035 - ]) + 0.026, + 0.026, + 0.107, + 0.079, + 0.072, + 0.062, + 0.107, + 0.087, + 0.089, + 0.079, + 0.072, + 0.062, + 0.107, + 0.087, + 0.089, + 0.025, + 0.035, + 0.025, + 0.035, + ], +) diff --git a/mmpose/configs/_base_/datasets/panoptic_hand2d.py b/mmpose/configs/_base_/datasets/panoptic_hand2d.py index 7a65731ba87b155beb1b40591fd9acb232c2afc6..3ef157ab2a58a9898ff0e557864ce86e2d471510 100644 --- a/mmpose/configs/_base_/datasets/panoptic_hand2d.py +++ b/mmpose/configs/_base_/datasets/panoptic_hand2d.py @@ -1,143 +1,57 @@ dataset_info = dict( - dataset_name='panoptic_hand2d', + dataset_name="panoptic_hand2d", paper_info=dict( - author='Simon, Tomas and Joo, Hanbyul and ' - 'Matthews, Iain and Sheikh, Yaser', - title='Hand keypoint detection in single images using ' - 'multiview bootstrapping', - container='Proceedings of the IEEE conference on ' - 'Computer Vision and Pattern Recognition', - year='2017', - homepage='http://domedb.perception.cs.cmu.edu/handdb.html', + author="Simon, Tomas and Joo, Hanbyul and " "Matthews, Iain and Sheikh, Yaser", + title="Hand keypoint detection in single images using " "multiview bootstrapping", + container="Proceedings of the IEEE conference on " "Computer Vision and Pattern Recognition", + year="2017", + homepage="http://domedb.perception.cs.cmu.edu/handdb.html", ), keypoint_info={ - 0: - dict(name='wrist', id=0, color=[255, 255, 255], type='', swap=''), - 1: - dict(name='thumb1', id=1, color=[255, 128, 0], type='', swap=''), - 2: - dict(name='thumb2', id=2, color=[255, 
128, 0], type='', swap=''), - 3: - dict(name='thumb3', id=3, color=[255, 128, 0], type='', swap=''), - 4: - dict(name='thumb4', id=4, color=[255, 128, 0], type='', swap=''), - 5: - dict( - name='forefinger1', id=5, color=[255, 153, 255], type='', swap=''), - 6: - dict( - name='forefinger2', id=6, color=[255, 153, 255], type='', swap=''), - 7: - dict( - name='forefinger3', id=7, color=[255, 153, 255], type='', swap=''), - 8: - dict( - name='forefinger4', id=8, color=[255, 153, 255], type='', swap=''), - 9: - dict( - name='middle_finger1', - id=9, - color=[102, 178, 255], - type='', - swap=''), - 10: - dict( - name='middle_finger2', - id=10, - color=[102, 178, 255], - type='', - swap=''), - 11: - dict( - name='middle_finger3', - id=11, - color=[102, 178, 255], - type='', - swap=''), - 12: - dict( - name='middle_finger4', - id=12, - color=[102, 178, 255], - type='', - swap=''), - 13: - dict( - name='ring_finger1', id=13, color=[255, 51, 51], type='', swap=''), - 14: - dict( - name='ring_finger2', id=14, color=[255, 51, 51], type='', swap=''), - 15: - dict( - name='ring_finger3', id=15, color=[255, 51, 51], type='', swap=''), - 16: - dict( - name='ring_finger4', id=16, color=[255, 51, 51], type='', swap=''), - 17: - dict(name='pinky_finger1', id=17, color=[0, 255, 0], type='', swap=''), - 18: - dict(name='pinky_finger2', id=18, color=[0, 255, 0], type='', swap=''), - 19: - dict(name='pinky_finger3', id=19, color=[0, 255, 0], type='', swap=''), - 20: - dict(name='pinky_finger4', id=20, color=[0, 255, 0], type='', swap='') + 0: dict(name="wrist", id=0, color=[255, 255, 255], type="", swap=""), + 1: dict(name="thumb1", id=1, color=[255, 128, 0], type="", swap=""), + 2: dict(name="thumb2", id=2, color=[255, 128, 0], type="", swap=""), + 3: dict(name="thumb3", id=3, color=[255, 128, 0], type="", swap=""), + 4: dict(name="thumb4", id=4, color=[255, 128, 0], type="", swap=""), + 5: dict(name="forefinger1", id=5, color=[255, 153, 255], type="", swap=""), + 6: dict(name="forefinger2", id=6, color=[255, 153, 255], type="", swap=""), + 7: dict(name="forefinger3", id=7, color=[255, 153, 255], type="", swap=""), + 8: dict(name="forefinger4", id=8, color=[255, 153, 255], type="", swap=""), + 9: dict(name="middle_finger1", id=9, color=[102, 178, 255], type="", swap=""), + 10: dict(name="middle_finger2", id=10, color=[102, 178, 255], type="", swap=""), + 11: dict(name="middle_finger3", id=11, color=[102, 178, 255], type="", swap=""), + 12: dict(name="middle_finger4", id=12, color=[102, 178, 255], type="", swap=""), + 13: dict(name="ring_finger1", id=13, color=[255, 51, 51], type="", swap=""), + 14: dict(name="ring_finger2", id=14, color=[255, 51, 51], type="", swap=""), + 15: dict(name="ring_finger3", id=15, color=[255, 51, 51], type="", swap=""), + 16: dict(name="ring_finger4", id=16, color=[255, 51, 51], type="", swap=""), + 17: dict(name="pinky_finger1", id=17, color=[0, 255, 0], type="", swap=""), + 18: dict(name="pinky_finger2", id=18, color=[0, 255, 0], type="", swap=""), + 19: dict(name="pinky_finger3", id=19, color=[0, 255, 0], type="", swap=""), + 20: dict(name="pinky_finger4", id=20, color=[0, 255, 0], type="", swap=""), }, skeleton_info={ - 0: - dict(link=('wrist', 'thumb1'), id=0, color=[255, 128, 0]), - 1: - dict(link=('thumb1', 'thumb2'), id=1, color=[255, 128, 0]), - 2: - dict(link=('thumb2', 'thumb3'), id=2, color=[255, 128, 0]), - 3: - dict(link=('thumb3', 'thumb4'), id=3, color=[255, 128, 0]), - 4: - dict(link=('wrist', 'forefinger1'), id=4, color=[255, 153, 255]), - 5: - 
dict(link=('forefinger1', 'forefinger2'), id=5, color=[255, 153, 255]), - 6: - dict(link=('forefinger2', 'forefinger3'), id=6, color=[255, 153, 255]), - 7: - dict(link=('forefinger3', 'forefinger4'), id=7, color=[255, 153, 255]), - 8: - dict(link=('wrist', 'middle_finger1'), id=8, color=[102, 178, 255]), - 9: - dict( - link=('middle_finger1', 'middle_finger2'), - id=9, - color=[102, 178, 255]), - 10: - dict( - link=('middle_finger2', 'middle_finger3'), - id=10, - color=[102, 178, 255]), - 11: - dict( - link=('middle_finger3', 'middle_finger4'), - id=11, - color=[102, 178, 255]), - 12: - dict(link=('wrist', 'ring_finger1'), id=12, color=[255, 51, 51]), - 13: - dict( - link=('ring_finger1', 'ring_finger2'), id=13, color=[255, 51, 51]), - 14: - dict( - link=('ring_finger2', 'ring_finger3'), id=14, color=[255, 51, 51]), - 15: - dict( - link=('ring_finger3', 'ring_finger4'), id=15, color=[255, 51, 51]), - 16: - dict(link=('wrist', 'pinky_finger1'), id=16, color=[0, 255, 0]), - 17: - dict( - link=('pinky_finger1', 'pinky_finger2'), id=17, color=[0, 255, 0]), - 18: - dict( - link=('pinky_finger2', 'pinky_finger3'), id=18, color=[0, 255, 0]), - 19: - dict( - link=('pinky_finger3', 'pinky_finger4'), id=19, color=[0, 255, 0]) + 0: dict(link=("wrist", "thumb1"), id=0, color=[255, 128, 0]), + 1: dict(link=("thumb1", "thumb2"), id=1, color=[255, 128, 0]), + 2: dict(link=("thumb2", "thumb3"), id=2, color=[255, 128, 0]), + 3: dict(link=("thumb3", "thumb4"), id=3, color=[255, 128, 0]), + 4: dict(link=("wrist", "forefinger1"), id=4, color=[255, 153, 255]), + 5: dict(link=("forefinger1", "forefinger2"), id=5, color=[255, 153, 255]), + 6: dict(link=("forefinger2", "forefinger3"), id=6, color=[255, 153, 255]), + 7: dict(link=("forefinger3", "forefinger4"), id=7, color=[255, 153, 255]), + 8: dict(link=("wrist", "middle_finger1"), id=8, color=[102, 178, 255]), + 9: dict(link=("middle_finger1", "middle_finger2"), id=9, color=[102, 178, 255]), + 10: dict(link=("middle_finger2", "middle_finger3"), id=10, color=[102, 178, 255]), + 11: dict(link=("middle_finger3", "middle_finger4"), id=11, color=[102, 178, 255]), + 12: dict(link=("wrist", "ring_finger1"), id=12, color=[255, 51, 51]), + 13: dict(link=("ring_finger1", "ring_finger2"), id=13, color=[255, 51, 51]), + 14: dict(link=("ring_finger2", "ring_finger3"), id=14, color=[255, 51, 51]), + 15: dict(link=("ring_finger3", "ring_finger4"), id=15, color=[255, 51, 51]), + 16: dict(link=("wrist", "pinky_finger1"), id=16, color=[0, 255, 0]), + 17: dict(link=("pinky_finger1", "pinky_finger2"), id=17, color=[0, 255, 0]), + 18: dict(link=("pinky_finger2", "pinky_finger3"), id=18, color=[0, 255, 0]), + 19: dict(link=("pinky_finger3", "pinky_finger4"), id=19, color=[0, 255, 0]), }, - joint_weights=[1.] 
-    joint_weights=[1.] * 21,
-    sigmas=[])
+    joint_weights=[1.0] * 21,
+    sigmas=[],
+)
diff --git a/mmpose/configs/_base_/datasets/posetrack18.py b/mmpose/configs/_base_/datasets/posetrack18.py
index 5aefd1c97fe083df35ee88bebab4f99134c27971..1499c9f8a9b14c106fdf80e5c496910c887c756c 100644
--- a/mmpose/configs/_base_/datasets/posetrack18.py
+++ b/mmpose/configs/_base_/datasets/posetrack18.py
@@ -1,176 +1,51 @@
 dataset_info = dict(
-    dataset_name='posetrack18',
+    dataset_name="posetrack18",
     paper_info=dict(
-        author='Andriluka, Mykhaylo and Iqbal, Umar and '
-        'Insafutdinov, Eldar and Pishchulin, Leonid and '
-        'Milan, Anton and Gall, Juergen and Schiele, Bernt',
-        title='Posetrack: A benchmark for human pose estimation and tracking',
-        container='Proceedings of the IEEE Conference on '
-        'Computer Vision and Pattern Recognition',
-        year='2018',
-        homepage='https://posetrack.net/users/download.php',
+        author="Andriluka, Mykhaylo and Iqbal, Umar and "
+        "Insafutdinov, Eldar and Pishchulin, Leonid and "
+        "Milan, Anton and Gall, Juergen and Schiele, Bernt",
+        title="Posetrack: A benchmark for human pose estimation and tracking",
+        container="Proceedings of the IEEE Conference on " "Computer Vision and Pattern Recognition",
+        year="2018",
+        homepage="https://posetrack.net/users/download.php",
     ),
     keypoint_info={
[… PoseTrack18 keypoint_info (17 keypoints: nose, head_bottom, head_top, ears, shoulders, elbows, wrists, hips, knees, ankles) and skeleton_info (16 links) elided on both sides: re-quoted and re-wrapped only, values unchanged …]
     },
-    joint_weights=[
-        1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5,
-        1.5
-    ],
-    sigmas=[
-        0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062,
-        0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089
-    ])
+    joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5],
+    sigmas=[0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089],
+)
diff --git a/mmpose/configs/_base_/datasets/rhd2d.py b/mmpose/configs/_base_/datasets/rhd2d.py
index 4631ccd03814155b06687e0b1ba2b83404c837fc..bab35a83ffc14d82bacd35300f91ade82e49b9d3 100644
--- a/mmpose/configs/_base_/datasets/rhd2d.py
+++ b/mmpose/configs/_base_/datasets/rhd2d.py
@@ -1,12 +1,11 @@
 dataset_info = dict(
-    dataset_name='rhd2d',
+    dataset_name="rhd2d",
     paper_info=dict(
-        author='Christian Zimmermann and Thomas Brox',
-        title='Learning to Estimate 3D Hand Pose from Single RGB Images',
-        container='arXiv',
-        year='2017',
-        homepage='https://lmb.informatik.uni-freiburg.de/resources/'
-        'datasets/RenderedHandposeDataset.en.html',
+        author="Christian Zimmermann and Thomas Brox",
+        title="Learning to Estimate 3D Hand Pose from Single RGB Images",
+        container="arXiv",
+        year="2017",
+        homepage="https://lmb.informatik.uni-freiburg.de/resources/" "datasets/RenderedHandposeDataset.en.html",
     ),
     # In RHD, 1-4: left thumb [tip to palm], which means the finger is from
     # tip to palm, so as other fingers. Please refer to
@@ -19,133 +18,50 @@ dataset_info = dict(
     # the keypoint in the dataset. It is mostly for visualization & storing
     # information about flip_pairs.
     keypoint_info={
[… RHD keypoint_info (wrist plus 20 finger joints, ids 0–20, named tip-to-palm) and skeleton_info (links 0–19) elided on both sides: same mechanical re-quoting, values unchanged …]
     },
-    joint_weights=[1.] * 21,
-    sigmas=[])
+    joint_weights=[1.0] * 21,
+    sigmas=[],
+)
diff --git a/mmpose/configs/_base_/datasets/shelf.py b/mmpose/configs/_base_/datasets/shelf.py
index 5fe6e42b3b44e3f65947284efd9ffac58d41d43f..88af9e794de29511deb04667cec6f6708ce131ca 100644
--- a/mmpose/configs/_base_/datasets/shelf.py
+++ b/mmpose/configs/_base_/datasets/shelf.py
@@ -1,151 +1,45 @@
 dataset_info = dict(
-    dataset_name='shelf',
+    dataset_name="shelf",
     paper_info=dict(
-        author='Belagiannis, Vasileios and Amin, Sikandar and Andriluka, '
-        'Mykhaylo and Schiele, Bernt and Navab, Nassir and Ilic, Slobodan',
-        title='3D Pictorial Structures for Multiple Human Pose Estimation',
-        container='IEEE Computer Society Conference on Computer Vision and '
-        'Pattern Recognition (CVPR)',
-        year='2014',
-        homepage='http://campar.in.tum.de/Chair/MultiHumanPose',
+        author="Belagiannis, Vasileios and Amin, Sikandar and Andriluka, "
+        "Mykhaylo and Schiele, Bernt and Navab, Nassir and Ilic, Slobodan",
+        title="3D Pictorial Structures for Multiple Human Pose Estimation",
+        container="IEEE Computer Society Conference on Computer Vision and " "Pattern Recognition (CVPR)",
+        year="2014",
+        homepage="http://campar.in.tum.de/Chair/MultiHumanPose",
     ),
     keypoint_info={
[… Shelf keypoint_info (14 keypoints: ankles, knees, hips, wrists, elbows, shoulders, bottom_head, top_head) and skeleton_info (links 0–13) elided on both sides: re-quoted and re-wrapped only, values unchanged …]
     },
-    joint_weights=[
-        1.5, 1.2, 1.0, 1.0, 1.2, 1.5, 1.5, 1.2, 1.0, 1.0, 1.2, 1.5, 1.0, 1.0
-    ],
-    sigmas=[
-        0.089, 0.087, 0.107, 0.107, 0.087, 0.089, 0.062, 0.072, 0.079, 0.079,
-        0.072, 0.062, 0.026, 0.026
-    ])
+    joint_weights=[1.5, 1.2, 1.0, 1.0, 1.2, 1.5, 1.5, 1.2, 1.0, 1.0, 1.2, 1.5, 1.0, 1.0],
+    sigmas=[0.089, 0.087, 0.107, 0.107, 0.087, 0.089, 0.062, 0.072, 0.079, 0.079, 0.072, 0.062, 0.026, 0.026],
+)
diff --git a/mmpose/configs/_base_/datasets/ubody2d.py b/mmpose/configs/_base_/datasets/ubody2d.py
index 8486db05ab3cf961da15eb5e15ed570d27c3cb09..78cb3026f5fa7c3fa19b7d899d3fbc147905aeff 100644
--- a/mmpose/configs/_base_/datasets/ubody2d.py
+++ b/mmpose/configs/_base_/datasets/ubody2d.py
@@ -1,1153 +1,350 @@
 dataset_info = dict(
-    dataset_name='ubody2d',
+    dataset_name="ubody2d",
     paper_info=dict(
-        author='Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li',
-        title='One-Stage 3D Whole-Body Mesh Recovery with Component Aware'
-        'Transformer',
-        container='IEEE Computer Society Conference on Computer Vision and '
-        'Pattern Recognition (CVPR)',
-        year='2023',
-        homepage='https://github.com/IDEA-Research/OSX',
+        author="Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li",
+        title="One-Stage 3D Whole-Body Mesh Recovery with Component Aware " "Transformer",
+        container="IEEE Computer Society Conference on Computer Vision and " "Pattern Recognition (CVPR)",
+        year="2023",
+        homepage="https://github.com/IDEA-Research/OSX",
     ),
     keypoint_info={
[… UBody2D's 133-keypoint whole-body table elided on both sides: body joints (ids 0–22), 68 face points face-0–face-67 (ids 23–90), left-hand joints (ids 91–111), right-hand joints (ids 112–132), and skeleton_info links 0–64 — re-quoted and collapsed to one entry per line, values unchanged …]
     },
* 133, + joint_weights=[1.0] * 133, # 'https://github.com/jin-s13/COCO-WholeBody/blob/master/' # 'evaluation/myeval_wholebody.py#L175' sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, 0.068, 0.066, 0.066, - 0.092, 0.094, 0.094, 0.042, 0.043, 0.044, 0.043, 0.040, 0.035, 0.031, - 0.025, 0.020, 0.023, 0.029, 0.032, 0.037, 0.038, 0.043, 0.041, 0.045, - 0.013, 0.012, 0.011, 0.011, 0.012, 0.012, 0.011, 0.011, 0.013, 0.015, - 0.009, 0.007, 0.007, 0.007, 0.012, 0.009, 0.008, 0.016, 0.010, 0.017, - 0.011, 0.009, 0.011, 0.009, 0.007, 0.013, 0.008, 0.011, 0.012, 0.010, - 0.034, 0.008, 0.008, 0.009, 0.008, 0.008, 0.007, 0.010, 0.008, 0.009, - 0.009, 0.009, 0.007, 0.007, 0.008, 0.011, 0.008, 0.008, 0.008, 0.01, - 0.008, 0.029, 0.022, 0.035, 0.037, 0.047, 0.026, 0.025, 0.024, 0.035, - 0.018, 0.024, 0.022, 0.026, 0.017, 0.021, 0.021, 0.032, 0.02, 0.019, - 0.022, 0.031, 0.029, 0.022, 0.035, 0.037, 0.047, 0.026, 0.025, 0.024, - 0.035, 0.018, 0.024, 0.022, 0.026, 0.017, 0.021, 0.021, 0.032, 0.02, - 0.019, 0.022, 0.031 - ]) + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.068, + 0.066, + 0.066, + 0.092, + 0.094, + 0.094, + 0.042, + 0.043, + 0.044, + 0.043, + 0.040, + 0.035, + 0.031, + 0.025, + 0.020, + 0.023, + 0.029, + 0.032, + 0.037, + 0.038, + 0.043, + 0.041, + 0.045, + 0.013, + 0.012, + 0.011, + 0.011, + 0.012, + 0.012, + 0.011, + 0.011, + 0.013, + 0.015, + 0.009, + 0.007, + 0.007, + 0.007, + 0.012, + 0.009, + 0.008, + 0.016, + 0.010, + 0.017, + 0.011, + 0.009, + 0.011, + 0.009, + 0.007, + 0.013, + 0.008, + 0.011, + 0.012, + 0.010, + 0.034, + 0.008, + 0.008, + 0.009, + 0.008, + 0.008, + 0.007, + 0.010, + 0.008, + 0.009, + 0.009, + 0.009, + 0.007, + 0.007, + 0.008, + 0.011, + 0.008, + 0.008, + 0.008, + 0.01, + 0.008, + 0.029, + 0.022, + 0.035, + 0.037, + 0.047, + 0.026, + 0.025, + 0.024, + 0.035, + 0.018, + 0.024, + 0.022, + 0.026, + 0.017, + 0.021, + 0.021, + 0.032, + 0.02, + 0.019, + 0.022, + 0.031, + 0.029, + 0.022, + 0.035, + 0.037, + 0.047, + 0.026, + 0.025, + 0.024, + 0.035, + 0.018, + 0.024, + 0.022, + 0.026, + 0.017, + 0.021, + 0.021, + 0.032, + 0.02, + 0.019, + 0.022, + 0.031, + ], +) diff --git a/mmpose/configs/_base_/datasets/ubody3d.py b/mmpose/configs/_base_/datasets/ubody3d.py index 9242559ea1fd85291d0f9136b65fdd5cb66664fb..cc622fa29c79c69d8ba389de1a75f5b1453861e6 100644 --- a/mmpose/configs/_base_/datasets/ubody3d.py +++ b/mmpose/configs/_base_/datasets/ubody3d.py @@ -1,958 +1,214 @@ dataset_info = dict( - dataset_name='ubody3d', + dataset_name="ubody3d", paper_info=dict( - author='Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li', - title='One-Stage 3D Whole-Body Mesh Recovery with Component Aware' - 'Transformer', - container='IEEE Computer Society Conference on Computer Vision and ' - 'Pattern Recognition (CVPR)', - year='2023', - homepage='https://github.com/IDEA-Research/OSX', + author="Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li", + title="One-Stage 3D Whole-Body Mesh Recovery with Component Aware" "Transformer", + container="IEEE Computer Society Conference on Computer Vision and " "Pattern Recognition (CVPR)", + year="2023", + homepage="https://github.com/IDEA-Research/OSX", ), keypoint_info={ - 0: - dict(name='Pelvis', id=0, color=[0, 255, 0], type='', swap=''), - 1: - dict( - name='L_Hip', id=1, color=[0, 255, 0], type='lower', swap='R_Hip'), - 2: - dict( - name='R_Hip', id=2, 
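The per-keypoint `sigmas` list above (taken from the COCO-WholeBody evaluation code linked in the comment) controls how forgiving OKS is for each keypoint: body joints get large values, face and finger points get tiny ones. As a rough sketch of how such sigmas enter the OKS computation — `oks` is a hypothetical helper on plain numpy arrays, not the actual COCO evaluation API:

```python
import numpy as np

def oks(pred, gt, visible, sigmas, area):
    """Object Keypoint Similarity for one pose pair (illustrative sketch).

    pred, gt: (K, 2) arrays of keypoint coordinates
    visible:  (K,) 0/1 visibility flags from the ground truth
    sigmas:   (K,) per-keypoint constants, e.g. the list above
    area:     ground-truth object area in pixels^2
    """
    d2 = np.sum((pred - gt) ** 2, axis=-1)        # squared pixel distances
    k2 = (2.0 * np.asarray(sigmas)) ** 2          # COCO uses k_i = 2 * sigma_i
    e = d2 / (2.0 * (area + np.spacing(1)) * k2)  # scale-normalised error
    v = np.asarray(visible) > 0
    return float(np.exp(-e)[v].sum() / max(v.sum(), 1))
```

With sigma = 0.007 for a finger joint versus 0.107 for a hip, the same pixel error costs far more on the hand, which is why whole-body AP is dominated by hand and face accuracy.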
color=[0, 255, 0], type='lower', swap='L_Hip'), - 3: - dict( - name='L_Knee', - id=3, - color=[0, 255, 0], - type='lower', - swap='R_Knee'), - 4: - dict( - name='R_Knee', - id=4, - color=[0, 255, 0], - type='lower', - swap='L_Knee'), - 5: - dict( - name='L_Ankle', - id=5, - color=[0, 255, 0], - type='lower', - swap='R_Ankle'), - 6: - dict( - name='R_Ankle', - id=6, - color=[0, 255, 0], - type='lower', - swap='L_Ankle'), - 7: - dict(name='Neck', id=7, color=[0, 255, 0], type='upper', swap=''), - 8: - dict( - name='L_Shoulder', - id=8, - color=[0, 255, 0], - type='upper', - swap='R_Shoulder'), - 9: - dict( - name='R_Shoulder', - id=9, - color=[0, 255, 0], - type='upper', - swap='L_Shoulder'), - 10: - dict( - name='L_Elbow', - id=10, - color=[0, 255, 0], - type='upper', - swap='R_Elbow'), - 11: - dict( - name='R_Elbow', - id=11, - color=[0, 255, 0], - type='upper', - swap='L_Elbow'), - 12: - dict( - name='L_Wrist', - id=12, - color=[0, 255, 0], - type='upper', - swap='R_Wrist'), - 13: - dict( - name='R_Wrist', - id=13, - color=[0, 255, 0], - type='upper', - swap='L_Wrist'), - 14: - dict( - name='L_Big_toe', - id=14, - color=[0, 255, 0], - type='lower', - swap='R_Big_toe'), - 15: - dict( - name='L_Small_toe', - id=15, - color=[0, 255, 0], - type='lower', - swap='R_Small_toe'), - 16: - dict( - name='L_Heel', - id=16, - color=[0, 255, 0], - type='lower', - swap='R_Heel'), - 17: - dict( - name='R_Big_toe', - id=17, - color=[0, 255, 0], - type='lower', - swap='L_Big_toe'), - 18: - dict( - name='R_Small_toe', - id=18, - color=[0, 255, 0], - type='lower', - swap='L_Small_toe'), - 19: - dict( - name='R_Heel', - id=19, - color=[0, 255, 0], - type='lower', - swap='L_Heel'), - 20: - dict( - name='L_Ear', id=20, color=[0, 255, 0], type='upper', - swap='R_Ear'), - 21: - dict( - name='R_Ear', id=21, color=[0, 255, 0], type='upper', - swap='L_Ear'), - 22: - dict(name='L_Eye', id=22, color=[0, 255, 0], type='', swap='R_Eye'), - 23: - dict(name='R_Eye', id=23, color=[0, 255, 0], type='', swap='L_Eye'), - 24: - dict(name='Nose', id=24, color=[0, 255, 0], type='upper', swap=''), - 25: - dict( - name='L_Thumb_1', - id=25, - color=[255, 128, 0], - type='', - swap='R_Thumb_1'), - 26: - dict( - name='L_Thumb_2', - id=26, - color=[255, 128, 0], - type='', - swap='R_Thumb_2'), - 27: - dict( - name='L_Thumb_3', - id=27, - color=[255, 128, 0], - type='', - swap='R_Thumb_3'), - 28: - dict( - name='L_Thumb_4', - id=28, - color=[255, 128, 0], - type='', - swap='R_Thumb_4'), - 29: - dict( - name='L_Index_1', - id=29, - color=[255, 128, 0], - type='', - swap='R_Index_1'), - 30: - dict( - name='L_Index_2', - id=30, - color=[255, 128, 0], - type='', - swap='R_Index_2'), - 31: - dict( - name='L_Index_3', - id=31, - color=[255, 128, 0], - type='', - swap='R_Index_3'), - 32: - dict( - name='L_Index_4', - id=32, - color=[255, 128, 0], - type='', - swap='R_Index_4'), - 33: - dict( - name='L_Middle_1', - id=33, - color=[255, 128, 0], - type='', - swap='R_Middle_1'), - 34: - dict( - name='L_Middle_2', - id=34, - color=[255, 128, 0], - type='', - swap='R_Middle_2'), - 35: - dict( - name='L_Middle_3', - id=35, - color=[255, 128, 0], - type='', - swap='R_Middle_3'), - 36: - dict( - name='L_Middle_4', - id=36, - color=[255, 128, 0], - type='', - swap='R_Middle_4'), - 37: - dict( - name='L_Ring_1', - id=37, - color=[255, 128, 0], - type='', - swap='R_Ring_1'), - 38: - dict( - name='L_Ring_2', - id=38, - color=[255, 128, 0], - type='', - swap='R_Ring_2'), - 39: - dict( - name='L_Ring_3', - id=39, - color=[255, 128, 0], - type='', - 
swap='R_Ring_3'), - 40: - dict( - name='L_Ring_4', - id=40, - color=[255, 128, 0], - type='', - swap='R_Ring_4'), - 41: - dict( - name='L_Pinky_1', - id=41, - color=[255, 128, 0], - type='', - swap='R_Pinky_1'), - 42: - dict( - name='L_Pinky_2', - id=42, - color=[255, 128, 0], - type='', - swap='R_Pinky_2'), - 43: - dict( - name='L_Pinky_3', - id=43, - color=[255, 128, 0], - type='', - swap='R_Pinky_3'), - 44: - dict( - name='L_Pinky_4', - id=44, - color=[255, 128, 0], - type='', - swap='R_Pinky_4'), - 45: - dict( - name='R_Thumb_1', - id=45, - color=[255, 128, 0], - type='', - swap='L_Thumb_1'), - 46: - dict( - name='R_Thumb_2', - id=46, - color=[255, 128, 0], - type='', - swap='L_Thumb_2'), - 47: - dict( - name='R_Thumb_3', - id=47, - color=[255, 128, 0], - type='', - swap='L_Thumb_3'), - 48: - dict( - name='R_Thumb_4', - id=48, - color=[255, 128, 0], - type='', - swap='L_Thumb_4'), - 49: - dict( - name='R_Index_1', - id=49, - color=[255, 128, 0], - type='', - swap='L_Index_1'), - 50: - dict( - name='R_Index_2', - id=50, - color=[255, 128, 0], - type='', - swap='L_Index_2'), - 51: - dict( - name='R_Index_3', - id=51, - color=[255, 128, 0], - type='', - swap='L_Index_3'), - 52: - dict( - name='R_Index_4', - id=52, - color=[255, 128, 0], - type='', - swap='L_Index_4'), - 53: - dict( - name='R_Middle_1', - id=53, - color=[255, 128, 0], - type='', - swap='L_Middle_1'), - 54: - dict( - name='R_Middle_2', - id=54, - color=[255, 128, 0], - type='', - swap='L_Middle_2'), - 55: - dict( - name='R_Middle_3', - id=55, - color=[255, 128, 0], - type='', - swap='L_Middle_3'), - 56: - dict( - name='R_Middle_4', - id=56, - color=[255, 128, 0], - type='', - swap='L_Middle_4'), - 57: - dict( - name='R_Ring_1', - id=57, - color=[255, 128, 0], - type='', - swap='L_Ring_1'), - 58: - dict( - name='R_Ring_2', - id=58, - color=[255, 128, 0], - type='', - swap='L_Ring_2'), - 59: - dict( - name='R_Ring_3', - id=59, - color=[255, 128, 0], - type='', - swap='L_Ring_3'), - 60: - dict( - name='R_Ring_4', - id=60, - color=[255, 128, 0], - type='', - swap='L_Ring_4'), - 61: - dict( - name='R_Pinky_1', - id=61, - color=[255, 128, 0], - type='', - swap='L_Pinky_1'), - 62: - dict( - name='R_Pinky_2', - id=62, - color=[255, 128, 0], - type='', - swap='L_Pinky_2'), - 63: - dict( - name='R_Pinky_3', - id=63, - color=[255, 128, 0], - type='', - swap='L_Pinky_3'), - 64: - dict( - name='R_Pinky_4', - id=64, - color=[255, 128, 0], - type='', - swap='L_Pinky_4'), - 65: - dict(name='Face_1', id=65, color=[255, 255, 255], type='', swap=''), - 66: - dict(name='Face_2', id=66, color=[255, 255, 255], type='', swap=''), - 67: - dict( - name='Face_3', - id=67, - color=[255, 255, 255], - type='', - swap='Face_4'), - 68: - dict( - name='Face_4', - id=68, - color=[255, 255, 255], - type='', - swap='Face_3'), - 69: - dict( - name='Face_5', - id=69, - color=[255, 255, 255], - type='', - swap='Face_14'), - 70: - dict( - name='Face_6', - id=70, - color=[255, 255, 255], - type='', - swap='Face_13'), - 71: - dict( - name='Face_7', - id=71, - color=[255, 255, 255], - type='', - swap='Face_12'), - 72: - dict( - name='Face_8', - id=72, - color=[255, 255, 255], - type='', - swap='Face_11'), - 73: - dict( - name='Face_9', - id=73, - color=[255, 255, 255], - type='', - swap='Face_10'), - 74: - dict( - name='Face_10', - id=74, - color=[255, 255, 255], - type='', - swap='Face_9'), - 75: - dict( - name='Face_11', - id=75, - color=[255, 255, 255], - type='', - swap='Face_8'), - 76: - dict( - name='Face_12', - id=76, - color=[255, 255, 255], - type='', - 
swap='Face_7'), - 77: - dict( - name='Face_13', - id=77, - color=[255, 255, 255], - type='', - swap='Face_6'), - 78: - dict( - name='Face_14', - id=78, - color=[255, 255, 255], - type='', - swap='Face_5'), - 79: - dict(name='Face_15', id=79, color=[255, 255, 255], type='', swap=''), - 80: - dict(name='Face_16', id=80, color=[255, 255, 255], type='', swap=''), - 81: - dict(name='Face_17', id=81, color=[255, 255, 255], type='', swap=''), - 82: - dict(name='Face_18', id=82, color=[255, 255, 255], type='', swap=''), - 83: - dict( - name='Face_19', - id=83, - color=[255, 255, 255], - type='', - swap='Face_23'), - 84: - dict( - name='Face_20', - id=84, - color=[255, 255, 255], - type='', - swap='Face_22'), - 85: - dict(name='Face_21', id=85, color=[255, 255, 255], type='', swap=''), - 86: - dict( - name='Face_22', - id=86, - color=[255, 255, 255], - type='', - swap='Face_20'), - 87: - dict( - name='Face_23', - id=87, - color=[255, 255, 255], - type='', - swap='Face_19'), - 88: - dict( - name='Face_24', - id=88, - color=[255, 255, 255], - type='', - swap='Face_33'), - 89: - dict( - name='Face_25', - id=89, - color=[255, 255, 255], - type='', - swap='Face_32'), - 90: - dict( - name='Face_26', - id=90, - color=[255, 255, 255], - type='', - swap='Face_31'), - 91: - dict( - name='Face_27', - id=91, - color=[255, 255, 255], - type='', - swap='Face_30'), - 92: - dict( - name='Face_28', - id=92, - color=[255, 255, 255], - type='', - swap='Face_35'), - 93: - dict( - name='Face_29', - id=93, - color=[255, 255, 255], - type='', - swap='Face_34'), - 94: - dict( - name='Face_30', - id=94, - color=[255, 255, 255], - type='', - swap='Face_27'), - 95: - dict( - name='Face_31', - id=95, - color=[255, 255, 255], - type='', - swap='Face_26'), - 96: - dict( - name='Face_32', - id=96, - color=[255, 255, 255], - type='', - swap='Face_25'), - 97: - dict( - name='Face_33', - id=97, - color=[255, 255, 255], - type='', - swap='Face_24'), - 98: - dict( - name='Face_34', - id=98, - color=[255, 255, 255], - type='', - swap='Face_29'), - 99: - dict( - name='Face_35', - id=99, - color=[255, 255, 255], - type='', - swap='Face_28'), - 100: - dict( - name='Face_36', - id=100, - color=[255, 255, 255], - type='', - swap='Face_42'), - 101: - dict( - name='Face_37', - id=101, - color=[255, 255, 255], - type='', - swap='Face_41'), - 102: - dict( - name='Face_38', - id=102, - color=[255, 255, 255], - type='', - swap='Face_40'), - 103: - dict(name='Face_39', id=103, color=[255, 255, 255], type='', swap=''), - 104: - dict( - name='Face_40', - id=104, - color=[255, 255, 255], - type='', - swap='Face_38'), - 105: - dict( - name='Face_41', - id=105, - color=[255, 255, 255], - type='', - swap='Face_37'), - 106: - dict( - name='Face_42', - id=106, - color=[255, 255, 255], - type='', - swap='Face_36'), - 107: - dict( - name='Face_43', - id=107, - color=[255, 255, 255], - type='', - swap='Face_47'), - 108: - dict( - name='Face_44', - id=108, - color=[255, 255, 255], - type='', - swap='Face_46'), - 109: - dict(name='Face_45', id=109, color=[255, 255, 255], type='', swap=''), - 110: - dict( - name='Face_46', - id=110, - color=[255, 255, 255], - type='', - swap='Face_44'), - 111: - dict( - name='Face_47', - id=111, - color=[255, 255, 255], - type='', - swap='Face_43'), - 112: - dict( - name='Face_48', - id=112, - color=[255, 255, 255], - type='', - swap='Face_52'), - 113: - dict( - name='Face_49', - id=113, - color=[255, 255, 255], - type='', - swap='Face_51'), - 114: - dict(name='Face_50', id=114, color=[255, 255, 255], type='', swap=''), - 115: 
- dict( - name='Face_51', - id=115, - color=[255, 255, 255], - type='', - swap='Face_49'), - 116: - dict( - name='Face_52', - id=116, - color=[255, 255, 255], - type='', - swap='Face_48'), - 117: - dict( - name='Face_53', - id=117, - color=[255, 255, 255], - type='', - swap='Face_55'), - 118: - dict(name='Face_54', id=118, color=[255, 255, 255], type='', swap=''), - 119: - dict( - name='Face_55', - id=119, - color=[255, 255, 255], - type='', - swap='Face_53'), - 120: - dict( - name='Face_56', - id=120, - color=[255, 255, 255], - type='', - swap='Face_72'), - 121: - dict( - name='Face_57', - id=121, - color=[255, 255, 255], - type='', - swap='Face_71'), - 122: - dict( - name='Face_58', - id=122, - color=[255, 255, 255], - type='', - swap='Face_70'), - 123: - dict( - name='Face_59', - id=123, - color=[255, 255, 255], - type='', - swap='Face_69'), - 124: - dict( - name='Face_60', - id=124, - color=[255, 255, 255], - type='', - swap='Face_68'), - 125: - dict( - name='Face_61', - id=125, - color=[255, 255, 255], - type='', - swap='Face_67'), - 126: - dict( - name='Face_62', - id=126, - color=[255, 255, 255], - type='', - swap='Face_66'), - 127: - dict( - name='Face_63', - id=127, - color=[255, 255, 255], - type='', - swap='Face_65'), - 128: - dict(name='Face_64', id=128, color=[255, 255, 255], type='', swap=''), - 129: - dict( - name='Face_65', - id=129, - color=[255, 255, 255], - type='', - swap='Face_63'), - 130: - dict( - name='Face_66', - id=130, - color=[255, 255, 255], - type='', - swap='Face_62'), - 131: - dict( - name='Face_67', - id=131, - color=[255, 255, 255], - type='', - swap='Face_61'), - 132: - dict( - name='Face_68', - id=132, - color=[255, 255, 255], - type='', - swap='Face_60'), - 133: - dict( - name='Face_69', - id=133, - color=[255, 255, 255], - type='', - swap='Face_59'), - 134: - dict( - name='Face_70', - id=134, - color=[255, 255, 255], - type='', - swap='Face_58'), - 135: - dict( - name='Face_71', - id=135, - color=[255, 255, 255], - type='', - swap='Face_57'), - 136: - dict( - name='Face_72', - id=136, - color=[255, 255, 255], - type='', - swap='Face_56'), + 0: dict(name="Pelvis", id=0, color=[0, 255, 0], type="", swap=""), + 1: dict(name="L_Hip", id=1, color=[0, 255, 0], type="lower", swap="R_Hip"), + 2: dict(name="R_Hip", id=2, color=[0, 255, 0], type="lower", swap="L_Hip"), + 3: dict(name="L_Knee", id=3, color=[0, 255, 0], type="lower", swap="R_Knee"), + 4: dict(name="R_Knee", id=4, color=[0, 255, 0], type="lower", swap="L_Knee"), + 5: dict(name="L_Ankle", id=5, color=[0, 255, 0], type="lower", swap="R_Ankle"), + 6: dict(name="R_Ankle", id=6, color=[0, 255, 0], type="lower", swap="L_Ankle"), + 7: dict(name="Neck", id=7, color=[0, 255, 0], type="upper", swap=""), + 8: dict(name="L_Shoulder", id=8, color=[0, 255, 0], type="upper", swap="R_Shoulder"), + 9: dict(name="R_Shoulder", id=9, color=[0, 255, 0], type="upper", swap="L_Shoulder"), + 10: dict(name="L_Elbow", id=10, color=[0, 255, 0], type="upper", swap="R_Elbow"), + 11: dict(name="R_Elbow", id=11, color=[0, 255, 0], type="upper", swap="L_Elbow"), + 12: dict(name="L_Wrist", id=12, color=[0, 255, 0], type="upper", swap="R_Wrist"), + 13: dict(name="R_Wrist", id=13, color=[0, 255, 0], type="upper", swap="L_Wrist"), + 14: dict(name="L_Big_toe", id=14, color=[0, 255, 0], type="lower", swap="R_Big_toe"), + 15: dict(name="L_Small_toe", id=15, color=[0, 255, 0], type="lower", swap="R_Small_toe"), + 16: dict(name="L_Heel", id=16, color=[0, 255, 0], type="lower", swap="R_Heel"), + 17: dict(name="R_Big_toe", id=17, color=[0, 
255, 0], type="lower", swap="L_Big_toe"), + 18: dict(name="R_Small_toe", id=18, color=[0, 255, 0], type="lower", swap="L_Small_toe"), + 19: dict(name="R_Heel", id=19, color=[0, 255, 0], type="lower", swap="L_Heel"), + 20: dict(name="L_Ear", id=20, color=[0, 255, 0], type="upper", swap="R_Ear"), + 21: dict(name="R_Ear", id=21, color=[0, 255, 0], type="upper", swap="L_Ear"), + 22: dict(name="L_Eye", id=22, color=[0, 255, 0], type="", swap="R_Eye"), + 23: dict(name="R_Eye", id=23, color=[0, 255, 0], type="", swap="L_Eye"), + 24: dict(name="Nose", id=24, color=[0, 255, 0], type="upper", swap=""), + 25: dict(name="L_Thumb_1", id=25, color=[255, 128, 0], type="", swap="R_Thumb_1"), + 26: dict(name="L_Thumb_2", id=26, color=[255, 128, 0], type="", swap="R_Thumb_2"), + 27: dict(name="L_Thumb_3", id=27, color=[255, 128, 0], type="", swap="R_Thumb_3"), + 28: dict(name="L_Thumb_4", id=28, color=[255, 128, 0], type="", swap="R_Thumb_4"), + 29: dict(name="L_Index_1", id=29, color=[255, 128, 0], type="", swap="R_Index_1"), + 30: dict(name="L_Index_2", id=30, color=[255, 128, 0], type="", swap="R_Index_2"), + 31: dict(name="L_Index_3", id=31, color=[255, 128, 0], type="", swap="R_Index_3"), + 32: dict(name="L_Index_4", id=32, color=[255, 128, 0], type="", swap="R_Index_4"), + 33: dict(name="L_Middle_1", id=33, color=[255, 128, 0], type="", swap="R_Middle_1"), + 34: dict(name="L_Middle_2", id=34, color=[255, 128, 0], type="", swap="R_Middle_2"), + 35: dict(name="L_Middle_3", id=35, color=[255, 128, 0], type="", swap="R_Middle_3"), + 36: dict(name="L_Middle_4", id=36, color=[255, 128, 0], type="", swap="R_Middle_4"), + 37: dict(name="L_Ring_1", id=37, color=[255, 128, 0], type="", swap="R_Ring_1"), + 38: dict(name="L_Ring_2", id=38, color=[255, 128, 0], type="", swap="R_Ring_2"), + 39: dict(name="L_Ring_3", id=39, color=[255, 128, 0], type="", swap="R_Ring_3"), + 40: dict(name="L_Ring_4", id=40, color=[255, 128, 0], type="", swap="R_Ring_4"), + 41: dict(name="L_Pinky_1", id=41, color=[255, 128, 0], type="", swap="R_Pinky_1"), + 42: dict(name="L_Pinky_2", id=42, color=[255, 128, 0], type="", swap="R_Pinky_2"), + 43: dict(name="L_Pinky_3", id=43, color=[255, 128, 0], type="", swap="R_Pinky_3"), + 44: dict(name="L_Pinky_4", id=44, color=[255, 128, 0], type="", swap="R_Pinky_4"), + 45: dict(name="R_Thumb_1", id=45, color=[255, 128, 0], type="", swap="L_Thumb_1"), + 46: dict(name="R_Thumb_2", id=46, color=[255, 128, 0], type="", swap="L_Thumb_2"), + 47: dict(name="R_Thumb_3", id=47, color=[255, 128, 0], type="", swap="L_Thumb_3"), + 48: dict(name="R_Thumb_4", id=48, color=[255, 128, 0], type="", swap="L_Thumb_4"), + 49: dict(name="R_Index_1", id=49, color=[255, 128, 0], type="", swap="L_Index_1"), + 50: dict(name="R_Index_2", id=50, color=[255, 128, 0], type="", swap="L_Index_2"), + 51: dict(name="R_Index_3", id=51, color=[255, 128, 0], type="", swap="L_Index_3"), + 52: dict(name="R_Index_4", id=52, color=[255, 128, 0], type="", swap="L_Index_4"), + 53: dict(name="R_Middle_1", id=53, color=[255, 128, 0], type="", swap="L_Middle_1"), + 54: dict(name="R_Middle_2", id=54, color=[255, 128, 0], type="", swap="L_Middle_2"), + 55: dict(name="R_Middle_3", id=55, color=[255, 128, 0], type="", swap="L_Middle_3"), + 56: dict(name="R_Middle_4", id=56, color=[255, 128, 0], type="", swap="L_Middle_4"), + 57: dict(name="R_Ring_1", id=57, color=[255, 128, 0], type="", swap="L_Ring_1"), + 58: dict(name="R_Ring_2", id=58, color=[255, 128, 0], type="", swap="L_Ring_2"), + 59: dict(name="R_Ring_3", id=59, color=[255, 128, 0], 
type="", swap="L_Ring_3"), + 60: dict(name="R_Ring_4", id=60, color=[255, 128, 0], type="", swap="L_Ring_4"), + 61: dict(name="R_Pinky_1", id=61, color=[255, 128, 0], type="", swap="L_Pinky_1"), + 62: dict(name="R_Pinky_2", id=62, color=[255, 128, 0], type="", swap="L_Pinky_2"), + 63: dict(name="R_Pinky_3", id=63, color=[255, 128, 0], type="", swap="L_Pinky_3"), + 64: dict(name="R_Pinky_4", id=64, color=[255, 128, 0], type="", swap="L_Pinky_4"), + 65: dict(name="Face_1", id=65, color=[255, 255, 255], type="", swap=""), + 66: dict(name="Face_2", id=66, color=[255, 255, 255], type="", swap=""), + 67: dict(name="Face_3", id=67, color=[255, 255, 255], type="", swap="Face_4"), + 68: dict(name="Face_4", id=68, color=[255, 255, 255], type="", swap="Face_3"), + 69: dict(name="Face_5", id=69, color=[255, 255, 255], type="", swap="Face_14"), + 70: dict(name="Face_6", id=70, color=[255, 255, 255], type="", swap="Face_13"), + 71: dict(name="Face_7", id=71, color=[255, 255, 255], type="", swap="Face_12"), + 72: dict(name="Face_8", id=72, color=[255, 255, 255], type="", swap="Face_11"), + 73: dict(name="Face_9", id=73, color=[255, 255, 255], type="", swap="Face_10"), + 74: dict(name="Face_10", id=74, color=[255, 255, 255], type="", swap="Face_9"), + 75: dict(name="Face_11", id=75, color=[255, 255, 255], type="", swap="Face_8"), + 76: dict(name="Face_12", id=76, color=[255, 255, 255], type="", swap="Face_7"), + 77: dict(name="Face_13", id=77, color=[255, 255, 255], type="", swap="Face_6"), + 78: dict(name="Face_14", id=78, color=[255, 255, 255], type="", swap="Face_5"), + 79: dict(name="Face_15", id=79, color=[255, 255, 255], type="", swap=""), + 80: dict(name="Face_16", id=80, color=[255, 255, 255], type="", swap=""), + 81: dict(name="Face_17", id=81, color=[255, 255, 255], type="", swap=""), + 82: dict(name="Face_18", id=82, color=[255, 255, 255], type="", swap=""), + 83: dict(name="Face_19", id=83, color=[255, 255, 255], type="", swap="Face_23"), + 84: dict(name="Face_20", id=84, color=[255, 255, 255], type="", swap="Face_22"), + 85: dict(name="Face_21", id=85, color=[255, 255, 255], type="", swap=""), + 86: dict(name="Face_22", id=86, color=[255, 255, 255], type="", swap="Face_20"), + 87: dict(name="Face_23", id=87, color=[255, 255, 255], type="", swap="Face_19"), + 88: dict(name="Face_24", id=88, color=[255, 255, 255], type="", swap="Face_33"), + 89: dict(name="Face_25", id=89, color=[255, 255, 255], type="", swap="Face_32"), + 90: dict(name="Face_26", id=90, color=[255, 255, 255], type="", swap="Face_31"), + 91: dict(name="Face_27", id=91, color=[255, 255, 255], type="", swap="Face_30"), + 92: dict(name="Face_28", id=92, color=[255, 255, 255], type="", swap="Face_35"), + 93: dict(name="Face_29", id=93, color=[255, 255, 255], type="", swap="Face_34"), + 94: dict(name="Face_30", id=94, color=[255, 255, 255], type="", swap="Face_27"), + 95: dict(name="Face_31", id=95, color=[255, 255, 255], type="", swap="Face_26"), + 96: dict(name="Face_32", id=96, color=[255, 255, 255], type="", swap="Face_25"), + 97: dict(name="Face_33", id=97, color=[255, 255, 255], type="", swap="Face_24"), + 98: dict(name="Face_34", id=98, color=[255, 255, 255], type="", swap="Face_29"), + 99: dict(name="Face_35", id=99, color=[255, 255, 255], type="", swap="Face_28"), + 100: dict(name="Face_36", id=100, color=[255, 255, 255], type="", swap="Face_42"), + 101: dict(name="Face_37", id=101, color=[255, 255, 255], type="", swap="Face_41"), + 102: dict(name="Face_38", id=102, color=[255, 255, 255], type="", swap="Face_40"), + 103: 
dict(name="Face_39", id=103, color=[255, 255, 255], type="", swap=""), + 104: dict(name="Face_40", id=104, color=[255, 255, 255], type="", swap="Face_38"), + 105: dict(name="Face_41", id=105, color=[255, 255, 255], type="", swap="Face_37"), + 106: dict(name="Face_42", id=106, color=[255, 255, 255], type="", swap="Face_36"), + 107: dict(name="Face_43", id=107, color=[255, 255, 255], type="", swap="Face_47"), + 108: dict(name="Face_44", id=108, color=[255, 255, 255], type="", swap="Face_46"), + 109: dict(name="Face_45", id=109, color=[255, 255, 255], type="", swap=""), + 110: dict(name="Face_46", id=110, color=[255, 255, 255], type="", swap="Face_44"), + 111: dict(name="Face_47", id=111, color=[255, 255, 255], type="", swap="Face_43"), + 112: dict(name="Face_48", id=112, color=[255, 255, 255], type="", swap="Face_52"), + 113: dict(name="Face_49", id=113, color=[255, 255, 255], type="", swap="Face_51"), + 114: dict(name="Face_50", id=114, color=[255, 255, 255], type="", swap=""), + 115: dict(name="Face_51", id=115, color=[255, 255, 255], type="", swap="Face_49"), + 116: dict(name="Face_52", id=116, color=[255, 255, 255], type="", swap="Face_48"), + 117: dict(name="Face_53", id=117, color=[255, 255, 255], type="", swap="Face_55"), + 118: dict(name="Face_54", id=118, color=[255, 255, 255], type="", swap=""), + 119: dict(name="Face_55", id=119, color=[255, 255, 255], type="", swap="Face_53"), + 120: dict(name="Face_56", id=120, color=[255, 255, 255], type="", swap="Face_72"), + 121: dict(name="Face_57", id=121, color=[255, 255, 255], type="", swap="Face_71"), + 122: dict(name="Face_58", id=122, color=[255, 255, 255], type="", swap="Face_70"), + 123: dict(name="Face_59", id=123, color=[255, 255, 255], type="", swap="Face_69"), + 124: dict(name="Face_60", id=124, color=[255, 255, 255], type="", swap="Face_68"), + 125: dict(name="Face_61", id=125, color=[255, 255, 255], type="", swap="Face_67"), + 126: dict(name="Face_62", id=126, color=[255, 255, 255], type="", swap="Face_66"), + 127: dict(name="Face_63", id=127, color=[255, 255, 255], type="", swap="Face_65"), + 128: dict(name="Face_64", id=128, color=[255, 255, 255], type="", swap=""), + 129: dict(name="Face_65", id=129, color=[255, 255, 255], type="", swap="Face_63"), + 130: dict(name="Face_66", id=130, color=[255, 255, 255], type="", swap="Face_62"), + 131: dict(name="Face_67", id=131, color=[255, 255, 255], type="", swap="Face_61"), + 132: dict(name="Face_68", id=132, color=[255, 255, 255], type="", swap="Face_60"), + 133: dict(name="Face_69", id=133, color=[255, 255, 255], type="", swap="Face_59"), + 134: dict(name="Face_70", id=134, color=[255, 255, 255], type="", swap="Face_58"), + 135: dict(name="Face_71", id=135, color=[255, 255, 255], type="", swap="Face_57"), + 136: dict(name="Face_72", id=136, color=[255, 255, 255], type="", swap="Face_56"), }, skeleton_info={ - 0: dict(link=('L_Ankle', 'L_Knee'), id=0, color=[0, 255, 0]), - 1: dict(link=('L_Knee', 'L_Hip'), id=1, color=[0, 255, 0]), - 2: dict(link=('R_Ankle', 'R_Knee'), id=2, color=[0, 255, 0]), - 3: dict(link=('R_Knee', 'R_Hip'), id=3, color=[0, 255, 0]), - 4: dict(link=('L_Hip', 'R_Hip'), id=4, color=[0, 255, 0]), - 5: dict(link=('L_Shoulder', 'L_Hip'), id=5, color=[0, 255, 0]), - 6: dict(link=('R_Shoulder', 'R_Hip'), id=6, color=[0, 255, 0]), - 7: dict(link=('L_Shoulder', 'R_Shoulder'), id=7, color=[0, 255, 0]), - 8: dict(link=('L_Shoulder', 'L_Elbow'), id=8, color=[0, 255, 0]), - 9: dict(link=('R_Shoulder', 'R_Elbow'), id=9, color=[0, 255, 0]), - 10: dict(link=('L_Elbow', 
'L_Wrist'), id=10, color=[0, 255, 0]), - 11: dict(link=('R_Elbow', 'R_Wrist'), id=11, color=[255, 128, 0]), - 12: dict(link=('L_Eye', 'R_Eye'), id=12, color=[255, 128, 0]), - 13: dict(link=('Nose', 'L_Eye'), id=13, color=[255, 128, 0]), - 14: dict(link=('Nose', 'R_Eye'), id=14, color=[255, 128, 0]), - 15: dict(link=('L_Eye', 'L_Ear'), id=15, color=[255, 128, 0]), - 16: dict(link=('R_Eye', 'R_Ear'), id=16, color=[255, 128, 0]), - 17: dict(link=('L_Ear', 'L_Shoulder'), id=17, color=[255, 128, 0]), - 18: dict(link=('R_Ear', 'R_Shoulder'), id=18, color=[255, 128, 0]), - 19: dict(link=('L_Ankle', 'L_Big_toe'), id=19, color=[255, 128, 0]), - 20: dict(link=('L_Ankle', 'L_Small_toe'), id=20, color=[255, 128, 0]), - 21: dict(link=('L_Ankle', 'L_Heel'), id=21, color=[255, 128, 0]), - 22: dict(link=('R_Ankle', 'R_Big_toe'), id=22, color=[255, 128, 0]), - 23: dict(link=('R_Ankle', 'R_Small_toe'), id=23, color=[255, 128, 0]), - 24: dict(link=('R_Ankle', 'R_Heel'), id=24, color=[255, 128, 0]), - 25: dict(link=('L_Wrist', 'L_Thumb_1'), id=25, color=[255, 128, 0]), - 26: dict(link=('L_Thumb_1', 'L_Thumb_2'), id=26, color=[255, 128, 0]), - 27: dict(link=('L_Thumb_2', 'L_Thumb_3'), id=27, color=[255, 128, 0]), - 28: dict(link=('L_Thumb_3', 'L_Thumb_4'), id=28, color=[255, 128, 0]), - 29: dict(link=('L_Wrist', 'L_Index_1'), id=29, color=[255, 128, 0]), - 30: dict(link=('L_Index_1', 'L_Index_2'), id=30, color=[255, 128, 0]), - 31: - dict(link=('L_Index_2', 'L_Index_3'), id=31, color=[255, 255, 255]), - 32: - dict(link=('L_Index_3', 'L_Index_4'), id=32, color=[255, 255, 255]), - 33: dict(link=('L_Wrist', 'L_Middle_1'), id=33, color=[255, 255, 255]), - 34: - dict(link=('L_Middle_1', 'L_Middle_2'), id=34, color=[255, 255, 255]), - 35: - dict(link=('L_Middle_2', 'L_Middle_3'), id=35, color=[255, 255, 255]), - 36: - dict(link=('L_Middle_3', 'L_Middle_4'), id=36, color=[255, 255, 255]), - 37: dict(link=('L_Wrist', 'L_Ring_1'), id=37, color=[255, 255, 255]), - 38: dict(link=('L_Ring_1', 'L_Ring_2'), id=38, color=[255, 255, 255]), - 39: dict(link=('L_Ring_2', 'L_Ring_3'), id=39, color=[255, 255, 255]), - 40: dict(link=('L_Ring_3', 'L_Ring_4'), id=40, color=[255, 255, 255]), - 41: dict(link=('L_Wrist', 'L_Pinky_1'), id=41, color=[255, 255, 255]), - 42: - dict(link=('L_Pinky_1', 'L_Pinky_2'), id=42, color=[255, 255, 255]), - 43: - dict(link=('L_Pinky_2', 'L_Pinky_3'), id=43, color=[255, 255, 255]), - 44: - dict(link=('L_Pinky_3', 'L_Pinky_4'), id=44, color=[255, 255, 255]), - 45: dict(link=('R_Wrist', 'R_Thumb_1'), id=45, color=[255, 255, 255]), - 46: - dict(link=('R_Thumb_1', 'R_Thumb_2'), id=46, color=[255, 255, 255]), - 47: - dict(link=('R_Thumb_2', 'R_Thumb_3'), id=47, color=[255, 255, 255]), - 48: - dict(link=('R_Thumb_3', 'R_Thumb_4'), id=48, color=[255, 255, 255]), - 49: dict(link=('R_Wrist', 'R_Index_1'), id=49, color=[255, 255, 255]), - 50: - dict(link=('R_Index_1', 'R_Index_2'), id=50, color=[255, 255, 255]), - 51: - dict(link=('R_Index_2', 'R_Index_3'), id=51, color=[255, 255, 255]), - 52: - dict(link=('R_Index_3', 'R_Index_4'), id=52, color=[255, 255, 255]), - 53: dict(link=('R_Wrist', 'R_Middle_1'), id=53, color=[255, 255, 255]), - 54: - dict(link=('R_Middle_1', 'R_Middle_2'), id=54, color=[255, 255, 255]), - 55: - dict(link=('R_Middle_2', 'R_Middle_3'), id=55, color=[255, 255, 255]), - 56: - dict(link=('R_Middle_3', 'R_Middle_4'), id=56, color=[255, 255, 255]), - 57: dict(link=('R_Wrist', 'R_Pinky_1'), id=57, color=[255, 255, 255]), - 58: - dict(link=('R_Pinky_1', 'R_Pinky_2'), id=58, color=[255, 255, 
255]), - 59: - dict(link=('R_Pinky_2', 'R_Pinky_3'), id=59, color=[255, 255, 255]), - 60: - dict(link=('R_Pinky_3', 'R_Pinky_4'), id=60, color=[255, 255, 255]), + 0: dict(link=("L_Ankle", "L_Knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("L_Knee", "L_Hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("R_Ankle", "R_Knee"), id=2, color=[0, 255, 0]), + 3: dict(link=("R_Knee", "R_Hip"), id=3, color=[0, 255, 0]), + 4: dict(link=("L_Hip", "R_Hip"), id=4, color=[0, 255, 0]), + 5: dict(link=("L_Shoulder", "L_Hip"), id=5, color=[0, 255, 0]), + 6: dict(link=("R_Shoulder", "R_Hip"), id=6, color=[0, 255, 0]), + 7: dict(link=("L_Shoulder", "R_Shoulder"), id=7, color=[0, 255, 0]), + 8: dict(link=("L_Shoulder", "L_Elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("R_Shoulder", "R_Elbow"), id=9, color=[0, 255, 0]), + 10: dict(link=("L_Elbow", "L_Wrist"), id=10, color=[0, 255, 0]), + 11: dict(link=("R_Elbow", "R_Wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("L_Eye", "R_Eye"), id=12, color=[255, 128, 0]), + 13: dict(link=("Nose", "L_Eye"), id=13, color=[255, 128, 0]), + 14: dict(link=("Nose", "R_Eye"), id=14, color=[255, 128, 0]), + 15: dict(link=("L_Eye", "L_Ear"), id=15, color=[255, 128, 0]), + 16: dict(link=("R_Eye", "R_Ear"), id=16, color=[255, 128, 0]), + 17: dict(link=("L_Ear", "L_Shoulder"), id=17, color=[255, 128, 0]), + 18: dict(link=("R_Ear", "R_Shoulder"), id=18, color=[255, 128, 0]), + 19: dict(link=("L_Ankle", "L_Big_toe"), id=19, color=[255, 128, 0]), + 20: dict(link=("L_Ankle", "L_Small_toe"), id=20, color=[255, 128, 0]), + 21: dict(link=("L_Ankle", "L_Heel"), id=21, color=[255, 128, 0]), + 22: dict(link=("R_Ankle", "R_Big_toe"), id=22, color=[255, 128, 0]), + 23: dict(link=("R_Ankle", "R_Small_toe"), id=23, color=[255, 128, 0]), + 24: dict(link=("R_Ankle", "R_Heel"), id=24, color=[255, 128, 0]), + 25: dict(link=("L_Wrist", "L_Thumb_1"), id=25, color=[255, 128, 0]), + 26: dict(link=("L_Thumb_1", "L_Thumb_2"), id=26, color=[255, 128, 0]), + 27: dict(link=("L_Thumb_2", "L_Thumb_3"), id=27, color=[255, 128, 0]), + 28: dict(link=("L_Thumb_3", "L_Thumb_4"), id=28, color=[255, 128, 0]), + 29: dict(link=("L_Wrist", "L_Index_1"), id=29, color=[255, 128, 0]), + 30: dict(link=("L_Index_1", "L_Index_2"), id=30, color=[255, 128, 0]), + 31: dict(link=("L_Index_2", "L_Index_3"), id=31, color=[255, 255, 255]), + 32: dict(link=("L_Index_3", "L_Index_4"), id=32, color=[255, 255, 255]), + 33: dict(link=("L_Wrist", "L_Middle_1"), id=33, color=[255, 255, 255]), + 34: dict(link=("L_Middle_1", "L_Middle_2"), id=34, color=[255, 255, 255]), + 35: dict(link=("L_Middle_2", "L_Middle_3"), id=35, color=[255, 255, 255]), + 36: dict(link=("L_Middle_3", "L_Middle_4"), id=36, color=[255, 255, 255]), + 37: dict(link=("L_Wrist", "L_Ring_1"), id=37, color=[255, 255, 255]), + 38: dict(link=("L_Ring_1", "L_Ring_2"), id=38, color=[255, 255, 255]), + 39: dict(link=("L_Ring_2", "L_Ring_3"), id=39, color=[255, 255, 255]), + 40: dict(link=("L_Ring_3", "L_Ring_4"), id=40, color=[255, 255, 255]), + 41: dict(link=("L_Wrist", "L_Pinky_1"), id=41, color=[255, 255, 255]), + 42: dict(link=("L_Pinky_1", "L_Pinky_2"), id=42, color=[255, 255, 255]), + 43: dict(link=("L_Pinky_2", "L_Pinky_3"), id=43, color=[255, 255, 255]), + 44: dict(link=("L_Pinky_3", "L_Pinky_4"), id=44, color=[255, 255, 255]), + 45: dict(link=("R_Wrist", "R_Thumb_1"), id=45, color=[255, 255, 255]), + 46: dict(link=("R_Thumb_1", "R_Thumb_2"), id=46, color=[255, 255, 255]), + 47: dict(link=("R_Thumb_2", "R_Thumb_3"), id=47, color=[255, 255, 255]), + 48: 
dict(link=("R_Thumb_3", "R_Thumb_4"), id=48, color=[255, 255, 255]), + 49: dict(link=("R_Wrist", "R_Index_1"), id=49, color=[255, 255, 255]), + 50: dict(link=("R_Index_1", "R_Index_2"), id=50, color=[255, 255, 255]), + 51: dict(link=("R_Index_2", "R_Index_3"), id=51, color=[255, 255, 255]), + 52: dict(link=("R_Index_3", "R_Index_4"), id=52, color=[255, 255, 255]), + 53: dict(link=("R_Wrist", "R_Middle_1"), id=53, color=[255, 255, 255]), + 54: dict(link=("R_Middle_1", "R_Middle_2"), id=54, color=[255, 255, 255]), + 55: dict(link=("R_Middle_2", "R_Middle_3"), id=55, color=[255, 255, 255]), + 56: dict(link=("R_Middle_3", "R_Middle_4"), id=56, color=[255, 255, 255]), + 57: dict(link=("R_Wrist", "R_Pinky_1"), id=57, color=[255, 255, 255]), + 58: dict(link=("R_Pinky_1", "R_Pinky_2"), id=58, color=[255, 255, 255]), + 59: dict(link=("R_Pinky_2", "R_Pinky_3"), id=59, color=[255, 255, 255]), + 60: dict(link=("R_Pinky_3", "R_Pinky_4"), id=60, color=[255, 255, 255]), }, - joint_weights=[1.] * 137, - sigmas=[]) + joint_weights=[1.0] * 137, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/wflw.py b/mmpose/configs/_base_/datasets/wflw.py index 80c29b696cf5031d8f21d7d8ed7e573043666f35..639877969079c38f97a33775848a293aafefe22f 100644 --- a/mmpose/configs/_base_/datasets/wflw.py +++ b/mmpose/configs/_base_/datasets/wflw.py @@ -1,192 +1,113 @@ dataset_info = dict( - dataset_name='wflw', + dataset_name="wflw", paper_info=dict( - author='Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, ' - 'Quan and Cai, Yici and Zhou, Qiang', - title='Look at boundary: A boundary-aware face alignment algorithm', - container='Proceedings of the IEEE conference on computer ' - 'vision and pattern recognition', - year='2018', - homepage='https://wywu.github.io/projects/LAB/WFLW.html', + author="Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, " "Quan and Cai, Yici and Zhou, Qiang", + title="Look at boundary: A boundary-aware face alignment algorithm", + container="Proceedings of the IEEE conference on computer " "vision and pattern recognition", + year="2018", + homepage="https://wywu.github.io/projects/LAB/WFLW.html", ), keypoint_info={ - 0: dict(name='kpt-0', id=0, color=[255, 0, 0], type='', swap='kpt-32'), - 1: dict(name='kpt-1', id=1, color=[255, 0, 0], type='', swap='kpt-31'), - 2: dict(name='kpt-2', id=2, color=[255, 0, 0], type='', swap='kpt-30'), - 3: dict(name='kpt-3', id=3, color=[255, 0, 0], type='', swap='kpt-29'), - 4: dict(name='kpt-4', id=4, color=[255, 0, 0], type='', swap='kpt-28'), - 5: dict(name='kpt-5', id=5, color=[255, 0, 0], type='', swap='kpt-27'), - 6: dict(name='kpt-6', id=6, color=[255, 0, 0], type='', swap='kpt-26'), - 7: dict(name='kpt-7', id=7, color=[255, 0, 0], type='', swap='kpt-25'), - 8: dict(name='kpt-8', id=8, color=[255, 0, 0], type='', swap='kpt-24'), - 9: dict(name='kpt-9', id=9, color=[255, 0, 0], type='', swap='kpt-23'), - 10: - dict(name='kpt-10', id=10, color=[255, 0, 0], type='', swap='kpt-22'), - 11: - dict(name='kpt-11', id=11, color=[255, 0, 0], type='', swap='kpt-21'), - 12: - dict(name='kpt-12', id=12, color=[255, 0, 0], type='', swap='kpt-20'), - 13: - dict(name='kpt-13', id=13, color=[255, 0, 0], type='', swap='kpt-19'), - 14: - dict(name='kpt-14', id=14, color=[255, 0, 0], type='', swap='kpt-18'), - 15: - dict(name='kpt-15', id=15, color=[255, 0, 0], type='', swap='kpt-17'), - 16: dict(name='kpt-16', id=16, color=[255, 0, 0], type='', swap=''), - 17: - dict(name='kpt-17', id=17, color=[255, 0, 0], type='', swap='kpt-15'), - 18: - dict(name='kpt-18', id=18, 
color=[255, 0, 0], type='', swap='kpt-14'), - 19: - dict(name='kpt-19', id=19, color=[255, 0, 0], type='', swap='kpt-13'), - 20: - dict(name='kpt-20', id=20, color=[255, 0, 0], type='', swap='kpt-12'), - 21: - dict(name='kpt-21', id=21, color=[255, 0, 0], type='', swap='kpt-11'), - 22: - dict(name='kpt-22', id=22, color=[255, 0, 0], type='', swap='kpt-10'), - 23: - dict(name='kpt-23', id=23, color=[255, 0, 0], type='', swap='kpt-9'), - 24: - dict(name='kpt-24', id=24, color=[255, 0, 0], type='', swap='kpt-8'), - 25: - dict(name='kpt-25', id=25, color=[255, 0, 0], type='', swap='kpt-7'), - 26: - dict(name='kpt-26', id=26, color=[255, 0, 0], type='', swap='kpt-6'), - 27: - dict(name='kpt-27', id=27, color=[255, 0, 0], type='', swap='kpt-5'), - 28: - dict(name='kpt-28', id=28, color=[255, 0, 0], type='', swap='kpt-4'), - 29: - dict(name='kpt-29', id=29, color=[255, 0, 0], type='', swap='kpt-3'), - 30: - dict(name='kpt-30', id=30, color=[255, 0, 0], type='', swap='kpt-2'), - 31: - dict(name='kpt-31', id=31, color=[255, 0, 0], type='', swap='kpt-1'), - 32: - dict(name='kpt-32', id=32, color=[255, 0, 0], type='', swap='kpt-0'), - 33: - dict(name='kpt-33', id=33, color=[255, 0, 0], type='', swap='kpt-46'), - 34: - dict(name='kpt-34', id=34, color=[255, 0, 0], type='', swap='kpt-45'), - 35: - dict(name='kpt-35', id=35, color=[255, 0, 0], type='', swap='kpt-44'), - 36: - dict(name='kpt-36', id=36, color=[255, 0, 0], type='', swap='kpt-43'), - 37: dict( - name='kpt-37', id=37, color=[255, 0, 0], type='', swap='kpt-42'), - 38: dict( - name='kpt-38', id=38, color=[255, 0, 0], type='', swap='kpt-50'), - 39: dict( - name='kpt-39', id=39, color=[255, 0, 0], type='', swap='kpt-49'), - 40: dict( - name='kpt-40', id=40, color=[255, 0, 0], type='', swap='kpt-48'), - 41: dict( - name='kpt-41', id=41, color=[255, 0, 0], type='', swap='kpt-47'), - 42: dict( - name='kpt-42', id=42, color=[255, 0, 0], type='', swap='kpt-37'), - 43: dict( - name='kpt-43', id=43, color=[255, 0, 0], type='', swap='kpt-36'), - 44: dict( - name='kpt-44', id=44, color=[255, 0, 0], type='', swap='kpt-35'), - 45: dict( - name='kpt-45', id=45, color=[255, 0, 0], type='', swap='kpt-34'), - 46: dict( - name='kpt-46', id=46, color=[255, 0, 0], type='', swap='kpt-33'), - 47: dict( - name='kpt-47', id=47, color=[255, 0, 0], type='', swap='kpt-41'), - 48: dict( - name='kpt-48', id=48, color=[255, 0, 0], type='', swap='kpt-40'), - 49: dict( - name='kpt-49', id=49, color=[255, 0, 0], type='', swap='kpt-39'), - 50: dict( - name='kpt-50', id=50, color=[255, 0, 0], type='', swap='kpt-38'), - 51: dict(name='kpt-51', id=51, color=[255, 0, 0], type='', swap=''), - 52: dict(name='kpt-52', id=52, color=[255, 0, 0], type='', swap=''), - 53: dict(name='kpt-53', id=53, color=[255, 0, 0], type='', swap=''), - 54: dict(name='kpt-54', id=54, color=[255, 0, 0], type='', swap=''), - 55: dict( - name='kpt-55', id=55, color=[255, 0, 0], type='', swap='kpt-59'), - 56: dict( - name='kpt-56', id=56, color=[255, 0, 0], type='', swap='kpt-58'), - 57: dict(name='kpt-57', id=57, color=[255, 0, 0], type='', swap=''), - 58: dict( - name='kpt-58', id=58, color=[255, 0, 0], type='', swap='kpt-56'), - 59: dict( - name='kpt-59', id=59, color=[255, 0, 0], type='', swap='kpt-55'), - 60: dict( - name='kpt-60', id=60, color=[255, 0, 0], type='', swap='kpt-72'), - 61: dict( - name='kpt-61', id=61, color=[255, 0, 0], type='', swap='kpt-71'), - 62: dict( - name='kpt-62', id=62, color=[255, 0, 0], type='', swap='kpt-70'), - 63: dict( - name='kpt-63', id=63, color=[255, 0, 0], type='', 
swap='kpt-69'), - 64: dict( - name='kpt-64', id=64, color=[255, 0, 0], type='', swap='kpt-68'), - 65: dict( - name='kpt-65', id=65, color=[255, 0, 0], type='', swap='kpt-75'), - 66: dict( - name='kpt-66', id=66, color=[255, 0, 0], type='', swap='kpt-74'), - 67: dict( - name='kpt-67', id=67, color=[255, 0, 0], type='', swap='kpt-73'), - 68: dict( - name='kpt-68', id=68, color=[255, 0, 0], type='', swap='kpt-64'), - 69: dict( - name='kpt-69', id=69, color=[255, 0, 0], type='', swap='kpt-63'), - 70: dict( - name='kpt-70', id=70, color=[255, 0, 0], type='', swap='kpt-62'), - 71: dict( - name='kpt-71', id=71, color=[255, 0, 0], type='', swap='kpt-61'), - 72: dict( - name='kpt-72', id=72, color=[255, 0, 0], type='', swap='kpt-60'), - 73: dict( - name='kpt-73', id=73, color=[255, 0, 0], type='', swap='kpt-67'), - 74: dict( - name='kpt-74', id=74, color=[255, 0, 0], type='', swap='kpt-66'), - 75: dict( - name='kpt-75', id=75, color=[255, 0, 0], type='', swap='kpt-65'), - 76: dict( - name='kpt-76', id=76, color=[255, 0, 0], type='', swap='kpt-82'), - 77: dict( - name='kpt-77', id=77, color=[255, 0, 0], type='', swap='kpt-81'), - 78: dict( - name='kpt-78', id=78, color=[255, 0, 0], type='', swap='kpt-80'), - 79: dict(name='kpt-79', id=79, color=[255, 0, 0], type='', swap=''), - 80: dict( - name='kpt-80', id=80, color=[255, 0, 0], type='', swap='kpt-78'), - 81: dict( - name='kpt-81', id=81, color=[255, 0, 0], type='', swap='kpt-77'), - 82: dict( - name='kpt-82', id=82, color=[255, 0, 0], type='', swap='kpt-76'), - 83: dict( - name='kpt-83', id=83, color=[255, 0, 0], type='', swap='kpt-87'), - 84: dict( - name='kpt-84', id=84, color=[255, 0, 0], type='', swap='kpt-86'), - 85: dict(name='kpt-85', id=85, color=[255, 0, 0], type='', swap=''), - 86: dict( - name='kpt-86', id=86, color=[255, 0, 0], type='', swap='kpt-84'), - 87: dict( - name='kpt-87', id=87, color=[255, 0, 0], type='', swap='kpt-83'), - 88: dict( - name='kpt-88', id=88, color=[255, 0, 0], type='', swap='kpt-92'), - 89: dict( - name='kpt-89', id=89, color=[255, 0, 0], type='', swap='kpt-91'), - 90: dict(name='kpt-90', id=90, color=[255, 0, 0], type='', swap=''), - 91: dict( - name='kpt-91', id=91, color=[255, 0, 0], type='', swap='kpt-89'), - 92: dict( - name='kpt-92', id=92, color=[255, 0, 0], type='', swap='kpt-88'), - 93: dict( - name='kpt-93', id=93, color=[255, 0, 0], type='', swap='kpt-95'), - 94: dict(name='kpt-94', id=94, color=[255, 0, 0], type='', swap=''), - 95: dict( - name='kpt-95', id=95, color=[255, 0, 0], type='', swap='kpt-93'), - 96: dict( - name='kpt-96', id=96, color=[255, 0, 0], type='', swap='kpt-97'), - 97: dict( - name='kpt-97', id=97, color=[255, 0, 0], type='', swap='kpt-96') + 0: dict(name="kpt-0", id=0, color=[255, 0, 0], type="", swap="kpt-32"), + 1: dict(name="kpt-1", id=1, color=[255, 0, 0], type="", swap="kpt-31"), + 2: dict(name="kpt-2", id=2, color=[255, 0, 0], type="", swap="kpt-30"), + 3: dict(name="kpt-3", id=3, color=[255, 0, 0], type="", swap="kpt-29"), + 4: dict(name="kpt-4", id=4, color=[255, 0, 0], type="", swap="kpt-28"), + 5: dict(name="kpt-5", id=5, color=[255, 0, 0], type="", swap="kpt-27"), + 6: dict(name="kpt-6", id=6, color=[255, 0, 0], type="", swap="kpt-26"), + 7: dict(name="kpt-7", id=7, color=[255, 0, 0], type="", swap="kpt-25"), + 8: dict(name="kpt-8", id=8, color=[255, 0, 0], type="", swap="kpt-24"), + 9: dict(name="kpt-9", id=9, color=[255, 0, 0], type="", swap="kpt-23"), + 10: dict(name="kpt-10", id=10, color=[255, 0, 0], type="", swap="kpt-22"), + 11: dict(name="kpt-11", id=11, 
color=[255, 0, 0], type="", swap="kpt-21"), + 12: dict(name="kpt-12", id=12, color=[255, 0, 0], type="", swap="kpt-20"), + 13: dict(name="kpt-13", id=13, color=[255, 0, 0], type="", swap="kpt-19"), + 14: dict(name="kpt-14", id=14, color=[255, 0, 0], type="", swap="kpt-18"), + 15: dict(name="kpt-15", id=15, color=[255, 0, 0], type="", swap="kpt-17"), + 16: dict(name="kpt-16", id=16, color=[255, 0, 0], type="", swap=""), + 17: dict(name="kpt-17", id=17, color=[255, 0, 0], type="", swap="kpt-15"), + 18: dict(name="kpt-18", id=18, color=[255, 0, 0], type="", swap="kpt-14"), + 19: dict(name="kpt-19", id=19, color=[255, 0, 0], type="", swap="kpt-13"), + 20: dict(name="kpt-20", id=20, color=[255, 0, 0], type="", swap="kpt-12"), + 21: dict(name="kpt-21", id=21, color=[255, 0, 0], type="", swap="kpt-11"), + 22: dict(name="kpt-22", id=22, color=[255, 0, 0], type="", swap="kpt-10"), + 23: dict(name="kpt-23", id=23, color=[255, 0, 0], type="", swap="kpt-9"), + 24: dict(name="kpt-24", id=24, color=[255, 0, 0], type="", swap="kpt-8"), + 25: dict(name="kpt-25", id=25, color=[255, 0, 0], type="", swap="kpt-7"), + 26: dict(name="kpt-26", id=26, color=[255, 0, 0], type="", swap="kpt-6"), + 27: dict(name="kpt-27", id=27, color=[255, 0, 0], type="", swap="kpt-5"), + 28: dict(name="kpt-28", id=28, color=[255, 0, 0], type="", swap="kpt-4"), + 29: dict(name="kpt-29", id=29, color=[255, 0, 0], type="", swap="kpt-3"), + 30: dict(name="kpt-30", id=30, color=[255, 0, 0], type="", swap="kpt-2"), + 31: dict(name="kpt-31", id=31, color=[255, 0, 0], type="", swap="kpt-1"), + 32: dict(name="kpt-32", id=32, color=[255, 0, 0], type="", swap="kpt-0"), + 33: dict(name="kpt-33", id=33, color=[255, 0, 0], type="", swap="kpt-46"), + 34: dict(name="kpt-34", id=34, color=[255, 0, 0], type="", swap="kpt-45"), + 35: dict(name="kpt-35", id=35, color=[255, 0, 0], type="", swap="kpt-44"), + 36: dict(name="kpt-36", id=36, color=[255, 0, 0], type="", swap="kpt-43"), + 37: dict(name="kpt-37", id=37, color=[255, 0, 0], type="", swap="kpt-42"), + 38: dict(name="kpt-38", id=38, color=[255, 0, 0], type="", swap="kpt-50"), + 39: dict(name="kpt-39", id=39, color=[255, 0, 0], type="", swap="kpt-49"), + 40: dict(name="kpt-40", id=40, color=[255, 0, 0], type="", swap="kpt-48"), + 41: dict(name="kpt-41", id=41, color=[255, 0, 0], type="", swap="kpt-47"), + 42: dict(name="kpt-42", id=42, color=[255, 0, 0], type="", swap="kpt-37"), + 43: dict(name="kpt-43", id=43, color=[255, 0, 0], type="", swap="kpt-36"), + 44: dict(name="kpt-44", id=44, color=[255, 0, 0], type="", swap="kpt-35"), + 45: dict(name="kpt-45", id=45, color=[255, 0, 0], type="", swap="kpt-34"), + 46: dict(name="kpt-46", id=46, color=[255, 0, 0], type="", swap="kpt-33"), + 47: dict(name="kpt-47", id=47, color=[255, 0, 0], type="", swap="kpt-41"), + 48: dict(name="kpt-48", id=48, color=[255, 0, 0], type="", swap="kpt-40"), + 49: dict(name="kpt-49", id=49, color=[255, 0, 0], type="", swap="kpt-39"), + 50: dict(name="kpt-50", id=50, color=[255, 0, 0], type="", swap="kpt-38"), + 51: dict(name="kpt-51", id=51, color=[255, 0, 0], type="", swap=""), + 52: dict(name="kpt-52", id=52, color=[255, 0, 0], type="", swap=""), + 53: dict(name="kpt-53", id=53, color=[255, 0, 0], type="", swap=""), + 54: dict(name="kpt-54", id=54, color=[255, 0, 0], type="", swap=""), + 55: dict(name="kpt-55", id=55, color=[255, 0, 0], type="", swap="kpt-59"), + 56: dict(name="kpt-56", id=56, color=[255, 0, 0], type="", swap="kpt-58"), + 57: dict(name="kpt-57", id=57, color=[255, 0, 0], type="", swap=""), + 58: 
dict(name="kpt-58", id=58, color=[255, 0, 0], type="", swap="kpt-56"), + 59: dict(name="kpt-59", id=59, color=[255, 0, 0], type="", swap="kpt-55"), + 60: dict(name="kpt-60", id=60, color=[255, 0, 0], type="", swap="kpt-72"), + 61: dict(name="kpt-61", id=61, color=[255, 0, 0], type="", swap="kpt-71"), + 62: dict(name="kpt-62", id=62, color=[255, 0, 0], type="", swap="kpt-70"), + 63: dict(name="kpt-63", id=63, color=[255, 0, 0], type="", swap="kpt-69"), + 64: dict(name="kpt-64", id=64, color=[255, 0, 0], type="", swap="kpt-68"), + 65: dict(name="kpt-65", id=65, color=[255, 0, 0], type="", swap="kpt-75"), + 66: dict(name="kpt-66", id=66, color=[255, 0, 0], type="", swap="kpt-74"), + 67: dict(name="kpt-67", id=67, color=[255, 0, 0], type="", swap="kpt-73"), + 68: dict(name="kpt-68", id=68, color=[255, 0, 0], type="", swap="kpt-64"), + 69: dict(name="kpt-69", id=69, color=[255, 0, 0], type="", swap="kpt-63"), + 70: dict(name="kpt-70", id=70, color=[255, 0, 0], type="", swap="kpt-62"), + 71: dict(name="kpt-71", id=71, color=[255, 0, 0], type="", swap="kpt-61"), + 72: dict(name="kpt-72", id=72, color=[255, 0, 0], type="", swap="kpt-60"), + 73: dict(name="kpt-73", id=73, color=[255, 0, 0], type="", swap="kpt-67"), + 74: dict(name="kpt-74", id=74, color=[255, 0, 0], type="", swap="kpt-66"), + 75: dict(name="kpt-75", id=75, color=[255, 0, 0], type="", swap="kpt-65"), + 76: dict(name="kpt-76", id=76, color=[255, 0, 0], type="", swap="kpt-82"), + 77: dict(name="kpt-77", id=77, color=[255, 0, 0], type="", swap="kpt-81"), + 78: dict(name="kpt-78", id=78, color=[255, 0, 0], type="", swap="kpt-80"), + 79: dict(name="kpt-79", id=79, color=[255, 0, 0], type="", swap=""), + 80: dict(name="kpt-80", id=80, color=[255, 0, 0], type="", swap="kpt-78"), + 81: dict(name="kpt-81", id=81, color=[255, 0, 0], type="", swap="kpt-77"), + 82: dict(name="kpt-82", id=82, color=[255, 0, 0], type="", swap="kpt-76"), + 83: dict(name="kpt-83", id=83, color=[255, 0, 0], type="", swap="kpt-87"), + 84: dict(name="kpt-84", id=84, color=[255, 0, 0], type="", swap="kpt-86"), + 85: dict(name="kpt-85", id=85, color=[255, 0, 0], type="", swap=""), + 86: dict(name="kpt-86", id=86, color=[255, 0, 0], type="", swap="kpt-84"), + 87: dict(name="kpt-87", id=87, color=[255, 0, 0], type="", swap="kpt-83"), + 88: dict(name="kpt-88", id=88, color=[255, 0, 0], type="", swap="kpt-92"), + 89: dict(name="kpt-89", id=89, color=[255, 0, 0], type="", swap="kpt-91"), + 90: dict(name="kpt-90", id=90, color=[255, 0, 0], type="", swap=""), + 91: dict(name="kpt-91", id=91, color=[255, 0, 0], type="", swap="kpt-89"), + 92: dict(name="kpt-92", id=92, color=[255, 0, 0], type="", swap="kpt-88"), + 93: dict(name="kpt-93", id=93, color=[255, 0, 0], type="", swap="kpt-95"), + 94: dict(name="kpt-94", id=94, color=[255, 0, 0], type="", swap=""), + 95: dict(name="kpt-95", id=95, color=[255, 0, 0], type="", swap="kpt-93"), + 96: dict(name="kpt-96", id=96, color=[255, 0, 0], type="", swap="kpt-97"), + 97: dict(name="kpt-97", id=97, color=[255, 0, 0], type="", swap="kpt-96"), }, skeleton_info={}, - joint_weights=[1.] 
* 98, - sigmas=[]) + joint_weights=[1.0] * 98, + sigmas=[], +) diff --git a/mmpose/configs/_base_/datasets/zebra.py b/mmpose/configs/_base_/datasets/zebra.py index eac71f796a761bbf87b123f8b7b8b4585df0c525..1e4e0b307ec5e751bfc0da576d04a17f5f594222 100644 --- a/mmpose/configs/_base_/datasets/zebra.py +++ b/mmpose/configs/_base_/datasets/zebra.py @@ -1,64 +1,35 @@ dataset_info = dict( - dataset_name='zebra', + dataset_name="zebra", paper_info=dict( - author='Graving, Jacob M and Chae, Daniel and Naik, Hemal and ' - 'Li, Liang and Koger, Benjamin and Costelloe, Blair R and ' - 'Couzin, Iain D', - title='DeepPoseKit, a software toolkit for fast and robust ' - 'animal pose estimation using deep learning', - container='Elife', - year='2019', - homepage='https://github.com/jgraving/DeepPoseKit-Data', + author="Graving, Jacob M and Chae, Daniel and Naik, Hemal and " + "Li, Liang and Koger, Benjamin and Costelloe, Blair R and " + "Couzin, Iain D", + title="DeepPoseKit, a software toolkit for fast and robust " "animal pose estimation using deep learning", + container="Elife", + year="2019", + homepage="https://github.com/jgraving/DeepPoseKit-Data", ), keypoint_info={ - 0: - dict(name='snout', id=0, color=[255, 255, 255], type='', swap=''), - 1: - dict(name='head', id=1, color=[255, 255, 255], type='', swap=''), - 2: - dict(name='neck', id=2, color=[255, 255, 255], type='', swap=''), - 3: - dict( - name='forelegL1', - id=3, - color=[255, 255, 255], - type='', - swap='forelegR1'), - 4: - dict( - name='forelegR1', - id=4, - color=[255, 255, 255], - type='', - swap='forelegL1'), - 5: - dict( - name='hindlegL1', - id=5, - color=[255, 255, 255], - type='', - swap='hindlegR1'), - 6: - dict( - name='hindlegR1', - id=6, - color=[255, 255, 255], - type='', - swap='hindlegL1'), - 7: - dict(name='tailbase', id=7, color=[255, 255, 255], type='', swap=''), - 8: - dict(name='tailtip', id=8, color=[255, 255, 255], type='', swap='') + 0: dict(name="snout", id=0, color=[255, 255, 255], type="", swap=""), + 1: dict(name="head", id=1, color=[255, 255, 255], type="", swap=""), + 2: dict(name="neck", id=2, color=[255, 255, 255], type="", swap=""), + 3: dict(name="forelegL1", id=3, color=[255, 255, 255], type="", swap="forelegR1"), + 4: dict(name="forelegR1", id=4, color=[255, 255, 255], type="", swap="forelegL1"), + 5: dict(name="hindlegL1", id=5, color=[255, 255, 255], type="", swap="hindlegR1"), + 6: dict(name="hindlegR1", id=6, color=[255, 255, 255], type="", swap="hindlegL1"), + 7: dict(name="tailbase", id=7, color=[255, 255, 255], type="", swap=""), + 8: dict(name="tailtip", id=8, color=[255, 255, 255], type="", swap=""), }, skeleton_info={ - 0: dict(link=('head', 'snout'), id=0, color=[255, 255, 255]), - 1: dict(link=('neck', 'head'), id=1, color=[255, 255, 255]), - 2: dict(link=('forelegL1', 'neck'), id=2, color=[255, 255, 255]), - 3: dict(link=('forelegR1', 'neck'), id=3, color=[255, 255, 255]), - 4: dict(link=('hindlegL1', 'tailbase'), id=4, color=[255, 255, 255]), - 5: dict(link=('hindlegR1', 'tailbase'), id=5, color=[255, 255, 255]), - 6: dict(link=('tailbase', 'neck'), id=6, color=[255, 255, 255]), - 7: dict(link=('tailtip', 'tailbase'), id=7, color=[255, 255, 255]) + 0: dict(link=("head", "snout"), id=0, color=[255, 255, 255]), + 1: dict(link=("neck", "head"), id=1, color=[255, 255, 255]), + 2: dict(link=("forelegL1", "neck"), id=2, color=[255, 255, 255]), + 3: dict(link=("forelegR1", "neck"), id=3, color=[255, 255, 255]), + 4: dict(link=("hindlegL1", "tailbase"), id=4, color=[255, 255, 255]), + 5: 
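Unlike the COCO-style configs, WFLW and the zebra dataset leave `sigmas=[]`: without per-keypoint sigmas, OKS-based AP is undefined, so such datasets are typically evaluated with distance-based metrics (PCK and AUC, or NME for faces) instead. A minimal PCK sketch under that assumption:

```python
import numpy as np

def pck(pred, gt, visible, norm_length, threshold=0.2):
    """Percentage of Correct Keypoints: a visible keypoint counts as
    correct if it lies within `threshold * norm_length` pixels of GT."""
    dist = np.linalg.norm(pred - gt, axis=-1)   # (K,) pixel distances
    v = np.asarray(visible) > 0
    return float(np.mean(dist[v] <= threshold * norm_length)) if v.any() else 0.0
```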
dict(link=("hindlegR1", "tailbase"), id=5, color=[255, 255, 255]), + 6: dict(link=("tailbase", "neck"), id=6, color=[255, 255, 255]), + 7: dict(link=("tailtip", "tailbase"), id=7, color=[255, 255, 255]), }, - joint_weights=[1.] * 9, - sigmas=[]) + joint_weights=[1.0] * 9, + sigmas=[], +) diff --git a/mmpose/configs/_base_/default_runtime.py b/mmpose/configs/_base_/default_runtime.py index d87e8f15efa8c5f2a2a9fa1e827382b504e44f35..5f46382a280a5e4a6909348caaf467b3823300f5 100644 --- a/mmpose/configs/_base_/default_runtime.py +++ b/mmpose/configs/_base_/default_runtime.py @@ -1,52 +1,46 @@ -default_scope = 'mmpose' +default_scope = "mmpose" # hooks default_hooks = dict( - timer=dict(type='IterTimerHook'), - logger=dict(type='LoggerHook', interval=50), - param_scheduler=dict(type='ParamSchedulerHook'), - checkpoint=dict(type='CheckpointHook', interval=10), - sampler_seed=dict(type='DistSamplerSeedHook'), - visualization=dict(type='PoseVisualizationHook', enable=False), - badcase=dict( - type='BadCaseAnalysisHook', - enable=False, - out_dir='badcase', - metric_type='loss', - badcase_thr=5)) + timer=dict(type="IterTimerHook"), + logger=dict(type="LoggerHook", interval=50), + param_scheduler=dict(type="ParamSchedulerHook"), + checkpoint=dict(type="CheckpointHook", interval=10), + sampler_seed=dict(type="DistSamplerSeedHook"), + visualization=dict(type="PoseVisualizationHook", enable=False), + badcase=dict(type="BadCaseAnalysisHook", enable=False, out_dir="badcase", metric_type="loss", badcase_thr=5), +) # custom hooks custom_hooks = [ # Synchronize model buffers such as running_mean and running_var in BN # at the end of each epoch - dict(type='SyncBuffersHook') + dict(type="SyncBuffersHook") ] # multi-processing backend env_cfg = dict( cudnn_benchmark=False, - mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), - dist_cfg=dict(backend='nccl'), + mp_cfg=dict(mp_start_method="fork", opencv_num_threads=0), + dist_cfg=dict(backend="nccl"), ) # visualizer vis_backends = [ - dict(type='LocalVisBackend'), - dict(type='TensorboardVisBackend'), + dict(type="LocalVisBackend"), + dict(type="TensorboardVisBackend"), # dict(type='WandbVisBackend'), ] -visualizer = dict( - type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="PoseLocalVisualizer", vis_backends=vis_backends, name="visualizer") # logger -log_processor = dict( - type='LogProcessor', window_size=50, by_epoch=True, num_digits=6) -log_level = 'INFO' +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=True, num_digits=6) +log_level = "INFO" load_from = None resume = False # file I/O backend -backend_args = dict(backend='local') +backend_args = dict(backend="local") # training/validation/testing progress train_cfg = dict(by_epoch=True) diff --git a/mmpose/configs/_base_/merged_COCO_AIC_MPII.py b/mmpose/configs/_base_/merged_COCO_AIC_MPII.py index 46952757da1270d3b0fc570be0a61e4c93114ef3..2272ab451be20248d8cfc3fce0d729745fbdfbc5 100644 --- a/mmpose/configs/_base_/merged_COCO_AIC_MPII.py +++ b/mmpose/configs/_base_/merged_COCO_AIC_MPII.py @@ -1,238 +1,92 @@ +# Copyright (c) Miroslav Purkrabek, BMPv1. All rights reserved. 
+ dataset_info = dict( - dataset_name='merged_COCO_AIC_MPII', + dataset_name="merged_COCO_AIC_MPII", paper_info=dict( - author='Miroslav Purkrabek', - title='Merged Pose Estimation Dataset', - container='', - year='2024', - homepage='', + author="Miroslav Purkrabek", + title="Merged Pose Estimation Dataset", + container="", + year="2024", + homepage="", ), keypoint_info={ - 0: - dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''), - 1: - dict( - name='left_eye', - id=1, - color=[51, 153, 255], - type='upper', - swap='right_eye'), - 2: - dict( - name='right_eye', - id=2, - color=[51, 153, 255], - type='upper', - swap='left_eye'), - 3: - dict( - name='left_ear', - id=3, - color=[51, 153, 255], - type='upper', - swap='right_ear'), - 4: - dict( - name='right_ear', - id=4, - color=[51, 153, 255], - type='upper', - swap='left_ear'), - 5: - dict( - name='left_shoulder', - id=5, - color=[0, 255, 0], - type='upper', - swap='right_shoulder'), - 6: - dict( - name='right_shoulder', - id=6, - color=[255, 128, 0], - type='upper', - swap='left_shoulder'), - 7: - dict( - name='left_elbow', - id=7, - color=[0, 255, 0], - type='upper', - swap='right_elbow'), - 8: - dict( - name='right_elbow', - id=8, - color=[255, 128, 0], - type='upper', - swap='left_elbow'), - 9: - dict( - name='left_wrist', - id=9, - color=[0, 255, 0], - type='upper', - swap='right_wrist'), - 10: - dict( - name='right_wrist', - id=10, - color=[255, 128, 0], - type='upper', - swap='left_wrist'), - 11: - dict( - name='left_hip', - id=11, - color=[0, 255, 0], - type='lower', - swap='right_hip'), - 12: - dict( - name='right_hip', - id=12, - color=[255, 128, 0], - type='lower', - swap='left_hip'), - 13: - dict( - name='left_knee', - id=13, - color=[0, 255, 0], - type='lower', - swap='right_knee'), - 14: - dict( - name='right_knee', - id=14, - color=[255, 128, 0], - type='lower', - swap='left_knee'), - 15: - dict( - name='left_ankle', - id=15, - color=[0, 255, 0], - type='lower', - swap='right_ankle'), - 16: - dict( - name='right_ankle', - id=16, - color=[255, 128, 0], - type='lower', - swap='left_ankle'), - 17: - dict( - name='thorax', - id=17, - color=[255, 128, 0], - type='upper', - swap=''), - 18: - dict( - name='neck', - id=18, - color=[255, 128, 0], - type='upper', - swap=''), - 19: - dict( - name='top_head', - id=19, - color=[255, 128, 0], - type='upper', - swap=''), - 20: - dict( - name='pelvis', - id=20, - color=[255, 128, 0], - type='lower', - swap=''), + 0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""), + 1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"), + 2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"), + 3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"), + 4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"), + 5: dict(name="left_shoulder", id=5, color=[0, 255, 0], type="upper", swap="right_shoulder"), + 6: dict(name="right_shoulder", id=6, color=[255, 128, 0], type="upper", swap="left_shoulder"), + 7: dict(name="left_elbow", id=7, color=[0, 255, 0], type="upper", swap="right_elbow"), + 8: dict(name="right_elbow", id=8, color=[255, 128, 0], type="upper", swap="left_elbow"), + 9: dict(name="left_wrist", id=9, color=[0, 255, 0], type="upper", swap="right_wrist"), + 10: dict(name="right_wrist", id=10, color=[255, 128, 0], type="upper", swap="left_wrist"), + 11: dict(name="left_hip", id=11, color=[0, 255, 0], type="lower", swap="right_hip"), + 12: 
dict(name="right_hip", id=12, color=[255, 128, 0], type="lower", swap="left_hip"), + 13: dict(name="left_knee", id=13, color=[0, 255, 0], type="lower", swap="right_knee"), + 14: dict(name="right_knee", id=14, color=[255, 128, 0], type="lower", swap="left_knee"), + 15: dict(name="left_ankle", id=15, color=[0, 255, 0], type="lower", swap="right_ankle"), + 16: dict(name="right_ankle", id=16, color=[255, 128, 0], type="lower", swap="left_ankle"), + 17: dict(name="thorax", id=17, color=[255, 128, 0], type="upper", swap=""), + 18: dict(name="neck", id=18, color=[255, 128, 0], type="upper", swap=""), + 19: dict(name="top_head", id=19, color=[255, 128, 0], type="upper", swap=""), + 20: dict(name="pelvis", id=20, color=[255, 128, 0], type="lower", swap=""), }, skeleton_info={ - 0: - dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]), - 1: - dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]), - 2: - dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]), - 3: - dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]), + 0: dict(link=("left_ankle", "left_knee"), id=0, color=[0, 255, 0]), + 1: dict(link=("left_knee", "left_hip"), id=1, color=[0, 255, 0]), + 2: dict(link=("right_ankle", "right_knee"), id=2, color=[255, 128, 0]), + 3: dict(link=("right_knee", "right_hip"), id=3, color=[255, 128, 0]), # 4: # dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]), - 4: - dict(link=('left_hip', 'pelvis'), id=4, color=[51, 153, 255]), - 5: - dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]), - 6: - dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]), + 4: dict(link=("left_hip", "pelvis"), id=4, color=[51, 153, 255]), + 5: dict(link=("left_shoulder", "left_hip"), id=5, color=[51, 153, 255]), + 6: dict(link=("right_shoulder", "right_hip"), id=6, color=[51, 153, 255]), # 7: # dict( # link=('left_shoulder', 'right_shoulder'), # id=7, # color=[51, 153, 255]), - 7: - dict( - link=('left_shoulder', 'thorax'), - id=7, - color=[51, 153, 255]), - 8: - dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]), - 9: - dict( - link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]), - 10: - dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]), - 11: - dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]), - 12: - dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]), - 13: - dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]), - 14: - dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]), - 15: - dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]), - 16: - dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]), - 17: - dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]), - 18: - dict( - link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255]), - 19: - dict(link=('pelvis', 'right_hip'), id=19, color=[51, 153, 255]), - 20: - dict( - link=('right_shoulder', 'thorax'), - id=20, - color=[51, 153, 255]), - 21: - dict( - link=('thorax', 'neck'), - id=21, - color=[51, 153, 255]), - 22: - dict( - link=('left_ear', 'top_head'), - id=22, - color=[51, 153, 255]), - 23: - dict( - link=('right_ear', 'top_head'), - id=23, - color=[51, 153, 255]), + 7: dict(link=("left_shoulder", "thorax"), id=7, color=[51, 153, 255]), + 8: dict(link=("left_shoulder", "left_elbow"), id=8, color=[0, 255, 0]), + 9: dict(link=("right_shoulder", "right_elbow"), id=9, color=[255, 128, 0]), + 10: dict(link=("left_elbow", "left_wrist"), 
id=10, color=[0, 255, 0]), + 11: dict(link=("right_elbow", "right_wrist"), id=11, color=[255, 128, 0]), + 12: dict(link=("left_eye", "right_eye"), id=12, color=[51, 153, 255]), + 13: dict(link=("nose", "left_eye"), id=13, color=[51, 153, 255]), + 14: dict(link=("nose", "right_eye"), id=14, color=[51, 153, 255]), + 15: dict(link=("left_eye", "left_ear"), id=15, color=[51, 153, 255]), + 16: dict(link=("right_eye", "right_ear"), id=16, color=[51, 153, 255]), + 17: dict(link=("left_ear", "left_shoulder"), id=17, color=[51, 153, 255]), + 18: dict(link=("right_ear", "right_shoulder"), id=18, color=[51, 153, 255]), + 19: dict(link=("pelvis", "right_hip"), id=19, color=[51, 153, 255]), + 20: dict(link=("right_shoulder", "thorax"), id=20, color=[51, 153, 255]), + 21: dict(link=("thorax", "neck"), id=21, color=[51, 153, 255]), + 22: dict(link=("left_ear", "top_head"), id=22, color=[51, 153, 255]), + 23: dict(link=("right_ear", "top_head"), id=23, color=[51, 153, 255]), }, - joint_weights=[ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5, 1., 1., 1., 1. - ], + joint_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.0, 1.0], sigmas=[ - 0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062, - 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089, - 0.079, 0.079, # Thorax and neck has the same as shoulders - 0.035, # Top of head has the same as ears - 0.107, # Pelvis has the same as hips - ]) + 0.026, + 0.025, + 0.025, + 0.035, + 0.035, + 0.079, + 0.079, + 0.072, + 0.072, + 0.062, + 0.062, + 0.107, + 0.107, + 0.087, + 0.087, + 0.089, + 0.089, + 0.079, + 0.079, # Thorax and neck have the same sigma as the shoulders + 0.035, # Top of head has the same sigma as the ears + 0.107, # Pelvis has the same sigma as the hips + ], +) diff --git a/mmpose/configs/animal_2d_keypoint/rtmpose/ap10k/rtmpose-m_8xb64-210e_ap10k-256x256.py b/mmpose/configs/animal_2d_keypoint/rtmpose/ap10k/rtmpose-m_8xb64-210e_ap10k-256x256.py index 0e8c007b311f07d3a838d015d37b88fc11f760e2..fb4e412ae2e7ff833d0febbbed3af6ec501af4e7 100644 --- a/mmpose/configs/animal_2d_keypoint/rtmpose/ap10k/rtmpose-m_8xb64-210e_ap10k-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/rtmpose/ap10k/rtmpose-m_8xb64-210e_ap10k-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,97 +10,78 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(256, 256), - sigma=(5.66, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec 
= dict(type="SimCCLabel", input_size=(256, 256), sigma=(5.66, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'AP10KDataset' -data_mode = 'topdown' -data_root = 'data/ap10k/' +dataset_type = "AP10KDataset" +data_mode = "topdown" +data_root = "data/ap10k/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -110,68 +91,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - 
max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -179,67 +141,57 @@ train_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-train-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-train-split1.json", + data_prefix=dict(img="data/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-val-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-val-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, 
data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-test-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-test-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-val-split1.json') -test_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-test-split1.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-val-split1.json") +test_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-test-split1.json") diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P1-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P1-256x256.py index 0e7eb0136e9f8476c6863b52e9c2a366b7245fc3..48cdf0625406b586a1dc327fd89ad56eb9d13888 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P1-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P1-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=300, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='AdamW', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="AdamW", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + 
type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=23, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalKingdomDataset' -data_mode = 'topdown' -data_root = 'data/ak/' +dataset_type = "AnimalKingdomDataset" +data_mode = "topdown" +data_root = "data/ak/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,32 +84,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P1/train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P1/train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=24, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( 
type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P1/test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P1/test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [dict(type='PCKAccuracy', thr=0.05), dict(type='AUC')] +val_evaluator = [dict(type="PCKAccuracy", thr=0.05), dict(type="AUC")] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P2-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P2-256x256.py index f42057f8aa91de0ae2a234c7625dce725adf204b..cb97dfa533cc5beefdcfbf1437e8ff526de274c6 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P2-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P2-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=300, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='AdamW', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="AdamW", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + 
stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=23, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalKingdomDataset' -data_mode = 'topdown' -data_root = 'data/ak/' +dataset_type = "AnimalKingdomDataset" +data_mode = "topdown" +data_root = "data/ak/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,32 +84,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P2/train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P2/train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=24, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P2/test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P2/test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [dict(type='PCKAccuracy', thr=0.05), dict(type='AUC')] +val_evaluator = [dict(type="PCKAccuracy", thr=0.05), dict(type="AUC")] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_amphibian-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_amphibian-256x256.py index 5a83e7a97b9478031f7ca4dcc4dccba0350d432d..ad7eb476cc714971790c68f1f1a5303623d0c287 100644 --- 
a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_amphibian-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_amphibian-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=300, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='AdamW', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="AdamW", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=23, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalKingdomDataset' 
-data_mode = 'topdown' -data_root = 'data/ak/' +dataset_type = "AnimalKingdomDataset" +data_mode = "topdown" +data_root = "data/ak/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,32 +84,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P3_amphibian/train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P3_amphibian/train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=24, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P3_amphibian/test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P3_amphibian/test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [dict(type='PCKAccuracy', thr=0.05), dict(type='AUC')] +val_evaluator = [dict(type="PCKAccuracy", thr=0.05), dict(type="AUC")] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_bird-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_bird-256x256.py index ca3c91af610fe995aa24106e0bc6f72b012f9228..c5ea9c2879b24693a1a7bf9466bbb0911108aa04 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_bird-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_bird-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=300, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='AdamW', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="AdamW", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + 
dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=23, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalKingdomDataset' -data_mode = 'topdown' -data_root = 'data/ak/' +dataset_type = "AnimalKingdomDataset" +data_mode = "topdown" +data_root = "data/ak/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - 
dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,32 +84,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P3_bird/train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P3_bird/train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=24, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P3_bird/test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P3_bird/test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [dict(type='PCKAccuracy', thr=0.05), dict(type='AUC')] +val_evaluator = [dict(type="PCKAccuracy", thr=0.05), dict(type="AUC")] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_fish-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_fish-256x256.py index 3923f30d104b22c21a4f1b1252a09e3fcbfb99fd..b608cfb9831edd7b9f35a851181c71d1cb7966f8 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_fish-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_fish-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=300, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='AdamW', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="AdamW", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + 
type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=23, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalKingdomDataset' -data_mode = 'topdown' -data_root = 'data/ak/' +dataset_type = "AnimalKingdomDataset" +data_mode = "topdown" +data_root = "data/ak/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,32 +84,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P3_fish/train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P3_fish/train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=24, 
num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P3_fish/test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P3_fish/test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [dict(type='PCKAccuracy', thr=0.05), dict(type='AUC')] +val_evaluator = [dict(type="PCKAccuracy", thr=0.05), dict(type="AUC")] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_mammal-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_mammal-256x256.py index d061c4b6fbc2e01a0b30241cca7fd5212fe29eca..24d97d7b64c87a6d17d56a967f30ec296ba6c5ff 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_mammal-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_mammal-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=300, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='AdamW', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="AdamW", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 
'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=23, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalKingdomDataset' -data_mode = 'topdown' -data_root = 'data/ak/' +dataset_type = "AnimalKingdomDataset" +data_mode = "topdown" +data_root = "data/ak/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,32 +84,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P3_mammal/train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P3_mammal/train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=24, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P3_mammal/test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P3_mammal/test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [dict(type='PCKAccuracy', thr=0.05), dict(type='AUC')] +val_evaluator = [dict(type="PCKAccuracy", thr=0.05), dict(type="AUC")] test_evaluator = val_evaluator diff --git 
a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_reptile-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_reptile-256x256.py index b06a49936bad84e9e01cd5510e779e1909d56520..10ab0e036b53a8c8a79137e651091589617119d3 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_reptile-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ak/td-hm_hrnet-w32_8xb32-300e_animalkingdom_P3_reptile-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=300, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='AdamW', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="AdamW", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=23, deconv_out_channels=None, - 
loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalKingdomDataset' -data_mode = 'topdown' -data_root = 'data/ak/' +dataset_type = "AnimalKingdomDataset" +data_mode = "topdown" +data_root = "data/ak/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,32 +84,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P3_reptile/train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P3_reptile/train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=24, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ak_P3_reptile/test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/ak_P3_reptile/test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [dict(type='PCKAccuracy', thr=0.05), dict(type='AUC')] +val_evaluator = [dict(type="PCKAccuracy", thr=0.05), dict(type="AUC")] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_ViTPose-base_8xb64-210e_animalpose-256x192.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_ViTPose-base_8xb64-210e_animalpose-256x192.py index b73fa4083330eb2c775c8c3b31241bd635bd40b7..f84d8591e96d50365068a2a548fd24f9b235e152 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_ViTPose-base_8xb64-210e_animalpose-256x192.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_ViTPose-base_8xb64-210e_animalpose-256x192.py @@ -1,114 +1,100 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - 
imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=12, layer_decay_rate=0.75, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_mult=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch='base', + type="mmpretrain.VisionTransformer", + arch="base", img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.3, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint = 'models/pretrained/mae_pretrain_vit_small_20230913.pth' + type="Pretrained", + checkpoint="models/pretrained/mae_pretrain_vit_small_20230913.pth", # checkpoint='https://download.openmmlab.com/mmpose/' # 'v1/pretrained_models/mae_pretrain_vit_base_20230913.pth'), + ), ), + head=dict( + type="HeatmapHead", + in_channels=768, + out_channels=17, + deconv_out_channels=(256, 256), + deconv_kernel_sizes=(4, 4), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + flip_mode="heatmap", + shift_heatmap=False, + ), - head=dict( - type='HeatmapHead', - in_channels=768, - out_channels=17, - deconv_out_channels=(256, 256), - deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), - test_cfg=dict( - flip_test=True, - flip_mode='heatmap', - shift_heatmap=False, - )) +) # base dataset settings -dataset_type =
'AnimalPoseDataset' -data_mode = 'topdown' -data_root = "/datagrid/personal/purkrmir/data/AnimalPose/" +dataset_type = "AnimalPoseDataset" +data_mode = "topdown" +data_root = "path/to/AnimalPose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -116,36 +102,36 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='animalpose_train.json', - data_prefix=dict(img='images/'), + ann_file="animalpose_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='animalpose_val.json', + ann_file="animalpose_val.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'animalpose_val.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "animalpose_val.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_ViTPose-small_8xb64-210e_animalpose-256x192.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_ViTPose-small_8xb64-210e_animalpose-256x192.py index f626920050a9f9c8587c788b5831a8ec96083e15..a5ca93310434f455ccdf7f8827dda4002e05ceee 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_ViTPose-small_8xb64-210e_animalpose-256x192.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_ViTPose-small_8xb64-210e_animalpose-256x192.py @@ -1,125 +1,105 @@ -TRAIN_ROOT = "/datagrid/personal/purkrmir/data/AnimalPose/" +TRAIN_ROOT = "path/to/AnimalPose/" BATCH_SIZE = 64 -load_from = 'models/pretrained/vitpose-s.pth' +load_from = "models/pretrained/vitpose-s.pth" # load_from = 'models/pretrained/vitpose-s+_compatible.pth' -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime 
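# A minimal usage sketch, written as comments so the config stays valid Python:
# `train_cfg` below follows the standard mmengine Runner contract, where
# `max_epochs` caps training and `val_interval` runs validation every N epochs.
# Assuming mmengine is installed and the script runs from the repo root (the
# path is illustrative), the config can be inspected like this:
#
#   from mmengine.config import Config
#   cfg = Config.fromfile(
#       "mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/"
#       "td-hm_ViTPose-small_8xb64-210e_animalpose-256x192.py")
#   cfg.train_cfg.max_epochs    # -> 210
#   cfg.train_cfg.val_interval  # -> 10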
train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=12, layer_decay_rate=0.75, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_mult=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch={ - 'embed_dims': 384, - 'num_layers': 12, - 'num_heads': 12, - 'feedforward_channels': 384 * 4 - }, + type="mmpretrain.VisionTransformer", + arch={"embed_dims": 384, "num_layers": 12, "num_heads": 12, "feedforward_channels": 384 * 4}, img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.1, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), - init_cfg=None + init_cfg=None, # init_cfg=dict( # type='Pretrained', # checkpoint='models/pretrained/vitpose-s+.pth'), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=384, out_channels=20, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'AnimalPoseDataset' -data_mode = 'topdown' +dataset_type = "AnimalPoseDataset" +data_mode = "topdown" data_root = TRAIN_ROOT #
pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -127,36 +107,36 @@ train_dataloader = dict( batch_size=BATCH_SIZE, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/animalpose_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=BATCH_SIZE, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_val.json', + ann_file="annotations/animalpose_val.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/animalpose_val.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/animalpose_val.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet-w32_8xb64-210e_animalpose-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet-w32_8xb64-210e_animalpose-256x256.py index 2680fe8956e7b1cbf186b1c536204917478d721f..5b75c36103c51207923991ece488f07cbc5c8efa 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet-w32_8xb64-210e_animalpose-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet-w32_8xb64-210e_animalpose-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - 
milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=20, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalPoseDataset' -data_mode = 'topdown' -data_root = 'data/animalpose/' +dataset_type = "AnimalPoseDataset" +data_mode = "topdown" +data_root = "data/animalpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", 
encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,33 +84,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_train.json', - data_prefix=dict(img=''), + ann_file="annotations/animalpose_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_val.json', - data_prefix=dict(img=''), + ann_file="annotations/animalpose_val.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', ann_file=data_root + 'annotations/animalpose_val.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/animalpose_val.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet-w48_8xb64-210e_animalpose-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet-w48_8xb64-210e_animalpose-256x256.py index 3d4a76d8f506c60493ef7e476cb5ed3310044ba2..42239ceb486c1930d401022d8e69ebbc0f4b6351 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet-w48_8xb64-210e_animalpose-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet-w48_8xb64-210e_animalpose-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - 
mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=20, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalPoseDataset' -data_mode = 'topdown' -data_root = 'data/animalpose/' +dataset_type = "AnimalPoseDataset" +data_mode = "topdown" +data_root = "data/animalpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,33 +84,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_train.json', - data_prefix=dict(img=''), + ann_file="annotations/animalpose_train.json", + data_prefix=dict(img=""), 
pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_val.json', - data_prefix=dict(img=''), + ann_file="annotations/animalpose_val.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', ann_file=data_root + 'annotations/animalpose_val.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/animalpose_val.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet_reproduce-w48_8xb64-210e_animalpose-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet_reproduce-w48_8xb64-210e_animalpose-256x256.py index c4818e7ec858dae0dfc2ac69f553bc9495ac9a0b..38702ceca5e30476e5c8f2b804af2024d8eb0ad9 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet_reproduce-w48_8xb64-210e_animalpose-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet_reproduce-w48_8xb64-210e_animalpose-256x256.py @@ -1,117 +1,86 @@ -TRAIN_ROOT = "/datagrid/personal/purkrmir/data/AnimalPose/" +TRAIN_ROOT = "path/to/AnimalPose/" BATCH_SIZE = 64 -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), 
- stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=20, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalPoseDataset' -data_mode = 'topdown' +dataset_type = "AnimalPoseDataset" +data_mode = "topdown" data_root = TRAIN_ROOT # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -119,34 +88,34 @@ train_dataloader = dict( batch_size=BATCH_SIZE, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='animalpose_train_coco.json', - data_prefix=dict(img='images/'), + ann_file="animalpose_train_coco.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=BATCH_SIZE, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='animalpose_val_coco.json', - data_prefix=dict(img='images/'), + ann_file="animalpose_val_coco.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'animalpose_val_coco.json') 
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "animalpose_val_coco.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res101_8xb64-210e_animalpose-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res101_8xb64-210e_animalpose-256x256.py index 8ffaabb06f160fb66260507db057686f4621b6b2..d4e3bd723787412b68d2dccda4da3ff11ad663f5 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res101_8xb64-210e_animalpose-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res101_8xb64-210e_animalpose-256x256.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=20, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=20, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalPoseDataset' -data_mode = 'topdown' -data_root = 'data/animalpose/' +dataset_type = "AnimalPoseDataset" +data_mode = "topdown" +data_root = "data/animalpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + 
dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,33 +73,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_train.json', - data_prefix=dict(img=''), + ann_file="annotations/animalpose_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_val.json', - data_prefix=dict(img=''), + ann_file="annotations/animalpose_val.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', ann_file=data_root + 'annotations/animalpose_val.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/animalpose_val.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res152_8xb32-210e_animalpose-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res152_8xb32-210e_animalpose-256x256.py index 8ed92929c9d42fa0caad87a5f6292f75745bd0bf..bd9827e2a0101d13d49016dd75a7ccbbc97d8573 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res152_8xb32-210e_animalpose-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res152_8xb32-210e_animalpose-256x256.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), 
heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=20, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=20, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalPoseDataset' -data_mode = 'topdown' -data_root = 'data/animalpose/' +dataset_type = "AnimalPoseDataset" +data_mode = "topdown" +data_root = "data/animalpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,33 +73,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_train.json', - data_prefix=dict(img=''), + ann_file="annotations/animalpose_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_val.json', - data_prefix=dict(img=''), + ann_file="annotations/animalpose_val.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', ann_file=data_root + 'annotations/animalpose_val.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/animalpose_val.json") test_evaluator = val_evaluator diff --git 
a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res50_8xb64-210e_animalpose-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res50_8xb64-210e_animalpose-256x256.py index c053c8881461de72345478da49293a6ca96c1ed4..5a503f8463608ce7833d11751dccf7bb514d383c 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res50_8xb64-210e_animalpose-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_res50_8xb64-210e_animalpose-256x256.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=20, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=20, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AnimalPoseDataset' -data_mode = 'topdown' -data_root = 'data/animalpose/' +dataset_type = "AnimalPoseDataset" +data_mode = "topdown" +data_root = "data/animalpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", 
encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,33 +73,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_train.json', - data_prefix=dict(img=''), + ann_file="annotations/animalpose_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/animalpose_val.json', - data_prefix=dict(img=''), + ann_file="annotations/animalpose_val.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', ann_file=data_root + 'annotations/animalpose_val.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/animalpose_val.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/cspnext-m_udp_8xb64-210e_ap10k-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/cspnext-m_udp_8xb64-210e_ap10k-256x256.py index 844d17df4ef919ac0c2a9a14bfc966da14752286..a123fa6e69afca9a5421b425b0f4710e6bb218bc 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/cspnext-m_udp_8xb64-210e_ap10k-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/cspnext-m_udp_8xb64-210e_ap10k-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,143 +10,115 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning policy param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 105 to 210 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - 
type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=768, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=768, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'AP10KDataset' -data_mode = 'topdown' -data_root = 'data/ap10k/' +dataset_type = "AP10KDataset" +data_mode = "topdown" +data_root = "data/ap10k/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( 
- type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -154,67 +126,57 @@ train_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-train-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-train-split1.json", + data_prefix=dict(img="data/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-val-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-val-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-test-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-test-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 
'annotations/ap10k-val-split1.json') -test_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-test-split1.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-val-split1.json") +test_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-test-split1.json") diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_hrnet-w32_8xb64-210e_ap10k-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_hrnet-w32_8xb64-210e_ap10k-256x256.py index c61e6384aeea7efcca3ac2f2268fef01663e3234..24100202a0d5ec6969a9c474d9e297b2ae821abc 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_hrnet-w32_8xb64-210e_ap10k-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_hrnet-w32_8xb64-210e_ap10k-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), 
num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AP10KDataset' -data_mode = 'topdown' -data_root = 'data/ap10k/' +dataset_type = "AP10KDataset" +data_mode = "topdown" +data_root = "data/ap10k/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,50 +84,49 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-train-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-train-split1.json", + data_prefix=dict(img="data/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-val-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-val-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-test-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-test-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-val-split1.json') -test_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-test-split1.json') +val_evaluator = dict(type="CocoMetric", 
ann_file=data_root + "annotations/ap10k-val-split1.json") +test_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-test-split1.json") diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_hrnet-w48_8xb64-210e_ap10k-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_hrnet-w48_8xb64-210e_ap10k-256x256.py index 146114a887663a230f7a504e83f13da6fa4a2571..673ca2f372bd829ed8ca72877e14ad8f82541987 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_hrnet-w48_8xb64-210e_ap10k-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_hrnet-w48_8xb64-210e_ap10k-256x256.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - 
type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AP10KDataset' -data_mode = 'topdown' -data_root = 'data/ap10k/' +dataset_type = "AP10KDataset" +data_mode = "topdown" +data_root = "data/ap10k/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,50 +84,49 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-train-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-train-split1.json", + data_prefix=dict(img="data/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-val-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-val-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-test-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-test-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-val-split1.json') -test_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-test-split1.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-val-split1.json") +test_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-test-split1.json") diff --git 
a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_res101_8xb64-210e_ap10k-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_res101_8xb64-210e_ap10k-256x256.py index be49577511584f892cc4c82797207e8ee1d6a8b4..1554755c2dca7c050e9840bedf326c128b5e151e 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_res101_8xb64-210e_ap10k-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_res101_8xb64-210e_ap10k-256x256.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AP10KDataset' -data_mode = 'topdown' -data_root = 'data/ap10k/' +dataset_type = "AP10KDataset" +data_mode = "topdown" +data_root = "data/ap10k/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] 
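# NOTE: train_pipeline above and val_pipeline below are the two halves of the
# usual top-down recipe: training adds flip / half-body / bbox-jitter
# augmentation plus target generation, while evaluation only crops and
# affine-warps each bbox. A minimal inference sketch against this config --
# the checkpoint path and image are placeholders, not files shipped with
# this diff:
#
#     from mmpose.apis import inference_topdown, init_model
#
#     model = init_model(
#         "td-hm_res101_8xb64-210e_ap10k-256x256.py",  # this config
#         "td-hm_res101_ap10k-256x256.pth",            # hypothetical local checkpoint
#         device="cuda:0",
#     )
#     # One bbox per animal in (x1, y1, x2, y2); the call runs val_pipeline
#     # and decodes the 64x64 heatmaps back to image coordinates via `codec`.
#     results = inference_topdown(model, "animal.jpg", bboxes=[[10, 10, 210, 170]])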
val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,50 +73,49 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-train-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-train-split1.json", + data_prefix=dict(img="data/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-val-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-val-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-test-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-test-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-val-split1.json') -test_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-test-split1.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-val-split1.json") +test_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-test-split1.json") diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_res50_8xb64-210e_ap10k-256x256.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_res50_8xb64-210e_ap10k-256x256.py index 2172cbe938506ae2faa08ed731710e51203d579f..7a70e98deb6548c2f476ecb8548251002f97ab75 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_res50_8xb64-210e_ap10k-256x256.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/ap10k/td-hm_res50_8xb64-210e_ap10k-256x256.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, 
by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AP10KDataset' -data_mode = 'topdown' -data_root = 'data/ap10k/' +dataset_type = "AP10KDataset" +data_mode = "topdown" +data_root = "data/ap10k/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,50 +73,49 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-train-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-train-split1.json", + data_prefix=dict(img="data/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-val-split1.json', - data_prefix=dict(img='data/'), + 
ann_file="annotations/ap10k-val-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ap10k-test-split1.json', - data_prefix=dict(img='data/'), + ann_file="annotations/ap10k-test-split1.json", + data_prefix=dict(img="data/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-val-split1.json') -test_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ap10k-test-split1.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-val-split1.json") +test_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ap10k-test-split1.json") diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res101_8xb64-210e_locust-160x160.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res101_8xb64-210e_locust-160x160.py index f6e6c2e39bb28913b7ba180d0ab74c71a24c6cb6..e13a0541cd478e086594c3c49d8adc355a37f3c5 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res101_8xb64-210e_locust-160x160.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res101_8xb64-210e_locust-160x160.py @@ -1,87 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(160, 160), heatmap_size=(40, 40), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(160, 160), heatmap_size=(40, 40), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=35, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + 
type="HeatmapHead", in_channels=2048, out_channels=35, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'LocustDataset' -data_mode = 'topdown' -data_root = 'data/locust/' +dataset_type = "LocustDataset" +data_mode = "topdown" +data_root = "data/locust/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_factor=0.25, - rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_factor=0.25, rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/locust_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/locust_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/locust_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/locust_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res152_8xb32-210e_locust-160x160.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res152_8xb32-210e_locust-160x160.py index 8f0a58bc88efab80a383df61137dbb45253da636..143dff9bd8eb6dd0525bba2154298decacb7e943 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res152_8xb32-210e_locust-160x160.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res152_8xb32-210e_locust-160x160.py @@ -1,87 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = 
dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(160, 160), heatmap_size=(40, 40), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(160, 160), heatmap_size=(40, 40), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=35, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=35, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'LocustDataset' -data_mode = 'topdown' -data_root = 'data/locust/' +dataset_type = "LocustDataset" +data_mode = "topdown" +data_root = "data/locust/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_factor=0.25, - rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_factor=0.25, rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +72,38 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, 
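# NOTE: MMEngine joins these fields as data_root + data_prefix.img for the
# image directory and data_root + ann_file for the COCO-style JSON, so this
# config expects data/locust/images/ and
# data/locust/annotations/locust_train.json on disk.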
data_root=data_root, data_mode=data_mode, - ann_file='annotations/locust_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/locust_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/locust_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/locust_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res50_8xb64-210e_locust-160x160.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res50_8xb64-210e_locust-160x160.py index adbb89ee5b23f8697059f6778f1bfe13bd21432a..02e0ee4097f7b46c6221e6fb23edf8fbdb97751c 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res50_8xb64-210e_locust-160x160.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/locust/td-hm_res50_8xb64-210e_locust-160x160.py @@ -1,87 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(160, 160), heatmap_size=(40, 40), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(160, 160), heatmap_size=(40, 40), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=35, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=35, loss=dict(type="KeypointMSELoss", 
use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'LocustDataset' -data_mode = 'topdown' -data_root = 'data/locust/' +dataset_type = "LocustDataset" +data_mode = "topdown" +data_root = "data/locust/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_factor=0.25, - rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_factor=0.25, rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/locust_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/locust_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/locust_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/locust_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res101_8xb64-210e_zebra-160x160.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res101_8xb64-210e_zebra-160x160.py index 68c56d80fb91b068d684ec29b5c77da3e920a71f..126f4a553be015580e6515bbafab076394068b21 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res101_8xb64-210e_zebra-160x160.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res101_8xb64-210e_zebra-160x160.py @@ -1,87 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + 
lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(160, 160), heatmap_size=(40, 40), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(160, 160), heatmap_size=(40, 40), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=9, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=9, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'ZebraDataset' -data_mode = 'topdown' -data_root = 'data/zebra/' +dataset_type = "ZebraDataset" +data_mode = "topdown" +data_root = "data/zebra/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_factor=0.25, - rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_factor=0.25, rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/zebra_train.json', - data_prefix=dict(img='images/'), + 
ann_file="annotations/zebra_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/zebra_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/zebra_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res152_8xb32-210e_zebra-160x160.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res152_8xb32-210e_zebra-160x160.py index abb14eefb84dd91912f84cf407faeabc83ec5c25..7fa2947dab00a0d17522c8ca8721a29bcfe9b6fe 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res152_8xb32-210e_zebra-160x160.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res152_8xb32-210e_zebra-160x160.py @@ -1,87 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(160, 160), heatmap_size=(40, 40), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(160, 160), heatmap_size=(40, 40), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=9, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=9, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", 
shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'ZebraDataset' -data_mode = 'topdown' -data_root = 'data/zebra/' +dataset_type = "ZebraDataset" +data_mode = "topdown" +data_root = "data/zebra/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_factor=0.25, - rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_factor=0.25, rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +72,38 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/zebra_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/zebra_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/zebra_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/zebra_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res50_8xb64-210e_zebra-160x160.py b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res50_8xb64-210e_zebra-160x160.py index e4d2777751d7837e7c892868f3027b145610de24..b0e27df8d4330647a31967ae7d49343ece924352 100644 --- a/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res50_8xb64-210e_zebra-160x160.py +++ b/mmpose/configs/animal_2d_keypoint/topdown_heatmap/zebra/td-hm_res50_8xb64-210e_zebra-160x160.py @@ -1,87 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), 
# warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(160, 160), heatmap_size=(40, 40), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(160, 160), heatmap_size=(40, 40), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=9, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=9, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'ZebraDataset' -data_mode = 'topdown' -data_root = 'data/zebra/' +dataset_type = "ZebraDataset" +data_mode = "topdown" +data_root = "data/zebra/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_factor=0.25, - rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_factor=0.25, rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale', padding=0.8), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale", padding=0.8), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/zebra_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/zebra_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( 
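# NOTE: the evaluation loaders deliberately keep every sample --
# drop_last=False together with round_up=False in the sampler -- so
# distributed evaluation sees each image exactly once and the reported
# PCK/AUC/EPE are computed over the full test split.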
batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/zebra_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/zebra_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/associative_embedding/coco/ae_hrnet-w32_8xb24-300e_coco-512x512.py b/mmpose/configs/body_2d_keypoint/associative_embedding/coco/ae_hrnet-w32_8xb24-300e_coco-512x512.py index a4804cbe37ec3932d7b1a7d83b89bf286f1c5761..43181de2c69a9052cd26b3ac2c15cc7c29532092 100644 --- a/mmpose/configs/body_2d_keypoint/associative_embedding/coco/ae_hrnet-w32_8xb24-300e_coco-512x512.py +++ b/mmpose/configs/body_2d_keypoint/associative_embedding/coco/ae_hrnet-w32_8xb24-300e_coco-512x512.py @@ -1,128 +1,100 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=300, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1.5e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1.5e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=300, - milestones=[200, 260], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=300, milestones=[200, 260], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=192) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', interval=50)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", interval=50)) # codec settings codec = dict( - type='AssociativeEmbedding', + type="AssociativeEmbedding", input_size=(512, 512), heatmap_size=(128, 128), sigma=2, decode_topk=30, decode_center_shift=0.5, - decode_keypoint_order=[ - 0, 1, 2, 3, 4, 5, 6, 11, 12, 7, 8, 9, 10, 13, 14, 15, 16 - ], - decode_max_instances=30) + decode_keypoint_order=[0, 1, 2, 3, 4, 5, 6, 11, 12, 7, 8, 9, 10, 13, 14, 15, 16], + decode_max_instances=30, +) # model settings model = dict( - type='BottomupPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="BottomupPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( 
- num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='AssociativeEmbeddingHead', + type="AssociativeEmbeddingHead", in_channels=32, num_keypoints=17, tag_dim=1, tag_per_keypoint=True, deconv_out_channels=None, - keypoint_loss=dict(type='KeypointMSELoss', use_target_weight=True), - tag_loss=dict(type='AssociativeEmbeddingLoss', loss_weight=0.001), + keypoint_loss=dict(type="KeypointMSELoss", use_target_weight=True), + tag_loss=dict(type="AssociativeEmbeddingLoss", loss_weight=0.001), # The heatmap will be resized to the input size before decoding # if ``restore_heatmap_size==True`` - decoder=dict(codec, heatmap_size=codec['input_size'])), - test_cfg=dict( - multiscale_test=False, - flip_test=True, - shift_heatmap=False, - restore_heatmap_size=True, - align_corners=False)) + decoder=dict(codec, heatmap_size=codec["input_size"]), + ), + test_cfg=dict(multiscale_test=False, flip_test=True, shift_heatmap=False, restore_heatmap_size=True, align_corners=False), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'bottomup' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "bottomup" +data_root = "data/coco/" # pipelines train_pipeline = [] val_pipeline = [ - dict(type='LoadImage'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=codec["input_size"], size_factor=64, resize_mode="expand"), dict( - type='BottomupResize', - input_size=codec['input_size'], - size_factor=64, - resize_mode='expand'), - dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'crowd_index', 'ori_shape', - 'img_shape', 'input_size', 'input_center', 'input_scale', - 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info', - 'skeleton_links')) + type="PackPoseInputs", + meta_keys=( + "id", + "img_id", + "img_path", + "crowd_index", + "ori_shape", + "img_shape", + "input_size", + "input_center", + "input_scale", + "flip", + "flip_direction", + "flip_indices", + "raw_ann_info", + "skeleton_links", + ), + ), ] # data loaders @@ -130,37 +102,39 @@ train_dataloader = dict( batch_size=24, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( 
type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none', - score_mode='bbox', + type="CocoMetric", + ann_file=data_root + "annotations/person_keypoints_val2017.json", + nms_mode="none", + score_mode="bbox", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/cid/coco/cid_hrnet-w32_8xb20-140e_coco-512x512.py b/mmpose/configs/body_2d_keypoint/cid/coco/cid_hrnet-w32_8xb20-140e_coco-512x512.py index 955293dcb1314f1d57cdb9efc4f62669cf41fabc..2e61a6e4a745eaba7e79a47780698381675033ce 100644 --- a/mmpose/configs/body_2d_keypoint/cid/coco/cid_hrnet-w32_8xb20-140e_coco-512x512.py +++ b/mmpose/configs/body_2d_keypoint/cid/coco/cid_hrnet-w32_8xb20-140e_coco-512x512.py @@ -1,126 +1,104 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=140, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=140, - milestones=[90, 120], - gamma=0.1, - by_epoch=True) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=140, milestones=[90, 120], gamma=0.1, by_epoch=True)] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=160) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='DecoupledHeatmap', input_size=(512, 512), heatmap_size=(128, 128)) +codec = dict(type="DecoupledHeatmap", input_size=(512, 512), heatmap_size=(128, 128)) # model settings model = dict( - type='BottomupPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="BottomupPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256), - multiscale_output=True)), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 
'pretrain_models/hrnet_w32-36af842e.pth'), + multiscale_output=True, + ), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='CIDHead', + type="CIDHead", in_channels=480, num_keypoints=17, gfd_channels=32, - coupled_heatmap_loss=dict(type='FocalHeatmapLoss', loss_weight=1.0), - decoupled_heatmap_loss=dict(type='FocalHeatmapLoss', loss_weight=4.0), - contrastive_loss=dict( - type='InfoNCELoss', temperature=0.05, loss_weight=1.0), + coupled_heatmap_loss=dict(type="FocalHeatmapLoss", loss_weight=1.0), + decoupled_heatmap_loss=dict(type="FocalHeatmapLoss", loss_weight=4.0), + contrastive_loss=dict(type="InfoNCELoss", temperature=0.05, loss_weight=1.0), decoder=codec, ), train_cfg=dict(max_train_instances=200), - test_cfg=dict( - multiscale_test=False, - flip_test=True, - shift_heatmap=False, - align_corners=False)) + test_cfg=dict(multiscale_test=False, flip_test=True, shift_heatmap=False, align_corners=False), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'bottomup' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "bottomup" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='BottomupRandomAffine', input_size=codec['input_size']), - dict(type='RandomFlip', direction='horizontal'), - dict(type='GenerateTarget', encoder=codec), - dict(type='BottomupGetHeatmapMask'), - dict(type='PackPoseInputs'), + dict(type="LoadImage"), + dict(type="BottomupRandomAffine", input_size=codec["input_size"]), + dict(type="RandomFlip", direction="horizontal"), + dict(type="GenerateTarget", encoder=codec), + dict(type="BottomupGetHeatmapMask"), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict( - type='BottomupResize', - input_size=codec['input_size'], - size_factor=64, - resize_mode='expand'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=codec["input_size"], size_factor=64, resize_mode="expand"), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'crowd_index', 'ori_shape', - 'img_shape', 'input_size', 'input_center', 'input_scale', - 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info', - 'skeleton_links')) + type="PackPoseInputs", + meta_keys=( + "id", + "img_id", + "img_path", + "crowd_index", + "ori_shape", + "img_shape", + "input_size", + "input_center", + "input_scale", + "flip", + "flip_direction", + "flip_indices", + "raw_ann_info", + "skeleton_links", + ), + ), ] # data loaders @@ -128,37 +106,39 @@ train_dataloader = dict( batch_size=20, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=1, num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - 
data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', + type="CocoMetric", + ann_file=data_root + "annotations/person_keypoints_val2017.json", nms_thr=0.8, - score_mode='keypoint', + score_mode="keypoint", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/cid/coco/cid_hrnet-w48_8xb20-140e_coco-512x512.py b/mmpose/configs/body_2d_keypoint/cid/coco/cid_hrnet-w48_8xb20-140e_coco-512x512.py index a114088ae217d8c8a2e0d16bab4459e163c6a129..edb6840e1b736f20a0f13b16457da909929a8650 100644 --- a/mmpose/configs/body_2d_keypoint/cid/coco/cid_hrnet-w48_8xb20-140e_coco-512x512.py +++ b/mmpose/configs/body_2d_keypoint/cid/coco/cid_hrnet-w48_8xb20-140e_coco-512x512.py @@ -1,126 +1,104 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=140, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=140, - milestones=[90, 120], - gamma=0.1, - by_epoch=True) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=140, milestones=[90, 120], gamma=0.1, by_epoch=True)] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=160) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='DecoupledHeatmap', input_size=(512, 512), heatmap_size=(128, 128)) +codec = dict(type="DecoupledHeatmap", input_size=(512, 512), heatmap_size=(128, 128)) # model settings model = dict( - type='BottomupPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="BottomupPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384), - multiscale_output=True)), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + multiscale_output=True, + ), + ), + init_cfg=dict(type="Pretrained", 
checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='CIDHead', + type="CIDHead", in_channels=720, num_keypoints=17, gfd_channels=48, - coupled_heatmap_loss=dict(type='FocalHeatmapLoss', loss_weight=1.0), - decoupled_heatmap_loss=dict(type='FocalHeatmapLoss', loss_weight=4.0), - contrastive_loss=dict( - type='InfoNCELoss', temperature=0.05, loss_weight=1.0), + coupled_heatmap_loss=dict(type="FocalHeatmapLoss", loss_weight=1.0), + decoupled_heatmap_loss=dict(type="FocalHeatmapLoss", loss_weight=4.0), + contrastive_loss=dict(type="InfoNCELoss", temperature=0.05, loss_weight=1.0), decoder=codec, ), train_cfg=dict(max_train_instances=200), - test_cfg=dict( - multiscale_test=False, - flip_test=True, - shift_heatmap=False, - align_corners=False)) + test_cfg=dict(multiscale_test=False, flip_test=True, shift_heatmap=False, align_corners=False), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'bottomup' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "bottomup" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='BottomupRandomAffine', input_size=codec['input_size']), - dict(type='RandomFlip', direction='horizontal'), - dict(type='GenerateTarget', encoder=codec), - dict(type='BottomupGetHeatmapMask'), - dict(type='PackPoseInputs'), + dict(type="LoadImage"), + dict(type="BottomupRandomAffine", input_size=codec["input_size"]), + dict(type="RandomFlip", direction="horizontal"), + dict(type="GenerateTarget", encoder=codec), + dict(type="BottomupGetHeatmapMask"), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict( - type='BottomupResize', - input_size=codec['input_size'], - size_factor=64, - resize_mode='expand'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=codec["input_size"], size_factor=64, resize_mode="expand"), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'crowd_index', 'ori_shape', - 'img_shape', 'input_size', 'input_center', 'input_scale', - 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info', - 'skeleton_links')) + type="PackPoseInputs", + meta_keys=( + "id", + "img_id", + "img_path", + "crowd_index", + "ori_shape", + "img_shape", + "input_size", + "input_center", + "input_scale", + "flip", + "flip_direction", + "flip_indices", + "raw_ann_info", + "skeleton_links", + ), + ), ] # data loaders @@ -128,37 +106,39 @@ train_dataloader = dict( batch_size=20, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=1, num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), 
test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', + type="CocoMetric", + ann_file=data_root + "annotations/person_keypoints_val2017.json", nms_thr=0.8, - score_mode='keypoint', + score_mode="keypoint", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/dekr/coco/dekr_hrnet-w32_8xb10-140e_coco-512x512.py b/mmpose/configs/body_2d_keypoint/dekr/coco/dekr_hrnet-w32_8xb10-140e_coco-512x512.py index 743de8882cb7293b632fd3f6dedc37b15e9a0a55..5d72fd6e51f2eb2f40e961cad6939081c0cc1ad7 100644 --- a/mmpose/configs/body_2d_keypoint/dekr/coco/dekr_hrnet-w32_8xb10-140e_coco-512x512.py +++ b/mmpose/configs/body_2d_keypoint/dekr/coco/dekr_hrnet-w32_8xb10-140e_coco-512x512.py @@ -1,97 +1,72 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=140, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=140, - milestones=[90, 120], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=140, milestones=[90, 120], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=80) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings codec = dict( - type='SPR', + type="SPR", input_size=(512, 512), heatmap_size=(128, 128), sigma=(4, 2), minimal_diagonal_length=32**0.5, generate_keypoint_heatmaps=True, - decode_max_instances=30) + decode_max_instances=30, +) # model settings model = dict( - type='BottomupPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="BottomupPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256), - multiscale_output=True)), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + 
multiscale_output=True, + ), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='DEKRHead', + type="DEKRHead", in_channels=480, num_keypoints=17, - heatmap_loss=dict(type='KeypointMSELoss', use_target_weight=True), + heatmap_loss=dict(type="KeypointMSELoss", use_target_weight=True), displacement_loss=dict( - type='SoftWeightSmoothL1Loss', + type="SoftWeightSmoothL1Loss", use_target_weight=True, supervise_empty=False, beta=1 / 9, @@ -105,47 +80,52 @@ model = dict( in_channels=74, norm_indexes=(5, 6), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/kpt_rescore_coco-33d58c5c.pth')), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/kpt_rescore_coco-33d58c5c.pth" + ), + ), ), - test_cfg=dict( - multiscale_test=False, - flip_test=True, - nms_dist_thr=0.05, - shift_heatmap=True, - align_corners=False)) + test_cfg=dict(multiscale_test=False, flip_test=True, nms_dist_thr=0.05, shift_heatmap=True, align_corners=False), +) # enable DDP training when rescore net is used find_unused_parameters = True # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'bottomup' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "bottomup" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='BottomupRandomAffine', input_size=codec['input_size']), - dict(type='RandomFlip', direction='horizontal'), - dict(type='GenerateTarget', encoder=codec), - dict(type='BottomupGetHeatmapMask'), - dict(type='PackPoseInputs'), + dict(type="LoadImage"), + dict(type="BottomupRandomAffine", input_size=codec["input_size"]), + dict(type="RandomFlip", direction="horizontal"), + dict(type="GenerateTarget", encoder=codec), + dict(type="BottomupGetHeatmapMask"), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=codec["input_size"], size_factor=32, resize_mode="expand"), dict( - type='BottomupResize', - input_size=codec['input_size'], - size_factor=32, - resize_mode='expand'), - dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'crowd_index', 'ori_shape', - 'img_shape', 'input_size', 'input_center', 'input_scale', - 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info', - 'skeleton_links')) + type="PackPoseInputs", + meta_keys=( + "id", + "img_id", + "img_path", + "crowd_index", + "ori_shape", + "img_shape", + "input_size", + "input_center", + "input_scale", + "flip", + "flip_direction", + "flip_indices", + "raw_ann_info", + "skeleton_links", + ), + ), ] # data loaders @@ -153,37 +133,39 @@ train_dataloader = dict( batch_size=10, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=1, num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + 
sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none', - score_mode='keypoint', + type="CocoMetric", + ann_file=data_root + "annotations/person_keypoints_val2017.json", + nms_mode="none", + score_mode="keypoint", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/dekr/coco/dekr_hrnet-w48_8xb10-140e_coco-640x640.py b/mmpose/configs/body_2d_keypoint/dekr/coco/dekr_hrnet-w48_8xb10-140e_coco-640x640.py index 57f656fb4d4c5f17f50e651ff5b160017d902971..25a350ef19f8bf2e4f6967a4c6ce313d525ed2b0 100644 --- a/mmpose/configs/body_2d_keypoint/dekr/coco/dekr_hrnet-w48_8xb10-140e_coco-640x640.py +++ b/mmpose/configs/body_2d_keypoint/dekr/coco/dekr_hrnet-w48_8xb10-140e_coco-640x640.py @@ -1,98 +1,73 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=140, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=140, - milestones=[90, 120], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=140, milestones=[90, 120], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=80) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings codec = dict( - type='SPR', + type="SPR", input_size=(640, 640), heatmap_size=(160, 160), sigma=(4, 2), minimal_diagonal_length=32**0.5, generate_keypoint_heatmaps=True, - decode_max_instances=30) + decode_max_instances=30, +) # model settings model = dict( - type='BottomupPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="BottomupPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), 
num_channels=(48, 96, 192)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384), - multiscale_output=True)), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + multiscale_output=True, + ), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='DEKRHead', + type="DEKRHead", in_channels=720, num_keypoints=17, num_heatmap_filters=48, - heatmap_loss=dict(type='KeypointMSELoss', use_target_weight=True), + heatmap_loss=dict(type="KeypointMSELoss", use_target_weight=True), displacement_loss=dict( - type='SoftWeightSmoothL1Loss', + type="SoftWeightSmoothL1Loss", use_target_weight=True, supervise_empty=False, beta=1 / 9, @@ -106,47 +81,52 @@ model = dict( in_channels=74, norm_indexes=(5, 6), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/kpt_rescore_coco-33d58c5c.pth')), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/kpt_rescore_coco-33d58c5c.pth" + ), + ), ), - test_cfg=dict( - multiscale_test=False, - flip_test=True, - nms_dist_thr=0.05, - shift_heatmap=True, - align_corners=False)) + test_cfg=dict(multiscale_test=False, flip_test=True, nms_dist_thr=0.05, shift_heatmap=True, align_corners=False), +) # enable DDP training when rescore net is used find_unused_parameters = True # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'bottomup' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "bottomup" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='BottomupRandomAffine', input_size=codec['input_size']), - dict(type='RandomFlip', direction='horizontal'), - dict(type='GenerateTarget', encoder=codec), - dict(type='BottomupGetHeatmapMask'), - dict(type='PackPoseInputs'), + dict(type="LoadImage"), + dict(type="BottomupRandomAffine", input_size=codec["input_size"]), + dict(type="RandomFlip", direction="horizontal"), + dict(type="GenerateTarget", encoder=codec), + dict(type="BottomupGetHeatmapMask"), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=codec["input_size"], size_factor=32, resize_mode="expand"), dict( - type='BottomupResize', - input_size=codec['input_size'], - size_factor=32, - resize_mode='expand'), - dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'crowd_index', 'ori_shape', - 'img_shape', 'input_size', 'input_center', 'input_scale', - 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info', - 'skeleton_links')) + type="PackPoseInputs", + meta_keys=( + "id", + "img_id", + "img_path", + "crowd_index", + "ori_shape", + "img_shape", + "input_size", + "input_center", + "input_scale", + "flip", + "flip_direction", + "flip_indices", + "raw_ann_info", + "skeleton_links", + ), + ), ] # data loaders @@ -154,37 +134,39 @@ train_dataloader = dict( batch_size=10, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - 
data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=1, num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none', - score_mode='keypoint', + type="CocoMetric", + ann_file=data_root + "annotations/person_keypoints_val2017.json", + nms_mode="none", + score_mode="keypoint", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/dekr/crowdpose/dekr_hrnet-w32_8xb10-300e_crowdpose-512x512.py b/mmpose/configs/body_2d_keypoint/dekr/crowdpose/dekr_hrnet-w32_8xb10-300e_crowdpose-512x512.py index c990eecdd09eb74bb53ca29bd69ebc88670c9b2b..465ab633402cdfd5511e7c543f8302f0fa4e3c40 100644 --- a/mmpose/configs/body_2d_keypoint/dekr/crowdpose/dekr_hrnet-w32_8xb10-300e_crowdpose-512x512.py +++ b/mmpose/configs/body_2d_keypoint/dekr/crowdpose/dekr_hrnet-w32_8xb10-300e_crowdpose-512x512.py @@ -1,97 +1,72 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=300, val_interval=20) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=300, - milestones=[200, 260], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=300, milestones=[200, 260], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=80) # hooks -default_hooks = dict(checkpoint=dict(save_best='crowdpose/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="crowdpose/AP", rule="greater")) # codec settings codec = dict( - type='SPR', + type="SPR", input_size=(512, 512), heatmap_size=(128, 128), sigma=(4, 2), minimal_diagonal_length=32**0.5, generate_keypoint_heatmaps=True, - decode_max_instances=30) + decode_max_instances=30, +) # model settings model = dict( - type='BottomupPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="BottomupPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - 
num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256), - multiscale_output=True)), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + multiscale_output=True, + ), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='DEKRHead', + type="DEKRHead", in_channels=480, num_keypoints=14, - heatmap_loss=dict(type='KeypointMSELoss', use_target_weight=True), + heatmap_loss=dict(type="KeypointMSELoss", use_target_weight=True), displacement_loss=dict( - type='SoftWeightSmoothL1Loss', + type="SoftWeightSmoothL1Loss", use_target_weight=True, supervise_empty=False, beta=1 / 9, @@ -105,46 +80,51 @@ model = dict( in_channels=59, norm_indexes=(0, 1), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/kpt_rescore_crowdpose-300c7efe.pth')), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/kpt_rescore_crowdpose-300c7efe.pth" + ), + ), ), - test_cfg=dict( - multiscale_test=False, - flip_test=True, - nms_dist_thr=0.05, - shift_heatmap=True, - align_corners=False)) + test_cfg=dict(multiscale_test=False, flip_test=True, nms_dist_thr=0.05, shift_heatmap=True, align_corners=False), +) # enable DDP training when rescore net is used find_unused_parameters = True # base dataset settings -dataset_type = 'CrowdPoseDataset' -data_mode = 'bottomup' -data_root = 'data/crowdpose/' +dataset_type = "CrowdPoseDataset" +data_mode = "bottomup" +data_root = "data/crowdpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='BottomupRandomAffine', input_size=codec['input_size']), - dict(type='RandomFlip', direction='horizontal'), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="LoadImage"), + dict(type="BottomupRandomAffine", input_size=codec["input_size"]), + dict(type="RandomFlip", direction="horizontal"), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict( - type='BottomupResize', - input_size=codec['input_size'], - size_factor=32, - resize_mode='expand'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=codec["input_size"], size_factor=32, resize_mode="expand"), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'crowd_index', 'ori_shape', - 'img_shape', 'input_size', 'input_center', 'input_scale', - 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info', - 'skeleton_links')) + type="PackPoseInputs", + meta_keys=( + "id", + "img_id", + "img_path", + "crowd_index", + "ori_shape", + "img_shape", + "input_size", + "input_center", + "input_scale", + "flip", + "flip_direction", + "flip_indices", + "raw_ann_info", + "skeleton_links", + ), + ), ] # data loaders @@ -152,39 +132,42 @@ 
train_dataloader = dict( batch_size=10, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=1, num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/mmpose_crowdpose_test.json', - nms_mode='none', - score_mode='keypoint', + type="CocoMetric", + ann_file=data_root + "annotations/mmpose_crowdpose_test.json", + nms_mode="none", + score_mode="keypoint", use_area=False, - iou_type='keypoints_crowd', - prefix='crowdpose') + iou_type="keypoints_crowd", + prefix="crowdpose", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/dekr/crowdpose/dekr_hrnet-w48_8xb5-300e_crowdpose-640x640.py b/mmpose/configs/body_2d_keypoint/dekr/crowdpose/dekr_hrnet-w48_8xb5-300e_crowdpose-640x640.py index 7d88ee5d20a15686dbeb61dd477509e2d07f243b..efe5ca3ca1555ed78a781bc399b1d83a06d22ffe 100644 --- a/mmpose/configs/body_2d_keypoint/dekr/crowdpose/dekr_hrnet-w48_8xb5-300e_crowdpose-640x640.py +++ b/mmpose/configs/body_2d_keypoint/dekr/crowdpose/dekr_hrnet-w48_8xb5-300e_crowdpose-640x640.py @@ -1,98 +1,73 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=300, val_interval=20) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=300, - milestones=[200, 260], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=300, milestones=[200, 260], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=40) # hooks -default_hooks = dict(checkpoint=dict(save_best='crowdpose/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="crowdpose/AP", rule="greater")) # codec settings codec = dict( - type='SPR', + type="SPR", input_size=(640, 640), heatmap_size=(160, 160), sigma=(4, 2), minimal_diagonal_length=32**0.5, generate_keypoint_heatmaps=True, - decode_max_instances=30) + decode_max_instances=30, +) # model settings model = dict( - type='BottomupPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="BottomupPoseEstimator", + 
data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384), - multiscale_output=True)), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + multiscale_output=True, + ), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='DEKRHead', + type="DEKRHead", in_channels=720, num_keypoints=14, num_heatmap_filters=48, - heatmap_loss=dict(type='KeypointMSELoss', use_target_weight=True), + heatmap_loss=dict(type="KeypointMSELoss", use_target_weight=True), displacement_loss=dict( - type='SoftWeightSmoothL1Loss', + type="SoftWeightSmoothL1Loss", use_target_weight=True, supervise_empty=False, beta=1 / 9, @@ -106,46 +81,51 @@ model = dict( in_channels=59, norm_indexes=(0, 1), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/kpt_rescore_crowdpose-300c7efe.pth')), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/kpt_rescore_crowdpose-300c7efe.pth" + ), + ), ), - test_cfg=dict( - multiscale_test=False, - flip_test=True, - nms_dist_thr=0.05, - shift_heatmap=True, - align_corners=False)) + test_cfg=dict(multiscale_test=False, flip_test=True, nms_dist_thr=0.05, shift_heatmap=True, align_corners=False), +) # enable DDP training when rescore net is used find_unused_parameters = True # base dataset settings -dataset_type = 'CrowdPoseDataset' -data_mode = 'bottomup' -data_root = 'data/crowdpose/' +dataset_type = "CrowdPoseDataset" +data_mode = "bottomup" +data_root = "data/crowdpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='BottomupRandomAffine', input_size=codec['input_size']), - dict(type='RandomFlip', direction='horizontal'), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="LoadImage"), + dict(type="BottomupRandomAffine", input_size=codec["input_size"]), + dict(type="RandomFlip", direction="horizontal"), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict( - type='BottomupResize', - input_size=codec['input_size'], - size_factor=32, - resize_mode='expand'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=codec["input_size"], size_factor=32, resize_mode="expand"), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 
'crowd_index', 'ori_shape', - 'img_shape', 'input_size', 'input_center', 'input_scale', - 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info', - 'skeleton_links')) + type="PackPoseInputs", + meta_keys=( + "id", + "img_id", + "img_path", + "crowd_index", + "ori_shape", + "img_shape", + "input_size", + "input_center", + "input_scale", + "flip", + "flip_direction", + "flip_indices", + "raw_ann_info", + "skeleton_links", + ), + ), ] # data loaders @@ -153,39 +133,42 @@ train_dataloader = dict( batch_size=5, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=1, num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/mmpose_crowdpose_test.json', - nms_mode='none', - score_mode='keypoint', + type="CocoMetric", + ann_file=data_root + "annotations/mmpose_crowdpose_test.json", + nms_mode="none", + score_mode="keypoint", use_area=False, - iou_type='keypoints_crowd', - prefix='crowdpose') + iou_type="keypoints_crowd", + prefix="crowdpose", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/edpose/coco/edpose_res50_8xb2-50e_coco-800x1333.py b/mmpose/configs/body_2d_keypoint/edpose/coco/edpose_res50_8xb2-50e_coco-800x1333.py index a1592538db4d876c2842fbdb359719a37f9edfe6..a683923b3faeb93e5e0b2ef98fca4ea1b33fbf38 100644 --- a/mmpose/configs/body_2d_keypoint/edpose/coco/edpose_res50_8xb2-50e_coco-800x1333.py +++ b/mmpose/configs/body_2d_keypoint/edpose/coco/edpose_res50_8xb2-50e_coco-800x1333.py @@ -1,10 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. 
-from mmengine.config import read_base - -with read_base(): - from mmpose.configs._base_.default_runtime import * # noqa - from mmcv.transforms import RandomChoice, RandomChoiceResize +from mmengine.config import read_base from mmengine.dataset import DefaultSampler from mmengine.model import PretrainedInit from mmengine.optim import LinearLR, MultiStepLR @@ -12,42 +8,37 @@ from torch.nn import GroupNorm from torch.optim import Adam from mmpose.codecs import EDPoseLabel -from mmpose.datasets import (BottomupRandomChoiceResize, BottomupRandomCrop, - CocoDataset, LoadImage, PackPoseInputs, - RandomFlip) +from mmpose.datasets import BottomupRandomChoiceResize, BottomupRandomCrop, CocoDataset, LoadImage, PackPoseInputs, RandomFlip from mmpose.evaluation import CocoMetric -from mmpose.models import (BottomupPoseEstimator, ChannelMapper, EDPoseHead, - PoseDataPreprocessor, ResNet) +from mmpose.models import BottomupPoseEstimator, ChannelMapper, EDPoseHead, PoseDataPreprocessor, ResNet from mmpose.models.utils import FrozenBatchNorm2d +with read_base(): + from mmpose.configs._base_.default_runtime import * # noqa + + # runtime train_cfg.update(max_epochs=50, val_interval=10) # noqa # optimizer -optim_wrapper = dict(optimizer=dict( - type=Adam, - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type=Adam, + lr=1e-3, + ) +) # learning policy param_scheduler = [ - dict(type=LinearLR, begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type=MultiStepLR, - begin=0, - end=140, - milestones=[33, 45], - gamma=0.1, - by_epoch=True) + dict(type=LinearLR, begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type=MultiStepLR, begin=0, end=140, milestones=[33, 45], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=80) # hooks -default_hooks.update( # noqa - checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks.update(checkpoint=dict(save_best="coco/AP", rule="greater")) # noqa # codec settings codec = dict(type=EDPoseLabel, num_select=50, num_keypoints=17) @@ -56,11 +47,8 @@ codec = dict(type=EDPoseLabel, num_select=50, num_keypoints=17) model = dict( type=BottomupPoseEstimator, data_preprocessor=dict( - type=PoseDataPreprocessor, - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1), + type=PoseDataPreprocessor, mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 + ), backbone=dict( type=ResNet, depth=50, @@ -69,9 +57,9 @@ model = dict( frozen_stages=1, norm_cfg=dict(type=FrozenBatchNorm2d, requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict( - type=PretrainedInit, checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type=PretrainedInit, checkpoint="torchvision://resnet50"), + ), neck=dict( type=ChannelMapper, in_channels=[512, 1024, 2048], @@ -79,7 +67,8 @@ model = dict( out_channels=256, act_cfg=None, norm_cfg=dict(type=GroupNorm, num_groups=32), - num_outs=4), + num_outs=4, + ), head=dict( type=EDPoseHead, num_queries=900, @@ -90,72 +79,73 @@ model = dict( num_layers=6, layer_cfg=dict( # DeformableDetrTransformerEncoderLayer self_attn_cfg=dict( # MultiScaleDeformableAttention - embed_dims=256, - num_heads=8, - num_levels=4, - num_points=4, - batch_first=True), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, - num_fcs=2, - ffn_drop=0.0))), + embed_dims=256, num_heads=8, num_levels=4, num_points=4, 
batch_first=True + ), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, num_fcs=2, ffn_drop=0.0), + ), + ), decoder=dict( num_layers=6, embed_dims=256, layer_cfg=dict( # DeformableDetrTransformerDecoderLayer - self_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - batch_first=True), - cross_attn_cfg=dict( # MultiScaleDeformableAttention - embed_dims=256, - batch_first=True), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=2048, ffn_drop=0.1)), + self_attn_cfg=dict(embed_dims=256, num_heads=8, batch_first=True), # MultiheadAttention + cross_attn_cfg=dict(embed_dims=256, batch_first=True), # MultiScaleDeformableAttention + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.1), + ), query_dim=4, num_feature_levels=4, num_group=100, num_dn=100, num_box_decoder_layers=2, - return_intermediate=True), + return_intermediate=True, + ), out_head=dict(num_classes=2), - positional_encoding=dict( - num_pos_feats=128, - temperatureH=20, - temperatureW=20, - normalize=True), + positional_encoding=dict(num_pos_feats=128, temperatureH=20, temperatureW=20, normalize=True), denosing_cfg=dict( dn_box_noise_scale=0.4, dn_label_noise_ratio=0.5, dn_labelbook_size=100, - dn_attn_mask_type_list=['match2dn', 'dn2dn', 'group2group']), - data_decoder=codec), + dn_attn_mask_type_list=["match2dn", "dn2dn", "group2group"], + ), + data_decoder=codec, + ), test_cfg=dict(multiscale_test=False, flip_test=False, num_select=50), - train_cfg=dict()) + train_cfg=dict(), +) # enable DDP training when rescore net is used find_unused_parameters = True # base dataset settings dataset_type = CocoDataset -data_mode = 'bottomup' -data_root = 'data/coco/' +data_mode = "bottomup" +data_root = "data/coco/" # pipelines train_pipeline = [ dict(type=LoadImage), - dict(type=RandomFlip, direction='horizontal'), + dict(type=RandomFlip, direction="horizontal"), dict( type=RandomChoice, transforms=[ [ dict( type=RandomChoiceResize, - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( @@ -163,36 +153,54 @@ train_pipeline = [ # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type=BottomupRandomCrop, - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type=BottomupRandomCrop, crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( type=BottomupRandomChoiceResize, - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict(type=PackPoseInputs), ] val_pipeline = [ dict(type=LoadImage), - dict( - type=BottomupRandomChoiceResize, - scales=[(800, 1333)], - keep_ratio=True, - backend='pillow'), + dict(type=BottomupRandomChoiceResize, scales=[(800, 1333)], keep_ratio=True, backend="pillow"), dict( type=PackPoseInputs, -
meta_keys=('id', 'img_id', 'img_path', 'crowd_index', 'ori_shape', - 'img_shape', 'input_size', 'input_center', 'input_scale', - 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info', - 'skeleton_links')) + meta_keys=( + "id", + "img_id", + "img_path", + "crowd_index", + "ori_shape", + "img_shape", + "input_size", + "input_center", + "input_scale", + "flip", + "flip_direction", + "flip_indices", + "raw_ann_info", + "skeleton_links", + ), + ), ] # data loaders @@ -205,10 +213,11 @@ train_dataloader = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=1, @@ -220,17 +229,18 @@ val_dataloader = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( type=CocoMetric, - nms_mode='none', - score_mode='keypoint', + nms_mode="none", + score_mode="keypoint", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_8xb64-210e_coco-256x256.py b/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_8xb64-210e_coco-256x256.py index 3dfaeeda8b850fa361eebbf5342ec64842d858e8..40ea08f009c2f3a4105703181b92580091e34b32 100644 --- a/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_8xb64-210e_coco-256x256.py +++ b/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_8xb64-210e_coco-256x256.py @@ -1,94 +1,78 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='IntegralRegressionLabel', - input_size=(256, 256), - heatmap_size=(64, 64), - sigma=2.0, - normalize=True) +codec = dict(type="IntegralRegressionLabel", input_size=(256, 256), heatmap_size=(64, 64), sigma=2.0, normalize=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, ), head=dict( - type='DSNTHead', + type="DSNTHead", 
in_channels=2048, in_featuremap_size=(8, 8), num_joints=17, loss=dict( - type='MultipleLossWrapper', - losses=[ - dict(type='SmoothL1Loss', use_target_weight=True), - dict(type='KeypointMSELoss', use_target_weight=True) - ]), - decoder=codec), + type="MultipleLossWrapper", + losses=[dict(type="SmoothL1Loss", use_target_weight=True), dict(type="KeypointMSELoss", use_target_weight=True)], + ), + decoder=codec, + ), test_cfg=dict( flip_test=True, shift_coords=True, shift_heatmap=True, ), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/td-hm_res50_8xb64-210e_coco-256x192.pth')) + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/td-hm_res50_8xb64-210e_coco-256x192.pth" + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] test_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -96,39 +80,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=f'{data_root}annotations/person_keypoints_val2017.json') 
+val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_debias-8xb64-210e_coco-256x256.py b/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_debias-8xb64-210e_coco-256x256.py index 9618c810ea20b0d147f71930034b616f6bed3a97..44df96511de995563f4ebe5b4eb3d9b3af0be8bd 100644 --- a/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_debias-8xb64-210e_coco-256x256.py +++ b/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_debias-8xb64-210e_coco-256x256.py @@ -1,96 +1,80 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='IntegralRegressionLabel', - input_size=(256, 256), - heatmap_size=(64, 64), - sigma=2.0, - normalize=True) +codec = dict(type="IntegralRegressionLabel", input_size=(256, 256), heatmap_size=(64, 64), sigma=2.0, normalize=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, ), head=dict( - type='DSNTHead', + type="DSNTHead", in_channels=2048, in_featuremap_size=(8, 8), num_joints=17, debias=True, - beta=10., + beta=10.0, loss=dict( - type='MultipleLossWrapper', - losses=[ - dict(type='SmoothL1Loss', use_target_weight=True), - dict(type='JSDiscretLoss', use_target_weight=True) - ]), - decoder=codec), + type="MultipleLossWrapper", + losses=[dict(type="SmoothL1Loss", use_target_weight=True), dict(type="JSDiscretLoss", use_target_weight=True)], + ), + decoder=codec, + ), test_cfg=dict( flip_test=True, shift_coords=True, shift_heatmap=True, ), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/td-hm_res50_8xb64-210e_coco-256x192.pth')) + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/td-hm_res50_8xb64-210e_coco-256x192.pth" + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', 
input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] test_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -98,39 +82,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=f'{data_root}annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_dsnt-8xb64-210e_coco-256x256.py b/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_dsnt-8xb64-210e_coco-256x256.py index 8c3897fce1acd0deabaedacea3b38b08b9138330..d83043df0e331f1434b090b21951bce96f8aaa19 100644 --- a/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_dsnt-8xb64-210e_coco-256x256.py +++ b/mmpose/configs/body_2d_keypoint/integral_regression/coco/ipr_res50_dsnt-8xb64-210e_coco-256x256.py @@ -1,94 +1,78 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - 
milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='IntegralRegressionLabel', - input_size=(256, 256), - heatmap_size=(64, 64), - sigma=2.0, - normalize=True) +codec = dict(type="IntegralRegressionLabel", input_size=(256, 256), heatmap_size=(64, 64), sigma=2.0, normalize=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, ), head=dict( - type='DSNTHead', + type="DSNTHead", in_channels=2048, in_featuremap_size=(8, 8), num_joints=17, loss=dict( - type='MultipleLossWrapper', - losses=[ - dict(type='SmoothL1Loss', use_target_weight=True), - dict(type='JSDiscretLoss', use_target_weight=True) - ]), - decoder=codec), + type="MultipleLossWrapper", + losses=[dict(type="SmoothL1Loss", use_target_weight=True), dict(type="JSDiscretLoss", use_target_weight=True)], + ), + decoder=codec, + ), test_cfg=dict( flip_test=True, shift_coords=True, shift_heatmap=True, ), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/td-hm_res50_8xb64-210e_coco-256x192.pth')) + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/td-hm_res50_8xb64-210e_coco-256x192.pth" + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] test_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -96,39 +80,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), 
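+        # Annotation, not part of the upstream config: ann_file and data_prefix
+        # are resolved against data_root, so this loader expects the standard
+        # COCO layout:
+        #     data/coco/annotations/person_keypoints_train2017.json
+        #     data/coco/train2017/<images>
+        # Point data_root elsewhere if your copy of COCO lives in another path.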
pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=f'{data_root}annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-l_16xb16-600e_body7-640x640.py b/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-l_16xb16-600e_body7-640x640.py index 45e4295c6ceedf42e72def61bed556f34eae34b2..e223ebb5c13a6730c797fc686664c92f6fe1148e 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-l_16xb16-600e_body7-640x640.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-l_16xb16-600e_body7-640x640.py @@ -1,110 +1,89 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=600, val_interval=20, dynamic_intervals=[(580, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', interval=40, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=40, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=280, - end=280, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=280, end=280, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=280, end=281), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=281, - T_max=300, - end=580, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, factor=1, begin=580, 
end=600), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=280, end=281), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=281, T_max=300, end=580, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=580, end=600), ] # data input_size = (640, 640) -metafile = 'configs/_base_/datasets/coco.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) +metafile = "configs/_base_/datasets/coco.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='Mosaic', - img_scale=(640, 640), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_factor=0.1, rotate_factor=10, scale_factor=(0.75, 1.0), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), dict( - type='YOLOXMixUp', + type="YOLOXMixUp", img_scale=(640, 640), ratio_range=(0.8, 1.6), pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + pre_transform=[dict(type="LoadImage", backend_args=None)], + ), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data settings -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # mapping aic_coco = [ @@ -227,90 +206,70 @@ posetrack_coco = [ # train datasets dataset_coco = dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=17, - mapping=[(i, i) for i in range(17)]) - ], + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=[(i, i) for i in range(17)])], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, 
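+    # Annotation, not part of the upstream config: every auxiliary dataset here
+    # is remapped to the 17-keypoint COCO layout with KeypointConverter, whose
+    # `mapping` is a list of (source_index, target_index) pairs (e.g. the
+    # aic_coco table elided above); source joints with no COCO counterpart are
+    # dropped, which is what lets all seven datasets mix in one CombinedDataset.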
data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) train_dataset = dict( - type='CombinedDataset', + type="CombinedDataset", metainfo=dict(from_file=metafile), datasets=[ dataset_coco, @@ -323,25 +282,25 @@ train_dataset = dict( ], sample_ratio_factor=[1, 0.3, 0.5, 0.3, 0.3, 0.4, 0.3], test_mode=False, - pipeline=train_pipeline_stage1) + pipeline=train_pipeline_stage1, +) train_dataloader = dict( batch_size=16, num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=train_dataset) + 
sampler=dict(type="DefaultSampler", shuffle=True), + dataset=train_dataset, +) # val datasets val_pipeline = [ - dict(type='LoadImage'), - dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] val_dataloader = dict( @@ -350,55 +309,52 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json', - score_mode='bbox', - nms_mode='none', + type="CocoMetric", + ann_file=data_root + "coco/annotations/person_keypoints_val2017.json", + score_mode="bbox", + nms_mode="none", ) test_evaluator = val_evaluator # hooks custom_hooks = [ dict( - type='YOLOXPoseModeSwitchHook', + type="YOLOXPoseModeSwitchHook", num_last_epochs=20, new_train_dataset=dataset_coco, new_train_pipeline=train_pipeline_stage2, - priority=48), + priority=48, + ), dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ 280: { - 'proxy_target_cc': True, - 'overlaps_power': 1.0, - 'loss_cls.loss_weight': 2.0, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 + "proxy_target_cc": True, + "overlaps_power": 1.0, + "loss_cls.loss_weight": 2.0, + "loss_mle.loss_weight": 5.0, + "loss_oks.loss_weight": 10.0, }, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -406,43 +362,35 @@ widen_factor = 1.0 deepen_factor = 1.0 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), 
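+        # Annotation, not part of the upstream config: out_indices=(2, 3, 4)
+        # exports the stride-8/16/32 CSPDarknet stages. The HybridEncoder neck
+        # below takes all three and, via output_indices=[1, 2], forwards only
+        # the stride-16/32 maps, matching the head's featmap_strides=(16, 32).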
spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmdetection/v2.0/' - 'yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco' - '_20211126_140236-d3bd2b23.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/" + "yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco" + "_20211126_140236-d3bd2b23.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + type="HybridEncoder", in_channels=[256, 512, 1024], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -450,21 +398,20 @@ model = dict( output_indices=[1, 2], encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, out_channels=512, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=17, featmap_strides=(16, 32), head_module_cfg=dict( @@ -475,59 +422,33 @@ model = dict( pose_vec_channels=512, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), - assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile)), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), + assigner=dict(type="SimOTAAssigner", dynamic_k_indicator="oks", oks_calculator=dict(type="PoseOKS", metainfo=metafile)), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=512, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1e-2, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), 
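+    # Annotation, not part of the upstream config: RTMOHead trains with a
+    # weighted sum of losses -- VariFocal cls (w=1), IoU box (w=5), OKS keypoint
+    # (w=30), BCE visibility (w=1), MLE coordinate classification (w=1e-2) and
+    # an auxiliary L1 box term (w=1); the RTMOModeSwitchHook above rebalances
+    # several of these weights when stage two starts at epoch 280.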
test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-m_16xb16-600e_body7-640x640.py b/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-m_16xb16-600e_body7-640x640.py index 6c1a0053668a31289b3a8a7cf73a546dbe1910d7..c589fb9eac23d2874db99dfeed218024db39c070 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-m_16xb16-600e_body7-640x640.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-m_16xb16-600e_body7-640x640.py @@ -1,110 +1,89 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=600, val_interval=20, dynamic_intervals=[(580, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', interval=40, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=40, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=280, - end=280, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=280, end=280, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=280, end=281), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=281, - T_max=300, - end=580, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, factor=1, begin=580, end=600), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=280, end=281), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=281, T_max=300, end=580, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=580, end=600), ] # data input_size = (640, 640) -metafile = 'configs/_base_/datasets/coco.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) +metafile = "configs/_base_/datasets/coco.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='Mosaic', - img_scale=(640, 640), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_factor=0.1, rotate_factor=10, scale_factor=(0.75, 1.0), pad_val=114, - distribution='uniform', - transform_mode='perspective', + 
distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), dict( - type='YOLOXMixUp', + type="YOLOXMixUp", img_scale=(640, 640), ratio_range=(0.8, 1.6), pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + pre_transform=[dict(type="LoadImage", backend_args=None)], + ), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data settings -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # mapping aic_coco = [ @@ -227,90 +206,70 @@ posetrack_coco = [ # train datasets dataset_coco = dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=17, - mapping=[(i, i) for i in range(17)]) - ], + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=[(i, i) for i in range(17)])], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, 
data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) train_dataset = dict( - type='CombinedDataset', + type="CombinedDataset", metainfo=dict(from_file=metafile), datasets=[ dataset_coco, @@ -323,25 +282,25 @@ train_dataset = dict( ], sample_ratio_factor=[1, 0.3, 0.5, 0.3, 0.3, 0.4, 0.3], test_mode=False, - pipeline=train_pipeline_stage1) + pipeline=train_pipeline_stage1, +) train_dataloader = dict( batch_size=16, num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=train_dataset) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=train_dataset, +) # val datasets val_pipeline = [ - dict(type='LoadImage'), - dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] val_dataloader = dict( @@ -350,55 +309,52 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + 
data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json', - score_mode='bbox', - nms_mode='none', + type="CocoMetric", + ann_file=data_root + "coco/annotations/person_keypoints_val2017.json", + score_mode="bbox", + nms_mode="none", ) test_evaluator = val_evaluator # hooks custom_hooks = [ dict( - type='YOLOXPoseModeSwitchHook', + type="YOLOXPoseModeSwitchHook", num_last_epochs=20, new_train_dataset=dataset_coco, new_train_pipeline=train_pipeline_stage2, - priority=48), + priority=48, + ), dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ 280: { - 'proxy_target_cc': True, - 'overlaps_power': 1.0, - 'loss_cls.loss_weight': 2.0, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 + "proxy_target_cc": True, + "overlaps_power": 1.0, + "loss_cls.loss_weight": 2.0, + "loss_mle.loss_weight": 5.0, + "loss_oks.loss_weight": 10.0, }, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -406,42 +362,33 @@ widen_factor = 0.75 deepen_factor = 0.67 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/v1/' - 'pretrained_models/yolox_m_8x8_300e_coco_20230829.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmpose/v1/" "pretrained_models/yolox_m_8x8_300e_coco_20230829.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + type="HybridEncoder", in_channels=[192, 384, 768], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -449,21 +396,20 @@ model = dict( output_indices=[1, 2], encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, 
out_channels=384, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=17, featmap_strides=(16, 32), head_module_cfg=dict( @@ -474,59 +420,33 @@ model = dict( pose_vec_channels=384, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), - assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile)), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), + assigner=dict(type="SimOTAAssigner", dynamic_k_indicator="oks", oks_calculator=dict(type="PoseOKS", metainfo=metafile)), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=384, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1e-2, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-s_8xb32-600e_body7-640x640.py b/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-s_8xb32-600e_body7-640x640.py index 83d7c21d8abf72a37a8457c44202307874d434c6..fbbc4db54920ba3be06afbd19597f631479a11ab 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-s_8xb32-600e_body7-640x640.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-s_8xb32-600e_body7-640x640.py @@ -1,113 +1,92 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=600, val_interval=20, dynamic_intervals=[(580, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', interval=40, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=40, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", 
lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=280, - end=280, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=280, end=280, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=280, end=281), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=281, - T_max=300, - end=580, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, factor=1, begin=580, end=600), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=280, end=281), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=281, T_max=300, end=580, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=580, end=600), ] # data input_size = (640, 640) -metafile = 'configs/_base_/datasets/coco.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) +metafile = "configs/_base_/datasets/coco.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='Mosaic', - img_scale=(640, 640), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_factor=0.1, rotate_factor=10, scale_factor=(0.75, 1.0), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), dict( - type='YOLOXMixUp', + type="YOLOXMixUp", img_scale=(640, 640), ratio_range=(0.8, 1.6), pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + pre_transform=[dict(type="LoadImage", backend_args=None)], + ), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_prob=0, rotate_prob=0, scale_prob=0, - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - 
dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data settings -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # mapping aic_coco = [ @@ -230,90 +209,70 @@ posetrack_coco = [ # train datasets dataset_coco = dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=17, - mapping=[(i, i) for i in range(17)]) - ], + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=[(i, i) for i in range(17)])], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + 
ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) train_dataset = dict( - type='CombinedDataset', + type="CombinedDataset", metainfo=dict(from_file=metafile), datasets=[ dataset_coco, @@ -326,25 +285,25 @@ train_dataset = dict( ], sample_ratio_factor=[1, 0.3, 0.5, 0.3, 0.3, 0.4, 0.3], test_mode=False, - pipeline=train_pipeline_stage1) + pipeline=train_pipeline_stage1, +) train_dataloader = dict( batch_size=32, num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=train_dataset) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=train_dataset, +) # val datasets val_pipeline = [ - dict(type='LoadImage'), - dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] val_dataloader = dict( @@ -353,53 +312,46 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json', - score_mode='bbox', - nms_mode='none', + type="CocoMetric", + ann_file=data_root + "coco/annotations/person_keypoints_val2017.json", + score_mode="bbox", + nms_mode="none", ) test_evaluator = val_evaluator # hooks custom_hooks = [ dict( - type='YOLOXPoseModeSwitchHook', + type="YOLOXPoseModeSwitchHook", num_last_epochs=20, new_train_dataset=dataset_coco, new_train_pipeline=train_pipeline_stage2, - priority=48), + priority=48, + ), dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ - 280: { - 'proxy_target_cc': True, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 - }, + 280: {"proxy_target_cc": True, "loss_mle.loss_weight": 5.0, "loss_oks.loss_weight": 10.0}, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + 
priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -407,43 +359,35 @@ widen_factor = 0.5 deepen_factor = 0.33 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmdetection/v2.0/' - 'yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_' - '20211121_095711-4592a793.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/" + "yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_" + "20211121_095711-4592a793.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + type="HybridEncoder", in_channels=[128, 256, 512], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -451,21 +395,20 @@ model = dict( output_indices=[1, 2], encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, out_channels=256, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=17, featmap_strides=(16, 32), head_module_cfg=dict( @@ -476,60 +419,38 @@ model = dict( pose_vec_channels=256, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile), - use_keypoints_for_center=True), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + type="SimOTAAssigner", + dynamic_k_indicator="oks", + oks_calculator=dict(type="PoseOKS", metainfo=metafile), + use_keypoints_for_center=True, + ), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=256, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - 
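
The `RTMOModeSwitchHook` entries above flip model attributes by dotted path at a fixed epoch (here 280), e.g. `"loss_mle.loss_weight": 5.0`. The hook itself lives in MMPose; the snippet below is only a minimal sketch of that dotted-path mechanism, with hypothetical `Head`/`Loss` classes, to make the `epoch_attributes` format concrete.

```python
# Minimal sketch of resolving dotted paths like "loss_mle.loss_weight".
# Illustration only -- not MMPose's RTMOModeSwitchHook; Head/Loss are hypothetical.
class Loss:
    def __init__(self, loss_weight):
        self.loss_weight = loss_weight

class Head:
    def __init__(self):
        self.proxy_target_cc = False
        self.loss_mle = Loss(1.0)
        self.loss_oks = Loss(30.0)

def apply_epoch_attributes(module, attributes):
    for path, value in attributes.items():
        *parents, leaf = path.split(".")
        target = module
        for name in parents:          # walk down to the owning sub-module
            target = getattr(target, name)
        setattr(target, leaf, value)  # set the leaf attribute

head = Head()
epoch_attributes = {280: {"proxy_target_cc": True,
                          "loss_mle.loss_weight": 5.0,
                          "loss_oks.loss_weight": 10.0}}
current_epoch = 280
if current_epoch in epoch_attributes:
    apply_epoch_attributes(head, epoch_attributes[current_epoch])
print(head.proxy_target_cc, head.loss_mle.loss_weight, head.loss_oks.loss_weight)
# True 5.0 10.0
```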
act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1.0, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-t_8xb32-600e_body7-416x416.py b/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-t_8xb32-600e_body7-416x416.py index 566fe34455dacd437d736d6d94f7fa2d627d8a34..bed781e5679d6268a405b4c42ef7730568d69502 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-t_8xb32-600e_body7-416x416.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-t_8xb32-600e_body7-416x416.py @@ -1,107 +1,85 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=600, val_interval=20, dynamic_intervals=[(580, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', interval=40, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=40, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=280, - end=280, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=280, end=280, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=280, end=281), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=281, - T_max=300, - end=580, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, 
factor=1, begin=580, end=600), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=280, end=281), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=281, T_max=300, end=580, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=580, end=600), ] # data input_size = (416, 416) -metafile = 'configs/_base_/datasets/coco.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) +metafile = "configs/_base_/datasets/coco.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=(416, 416), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='Mosaic', - img_scale=(416, 416), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(416, 416), shift_factor=0.1, rotate_factor=10, scale_factor=(0.75, 1.0), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(416, 416), shift_prob=0, rotate_prob=0, scale_prob=0, - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data settings -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # mapping aic_coco = [ @@ -224,90 +202,70 @@ posetrack_coco = [ # train datasets dataset_coco = dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=17, - mapping=[(i, i) for i in range(17)]) - ], + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=[(i, i) for i in range(17)])], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - 
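
The `param_scheduler` above chains four schedulers: quadratic warmup for 5 epochs, cosine annealing down to 2e-4, a one-epoch `ConstantLR` with `factor=2.5` that lifts the LR back to 5e-4 (as the inline comment notes), a second cosine decay, and a constant tail. A rough epoch-level sketch of the resulting trajectory follows; the real schedulers run iteration-based (`convert_to_iter_based=True`), so exact values differ slightly.

```python
import math

# Rough, epoch-level approximation of the LR shape defined above
# (base lr = 0.004, eta_min = 0.0002). Illustration only.
BASE_LR, ETA_MIN = 0.004, 0.0002

def lr_at(epoch):
    if epoch < 5:                    # QuadraticWarmupLR, epochs 0-5
        return BASE_LR * (epoch / 5) ** 2
    if epoch < 280:                  # first cosine, annealed to eta_min
        t = (epoch - 5) / (280 - 5)
        return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * (1 + math.cos(math.pi * t))
    if epoch < 281:                  # ConstantLR factor=2.5: 2e-4 -> 5e-4
        return ETA_MIN * 2.5
    if epoch < 580:                  # second cosine, 5e-4 back down to 2e-4
        t = (epoch - 281) / (580 - 281)
        return ETA_MIN + 0.5 * (ETA_MIN * 2.5 - ETA_MIN) * (1 + math.cos(math.pi * t))
    return ETA_MIN                   # constant tail, epochs 580-600

for e in (0, 5, 150, 280, 400, 590):
    print(e, round(lr_at(e), 6))
```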
'_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) train_dataset = dict( - type='CombinedDataset', + type="CombinedDataset", metainfo=dict(from_file=metafile), datasets=[ dataset_coco, @@ -320,25 +278,25 @@ train_dataset = dict( ], sample_ratio_factor=[1, 0.3, 0.5, 0.3, 0.3, 0.4, 0.3], test_mode=False, - pipeline=train_pipeline_stage1) + pipeline=train_pipeline_stage1, +) train_dataloader = dict( batch_size=32, num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=train_dataset) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=train_dataset, +) # val datasets val_pipeline = [ - dict(type='LoadImage'), - 
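
Each auxiliary dataset above is remapped onto the 17-keypoint COCO layout through a `KeypointConverter` step with `(source_index, target_index)` pairs such as `aic_coco`. Below is a minimal NumPy sketch of that remapping; the pairs shown are illustrative examples (the full mapping tables are elided by the diff hunks), and the real transform also handles visibility flags and keypoint weights.

```python
import numpy as np

# Conceptual version of the KeypointConverter remap: copy source keypoints
# into their target slots, leaving unmapped target slots zeroed.
# These pairs are examples only, not the full aic_coco table.
example_mapping = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7)]

def convert(keypoints, mapping, num_target=17):
    out = np.zeros((num_target, 2), dtype=keypoints.dtype)
    src, dst = zip(*mapping)
    out[list(dst)] = keypoints[list(src)]
    return out

aic_kpts = np.arange(28, dtype=float).reshape(14, 2)  # dummy 14-keypoint pose
print(convert(aic_kpts, example_mapping))
```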
dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] val_dataloader = dict( @@ -347,53 +305,46 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json', - score_mode='bbox', - nms_mode='none', + type="CocoMetric", + ann_file=data_root + "coco/annotations/person_keypoints_val2017.json", + score_mode="bbox", + nms_mode="none", ) test_evaluator = val_evaluator # hooks custom_hooks = [ dict( - type='YOLOXPoseModeSwitchHook', + type="YOLOXPoseModeSwitchHook", num_last_epochs=20, new_train_dataset=dataset_coco, new_train_pipeline=train_pipeline_stage2, - priority=48), + priority=48, + ), dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ - 280: { - 'proxy_target_cc': True, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 - }, + 280: {"proxy_target_cc": True, "loss_mle.loss_weight": 5.0, "loss_oks.loss_weight": 10.0}, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -401,43 +352,35 @@ widen_factor = 0.375 deepen_factor = 0.33 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(320, 640), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(320, 640), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - 
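
The `CombinedDataset` above mixes seven sources with `sample_ratio_factor=[1, 0.3, 0.5, 0.3, 0.3, 0.4, 0.3]`, downweighting the auxiliary datasets relative to COCO. A rough sketch of what ratio-weighted sampling amounts to (illustrative only; MMPose resolves this internally, and the dataset sizes here are hypothetical):

```python
import random

# Draw samples so dataset i contributes roughly len(dataset_i) * ratio_i
# draws per epoch. Sizes below are hypothetical placeholders.
datasets = {"coco": 150_000, "aic": 380_000, "crowdpose": 12_000}
ratios = {"coco": 1.0, "aic": 0.3, "crowdpose": 0.5}

weights = {name: n * ratios[name] for name, n in datasets.items()}
names, w = list(weights), list(weights.values())

random.seed(0)
print(random.choices(names, weights=w, k=10))  # e.g. ['aic', 'coco', ...]
```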
checkpoint='https://download.openmmlab.com/mmdetection/v2.0/' - 'yolox/yolox_tiny_8x8_300e_coco/yolox_tiny_8x8_300e_coco_' - '20211124_171234-b4047906.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/" + "yolox/yolox_tiny_8x8_300e_coco/yolox_tiny_8x8_300e_coco_" + "20211124_171234-b4047906.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + type="HybridEncoder", in_channels=[96, 192, 384], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -445,21 +388,20 @@ model = dict( output_indices=[1, 2], encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, out_channels=192, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=17, featmap_strides=(16, 32), head_module_cfg=dict( @@ -470,60 +412,38 @@ model = dict( pose_vec_channels=192, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile), - use_keypoints_for_center=True), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + type="SimOTAAssigner", + dynamic_k_indicator="oks", + oks_calculator=dict(type="PoseOKS", metainfo=metafile), + use_keypoints_for_center=True, + ), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=192, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1.0, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-l_16xb16-600e_coco-640x640.py 
b/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-l_16xb16-600e_coco-640x640.py index 97bbd109ca3a9937cc57ad3afefe5ea9134ec265..9a23d6b6ea79af3d996d1259198a87d4ebc67412 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-l_16xb16-600e_coco-640x640.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-l_16xb16-600e_coco-640x640.py @@ -1,117 +1,96 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=600, val_interval=20, dynamic_intervals=[(580, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', interval=40, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=40, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=280, - end=280, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=280, end=280, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=280, end=281), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=281, - T_max=300, - end=580, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, factor=1, begin=580, end=600), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=280, end=281), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=281, T_max=300, end=580, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=580, end=600), ] # data input_size = (640, 640) -metafile = 'configs/_base_/datasets/coco.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) +metafile = "configs/_base_/datasets/coco.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), - dict( - type='Mosaic', - img_scale=(640, 640), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_factor=0.1, rotate_factor=10, scale_factor=(0.75, 1.0), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), dict( - type='YOLOXMixUp', + type="YOLOXMixUp", img_scale=(640, 640), ratio_range=(0.8, 1.6), 
pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + pre_transform=[dict(type="LoadImage", backend_args=None)], + ), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # train datasets dataset_coco = dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="coco/train2017/"), pipeline=train_pipeline_stage1, ) @@ -120,17 +99,16 @@ train_dataloader = dict( num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=dataset_coco) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=dataset_coco, +) val_pipeline = [ - dict(type='LoadImage'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), - dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] val_dataloader = dict( @@ -139,54 +117,46 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json', - score_mode='bbox', - nms_mode='none', + type="CocoMetric", + ann_file=data_root + 
"coco/annotations/person_keypoints_val2017.json", + score_mode="bbox", + nms_mode="none", ) test_evaluator = val_evaluator # hooks custom_hooks = [ + dict(type="YOLOXPoseModeSwitchHook", num_last_epochs=20, new_train_pipeline=train_pipeline_stage2, priority=48), dict( - type='YOLOXPoseModeSwitchHook', - num_last_epochs=20, - new_train_pipeline=train_pipeline_stage2, - priority=48), - dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ 280: { - 'proxy_target_cc': True, - 'overlaps_power': 1.0, - 'loss_cls.loss_weight': 2.0, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 + "proxy_target_cc": True, + "overlaps_power": 1.0, + "loss_cls.loss_weight": 2.0, + "loss_mle.loss_weight": 5.0, + "loss_oks.loss_weight": 10.0, }, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -194,43 +164,35 @@ widen_factor = 1.0 deepen_factor = 1.0 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmdetection/v2.0/' - 'yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco' - '_20211126_140236-d3bd2b23.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/" + "yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco" + "_20211126_140236-d3bd2b23.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + type="HybridEncoder", in_channels=[256, 512, 1024], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -238,21 +200,20 @@ model = dict( output_indices=[1, 2], encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, out_channels=512, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=17, featmap_strides=(16, 32), 
head_module_cfg=dict( @@ -263,59 +224,33 @@ model = dict( pose_vec_channels=512, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), - assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile)), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), + assigner=dict(type="SimOTAAssigner", dynamic_k_indicator="oks", oks_calculator=dict(type="PoseOKS", metainfo=metafile)), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=512, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1e-2, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-m_16xb16-600e_coco-640x640.py b/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-m_16xb16-600e_coco-640x640.py index de669ba604469cf08d2c8d81457c896d4f321cc4..682cf69484dba3df76ac19321199d8c61a8e8a4d 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-m_16xb16-600e_coco-640x640.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-m_16xb16-600e_coco-640x640.py @@ -1,117 +1,96 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=600, val_interval=20, dynamic_intervals=[(580, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', interval=40, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=40, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, 
norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=280, - end=280, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=280, end=280, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=280, end=281), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=281, - T_max=300, - end=580, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, factor=1, begin=580, end=600), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=280, end=281), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=281, T_max=300, end=580, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=580, end=600), ] # data input_size = (640, 640) -metafile = 'configs/_base_/datasets/coco.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) +metafile = "configs/_base_/datasets/coco.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), - dict( - type='Mosaic', - img_scale=(640, 640), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_factor=0.1, rotate_factor=10, scale_factor=(0.75, 1.0), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), dict( - type='YOLOXMixUp', + type="YOLOXMixUp", img_scale=(640, 640), ratio_range=(0.8, 1.6), pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + pre_transform=[dict(type="LoadImage", backend_args=None)], + ), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + 
dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # train datasets dataset_coco = dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="coco/train2017/"), pipeline=train_pipeline_stage1, ) @@ -120,17 +99,16 @@ train_dataloader = dict( num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=dataset_coco) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=dataset_coco, +) val_pipeline = [ - dict(type='LoadImage'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), - dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] val_dataloader = dict( @@ -139,54 +117,46 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json', - score_mode='bbox', - nms_mode='none', + type="CocoMetric", + ann_file=data_root + "coco/annotations/person_keypoints_val2017.json", + score_mode="bbox", + nms_mode="none", ) test_evaluator = val_evaluator # hooks custom_hooks = [ + dict(type="YOLOXPoseModeSwitchHook", num_last_epochs=20, new_train_pipeline=train_pipeline_stage2, priority=48), dict( - type='YOLOXPoseModeSwitchHook', - num_last_epochs=20, - new_train_pipeline=train_pipeline_stage2, - priority=48), - dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ 280: { - 'proxy_target_cc': True, - 'overlaps_power': 1.0, - 'loss_cls.loss_weight': 2.0, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 + "proxy_target_cc": True, + "overlaps_power": 1.0, + "loss_cls.loss_weight": 2.0, + "loss_mle.loss_weight": 5.0, + "loss_oks.loss_weight": 10.0, }, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -194,42 +164,33 @@ widen_factor = 0.75 deepen_factor = 0.67 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - 
type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/v1/' - 'pretrained_models/yolox_m_8x8_300e_coco_20230829.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmpose/v1/" "pretrained_models/yolox_m_8x8_300e_coco_20230829.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + type="HybridEncoder", in_channels=[192, 384, 768], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -237,21 +198,20 @@ model = dict( output_indices=[1, 2], encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, out_channels=384, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=17, featmap_strides=(16, 32), head_module_cfg=dict( @@ -262,59 +222,33 @@ model = dict( pose_vec_channels=384, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), - assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile)), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), + assigner=dict(type="SimOTAAssigner", dynamic_k_indicator="oks", oks_calculator=dict(type="PoseOKS", metainfo=metafile)), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=384, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - 
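
After reformatting, long checkpoint URLs such as the YOLOX-M one above are kept as adjacent string literals on a single line. Python concatenates adjacent literals at compile time, so the configured value is unchanged; a quick check:

```python
# Adjacent string literals are joined at compile time, so the single-line
# form produced by the reformatter equals the old multi-line concatenation.
url = (
    "https://download.openmmlab.com/mmpose/v1/"
    "pretrained_models/yolox_m_8x8_300e_coco_20230829.pth"
)
assert url.endswith("yolox_m_8x8_300e_coco_20230829.pth")
print(url)
```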
loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1e-2, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-s_8xb32-600e_coco-640x640.py b/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-s_8xb32-600e_coco-640x640.py index 755c47bf82a021f2b75f303da9ea579ca28fd4b8..ca5b27fc737f84f89f561da073539e82be8ee7fe 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-s_8xb32-600e_coco-640x640.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/coco/rtmo-s_8xb32-600e_coco-640x640.py @@ -1,120 +1,99 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=600, val_interval=20, dynamic_intervals=[(580, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', interval=40, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=40, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=280, - end=280, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=280, end=280, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=280, end=281), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=281, - T_max=300, - end=580, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, factor=1, begin=580, end=600), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=280, end=281), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=281, T_max=300, end=580, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=580, end=600), ] # data input_size = (640, 640) -metafile = 'configs/_base_/datasets/coco.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) 
+metafile = "configs/_base_/datasets/coco.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), - dict( - type='Mosaic', - img_scale=(640, 640), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_factor=0.1, rotate_factor=10, scale_factor=(0.75, 1.0), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), dict( - type='YOLOXMixUp', + type="YOLOXMixUp", img_scale=(640, 640), ratio_range=(0.8, 1.6), pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + pre_transform=[dict(type="LoadImage", backend_args=None)], + ), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_prob=0, rotate_prob=0, scale_prob=0, - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # train datasets dataset_coco = dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="coco/train2017/"), pipeline=train_pipeline_stage1, ) @@ -123,17 +102,16 @@ train_dataloader = dict( num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=dataset_coco) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=dataset_coco, +) val_pipeline = [ - dict(type='LoadImage'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), - dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", 
"input_center", "input_scale") + ), ] val_dataloader = dict( @@ -142,52 +120,40 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json', - score_mode='bbox', - nms_mode='none', + type="CocoMetric", + ann_file=data_root + "coco/annotations/person_keypoints_val2017.json", + score_mode="bbox", + nms_mode="none", ) test_evaluator = val_evaluator # hooks custom_hooks = [ + dict(type="YOLOXPoseModeSwitchHook", num_last_epochs=20, new_train_pipeline=train_pipeline_stage2, priority=48), dict( - type='YOLOXPoseModeSwitchHook', - num_last_epochs=20, - new_train_pipeline=train_pipeline_stage2, - priority=48), - dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ - 280: { - 'proxy_target_cc': True, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 - }, + 280: {"proxy_target_cc": True, "loss_mle.loss_weight": 5.0, "loss_oks.loss_weight": 10.0}, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -195,43 +161,35 @@ widen_factor = 0.5 deepen_factor = 0.33 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmdetection/v2.0/' - 'yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_' - '20211121_095711-4592a793.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/" + "yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_" + "20211121_095711-4592a793.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + 
type="HybridEncoder", in_channels=[128, 256, 512], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -239,21 +197,20 @@ model = dict( output_indices=[1, 2], encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, out_channels=256, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=17, featmap_strides=(16, 32), head_module_cfg=dict( @@ -264,60 +221,38 @@ model = dict( pose_vec_channels=256, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile), - use_keypoints_for_center=True), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + type="SimOTAAssigner", + dynamic_k_indicator="oks", + oks_calculator=dict(type="PoseOKS", metainfo=metafile), + use_keypoints_for_center=True, + ), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=256, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1.0, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-l_16xb16-700e_body7-crowdpose-640x640.py b/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-l_16xb16-700e_body7-crowdpose-640x640.py index 6ba9fbe04ce8dab52f50f690a7fdef1caa24e09d..dc397f6ef8e02dd960b5c011ebc14e85c2037a9e 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-l_16xb16-700e_body7-crowdpose-640x640.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-l_16xb16-700e_body7-crowdpose-640x640.py @@ -1,118 +1,95 @@ -_base_ 
= ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=700, val_interval=50, dynamic_intervals=[(670, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', interval=50, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=50, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=350, - end=349, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=350, end=349, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=349, end=350), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=350, - T_max=320, - end=670, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, factor=1, begin=670, end=700), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=349, end=350), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=350, T_max=320, end=670, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=670, end=700), ] # data input_size = (640, 640) -metafile = 'configs/_base_/datasets/crowdpose.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) +metafile = "configs/_base_/datasets/crowdpose.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='Mosaic', - img_scale=(640, 640), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_factor=0.1, rotate_factor=10, scale_factor=(0.75, 1.0), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), dict( - type='YOLOXMixUp', + type="YOLOXMixUp", img_scale=(640, 640), ratio_range=(0.8, 1.6), pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + pre_transform=[dict(type="LoadImage", backend_args=None)], + ), + 
dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_prob=0, rotate_prob=0, scale_prob=0, - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data settings -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # mapping -aic_crowdpose = [(3, 0), (0, 1), (4, 2), (1, 3), (5, 4), (2, 5), - (9, 6), (6, 7), (10, 8), (7, 9), (11, 10), (8, 11), (12, 12), - (13, 13)] +aic_crowdpose = [(3, 0), (0, 1), (4, 2), (1, 3), (5, 4), (2, 5), (9, 6), (6, 7), (10, 8), (7, 9), (11, 10), (8, 11), (12, 12), (13, 13)] coco_crowdpose = [ (5, 0), @@ -146,9 +123,7 @@ mpii_crowdpose = [ (7, 13), ] -jhmdb_crowdpose = [(4, 0), (3, 1), (8, 2), (7, 3), (12, 4), (11, 5), (6, 6), - (5, 7), (10, 8), (9, 9), (14, 10), (13, 11), (2, 12), - (0, 13)] +jhmdb_crowdpose = [(4, 0), (3, 1), (8, 2), (7, 3), (12, 4), (11, 5), (6, 6), (5, 7), (10, 8), (9, 9), (14, 10), (13, 11), (2, 12), (0, 13)] halpe_crowdpose = [ (5, 0), @@ -184,100 +159,70 @@ posetrack_crowdpose = [ # train datasets dataset_coco = dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=14, mapping=coco_crowdpose) - ], + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=14, mapping=coco_crowdpose)], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=14, mapping=aic_crowdpose) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=14, mapping=aic_crowdpose)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=14, - mapping=[(i, i) for i in range(14)]) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + 
pipeline=[dict(type="KeypointConverter", num_keypoints=14, mapping=[(i, i) for i in range(14)])], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=14, mapping=mpii_crowdpose) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=14, mapping=mpii_crowdpose)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=14, - mapping=jhmdb_crowdpose) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=14, mapping=jhmdb_crowdpose)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=14, - mapping=halpe_crowdpose) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=14, mapping=halpe_crowdpose)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=14, - mapping=posetrack_crowdpose) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=14, mapping=posetrack_crowdpose)], ) train_dataset_stage1 = dict( - type='CombinedDataset', + type="CombinedDataset", metainfo=dict(from_file=metafile), datasets=[ dataset_coco, @@ -290,25 +235,25 @@ train_dataset_stage1 = dict( ], sample_ratio_factor=[1, 0.3, 1, 0.3, 0.3, 0.4, 0.3], test_mode=False, - pipeline=train_pipeline_stage1) + pipeline=train_pipeline_stage1, +) train_dataloader = dict( batch_size=16, num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=train_dataset_stage1) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=train_dataset_stage1, +) # val datasets val_pipeline = [ - dict(type='LoadImage'), - dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] val_dataloader = dict( @@ -317,25 +262,26 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), 
dataset=dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - score_mode='bbox', - nms_mode='none', - iou_type='keypoints_crowd', - prefix='crowdpose', + type="CocoMetric", + score_mode="bbox", + nms_mode="none", + iou_type="keypoints_crowd", + prefix="crowdpose", use_area=False, ) test_evaluator = val_evaluator @@ -343,31 +289,27 @@ test_evaluator = val_evaluator # hooks custom_hooks = [ dict( - type='YOLOXPoseModeSwitchHook', + type="YOLOXPoseModeSwitchHook", num_last_epochs=30, new_train_dataset=dataset_crowdpose, new_train_pipeline=train_pipeline_stage2, - priority=48), + priority=48, + ), dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ 350: { - 'proxy_target_cc': True, - 'overlaps_power': 1.0, - 'loss_cls.loss_weight': 2.0, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 + "proxy_target_cc": True, + "overlaps_power": 1.0, + "loss_cls.loss_weight": 2.0, + "loss_mle.loss_weight": 5.0, + "loss_oks.loss_weight": 10.0, }, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -375,43 +317,35 @@ widen_factor = 1.0 deepen_factor = 1.0 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmdetection/v2.0/' - 'yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco' - '_20211126_140236-d3bd2b23.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/" + "yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco" + "_20211126_140236-d3bd2b23.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + type="HybridEncoder", in_channels=[256, 512, 1024], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -419,21 +353,20 @@ model = dict( output_indices=[1, 2], 
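+        # output_indices=[1, 2] keeps the stride-16 and stride-32 maps consumed by the head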
encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, out_channels=512, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=14, featmap_strides=(16, 32), head_module_cfg=dict( @@ -444,59 +377,33 @@ model = dict( pose_vec_channels=512, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), - assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile)), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), + assigner=dict(type="SimOTAAssigner", dynamic_k_indicator="oks", oks_calculator=dict(type="PoseOKS", metainfo=metafile)), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=512, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1e-3, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-l_16xb16-700e_crowdpose-640x640.py b/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-l_16xb16-700e_crowdpose-640x640.py index 6b2c78b5a3032b2df8a5ca200788939517926dcb..8fe5a18279afee7d94320aa84b4288898e6c3a76 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-l_16xb16-700e_crowdpose-640x640.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-l_16xb16-700e_crowdpose-640x640.py @@ -1,120 +1,99 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=700, val_interval=50, dynamic_intervals=[(670, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - 
checkpoint=dict(type='CheckpointHook', interval=50, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=50, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=350, - end=349, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=350, end=349, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=349, end=350), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=350, - T_max=320, - end=670, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, factor=1, begin=670, end=700), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=349, end=350), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=350, T_max=320, end=670, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=670, end=700), ] # data input_size = (640, 640) -metafile = 'configs/_base_/datasets/crowdpose.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) +metafile = "configs/_base_/datasets/crowdpose.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), - dict( - type='Mosaic', - img_scale=(640, 640), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_factor=0.2, rotate_factor=30, scale_factor=(0.5, 1.5), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), dict( - type='YOLOXMixUp', + type="YOLOXMixUp", img_scale=(640, 640), ratio_range=(0.6, 1.6), pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + pre_transform=[dict(type="LoadImage", backend_args=None)], + ), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - 
dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_prob=0, rotate_prob=0, scale_prob=0, - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # train datasets dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), pipeline=train_pipeline_stage1, ) @@ -123,17 +102,16 @@ train_dataloader = dict( num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=dataset_crowdpose) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=dataset_crowdpose, +) val_pipeline = [ - dict(type='LoadImage'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), - dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] val_dataloader = dict( @@ -142,56 +120,48 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - score_mode='bbox', - nms_mode='none', - iou_type='keypoints_crowd', - prefix='crowdpose', + type="CocoMetric", + score_mode="bbox", + nms_mode="none", + iou_type="keypoints_crowd", + prefix="crowdpose", use_area=False, ) test_evaluator = val_evaluator # hooks custom_hooks = [ + dict(type="YOLOXPoseModeSwitchHook", num_last_epochs=30, new_train_pipeline=train_pipeline_stage2, priority=48), dict( - type='YOLOXPoseModeSwitchHook', - num_last_epochs=30, - new_train_pipeline=train_pipeline_stage2, - priority=48), - dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ 350: { - 'proxy_target_cc': True, - 'overlaps_power': 1.0, - 
'loss_cls.loss_weight': 2.0, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 + "proxy_target_cc": True, + "overlaps_power": 1.0, + "loss_cls.loss_weight": 2.0, + "loss_mle.loss_weight": 5.0, + "loss_oks.loss_weight": 10.0, }, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -199,43 +169,35 @@ widen_factor = 1.0 deepen_factor = 1.0 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmdetection/v2.0/' - 'yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco' - '_20211126_140236-d3bd2b23.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/" + "yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco" + "_20211126_140236-d3bd2b23.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + type="HybridEncoder", in_channels=[256, 512, 1024], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -243,21 +205,20 @@ model = dict( output_indices=[1, 2], encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, out_channels=512, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=14, featmap_strides=(16, 32), head_module_cfg=dict( @@ -268,59 +229,33 @@ model = dict( pose_vec_channels=512, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), - assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile)), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), + 
assigner=dict(type="SimOTAAssigner", dynamic_k_indicator="oks", oks_calculator=dict(type="PoseOKS", metainfo=metafile)), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=512, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1e-3, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-m_16xb16-700e_crowdpose-640x640.py b/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-m_16xb16-700e_crowdpose-640x640.py index af8da87942c89a40117309db4fc4067235693eb9..05e0f03854faad86bd52123b6603b449d843bd19 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-m_16xb16-700e_crowdpose-640x640.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-m_16xb16-700e_crowdpose-640x640.py @@ -1,120 +1,99 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=700, val_interval=50, dynamic_intervals=[(670, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', interval=50, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=50, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=350, - end=349, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + 
dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=350, end=349, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=349, end=350), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=350, - T_max=320, - end=670, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, factor=1, begin=670, end=700), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=349, end=350), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=350, T_max=320, end=670, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=670, end=700), ] # data input_size = (640, 640) -metafile = 'configs/_base_/datasets/crowdpose.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) +metafile = "configs/_base_/datasets/crowdpose.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), - dict( - type='Mosaic', - img_scale=(640, 640), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_factor=0.2, rotate_factor=30, scale_factor=(0.5, 1.5), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), dict( - type='YOLOXMixUp', + type="YOLOXMixUp", img_scale=(640, 640), ratio_range=(0.6, 1.6), pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + pre_transform=[dict(type="LoadImage", backend_args=None)], + ), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_prob=0, rotate_prob=0, scale_prob=0, - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # train datasets dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - 
data_prefix=dict(img='pose/CrowdPose/images/'), + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), pipeline=train_pipeline_stage1, ) @@ -123,17 +102,16 @@ train_dataloader = dict( num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=dataset_crowdpose) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=dataset_crowdpose, +) val_pipeline = [ - dict(type='LoadImage'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), - dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] val_dataloader = dict( @@ -142,56 +120,48 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - score_mode='bbox', - nms_mode='none', - iou_type='keypoints_crowd', - prefix='crowdpose', + type="CocoMetric", + score_mode="bbox", + nms_mode="none", + iou_type="keypoints_crowd", + prefix="crowdpose", use_area=False, ) test_evaluator = val_evaluator # hooks custom_hooks = [ + dict(type="YOLOXPoseModeSwitchHook", num_last_epochs=30, new_train_pipeline=train_pipeline_stage2, priority=48), dict( - type='YOLOXPoseModeSwitchHook', - num_last_epochs=30, - new_train_pipeline=train_pipeline_stage2, - priority=48), - dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ 350: { - 'proxy_target_cc': True, - 'overlaps_power': 1.0, - 'loss_cls.loss_weight': 2.0, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 + "proxy_target_cc": True, + "overlaps_power": 1.0, + "loss_cls.loss_weight": 2.0, + "loss_mle.loss_weight": 5.0, + "loss_oks.loss_weight": 10.0, }, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -199,42 +169,33 @@ widen_factor = 0.75 deepen_factor = 0.67 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, 
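+        # zero mean / unit std: images are passed through without normalization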
mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/v1/' - 'pretrained_models/yolox_m_8x8_300e_coco_20230829.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmpose/v1/" "pretrained_models/yolox_m_8x8_300e_coco_20230829.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + type="HybridEncoder", in_channels=[192, 384, 768], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -242,21 +203,20 @@ model = dict( output_indices=[1, 2], encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, out_channels=384, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=14, featmap_strides=(16, 32), head_module_cfg=dict( @@ -267,59 +227,33 @@ model = dict( pose_vec_channels=384, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), - assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile)), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), + assigner=dict(type="SimOTAAssigner", dynamic_k_indicator="oks", oks_calculator=dict(type="PoseOKS", metainfo=metafile)), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=384, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + 
loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1e-3, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-s_8xb32-700e_crowdpose-640x640.py b/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-s_8xb32-700e_crowdpose-640x640.py index 288da890e88e77d34e923ed420bf5bb40ffdb3d5..0ea9a7108f4cce176648a5eb0245e1963259f16d 100644 --- a/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-s_8xb32-700e_crowdpose-640x640.py +++ b/mmpose/configs/body_2d_keypoint/rtmo/crowdpose/rtmo-s_8xb32-700e_crowdpose-640x640.py @@ -1,120 +1,99 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=700, val_interval=50, dynamic_intervals=[(670, 1)]) auto_scale_lr = dict(base_batch_size=256) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', interval=50, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=50, max_keep_ckpts=3)) optim_wrapper = dict( - type='OptimWrapper', - constructor='ForceDefaultOptimWrapperConstructor', - optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05), + type="OptimWrapper", + constructor="ForceDefaultOptimWrapperConstructor", + optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05), paramwise_cfg=dict( norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True, force_default_settings=True, - custom_keys=dict({'neck.encoder': dict(lr_mult=0.05)})), - clip_grad=dict(max_norm=0.1, norm_type=2)) + custom_keys=dict({"neck.encoder": dict(lr_mult=0.05)}), + ), + clip_grad=dict(max_norm=0.1, norm_type=2), +) param_scheduler = [ - dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=5, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=5, - T_max=350, - end=349, - by_epoch=True, - convert_to_iter_based=True), + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=350, end=349, by_epoch=True, convert_to_iter_based=True), # this scheduler is used to increase the lr from 2e-4 to 5e-4 - dict(type='ConstantLR', by_epoch=True, factor=2.5, begin=349, end=350), - dict( - type='CosineAnnealingLR', - eta_min=0.0002, - begin=350, - T_max=320, - end=670, - by_epoch=True, - convert_to_iter_based=True), - dict(type='ConstantLR', by_epoch=True, factor=1, begin=670, end=700), + dict(type="ConstantLR", by_epoch=True, factor=2.5, begin=349, end=350), + dict(type="CosineAnnealingLR", eta_min=0.0002, begin=350, T_max=320, end=670, by_epoch=True, convert_to_iter_based=True), + dict(type="ConstantLR", by_epoch=True, factor=1, begin=670, end=700), ] # data input_size = (640, 640) -metafile = 'configs/_base_/datasets/crowdpose.py' -codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size) +metafile = "configs/_base_/datasets/crowdpose.py" +codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size) train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), - dict( - type='Mosaic', - img_scale=(640, 640), - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), + dict(type="LoadImage", 
backend_args=None), + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_factor=0.2, rotate_factor=30, scale_factor=(0.5, 1.5), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), dict( - type='YOLOXMixUp', + type="YOLOXMixUp", img_scale=(640, 640), ratio_range=(0.6, 1.6), pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + pre_transform=[dict(type="LoadImage", backend_args=None)], + ), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage'), + dict(type="LoadImage"), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=(640, 640), shift_prob=0, rotate_prob=0, scale_prob=0, - scale_type='long', + scale_type="long", pad_val=(114, 114, 114), bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='BottomupGetHeatmapMask', get_invalid=True), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs'), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="BottomupGetHeatmapMask", get_invalid=True), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] -data_mode = 'bottomup' -data_root = 'data/' +data_mode = "bottomup" +data_root = "data/" # train datasets dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), pipeline=train_pipeline_stage1, ) @@ -123,17 +102,16 @@ train_dataloader = dict( num_workers=8, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=dataset_crowdpose) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=dataset_crowdpose, +) val_pipeline = [ - dict(type='LoadImage'), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), - dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] val_dataloader = dict( @@ -142,56 +120,48 @@ val_dataloader = dict( persistent_workers=True, pin_memory=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, 
round_up=False), dataset=dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - score_mode='bbox', - nms_mode='none', - iou_type='keypoints_crowd', - prefix='crowdpose', + type="CocoMetric", + score_mode="bbox", + nms_mode="none", + iou_type="keypoints_crowd", + prefix="crowdpose", use_area=False, ) test_evaluator = val_evaluator # hooks custom_hooks = [ + dict(type="YOLOXPoseModeSwitchHook", num_last_epochs=30, new_train_pipeline=train_pipeline_stage2, priority=48), dict( - type='YOLOXPoseModeSwitchHook', - num_last_epochs=30, - new_train_pipeline=train_pipeline_stage2, - priority=48), - dict( - type='RTMOModeSwitchHook', + type="RTMOModeSwitchHook", epoch_attributes={ 350: { - 'proxy_target_cc': True, - 'overlaps_power': 1.0, - 'loss_cls.loss_weight': 2.0, - 'loss_mle.loss_weight': 5.0, - 'loss_oks.loss_weight': 10.0 + "proxy_target_cc": True, + "overlaps_power": 1.0, + "loss_cls.loss_weight": 2.0, + "loss_mle.loss_weight": 5.0, + "loss_oks.loss_weight": 10.0, }, }, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - strict_load=False, - priority=49), + priority=48, + ), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, strict_load=False, priority=49), ] # model @@ -199,43 +169,35 @@ widen_factor = 0.5 deepen_factor = 0.33 model = dict( - type='BottomupPoseEstimator', - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=2.23606797749979, - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu'), + type="BottomupPoseEstimator", + init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), data_preprocessor=dict( - type='PoseDataPreprocessor', + type="PoseDataPreprocessor", pad_size_divisor=32, mean=[0, 0, 0], std=[1, 1, 1], batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=1), - ]), + dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1), + ], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=deepen_factor, widen_factor=widen_factor, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmdetection/v2.0/' - 'yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_' - '20211121_095711-4592a793.pth', - prefix='backbone.', - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/" + "yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_" + "20211121_095711-4592a793.pth", + prefix="backbone.", + ), + ), neck=dict( - type='HybridEncoder', + type="HybridEncoder", in_channels=[128, 256, 512], deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -243,21 +205,20 @@ model = dict( output_indices=[1, 2], 
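+        # encoder settings match the larger RTMO variants in this patch; only channel widths differ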
encoder_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - ffn_drop=0.0, - act_cfg=dict(type='GELU'))), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0, act_cfg=dict(type="GELU")), + ), projector=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 256], kernel_size=1, out_channels=256, act_cfg=None, - norm_cfg=dict(type='BN'), - num_outs=2)), + norm_cfg=dict(type="BN"), + num_outs=2, + ), + ), head=dict( - type='RTMOHead', + type="RTMOHead", num_keypoints=14, featmap_strides=(16, 32), head_module_cfg=dict( @@ -268,59 +229,33 @@ model = dict( pose_vec_channels=256, widen_factor=widen_factor, stacked_convs=2, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), - assigner=dict( - type='SimOTAAssigner', - dynamic_k_indicator='oks', - oks_calculator=dict(type='PoseOKS', metainfo=metafile)), - prior_generator=dict( - type='MlvlPointGenerator', - centralize_points=True, - strides=[16, 32]), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), + assigner=dict(type="SimOTAAssigner", dynamic_k_indicator="oks", oks_calculator=dict(type="PoseOKS", metainfo=metafile)), + prior_generator=dict(type="MlvlPointGenerator", centralize_points=True, strides=[16, 32]), dcc_cfg=dict( in_channels=256, feat_channels=128, num_bins=(192, 256), spe_channels=128, - gau_cfg=dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - pos_enc='add')), + gau_cfg=dict(s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", pos_enc="add"), + ), overlaps_power=0.5, - loss_cls=dict( - type='VariFocalLoss', - reduction='sum', - use_target_weight=True, - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_oks=dict( - type='OKSLoss', - reduction='none', - metainfo=metafile, - loss_weight=30.0), - loss_vis=dict( - type='BCELoss', - use_target_weight=True, - reduction='mean', - loss_weight=1.0), + loss_cls=dict(type="VariFocalLoss", reduction="sum", use_target_weight=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_oks=dict(type="OKSLoss", reduction="none", metainfo=metafile, loss_weight=30.0), + loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0), loss_mle=dict( - type='MLECCLoss', + type="MLECCLoss", use_target_weight=True, loss_weight=1e-3, ), - loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0), + loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0), ), test_cfg=dict( input_size=input_size, score_thr=0.1, nms_thr=0.65, - )) + ), +) diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb256-420e_body8-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb256-420e_body8-256x192.py index 1cf3380435bd1caee7e082954c268021073d7cd8..5b2cc43a5184741f1faab8303b5f08e7816658d9 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb256-420e_body8-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb256-420e_body8-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,164 +10,124 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', 
lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-l_udp-body7_210e-256x192-5e9558ef_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-l_udp-body7_210e-256x192-5e9558ef_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1024, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') 
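+# the 'local' backend reads images directly from the local filesystem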
+backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), + dict(type="PhotometricDistortion"), dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict(type='PhotometricDistortion'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # mapping @@ -294,78 +254,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + 
data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) # data loaders @@ -373,10 +318,10 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + 
metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[ dataset_coco, dataset_aic, @@ -388,98 +333,80 @@ train_dataloader = dict( ], pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + data_prefix=dict(img="detection/coco/val2017/"), pipeline=[], ) val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_val.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) val_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=ochuman_coco) - ], + ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + 
pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=ochuman_coco)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) val_dataloader = dict( @@ -487,28 +414,28 @@ val_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[ val_coco, val_aic, @@ -521,33 +448,23 @@ test_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # default_hooks = dict( # checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") test_evaluator = [ - dict(type='PCKAccuracy', thr=0.1), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.1), + dict(type="AUC"), + dict(type="EPE"), ] diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb256-420e_body8-384x288.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb256-420e_body8-384x288.py index 19b3c8afb6f0d3b5d60bfe9f330c8ac8db15dc30..e197b7966cd70468877f53401d2aeb11dc4fe727 100644 --- 
a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb256-420e_body8-384x288.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb256-420e_body8-384x288.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,164 +10,124 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(288, 384), - sigma=(6., 6.93), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(288, 384), sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-l_udp-body7_210e-384x288-b15bc30d_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-l_udp-body7_210e-384x288-b15bc30d_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1024, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, 
expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), + dict(type="PhotometricDistortion"), dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict(type='PhotometricDistortion'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, 
max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # mapping @@ -294,78 +254,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + 
data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) # data loaders @@ -373,10 +318,10 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[ dataset_coco, dataset_aic, @@ -388,98 +333,80 @@ train_dataloader = dict( ], pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + data_prefix=dict(img="detection/coco/val2017/"), pipeline=[], ) val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_val.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) val_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, 
mapping=halpe_coco)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=ochuman_coco) - ], + ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=ochuman_coco)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) val_dataloader = dict( @@ -487,28 +414,28 @@ val_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[ val_coco, val_aic, @@ -521,33 +448,23 @@ test_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # default_hooks = dict( # checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") test_evaluator = [ - 
dict(type='PCKAccuracy', thr=0.1), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.1), + dict(type="AUC"), + dict(type="EPE"), ] diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb512-700e_body8-halpe26-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb512-700e_body8-halpe26-256x192.py index 293a5f07ea470e6bab484a7d0ce5693bd84db888..a0905c2ecccc01c0eb99f79efabf82a408e35812 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb512-700e_body8-halpe26-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb512-700e_body8-halpe26-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 26 @@ -16,181 +16,145 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/rtmpose-l_simcc-body7_pt-body7_420e-256x192-4dba18fc_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/rtmpose-l_simcc-body7_pt-body7_420e-256x192-4dba18fc_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1024, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - simcc_split_ratio=codec['simcc_split_ratio'], + 
simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - 
dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # mapping -coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), - (20, 21), (21, 23), (22, 25)] +coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), (20, 21), (21, 23), (22, 25)] -aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), - (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), - (12, 17), (13, 18)] +aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), (12, 17), (13, 18)] -crowdpose_halpe26 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16), (12, 17), - (13, 18)] +crowdpose_halpe26 = [ + (0, 5), + (1, 6), + (2, 7), + (3, 8), + (4, 9), + (5, 10), + (6, 11), + (7, 12), + (8, 13), + (9, 14), + (10, 15), + (11, 16), + (12, 17), + (13, 18), +] mpii_halpe26 = [ (0, 16), @@ -254,99 +218,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + 
ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) # data loaders @@ -355,10 +283,10 @@ train_dataloader = dict( num_workers=5, pin_memory=True, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ dataset_coco, dataset_aic, @@ -370,122 +298,80 @@ train_dataloader = dict( ], pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) 
val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_val.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) val_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=ochuman_halpe26) - ], + ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=ochuman_halpe26)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) val_dataloader = dict( @@ -493,10 +379,10 @@ val_dataloader = dict( num_workers=5, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - 
metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ val_coco, val_aic, @@ -509,27 +395,19 @@ val_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -test_evaluator = [dict(type='PCKAccuracy', thr=0.1), dict(type='AUC')] +test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC")] val_evaluator = test_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb512-700e_body8-halpe26-384x288.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb512-700e_body8-halpe26-384x288.py index 0aa16f3db405fce481a3788029429dec50dfa732..59e101428185c1cfee1d78a9436f798ceba2f434 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb512-700e_body8-halpe26-384x288.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-l_8xb512-700e_body8-halpe26-384x288.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 26 @@ -16,181 +16,145 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(6., 6.93), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - 
type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/rtmpose-l_simcc-body7_pt-body7_420e-384x288-3f5a1437_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/rtmpose-l_simcc-body7_pt-body7_420e-384x288-3f5a1437_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1024, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - 
dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # mapping -coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), - (20, 21), (21, 23), (22, 25)] +coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), (20, 21), (21, 23), (22, 25)] -aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), - (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), - (12, 17), (13, 18)] +aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), (12, 17), (13, 18)] -crowdpose_halpe26 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16), (12, 17), - (13, 18)] +crowdpose_halpe26 = [ + (0, 5), + (1, 6), + (2, 7), + (3, 8), + (4, 9), + (5, 10), + (6, 11), + (7, 12), + (8, 13), + (9, 14), + (10, 15), + (11, 16), + (12, 17), + (13, 18), +] mpii_halpe26 = [ (0, 16), @@ -254,99 +218,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" 
"_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) # data loaders @@ -355,10 +283,10 @@ train_dataloader = dict( num_workers=10, pin_memory=True, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ dataset_coco, dataset_aic, @@ -370,122 +298,80 @@ train_dataloader = dict( ], pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - 
ann_file='coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_val.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) val_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=ochuman_halpe26) - ], + ann_file="ochuman/annotations/" 
"ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=ochuman_halpe26)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) val_dataloader = dict( @@ -493,10 +379,10 @@ val_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ val_coco, val_aic, @@ -509,27 +395,19 @@ val_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -test_evaluator = [dict(type='PCKAccuracy', thr=0.1), dict(type='AUC')] +test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC")] val_evaluator = test_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb256-420e_body8-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb256-420e_body8-256x192.py index be462bfddf334e36d9361b8242c074f299d4f4b9..1a77d53a781461786945dfba5e87f3c93474852e 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb256-420e_body8-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb256-420e_body8-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,164 +10,126 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - 
type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-body7_210e-256x192-e0c9327b_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-body7_210e-256x192-e0c9327b_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", 
direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # mapping @@ -294,78 +256,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" 
"_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) # data loaders @@ -373,10 +320,10 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[ dataset_coco, dataset_aic, @@ -388,98 +335,80 @@ train_dataloader = dict( ], pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + data_prefix=dict(img="detection/coco/val2017/"), pipeline=[], ) 
val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_val.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) val_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=ochuman_coco) - ], + ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=ochuman_coco)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + 
pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) val_dataloader = dict( @@ -487,28 +416,28 @@ val_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[ val_coco, val_aic, @@ -521,33 +450,19 @@ test_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # default_hooks = dict( # checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') -test_evaluator = [ - dict(type='PCKAccuracy', thr=0.1), - dict(type='AUC'), - dict(type='EPE') -] +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") +test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC"), dict(type="EPE")] diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb256-420e_body8-384x288.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb256-420e_body8-384x288.py index 64cfc8a604b37b8ed6de85c96e99d7295399b452..75f6c12c01d73b1c07da33d6ad553475e8b783f8 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb256-420e_body8-384x288.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb256-420e_body8-384x288.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,164 +10,126 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + 
type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(288, 384), - sigma=(6., 6.93), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(288, 384), sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-body7_210e-384x288-b9bc2b57_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-body7_210e-384x288-b9bc2b57_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), 
- dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # mapping @@ -294,78 +256,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, 
data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) # data loaders @@ -373,10 +320,10 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[ dataset_coco, dataset_aic, @@ -388,98 +335,80 @@ train_dataloader = dict( ], 
pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + data_prefix=dict(img="detection/coco/val2017/"), pipeline=[], ) val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_val.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) val_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=ochuman_coco) - ], + ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=ochuman_coco)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", 
data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) val_dataloader = dict( @@ -487,28 +416,28 @@ val_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[ val_coco, val_aic, @@ -521,33 +450,19 @@ test_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # default_hooks = dict( # checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') -test_evaluator = [ - dict(type='PCKAccuracy', thr=0.1), - dict(type='AUC'), - dict(type='EPE') -] +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") +test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC"), dict(type="EPE")] diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb512-700e_body8-halpe26-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb512-700e_body8-halpe26-256x192.py index e694dd27d9e29f8615f62fc483dd976cf2644aaa..655ced0ae1cf5adf34bd4a96dc602ee56f1c70cd 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb512-700e_body8-halpe26-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb512-700e_body8-halpe26-256x192.py @@ -1,4 +1,4 
@@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 26 @@ -16,175 +16,145 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/rtmpose-m_simcc-body7_pt-body7_420e-256x192-e48f03d0_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/rtmpose-m_simcc-body7_pt-body7_420e-256x192-e48f03d0_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' 
-data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # mapping -coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), - (20, 21), (21, 23), (22, 25)] +coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), (20, 21), (21, 23), (22, 25)] -aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), - (5, 
9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), - (12, 17), (13, 18)] +aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), (12, 17), (13, 18)] -crowdpose_halpe26 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16), (12, 17), - (13, 18)] +crowdpose_halpe26 = [ + (0, 5), + (1, 6), + (2, 7), + (3, 8), + (4, 9), + (5, 10), + (6, 11), + (7, 12), + (8, 13), + (9, 14), + (10, 15), + (11, 16), + (12, 17), + (13, 18), +] mpii_halpe26 = [ (0, 16), @@ -248,99 +218,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), 
- pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) # data loaders @@ -349,10 +283,10 @@ train_dataloader = dict( num_workers=10, pin_memory=True, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ dataset_coco, dataset_aic, @@ -364,122 +298,80 @@ train_dataloader = dict( ], pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_val.json", + 
data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) val_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=ochuman_halpe26) - ], + ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=ochuman_halpe26)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) val_dataloader = dict( @@ -487,10 +379,10 @@ val_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ val_coco, val_aic, @@ -503,27 +395,19 @@ val_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, 
switch_pipeline=train_pipeline_stage2), ] # evaluators -test_evaluator = [dict(type='PCKAccuracy', thr=0.1), dict(type='AUC')] +test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC")] val_evaluator = test_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb512-700e_body8-halpe26-384x288.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb512-700e_body8-halpe26-384x288.py index 5ee967a3097eb51e877a2e1c4a6e3a1330bdc20e..472081926d5d722575ba67c0eaa37db786beb3a6 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb512-700e_body8-halpe26-384x288.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-m_8xb512-700e_body8-halpe26-384x288.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 26 @@ -16,187 +16,148 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(6., 6.93), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/rtmpose-m_simcc-body7_pt-body7_420e-384x288-65e718c4_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/rtmpose-m_simcc-body7_pt-body7_420e-384x288-65e718c4_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - 
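Each config pairs a 1000-iteration `LinearLR` warmup with `CosineAnnealingLR` over the second half of training, decaying to `base_lr * 0.05`. A simplified per-step sketch of the resulting schedule (epoch vs. iteration bookkeeping collapsed into two arguments; the `base_lr` value below is illustrative):

```python
import math

def lr_at(step, epoch, base_lr, max_epochs, warmup_steps=1000, start_factor=1e-5):
    """Sketch of LinearLR warmup followed by second-half CosineAnnealingLR."""
    if step < warmup_steps:  # linear ramp from base_lr * start_factor to base_lr
        factor = start_factor + (1 - start_factor) * step / warmup_steps
        return base_lr * factor
    if epoch < max_epochs // 2:  # flat until the cosine phase begins
        return base_lr
    t = (epoch - max_epochs // 2) / (max_epochs // 2)  # 0 -> 1 over 2nd half
    eta_min = base_lr * 0.05
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t))

# schedule bottoms out at eta_min = base_lr * 0.05 at the final epoch
assert abs(lr_at(10_000, 700, 4e-3, 700) - 4e-3 * 0.05) < 1e-12
```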
simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" # backend_args = dict(backend='local') backend_args = dict( - backend='petrel', - path_mapping=dict({ - f'{data_root}': 's3://openmmlab/datasets/', - f'{data_root}': 's3://openmmlab/datasets/' - })) + backend="petrel", path_mapping=dict({f"{data_root}": "s3://openmmlab/datasets/", f"{data_root}": "s3://openmmlab/datasets/"}) +) # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", 
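`SimCCLabel` supervises each keypoint coordinate as a 1-D classification over a sub-pixel grid: with this file's nominal `input_size=(288, 384)` and `simcc_split_ratio=2.0`, the x axis gets 576 bins and each target is a Gaussian bump of width `sigma`. A minimal encoding sketch, not MMPose's implementation:

```python
import numpy as np

def simcc_encode(x, input_w=288, split_ratio=2.0, sigma=6.0):
    """Sketch of SimCC-style 1-D label encoding for one x coordinate.

    The coordinate is mapped to a finer grid (input_w * split_ratio bins)
    and supervised with an unnormalized Gaussian centered on its bin.
    """
    bins = int(input_w * split_ratio)  # 576 bins for a 288-px-wide input
    mu = x * split_ratio               # sub-pixel bin center
    grid = np.arange(bins)
    return np.exp(-((grid - mu) ** 2) / (2 * sigma ** 2))

label = simcc_encode(100.25)
print(label.argmax())  # ~200; decoding divides the argmax by split_ratio
```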
input_size=codec["input_size"]), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # mapping -coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), - (20, 21), (21, 23), (22, 25)] +coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), (20, 21), (21, 23), (22, 25)] -aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), - (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), - (12, 17), (13, 18)] +aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), (12, 17), (13, 18)] -crowdpose_halpe26 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16), (12, 17), - (13, 18)] +crowdpose_halpe26 = [ + (0, 5), + (1, 6), + (2, 7), + (3, 8), + (4, 9), + (5, 10), + (6, 11), + (7, 12), + (8, 13), + (9, 14), + (10, 15), + (11, 16), + (12, 17), + (13, 18), +] mpii_halpe26 = [ (0, 16), @@ -260,99 +221,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) dataset_mpii = dict( - 
type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) # data loaders @@ -361,10 +286,10 @@ train_dataloader = dict( num_workers=10, pin_memory=True, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ dataset_coco, dataset_aic, @@ -376,122 +301,80 @@ train_dataloader = dict( ], pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - 
mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_val.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) val_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=ochuman_halpe26) - ], + ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=ochuman_halpe26)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) val_dataloader = dict( @@ -499,10 +382,10 @@ val_dataloader = 
dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ val_coco, val_aic, @@ -515,28 +398,20 @@ val_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # hooks # default_hooks = dict( -default_hooks = dict( - checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -test_evaluator = [dict(type='PCKAccuracy', thr=0.1), dict(type='AUC')] +test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC")] val_evaluator = test_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-s_8xb1024-700e_body8-halpe26-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-s_8xb1024-700e_body8-halpe26-256x192.py index 05e6ec09808ca47f337222dfac326a0ff45a8d50..b628f8e8c9207479be86e80d9182e8602195ec94 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-s_8xb1024-700e_body8-halpe26-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-s_8xb1024-700e_body8-halpe26-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 26 @@ -16,181 +16,145 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.0), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 
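`auto_scale_lr = dict(base_batch_size=1024)` lets MMEngine rescale the optimizer LR linearly when the effective batch size differs from 1024 (it only takes effect when enabled on the runner). The rule itself is one line; the numbers below are illustrative:

```python
def scale_lr(base_lr, gpus, samples_per_gpu, base_batch_size=1024):
    """Linear LR scaling rule applied by MMEngine's auto_scale_lr (sketch)."""
    return base_lr * (gpus * samples_per_gpu) / base_batch_size

# e.g. halving the effective batch halves the LR under this rule
print(scale_lr(4e-3, 8, 64))  # 8 * 64 = 512 -> 4e-3 * 512 / 1024 = 2e-3
```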
116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.33, widen_factor=0.5, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/rtmpose-s_simcc-body7_pt-body7_420e-256x192-acd4a1ef_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/rtmpose-s_simcc-body7_pt-body7_420e-256x192-acd4a1ef_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=512, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec, 
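The head's `in_featuremap_size=tuple([s // 32 for s in input_size])` reflects CSPNeXt's stride-32 final stage, and `in_channels` follows `widen_factor` applied to that stage's base width of 1024 channels, which is why the t/s/m variants in this diff declare 384/512/768. A quick check:

```python
input_size = (192, 256)                # (w, h) for the 256x192 configs
feat = tuple(s // 32 for s in input_size)
assert feat == (6, 8)                  # stride-32 backbone output grid

# widen_factor scales CSPNeXt's 1024-channel final stage:
for name, widen in [("t", 0.375), ("s", 0.5), ("m", 0.75)]:
    print(name, int(1024 * widen))     # 384, 512, 768 = head in_channels
```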
use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.6, 1.4], - rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # mapping -coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), - (20, 21), (21, 23), (22, 25)] +coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), (20, 21), (21, 23), (22, 25)] -aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), - (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), - (12, 17), (13, 18)] +aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), (12, 17), (13, 18)] -crowdpose_halpe26 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16), (12, 17), - (13, 18)] +crowdpose_halpe26 = [ + (0, 5), + (1, 6), + (2, 7), + (3, 8), + (4, 9), + (5, 10), + (6, 11), + (7, 12), + (8, 13), + (9, 14), + (10, 15), + (11, 16), + (12, 17), + (13, 18), +] mpii_halpe26 = [ (0, 16), @@ -254,99 +218,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - 
type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) # data loaders @@ -355,10 +283,10 @@ train_dataloader = dict( num_workers=10, pin_memory=True, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ dataset_coco, dataset_aic, @@ -370,122 +298,80 @@ train_dataloader = dict( ], 
pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_val.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) val_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict( - 
type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=ochuman_halpe26) - ], + ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=ochuman_halpe26)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) val_dataloader = dict( @@ -493,10 +379,10 @@ val_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ val_coco, val_aic, @@ -509,27 +395,19 @@ val_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -test_evaluator = [dict(type='PCKAccuracy', thr=0.1), dict(type='AUC')] +test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC")] val_evaluator = test_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-s_8xb256-420e_body8-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-s_8xb256-420e_body8-256x192.py index 7d0a69775106fb57df5d089dce7b5252c0e0f904..2a01c9fca348b23b1790bf216357d4953bb093ef 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-s_8xb256-420e_body8-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-s_8xb256-420e_body8-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,164 +10,124 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.0), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", 
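The merged Halpe26 models are validated with `PCKAccuracy(thr=0.1)` and `AUC` rather than COCO AP, since the combined val set spans non-COCO datasets. Simplified sketches of both metrics (the real `PCKAccuracy` normalizes per axis by the bbox size; this collapses that to one scalar):

```python
import numpy as np

def pck(pred, gt, bbox_size, thr=0.1):
    """Fraction of keypoints within thr * bbox_size of the ground truth."""
    dist = np.linalg.norm(pred - gt, axis=-1) / bbox_size
    return (dist < thr).mean()

def auc(pred, gt, bbox_size, num_thrs=20):
    """Area under the PCK-vs-threshold curve (simplified sketch)."""
    thrs = np.linspace(0, 0.5, num_thrs, endpoint=False)
    return np.mean([pck(pred, gt, bbox_size, t) for t in thrs])

gt = np.zeros((17, 2))
pred = gt + np.array([3.0, 4.0])       # every joint 5 px off
print(pck(pred, gt, bbox_size=100.0))  # 1.0, since 5 px < 0.1 * 100
print(auc(pred, gt, bbox_size=100.0))
```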
start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.33, widen_factor=0.5, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-s_udp-body7_210e-256x192-8c9ccbdb_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-s_udp-body7_210e-256x192-8c9ccbdb_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=512, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", 
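Every pipeline in these files follows the same top-down recipe: `GetBBoxCenterScale` turns the person box into a padded center/scale window, then `TopdownAffine` warps that window to the codec's input size. A sketch of the first step, assuming MMPose's default 1.25 padding:

```python
import numpy as np

def bbox_center_scale(bbox, padding=1.25):
    """Sketch of GetBBoxCenterScale: xyxy box -> padded center and scale."""
    x1, y1, x2, y2 = bbox
    center = np.array([(x1 + x2) / 2, (y1 + y2) / 2])
    scale = np.array([x2 - x1, y2 - y1]) * padding  # enlarge crop by 25%
    return center, scale

c, s = bbox_center_scale([10, 20, 110, 220])
print(c, s)  # TopdownAffine then warps this window to e.g. (192, 256)
```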
backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # mapping @@ -294,78 +254,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_train.json", + 
data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) # data loaders @@ -373,10 +318,10 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[ dataset_coco, dataset_aic, @@ -388,98 +333,80 @@ train_dataloader = dict( ], pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + 
data_prefix=dict(img="detection/coco/val2017/"), pipeline=[], ) val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)], ) val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco) - ], + ann_file="mpii/annotations/mpii_val.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)], ) val_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict(type='KeypointConverter', num_keypoints=17, mapping=ochuman_coco) - ], + ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=ochuman_coco)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + 
data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)], ) val_dataloader = dict( @@ -487,28 +414,28 @@ val_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[ val_coco, val_aic, @@ -521,33 +448,19 @@ test_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # default_hooks = dict( # checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') -test_evaluator = [ - dict(type='PCKAccuracy', thr=0.1), - dict(type='AUC'), - dict(type='EPE') -] +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") +test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC"), dict(type="EPE")] diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-t_8xb1024-700e_body8-halpe26-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-t_8xb1024-700e_body8-halpe26-256x192.py index 8d70bd27aeaf17ae36fb0c9daf24db91cc17ff5c..c880c4e0a9ce2e0d45cdf22f8023a0343ac2774a 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-t_8xb1024-700e_body8-halpe26-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-t_8xb1024-700e_body8-halpe26-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 26 @@ -16,181 +16,145 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.), + 
type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.167, widen_factor=0.375, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-tiny_udp-body7_210e-256x192-a3775292_20230504.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-tiny_udp-body7_210e-256x192-a3775292_20230504.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=384, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - 
dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.6, 1.4], - rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # mapping -coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), - (20, 21), (21, 23), (22, 25)] +coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), (20, 21), (21, 23), (22, 25)] -aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), - (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), - (12, 17), (13, 18)] +aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 
16), (9, 11), (10, 13), (11, 15), (12, 17), (13, 18)] -crowdpose_halpe26 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16), (12, 17), - (13, 18)] +crowdpose_halpe26 = [ + (0, 5), + (1, 6), + (2, 7), + (3, 8), + (4, 9), + (5, 10), + (6, 11), + (7, 12), + (8, 13), + (9, 14), + (10, 15), + (11, 16), + (12, 17), + (13, 18), +] mpii_halpe26 = [ (0, 16), @@ -254,99 +218,63 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + 
data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) # data loaders @@ -355,10 +283,10 @@ train_dataloader = dict( num_workers=10, pin_memory=True, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ dataset_coco, dataset_aic, @@ -370,122 +298,80 @@ train_dataloader = dict( ], pipeline=train_pipeline, test_mode=False, - )) + ), +) # val datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=coco_halpe26) - ], + ann_file="coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)], ) val_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_val.json', - data_prefix=dict( - img='pose/ai_challenge/ai_challenger_keypoint' - '_validation_20170911/keypoint_validation_images_20170911/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_halpe26) - ], + ann_file="aic/annotations/aic_val.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)], ) val_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_halpe26) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)], ) val_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_val.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_halpe26) - ], + ann_file="mpii/annotations/mpii_val.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)], ) val_jhmdb = dict( - 
type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_test.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_halpe26) - ], + ann_file="jhmdb/annotations/Sub1_test.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_halpe26) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)], ) val_ochuman = dict( - type='OCHumanDataset', + type="OCHumanDataset", data_root=data_root, data_mode=data_mode, - ann_file='ochuman/annotations/' - 'ochuman_coco_format_val_range_0.00_1.00.json', - data_prefix=dict(img='pose/OCHuman/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=ochuman_halpe26) - ], + ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json", + data_prefix=dict(img="pose/OCHuman/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=ochuman_halpe26)], ) val_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_val.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_halpe26) - ], + ann_file="posetrack18/annotations/posetrack18_val.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)], ) val_dataloader = dict( @@ -494,10 +380,10 @@ val_dataloader = dict( pin_memory=True, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"), datasets=[ val_coco, val_aic, @@ -510,13 +396,13 @@ val_dataloader = dict( ], pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater", max_keep_ckpts=1)) custom_hooks = [ # dict( @@ -525,12 +411,9 @@ custom_hooks = [ # momentum=0.0002, # update_buffers=True, # priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2) ] # evaluators -test_evaluator = [dict(type='PCKAccuracy', thr=0.1), dict(type='AUC')] +test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC")] val_evaluator = test_evaluator diff --git 
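
The configs above fold each per-dataset `KeypointConverter` onto a single line, but the behavior is unchanged: the transform re-indexes keypoints from a source skeleton into the unified Halpe-26 layout via `(source_index, target_index)` pairs such as `aic_halpe26`. A minimal NumPy sketch of that remapping, for orientation only — `remap_keypoints` is a hypothetical helper, not the actual MMPose transform:

```python
import numpy as np

# (source_index, target_index) pairs, as in the aic_halpe26 mapping above
aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12),
               (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), (12, 17), (13, 18)]

def remap_keypoints(kpts, visible, mapping, num_keypoints=26):
    """Scatter source keypoints into the unified target skeleton.

    Unmapped target joints stay at zero with visibility 0, so the
    per-dataset keypoint weights can simply ignore them.
    """
    out = np.zeros((num_keypoints, 2), dtype=kpts.dtype)
    out_vis = np.zeros(num_keypoints, dtype=visible.dtype)
    src, dst = zip(*mapping)
    out[list(dst)] = kpts[list(src)]
    out_vis[list(dst)] = visible[list(src)]
    return out, out_vis

# toy example: 14 AIC keypoints scattered into the 26-joint Halpe layout
kpts = np.random.rand(14, 2)
vis = np.ones(14, dtype=np.int64)
halpe_kpts, halpe_vis = remap_keypoints(kpts, vis, aic_halpe26)
assert halpe_kpts.shape == (26, 2) and halpe_vis.sum() == 14
```

This is what lets `CombinedDataset` mix COCO, AIC, CrowdPose, MPII, JHMDB, Halpe, and PoseTrack18 in one training loop: every source is converted to the same target skeleton before batching.
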
diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-t_8xb256-420e_body8-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-t_8xb256-420e_body8-256x192.py
index bdc7f80a2bbbf71a958689c4fc45df3d15c22a4e..e6aea8f5349afcc85faf3ee8b6f3f3e1244cc84c 100644
--- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-t_8xb256-420e_body8-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-t_8xb256-420e_body8-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 420
@@ -10,164 +10,124 @@ randomness = dict(seed=21)
 
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(  # use cosine lr from 210 to 420 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=1024)
 
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(192, 256),
-    sigma=(4.9, 5.66),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
         deepen_factor=0.167,
         widen_factor=0.375,
-        out_indices=(4, ),
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
-        act_cfg=dict(type='SiLU'),
+        norm_cfg=dict(type="SyncBN"),
+        act_cfg=dict(type="SiLU"),
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/v1/projects/'
-            'rtmposev1/cspnext-tiny_udp-body7_210e-256x192-a3775292_20230504.pth'  # noqa
-        )),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/v1/projects/"
+            "rtmposev1/cspnext-tiny_udp-body7_210e-256x192-a3775292_20230504.pth",  # noqa
+        ),
+    ),
     head=dict(
-        type='RTMCCHead',
+        type="RTMCCHead",
         in_channels=384,
         out_channels=17,
-        input_size=codec['input_size'],
-        in_featuremap_size=tuple([s // 32 for s in codec['input_size']]),
-        simcc_split_ratio=codec['simcc_split_ratio'],
+        input_size=codec["input_size"],
+        in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]),
+        simcc_split_ratio=codec["simcc_split_ratio"],
         final_layer_kernel_size=7,
         gau_cfg=dict(
-            hidden_dims=256,
-            s=128,
-            expansion_factor=2,
-            dropout_rate=0.,
-            drop_path=0.,
-            act_fn='SiLU',
-            use_rel_bias=False,
-            pos_enc=False),
-        loss=dict(
-            type='KLDiscretLoss',
-            use_target_weight=True,
-            beta=10.,
-            label_softmax=True),
-        decoder=codec),
-    test_cfg=dict(flip_test=True))
+            hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False
+        ),
+        loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True),
+        decoder=codec,
+    ),
+    test_cfg=dict(flip_test=True),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/"
 
-backend_args = dict(backend='local')
+backend_args = dict(backend="local")
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(
-        type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
-    dict(type='PhotometricDistortion'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
+    dict(type="PhotometricDistortion"),
     dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=1.0),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 train_pipeline_stage2 = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='RandomBBoxTransform',
-        shift_factor=0.,
-        scale_factor=[0.75, 1.25],
-        rotate_factor=60),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
-    dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=0.5),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 # mapping
@@ -294,78 +254,63 @@ dataset_coco = dict(
     type=dataset_type,
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='coco/annotations/person_keypoints_train2017.json',
-    data_prefix=dict(img='detection/coco/train2017/'),
+    ann_file="coco/annotations/person_keypoints_train2017.json",
+    data_prefix=dict(img="detection/coco/train2017/"),
     pipeline=[],
 )
 
 dataset_aic = dict(
-    type='AicDataset',
+    type="AicDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='aic/annotations/aic_train.json',
-    data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint'
-                     '_train_20170902/keypoint_train_images_20170902/'),
-    pipeline=[
-        dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco)
-    ],
+    ann_file="aic/annotations/aic_train.json",
+    data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)],
 )
 
 dataset_crowdpose = dict(
-    type='CrowdPoseDataset',
+    type="CrowdPoseDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json',
-    data_prefix=dict(img='pose/CrowdPose/images/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco)
-    ],
+    ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json",
+    data_prefix=dict(img="pose/CrowdPose/images/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)],
 )
 
 dataset_mpii = dict(
-    type='MpiiDataset',
+    type="MpiiDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='mpii/annotations/mpii_train.json',
-    data_prefix=dict(img='pose/MPI/images/'),
-    pipeline=[
-        dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco)
-    ],
+    ann_file="mpii/annotations/mpii_train.json",
+    data_prefix=dict(img="pose/MPI/images/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)],
 )
 
 dataset_jhmdb = dict(
-    type='JhmdbDataset',
+    type="JhmdbDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='jhmdb/annotations/Sub1_train.json',
-    data_prefix=dict(img='pose/JHMDB/'),
-    pipeline=[
-        dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco)
-    ],
+    ann_file="jhmdb/annotations/Sub1_train.json",
+    data_prefix=dict(img="pose/JHMDB/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)],
 )
 
 dataset_halpe = dict(
-    type='HalpeDataset',
+    type="HalpeDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='halpe/annotations/halpe_train_v1.json',
-    data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'),
-    pipeline=[
-        dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco)
-    ],
+    ann_file="halpe/annotations/halpe_train_v1.json",
+    data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)],
 )
 
 dataset_posetrack = dict(
-    type='PoseTrack18Dataset',
+    type="PoseTrack18Dataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='posetrack18/annotations/posetrack18_train.json',
-    data_prefix=dict(img='pose/PoseChallenge2018/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco)
-    ],
+    ann_file="posetrack18/annotations/posetrack18_train.json",
+    data_prefix=dict(img="pose/PoseChallenge2018/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)],
 )
 
 # data loaders
@@ -373,10 +318,10 @@ train_dataloader = dict(
     batch_size=256,
     num_workers=10,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
-        type='CombinedDataset',
-        metainfo=dict(from_file='configs/_base_/datasets/coco.py'),
+        type="CombinedDataset",
+        metainfo=dict(from_file="configs/_base_/datasets/coco.py"),
         datasets=[
             dataset_coco,
             dataset_aic,
@@ -388,98 +333,80 @@ train_dataloader = dict(
         ],
         pipeline=train_pipeline,
         test_mode=False,
-    ))
+    ),
+)
 
 # val datasets
 val_coco = dict(
     type=dataset_type,
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='coco/annotations/person_keypoints_val2017.json',
-    data_prefix=dict(img='detection/coco/val2017/'),
+    ann_file="coco/annotations/person_keypoints_val2017.json",
+    data_prefix=dict(img="detection/coco/val2017/"),
     pipeline=[],
 )
 
 val_aic = dict(
-    type='AicDataset',
+    type="AicDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='aic/annotations/aic_val.json',
-    data_prefix=dict(
-        img='pose/ai_challenge/ai_challenger_keypoint'
-        '_validation_20170911/keypoint_validation_images_20170911/'),
-    pipeline=[
-        dict(type='KeypointConverter', num_keypoints=17, mapping=aic_coco)
-    ],
+    ann_file="aic/annotations/aic_val.json",
+    data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=aic_coco)],
 )
 
 val_crowdpose = dict(
-    type='CrowdPoseDataset',
+    type="CrowdPoseDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='crowdpose/annotations/mmpose_crowdpose_test.json',
-    data_prefix=dict(img='pose/CrowdPose/images/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter', num_keypoints=17, mapping=crowdpose_coco)
-    ],
+    ann_file="crowdpose/annotations/mmpose_crowdpose_test.json",
+    data_prefix=dict(img="pose/CrowdPose/images/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=crowdpose_coco)],
 )
 
 val_mpii = dict(
-    type='MpiiDataset',
+    type="MpiiDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='mpii/annotations/mpii_val.json',
-    data_prefix=dict(img='pose/MPI/images/'),
-    pipeline=[
-        dict(type='KeypointConverter', num_keypoints=17, mapping=mpii_coco)
-    ],
+    ann_file="mpii/annotations/mpii_val.json",
+    data_prefix=dict(img="pose/MPI/images/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=mpii_coco)],
 )
 
 val_jhmdb = dict(
-    type='JhmdbDataset',
+    type="JhmdbDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='jhmdb/annotations/Sub1_test.json',
-    data_prefix=dict(img='pose/JHMDB/'),
-    pipeline=[
-        dict(type='KeypointConverter', num_keypoints=17, mapping=jhmdb_coco)
-    ],
+    ann_file="jhmdb/annotations/Sub1_test.json",
+    data_prefix=dict(img="pose/JHMDB/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=jhmdb_coco)],
 )
 
 val_halpe = dict(
-    type='HalpeDataset',
+    type="HalpeDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='halpe/annotations/halpe_val_v1.json',
-    data_prefix=dict(img='detection/coco/val2017/'),
-    pipeline=[
-        dict(type='KeypointConverter', num_keypoints=17, mapping=halpe_coco)
-    ],
+    ann_file="halpe/annotations/halpe_val_v1.json",
+    data_prefix=dict(img="detection/coco/val2017/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=halpe_coco)],
 )
 
 val_ochuman = dict(
-    type='OCHumanDataset',
+    type="OCHumanDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='ochuman/annotations/'
-    'ochuman_coco_format_val_range_0.00_1.00.json',
-    data_prefix=dict(img='pose/OCHuman/images/'),
-    pipeline=[
-        dict(type='KeypointConverter', num_keypoints=17, mapping=ochuman_coco)
-    ],
+    ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json",
+    data_prefix=dict(img="pose/OCHuman/images/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=ochuman_coco)],
 )
 
 val_posetrack = dict(
-    type='PoseTrack18Dataset',
+    type="PoseTrack18Dataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='posetrack18/annotations/posetrack18_val.json',
-    data_prefix=dict(img='pose/PoseChallenge2018/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter', num_keypoints=17, mapping=posetrack_coco)
-    ],
+    ann_file="posetrack18/annotations/posetrack18_val.json",
+    data_prefix=dict(img="pose/PoseChallenge2018/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=17, mapping=posetrack_coco)],
 )
 
 val_dataloader = dict(
@@ -487,28 +414,28 @@ val_dataloader = dict(
     num_workers=10,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='coco/annotations/person_keypoints_val2017.json',
-        bbox_file=f'{data_root}coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='detection/coco/val2017/'),
+        ann_file="coco/annotations/person_keypoints_val2017.json",
+        bbox_file=f"{data_root}coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="detection/coco/val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 
 test_dataloader = dict(
     batch_size=64,
     num_workers=10,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
-        type='CombinedDataset',
-        metainfo=dict(from_file='configs/_base_/datasets/coco.py'),
+        type="CombinedDataset",
+        metainfo=dict(from_file="configs/_base_/datasets/coco.py"),
        datasets=[
             val_coco,
             val_aic,
@@ -521,11 +448,11 @@ test_dataloader = dict(
         ],
         pipeline=val_pipeline,
         test_mode=True,
-    ))
+    ),
+)
 
 # hooks
-default_hooks = dict(
-    checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1))
 
 # default_hooks = dict(
 #     checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1))
@@ -536,19 +463,10 @@ custom_hooks = [
     #     momentum=0.0002,
     #     update_buffers=True,
     #     priority=49),
-    dict(
-        type='mmdet.PipelineSwitchHook',
-        switch_epoch=max_epochs - stage2_num_epochs,
-        switch_pipeline=train_pipeline_stage2)
+    dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2)
 ]
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json')
-
-test_evaluator = [
-    dict(type='PCKAccuracy', thr=0.1),
-    dict(type='AUC'),
-    dict(type='EPE')
-]
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json")
+
+test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC"), dict(type="EPE")]
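
Every config in this patch pairs a 1000-iteration `LinearLR` warmup with `CosineAnnealingLR` over the second half of training (epochs 210 to 420), decaying to `eta_min = base_lr * 0.05`. A self-contained sketch of the resulting curve — `base_lr = 4e-3` and `iters_per_epoch` are assumptions for illustration only (the actual values are set earlier in each config and by the dataset size; the real schedulers live in mmengine):

```python
import math

base_lr = 4e-3          # assumed; set near the top of each config
max_epochs = 420
warmup_iters = 1000     # LinearLR: end=1000, by_epoch=False
iters_per_epoch = 500   # assumption purely for this illustration

def lr_at(it):
    """Warmup + cosine schedule as wired up by param_scheduler."""
    if it < warmup_iters:
        # LinearLR ramps from base_lr * start_factor up to base_lr
        start = base_lr * 1.0e-5
        return start + (base_lr - start) * it / warmup_iters
    epoch = it / iters_per_epoch
    if epoch < max_epochs // 2:
        return base_lr  # flat until the cosine phase begins at epoch 210
    # CosineAnnealingLR from epoch 210 to 420, eta_min = base_lr * 0.05
    t = (epoch - max_epochs // 2) / (max_epochs // 2)
    eta_min = base_lr * 0.05
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t))

for it in (0, 1000, 105000, 157500, 209999):
    print(f"iter {it:>6}: lr = {lr_at(it):.2e}")
```

The same epoch boundary (`max_epochs - stage2_num_epochs`) drives `mmdet.PipelineSwitchHook`, so the heavy stage-1 augmentations are swapped for the milder stage-2 pipeline while the learning rate is annealing.
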
diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-x_8xb256-700e_body8-halpe26-384x288.py b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-x_8xb256-700e_body8-halpe26-384x288.py
index e50aa42f0e5faafb2324e2d4f7d704f11a3c1cda..dc683d366afede2d01f94c3fcee61bd48da56dca 100644
--- a/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-x_8xb256-700e_body8-halpe26-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/rtmpose/body8/rtmpose-x_8xb256-700e_body8-halpe26-384x288.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # common setting
 num_keypoints = 26
@@ -16,181 +16,145 @@ randomness = dict(seed=21)
 
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05),
     clip_grad=dict(max_norm=35, norm_type=2),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 
 # learning rate
 param_scheduler = [
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
-    dict(
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=1024)
 
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=input_size,
-    sigma=(6., 6.93),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=input_size, sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
         deepen_factor=1.33,
         widen_factor=1.25,
-        out_indices=(4, ),
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
-        act_cfg=dict(type='SiLU'),
+        norm_cfg=dict(type="SyncBN"),
+        act_cfg=dict(type="SiLU"),
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/v1/projects/'
-            'rtmposev1/cspnext-x_udp-body7_210e-384x288-d28b58e6_20230529.pth'  # noqa
-        )),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/v1/projects/"
+            "rtmposev1/cspnext-x_udp-body7_210e-384x288-d28b58e6_20230529.pth",  # noqa
+        ),
+    ),
     head=dict(
-        type='RTMCCHead',
+        type="RTMCCHead",
         in_channels=1280,
         out_channels=num_keypoints,
         input_size=input_size,
         in_featuremap_size=tuple([s // 32 for s in input_size]),
-        simcc_split_ratio=codec['simcc_split_ratio'],
+        simcc_split_ratio=codec["simcc_split_ratio"],
         final_layer_kernel_size=7,
         gau_cfg=dict(
-            hidden_dims=256,
-            s=128,
-            expansion_factor=2,
-            dropout_rate=0.,
-            drop_path=0.,
-            act_fn='SiLU',
-            use_rel_bias=False,
-            pos_enc=False),
-        loss=dict(
-            type='KLDiscretLoss',
-            use_target_weight=True,
-            beta=10.,
-            label_softmax=True),
-        decoder=codec),
-    test_cfg=dict(flip_test=True))
+            hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False
+        ),
+        loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True),
+        decoder=codec,
+    ),
+    test_cfg=dict(flip_test=True),
+)
 
 # base dataset settings
-dataset_type = 'CocoWholeBodyDataset'
-data_mode = 'topdown'
-data_root = 'data/'
+dataset_type = "CocoWholeBodyDataset"
+data_mode = "topdown"
+data_root = "data/"
 
-backend_args = dict(backend='local')
+backend_args = dict(backend="local")
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(
-        type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PhotometricDistortion'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PhotometricDistortion"),
     dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=1.0),
-        ]),
-    dict(
-        type='GenerateTarget',
-        encoder=codec,
-        use_dataset_keypoint_weights=True),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 train_pipeline_stage2 = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
     dict(
-        type='RandomBBoxTransform',
-        shift_factor=0.,
-        scale_factor=[0.5, 1.5],
-        rotate_factor=90),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=0.5),
-        ]),
-    dict(
-        type='GenerateTarget',
-        encoder=codec,
-        use_dataset_keypoint_weights=True),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True),
+    dict(type="PackPoseInputs"),
 ]
 
 # mapping
-coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24),
-                                              (20, 21), (21, 23), (22, 25)]
+coco_halpe26 = [(i, i) for i in range(17)] + [(17, 20), (18, 22), (19, 24), (20, 21), (21, 23), (22, 25)]
 
-aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7),
-               (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15),
-               (12, 17), (13, 18)]
+aic_halpe26 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15), (12, 17), (13, 18)]
 
-crowdpose_halpe26 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11),
-                     (7, 12), (8, 13), (9, 14), (10, 15), (11, 16), (12, 17),
-                     (13, 18)]
+crowdpose_halpe26 = [
+    (0, 5),
+    (1, 6),
+    (2, 7),
+    (3, 8),
+    (4, 9),
+    (5, 10),
+    (6, 11),
+    (7, 12),
+    (8, 13),
+    (9, 14),
+    (10, 15),
+    (11, 16),
+    (12, 17),
+    (13, 18),
+]
 
 mpii_halpe26 = [
     (0, 16),
@@ -254,99 +218,63 @@ dataset_coco = dict(
     type=dataset_type,
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='coco/annotations/coco_wholebody_train_v1.0.json',
-    data_prefix=dict(img='detection/coco/train2017/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=coco_halpe26)
-    ],
+    ann_file="coco/annotations/coco_wholebody_train_v1.0.json",
+    data_prefix=dict(img="detection/coco/train2017/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)],
 )
 
 dataset_aic = dict(
-    type='AicDataset',
+    type="AicDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='aic/annotations/aic_train.json',
-    data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint'
-                     '_train_20170902/keypoint_train_images_20170902/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=aic_halpe26)
-    ],
+    ann_file="aic/annotations/aic_train.json",
+    data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)],
 )
 
 dataset_crowdpose = dict(
-    type='CrowdPoseDataset',
+    type="CrowdPoseDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json',
-    data_prefix=dict(img='pose/CrowdPose/images/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=crowdpose_halpe26)
-    ],
+    ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json",
+    data_prefix=dict(img="pose/CrowdPose/images/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)],
 )
 
 dataset_mpii = dict(
-    type='MpiiDataset',
+    type="MpiiDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='mpii/annotations/mpii_train.json',
-    data_prefix=dict(img='pose/MPI/images/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=mpii_halpe26)
-    ],
+    ann_file="mpii/annotations/mpii_train.json",
+    data_prefix=dict(img="pose/MPI/images/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)],
 )
 
 dataset_jhmdb = dict(
-    type='JhmdbDataset',
+    type="JhmdbDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='jhmdb/annotations/Sub1_train.json',
-    data_prefix=dict(img='pose/JHMDB/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=jhmdb_halpe26)
-    ],
+    ann_file="jhmdb/annotations/Sub1_train.json",
+    data_prefix=dict(img="pose/JHMDB/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)],
 )
 
 dataset_halpe = dict(
-    type='HalpeDataset',
+    type="HalpeDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='halpe/annotations/halpe_train_v1.json',
-    data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=halpe_halpe26)
-    ],
+    ann_file="halpe/annotations/halpe_train_v1.json",
+    data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)],
 )
 
 dataset_posetrack = dict(
-    type='PoseTrack18Dataset',
+    type="PoseTrack18Dataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='posetrack18/annotations/posetrack18_train.json',
-    data_prefix=dict(img='pose/PoseChallenge2018/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=posetrack_halpe26)
-    ],
+    ann_file="posetrack18/annotations/posetrack18_train.json",
+    data_prefix=dict(img="pose/PoseChallenge2018/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)],
 )
 
 # data loaders
@@ -355,10 +283,10 @@ train_dataloader = dict(
     num_workers=10,
     pin_memory=True,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
-        type='CombinedDataset',
-        metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'),
+        type="CombinedDataset",
+        metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"),
         datasets=[
             dataset_coco,
             dataset_aic,
@@ -370,122 +298,80 @@ train_dataloader = dict(
         ],
         pipeline=train_pipeline,
         test_mode=False,
-    ))
+    ),
+)
 
 # val datasets
 val_coco = dict(
     type=dataset_type,
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='coco/annotations/coco_wholebody_val_v1.0.json',
-    data_prefix=dict(img='detection/coco/val2017/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=coco_halpe26)
-    ],
+    ann_file="coco/annotations/coco_wholebody_val_v1.0.json",
+    data_prefix=dict(img="detection/coco/val2017/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=coco_halpe26)],
 )
 
 val_aic = dict(
-    type='AicDataset',
+    type="AicDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='aic/annotations/aic_val.json',
-    data_prefix=dict(
-        img='pose/ai_challenge/ai_challenger_keypoint'
-        '_validation_20170911/keypoint_validation_images_20170911/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=aic_halpe26)
-    ],
+    ann_file="aic/annotations/aic_val.json",
+    data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_validation_20170911/keypoint_validation_images_20170911/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_halpe26)],
 )
 
 val_crowdpose = dict(
-    type='CrowdPoseDataset',
+    type="CrowdPoseDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='crowdpose/annotations/mmpose_crowdpose_test.json',
-    data_prefix=dict(img='pose/CrowdPose/images/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=crowdpose_halpe26)
-    ],
+    ann_file="crowdpose/annotations/mmpose_crowdpose_test.json",
+    data_prefix=dict(img="pose/CrowdPose/images/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_halpe26)],
 )
 
 val_mpii = dict(
-    type='MpiiDataset',
+    type="MpiiDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='mpii/annotations/mpii_val.json',
-    data_prefix=dict(img='pose/MPI/images/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=mpii_halpe26)
-    ],
+    ann_file="mpii/annotations/mpii_val.json",
+    data_prefix=dict(img="pose/MPI/images/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_halpe26)],
 )
 
 val_jhmdb = dict(
-    type='JhmdbDataset',
+    type="JhmdbDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='jhmdb/annotations/Sub1_test.json',
-    data_prefix=dict(img='pose/JHMDB/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=jhmdb_halpe26)
-    ],
+    ann_file="jhmdb/annotations/Sub1_test.json",
+    data_prefix=dict(img="pose/JHMDB/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_halpe26)],
 )
 
 val_halpe = dict(
-    type='HalpeDataset',
+    type="HalpeDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='halpe/annotations/halpe_val_v1.json',
-    data_prefix=dict(img='detection/coco/val2017/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=halpe_halpe26)
-    ],
+    ann_file="halpe/annotations/halpe_val_v1.json",
+    data_prefix=dict(img="detection/coco/val2017/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_halpe26)],
 )
 
 val_ochuman = dict(
-    type='OCHumanDataset',
+    type="OCHumanDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='ochuman/annotations/'
-    'ochuman_coco_format_val_range_0.00_1.00.json',
-    data_prefix=dict(img='pose/OCHuman/images/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=ochuman_halpe26)
-    ],
+    ann_file="ochuman/annotations/" "ochuman_coco_format_val_range_0.00_1.00.json",
+    data_prefix=dict(img="pose/OCHuman/images/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=ochuman_halpe26)],
 )
 
 val_posetrack = dict(
-    type='PoseTrack18Dataset',
+    type="PoseTrack18Dataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='posetrack18/annotations/posetrack18_val.json',
-    data_prefix=dict(img='pose/PoseChallenge2018/'),
-    pipeline=[
-        dict(
-            type='KeypointConverter',
-            num_keypoints=num_keypoints,
-            mapping=posetrack_halpe26)
-    ],
+    ann_file="posetrack18/annotations/posetrack18_val.json",
+    data_prefix=dict(img="pose/PoseChallenge2018/"),
+    pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_halpe26)],
 )
 
 val_dataloader = dict(
@@ -493,10 +379,10 @@ val_dataloader = dict(
     num_workers=10,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
-        type='CombinedDataset',
-        metainfo=dict(from_file='configs/_base_/datasets/halpe26.py'),
+        type="CombinedDataset",
+        metainfo=dict(from_file="configs/_base_/datasets/halpe26.py"),
         datasets=[
             val_coco,
             val_aic,
@@ -509,27 +395,19 @@ val_dataloader = dict(
         ],
         pipeline=val_pipeline,
         test_mode=True,
-    ))
+    ),
+)
 
 test_dataloader = val_dataloader
 
 # hooks
-default_hooks = dict(
-    checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater", max_keep_ckpts=1))
 
 custom_hooks = [
-    dict(
-        type='EMAHook',
-        ema_type='ExpMomentumEMA',
-        momentum=0.0002,
-        update_buffers=True,
-        priority=49),
-    dict(
-        type='mmdet.PipelineSwitchHook',
-        switch_epoch=max_epochs - stage2_num_epochs,
-        switch_pipeline=train_pipeline_stage2)
+    dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49),
+    dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2),
 ]
 
 # evaluators
-test_evaluator = [dict(type='PCKAccuracy', thr=0.1), dict(type='AUC')]
+test_evaluator = [dict(type="PCKAccuracy", thr=0.1), dict(type="AUC")]
 val_evaluator = test_evaluator
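
Every codec in these files is `SimCCLabel`, which casts keypoint localization as two 1-D classification problems: each coordinate becomes a Gaussian-smoothed label vector over `input_size * simcc_split_ratio` bins, trained with `KLDiscretLoss`. A rough NumPy sketch of the encoding — conceptual only, not the actual MMPose codec:

```python
import numpy as np

def simcc_encode(x, y, input_size=(288, 384), split_ratio=2.0, sigma=(6.0, 6.93)):
    """Encode one keypoint as two 1-D Gaussian label vectors (SimCC-style).

    x, y are pixel coordinates in the model input space; the label
    resolution is split_ratio times finer than the input (sub-pixel bins).
    """
    w, h = input_size
    bins_x = np.arange(int(w * split_ratio), dtype=np.float32)
    bins_y = np.arange(int(h * split_ratio), dtype=np.float32)
    # unnormalized Gaussians centered at the sub-pixel target location
    # (matches normalize=False in the configs above)
    target_x = np.exp(-((bins_x - x * split_ratio) ** 2) / (2 * sigma[0] ** 2))
    target_y = np.exp(-((bins_y - y * split_ratio) ** 2) / (2 * sigma[1] ** 2))
    return target_x, target_y

tx, ty = simcc_encode(100.5, 200.25)
# decoding is essentially an argmax back to pixel coordinates
assert abs(tx.argmax() / 2.0 - 100.5) < 0.5 and abs(ty.argmax() / 2.0 - 200.25) < 0.5
```

The `sigma=(6.0, 6.93)` pair used at 384x288 is simply a wider smoothing than the `(4.9, 5.66)` used at 256x192, scaling the label softness with input resolution.
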
diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_aic-coco-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_aic-coco-256x192.py
index 662bd72924b4e77c3f559c62475b0363d1472d0e..4ca8831f5b81122ecbbb884e1aa83992b974b3f9 100644
--- a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_aic-coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_aic-coco-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 420
@@ -10,98 +10,79 @@ randomness = dict(seed=21)
 
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(  # use cosine lr from 210 to 420 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=1024)
 
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(192, 256),
-    sigma=(4.9, 5.66),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
-        deepen_factor=1.,
-        widen_factor=1.,
-        out_indices=(4, ),
+        deepen_factor=1.0,
+        widen_factor=1.0,
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
-        act_cfg=dict(type='SiLU'),
+        norm_cfg=dict(type="SyncBN"),
+        act_cfg=dict(type="SiLU"),
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/v1/projects/'
-            'rtmposev1/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth'  # noqa
-        )),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/v1/projects/"
+            "rtmposev1/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth",  # noqa
+        ),
+    ),
     head=dict(
-        type='RTMCCHead',
+        type="RTMCCHead",
         in_channels=1024,
         out_channels=17,
-        input_size=codec['input_size'],
-        in_featuremap_size=tuple([s // 32 for s in codec['input_size']]),
-        simcc_split_ratio=codec['simcc_split_ratio'],
+        input_size=codec["input_size"],
+        in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]),
+        simcc_split_ratio=codec["simcc_split_ratio"],
         final_layer_kernel_size=7,
         gau_cfg=dict(
-            hidden_dims=256,
-            s=128,
-            expansion_factor=2,
-            dropout_rate=0.,
-            drop_path=0.,
-            act_fn='SiLU',
-            use_rel_bias=False,
-            pos_enc=False),
-        loss=dict(
-            type='KLDiscretLoss',
-            use_target_weight=True,
-            beta=10.,
-            label_softmax=True),
-        decoder=codec),
-    test_cfg=dict(flip_test=True, ))
+            hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False
+        ),
+        loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True),
+        decoder=codec,
+    ),
+    test_cfg=dict(
+        flip_test=True,
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/"
 
-backend_args = dict(backend='local')
+backend_args = dict(backend="local")
 # backend_args = dict(
 #     backend='petrel',
 #     path_mapping=dict({
@@ -111,93 +92,74 @@ backend_args = dict(backend='local')
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
-    dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=1.0),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 train_pipeline_stage2 = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='RandomBBoxTransform',
-        shift_factor=0.,
-        scale_factor=[0.75, 1.25],
-        rotate_factor=60),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
-    dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=0.5),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 # train datasets
 dataset_coco = dict(
-    type='RepeatDataset',
+    type="RepeatDataset",
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='coco/annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='detection/coco/train2017/'),
+        ann_file="coco/annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="detection/coco/train2017/"),
         pipeline=[],
     ),
-    times=3)
+    times=3,
+)
 
 dataset_aic = dict(
-    type='AicDataset',
+    type="AicDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='aic/annotations/aic_train.json',
-    data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint'
-                     '_train_20170902/keypoint_train_images_20170902/'),
+    ann_file="aic/annotations/aic_train.json",
+    data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"),
     pipeline=[
         dict(
-            type='KeypointConverter',
+            type="KeypointConverter",
             num_keypoints=17,
             mapping=[
                 (0, 6),
@@ -212,7 +174,8 @@ dataset_aic = dict(
                 (9, 11),
                 (10, 13),
                 (11, 15),
-            ])
+            ],
+        )
     ],
 )
 
@@ -221,52 +184,43 @@ train_dataloader = dict(
     batch_size=256,
     num_workers=10,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
-        type='CombinedDataset',
-        metainfo=dict(from_file='configs/_base_/datasets/coco.py'),
+        type="CombinedDataset",
+        metainfo=dict(from_file="configs/_base_/datasets/coco.py"),
         datasets=[dataset_coco, dataset_aic],
         pipeline=train_pipeline,
         test_mode=False,
-    ))
+    ),
+)
 
 val_dataloader = dict(
     batch_size=64,
     num_workers=10,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='coco/annotations/person_keypoints_val2017.json',
+        ann_file="coco/annotations/person_keypoints_val2017.json",
         # bbox_file='data/coco/person_detection_results/'
        # 'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='detection/coco/val2017/'),
+        data_prefix=dict(img="detection/coco/val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 
 test_dataloader = val_dataloader
 
 # hooks
-default_hooks = dict(
-    checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1))
 
 custom_hooks = [
-    dict(
-        type='EMAHook',
-        ema_type='ExpMomentumEMA',
-        momentum=0.0002,
-        update_buffers=True,
-        priority=49),
-    dict(
-        type='mmdet.PipelineSwitchHook',
-        switch_epoch=max_epochs - stage2_num_epochs,
-        switch_pipeline=train_pipeline_stage2)
+    dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49),
+    dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2),
 ]
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_aic-coco-384x288.py b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_aic-coco-384x288.py
index 7b5895962bc23a3b3c2d6bfcd30466cbcb5a8f92..020dfe93e73273a76fe29041270b4c83ab527a9b 100644
--- a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_aic-coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_aic-coco-384x288.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 420
@@ -10,98 +10,79 @@ randomness = dict(seed=21)
 
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(  # use cosine lr from 210 to 420 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=1024)
 
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(288, 384),
-    sigma=(6., 6.93),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(288, 384), sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+
type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1024, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -111,93 +92,74 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, 
max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # train datasets dataset_coco = dict( - type='RepeatDataset', + type="RepeatDataset", dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ), - times=3) + times=3, +) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=17, mapping=[ (0, 6), @@ -212,7 +174,8 @@ dataset_aic = dict( (9, 11), (10, 13), (11, 15), - ]) + ], + ) ], ) @@ -221,52 +184,43 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[dataset_coco, dataset_aic], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=64, 
num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', + ann_file="coco/annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_coco-256x192.py index 85c7695c59e299ab7e68c7991847460014cd4d7a..71f7d990215688f75b14cc2e4dab05131a09d577 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,98 +10,77 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model 
settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1024, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = "/datagrid/personal/purkrmir/data/COCO/original/" +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "path/to/COCO/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -111,68 +90,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - 
min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -180,55 +140,46 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/densepose.json', + ann_file="annotations/densepose.json", # ann_file='annotations/person_keypoints_val2017.json', # bbox_file=f'{data_root}person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = 
dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/densepose.json') - # ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/densepose.json") +# ann_file=data_root + 'annotations/person_keypoints_val2017.json') test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_aic-coco-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_aic-coco-256x192.py index c7840f6c46c636bd340488c5f4059b804d02783e..555e2748d6d958556acafee87c51de55ac9c04a3 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_aic-coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_aic-coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,98 +10,79 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), 
+ act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -111,93 +92,74 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + 
dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # train datasets dataset_coco = dict( - type='RepeatDataset', + type="RepeatDataset", dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ), - times=3) + times=3, +) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=17, mapping=[ (0, 6), @@ -212,7 +174,8 @@ dataset_aic = dict( (9, 11), (10, 13), (11, 15), - ]) + ], + ) ], ) @@ -221,52 +184,43 @@ train_dataloader = dict( batch_size=128 * 2, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[dataset_coco, dataset_aic], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', + ann_file="coco/annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - 
data_prefix=dict(img='detection/coco/val2017/'), + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_aic-coco-384x288.py b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_aic-coco-384x288.py index 1293a1ae1c40d94b65df4c27097e4b5d506a2b1b..71d823dd6dc617b1aab182e07f7adbf29c859446 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_aic-coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_aic-coco-384x288.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,98 +10,79 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(288, 384), - sigma=(6., 6.93), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(288, 384), sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", 
expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -111,93 +92,74 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', 
input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # train datasets dataset_coco = dict( - type='RepeatDataset', + type="RepeatDataset", dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ), - times=3) + times=3, +) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=17, mapping=[ (0, 6), @@ -212,7 +174,8 @@ dataset_aic = dict( (9, 11), (10, 13), (11, 15), - ]) + ], + ) ], ) @@ -221,52 +184,43 @@ train_dataloader = dict( batch_size=128 * 2, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[dataset_coco, dataset_aic], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - 
ann_file='coco/annotations/person_keypoints_val2017.json', + ann_file="coco/annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_coco-256x192.py index f21d0e18c624d6e13f16fc394abd8a7ccd23e167..9c1a7bcdd368e6048452cc49b6b52a527a2050bc 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-m_8xb256-420e_coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,98 +10,77 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + 
data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -111,68 +90,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + 
], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -180,53 +140,44 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', + ann_file="annotations/person_keypoints_val2017.json", # bbox_file=f'{data_root}person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + 
dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_aic-coco-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_aic-coco-256x192.py index 6c9e9fdc55453eb93ab45ea6f8adfc18b333afc6..335e1ee2da7a43812a10e72b521576015d8698d0 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_aic-coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_aic-coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,98 +10,79 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.0), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.33, widen_factor=0.5, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-s_udp-aic-coco_210e-256x192-92f5a029_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-s_udp-aic-coco_210e-256x192-92f5a029_20230130.pth", # noqa + ), + ), head=dict( - 
-        type='RTMCCHead',
+        type="RTMCCHead",
         in_channels=512,
         out_channels=17,
-        input_size=codec['input_size'],
-        in_featuremap_size=tuple([s // 32 for s in codec['input_size']]),
-        simcc_split_ratio=codec['simcc_split_ratio'],
+        input_size=codec["input_size"],
+        in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]),
+        simcc_split_ratio=codec["simcc_split_ratio"],
         final_layer_kernel_size=7,
         gau_cfg=dict(
-            hidden_dims=256,
-            s=128,
-            expansion_factor=2,
-            dropout_rate=0.,
-            drop_path=0.,
-            act_fn='SiLU',
-            use_rel_bias=False,
-            pos_enc=False),
-        loss=dict(
-            type='KLDiscretLoss',
-            use_target_weight=True,
-            beta=10.,
-            label_softmax=True),
-        decoder=codec),
-    test_cfg=dict(flip_test=True, ))
+            hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False
+        ),
+        loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True),
+        decoder=codec,
+    ),
+    test_cfg=dict(
+        flip_test=True,
+    ),
+)
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/"
-backend_args = dict(backend='local')
+backend_args = dict(backend="local")
 # backend_args = dict(
 #     backend='petrel',
 #     path_mapping=dict({
@@ -111,93 +92,74 @@ backend_args = dict(backend='local')
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
-    dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=1.0),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 train_pipeline_stage2 = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='RandomBBoxTransform',
-        shift_factor=0.,
-        scale_factor=[0.75, 1.25],
-        rotate_factor=60),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
-    dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=0.5),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 # train datasets
 dataset_coco = dict(
-    type='RepeatDataset',
+    type="RepeatDataset",
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='coco/annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='detection/coco/train2017/'),
+        ann_file="coco/annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="detection/coco/train2017/"),
         pipeline=[],
     ),
-    times=3)
+    times=3,
+)
 dataset_aic = dict(
-    type='AicDataset',
+    type="AicDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='aic/annotations/aic_train.json',
-    data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint'
-                     '_train_20170902/keypoint_train_images_20170902/'),
+    ann_file="aic/annotations/aic_train.json",
+    data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"),
     pipeline=[
         dict(
-            type='KeypointConverter',
+            type="KeypointConverter",
             num_keypoints=17,
             mapping=[
                 (0, 6),
@@ -212,7 +174,8 @@ dataset_aic = dict(
                 (9, 11),
                 (10, 13),
                 (11, 15),
-            ])
+            ],
+        )
     ],
 )
@@ -221,52 +184,43 @@ train_dataloader = dict(
     batch_size=128 * 2,
     num_workers=10,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
-        type='CombinedDataset',
-        metainfo=dict(from_file='configs/_base_/datasets/coco.py'),
+        type="CombinedDataset",
+        metainfo=dict(from_file="configs/_base_/datasets/coco.py"),
         datasets=[dataset_coco, dataset_aic],
         pipeline=train_pipeline,
         test_mode=False,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=64,
     num_workers=10,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='coco/annotations/person_keypoints_val2017.json',
+        ann_file="coco/annotations/person_keypoints_val2017.json",
         # bbox_file='data/coco/person_detection_results/'
         # 'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='detection/coco/val2017/'),
+        data_prefix=dict(img="detection/coco/val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # hooks
-default_hooks = dict(
-    checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1))
 custom_hooks = [
-    dict(
-        type='EMAHook',
-        ema_type='ExpMomentumEMA',
-        momentum=0.0002,
-        update_buffers=True,
-        priority=49),
-    dict(
-        type='mmdet.PipelineSwitchHook',
-        switch_epoch=max_epochs - stage2_num_epochs,
-        switch_pipeline=train_pipeline_stage2)
+    dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49),
+    dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2),
 ]
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_coco-256x192.py
index c0abcbb1dd880a6a74afc5cc2d47fed0bc3b72da..b039a63eadb8104416288ce27f55b412bcc4a01e 100644
--- a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-s_8xb256-420e_coco-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 420
@@ -10,98 +10,77 @@ randomness = dict(seed=21)
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(
         # use cosine lr from 210 to 420 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=1024)
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(192, 256),
-    sigma=(4.9, 5.66),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
         deepen_factor=0.33,
         widen_factor=0.5,
-        out_indices=(4, ),
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
-        act_cfg=dict(type='SiLU'),
+        norm_cfg=dict(type="SyncBN"),
+        act_cfg=dict(type="SiLU"),
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/v1/projects/'
-            'rtmposev1/cspnext-s_udp-aic-coco_210e-256x192-92f5a029_20230130.pth'  # noqa
-        )),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/v1/projects/"
+            "rtmposev1/cspnext-s_udp-aic-coco_210e-256x192-92f5a029_20230130.pth",  # noqa
+        ),
+    ),
     head=dict(
-        type='RTMCCHead',
+        type="RTMCCHead",
         in_channels=512,
         out_channels=17,
-        input_size=codec['input_size'],
-        in_featuremap_size=tuple([s // 32 for s in codec['input_size']]),
-        simcc_split_ratio=codec['simcc_split_ratio'],
+        input_size=codec["input_size"],
+        in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]),
+        simcc_split_ratio=codec["simcc_split_ratio"],
         final_layer_kernel_size=7,
         gau_cfg=dict(
-            hidden_dims=256,
-            s=128,
-            expansion_factor=2,
-            dropout_rate=0.,
-            drop_path=0.,
-            act_fn='SiLU',
-            use_rel_bias=False,
-            pos_enc=False),
-        loss=dict(
-            type='KLDiscretLoss',
-            use_target_weight=True,
-            beta=10.,
-            label_softmax=True),
-        decoder=codec),
-    test_cfg=dict(flip_test=True))
+            hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False
+        ),
+        loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True),
+        decoder=codec,
+    ),
+    test_cfg=dict(flip_test=True),
+)
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
-backend_args = dict(backend='local')
+backend_args = dict(backend="local")
 # backend_args = dict(
 #     backend='petrel',
 #     path_mapping=dict({
@@ -111,68 +90,49 @@ backend_args = dict(backend='local')
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(
-        type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=1.),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 train_pipeline_stage2 = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(
-        type='RandomBBoxTransform',
-        shift_factor=0.,
-        scale_factor=[0.75, 1.25],
-        rotate_factor=60),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=0.5),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 # data loaders
@@ -180,53 +140,44 @@ train_dataloader = dict(
     batch_size=256,
     num_workers=10,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=64,
     num_workers=10,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
+        ann_file="annotations/person_keypoints_val2017.json",
         # bbox_file=f'{data_root}person_detection_results/'
         # 'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # hooks
-default_hooks = dict(
-    checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1))
 custom_hooks = [
-    dict(
-        type='EMAHook',
-        ema_type='ExpMomentumEMA',
-        momentum=0.0002,
-        update_buffers=True,
-        priority=49),
-    dict(
-        type='mmdet.PipelineSwitchHook',
-        switch_epoch=max_epochs - stage2_num_epochs,
-        switch_pipeline=train_pipeline_stage2)
+    dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49),
+    dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2),
 ]
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-t_8xb256-420e_aic-coco-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-t_8xb256-420e_aic-coco-256x192.py
index 215a297944dce6d4d651aa3ac9d43b2878dd40b1..67cdceb05d923606793146b99e3ac0f384f3328e 100644
--- a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-t_8xb256-420e_aic-coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-t_8xb256-420e_aic-coco-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 420
@@ -10,98 +10,79 @@ randomness = dict(seed=21)
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(
         # use cosine lr from 210 to 420 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=1024)
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(192, 256),
-    sigma=(4.9, 5.66),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
         deepen_factor=0.167,
         widen_factor=0.375,
-        out_indices=(4, ),
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
-        act_cfg=dict(type='SiLU'),
+        norm_cfg=dict(type="SyncBN"),
+        act_cfg=dict(type="SiLU"),
        init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/v1/projects/'
-            'rtmposev1/cspnext-tiny_udp-aic-coco_210e-256x192-cbed682d_20230130.pth'  # noqa
-        )),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/v1/projects/"
+            "rtmposev1/cspnext-tiny_udp-aic-coco_210e-256x192-cbed682d_20230130.pth",  # noqa
+        ),
+    ),
     head=dict(
-        type='RTMCCHead',
+        type="RTMCCHead",
         in_channels=384,
         out_channels=17,
-        input_size=codec['input_size'],
-        in_featuremap_size=tuple([s // 32 for s in codec['input_size']]),
-        simcc_split_ratio=codec['simcc_split_ratio'],
+        input_size=codec["input_size"],
+        in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]),
+        simcc_split_ratio=codec["simcc_split_ratio"],
         final_layer_kernel_size=7,
         gau_cfg=dict(
-            hidden_dims=256,
-            s=128,
-            expansion_factor=2,
-            dropout_rate=0.,
-            drop_path=0.,
-            act_fn='SiLU',
-            use_rel_bias=False,
-            pos_enc=False),
-        loss=dict(
-            type='KLDiscretLoss',
-            use_target_weight=True,
-            beta=10.,
-            label_softmax=True),
-        decoder=codec),
-    test_cfg=dict(flip_test=True, ))
+            hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False
+        ),
+        loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True),
+        decoder=codec,
+    ),
+    test_cfg=dict(
+        flip_test=True,
+    ),
+)
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/"
-backend_args = dict(backend='local')
+backend_args = dict(backend="local")
 # backend_args = dict(
 #     backend='petrel',
 #     path_mapping=dict({
@@ -111,93 +92,74 @@ backend_args = dict(backend='local')
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
-    dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=1.0),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 train_pipeline_stage2 = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='RandomBBoxTransform',
-        shift_factor=0.,
-        scale_factor=[0.75, 1.25],
-        rotate_factor=60),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
-    dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=0.5),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 # train datasets
 dataset_coco = dict(
-    type='RepeatDataset',
+    type="RepeatDataset",
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='coco/annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='detection/coco/train2017/'),
+        ann_file="coco/annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="detection/coco/train2017/"),
         pipeline=[],
     ),
-    times=3)
+    times=3,
+)
 dataset_aic = dict(
-    type='AicDataset',
+    type="AicDataset",
     data_root=data_root,
     data_mode=data_mode,
-    ann_file='aic/annotations/aic_train.json',
-    data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint'
-                     '_train_20170902/keypoint_train_images_20170902/'),
+    ann_file="aic/annotations/aic_train.json",
+    data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"),
     pipeline=[
         dict(
-            type='KeypointConverter',
+            type="KeypointConverter",
             num_keypoints=17,
             mapping=[
                 (0, 6),
@@ -212,7 +174,8 @@ dataset_aic = dict(
                 (9, 11),
                 (10, 13),
                 (11, 15),
-            ])
+            ],
+        )
     ],
 )
@@ -221,36 +184,37 @@ train_dataloader = dict(
     batch_size=256,
     num_workers=10,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
-        type='CombinedDataset',
-        metainfo=dict(from_file='configs/_base_/datasets/coco.py'),
+        type="CombinedDataset",
+        metainfo=dict(from_file="configs/_base_/datasets/coco.py"),
         datasets=[dataset_coco, dataset_aic],
         pipeline=train_pipeline,
         test_mode=False,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=64,
     num_workers=10,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='coco/annotations/person_keypoints_val2017.json',
+        ann_file="coco/annotations/person_keypoints_val2017.json",
         # bbox_file='data/coco/person_detection_results/'
         # 'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='detection/coco/val2017/'),
+        data_prefix=dict(img="detection/coco/val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # hooks
-default_hooks = dict(
-    checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1))
 custom_hooks = [
     # Turn off EMA while training the tiny model
@@ -260,14 +224,9 @@ custom_hooks = [
     #     momentum=0.0002,
     #     update_buffers=True,
     #     priority=49),
-    dict(
-        type='mmdet.PipelineSwitchHook',
-        switch_epoch=max_epochs - stage2_num_epochs,
-        switch_pipeline=train_pipeline_stage2)
+    dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2)
 ]
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-t_8xb256-420e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-t_8xb256-420e_coco-256x192.py
index cbe0978b2b66127c7ec31886b21117fa4de89048..f9a401d29524230617f38892f0005581931b9636 100644
--- a/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-t_8xb256-420e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/rtmpose/coco/rtmpose-t_8xb256-420e_coco-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 420
@@ -10,98 +10,77 @@ randomness = dict(seed=21)
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(
         # use cosine lr from 210 to 420 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=1024)
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(192, 256),
-    sigma=(4.9, 5.66),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
         deepen_factor=0.167,
         widen_factor=0.375,
-        out_indices=(4, ),
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
-        act_cfg=dict(type='SiLU'),
+        norm_cfg=dict(type="SyncBN"),
+        act_cfg=dict(type="SiLU"),
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/v1/projects/'
-            'rtmposev1/cspnext-tiny_udp-aic-coco_210e-256x192-cbed682d_20230130.pth'  # noqa
-        )),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/v1/projects/"
+            "rtmposev1/cspnext-tiny_udp-aic-coco_210e-256x192-cbed682d_20230130.pth",  # noqa
+        ),
+    ),
     head=dict(
-        type='RTMCCHead',
+        type="RTMCCHead",
         in_channels=384,
         out_channels=17,
-        input_size=codec['input_size'],
-        in_featuremap_size=tuple([s // 32 for s in codec['input_size']]),
-        simcc_split_ratio=codec['simcc_split_ratio'],
+        input_size=codec["input_size"],
+        in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]),
+        simcc_split_ratio=codec["simcc_split_ratio"],
         final_layer_kernel_size=7,
         gau_cfg=dict(
-            hidden_dims=256,
-            s=128,
-            expansion_factor=2,
-            dropout_rate=0.,
-            drop_path=0.,
-            act_fn='SiLU',
-            use_rel_bias=False,
-            pos_enc=False),
-        loss=dict(
-            type='KLDiscretLoss',
-            use_target_weight=True,
-            beta=10.,
-            label_softmax=True),
-        decoder=codec),
-    test_cfg=dict(flip_test=True))
+            hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False
+        ),
+        loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True),
+        decoder=codec,
+    ),
+    test_cfg=dict(flip_test=True),
+)
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
-backend_args = dict(backend='local')
+backend_args = dict(backend="local")
 # backend_args = dict(
 #     backend='petrel',
 #     path_mapping=dict({
@@ -111,68 +90,49 @@ backend_args = dict(backend='local')
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(
-        type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=1.),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 train_pipeline_stage2 = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='RandomBBoxTransform',
-        shift_factor=0.,
-        scale_factor=[0.75, 1.25],
-        rotate_factor=60),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
-    dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=0.5),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 # data loaders
@@ -180,37 +140,38 @@ train_dataloader = dict(
     batch_size=256,
     num_workers=10,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=64,
     num_workers=10,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
+        ann_file="annotations/person_keypoints_val2017.json",
         # bbox_file=f'{data_root}person_detection_results/'
         # 'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # hooks
-default_hooks = dict(
-    checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1))
 custom_hooks = [
     # Turn off EMA while training the tiny model
@@ -220,14 +181,9 @@ custom_hooks = [
     #     momentum=0.0002,
     #     update_buffers=True,
     #     priority=49),
-    dict(
-        type='mmdet.PipelineSwitchHook',
-        switch_epoch=max_epochs - stage2_num_epochs,
-        switch_pipeline=train_pipeline_stage2)
+    dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2)
 ]
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/crowdpose/rtmpose-m_8xb64-210e_crowdpose-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/crowdpose/rtmpose-m_8xb64-210e_crowdpose-256x192.py
index e93a2f1099cd4d298a1a745a03eb7ddffd3b8998..b9b4d7316b67d149ec27b41fb95a568b7e1ab64a 100644
--- a/mmpose/configs/body_2d_keypoint/rtmpose/crowdpose/rtmpose-m_8xb64-210e_crowdpose-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/rtmpose/crowdpose/rtmpose-m_8xb64-210e_crowdpose-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 210
@@ -10,97 +10,78 @@ randomness = dict(seed=21)
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 # learning rate
 param_scheduler = [
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
-    dict(
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(192, 256),
-    sigma=(4.9, 5.66),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
         deepen_factor=0.67,
         widen_factor=0.75,
-        out_indices=(4, ),
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
-        act_cfg=dict(type='SiLU'),
+        norm_cfg=dict(type="SyncBN"),
+        act_cfg=dict(type="SiLU"),
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/v1/projects/'
-            'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth'  # noqa
-        )),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/v1/projects/"
+            "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth",  # noqa
+        ),
+    ),
     head=dict(
-        type='RTMCCHead',
+        type="RTMCCHead",
         in_channels=768,
         out_channels=14,
-        input_size=codec['input_size'],
-        in_featuremap_size=tuple([s // 32 for s in codec['input_size']]),
-        simcc_split_ratio=codec['simcc_split_ratio'],
+        input_size=codec["input_size"],
+        in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]),
+        simcc_split_ratio=codec["simcc_split_ratio"],
         final_layer_kernel_size=7,
         gau_cfg=dict(
-            hidden_dims=256,
-            s=128,
-            expansion_factor=2,
-            dropout_rate=0.,
-            drop_path=0.,
-            act_fn='SiLU',
-            use_rel_bias=False,
-            pos_enc=False),
-        loss=dict(
-            type='KLDiscretLoss',
-            use_target_weight=True,
-            beta=10.,
-            label_softmax=True),
-        decoder=codec),
-    test_cfg=dict(flip_test=True, ))
+            hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False
+        ),
+        loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True),
+        decoder=codec,
+    ),
+    test_cfg=dict(
+        flip_test=True,
+    ),
+)
 # base dataset settings
-dataset_type = 'CrowdPoseDataset'
-data_mode = 'topdown'
-data_root = 'data/'
+dataset_type = "CrowdPoseDataset"
+data_mode = "topdown"
+data_root = "data/"
-backend_args = dict(backend='local')
+backend_args = dict(backend="local")
 # backend_args = dict(
 #     backend='petrel',
 #     path_mapping=dict({
@@ -110,68 +91,49 @@ backend_args = dict(backend='local')
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(
-        type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=1.0),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 train_pipeline_stage2 = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(
-        type='RandomBBoxTransform',
-        shift_factor=0.,
-        scale_factor=[0.75, 1.25],
-        rotate_factor=60),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=0.5),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 # data loaders
@@ -179,56 +141,49 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=10,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json',
-        data_prefix=dict(img='pose/CrowdPose/images/'),
+        ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json",
+        data_prefix=dict(img="pose/CrowdPose/images/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=10,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='crowdpose/annotations/mmpose_crowdpose_test.json',
-        bbox_file='data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json',
-        data_prefix=dict(img='pose/CrowdPose/images/'),
+        ann_file="crowdpose/annotations/mmpose_crowdpose_test.json",
+        bbox_file="data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json",
+        data_prefix=dict(img="pose/CrowdPose/images/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # hooks
-default_hooks = dict(
-    checkpoint=dict(
-        save_best='crowdpose/AP', rule='greater', max_keep_ckpts=1))
+default_hooks = dict(checkpoint=dict(save_best="crowdpose/AP", rule="greater", max_keep_ckpts=1))
 custom_hooks = [
-    dict(
-        type='EMAHook',
-        ema_type='ExpMomentumEMA',
-        momentum=0.0002,
-        update_buffers=True,
-        priority=49),
-    dict(
-        type='mmdet.PipelineSwitchHook',
-        switch_epoch=max_epochs - stage2_num_epochs,
-        switch_pipeline=train_pipeline_stage2)
+    dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49),
+    dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2),
 ]
 # evaluators
 val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'crowdpose/annotations/mmpose_crowdpose_test.json',
+    type="CocoMetric",
+    ann_file=data_root + "crowdpose/annotations/mmpose_crowdpose_test.json",
     use_area=False,
-    iou_type='keypoints_crowd',
-    prefix='crowdpose')
+    iou_type="keypoints_crowd",
+    prefix="crowdpose",
+)
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-l_8xb256-420e_humanart-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-l_8xb256-420e_humanart-256x192.py
index 384a712d95ffd5e5e6286cadfe53d6abd0f425fd..64779c60c23090fad4f3187c139b7717ad2cef29 100644
--- a/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-l_8xb256-420e_humanart-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-l_8xb256-420e_humanart-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 420
@@ -10,98 +10,77 @@ randomness = dict(seed=21)
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(
         # use cosine lr from 210 to 420 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=1024)
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(192, 256),
-    sigma=(4.9, 5.66),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
-        deepen_factor=1.,
-        widen_factor=1.,
-        out_indices=(4, ),
+        deepen_factor=1.0,
+        widen_factor=1.0,
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
-        act_cfg=dict(type='SiLU'),
+        norm_cfg=dict(type="SyncBN"),
+        act_cfg=dict(type="SiLU"),
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/v1/projects/'
-            'rtmpose/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth'  # noqa
-        )),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/v1/projects/"
+            "rtmpose/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth",  # noqa
+        ),
+    ),
     head=dict(
-        type='RTMCCHead',
+        type="RTMCCHead",
         in_channels=1024,
         out_channels=17,
-        input_size=codec['input_size'],
+        input_size=codec["input_size"],
         in_featuremap_size=(6, 8),
-        simcc_split_ratio=codec['simcc_split_ratio'],
+        simcc_split_ratio=codec["simcc_split_ratio"],
         final_layer_kernel_size=7,
         gau_cfg=dict(
-            hidden_dims=256,
-            s=128,
-            expansion_factor=2,
-            dropout_rate=0.,
-            drop_path=0.,
-            act_fn='SiLU',
-            use_rel_bias=False,
-            pos_enc=False),
-        loss=dict(
-            type='KLDiscretLoss',
-            use_target_weight=True,
-            beta=10.,
-            label_softmax=True),
-        decoder=codec),
-    test_cfg=dict(flip_test=True))
+            hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False
+        ),
+        loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True),
+        decoder=codec,
+    ),
+    test_cfg=dict(flip_test=True),
+)
 # base dataset settings
-dataset_type = 'HumanArtDataset'
-data_mode = 'topdown'
-data_root = 'data/'
+dataset_type = "HumanArtDataset"
+data_mode = "topdown"
+data_root = "data/"
-backend_args = dict(backend='local')
+backend_args = dict(backend="local")
 # backend_args = dict(
 #     backend='petrel',
 #     path_mapping=dict({
@@ -111,68 +90,49 @@ backend_args = dict(backend='local')
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(
-        type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=1.),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 train_pipeline_stage2 = [
-    dict(type='LoadImage', backend_args=backend_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(
-        type='RandomBBoxTransform',
-        shift_factor=0.,
-        scale_factor=[0.75, 1.25],
-        rotate_factor=60),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='mmdet.YOLOXHSVRandomAug'),
+    dict(type="LoadImage", backend_args=backend_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="mmdet.YOLOXHSVRandomAug"),
     dict(
-        type='Albumentation',
+        type="Albumentation",
         transforms=[
-            dict(type='Blur', p=0.1),
-            dict(type='MedianBlur', p=0.1),
-            dict(
-                type='CoarseDropout',
-                max_holes=1,
-                max_height=0.4,
-                max_width=0.4,
-                min_holes=1,
-                min_height=0.2,
-                min_width=0.2,
-                p=0.5),
-        ]),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+            dict(type="Blur", p=0.1),
+            dict(type="MedianBlur", p=0.1),
+            dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5),
+        ],
+    ),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 # data loaders
@@ -180,53 +140,44 @@ train_dataloader = dict(
     batch_size=256,
     num_workers=10,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='HumanArt/annotations/training_humanart_coco.json',
-        data_prefix=dict(img=''),
+        ann_file="HumanArt/annotations/training_humanart_coco.json",
+        data_prefix=dict(img=""),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=64,
     num_workers=10,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='HumanArt/annotations/validation_humanart.json',
+        ann_file="HumanArt/annotations/validation_humanart.json",
         # bbox_file=f'{data_root}HumanArt/person_detection_results/'
         # 'HumanArt_validation_detections_AP_H_56_person.json',
-        data_prefix=dict(img=''),
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # hooks
-default_hooks = dict(
-    checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1))
 custom_hooks = [
-    dict(
-        type='EMAHook',
-        ema_type='ExpMomentumEMA',
-        momentum=0.0002,
-        update_buffers=True,
-        priority=49),
-    dict(
-        type='mmdet.PipelineSwitchHook',
-        switch_epoch=max_epochs - stage2_num_epochs,
-        switch_pipeline=train_pipeline_stage2)
+    dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49),
+    dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2),
 ]
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'HumanArt/annotations/validation_humanart.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "HumanArt/annotations/validation_humanart.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-m_8xb256-420e_humanart-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-m_8xb256-420e_humanart-256x192.py
index 30178cbb6dd68d56dd95e934c54ebf96b04482d8..b702f9764741800acdae327bf511721ec80a8963 100644
--- a/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-m_8xb256-420e_humanart-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-m_8xb256-420e_humanart-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 420
@@ -10,98 +10,77 @@ randomness = dict(seed=21)
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(
         # use cosine lr from 210 to 420 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=1024)
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(192, 256),
-    sigma=(4.9, 5.66),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
         deepen_factor=0.67,
         widen_factor=0.75,
-        out_indices=(4, ),
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmpose/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmpose/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=17, - input_size=codec['input_size'], + input_size=codec["input_size"], in_featuremap_size=(6, 8), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'HumanArtDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "HumanArtDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -111,68 +90,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', 
backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -180,53 +140,44 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart_coco.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/training_humanart_coco.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/validation_humanart.json', + ann_file="HumanArt/annotations/validation_humanart.json", # bbox_file=f'{data_root}HumanArt/person_detection_results/' # 'HumanArt_validation_detections_AP_H_56_person.json', - data_prefix=dict(img=''), + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'HumanArt/annotations/validation_humanart.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "HumanArt/annotations/validation_humanart.json") test_evaluator = val_evaluator diff --git 
a/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-s_8xb256-420e_humanart-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-s_8xb256-420e_humanart-256x192.py index b4263f25e741e25a0ec5b85900ff1b2587d2805d..f13e09b8bf273a27d9f6639241a1e544807deaf7 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-s_8xb256-420e_humanart-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-s_8xb256-420e_humanart-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,98 +10,77 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.33, widen_factor=0.5, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmpose/cspnext-s_udp-aic-coco_210e-256x192-92f5a029_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmpose/cspnext-s_udp-aic-coco_210e-256x192-92f5a029_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=512, out_channels=17, - input_size=codec['input_size'], + input_size=codec["input_size"], in_featuremap_size=(6, 8), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - 
use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'HumanArtDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "HumanArtDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -111,68 +90,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - 
p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -180,53 +140,44 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart_coco.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/training_humanart_coco.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/validation_humanart.json', + ann_file="HumanArt/annotations/validation_humanart.json", # bbox_file=f'{data_root}HumanArt/person_detection_results/' # 'HumanArt_validation_detections_AP_H_56_person.json', - data_prefix=dict(img=''), + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'HumanArt/annotations/validation_humanart.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "HumanArt/annotations/validation_humanart.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-t_8xb256-420e_humanart-256x192.py b/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-t_8xb256-420e_humanart-256x192.py index 869f04217d6caecfd422d387730cbfd28cc208c1..6c27d0b30e3b70e1eaf719db20b81a4b05ec644c 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-t_8xb256-420e_humanart-256x192.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/humanart/rtmpose-t_8xb256-420e_humanart-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 420 @@ -10,98 +10,77 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0), + paramwise_cfg=dict(norm_decay_mult=0, 
bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.167, widen_factor=0.375, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmpose/cspnext-tiny_udp-aic-coco_210e-256x192-cbed682d_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmpose/cspnext-tiny_udp-aic-coco_210e-256x192-cbed682d_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=384, out_channels=17, - input_size=codec['input_size'], + input_size=codec["input_size"], in_featuremap_size=(6, 8), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'HumanArtDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "HumanArtDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -111,68 +90,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - 
type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -180,37 +140,38 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart_coco.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/training_humanart_coco.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, 
- sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/validation_humanart.json', + ann_file="HumanArt/annotations/validation_humanart.json", # bbox_file=f'{data_root}HumanArt/person_detection_results/' # 'HumanArt_validation_detections_AP_H_56_person.json', - data_prefix=dict(img=''), + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ # Turn off EMA while training the tiny model @@ -220,14 +181,9 @@ custom_hooks = [ # momentum=0.0002, # update_buffers=True, # priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2) ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'HumanArt/annotations/validation_humanart.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "HumanArt/annotations/validation_humanart.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/rtmpose/mpii/rtmpose-m_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/rtmpose/mpii/rtmpose-m_8xb64-210e_mpii-256x256.py index ca67020f510e739041a342e9aa15f68098dec189..71d08feaba86523844911f745882b6d725c95e32 100644 --- a/mmpose/configs/body_2d_keypoint/rtmpose/mpii/rtmpose-m_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/rtmpose/mpii/rtmpose-m_8xb64-210e_mpii-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -9,98 +9,77 @@ train_cfg = dict(max_epochs=max_epochs, val_interval=10) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning policy param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), - dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + dict( # use cosine lr from 105 to 210 epoch + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(256, 256), - sigma=(5.66, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(256, 256), sigma=(5.66, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 
103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=16, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -110,68 +89,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", 
max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -179,50 +139,43 @@ train_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file=f'{data_root}/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file=f"{data_root}/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - 
switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_mobilenetv2_wo-deconv-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_mobilenetv2_wo-deconv-8xb64-210e_coco-256x192.py index 800803d190265cbe8183143e7c7ed7b9ebabb21d..497206a766357d7f489f4efc575c7302523913e0 100644 --- a/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_mobilenetv2_wo-deconv-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_mobilenetv2_wo-deconv-8xb64-210e_coco-256x192.py @@ -1,84 +1,78 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', input_size=(192, 256), sigma=6.0, simcc_split_ratio=2.0) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=6.0, simcc_split_ratio=2.0) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MobileNetV2', - widen_factor=1., - out_indices=(7, ), + type="MobileNetV2", + widen_factor=1.0, + out_indices=(7,), init_cfg=dict( - type='Pretrained', - checkpoint='mmcls://mobilenet_v2', - )), + type="Pretrained", + checkpoint="mmcls://mobilenet_v2", + ), + ), head=dict( - type='SimCCHead', + type="SimCCHead", in_channels=1280, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], deconv_out_channels=None, - loss=dict(type='KLDiscretLoss', use_target_weight=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + loss=dict(type="KLDiscretLoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - 
dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,39 +80,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_res50_8xb32-140e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_res50_8xb32-140e_coco-384x288.py index c04358299fe4189daf7ad19bbf76d18e8dd9305c..d579944d4ea080c1578ec4659bc19a42171aa294 100644 --- a/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_res50_8xb32-140e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_res50_8xb32-140e_coco-384x288.py @@ -1,80 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=140, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, 
start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[90, 120], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[90, 120], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', input_size=(288, 384), sigma=6.0, simcc_split_ratio=2.0) +codec = dict(type="SimCCLabel", input_size=(288, 384), sigma=6.0, simcc_split_ratio=2.0) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='SimCCHead', + type="SimCCHead", in_channels=2048, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], - loss=dict(type='KLDiscretLoss', use_target_weight=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], + loss=dict(type="KLDiscretLoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] test_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -82,39 +73,38 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), 
pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_res50_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_res50_8xb64-210e_coco-256x192.py index 33232a4463ef44872e97b4fba455e0f3e990f109..e173d5fe9ec6bbce8c6cc81017ea10b517bfa7d4 100644 --- a/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_res50_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_res50_8xb64-210e_coco-256x192.py @@ -1,74 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict(type='MultiStepLR', milestones=[170, 200], gamma=0.1, by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', input_size=(192, 256), sigma=6.0, simcc_split_ratio=2.0) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=6.0, simcc_split_ratio=2.0) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='SimCCHead', + type="SimCCHead", in_channels=2048, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], - loss=dict(type='KLDiscretLoss', 
use_target_weight=True), - decoder=codec), - test_cfg=dict(flip_test=True)) + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], + loss=dict(type="KLDiscretLoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] test_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -76,39 +73,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_vipnas-mbv3_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_vipnas-mbv3_8xb64-210e_coco-256x192.py index ba8ba040cb639b02f701cf36bb8ad03eeb5ffdec..b220dc8f9adb5b0e63bcbd1ed59de448f29317cc 100644 --- 
a/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_vipnas-mbv3_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/simcc/coco/simcc_vipnas-mbv3_8xb64-210e_coco-256x192.py @@ -1,79 +1,72 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', input_size=(192, 256), sigma=6.0, simcc_split_ratio=2.0) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=6.0, simcc_split_ratio=2.0) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ViPNAS_MobileNetV3'), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ViPNAS_MobileNetV3"), head=dict( - type='SimCCHead', + type="SimCCHead", in_channels=160, out_channels=17, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], - deconv_type='vipnas', + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], + deconv_type="vipnas", deconv_out_channels=(160, 160, 160), deconv_num_groups=(160, 160, 160), - loss=dict(type='KLDiscretLoss', use_target_weight=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + loss=dict(type="KLDiscretLoss", use_target_weight=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + 
dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -81,39 +74,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=data_root + 'person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=data_root + "person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/simcc/mpii/simcc_res50_wo-deconv-8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/simcc/mpii/simcc_res50_wo-deconv-8xb64-210e_mpii-256x256.py index ef8b47959ea1b70003fe9906889ccec3ee452a51..1bbe8b306afa23f0f3ffa5fb25f62f8362d7d875 100644 --- a/mmpose/configs/body_2d_keypoint/simcc/mpii/simcc_res50_wo-deconv-8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/simcc/mpii/simcc_res50_wo-deconv-8xb64-210e_mpii-256x256.py @@ -1,83 +1,74 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', input_size=(256, 256), sigma=6.0, simcc_split_ratio=2.0) +codec = dict(type="SimCCLabel", input_size=(256, 256), sigma=6.0, simcc_split_ratio=2.0) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - 
type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='SimCCHead', + type="SimCCHead", in_channels=2048, out_channels=16, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], deconv_out_channels=None, - loss=dict(type='KLDiscretLoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KLDiscretLoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,36 +76,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file=f'{data_root}/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file=f"{data_root}/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", 
rule="greater")) # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/aic/td-hm_hrnet-w32_8xb64-210e_aic-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/aic/td-hm_hrnet-w32_8xb64-210e_aic-256x192.py index 4d4c504d388fbe627ef7f62393e5135604403110..400a8c245caf210121ca99b4b93e514a73a0340a 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/aic/td-hm_hrnet-w32_8xb64-210e_aic-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/aic/td-hm_hrnet-w32_8xb64-210e_aic-256x192.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + 
type="HeatmapHead", in_channels=32, out_channels=14, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AicDataset' -data_mode = 'topdown' -data_root = 'data/aic/' +dataset_type = "AicDataset" +data_mode = "topdown" +data_root = "data/aic/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,37 +84,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/aic_train.json', - data_prefix=dict(img='ai_challenger_keypoint_train_20170902/' - 'keypoint_train_images_20170902/'), + ann_file="annotations/aic_train.json", + data_prefix=dict(img="ai_challenger_keypoint_train_20170902/" "keypoint_train_images_20170902/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/aic_val.json', - data_prefix=dict(img='ai_challenger_keypoint_validation_20170911/' - 'keypoint_validation_images_20170911/'), + ann_file="annotations/aic_val.json", + data_prefix=dict(img="ai_challenger_keypoint_validation_20170911/" "keypoint_validation_images_20170911/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/aic_val.json', - use_area=False) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/aic_val.json", use_area=False) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/aic/td-hm_res101_8xb64-210e_aic-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/aic/td-hm_res101_8xb64-210e_aic-256x192.py index e61da3a5c4b6cbb89e78576c34ca8040f0fcca05..92c8fc31c4b9da48cc007d2bfa22e6cb61391bbb 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/aic/td-hm_res101_8xb64-210e_aic-256x192.py +++ 
b/mmpose/configs/body_2d_keypoint/topdown_heatmap/aic/td-hm_res101_8xb64-210e_aic-256x192.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=14, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=14, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'AicDataset' -data_mode = 'topdown' -data_root = 'data/aic/' +dataset_type = "AicDataset" +data_mode = "topdown" +data_root = "data/aic/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,37 +73,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, 
persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/aic_train.json', - data_prefix=dict(img='ai_challenger_keypoint_train_20170902/' - 'keypoint_train_images_20170902/'), + ann_file="annotations/aic_train.json", + data_prefix=dict(img="ai_challenger_keypoint_train_20170902/" "keypoint_train_images_20170902/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/aic_val.json', - data_prefix=dict(img='ai_challenger_keypoint_validation_20170911/' - 'keypoint_validation_images_20170911/'), + ann_file="annotations/aic_val.json", + data_prefix=dict(img="ai_challenger_keypoint_validation_20170911/" "keypoint_validation_images_20170911/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/aic_val.json', - use_area=False) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/aic_val.json", use_area=False) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-l_udp_8xb256-210e_aic-coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-l_udp_8xb256-210e_aic-coco-256x192.py index fc1eb0d36c8b185369c8a722522f527fa37e0f8c..4469ba659d2d0bc2592f4da1447ca8dd4a7ad635 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-l_udp_8xb256-210e_aic-coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-l_udp_8xb256-210e_aic-coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,36 +10,31 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 105 to 210 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # keypoint mappings keypoint_mapping_coco = [ @@ -81,47 +76,39 @@ keypoint_mapping_aic = [ # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - 
type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=1024, - out_channels=19, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), - test_cfg=dict( - flip_test=False, - output_keypoint_indices=[ - target for _, target in keypoint_mapping_coco - ])) + type="HeatmapHead", in_channels=1024, out_channels=19, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), + test_cfg=dict(flip_test=False, output_keypoint_indices=[target for _, target in keypoint_mapping_coco]), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -131,101 +118,72 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - 
dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # train datasets dataset_coco = dict( - type='RepeatDataset', + type="RepeatDataset", dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=19, - mapping=keypoint_mapping_coco) - ], + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=19, mapping=keypoint_mapping_coco)], ), - times=3) + times=3, +) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=19, - mapping=keypoint_mapping_aic) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=19, mapping=keypoint_mapping_aic)], ) # data loaders @@ -233,52 +191,43 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_aic.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_aic.py"), datasets=[dataset_coco, dataset_aic], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - 
sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', + ann_file="coco/annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-l_udp_8xb256-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-l_udp_8xb256-210e_coco-256x192.py index 6cce193544c775b5f4c749e3ca9c81ff547a507e..f1f352b7d18d69d15c5f60fc56e94d32e0fd4eee 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-l_udp_8xb256-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-l_udp_8xb256-210e_coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,80 +10,71 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 105 to 210 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', 
- mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=1024, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=1024, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -93,68 +84,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + 
dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -162,53 +134,44 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', + ann_file="annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 
'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-m_udp_8xb256-210e_aic-coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-m_udp_8xb256-210e_aic-coco-256x192.py index 096bf307859ee2946e8d42c66dc10ed23dbfe545..f28fc7d5f2d42588d7efe031dfa73812a5514a58 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-m_udp_8xb256-210e_aic-coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-m_udp_8xb256-210e_aic-coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,36 +10,31 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 105 to 210 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # keypoint mappings keypoint_mapping_coco = [ @@ -81,47 +76,39 @@ keypoint_mapping_aic = [ # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=768, - out_channels=19, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), - test_cfg=dict( - flip_test=False, - output_keypoint_indices=[ - target for _, target in keypoint_mapping_coco - ])) + 
type="HeatmapHead", in_channels=768, out_channels=19, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), + test_cfg=dict(flip_test=False, output_keypoint_indices=[target for _, target in keypoint_mapping_coco]), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -131,101 +118,72 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - 
dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # train datasets dataset_coco = dict( - type='RepeatDataset', + type="RepeatDataset", dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=19, - mapping=keypoint_mapping_coco) - ], + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=19, mapping=keypoint_mapping_coco)], ), - times=3) + times=3, +) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=19, - mapping=keypoint_mapping_aic) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=19, mapping=keypoint_mapping_aic)], ) # data loaders @@ -233,52 +191,43 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_aic.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_aic.py"), datasets=[dataset_coco, dataset_aic], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', + ann_file="coco/annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - 
ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-m_udp_8xb256-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-m_udp_8xb256-210e_coco-256x192.py index f86e9a8d609c2f200c888ad183f3cd890f35c388..6bf66acd29584b13e82152bc06dfb53843094754 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-m_udp_8xb256-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-m_udp_8xb256-210e_coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,80 +10,71 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 105 to 210 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=768, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=768, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + 
flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -93,68 +84,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, 
min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -162,53 +134,44 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', + ann_file="annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-s_udp_8xb256-210e_aic-coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-s_udp_8xb256-210e_aic-coco-256x192.py index 94cc7d02d2789fd5a82ff9d352063f5afe99aaf0..8888079322b53000322b83753bae5504fd0c7a9b 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-s_udp_8xb256-210e_aic-coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-s_udp_8xb256-210e_aic-coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,36 +10,31 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.0), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, 
by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 105 to 210 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # keypoint mappings keypoint_mapping_coco = [ @@ -81,47 +76,39 @@ keypoint_mapping_aic = [ # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.33, widen_factor=0.5, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-s_imagenet_600e-ea671761.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-s_imagenet_600e-ea671761.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=512, - out_channels=19, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), - test_cfg=dict( - flip_test=False, - output_keypoint_indices=[ - target for _, target in keypoint_mapping_coco - ])) + type="HeatmapHead", in_channels=512, out_channels=19, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), + test_cfg=dict(flip_test=False, output_keypoint_indices=[target for _, target in keypoint_mapping_coco]), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -131,101 +118,72 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', 
p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # train datasets dataset_coco = dict( - type='RepeatDataset', + type="RepeatDataset", dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=19, - mapping=keypoint_mapping_coco) - ], + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=19, mapping=keypoint_mapping_coco)], ), - times=3) + times=3, +) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=19, - mapping=keypoint_mapping_aic) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + 
pipeline=[dict(type="KeypointConverter", num_keypoints=19, mapping=keypoint_mapping_aic)], ) # data loaders @@ -233,52 +191,43 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_aic.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_aic.py"), datasets=[dataset_coco, dataset_aic], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', + ann_file="coco/annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-s_udp_8xb256-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-s_udp_8xb256-210e_coco-256x192.py index 6f50542e5bceb86c652bf4d8ab893386197217ef..26bbaf14b3353c4829d7c5b6f060bf41871f1079 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-s_udp_8xb256-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-s_udp_8xb256-210e_coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,80 +10,71 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, 
begin=0, end=1000), dict( # use cosine lr from 105 to 210 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.33, widen_factor=0.5, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-s_imagenet_600e-ea671761.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-s_imagenet_600e-ea671761.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=512, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=512, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -93,68 +84,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - 
dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -162,53 +134,44 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', + ann_file="annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = 
dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-tiny_udp_8xb256-210e_aic-coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-tiny_udp_8xb256-210e_aic-coco-256x192.py index cef1b204501573d4e0d3228c36595eb784fdc83b..28be33e4f1b174908068d75b625055548aa03e85 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-tiny_udp_8xb256-210e_aic-coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-tiny_udp_8xb256-210e_aic-coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,36 +10,31 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.0), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 105 to 210 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # keypoint mappings keypoint_mapping_coco = [ @@ -81,47 +76,39 @@ keypoint_mapping_aic = [ # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.167, widen_factor=0.375, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - 
type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-tiny_imagenet_600e-3a2dd350.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-tiny_imagenet_600e-3a2dd350.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=384, - out_channels=19, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), - test_cfg=dict( - flip_test=False, - output_keypoint_indices=[ - target for _, target in keypoint_mapping_coco - ])) + type="HeatmapHead", in_channels=384, out_channels=19, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), + test_cfg=dict(flip_test=False, output_keypoint_indices=[target for _, target in keypoint_mapping_coco]), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -131,101 +118,72 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + 
dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # train datasets dataset_coco = dict( - type='RepeatDataset', + type="RepeatDataset", dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_train2017.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=19, - mapping=keypoint_mapping_coco) - ], + ann_file="coco/annotations/person_keypoints_train2017.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=19, mapping=keypoint_mapping_coco)], ), - times=3) + times=3, +) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=19, - mapping=keypoint_mapping_aic) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=19, mapping=keypoint_mapping_aic)], ) # data loaders @@ -233,36 +191,37 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_aic.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_aic.py"), datasets=[dataset_coco, dataset_aic], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/person_keypoints_val2017.json', + ann_file="coco/annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='detection/coco/val2017/'), + data_prefix=dict(img="detection/coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) 
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ # dict( @@ -271,14 +230,9 @@ custom_hooks = [ # momentum=0.0002, # update_buffers=True, # priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2) ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'coco/annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "coco/annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-tiny_udp_8xb256-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-tiny_udp_8xb256-210e_coco-256x192.py index 7ec0bb2be7dbd59be7401cca1d4995d7741ee2b6..c5e1a8beb0a76cc727e33ace0b48f52502f2b9ac 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-tiny_udp_8xb256-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/cspnext-tiny_udp_8xb256-210e_coco-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,80 +10,71 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 105 to 210 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.167, widen_factor=0.375, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-tiny_imagenet_600e-3a2dd350.pth')), + 
type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-tiny_imagenet_600e-3a2dd350.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=384, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=384, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -93,68 +84,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - 
dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -162,37 +134,38 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', + ann_file="annotations/person_keypoints_val2017.json", # bbox_file='data/coco/person_detection_results/' # 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ # dict( @@ -201,14 +174,9 @@ custom_hooks = [ # momentum=0.0002, # update_buffers=True, # priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2) ] # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm-vis_res50_8xb64-210e_coco-aic-256x192-merge.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm-vis_res50_8xb64-210e_coco-aic-256x192-merge.py index f5def39ed911b661b0a651a4c0b132b66bab934d..441c07b7c633c7bd8e0c5ee940e22b7e8ba1f379 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm-vis_res50_8xb64-210e_coco-aic-256x192-merge.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm-vis_res50_8xb64-210e_coco-aic-256x192-merge.py @@ -1,92 +1,80 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + 
optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='VisPredictHead', + type="VisPredictHead", loss=dict( - type='BCELoss', + type="BCELoss", use_target_weight=True, use_sigmoid=True, loss_weight=1e-3, ), pose_cfg=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec)), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # train datasets @@ -94,21 +82,20 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + 
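> Editor's note: the ResNet/MSPN-style configs in this patch all share the same two-phase schedule seen here, a 500-iteration linear warm-up followed by step decay at epochs 170 and 200. The sketch below approximates the combined multiplier; it is an illustration of how the two schedulers compose, not MMEngine's actual scheduler code.

```python
def lr_multiplier(global_iter: int, epoch: int,
                  warmup_iters: int = 500, start_factor: float = 0.001,
                  milestones=(170, 200), gamma: float = 0.1) -> float:
    """Approximate combined factor of LinearLR (iteration-based warm-up)
    and MultiStepLR (epoch-based decay) from the config above."""
    warm = 1.0
    if global_iter < warmup_iters:
        warm = start_factor + (1.0 - start_factor) * global_iter / warmup_iters
    decay = gamma ** sum(epoch >= m for m in milestones)
    return warm * decay

base_lr = 5e-4
print(base_lr * lr_multiplier(0, 0))          # 5e-07: start of warm-up
print(base_lr * lr_multiplier(500, 0))        # 5e-04: warm-up finished
print(base_lr * lr_multiplier(10_000, 170))   # 5e-05: after first milestone
```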
data_prefix=dict(img="train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', - data_root='data/aic/', + type="AicDataset", + data_root="data/aic/", data_mode=data_mode, - ann_file='annotations/aic_train.json', - data_prefix=dict(img='ai_challenger_keypoint_train_20170902/' - 'keypoint_train_images_20170902/'), + ann_file="annotations/aic_train.json", + data_prefix=dict(img="ai_challenger_keypoint_train_20170902/" "keypoint_train_images_20170902/"), pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=17, mapping=[ (0, 6), @@ -123,7 +110,8 @@ dataset_aic = dict( (9, 11), (10, 13), (11, 15), - ]) + ], + ) ], ) @@ -132,36 +120,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[dataset_coco, dataset_aic], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', + type="CocoMetric", # score_mode='bbox', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') + ann_file=data_root + "annotations/person_keypoints_val2017.json", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_2xmspn50_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_2xmspn50_8xb32-210e_coco-256x192.py index 7af125c24d81c4bfa81cdafa3cb95f9729511b66..e4449399dd0e6bf9196bec877575b054f9efb3d0 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_2xmspn50_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_2xmspn50_8xb32-210e_coco-256x192.py @@ -1,114 +1,95 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = 
dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings # multiple kernel_sizes of heatmap gaussian for 'Megvii' approach. kernel_sizes = [15, 11, 9, 7, 5] -codec = [ - dict( - type='MegviiHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - kernel_size=kernel_size) for kernel_size in kernel_sizes -] +codec = [dict(type="MegviiHeatmap", input_size=(192, 256), heatmap_size=(48, 64), kernel_size=kernel_size) for kernel_size in kernel_sizes] # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MSPN', + type="MSPN", unit_channels=256, num_stages=2, num_units=4, num_blocks=[3, 4, 6, 3], - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), init_cfg=dict( - type='Pretrained', - checkpoint='torchvision://resnet50', - )), + type="Pretrained", + checkpoint="torchvision://resnet50", + ), + ), head=dict( - type='MSPNHead', + type="MSPNHead", out_shape=(64, 48), unit_channels=256, out_channels=17, num_stages=2, num_units=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), # each sub list is for a stage # and each element in each list is for a unit level_indices=[0, 1, 2, 3] + [1, 2, 3, 4], - loss=([ - dict( - type='KeypointMSELoss', - use_target_weight=True, - loss_weight=0.25) - ] * 3 + [ - dict( - type='KeypointOHKMMSELoss', - use_target_weight=True, - loss_weight=1.) 
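> Editor's note on the loss expression being reformatted in this hunk: the MSPN/RSN heads supervise every (stage, unit) output separately. Within each stage, the three coarse units get plain MSE at weight 0.25 and the final unit gets the harder OHKM-MSE at full weight; the pattern then repeats per stage. Reconstructed from the config, with a consistency check:

```python
# One loss spec per supervised unit, exactly as in the 2-stage config.
num_stages, num_units = 2, 4
loss = (
    [dict(type="KeypointMSELoss", use_target_weight=True, loss_weight=0.25)] * 3
    + [dict(type="KeypointOHKMMSELoss", use_target_weight=True, loss_weight=1.0)]
) * num_stages

# level_indices pairs each unit with a heatmap level; the second stage
# is shifted up by one level: [0, 1, 2, 3] + [1, 2, 3, 4].
level_indices = [0, 1, 2, 3] + [1, 2, 3, 4]
assert len(loss) == len(level_indices) == num_stages * num_units
```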
- ]) * 2, - decoder=codec[-1]), + loss=( + [dict(type="KeypointMSELoss", use_target_weight=True, loss_weight=0.25)] * 3 + + [dict(type="KeypointOHKMMSELoss", use_target_weight=True, loss_weight=1.0)] + ) + * 2, + decoder=codec[-1], + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='GenerateTarget', multilevel=True, encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="GenerateTarget", multilevel=True, encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -116,37 +97,35 @@ train_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json", nms_mode="none") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_2xrsn50_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_2xrsn50_8xb32-210e_coco-256x192.py index 0680f6995eee3dc9a345eab0353f5dc65c023f0f..1cfc3bb0a20b961ef13c3cdc224b8a2b7049a700 100644 --- 
a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_2xrsn50_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_2xrsn50_8xb32-210e_coco-256x192.py @@ -1,113 +1,93 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings # multiple kernel_sizes of heatmap gaussian for 'Megvii' approach. kernel_sizes = [15, 11, 9, 7, 5] -codec = [ - dict( - type='MegviiHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - kernel_size=kernel_size) for kernel_size in kernel_sizes -] +codec = [dict(type="MegviiHeatmap", input_size=(192, 256), heatmap_size=(48, 64), kernel_size=kernel_size) for kernel_size in kernel_sizes] # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='RSN', + type="RSN", unit_channels=256, num_stages=2, num_units=4, num_blocks=[3, 4, 6, 3], num_steps=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), ), head=dict( - type='MSPNHead', + type="MSPNHead", out_shape=(64, 48), unit_channels=256, out_channels=17, num_stages=2, num_units=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), # each sub list is for a stage # and each element in each list is for a unit level_indices=[0, 1, 2, 3] + [1, 2, 3, 4], - loss=([ - dict( - type='KeypointMSELoss', - use_target_weight=True, - loss_weight=0.25) - ] * 3 + [ - dict( - type='KeypointOHKMMSELoss', - use_target_weight=True, - loss_weight=1.) 
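> Editor's note: the list comprehension that replaces the multi-line codec block in these MSPN/RSN files builds one `MegviiHeatmap` codec per supervision level, with larger Gaussian kernels (blurrier targets) for the earlier, coarser units. Reconstructed from the config:

```python
# Five codecs, one per kernel size, coarsest first.
kernel_sizes = [15, 11, 9, 7, 5]
codec = [
    dict(type="MegviiHeatmap", input_size=(192, 256),
         heatmap_size=(48, 64), kernel_size=k)
    for k in kernel_sizes
]

# GenerateTarget(multilevel=True, encoder=codec) produces one target per
# level, while the head decodes with the sharpest codec, codec[-1].
assert [c["kernel_size"] for c in codec] == kernel_sizes
assert codec[-1]["kernel_size"] == 5
```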
- ]) * 2, - decoder=codec[-1]), + loss=( + [dict(type="KeypointMSELoss", use_target_weight=True, loss_weight=0.25)] * 3 + + [dict(type="KeypointOHKMMSELoss", use_target_weight=True, loss_weight=1.0)] + ) + * 2, + decoder=codec[-1], + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='GenerateTarget', multilevel=True, encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="GenerateTarget", multilevel=True, encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,40 +95,38 @@ train_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json", nms_mode="none") test_evaluator = val_evaluator # fp16 settings -fp16 = dict(loss_scale='dynamic') +fp16 = dict(loss_scale="dynamic") diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_3xmspn50_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_3xmspn50_8xb32-210e_coco-256x192.py index 41162f01e5ac5c63977c11ea70b49372ef2b8476..e63e3d9304dc4ac1ac1f23755807e2c62d1ecd95 
100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_3xmspn50_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_3xmspn50_8xb32-210e_coco-256x192.py @@ -1,114 +1,95 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings # multiple kernel_sizes of heatmap gaussian for 'Megvii' approach. kernel_sizes = [15, 11, 9, 7, 5] -codec = [ - dict( - type='MegviiHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - kernel_size=kernel_size) for kernel_size in kernel_sizes -] +codec = [dict(type="MegviiHeatmap", input_size=(192, 256), heatmap_size=(48, 64), kernel_size=kernel_size) for kernel_size in kernel_sizes] # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MSPN', + type="MSPN", unit_channels=256, num_stages=3, num_units=4, num_blocks=[3, 4, 6, 3], - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), init_cfg=dict( - type='Pretrained', - checkpoint='torchvision://resnet50', - )), + type="Pretrained", + checkpoint="torchvision://resnet50", + ), + ), head=dict( - type='MSPNHead', + type="MSPNHead", out_shape=(64, 48), unit_channels=256, out_channels=17, num_stages=3, num_units=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), # each sub list is for a stage # and each element in each list is for a unit level_indices=[0, 1, 2, 3] * 2 + [1, 2, 3, 4], - loss=([ - dict( - type='KeypointMSELoss', - use_target_weight=True, - loss_weight=0.25) - ] * 3 + [ - dict( - type='KeypointOHKMMSELoss', - use_target_weight=True, - loss_weight=1.) 
- ]) * 3, - decoder=codec[-1]), + loss=( + [dict(type="KeypointMSELoss", use_target_weight=True, loss_weight=0.25)] * 3 + + [dict(type="KeypointOHKMMSELoss", use_target_weight=True, loss_weight=1.0)] + ) + * 3, + decoder=codec[-1], + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='GenerateTarget', multilevel=True, encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="GenerateTarget", multilevel=True, encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -116,37 +97,35 @@ train_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json", nms_mode="none") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_3xrsn50_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_3xrsn50_8xb32-210e_coco-256x192.py index 99326451c6d05162bc3df0c8d71e8305baf574fd..f35ec4cc6029ed933628f48eaad23fb6d216f0fd 100644 --- 
a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_3xrsn50_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_3xrsn50_8xb32-210e_coco-256x192.py @@ -1,113 +1,93 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings # multiple kernel_sizes of heatmap gaussian for 'Megvii' approach. kernel_sizes = [15, 11, 9, 7, 5] -codec = [ - dict( - type='MegviiHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - kernel_size=kernel_size) for kernel_size in kernel_sizes -] +codec = [dict(type="MegviiHeatmap", input_size=(192, 256), heatmap_size=(48, 64), kernel_size=kernel_size) for kernel_size in kernel_sizes] # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='RSN', + type="RSN", unit_channels=256, num_stages=3, num_units=4, num_blocks=[3, 4, 6, 3], num_steps=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), ), head=dict( - type='MSPNHead', + type="MSPNHead", out_shape=(64, 48), unit_channels=256, out_channels=17, num_stages=3, num_units=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), # each sub list is for a stage # and each element in each list is for a unit level_indices=[0, 1, 2, 3] * 2 + [1, 2, 3, 4], - loss=([ - dict( - type='KeypointMSELoss', - use_target_weight=True, - loss_weight=0.25) - ] * 3 + [ - dict( - type='KeypointOHKMMSELoss', - use_target_weight=True, - loss_weight=1.) 
- ]) * 3, - decoder=codec[-1]), + loss=( + [dict(type="KeypointMSELoss", use_target_weight=True, loss_weight=0.25)] * 3 + + [dict(type="KeypointOHKMMSELoss", use_target_weight=True, loss_weight=1.0)] + ) + * 3, + decoder=codec[-1], + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='GenerateTarget', multilevel=True, encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="GenerateTarget", multilevel=True, encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,40 +95,38 @@ train_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json", nms_mode="none") test_evaluator = val_evaluator # fp16 settings -fp16 = dict(loss_scale='dynamic') +fp16 = dict(loss_scale="dynamic") diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_4xmspn50_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_4xmspn50_8xb32-210e_coco-256x192.py index 999245e74dfc87985e34e3122979fe02486c5b4f..db31173b01139978fd5d7da94ef7a8871a2246cf 
100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_4xmspn50_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_4xmspn50_8xb32-210e_coco-256x192.py @@ -1,114 +1,95 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings # multiple kernel_sizes of heatmap gaussian for 'Megvii' approach. kernel_sizes = [15, 11, 9, 7, 5] -codec = [ - dict( - type='MegviiHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - kernel_size=kernel_size) for kernel_size in kernel_sizes -] +codec = [dict(type="MegviiHeatmap", input_size=(192, 256), heatmap_size=(48, 64), kernel_size=kernel_size) for kernel_size in kernel_sizes] # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MSPN', + type="MSPN", unit_channels=256, num_stages=4, num_units=4, num_blocks=[3, 4, 6, 3], - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), init_cfg=dict( - type='Pretrained', - checkpoint='torchvision://resnet50', - )), + type="Pretrained", + checkpoint="torchvision://resnet50", + ), + ), head=dict( - type='MSPNHead', + type="MSPNHead", out_shape=(64, 48), unit_channels=256, out_channels=17, num_stages=4, num_units=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), # each sub list is for a stage # and each element in each list is for a unit level_indices=[0, 1, 2, 3] * 3 + [1, 2, 3, 4], - loss=([ - dict( - type='KeypointMSELoss', - use_target_weight=True, - loss_weight=0.25) - ] * 3 + [ - dict( - type='KeypointOHKMMSELoss', - use_target_weight=True, - loss_weight=1.) 
- ]) * 4, - decoder=codec[-1]), + loss=( + [dict(type="KeypointMSELoss", use_target_weight=True, loss_weight=0.25)] * 3 + + [dict(type="KeypointOHKMMSELoss", use_target_weight=True, loss_weight=1.0)] + ) + * 4, + decoder=codec[-1], + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='GenerateTarget', multilevel=True, encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="GenerateTarget", multilevel=True, encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -116,37 +97,35 @@ train_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json", nms_mode="none") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base-simple_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base-simple_8xb64-210e_coco-256x192.py index 5a557805052048f2ccbb6c6dc89fc3e578922a36..fbb36a06b908427cd8eb42526f1479599f201907 100644 --- 
a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base-simple_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base-simple_8xb64-210e_coco-256x192.py @@ -1,116 +1,99 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=12, layer_decay_rate=0.75, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch='base', + type="mmpretrain.VisionTransformer", + arch="base", img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.3, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_base_20230913.pth'), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_base_20230913.pth" + ), ), - neck=dict(type='FeatureMapProcessor', scale_factor=4.0, apply_relu=True), + neck=dict(type="FeatureMapProcessor", scale_factor=4.0, apply_relu=True), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=768, out_channels=17, 
deconv_out_channels=[], deconv_kernel_sizes=[], final_layer=dict(kernel_size=3, padding=1), - loss=dict(type='KeypointMSELoss', use_target_weight=True), + loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec, ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -data_root = 'data/coco/' -dataset_type = 'CocoDataset' -data_mode = 'topdown' +data_root = "data/coco/" +dataset_type = "CocoDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -118,36 +101,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base_8xb64-210e_coco-256x192.py index 06522b7b911370b214cb0917f00b327c500194aa..e12c8909ada27171484108b2d31c4fe701b66f65 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base_8xb64-210e_coco-256x192.py 
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-base_8xb64-210e_coco-256x192.py @@ -1,113 +1,97 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=12, layer_decay_rate=0.75, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch='base', + type="mmpretrain.VisionTransformer", + arch="base", img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.3, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_base_20230913.pth'), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_base_20230913.pth" + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=768, out_channels=17, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + 
flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -data_root = 'data/coco/' -dataset_type = 'CocoDataset' -data_mode = 'topdown' +data_root = "data/coco/" +dataset_type = "CocoDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +99,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge-simple_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge-simple_8xb64-210e_coco-256x192.py index 03ae669807ff5849aec01c37633669be790555e6..5b119afd4e810f5435abaf6a47803a4766f568f7 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge-simple_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge-simple_8xb64-210e_coco-256x192.py @@ -1,116 +1,99 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = 
dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=32, layer_decay_rate=0.85, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch='huge', + type="mmpretrain.VisionTransformer", + arch="huge", img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.55, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_huge_20230913.pth'), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_huge_20230913.pth" + ), ), - neck=dict(type='FeatureMapProcessor', scale_factor=4.0, apply_relu=True), + neck=dict(type="FeatureMapProcessor", scale_factor=4.0, apply_relu=True), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=1280, out_channels=17, deconv_out_channels=[], deconv_kernel_sizes=[], final_layer=dict(kernel_size=3, padding=1), - loss=dict(type='KeypointMSELoss', use_target_weight=True), + loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec, ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings 
-data_root = 'data/coco/' -dataset_type = 'CocoDataset' -data_mode = 'topdown' +data_root = "data/coco/" +dataset_type = "CocoDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -118,36 +101,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge_8xb64-210e_coco-256x192.py index 6b8afcf0f4ba2cd9a60c760db428168b556f882d..73bc57575e51650b87485b2ccdafdd4cfa570779 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-huge_8xb64-210e_coco-256x192.py @@ -1,113 +1,97 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - 
imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=32, layer_decay_rate=0.85, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch='huge', + type="mmpretrain.VisionTransformer", + arch="huge", img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.55, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_huge_20230913.pth'), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_huge_20230913.pth" + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=1280, out_channels=17, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -data_root = 'data/coco/' -dataset_type = 'CocoDataset' -data_mode = 'topdown' +data_root = "data/coco/" +dataset_type = "CocoDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - 
dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +99,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large-simple_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large-simple_8xb64-210e_coco-256x192.py index 2035e786dfe538f85b4cdcb19ed44dc11b4ba8f9..2331c50164bd9b2742698fe87ff0f6a4c059831c 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large-simple_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large-simple_8xb64-210e_coco-256x192.py @@ -1,116 +1,100 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper 
= dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=24, layer_decay_rate=0.8, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch='large', + type="mmpretrain.VisionTransformer", + arch="large", img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.5, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_large_20230913.pth'), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_large_20230913.pth", + ), ), - neck=dict(type='FeatureMapProcessor', scale_factor=4.0, apply_relu=True), + neck=dict(type="FeatureMapProcessor", scale_factor=4.0, apply_relu=True), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=1024, out_channels=17, deconv_out_channels=[], deconv_kernel_sizes=[], final_layer=dict(kernel_size=3, padding=1), - loss=dict(type='KeypointMSELoss', use_target_weight=True), + loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec, ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -data_root = 'data/coco/' -dataset_type = 'CocoDataset' -data_mode = 'topdown' +data_root = "data/coco/" +dataset_type = "CocoDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - 
dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -118,36 +102,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large_8xb64-210e_coco-256x192.py index f1d0e90578cb654283de23e77f2353e94d0b0e42..608de6ef1f79f8610537e7115855296e4adf69cb 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-large_8xb64-210e_coco-256x192.py @@ -1,113 +1,98 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + 
optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=24, layer_decay_rate=0.8, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch='large', + type="mmpretrain.VisionTransformer", + arch="large", img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.5, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_large_20230913.pth'), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_large_20230913.pth", + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=1024, out_channels=17, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -data_root = 'data/coco/' -dataset_type = 'CocoDataset' -data_mode = 'topdown' +data_root = "data/coco/" +dataset_type = "CocoDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + 
dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +100,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small-simple_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small-simple_8xb64-210e_coco-256x192.py index d8216089b79d8d97f59e592a04ab4fac5c448587..ef82711bac76930151fe23e4ce79a49ca54a63cb 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small-simple_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small-simple_8xb64-210e_coco-256x192.py @@ -1,121 +1,100 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=12, layer_decay_rate=0.8, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 
'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch={ - 'embed_dims': 384, - 'num_layers': 12, - 'num_heads': 12, - 'feedforward_channels': 384 * 4 - }, + type="mmpretrain.VisionTransformer", + arch={"embed_dims": 384, "num_layers": 12, "num_heads": 12, "feedforward_channels": 384 * 4}, img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.1, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_small_20230913.pth'), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_small_20230913.pth", + ), ), - neck=dict(type='FeatureMapProcessor', scale_factor=4.0, apply_relu=True), + neck=dict(type="FeatureMapProcessor", scale_factor=4.0, apply_relu=True), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=384, out_channels=17, deconv_out_channels=[], deconv_kernel_sizes=[], final_layer=dict(kernel_size=3, padding=1), - loss=dict(type='KeypointMSELoss', use_target_weight=True), + loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec, ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -data_root = 'data/coco/' -dataset_type = 'CocoDataset' -data_mode = 'topdown' +data_root = "data/coco/" +dataset_type = "CocoDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - 
dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -123,36 +102,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py index 5b77da96eba1e1fa83c93a5010609ec5cced0a5b..93cd691ba26dc69041cec3e3cafc477477650d33 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py @@ -1,118 +1,98 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=12, layer_decay_rate=0.8, custom_keys={ - 'bias': 
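
A quick sanity check on the "simple" decoder wired above (plain ViT backbone, `FeatureMapProcessor` neck, no deconv layers): the heatmap size falls out of the patch grid times the neck's `scale_factor`. A minimal sketch in plain Python, ignoring the small `patch_cfg` padding, which does not change the result here:

```python
# Patch grid of the ViT backbone vs. the UDPHeatmap codec above.
input_w, input_h = 192, 256          # codec input_size is (w, h)
patch_size, scale_factor = 16, 4.0   # backbone patch_size, neck scale_factor

feat_w, feat_h = input_w // patch_size, input_h // patch_size   # 12 x 16 tokens
heatmap = (int(feat_w * scale_factor), int(feat_h * scale_factor))
assert heatmap == (48, 64)           # matches heatmap_size=(48, 64) in the codec
```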
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py
index 5b77da96eba1e1fa83c93a5010609ec5cced0a5b..93cd691ba26dc69041cec3e3cafc477477650d33 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py
@@ -1,118 +1,98 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-custom_imports = dict(
-    imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'],
-    allow_failed_imports=False)
+custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False)

 optim_wrapper = dict(
-    optimizer=dict(
-        type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1),
+    optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1),
     paramwise_cfg=dict(
         num_layers=12,
         layer_decay_rate=0.8,
         custom_keys={
-            'bias': dict(decay_multi=0.0),
-            'pos_embed': dict(decay_mult=0.0),
-            'relative_position_bias_table': dict(decay_mult=0.0),
-            'norm': dict(decay_mult=0.0),
+            "bias": dict(decay_mult=0.0),  # decay_mult, not decay_multi; see note in the simple variant above
+            "pos_embed": dict(decay_mult=0.0),
+            "relative_position_bias_table": dict(decay_mult=0.0),
+            "norm": dict(decay_mult=0.0),
         },
     ),
-    constructor='LayerDecayOptimWrapperConstructor',
-    clip_grad=dict(max_norm=1., norm_type=2),
+    constructor="LayerDecayOptimWrapperConstructor",
+    clip_grad=dict(max_norm=1.0, norm_type=2),
 )

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(
-    checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1))

 # codec settings
-codec = dict(
-    type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='mmpretrain.VisionTransformer',
-        arch={
-            'embed_dims': 384,
-            'num_layers': 12,
-            'num_heads': 12,
-            'feedforward_channels': 384 * 4
-        },
+        type="mmpretrain.VisionTransformer",
+        arch={"embed_dims": 384, "num_layers": 12, "num_heads": 12, "feedforward_channels": 384 * 4},
         img_size=(256, 192),
         patch_size=16,
         qkv_bias=True,
         drop_path_rate=0.1,
         with_cls_token=False,
-        out_type='featmap',
+        out_type="featmap",
         patch_cfg=dict(padding=2),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://download.openmmlab.com/mmpose/'
-            'v1/pretrained_models/mae_pretrain_vit_small_20230913.pth'),
+            type="Pretrained",
+            checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_small_20230913.pth",
+        ),
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=384,
         out_channels=17,
         deconv_out_channels=(256, 256),
         deconv_kernel_sizes=(4, 4),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=False,
-    ))
+    ),
+)

 # base dataset settings
-data_root = 'data/coco/'
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
+data_root = "data/coco/"
+dataset_type = "CocoDataset"
+data_mode = "topdown"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -120,36 +100,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=4,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=4,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_alexnet_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_alexnet_8xb64-210e_coco-256x192.py
index 4051f4c5ec52fe170d5a6a050e867fe5ebb255a3..9a3097c012ba24184058190323b12e05bd8d91b1 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_alexnet_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_alexnet_8xb64-210e_coco-256x192.py
@@ -1,80 +1,67 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(40, 56), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(40, 56), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
-    backbone=dict(type='AlexNet', num_classes=-1),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
+    backbone=dict(type="AlexNet", num_classes=-1),
     head=dict(
-        type='HeatmapHead',
-        in_channels=256,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=256, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -82,36 +69,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_cpm_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_cpm_8xb32-210e_coco-384x288.py
index 38b23cf7182c45a507b87c4a372fd2e174e32eb1..0edff857e5ac3b024af789d5d643bfb4c0174663 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_cpm_8xb32-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_cpm_8xb32-210e_coco-384x288.py
@@ -1,88 +1,74 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(36, 48), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(36, 48), sigma=3)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
-    backbone=dict(
-        type='CPM',
-        in_channels=3,
-        out_channels=17,
-        feat_channels=128,
-        num_stages=6),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
+    backbone=dict(type="CPM", in_channels=3, out_channels=17, feat_channels=128, num_stages=6),
     head=dict(
-        type='CPMHead',
+        type="CPMHead",
         in_channels=17,
         out_channels=17,
         num_stages=6,
         deconv_out_channels=None,
         final_layer=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -90,36 +76,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_cpm_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_cpm_8xb64-210e_coco-256x192.py
index 17f7eb9677fbf0d285628e059835a45f443caeef..a035c7bf47ec8dfc3c6463c0128ffeb39f72480e 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_cpm_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_cpm_8xb64-210e_coco-256x192.py
@@ -1,88 +1,74 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(24, 32), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(24, 32), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
-    backbone=dict(
-        type='CPM',
-        in_channels=3,
-        out_channels=17,
-        feat_channels=128,
-        num_stages=6),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
+    backbone=dict(type="CPM", in_channels=3, out_channels=17, feat_channels=128, num_stages=6),
     head=dict(
-        type='CPMHead',
+        type="CPMHead",
         in_channels=17,
         out_channels=17,
         num_stages=6,
         deconv_out_channels=None,
         final_layer=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -90,36 +76,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
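
The two CPM diffs above keep `CPMHead(num_stages=6, ...)`: each of the six stages emits a 17-channel heatmap and every stage is supervised against the same target. A schematic stand-in for that loss (simplified illustration, not the mmpose code):

```python
import torch
import torch.nn.functional as F

batch, stages = 2, 6
outputs = [torch.randn(batch, 17, 32, 24) for _ in range(stages)]  # per-stage heatmaps, 24x32 codec
target = torch.rand(batch, 17, 32, 24)

# intermediate supervision: average the per-stage MSE losses
loss = sum(F.mse_loss(out, target) for out in outputs) / stages
```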
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hourglass52_8xb32-210e_coco-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hourglass52_8xb32-210e_coco-256x256.py
index b9d49c8e6a7df8160db26ff6a0cbabe20b6f4a4a..d0863c0fe70e0093063f1566f6ed38b9f4bac945 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hourglass52_8xb32-210e_coco-256x256.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hourglass52_8xb32-210e_coco-256x256.py
@@ -1,85 +1,76 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HourglassNet',
+        type="HourglassNet",
         num_stacks=1,
     ),
     head=dict(
-        type='CPMHead',
+        type="CPMHead",
         in_channels=256,
         out_channels=17,
         num_stages=1,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -87,36 +78,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hourglass52_8xb32-210e_coco-384x384.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hourglass52_8xb32-210e_coco-384x384.py
index d9932ff9e3773a591650ee94a95da2784bf562eb..e1049f54dfd4a0ee0819e615c2abde0fbdfed2de 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hourglass52_8xb32-210e_coco-384x384.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hourglass52_8xb32-210e_coco-384x384.py
@@ -1,85 +1,76 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(384, 384), heatmap_size=(96, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(384, 384), heatmap_size=(96, 96), sigma=3)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HourglassNet',
+        type="HourglassNet",
         num_stacks=1,
     ),
     head=dict(
-        type='CPMHead',
+        type="CPMHead",
         in_channels=256,
         out_channels=17,
         num_stages=1,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -87,36 +78,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
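
All of these configs carry `auto_scale_lr`; when enabled (for example via the `--auto-scale-lr` flag of the train script), mmengine scales the base LR linearly with the actual total batch size. A sketch of the arithmetic for the hourglass configs above, assuming the 8-GPU setup implied by the `8xb32` config names:

```python
base_lr, base_batch_size = 5e-4, 512
world_size, per_gpu_batch = 8, 32          # "8xb32" in the config file names
scaled_lr = base_lr * (world_size * per_gpu_batch) / base_batch_size
print(scaled_lr)                           # 2.5e-04
```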
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-base_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-base_8xb32-210e_coco-256x192.py
index 8b81dbdaac0c4df6eed9f287379b25e81ab6ce7d..c4f19e83c10eee2e651373bc03125d34545739e2 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-base_8xb32-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-base_8xb32-210e_coco-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
@@ -6,129 +6,116 @@ train_cfg = dict(max_epochs=210, val_interval=10)
 # optimizer
 optim_wrapper = dict(
     optimizer=dict(
-        type='AdamW',
+        type="AdamW",
         lr=5e-4,
         betas=(0.9, 0.999),
         weight_decay=0.01,
     ),
-    paramwise_cfg=dict(
-        custom_keys={'relative_position_bias_table': dict(decay_mult=0.)}))
+    paramwise_cfg=dict(custom_keys={"relative_position_bias_table": dict(decay_mult=0.0)}),
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)

 # model settings
-norm_cfg = dict(type='SyncBN', requires_grad=True)
+norm_cfg = dict(type="SyncBN", requires_grad=True)
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRFormer',
+        type="HRFormer",
         in_channels=3,
         norm_cfg=norm_cfg,
         extra=dict(
             drop_path_rate=0.2,
             with_rpe=True,
             stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(2, ),
-                num_channels=(64, ),
-                num_heads=[2],
-                mlp_ratios=[4]),
+                num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(2,), num_channels=(64,), num_heads=[2], mlp_ratios=[4]
+            ),
             stage2=dict(
                 num_modules=1,
                 num_branches=2,
-                block='HRFORMERBLOCK',
+                block="HRFORMERBLOCK",
                 num_blocks=(2, 2),
                 num_channels=(78, 156),
                 num_heads=[2, 4],
                 mlp_ratios=[4, 4],
-                window_sizes=[7, 7]),
+                window_sizes=[7, 7],
+            ),
             stage3=dict(
                 num_modules=4,
                 num_branches=3,
-                block='HRFORMERBLOCK',
+                block="HRFORMERBLOCK",
                 num_blocks=(2, 2, 2),
                 num_channels=(78, 156, 312),
                 num_heads=[2, 4, 8],
                 mlp_ratios=[4, 4, 4],
-                window_sizes=[7, 7, 7]),
+                window_sizes=[7, 7, 7],
+            ),
             stage4=dict(
                 num_modules=2,
                 num_branches=4,
-                block='HRFORMERBLOCK',
+                block="HRFORMERBLOCK",
                 num_blocks=(2, 2, 2, 2),
                 num_channels=(78, 156, 312, 624),
                 num_heads=[2, 4, 8, 16],
                 mlp_ratios=[4, 4, 4, 4],
-                window_sizes=[7, 7, 7, 7])),
+                window_sizes=[7, 7, 7, 7],
+            ),
+        ),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://download.openmmlab.com/mmpose/'
-            'pretrain_models/hrformer_base-32815020_20220226.pth'),
+            type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrformer_base-32815020_20220226.pth"
+        ),
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=78,
         out_channels=17,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -136,39 +123,38 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator

 # fp16 settings
-fp16 = dict(loss_scale='dynamic')
+fp16 = dict(loss_scale="dynamic")
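
The trailing `fp16 = dict(loss_scale='dynamic')` in the HRFormer configs is the legacy pre-MMEngine switch. Under MMEngine, the equivalent is an AMP optimizer wrapper; a sketch assuming the AdamW settings from this config (passing `--amp` to the train script produces the same effect):

```python
optim_wrapper = dict(
    type="AmpOptimWrapper",    # mixed-precision training with dynamic loss scaling
    loss_scale="dynamic",
    optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.01),
)
```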
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-base_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-base_8xb32-210e_coco-384x288.py
index 351685464c9560dd748da728372dbcf46a8dfc70..5a8325bb58eb42fd9ce95da0f3e1da4a3e3be1bf 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-base_8xb32-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-base_8xb32-210e_coco-384x288.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
@@ -6,129 +6,116 @@ train_cfg = dict(max_epochs=210, val_interval=10)
 # optimizer
 optim_wrapper = dict(
     optimizer=dict(
-        type='AdamW',
+        type="AdamW",
         lr=5e-4,
         betas=(0.9, 0.999),
         weight_decay=0.01,
     ),
-    paramwise_cfg=dict(
-        custom_keys={'relative_position_bias_table': dict(decay_mult=0.)}))
+    paramwise_cfg=dict(custom_keys={"relative_position_bias_table": dict(decay_mult=0.0)}),
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)

 # model settings
-norm_cfg = dict(type='SyncBN', requires_grad=True)
+norm_cfg = dict(type="SyncBN", requires_grad=True)
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRFormer',
+        type="HRFormer",
         in_channels=3,
         norm_cfg=norm_cfg,
         extra=dict(
             drop_path_rate=0.2,
             with_rpe=True,
             stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(2, ),
-                num_channels=(64, ),
-                num_heads=[2],
-                mlp_ratios=[4]),
+                num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(2,), num_channels=(64,), num_heads=[2], mlp_ratios=[4]
+            ),
             stage2=dict(
                 num_modules=1,
                 num_branches=2,
-                block='HRFORMERBLOCK',
+                block="HRFORMERBLOCK",
                 num_blocks=(2, 2),
                 num_channels=(78, 156),
                 num_heads=[2, 4],
                 mlp_ratios=[4, 4],
-                window_sizes=[7, 7]),
+                window_sizes=[7, 7],
+            ),
             stage3=dict(
                 num_modules=4,
                 num_branches=3,
-                block='HRFORMERBLOCK',
+                block="HRFORMERBLOCK",
                 num_blocks=(2, 2, 2),
                 num_channels=(78, 156, 312),
                 num_heads=[2, 4, 8],
                 mlp_ratios=[4, 4, 4],
-                window_sizes=[7, 7, 7]),
+                window_sizes=[7, 7, 7],
+            ),
             stage4=dict(
                 num_modules=2,
                 num_branches=4,
-                block='HRFORMERBLOCK',
+                block="HRFORMERBLOCK",
                 num_blocks=(2, 2, 2, 2),
                 num_channels=(78, 156, 312, 624),
                 num_heads=[2, 4, 8, 16],
                 mlp_ratios=[4, 4, 4, 4],
-                window_sizes=[7, 7, 7, 7])),
+                window_sizes=[7, 7, 7, 7],
+            ),
+        ),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://download.openmmlab.com/mmpose/'
-            'pretrain_models/hrformer_base-32815020_20220226.pth'),
+            type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrformer_base-32815020_20220226.pth"
+        ),
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=78,
         out_channels=17,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -136,39 +123,38 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator

 # fp16 settings
-fp16 = dict(loss_scale='dynamic')
+fp16 = dict(loss_scale="dynamic")
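
Why `relative_position_bias_table` is excluded from weight decay in all four HRFormer diffs: with `with_rpe=True` and `window_sizes=[7, ...]`, each attention head learns a Swin-style table of relative-position biases, a positional parameter rather than a weight. Its per-head size, under my reading of that RPE scheme (worth verifying against the HRFormer code):

```python
window = 7
relative_offsets = (2 * window - 1) ** 2   # 169 possible (dy, dx) offsets within a 7x7 window
```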
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-small_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-small_8xb32-210e_coco-256x192.py
index 6c59395c8ad5365285c3a26d9fbeb3855b050433..e740876f0315e0f81b9f35c82cf232db91002dbb 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-small_8xb32-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrformer-small_8xb32-210e_coco-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
@@ -6,129 +6,116 @@ train_cfg = dict(max_epochs=210, val_interval=10)
 # optimizer
 optim_wrapper = dict(
     optimizer=dict(
-        type='AdamW',
+        type="AdamW",
         lr=5e-4,
         betas=(0.9, 0.999),
         weight_decay=0.01,
     ),
-    paramwise_cfg=dict(
-        custom_keys={'relative_position_bias_table': dict(decay_mult=0.)}))
+    paramwise_cfg=dict(custom_keys={"relative_position_bias_table": dict(decay_mult=0.0)}),
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)

 # model settings
-norm_cfg = dict(type='SyncBN', requires_grad=True)
+norm_cfg = dict(type="SyncBN", requires_grad=True)
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRFormer',
+        type="HRFormer",
         in_channels=3,
         norm_cfg=norm_cfg,
         extra=dict(
             drop_path_rate=0.1,
             with_rpe=True,
             stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(2, ),
-                num_channels=(64, ),
-                num_heads=[2],
-                num_mlp_ratios=[4]),
+                num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(2,), num_channels=(64,), num_heads=[2], num_mlp_ratios=[4]
+            ),
             stage2=dict(
                 num_modules=1,
                 num_branches=2,
-                block='HRFORMERBLOCK',
+                block="HRFORMERBLOCK",
                 num_blocks=(2, 2),
                 num_channels=(32, 64),
                 num_heads=[1, 2],
                 mlp_ratios=[4, 4],
-                window_sizes=[7, 7]),
+                window_sizes=[7, 7],
+            ),
             stage3=dict(
                 num_modules=4,
                 num_branches=3,
-                block='HRFORMERBLOCK',
+                block="HRFORMERBLOCK",
                 num_blocks=(2, 2, 2),
                 num_channels=(32, 64, 128),
                 num_heads=[1, 2, 4],
                 mlp_ratios=[4, 4, 4],
-                window_sizes=[7, 7, 7]),
+                window_sizes=[7, 7, 7],
+            ),
             stage4=dict(
                 num_modules=2,
                 num_branches=4,
-                block='HRFORMERBLOCK',
+                block="HRFORMERBLOCK",
                 num_blocks=(2, 2, 2, 2),
                 num_channels=(32, 64, 128, 256),
                 num_heads=[1, 2, 4, 8],
                 mlp_ratios=[4, 4, 4, 4],
-                window_sizes=[7, 7, 7, 7])),
+                window_sizes=[7, 7, 7, 7],
+            ),
+        ),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://download.openmmlab.com/mmpose/'
-            'pretrain_models/hrformer_small-09516375_20220226.pth'),
+            type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrformer_small-09516375_20220226.pth"
+        ),
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=32,
         out_channels=17,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -136,39 +123,38 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
    dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator

 # fp16 settings
-fp16 = dict(loss_scale='dynamic')
+fp16 = dict(loss_scale="dynamic")
num_mlp_ratios=[4] + ), stage2=dict( num_modules=1, num_branches=2, - block='HRFORMERBLOCK', + block="HRFORMERBLOCK", num_blocks=(2, 2), num_channels=(32, 64), num_heads=[1, 2], mlp_ratios=[4, 4], - window_sizes=[7, 7]), + window_sizes=[7, 7], + ), stage3=dict( num_modules=4, num_branches=3, - block='HRFORMERBLOCK', + block="HRFORMERBLOCK", num_blocks=(2, 2, 2), num_channels=(32, 64, 128), num_heads=[1, 2, 4], mlp_ratios=[4, 4, 4], - window_sizes=[7, 7, 7]), + window_sizes=[7, 7, 7], + ), stage4=dict( num_modules=2, num_branches=4, - block='HRFORMERBLOCK', + block="HRFORMERBLOCK", num_blocks=(2, 2, 2, 2), num_channels=(32, 64, 128, 256), num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], - window_sizes=[7, 7, 7, 7])), + window_sizes=[7, 7, 7, 7], + ), + ), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrformer_small-09516375_20220226.pth'), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrformer_small-09516375_20220226.pth" + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -136,39 +123,38 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - 
bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator # fp16 settings -fp16 = dict(loss_scale='dynamic') +fp16 = dict(loss_scale="dynamic") diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-256x192.py index ea486d830a5d397f0e65958c832933a3de6fee6d..e691857541688797eb07d9d02159c32a3c0df597 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-256x192.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + 
stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +84,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 
'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-384x288.py index ae15d35ee11973169434b0b6d6b03ec46c9530a4..30bcb322b3d09b877075542d79afb5b1819deddb 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-384x288.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), 
head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +84,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-aic-256x192-combine.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-aic-256x192-combine.py index f5d2ed0bfd422568e71aca13c7be56217dd5d381..aea9e14b4722d880e428ae1a98d3949f9ad4372c 100644 --- 
a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-aic-256x192-combine.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-aic-256x192-combine.py @@ -1,38 +1,30 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=3)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # keypoint mappings keypoint_mapping_coco = [ @@ -74,82 +66,54 @@ keypoint_mapping_aic = [ # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - metainfo=dict(from_file='configs/_base_/datasets/coco_aic.py'), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + metainfo=dict(from_file="configs/_base_/datasets/coco_aic.py"), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=19, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + 
loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( - flip_test=True, - flip_mode='heatmap', - shift_heatmap=True, - output_keypoint_indices=[ - target for _, target in keypoint_mapping_coco - ])) + flip_test=True, flip_mode="heatmap", shift_heatmap=True, output_keypoint_indices=[target for _, target in keypoint_mapping_coco] + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # train datasets @@ -157,29 +121,18 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=19, - mapping=keypoint_mapping_coco) - ], + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=19, mapping=keypoint_mapping_coco)], ) dataset_aic = dict( - type='AicDataset', - data_root='data/aic/', + type="AicDataset", + data_root="data/aic/", data_mode=data_mode, - ann_file='annotations/aic_train.json', - data_prefix=dict(img='ai_challenger_keypoint_train_20170902/' - 'keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=19, - mapping=keypoint_mapping_aic) - ], + ann_file="annotations/aic_train.json", + data_prefix=dict(img="ai_challenger_keypoint_train_20170902/" "keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=19, mapping=keypoint_mapping_aic)], ) # data loaders @@ -187,35 +140,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_aic.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_aic.py"), datasets=[dataset_coco, dataset_aic], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - 
ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-aic-256x192-merge.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-aic-256x192-merge.py index 847a40da2f08516a24e8bb765aac454a5cf0dc5f..0cefa0c0de5b26f404c87feb04fe7381979e1284 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-aic-256x192-merge.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-aic-256x192-merge.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 
'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # train datasets @@ -115,21 +84,20 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', - data_root='data/aic/', + type="AicDataset", + data_root="data/aic/", data_mode=data_mode, - ann_file='annotations/aic_train.json', - data_prefix=dict(img='ai_challenger_keypoint_train_20170902/' - 'keypoint_train_images_20170902/'), + ann_file="annotations/aic_train.json", + data_prefix=dict(img="ai_challenger_keypoint_train_20170902/" "keypoint_train_images_20170902/"), pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=17, mapping=[ (0, 6), @@ -144,7 +112,8 @@ dataset_aic = dict( (9, 11), (10, 13), (11, 15), - ]) + ], + ) ], ) @@ -153,35 +122,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco.py"), datasets=[dataset_coco, 
dataset_aic], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_coarsedropout-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_coarsedropout-8xb64-210e_coco-256x192.py index a3ac0bd58901ec998641eec822561abb97779fc0..cad6ef22acfe4b9ea266f61207f055e60cc02261 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_coarsedropout-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_coarsedropout-8xb64-210e_coco-256x192.py @@ -1,128 +1,94 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - 
num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/' - 'body_2d_keypoint/topdown_heatmap/coco/' - 'td-hm_hrnet-w32_8xb64-210e_coco-256x192-81c58e40_20220909.pth'), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/" + "body_2d_keypoint/topdown_heatmap/coco/" + "td-hm_hrnet-w32_8xb64-210e_coco-256x192-81c58e40_20220909.pth", + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict( - type='CoarseDropout', - max_holes=8, - max_height=40, - max_width=40, - min_holes=1, - min_height=10, - min_width=10, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="CoarseDropout", max_holes=8, max_height=40, max_width=40, min_holes=1, min_height=10, min_width=10, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -130,36 +96,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), 
pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_dark-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_dark-8xb64-210e_coco-256x192.py index 7273a0503bd7e67505820de75a4be106922f43f0..007773350f0d9c460edc4733ac6a8547f52bc790 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_dark-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_dark-8xb64-210e_coco-256x192.py @@ -1,117 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - 
num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -119,36 +84,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + 
ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_dark-8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_dark-8xb64-210e_coco-384x288.py index 67b13b8babfe0ac672902f42212b66c5254433a2..9a2f2a2bf44ab14a416f53efd7a57a6a29f884ed 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_dark-8xb64-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_dark-8xb64-210e_coco-384x288.py @@ -1,117 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(288, 384), - heatmap_size=(72, 96), - sigma=3, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", 
num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -119,36 +84,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git 
a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_fp16-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_fp16-8xb64-210e_coco-256x192.py index 306d0aeb44b8014c3fa31743ff92b55b3b417927..5e495fdb90efe8baee8e136b8bc9718463c1f177 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_fp16-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_fp16-8xb64-210e_coco-256x192.py @@ -1,7 +1,7 @@ -_base_ = ['./td-hm_hrnet-w32_8xb64-210e_coco-256x192.py'] +_base_ = ["./td-hm_hrnet-w32_8xb64-210e_coco-256x192.py"] # fp16 settings optim_wrapper = dict( - type='AmpOptimWrapper', - loss_scale='dynamic', + type="AmpOptimWrapper", + loss_scale="dynamic", ) diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_gridmask-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_gridmask-8xb64-210e_coco-256x192.py index d380ad243db94d0ef80a55cee830fe28954c3b0e..f85e07e7852d97dd8416c558dd1e9e0cea120c65 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_gridmask-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_gridmask-8xb64-210e_coco-256x192.py @@ -1,125 +1,94 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", 
num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/' - 'body_2d_keypoint/topdown_heatmap/coco/' - 'td-hm_hrnet-w32_8xb64-210e_coco-256x192-81c58e40_20220909.pth'), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/" + "body_2d_keypoint/topdown_heatmap/coco/" + "td-hm_hrnet-w32_8xb64-210e_coco-256x192-81c58e40_20220909.pth", + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict( - type='GridDropout', - unit_size_min=10, - unit_size_max=40, - random_offset=True, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="GridDropout", unit_size_min=10, unit_size_max=40, random_offset=True, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -127,36 +96,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - 
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_photometric-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_photometric-8xb64-210e_coco-256x192.py
index f0bc7486ca27f2e58a41077527de9add9d9600b3..c120337236d231ba7605d24f87113102e7d7de5e 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_photometric-8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_photometric-8xb64-210e_coco-256x192.py
@@ -1,116 +1,89 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(32, 64)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(32, 64, 128)),
-            stage4=dict(
-                num_modules=3,
-                num_branches=4,
-                block='BASIC',
-                num_blocks=(4, 4, 4, 4),
-                num_channels=(32, 64, 128, 256))),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)),
+            stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)),
+        ),
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/v1/'
-            'body_2d_keypoint/topdown_heatmap/coco/'
-            'td-hm_hrnet-w32_8xb64-210e_coco-256x192-81c58e40_20220909.pth'),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/v1/"
+            "body_2d_keypoint/topdown_heatmap/coco/"
+            "td-hm_hrnet-w32_8xb64-210e_coco-256x192-81c58e40_20220909.pth",
+        ),
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=32,
         out_channels=17,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PhotometricDistortion'),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PhotometricDistortion"),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -118,36 +91,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +84,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_udp-8xb64-210e_coco-384x288.py 
b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_udp-8xb64-210e_coco-384x288.py index 113a91e18ce1fd3a934199f872ee6989c1e7cf95..b538a69e48facbe5db14a890d0c7d797f9751e33 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_udp-8xb64-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_udp-8xb64-210e_coco-384x288.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="UDPHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), 
test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +84,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_udp-regress-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_udp-regress-8xb64-210e_coco-256x192.py index d147de838a2fce6b0293ede36ecac81b51942036..72525893eb5a98866c7cb6d5f4af08357a2ea4bf 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_udp-regress-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_udp-regress-8xb64-210e_coco-256x192.py @@ -1,118 +1,83 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = 
["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='UDPHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - sigma=2, - heatmap_type='combined') +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2, heatmap_type="combined") # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=3 * 17, deconv_out_channels=None, - loss=dict(type='CombinedTargetMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="CombinedTargetMSELoss", use_target_weight=True), + decoder=codec, + ), train_cfg=dict(compute_acc=False), test_cfg=dict( flip_test=True, - flip_mode='udp_combined', + flip_mode="udp_combined", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - 
dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -120,36 +85,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-256x192.py index 1c5ff70ab47a0cf027c04983e6c1f3640ba56802..4fbe5ef4f3c3ce041ece37c4c6915ad9f5190253 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-256x192.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - 
type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + 
dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +84,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-384x288.py index f83b7d31a43bd0d84d55fbc2825438efa607fff0..e0294681010777bccdef40509be4c9a17be28724 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-384x288.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', 
input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +84,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", 
shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_dark-8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_dark-8xb32-210e_coco-256x192.py index daf3cbaddc15d9ded726a3ce7183f2364ddb74c6..fafae55292547843a61cc3f2297c89fc1370a30d 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_dark-8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_dark-8xb32-210e_coco-256x192.py @@ -1,117 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + 
type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -119,36 +84,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, 
round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_dark-8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_dark-8xb32-210e_coco-384x288.py index eec52999c960c693c92b472cbff1d89d752dd2f1..402f5375b875e2e3d7ddffdab35b292701e35662 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_dark-8xb32-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_dark-8xb32-210e_coco-384x288.py @@ -1,117 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(288, 384), - heatmap_size=(72, 96), - sigma=3, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - 
type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -119,36 +84,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = 
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_udp-8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_udp-8xb32-210e_coco-256x192.py
index b705cb7fb3b59f158be04b4496e2a49922213f4f..90e1aa993d9f80254d08d2081c6c4c8ca6a865b6 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_udp-8xb32-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_udp-8xb32-210e_coco-256x192.py
@@ -1,113 +1,82 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(48, 96)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(48, 96, 192)),
-            stage4=dict(
-                num_modules=3,
-                num_branches=4,
-                block='BASIC',
-                num_blocks=(4, 4, 4, 4),
-                num_channels=(48, 96, 192, 384))),
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://download.openmmlab.com/mmpose/'
-            'pretrain_models/hrnet_w48-8ef0771d.pth'),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)),
+            stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)),
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"),
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=48,
         out_channels=17,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=False,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -115,36 +84,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_udp-8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_udp-8xb32-210e_coco-384x288.py
index cfa17ef098e5b471aba21b9d1a53dc154d8125cb..d431507b7d5f2dff37099ee9d7710ae739d5fe35 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_udp-8xb32-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_udp-8xb32-210e_coco-384x288.py
@@ -1,113 +1,82 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='UDPHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="UDPHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(48, 96)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(48, 96, 192)),
-            stage4=dict(
-                num_modules=3,
-                num_branches=4,
-                block='BASIC',
-                num_blocks=(4, 4, 4, 4),
-                num_channels=(48, 96, 192, 384))),
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://download.openmmlab.com/mmpose/'
-            'pretrain_models/hrnet_w48-8ef0771d.pth'),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)),
+            stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)),
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"),
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=48,
         out_channels=17,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=False,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -115,36 +84,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
        type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-18_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-18_8xb32-210e_coco-384x288.py
index caa7c267a09ea1080980dfeba1f26c22b9655169..7e2d122f413a79668ffbee2436a0b18d7b72f431 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-18_8xb32-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-18_8xb32-210e_coco-384x288.py
@@ -1,48 +1,37 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-
lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='LiteHRNet', + type="LiteHRNet", in_channels=3, extra=dict( stem=dict(stem_channels=32, out_channels=32, expand_ratio=1), @@ -51,53 +40,54 @@ model = dict( num_modules=(2, 4, 2), num_branches=(2, 3, 4), num_blocks=(2, 2, 2), - module_type=('LITE', 'LITE', 'LITE'), + module_type=("LITE", "LITE", "LITE"), with_fuse=(True, True, True), reduce_ratios=(8, 8, 8), num_channels=( (40, 80), (40, 80, 160), (40, 80, 160, 320), - )), + ), + ), with_head=True, - )), + ), + ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=40, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -105,36 +95,35 @@ train_dataloader = 
dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-18_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-18_8xb64-210e_coco-256x192.py index 6f5a564d115bf7c94b6706ce337acbbccd94fb34..18b29f0a9635c2a597cba4e07ae458d21457cf94 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-18_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-18_8xb64-210e_coco-256x192.py @@ -1,48 +1,37 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 
116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='LiteHRNet', + type="LiteHRNet", in_channels=3, extra=dict( stem=dict(stem_channels=32, out_channels=32, expand_ratio=1), @@ -51,53 +40,54 @@ model = dict( num_modules=(2, 4, 2), num_branches=(2, 3, 4), num_blocks=(2, 2, 2), - module_type=('LITE', 'LITE', 'LITE'), + module_type=("LITE", "LITE", "LITE"), with_fuse=(True, True, True), reduce_ratios=(8, 8, 8), num_channels=( (40, 80), (40, 80, 160), (40, 80, 160, 320), - )), + ), + ), with_head=True, - )), + ), + ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=40, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -105,36 +95,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - 
type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-30_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-30_8xb32-210e_coco-384x288.py index 663593552563dbe296ac3c780fda650dd8298c41..d08c12511dcc55506cc8d48bffa3ae630090a101 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-30_8xb32-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-30_8xb32-210e_coco-384x288.py @@ -1,48 +1,37 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='LiteHRNet', + type="LiteHRNet", in_channels=3, extra=dict( stem=dict(stem_channels=32, out_channels=32, expand_ratio=1), @@ -51,53 +40,54 @@ model = dict( num_modules=(3, 8, 3), num_branches=(2, 3, 4), num_blocks=(2, 2, 2), - module_type=('LITE', 'LITE', 'LITE'), + module_type=("LITE", "LITE", "LITE"), with_fuse=(True, True, True), reduce_ratios=(8, 8, 8), num_channels=( (40, 80), (40, 80, 160), (40, 80, 160, 320), - )), + ), + ), with_head=True, - )), + ), + ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=40, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - 
dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -105,36 +95,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-30_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-30_8xb64-210e_coco-256x192.py index 6b5d347cd9537af2a690ee3c6d02323a8c53bbd8..a768fd55909b1cccbd632de78ed2e21bcacb0891 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-30_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-30_8xb64-210e_coco-256x192.py @@ -1,48 +1,37 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - 
milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='LiteHRNet', + type="LiteHRNet", in_channels=3, extra=dict( stem=dict(stem_channels=32, out_channels=32, expand_ratio=1), @@ -51,53 +40,54 @@ model = dict( num_modules=(3, 8, 3), num_branches=(2, 3, 4), num_blocks=(2, 2, 2), - module_type=('LITE', 'LITE', 'LITE'), + module_type=("LITE", "LITE", "LITE"), with_fuse=(True, True, True), reduce_ratios=(8, 8, 8), num_channels=( (40, 80), (40, 80, 160), (40, 80, 160, 320), - )), + ), + ), with_head=True, - )), + ), + ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=40, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -105,36 +95,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - 
ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mobilenetv2_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mobilenetv2_8xb64-210e_coco-256x192.py index ff8eaccb7e093a16416ea52983d6cb7feb6d7814..28d654f53636871e42b0d80dbbcb5a4c1a3f07b2 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mobilenetv2_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mobilenetv2_8xb64-210e_coco-256x192.py @@ -1,87 +1,75 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MobileNetV2', - widen_factor=1., - out_indices=(7, ), + type="MobileNetV2", + widen_factor=1.0, + out_indices=(7,), init_cfg=dict( - type='Pretrained', - 
checkpoint='mmcls://mobilenet_v2', - )), + type="Pretrained", + checkpoint="mmcls://mobilenet_v2", + ), + ), head=dict( - type='HeatmapHead', - in_channels=1280, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=1280, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +77,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mobilenetv2_8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mobilenetv2_8xb64-210e_coco-384x288.py index d01e4c6c3dc9924079d35bde2445fb93b3541cba..f9e5091a9b2539cf9b7abac1fb59c322b9db8b9a 100644 
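For context on the `auto_scale_lr` blocks these configs all carry: mmengine rescales the optimizer learning rate linearly when the effective batch size differs from `base_batch_size`. A rough sketch of that rule, assuming purely linear scaling (the function name is ours, not mmengine's API):

```python
def autoscaled_lr(base_lr: float, num_gpus: int, batch_per_gpu: int,
                  base_batch_size: int = 512) -> float:
    # Linear scaling rule (sketch): LR grows or shrinks with effective batch size.
    return base_lr * (num_gpus * batch_per_gpu) / base_batch_size

# The 8xb64 recipes hit base_batch_size=512 exactly, so lr stays at 5e-4;
# running the same config at 8xb32 would halve it.
assert autoscaled_lr(5e-4, 8, 64) == 5e-4
assert autoscaled_lr(5e-4, 8, 32) == 2.5e-4
```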
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mobilenetv2_8xb64-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mobilenetv2_8xb64-210e_coco-384x288.py @@ -1,87 +1,75 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MobileNetV2', - widen_factor=1., - out_indices=(7, ), + type="MobileNetV2", + widen_factor=1.0, + out_indices=(7,), init_cfg=dict( - type='Pretrained', - checkpoint='mmcls://mobilenet_v2', - )), + type="Pretrained", + checkpoint="mmcls://mobilenet_v2", + ), + ), head=dict( - type='HeatmapHead', - in_channels=1280, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=1280, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + 
dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +77,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mspn50_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mspn50_8xb32-210e_coco-256x192.py index d0e2e9893c6429c99b847747170690654411e68b..3896d5afefdd160912197acaef415652ab76ceb7 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mspn50_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_mspn50_8xb32-210e_coco-256x192.py @@ -1,114 +1,92 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings # multiple kernel_sizes of heatmap gaussian for 'Megvii' approach. 
kernel_sizes = [11, 9, 7, 5] -codec = [ - dict( - type='MegviiHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - kernel_size=kernel_size) for kernel_size in kernel_sizes -] +codec = [dict(type="MegviiHeatmap", input_size=(192, 256), heatmap_size=(48, 64), kernel_size=kernel_size) for kernel_size in kernel_sizes] # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MSPN', + type="MSPN", unit_channels=256, num_stages=1, num_units=4, num_blocks=[3, 4, 6, 3], - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), init_cfg=dict( - type='Pretrained', - checkpoint='torchvision://resnet50', - )), + type="Pretrained", + checkpoint="torchvision://resnet50", + ), + ), head=dict( - type='MSPNHead', + type="MSPNHead", out_shape=(64, 48), unit_channels=256, out_channels=17, num_stages=1, num_units=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), # each sub list is for a stage # and each element in each list is for a unit level_indices=[0, 1, 2, 3], - loss=[ - dict( - type='KeypointMSELoss', - use_target_weight=True, - loss_weight=0.25) - ] * 3 + [ - dict( - type='KeypointOHKMMSELoss', - use_target_weight=True, - loss_weight=1.) - ], - decoder=codec[-1]), + loss=[dict(type="KeypointMSELoss", use_target_weight=True, loss_weight=0.25)] * 3 + + [dict(type="KeypointOHKMMSELoss", use_target_weight=True, loss_weight=1.0)], + decoder=codec[-1], + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='GenerateTarget', multilevel=True, encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="GenerateTarget", multilevel=True, encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -116,37 +94,35 @@ train_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), 
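Likewise, the head's `loss` list in this config is built by list arithmetic: three intermediate units get a down-weighted plain MSE, and the deepest unit gets full-weight OHKM (online hard keypoint mining). Expanded as a sketch (note that `* 3` repeats references to the same dict, which is harmless in a read-only config):

```python
losses = [dict(type="KeypointMSELoss", use_target_weight=True, loss_weight=0.25)] * 3 + [
    dict(type="KeypointOHKMMSELoss", use_target_weight=True, loss_weight=1.0)
]
# One loss per unit level (level_indices=[0, 1, 2, 3]).
assert len(losses) == 4 and losses[-1]["loss_weight"] == 1.0
```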
pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json", nms_mode="none") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_pvt-s_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_pvt-s_8xb64-210e_coco-256x192.py index 1b474b3f2fe7a5db3571846f7ab54c5c05c33136..59eb8d69b47fa112394d9b9ffd4226bb0241f254 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_pvt-s_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_pvt-s_8xb64-210e_coco-256x192.py @@ -1,90 +1,74 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings -norm_cfg = dict(type='SyncBN', requires_grad=True) +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='PyramidVisionTransformer', + type="PyramidVisionTransformer", num_layers=[3, 4, 6, 3], - init_cfg=dict( - type='Pretrained', - checkpoint='https://github.com/whai362/PVT/' - 'releases/download/v2/pvt_small.pth'), + 
init_cfg=dict(type="Pretrained", checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_small.pth"), ), - neck=dict(type='FeatureMapProcessor', select_index=3), + neck=dict(type="FeatureMapProcessor", select_index=3), head=dict( - type='HeatmapHead', - in_channels=512, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=512, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -92,36 +76,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_pvtv2-b2_8xb64-210e_coco-256x192.py 
b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_pvtv2-b2_8xb64-210e_coco-256x192.py index e8921e68030e89110afe8c44717b051b02616a13..82bcd92f455b4bd8493d663b73756b5e65276496 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_pvtv2-b2_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_pvtv2-b2_8xb64-210e_coco-256x192.py @@ -1,91 +1,75 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings -norm_cfg = dict(type='SyncBN', requires_grad=True) +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='PyramidVisionTransformerV2', + type="PyramidVisionTransformerV2", embed_dims=64, num_layers=[3, 4, 6, 3], - init_cfg=dict( - type='Pretrained', - checkpoint='https://github.com/whai362/PVT/' - 'releases/download/v2/pvt_v2_b2.pth'), + init_cfg=dict(type="Pretrained", checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_v2_b2.pth"), ), - neck=dict(type='FeatureMapProcessor', select_index=3), + neck=dict(type="FeatureMapProcessor", select_index=3), head=dict( - type='HeatmapHead', - in_channels=512, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=512, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + 
dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -93,36 +77,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_8xb32-210e_coco-384x288.py index cd13e4a4222f21baa200c4c8ccb17986aacfc935..ebb24887dee1d9a6f41def288b9f0a4c191fd8bf 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_8xb32-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_8xb32-210e_coco-384x288.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks 
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + 
ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_8xb64-210e_coco-256x192.py index 5486548481df742e6bc53bd32d65501971e356f5..2d806eb3836975e8ceb9d95ce772956ea97f3e1c 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_8xb64-210e_coco-256x192.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - 
dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_dark-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_dark-8xb64-210e_coco-256x192.py index 39b7b3220d64d2ac905288c6bf2c0dd1ca2be7f1..bbba829d4f757aeae50514e3f39b14abd262c1c1 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_dark-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_dark-8xb64-210e_coco-256x192.py @@ -1,88 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, 
start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -90,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, 
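The `_dark-` variants differ from their plain counterparts only in the codec: `unbiased=True` (plus `blur_kernel_size=17` in the ResNet-152 384x288 file) enables DarkPose-style unbiased encoding and decoding. On the encoding side the Gaussian is centred at the exact sub-pixel keypoint rather than the nearest bin; a sketch of what that quantization otherwise costs (numbers illustrative):

```python
import numpy as np

def encode_center(kpt_xy, unbiased):
    """Gaussian centre used when rendering the target heatmap."""
    return np.asarray(kpt_xy) if unbiased else np.round(kpt_xy)

kpt = np.array([24.37, 31.62])  # keypoint in heatmap coordinates
err = np.abs(encode_center(kpt, unbiased=False) - kpt)
print(err * 4)  # -> [1.48 1.52]: pixels lost at 192x256 input resolution
# unbiased=True keeps the exact centre, so this systematic bias is zero.
```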
round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_dark-8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_dark-8xb64-210e_coco-384x288.py index f7c99503d46a7e0dc4402250e073b6ce9128d121..9a40b75f09e96fbb83251ff44c0ee4ade3ab9c23 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_dark-8xb64-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res101_dark-8xb64-210e_coco-384x288.py @@ -1,88 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(288, 384), - heatmap_size=(72, 96), - sigma=3, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", 
shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -90,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_8xb32-210e_coco-256x192.py index beccab1bd105b618b601d5d331cc0fc680df1bf7..8ae2e45eac22d3fa6056767953d0872dab425f6b 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_8xb32-210e_coco-256x192.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + 
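One reading aid for the collapsed `+` lines: pairs such as `"data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json"` are adjacent string literals, which Python concatenates at compile time; they are leftovers of the old line wrapping, not two arguments, and merging them into a single literal is equivalent:

```python
# Adjacent literals concatenate at compile time...
bbox_file = ("data/coco/person_detection_results/"
             "COCO_val2017_detections_AP_H_56_person.json")
# ...so this single literal is the same value, byte for byte:
assert bbox_file == (
    "data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json"
)
```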
optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + 
data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_8xb32-210e_coco-384x288.py index 25d5039f05e3d9b2387be6bc0690e5d3904faded..d7b3227dbafde943fbd5c1c4eebe1c8970b7045a 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_8xb32-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_8xb32-210e_coco-384x288.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - 
decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_dark-8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_dark-8xb32-210e_coco-256x192.py index acd91192447b4ef5f41745db0c4b93357b53b778..b5e0aa8055f751e7e48829d1e2c4706b32902355 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_dark-8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_dark-8xb32-210e_coco-256x192.py @@ -1,88 +1,71 @@ -_base_ = 
['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -90,36 +73,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + 
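The `train_pipeline`/`val_pipeline` lists are built into transform objects and applied in order to a single `results` dict, each stage consuming what earlier stages wrote (image, bbox centre/scale, warped crop, encoded targets). The composition itself is just a left-to-right fold; a minimal stand-in for the registry-built version, with toy lambdas in place of the real transforms:

```python
class Compose:
    """Stand-in for the registry-built pipeline: apply stages in order."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, results: dict) -> dict:
        for t in self.transforms:
            results = t(results)
        return results

pipeline = Compose([
    lambda r: {**r, "img": "decoded"},         # LoadImage
    lambda r: {**r, "center": (96, 128)},      # GetBBoxCenterScale
    lambda r: {**r, "inputs": "warped crop"},  # TopdownAffine
])
print(pipeline({"img_path": "000000000785.jpg"}))
```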
sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_dark-8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_dark-8xb32-210e_coco-384x288.py index 49bd2b224bea33419d392931391ba90806ee24a7..1ecbee36c2cfe22819a3b1a7927ff93abb90ae3c 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_dark-8xb32-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res152_dark-8xb32-210e_coco-384x288.py @@ -1,89 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(288, 384), - heatmap_size=(72, 96), - sigma=3, - unbiased=True, - blur_kernel_size=17) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3, unbiased=True, blur_kernel_size=17) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 
57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -91,36 +73,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-256x192.py 
b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-256x192.py index 7dbe1b43f77f35fb6564b9d6322a1b8c08d93a60..088de894baf40308e5761c081d6d2a827f2c87af 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-256x192.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', 
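A geometric invariant runs through all these files: the heatmap grid is the input crop at 1/4 resolution, and `sigma` grows with it (2 at 192x256, 3 at 288x384) so the Gaussian covers a similar fraction of the person at either scale. A two-line check of the pairs used here:

```python
# (input_size, heatmap_size, sigma) pairs as configured in these files
for (iw, ih), (hw, hh), sigma in [((192, 256), (48, 64), 2),
                                  ((288, 384), (72, 96), 3)]:
    assert (iw // 4, ih // 4) == (hw, hh)  # heatmap = input / 4
    print(f"{iw}x{ih}: sigma={sigma} heatmap px = {4 * sigma} input px")
```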
input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-384x288.py index d74cc1392d27911a1e3d2b3239840717da5a4fb5..c55fb81d50323594be016e95622d05cd7b268582 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-384x288.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - 
type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') 
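At val/test time none of these configs use ground-truth boxes: `bbox_file` points at COCO-format detection results (the standard `AP_H_56` person set), and poses are estimated inside those detected boxes, which keeps the reported AP comparable across papers. A sketch of reading that file (field names follow the COCO detection-results convention; the score threshold is illustrative):

```python
import json

def load_person_boxes(path, score_thr=0.0):
    """bbox_file format: [{image_id, category_id, bbox, score}, ...]."""
    with open(path) as f:
        dets = json.load(f)
    return [d for d in dets
            if d.get("category_id", 1) == 1 and d["score"] >= score_thr]

# boxes = load_person_boxes(
#     "data/coco/person_detection_results/"
#     "COCO_val2017_detections_AP_H_56_person.json")
```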
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_dark-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_dark-8xb64-210e_coco-256x192.py
index fdec305b10c5aaa202957650a81975158d0d1b9c..f563e6c0604bad212e9197795f3ed004ffb9dcf3 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_dark-8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_dark-8xb64-210e_coco-256x192.py
@@ -1,88 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap',
-    input_size=(192, 256),
-    heatmap_size=(48, 64),
-    sigma=2,
-    unbiased=True)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2, unbiased=True)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=50,
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
+        init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
        shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -90,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_dark-8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_dark-8xb64-210e_coco-384x288.py
index b34ad210f37ce883b21377192fbe035a7c1fcd56..049361bdb4d86134babde417a80b12680084c4c9 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_dark-8xb64-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_dark-8xb64-210e_coco-384x288.py
@@ -1,88 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap',
-    input_size=(288, 384),
-    heatmap_size=(72, 96),
-    sigma=3,
-    unbiased=True)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3, unbiased=True)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=50,
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
+        init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -90,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_fp16-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_fp16-8xb64-210e_coco-256x192.py
index 66a6a27822fb72e7aef421bf1bf2230598c26125..b5d5004250d2fd0b199306edcabdb2823dfd99cf 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_fp16-8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_fp16-8xb64-210e_coco-256x192.py
@@ -1,7 +1,7 @@
-_base_ = ['./td-hm_res50_8xb64-210e_coco-256x192.py']
+_base_ = ["./td-hm_res50_8xb64-210e_coco-256x192.py"]
 
 # fp16 settings
 optim_wrapper = dict(
-    type='AmpOptimWrapper',
-    loss_scale='dynamic',
+    type="AmpOptimWrapper",
+    loss_scale="dynamic",
 )
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest101_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest101_8xb32-210e_coco-384x288.py
index 5bfbace9f6313fe89201ba5c243e51b4aa90ca27..9f40156741c048838bc970b57166738f5af89773 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest101_8xb32-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest101_8xb32-210e_coco-384x288.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNeSt',
+        type="ResNeSt",
         depth=101,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnest101'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnest101"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest101_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest101_8xb64-210e_coco-256x192.py
index 030ae95d634e40f172dae07eb2bef163084906a3..7075d99f6313563438a5ad77a18427a0a095bb9b 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest101_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest101_8xb64-210e_coco-256x192.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNeSt',
+        type="ResNeSt",
        depth=101,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnest101'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnest101"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest200_8xb16-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest200_8xb16-210e_coco-384x288.py
index bdcdb6c75fb74e65ca53797eb33039f6d36357ce..ce4d480c254b80232d1472ce7a4e19760fbdb057 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest200_8xb16-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest200_8xb16-210e_coco-384x288.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=128)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNeSt',
+        type="ResNeSt",
         depth=200,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnest200'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnest200"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=16,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=16,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest200_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest200_8xb64-210e_coco-256x192.py
index 1a5e1e8e4a570e09b9fd3a5f096584275bfb8858..4bf5d12eeae294161f7f1de402662087978fcd28 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest200_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest200_8xb64-210e_coco-256x192.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNeSt',
+        type="ResNeSt",
         depth=200,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnest200'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnest200"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest269_8xb16-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest269_8xb16-210e_coco-384x288.py
index b519e9d2ef951298da6f3d4794d5c8660e83159d..321affc4f6d6a53fcc967b4fec5220273e123a73 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest269_8xb16-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest269_8xb16-210e_coco-384x288.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=128)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNeSt',
+        type="ResNeSt",
         depth=269,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnest269'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnest269"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=16,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=16,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest269_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest269_8xb32-210e_coco-256x192.py
index b3588d1fa31e29ec960a35050ff8659e712712ec..60f1dcff8a80d0e7f6dfaad17fe828ac2cfa1d56 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest269_8xb32-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest269_8xb32-210e_coco-256x192.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNeSt',
+        type="ResNeSt",
         depth=269,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnest269'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnest269"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest50_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest50_8xb64-210e_coco-256x192.py
index 43295bb41f1b4b2c87119baec30b7efc9ecb80d9..a063b205ecd60cf683f24a8ffe9af3eff1f1f425 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest50_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest50_8xb64-210e_coco-256x192.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
    backbone=dict(
-        type='ResNeSt',
+        type="ResNeSt",
         depth=50,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnest50'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnest50"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest50_8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest50_8xb64-210e_coco-384x288.py
index e45320b036372894e9ddd0bcee6c457e86a8ecee..a9ec1ee4d3e28aec9253751e58f52aee92f5d72f 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest50_8xb64-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnest50_8xb64-210e_coco-384x288.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
# automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNeSt',
+        type="ResNeSt",
         depth=50,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnest50'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnest50"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d101_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d101_8xb32-210e_coco-384x288.py
index 4fc55228face0a1586627e3ffa823ffe645c812a..38be30a7a4a10aedfe24bcd666708cfe58a324ce 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d101_8xb32-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d101_8xb32-210e_coco-384x288.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNetV1d',
+        type="ResNetV1d",
         depth=101,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnet101_v1d'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnet101_v1d"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d101_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d101_8xb64-210e_coco-256x192.py
index 6c8cc4e808c2fff488bc4b5c977a34d7978a6d03..d4f6be8ed559c78e6299c7b548d659ac7a5a3aaf 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d101_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d101_8xb64-210e_coco-256x192.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNetV1d',
+        type="ResNetV1d",
         depth=101,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnet101_v1d'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnet101_v1d"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d152_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d152_8xb32-210e_coco-256x192.py
index a85a7f80c43b090426182ab9c3acaa5659b0f4d5..2c84469ded41ba231299dcc56a9a24dac8d71cb3 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d152_8xb32-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d152_8xb32-210e_coco-256x192.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNetV1d',
+        type="ResNetV1d",
         depth=152,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnet152_v1d'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnet152_v1d"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d152_8xb48-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d152_8xb48-210e_coco-384x288.py
index 7a728ce806415f8da3afd036835171b64976a41a..c2f285c70139ef5a8df47b164069af03f2824d3b 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d152_8xb48-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d152_8xb48-210e_coco-384x288.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=384)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNetV1d',
+        type="ResNetV1d",
         depth=152,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnet152_v1d'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnet152_v1d"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+
type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=48, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d50_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d50_8xb64-210e_coco-256x192.py index c241cdd3ddbee8398c3da8d96d7d3d46bce99f24..10247dcdce31eeaa26cdce0cfcca20620bd24a55 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d50_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d50_8xb64-210e_coco-256x192.py @@ -1,84 +1,71 @@ -_base_ = 
['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNetV1d', + type="ResNetV1d", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnet50_v1d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnet50_v1d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", 
shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d50_8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d50_8xb64-210e_coco-384x288.py index 4d1cea135b49cf1d70e10194685c394dc2c8bc1a..ec0999682707f92aad2f2693daef3985378ee4e3 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d50_8xb64-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnetv1d50_8xb64-210e_coco-384x288.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNetV1d', + type="ResNetV1d", depth=50, - 
init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnet50_v1d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnet50_v1d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext101_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext101_8xb32-210e_coco-384x288.py index 
508233371b4a819fe0b14a01798cfe48e6b32303..ad72dbd21cb965e9e10632b454447d3d22ceb960 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext101_8xb32-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext101_8xb32-210e_coco-384x288.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, - init_cfg=dict( - type='Pretrained', checkpoint='mmcls://resnext101_32x4d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnext101_32x4d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + 
dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +73,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext101_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext101_8xb64-210e_coco-256x192.py index eafed7f07526dce3b46bcd800272764a1614a051..1e9a6b7b7decc8f62e16abf1d0d6aebfd6525d8e 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext101_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext101_8xb64-210e_coco-256x192.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - 
mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, - init_cfg=dict( - type='Pretrained', checkpoint='mmcls://resnext101_32x4d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnext101_32x4d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + 
"annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext152_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext152_8xb32-210e_coco-256x192.py index 27c2c263b05193b10f0c0af2235b81d86cca1bc4..f0e14ebb44fb1fe9224d8e102ad2f785bb51300c 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext152_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext152_8xb32-210e_coco-256x192.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=152, - init_cfg=dict( - type='Pretrained', checkpoint='mmcls://resnext152_32x4d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnext152_32x4d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", 
input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +73,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext152_8xb48-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext152_8xb48-210e_coco-384x288.py index c02caeb7461f1fb312d02cfe7496c57a8b9b11e2..c6caf1ed19c253cd98e24ea9543385a467eb65a2 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext152_8xb48-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext152_8xb48-210e_coco-384x288.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=384) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings 
-codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=152, - init_cfg=dict( - type='Pretrained', checkpoint='mmcls://resnext152_32x4d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnext152_32x4d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +73,35 @@ train_dataloader = dict( batch_size=48, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), 
test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext50_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext50_8xb64-210e_coco-256x192.py index b088a44ca6a043abac5e52596486362124d244c5..9bc9a9c90e3c4f08b30ca9cc189b465dc7f4d91a 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext50_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext50_8xb64-210e_coco-256x192.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnext50_32x4d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnext50_32x4d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - 
dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext50_8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext50_8xb64-210e_coco-384x288.py index 9f97235218992e772298fcb74c1494331eeb50a7..f89e81c9b9dad6bbe95444c44ef2ef64b8fc7802 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext50_8xb64-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_resnext50_8xb64-210e_coco-384x288.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual 
training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnext50_32x4d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnext50_32x4d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 
'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_rsn18_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_rsn18_8xb32-210e_coco-256x192.py index 18d16bd26784ad9f706e39bd83c25fc913ef4b08..96f23ed7850d63ac5542287dcecf15af33a8f95e 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_rsn18_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_rsn18_8xb32-210e_coco-256x192.py @@ -1,113 +1,90 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=2e-2, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=2e-2, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 190, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 190, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings # multiple kernel_sizes of heatmap gaussian for 'Megvii' approach. 
kernel_sizes = [11, 9, 7, 5] -codec = [ - dict( - type='MegviiHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - kernel_size=kernel_size) for kernel_size in kernel_sizes -] +codec = [dict(type="MegviiHeatmap", input_size=(192, 256), heatmap_size=(48, 64), kernel_size=kernel_size) for kernel_size in kernel_sizes] # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='RSN', + type="RSN", unit_channels=256, num_stages=1, num_units=4, num_blocks=[2, 2, 2, 2], num_steps=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), ), head=dict( - type='MSPNHead', + type="MSPNHead", out_shape=(64, 48), unit_channels=256, out_channels=17, num_stages=1, num_units=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), # each sub list is for a stage # and each element in each list is for a unit level_indices=[0, 1, 2, 3], - loss=[ - dict( - type='KeypointMSELoss', - use_target_weight=True, - loss_weight=0.25) - ] * 3 + [ - dict( - type='KeypointOHKMMSELoss', - use_target_weight=True, - loss_weight=1.) - ], - decoder=codec[-1]), + loss=[dict(type="KeypointMSELoss", use_target_weight=True, loss_weight=0.25)] * 3 + + [dict(type="KeypointOHKMMSELoss", use_target_weight=True, loss_weight=1.0)], + decoder=codec[-1], + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='GenerateTarget', multilevel=True, encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="GenerateTarget", multilevel=True, encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,40 +92,38 @@ train_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - 
sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json", nms_mode="none") test_evaluator = val_evaluator # fp16 settings -fp16 = dict(loss_scale='dynamic') +fp16 = dict(loss_scale="dynamic") diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_rsn50_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_rsn50_8xb32-210e_coco-256x192.py index 069cb413123be20dee06dd8014b583dfa267fa46..f7adec2953fbfc8878da0a58f159413f1167319e 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_rsn50_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_rsn50_8xb32-210e_coco-256x192.py @@ -1,113 +1,90 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings # multiple kernel_sizes of heatmap gaussian for 'Megvii' approach. 
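Throughout these hunks the formatter joins the previously line-wrapped detection-results path into two adjacent string literals on one line. That is implicit compile-time concatenation, not two arguments, so the resolved `bbox_file` path is unchanged; a minimal check:

```python
# Adjacent string literals concatenate at compile time, so the one-line
# form produced by the formatter resolves to the same bbox_file path as
# the old wrapped form.
wrapped = ("data/coco/person_detection_results/"
           "COCO_val2017_detections_AP_H_56_person.json")
one_line = "data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json"
assert wrapped == one_line
```

The same reading applies to the pretrained checkpoint URLs in the SCNet hunks further down, e.g. `"https://download.openmmlab.com/mmpose/" "pretrain_models/scnet101-94250a77.pth"`.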
kernel_sizes = [11, 9, 7, 5] -codec = [ - dict( - type='MegviiHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - kernel_size=kernel_size) for kernel_size in kernel_sizes -] +codec = [dict(type="MegviiHeatmap", input_size=(192, 256), heatmap_size=(48, 64), kernel_size=kernel_size) for kernel_size in kernel_sizes] # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='RSN', + type="RSN", unit_channels=256, num_stages=1, num_units=4, num_blocks=[3, 4, 6, 3], num_steps=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), ), head=dict( - type='MSPNHead', + type="MSPNHead", out_shape=(64, 48), unit_channels=256, out_channels=17, num_stages=1, num_units=4, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), # each sub list is for a stage # and each element in each list is for a unit level_indices=[0, 1, 2, 3], - loss=[ - dict( - type='KeypointMSELoss', - use_target_weight=True, - loss_weight=0.25) - ] * 3 + [ - dict( - type='KeypointOHKMMSELoss', - use_target_weight=True, - loss_weight=1.) - ], - decoder=codec[-1]), + loss=[dict(type="KeypointMSELoss", use_target_weight=True, loss_weight=0.25)] * 3 + + [dict(type="KeypointOHKMMSELoss", use_target_weight=True, loss_weight=1.0)], + decoder=codec[-1], + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='GenerateTarget', multilevel=True, encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="GenerateTarget", multilevel=True, encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec[0]['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec[0]["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,40 +92,38 @@ train_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - 
sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json', - nms_mode='none') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json", nms_mode="none") test_evaluator = val_evaluator # fp16 settings -fp16 = dict(loss_scale='dynamic') +fp16 = dict(loss_scale="dynamic") diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet101_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet101_8xb32-210e_coco-256x192.py index 544c87242f5f3e7e4a0b129aa927e21a8c5a4430..d7cf5105c8885b35801a1e83d8f9e999ca915c65 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet101_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet101_8xb32-210e_coco-256x192.py @@ -1,87 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SCNet', + type="SCNet", depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/scnet101-94250a77.pth'), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/scnet101-94250a77.pth"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, 
- loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +73,35 @@ train_dataloader = dict( batch_size=32, num_workers=1, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet101_8xb48-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet101_8xb48-210e_coco-384x288.py index 1af2e44ef013ea525d0d7cfe19312c07a1b5ae93..cfb73fa86fb424dd571d738c4b5f289519c6dc8b 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet101_8xb48-210e_coco-384x288.py +++ 
b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet101_8xb48-210e_coco-384x288.py @@ -1,87 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SCNet', + type="SCNet", depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/scnet101-94250a77.pth'), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/scnet101-94250a77.pth"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + 
dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +73,35 @@ train_dataloader = dict( batch_size=48, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet50_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet50_8xb32-210e_coco-384x288.py index efa1ad924cf5da56fc0ab69cee89eee48355376d..b570675d81acbc5069dac56326731cd0c238c7c6 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet50_8xb32-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet50_8xb32-210e_coco-384x288.py @@ -1,87 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + 
data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SCNet', + type="SCNet", depth=50, - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/scnet50-7ef0a199.pth'), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/scnet50-7ef0a199.pth"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +73,35 @@ train_dataloader = dict( batch_size=32, num_workers=1, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + 
"annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet50_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet50_8xb64-210e_coco-256x192.py index 9d784d80296e085f201da67e7f45732af6fe8938..484062583253b3cb015ce6de1e95b0e421e1b844 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet50_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_scnet50_8xb64-210e_coco-256x192.py @@ -1,87 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SCNet', + type="SCNet", depth=50, - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/scnet50-7ef0a199.pth'), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/scnet50-7ef0a199.pth"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + 
dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet101_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet101_8xb32-210e_coco-384x288.py index b515b744c4c9b43126b2e85b9c32b5663016be70..7123ac2a7468b145409d2f929e9cce8cfdb4e15f 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet101_8xb32-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet101_8xb32-210e_coco-384x288.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', 
rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SEResNet', + type="SEResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://se-resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://se-resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + 
bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet101_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet101_8xb64-210e_coco-256x192.py index f6d9fab2eda60ecef464a645b572866d1954cbcf..68b934b696ef945e3cd85aa0021338f4df70e1fa 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet101_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet101_8xb64-210e_coco-256x192.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SEResNet', + type="SEResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://se-resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://se-resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - 
dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,36 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet152_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet152_8xb32-210e_coco-256x192.py index a0ef9bf5711f01625f3faf0d46122dae2eca8c35..5a0b343efd61b17d8617ff1479f666445bbf4f4a 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet152_8xb32-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet152_8xb32-210e_coco-256x192.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up 
+ dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SEResNet', + type="SEResNet", depth=152, ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,36 +72,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 
'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet152_8xb48-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet152_8xb48-210e_coco-384x288.py index 13524c121772b7a73b79dd7d9c2fd4fd6e5ad882..d0e1f9f11d04ebeda801c85d7cc6af690f3b1369 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet152_8xb48-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet152_8xb48-210e_coco-384x288.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=384) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SEResNet', + type="SEResNet", depth=152, ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - 
dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,36 +72,35 @@ train_dataloader = dict( batch_size=48, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet50_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet50_8xb64-210e_coco-256x192.py index 93fb78fac56a697164a383229436c79de5392be5..3dc8ccb3aae8fa8a71b6bb65dddd4e46a431ff0e 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet50_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet50_8xb64-210e_coco-256x192.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + 
dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='SEResNet',
+        type="SEResNet",
         depth=50,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://se-resnet50'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://se-resnet50"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet50_8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet50_8xb64-210e_coco-384x288.py
index fa2002a70a94d7104d07ec2c921a6c1123f859ab..06b94a3e9fa8c8c0ee97b1ad339327e5ca2b9d9d 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet50_8xb64-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_seresnet50_8xb64-210e_coco-384x288.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='SEResNet',
+        type="SEResNet",
         depth=50,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://se-resnet50'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://se-resnet50"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv1_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv1_8xb64-210e_coco-256x192.py
index 029f48d3d90bdc113066c67200cbe15772bd0b9b..d6107f55118f64c33de6bf3df79720cd43999c94 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv1_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv1_8xb64-210e_coco-256x192.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ShuffleNetV1',
+        type="ShuffleNetV1",
         groups=3,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://shufflenet_v1'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://shufflenet_v1"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=960,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=960, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv1_8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv1_8xb64-210e_coco-384x288.py
index f06c325bd1213995bb51bd9c1e477de0604e4cb7..340ed9a4747fad9a966d43e95b1417c4b1c8f2fd 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv1_8xb64-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv1_8xb64-210e_coco-384x288.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ShuffleNetV1',
+        type="ShuffleNetV1",
         groups=3,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://shufflenet_v1'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://shufflenet_v1"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=960,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=960, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv2_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv2_8xb64-210e_coco-256x192.py
index 333998490e38105e4f73a55af6358e868943117a..4b17ce7ad27f53e9b4006224e9543dd41d712767 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv2_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv2_8xb64-210e_coco-256x192.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ShuffleNetV2',
+        type="ShuffleNetV2",
         widen_factor=1.0,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://shufflenet_v2'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://shufflenet_v2"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=1024,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=1024, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv2_8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv2_8xb64-210e_coco-384x288.py
index e7be5484e8d56f6001a3c1e5de91dd1b8c32821f..2004dd89276f9b5419e61043f63f12dded41a626 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv2_8xb64-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_shufflenetv2_8xb64-210e_coco-384x288.py
@@ -1,84 +1,71 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ShuffleNetV2',
+        type="ShuffleNetV2",
         widen_factor=1.0,
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://shufflenet_v2'),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://shufflenet_v2"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=1024,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=1024, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -86,36 +73,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
    dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-b-p4-w7_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-b-p4-w7_8xb32-210e_coco-256x192.py
index 81877b893f69b66a2263a4d5dfea8407d56668af..e2cd617fd229e978d59bdd8805032909fc8b2f3b 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-b-p4-w7_8xb32-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-b-p4-w7_8xb32-210e_coco-256x192.py
@@ -1,49 +1,38 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)

 # model settings
-norm_cfg = dict(type='SyncBN', requires_grad=True)
+norm_cfg = dict(type="SyncBN", requires_grad=True)
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='SwinTransformer',
+        type="SwinTransformer",
         embed_dims=128,
         depths=[2, 2, 18, 2],
         num_heads=[4, 8, 16, 32],
@@ -51,52 +40,50 @@ model = dict(
         mlp_ratio=4,
         qkv_bias=True,
         qk_scale=None,
-        drop_rate=0.,
-        attn_drop_rate=0.,
+        drop_rate=0.0,
+        attn_drop_rate=0.0,
         drop_path_rate=0.3,
         patch_norm=True,
-        out_indices=(3, ),
+        out_indices=(3,),
         with_cp=False,
         convert_weights=True,
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://github.com/SwinTransformer/storage/releases/'
-            'download/v1.0.0/swin_base_patch4_window7_224_22k.pth'),
+            type="Pretrained",
+            checkpoint="https://github.com/SwinTransformer/storage/releases/" "download/v1.0.0/swin_base_patch4_window7_224_22k.pth",
+        ),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=1024,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=1024, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -104,36 +91,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-b-p4-w7_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-b-p4-w7_8xb32-210e_coco-384x288.py
index 0c1d5fa12f97259031d65030e5abee8cb61d372d..a55735ab209318a36cce833105952cfd6475bc2e 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-b-p4-w7_8xb32-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-b-p4-w7_8xb32-210e_coco-384x288.py
@@ -1,49 +1,38 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=2)

 # model settings
-norm_cfg = dict(type='SyncBN', requires_grad=True)
+norm_cfg = dict(type="SyncBN", requires_grad=True)
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='SwinTransformer',
+        type="SwinTransformer",
         embed_dims=128,
         depths=[2, 2, 18, 2],
         num_heads=[4, 8, 16, 32],
@@ -51,52 +40,50 @@ model = dict(
         mlp_ratio=4,
         qkv_bias=True,
         qk_scale=None,
-        drop_rate=0.,
-        attn_drop_rate=0.,
+        drop_rate=0.0,
+        attn_drop_rate=0.0,
         drop_path_rate=0.3,
         patch_norm=True,
-        out_indices=(3, ),
+        out_indices=(3,),
         with_cp=False,
         convert_weights=True,
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://github.com/SwinTransformer/storage/releases/'
-            'download/v1.0.0/swin_base_patch4_window12_384_22k.pth'),
+            type="Pretrained",
+            checkpoint="https://github.com/SwinTransformer/storage/releases/" "download/v1.0.0/swin_base_patch4_window12_384_22k.pth",
+        ),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=1024,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=1024, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -104,36 +91,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-l-p4-w7_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-l-p4-w7_8xb32-210e_coco-256x192.py
index 14d08a49f865a901b0832f40dd2819b8ee43d58c..70a94be6577c6f030cd893966f4beb4edfed6b98 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-l-p4-w7_8xb32-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-l-p4-w7_8xb32-210e_coco-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
@@ -6,53 +6,42 @@ train_cfg = dict(max_epochs=210, val_interval=10)
 # optimizer
 optim_wrapper = dict(
     optimizer=dict(
-        type='AdamW',
+        type="AdamW",
         lr=5e-4,
         betas=(0.9, 0.999),
         weight_decay=0.01,
     ),
     paramwise_cfg=dict(
         custom_keys={
-            'absolute_pos_embed': dict(decay_mult=0.),
-            'relative_position_bias_table': dict(decay_mult=0.),
-            'norm': dict(decay_mult=0.)
-        }))
+            "absolute_pos_embed": dict(decay_mult=0.0),
+            "relative_position_bias_table": dict(decay_mult=0.0),
+            "norm": dict(decay_mult=0.0),
+        }
+    ),
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)

 # model settings
-norm_cfg = dict(type='SyncBN', requires_grad=True)
+norm_cfg = dict(type="SyncBN", requires_grad=True)
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='SwinTransformer',
+        type="SwinTransformer",
         embed_dims=192,
         depths=[2, 2, 18, 2],
         num_heads=[6, 12, 24, 48],
@@ -60,52 +49,50 @@ model = dict(
         mlp_ratio=4,
         qkv_bias=True,
         qk_scale=None,
-        drop_rate=0.,
-        attn_drop_rate=0.,
+        drop_rate=0.0,
+        attn_drop_rate=0.0,
         drop_path_rate=0.5,
         patch_norm=True,
-        out_indices=(3, ),
+        out_indices=(3,),
         with_cp=False,
         convert_weights=True,
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://github.com/SwinTransformer/storage/releases/'
-            'download/v1.0.0/swin_base_patch4_window7_224_22k.pth'),
+            type="Pretrained",
+            checkpoint="https://github.com/SwinTransformer/storage/releases/" "download/v1.0.0/swin_base_patch4_window7_224_22k.pth",
+        ),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=1536,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=1536, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -113,36 +100,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-l-p4-w7_8xb32-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-l-p4-w7_8xb32-210e_coco-384x288.py
index 692c8df1a616dabbcb93a9be67f4626862eae172..9125429116ecf07fd23ac900f0f56c0eca0ec89c 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-l-p4-w7_8xb32-210e_coco-384x288.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-l-p4-w7_8xb32-210e_coco-384x288.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
@@ -6,53 +6,42 @@ train_cfg = dict(max_epochs=210, val_interval=10)
 # optimizer
 optim_wrapper = dict(
     optimizer=dict(
-        type='AdamW',
+        type="AdamW",
         lr=5e-4,
         betas=(0.9, 0.999),
         weight_decay=0.01,
     ),
     paramwise_cfg=dict(
         custom_keys={
-            'absolute_pos_embed': dict(decay_mult=0.),
-            'relative_position_bias_table': dict(decay_mult=0.),
-            'norm': dict(decay_mult=0.)
-        }))
+            "absolute_pos_embed": dict(decay_mult=0.0),
+            "relative_position_bias_table": dict(decay_mult=0.0),
+            "norm": dict(decay_mult=0.0),
+        }
+    ),
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=2)

 # model settings
-norm_cfg = dict(type='SyncBN', requires_grad=True)
+norm_cfg = dict(type="SyncBN", requires_grad=True)
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='SwinTransformer',
+        type="SwinTransformer",
         embed_dims=192,
         depths=[2, 2, 18, 2],
         num_heads=[6, 12, 24, 48],
@@ -60,52 +49,50 @@ model = dict(
         mlp_ratio=4,
         qkv_bias=True,
         qk_scale=None,
-        drop_rate=0.,
-        attn_drop_rate=0.,
+        drop_rate=0.0,
+        attn_drop_rate=0.0,
         drop_path_rate=0.5,
         patch_norm=True,
-        out_indices=(3, ),
+        out_indices=(3,),
         with_cp=False,
         convert_weights=True,
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://github.com/SwinTransformer/storage/releases/'
-            'download/v1.0.0/swin_base_patch4_window12_384_22k.pth'),
+            type="Pretrained",
+            checkpoint="https://github.com/SwinTransformer/storage/releases/" "download/v1.0.0/swin_base_patch4_window12_384_22k.pth",
+        ),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=1536,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=1536, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -113,36 +100,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-t-p4-w7_8xb32-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-t-p4-w7_8xb32-210e_coco-256x192.py
index 068ee0649f4cf97f5887ff5b17f44d6e1e1609b3..701707ce723e1dc9c291fdbb6a20496cb9b18e64 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-t-p4-w7_8xb32-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_swin-t-p4-w7_8xb32-210e_coco-256x192.py
@@ -1,49 +1,38 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)

 # model settings
-norm_cfg = dict(type='SyncBN', requires_grad=True)
+norm_cfg = dict(type="SyncBN", requires_grad=True)
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='SwinTransformer',
+        type="SwinTransformer",
         embed_dims=96,
         depths=[2, 2, 6, 2],
         num_heads=[3, 6, 12, 24],
@@ -51,52 +40,50 @@ model = dict(
         mlp_ratio=4,
         qkv_bias=True,
         qk_scale=None,
-        drop_rate=0.,
-        attn_drop_rate=0.,
+        drop_rate=0.0,
+        attn_drop_rate=0.0,
         drop_path_rate=0.2,
         patch_norm=True,
-        out_indices=(3, ),
+        out_indices=(3,),
         with_cp=False,
         convert_weights=True,
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://github.com/SwinTransformer/storage/releases/'
-            'download/v1.0.0/swin_tiny_patch4_window7_224.pth'),
+            type="Pretrained",
+            checkpoint="https://github.com/SwinTransformer/storage/releases/" "download/v1.0.0/swin_tiny_patch4_window7_224.pth",
+        ),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=768,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=768, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -104,36 +91,35 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vgg16-bn_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vgg16-bn_8xb64-210e_coco-256x192.py
index b85adb998bb5f2660ef00d1d395a6ca8bb4763c0..0c0945b86d5e0d2ee0564bde4eb2c4b352eae9e8 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vgg16-bn_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vgg16-bn_8xb64-210e_coco-256x192.py
@@ -1,85 +1,72 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='VGG',
+        type="VGG",
         depth=16,
-        norm_cfg=dict(type='BN'),
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://vgg16_bn'),
+        norm_cfg=dict(type="BN"),
+        init_cfg=dict(type="Pretrained", checkpoint="mmcls://vgg16_bn"),
     ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=512,
-        out_channels=17,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=512, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -87,36 +74,35 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file='data/coco/person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/person_keypoints_val2017.json')
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vipnas-mbv3_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vipnas-mbv3_8xb64-210e_coco-256x192.py
index 04fcc1ad2ef3152e217fa20bc0a325d44b1e6f0d..ef13ae2ad939f3936f3f3a669fbcc5f49570aa76 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vipnas-mbv3_8xb64-210e_coco-256x192.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vipnas-mbv3_8xb64-210e_coco-256x192.py
@@ -1,85 +1,73 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+
lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ViPNAS_MobileNetV3'), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ViPNAS_MobileNetV3"), head=dict( - type='ViPNASHead', + type="ViPNASHead", in_channels=160, out_channels=17, deconv_out_channels=(160, 160, 160), deconv_num_groups=(160, 160, 160), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +75,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + 
data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vipnas-res50_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vipnas-res50_8xb64-210e_coco-256x192.py index 8190d7ffd2ca650f939935487551f0a62a8bf078..8dd128b33e2972d61c2bb4e4394c76e9cdb8f931 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vipnas-res50_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_vipnas-res50_8xb64-210e_coco-256x192.py @@ -1,83 +1,67 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ViPNAS_ResNet', depth=50), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ViPNAS_ResNet", depth=50), head=dict( - type='ViPNASHead', - in_channels=608, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="ViPNASHead", in_channels=608, out_channels=17, 
loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,36 +69,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/cspnext-m_udp_8xb64-210e_crowpose-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/cspnext-m_udp_8xb64-210e_crowpose-256x192.py index b083719303620be25ca2f2aa587ae85f15d6c613..a802e7f23494ebb863df3c3d985ae8d360c10720 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/cspnext-m_udp_8xb64-210e_crowpose-256x192.py +++ 
b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/cspnext-m_udp_8xb64-210e_crowpose-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,79 +10,70 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=768, - out_channels=14, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=768, out_channels=14, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CrowdPoseDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CrowdPoseDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -92,68 +83,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - 
type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -161,56 +133,49 @@ train_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, 
num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_test.json', - bbox_file='data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json', - data_prefix=dict(img='pose/CrowdPose/images/'), + ann_file="crowdpose/annotations/mmpose_crowdpose_test.json", + bbox_file="data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json", + data_prefix=dict(img="pose/CrowdPose/images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='crowdpose/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="crowdpose/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'crowdpose/annotations/mmpose_crowdpose_test.json', + type="CocoMetric", + ann_file=data_root + "crowdpose/annotations/mmpose_crowdpose_test.json", use_area=False, - iou_type='keypoints_crowd', - prefix='crowdpose') + iou_type="keypoints_crowd", + prefix="crowdpose", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_hrnet-w32_8xb64-210e_crowdpose-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_hrnet-w32_8xb64-210e_crowdpose-256x192.py index 3117314a43ab214da46a83c5621f1860bcb3f57f..efc1b76675f0fd200008a0382586b11cc872c0b0 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_hrnet-w32_8xb64-210e_crowdpose-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_hrnet-w32_8xb64-210e_crowdpose-256x192.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='crowdpose/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="crowdpose/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", 
input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=14, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CrowdPoseDataset' -data_mode = 'topdown' -data_root = 'data/crowdpose/' +dataset_type = "CrowdPoseDataset" +data_mode = "topdown" +data_root = "data/crowdpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,38 +84,41 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, 
data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_test.json', - bbox_file='data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_test.json", + bbox_file="data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/mmpose_crowdpose_test.json', + type="CocoMetric", + ann_file=data_root + "annotations/mmpose_crowdpose_test.json", use_area=False, - iou_type='keypoints_crowd', - prefix='crowdpose') + iou_type="keypoints_crowd", + prefix="crowdpose", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res101_8xb64-210e_crowdpose-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res101_8xb64-210e_crowdpose-256x192.py index 79cae1d130a3713944069e37a3258811b068e655..e74a01f04494c16b64b213c629f62928dd5caa24 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res101_8xb64-210e_crowdpose-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res101_8xb64-210e_crowdpose-256x192.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='crowdpose/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="crowdpose/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', 
checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=14, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=14, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CrowdPoseDataset' -data_mode = 'topdown' -data_root = 'data/crowdpose/' +dataset_type = "CrowdPoseDataset" +data_mode = "topdown" +data_root = "data/crowdpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,38 +73,41 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_test.json', - bbox_file='data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_test.json", + bbox_file="data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/mmpose_crowdpose_test.json', + type="CocoMetric", + ann_file=data_root + "annotations/mmpose_crowdpose_test.json", use_area=False, - iou_type='keypoints_crowd', - prefix='crowdpose') + iou_type="keypoints_crowd", + prefix="crowdpose", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res101_8xb64-210e_crowdpose-320x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res101_8xb64-210e_crowdpose-320x256.py index 
eac5caf859095d3867fdd45fde58774b8c5ce54e..41a91fcf8004db4b9cc48f0611569e108b956b6b 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res101_8xb64-210e_crowdpose-320x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res101_8xb64-210e_crowdpose-320x256.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='crowdpose/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="crowdpose/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 320), heatmap_size=(64, 80), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 320), heatmap_size=(64, 80), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=14, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=14, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CrowdPoseDataset' -data_mode = 'topdown' -data_root = 'data/crowdpose/' +dataset_type = "CrowdPoseDataset" +data_mode = "topdown" +data_root = "data/crowdpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') 
+ dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,38 +73,41 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_test.json', - bbox_file='data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_test.json", + bbox_file="data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/mmpose_crowdpose_test.json', + type="CocoMetric", + ann_file=data_root + "annotations/mmpose_crowdpose_test.json", use_area=False, - iou_type='keypoints_crowd', - prefix='crowdpose') + iou_type="keypoints_crowd", + prefix="crowdpose", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res152_8xb64-210e_crowdpose-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res152_8xb64-210e_crowdpose-256x192.py index 5b99439535a54b4bc69ca5ee270aa5a0d7fa26bf..fd0e4a75bc31d74c53c18534f1f7e9a76dfca31f 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res152_8xb64-210e_crowdpose-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res152_8xb64-210e_crowdpose-256x192.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='crowdpose/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="crowdpose/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - 
type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=14, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=14, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CrowdPoseDataset' -data_mode = 'topdown' -data_root = 'data/crowdpose/' +dataset_type = "CrowdPoseDataset" +data_mode = "topdown" +data_root = "data/crowdpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,38 +73,41 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_test.json', - bbox_file='data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_test.json", + bbox_file="data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/mmpose_crowdpose_test.json', + type="CocoMetric", + ann_file=data_root + 
"annotations/mmpose_crowdpose_test.json", use_area=False, - iou_type='keypoints_crowd', - prefix='crowdpose') + iou_type="keypoints_crowd", + prefix="crowdpose", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res50_8xb64-210e_crowdpose-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res50_8xb64-210e_crowdpose-256x192.py index d669b2e2670657a25def5234037e371bede0882d..cb2761360223b1412b9d328558f5df9085f10bfc 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res50_8xb64-210e_crowdpose-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/crowdpose/td-hm_res50_8xb64-210e_crowdpose-256x192.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='crowdpose/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="crowdpose/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=14, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=14, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CrowdPoseDataset' -data_mode = 'topdown' -data_root = 'data/crowdpose/' +dataset_type = "CrowdPoseDataset" +data_mode = "topdown" +data_root = "data/crowdpose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + 
dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,38 +73,41 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mmpose_crowdpose_test.json', - bbox_file='data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mmpose_crowdpose_test.json", + bbox_file="data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/mmpose_crowdpose_test.json', + type="CocoMetric", + ann_file=data_root + "annotations/mmpose_crowdpose_test.json", use_area=False, - iou_type='keypoints_crowd', - prefix='crowdpose') + iou_type="keypoints_crowd", + prefix="crowdpose", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/exlpose/td-hm_hrnet-w32_8xb64-210e_exlpose-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/exlpose/td-hm_hrnet-w32_8xb64-210e_exlpose-256x192.py index c1fea18a4a08fee42effafcd6424b4ab4822acca..81d981046b389d733572ca76d005890ed01a2661 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/exlpose/td-hm_hrnet-w32_8xb64-210e_exlpose-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/exlpose/td-hm_hrnet-w32_8xb64-210e_exlpose-256x192.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = 
dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=14, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'ExlposeDataset' -data_mode = 'topdown' -data_root = 'data/ExLPose/' +dataset_type = "ExlposeDataset" +data_mode = "topdown" +data_root = "data/ExLPose/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", 
input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,35 +84,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ExLPose/ExLPose_train_LL.json', - data_prefix=dict(img=''), + ann_file="annotations/ExLPose/ExLPose_train_LL.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/ExLPose/ExLPose_test_LL-A.json', - data_prefix=dict(img=''), + ann_file="annotations/ExLPose/ExLPose_test_LL-A.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/ExLPose/ExLPose_test_LL-A.json', - use_area=False) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/ExLPose/ExLPose_test_LL-A.json", use_area=False) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-base_8xb64-210e_humanart-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-base_8xb64-210e_humanart-256x192.py index 4aa431e044f7a6cbf8fcad8a25298b2e14fedfa2..750bcb55ce4a6e4629412d9b1ba9c35eaff57531 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-base_8xb64-210e_humanart-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-base_8xb64-210e_humanart-256x192.py @@ -1,113 +1,97 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=12, layer_decay_rate=0.75, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_mult=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0,
end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch='base', + type="mmpretrain.VisionTransformer", + arch="base", img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.3, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_base.pth'), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_base.pth" + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=768, out_channels=17, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -data_root = 'data/' -dataset_type = 'HumanArtDataset' -data_mode = 'topdown' +data_root = "data/" +dataset_type = "HumanArtDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +99,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart_coco.json', - data_prefix=dict(img=''), + 
ann_file="HumanArt/annotations/training_humanart_coco.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/validation_humanart.json', - bbox_file=f'{data_root}HumanArt/person_detection_results/' - 'HumanArt_validation_detections_AP_H_56_person.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/validation_humanart.json", + bbox_file=f"{data_root}HumanArt/person_detection_results/" "HumanArt_validation_detections_AP_H_56_person.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'HumanArt/annotations/validation_humanart.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "HumanArt/annotations/validation_humanart.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-huge_8xb64-210e_humanart-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-huge_8xb64-210e_humanart-256x192.py index 925f68e3d18903e511a5c89426e5bd595aa4d1b6..2b7275af73a0de05d348711aa5335abab4268d10 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-huge_8xb64-210e_humanart-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-huge_8xb64-210e_humanart-256x192.py @@ -1,66 +1,49 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=32, layer_decay_rate=0.85, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_multi=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - 
checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmcls.VisionTransformer', - arch='huge', + type="mmcls.VisionTransformer", + arch="huge", img_size=(256, 192), patch_size=16, qkv_bias=True, @@ -69,45 +52,46 @@ model = dict( output_cls_token=False, patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_huge.pth'), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_huge.pth" + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=1280, out_channels=17, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -data_root = 'data/' -dataset_type = 'HumanArtDataset' -data_mode = 'topdown' +data_root = "data/" +dataset_type = "HumanArtDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +99,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart_coco.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/training_humanart_coco.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', 
shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/validation_humanart.json', - bbox_file=f'{data_root}HumanArt/person_detection_results/' - 'HumanArt_validation_detections_AP_H_56_person.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/validation_humanart.json", + bbox_file=f"{data_root}HumanArt/person_detection_results/" "HumanArt_validation_detections_AP_H_56_person.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'HumanArt/annotations/validation_humanart.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "HumanArt/annotations/validation_humanart.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-large_8xb64-210e_humanart-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-large_8xb64-210e_humanart-256x192.py index 7ea9dbf3952876d9c70a06b982bf29eb461cfa8e..021b4f5c26994d54e0dc00878e5ef280ecb932f1 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-large_8xb64-210e_humanart-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-large_8xb64-210e_humanart-256x192.py @@ -1,66 +1,49 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=24, layer_decay_rate=0.8, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_mult=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = 
dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmcls.VisionTransformer', - arch='large', + type="mmcls.VisionTransformer", + arch="large", img_size=(256, 192), patch_size=16, qkv_bias=True, @@ -69,45 +52,46 @@ model = dict( output_cls_token=False, patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_large.pth'), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_large.pth" + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=1024, out_channels=17, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -data_root = 'data/' -dataset_type = 'HumanArtDataset' -data_mode = 'topdown' +data_root = "data/" +dataset_type = "HumanArtDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +99,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart_coco.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/training_humanart_coco.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/validation_humanart.json', - 
bbox_file=f'{data_root}HumanArt/person_detection_results/' - 'HumanArt_validation_detections_AP_H_56_person.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/validation_humanart.json", + bbox_file=f"{data_root}HumanArt/person_detection_results/" "HumanArt_validation_detections_AP_H_56_person.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'HumanArt/annotations/validation_humanart.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "HumanArt/annotations/validation_humanart.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-small_8xb64-210e_humanart-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-small_8xb64-210e_humanart-256x192.py index ed7817d2fe2f2d43c917f04c66bacf6b79f0a1f9..af7d414547284cdcae333e2a1c523a0a68b76dea 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-small_8xb64-210e_humanart-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_ViTPose-small_8xb64-210e_humanart-256x192.py @@ -1,118 +1,97 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -custom_imports = dict( - imports=['mmpose.engine.optim_wrappers.layer_decay_optim_wrapper'], - allow_failed_imports=False) +custom_imports = dict(imports=["mmpose.engine.optim_wrappers.layer_decay_optim_wrapper"], allow_failed_imports=False) optim_wrapper = dict( - optimizer=dict( - type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), + optimizer=dict(type="AdamW", lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1), paramwise_cfg=dict( num_layers=12, layer_decay_rate=0.8, custom_keys={ - 'bias': dict(decay_multi=0.0), - 'pos_embed': dict(decay_mult=0.0), - 'relative_position_bias_table': dict(decay_mult=0.0), - 'norm': dict(decay_mult=0.0), + "bias": dict(decay_mult=0.0), + "pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), }, ), - constructor='LayerDecayOptimWrapperConstructor', - clip_grad=dict(max_norm=1., norm_type=2), + constructor="LayerDecayOptimWrapperConstructor", + clip_grad=dict(max_norm=1.0, norm_type=2), ) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater", max_keep_ckpts=1)) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 
57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='mmpretrain.VisionTransformer', - arch={ - 'embed_dims': 384, - 'num_layers': 12, - 'num_heads': 12, - 'feedforward_channels': 384 * 4 - }, + type="mmpretrain.VisionTransformer", + arch={"embed_dims": 384, "num_layers": 12, "num_heads": 12, "feedforward_channels": 384 * 4}, img_size=(256, 192), patch_size=16, qkv_bias=True, drop_path_rate=0.1, with_cls_token=False, - out_type='featmap', + out_type="featmap", patch_cfg=dict(padding=2), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'v1/pretrained_models/mae_pretrain_vit_small.pth'), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "v1/pretrained_models/mae_pretrain_vit_small.pth" + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=384, out_channels=17, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -data_root = 'data/' -dataset_type = 'HumanArtDataset' -data_mode = 'topdown' +data_root = "data/" +dataset_type = "HumanArtDataset" +data_mode = "topdown" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] # data loaders @@ -120,36 +99,35 @@ train_dataloader = dict( batch_size=64, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart_coco.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/training_humanart_coco.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/validation_humanart.json', - bbox_file=f'{data_root}HumanArt/person_detection_results/' - 
'HumanArt_validation_detections_AP_H_56_person.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/validation_humanart.json", + bbox_file=f"{data_root}HumanArt/person_detection_results/" "HumanArt_validation_detections_AP_H_56_person.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'HumanArt/annotations/validation_humanart.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "HumanArt/annotations/validation_humanart.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_hrnet-w32_8xb64-210e_humanart-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_hrnet-w32_8xb64-210e_humanart-256x192.py index bf9fa25beb8ed2e5bc4dd565ef35d56e031fb779..3fb0b75fb299d4f36756f6f23702ed5c793be2f9 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_hrnet-w32_8xb64-210e_humanart-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_hrnet-w32_8xb64-210e_humanart-256x192.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), 
num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'HumanArtDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "HumanArtDataset" +data_mode = "topdown" +data_root = "data/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +84,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart_coco.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/training_humanart_coco.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/validation_humanart.json', - bbox_file=f'{data_root}HumanArt/person_detection_results/' - 'HumanArt_validation_detections_AP_H_56_person.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/validation_humanart.json", + bbox_file=f"{data_root}HumanArt/person_detection_results/" "HumanArt_validation_detections_AP_H_56_person.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'HumanArt/annotations/validation_humanart.json') +val_evaluator = dict(type="CocoMetric", 
ann_file=data_root + "HumanArt/annotations/validation_humanart.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_hrnet-w48_8xb32-210e_humanart-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_hrnet-w48_8xb32-210e_humanart-256x192.py index 6a5ae0707c2ac1973a293d41a51ac8bb471ae9fe..4c6f963618986ad1e162031d4f2bdc67cfa897e2 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_hrnet-w48_8xb32-210e_humanart-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/humanart/td-hm_hrnet-w48_8xb32-210e_humanart-256x192.py @@ -1,113 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", 
in_channels=48, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'HumanArtDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "HumanArtDataset" +data_mode = "topdown" +data_root = "data/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -115,36 +84,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart_coco.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/training_humanart_coco.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/validation_humanart.json', - bbox_file=f'{data_root}HumanArt/person_detection_results/' - 'HumanArt_validation_detections_AP_H_56_person.json', - data_prefix=dict(img=''), + ann_file="HumanArt/annotations/validation_humanart.json", + bbox_file=f"{data_root}HumanArt/person_detection_results/" "HumanArt_validation_detections_AP_H_56_person.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'HumanArt/annotations/validation_humanart.json') +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "HumanArt/annotations/validation_humanart.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub1-368x368.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub1-368x368.py index 479039f5428f7f5e736beb4cfe9c7b88c986e4ed..31923ae505b89d256e59d7a1c1d8db37869451f6 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub1-368x368.py +++ 
b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub1-368x368.py @@ -1,92 +1,74 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=40, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=40, - milestones=[20, 30], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=40, milestones=[20, 30], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(368, 368), heatmap_size=(46, 46), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(368, 368), heatmap_size=(46, 46), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict( - type='CPM', - in_channels=3, - out_channels=15, - feat_channels=128, - num_stages=6), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="CPM", in_channels=3, out_channels=15, feat_channels=128, num_stages=6), head=dict( - type='CPMHead', + type="CPMHead", in_channels=15, out_channels=15, num_stages=6, deconv_out_channels=None, final_layer=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'JhmdbDataset' -data_mode = 'topdown' -data_root = 'data/jhmdb/' +dataset_type = "JhmdbDataset" +data_mode = "topdown" +data_root = "data/jhmdb/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -94,34 +76,36 
@@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub1_train.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub1_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub1_test.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub1_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='JhmdbPCKAccuracy', thr=0.2, norm_item=['bbox', 'torso']), + dict(type="JhmdbPCKAccuracy", thr=0.2, norm_item=["bbox", "torso"]), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub2-368x368.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub2-368x368.py index 88b60e9f87dfc783610aa8222a4256d9625efc60..c53913c5a6c45662b5301676f5fc3c16442ffd66 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub2-368x368.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub2-368x368.py @@ -1,92 +1,74 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=40, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=40, - milestones=[20, 30], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=40, milestones=[20, 30], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(368, 368), heatmap_size=(46, 46), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(368, 368), heatmap_size=(46, 46), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict( - type='CPM', - in_channels=3, - out_channels=15, - feat_channels=128, - num_stages=6), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="CPM", in_channels=3, out_channels=15, feat_channels=128, num_stages=6), head=dict( - type='CPMHead', + type="CPMHead", in_channels=15, 
out_channels=15, num_stages=6, deconv_out_channels=None, final_layer=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'JhmdbDataset' -data_mode = 'topdown' -data_root = 'data/jhmdb/' +dataset_type = "JhmdbDataset" +data_mode = "topdown" +data_root = "data/jhmdb/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -94,34 +76,36 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub2_train.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub2_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub2_test.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub2_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='JhmdbPCKAccuracy', thr=0.2, norm_item=['bbox', 'torso']), + dict(type="JhmdbPCKAccuracy", thr=0.2, norm_item=["bbox", "torso"]), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub3-368x368.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub3-368x368.py index 602b2bcfd6aac7df667da5d71ea9d8ea233778ad..93c74bc6e6e0eedbdc6ae89c13fb545c50812547 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub3-368x368.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_cpm_8xb32-40e_jhmdb-sub3-368x368.py @@ -1,92 +1,74 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=40, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = 
dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=40, - milestones=[20, 30], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=40, milestones=[20, 30], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(368, 368), heatmap_size=(46, 46), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(368, 368), heatmap_size=(46, 46), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict( - type='CPM', - in_channels=3, - out_channels=15, - feat_channels=128, - num_stages=6), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="CPM", in_channels=3, out_channels=15, feat_channels=128, num_stages=6), head=dict( - type='CPMHead', + type="CPMHead", in_channels=15, out_channels=15, num_stages=6, deconv_out_channels=None, final_layer=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'JhmdbDataset' -data_mode = 'topdown' -data_root = 'data/jhmdb/' +dataset_type = "JhmdbDataset" +data_mode = "topdown" +data_root = "data/jhmdb/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -94,34 +76,36 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub3_train.json', - data_prefix=dict(img=''), + 
ann_file="annotations/Sub3_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub3_test.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub3_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='JhmdbPCKAccuracy', thr=0.2, norm_item=['bbox', 'torso']), + dict(type="JhmdbPCKAccuracy", thr=0.2, norm_item=["bbox", "torso"]), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub1-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub1-256x256.py index 8d104e1e86e0818947f86612dbbe8b4c9b30e31f..e70c3aa703fdad9448ce2bd6bd302ea753394b63 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub1-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub1-256x256.py @@ -1,87 +1,74 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=40, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=40, - milestones=[20, 30], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=40, milestones=[20, 30], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(32, 32), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(32, 32), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ResNet', depth=50), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ResNet", depth=50), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=2048, out_channels=15, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) -load_from = 
'https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth' # noqa: E501 + ), +) +load_from = "https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth" # noqa: E501 # base dataset settings -dataset_type = 'JhmdbDataset' -data_mode = 'topdown' -data_root = 'data/jhmdb/' +dataset_type = "JhmdbDataset" +data_mode = "topdown" +data_root = "data/jhmdb/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,34 +76,36 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub1_train.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub1_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub1_test.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub1_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='JhmdbPCKAccuracy', thr=0.2, norm_item=['bbox', 'torso']), + dict(type="JhmdbPCKAccuracy", thr=0.2, norm_item=["bbox", "torso"]), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub2-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub2-256x256.py index 6135ce29ab3b070586d0324f95c37b272002459e..f27658f6fc22b66e26faaeb4d3c51a8a09a98ebb 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub2-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub2-256x256.py @@ -1,87 +1,74 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=40, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning 
policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=40, - milestones=[20, 30], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=40, milestones=[20, 30], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(32, 32), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(32, 32), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ResNet', depth=50), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ResNet", depth=50), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=2048, out_channels=15, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) -load_from = 'https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth' # noqa: E501 + ), +) +load_from = "https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth" # noqa: E501 # base dataset settings -dataset_type = 'JhmdbDataset' -data_mode = 'topdown' -data_root = 'data/jhmdb/' +dataset_type = "JhmdbDataset" +data_mode = "topdown" +data_root = "data/jhmdb/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,34 +76,36 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - 
ann_file='annotations/Sub2_train.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub2_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub2_test.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub2_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='JhmdbPCKAccuracy', thr=0.2, norm_item=['bbox', 'torso']), + dict(type="JhmdbPCKAccuracy", thr=0.2, norm_item=["bbox", "torso"]), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub3-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub3-256x256.py index 44d95b15b2a0e73eb93deefc32e5e3f093212648..42840b0bc2ed69bfc76edf7ba1d9cbbae078e386 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub3-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50-2deconv_8xb64-40e_jhmdb-sub3-256x256.py @@ -1,87 +1,74 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=40, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=40, - milestones=[20, 30], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=40, milestones=[20, 30], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(32, 32), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(32, 32), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ResNet', depth=50), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ResNet", depth=50), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=2048, out_channels=15, deconv_out_channels=(256, 256), deconv_kernel_sizes=(4, 4), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", 
shift_heatmap=True, - )) -load_from = 'https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth' # noqa: E501 + ), +) +load_from = "https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth" # noqa: E501 # base dataset settings -dataset_type = 'JhmdbDataset' -data_mode = 'topdown' -data_root = 'data/jhmdb/' +dataset_type = "JhmdbDataset" +data_mode = "topdown" +data_root = "data/jhmdb/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,34 +76,36 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub3_train.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub3_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub3_test.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub3_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='JhmdbPCKAccuracy', thr=0.2, norm_item=['bbox', 'torso']), + dict(type="JhmdbPCKAccuracy", thr=0.2, norm_item=["bbox", "torso"]), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub1-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub1-256x256.py index 9578a66c18b3b58a9cd85ecb4941913eac6175ea..3d79b5f7457d202cc158f0a7818a8aba620cf0cd 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub1-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub1-256x256.py @@ -1,85 +1,68 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=20, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # 
learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=20, - milestones=[8, 15], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=20, milestones=[8, 15], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ResNet', depth=50), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ResNet", depth=50), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=15, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=15, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) -load_from = 'https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth' # noqa: E501 + ), +) +load_from = "https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth" # noqa: E501 # base dataset settings -dataset_type = 'JhmdbDataset' -data_mode = 'topdown' -data_root = 'data/jhmdb/' +dataset_type = "JhmdbDataset" +data_mode = "topdown" +data_root = "data/jhmdb/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,34 +70,36 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub1_train.json', - 
data_prefix=dict(img=''), + ann_file="annotations/Sub1_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub1_test.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub1_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='JhmdbPCKAccuracy', thr=0.2, norm_item=['bbox', 'torso']), + dict(type="JhmdbPCKAccuracy", thr=0.2, norm_item=["bbox", "torso"]), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub2-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub2-256x256.py index 856c89e660b7e2e866c4bd48eff32bf9faff731d..b92b6a6b4eed82f14d828d7eb1cdf8038ce11759 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub2-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub2-256x256.py @@ -1,85 +1,68 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=20, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=20, - milestones=[8, 15], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=20, milestones=[8, 15], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ResNet', depth=50), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ResNet", depth=50), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=15, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=15, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) -load_from = 
'https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth' # noqa: E501 + ), +) +load_from = "https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth" # noqa: E501 # base dataset settings -dataset_type = 'JhmdbDataset' -data_mode = 'topdown' -data_root = 'data/jhmdb/' +dataset_type = "JhmdbDataset" +data_mode = "topdown" +data_root = "data/jhmdb/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,34 +70,36 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub2_train.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub2_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub2_test.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub2_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='JhmdbPCKAccuracy', thr=0.2, norm_item=['bbox', 'torso']), + dict(type="JhmdbPCKAccuracy", thr=0.2, norm_item=["bbox", "torso"]), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub3-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub3-256x256.py index 73065968848a063b462504c55e4a2ac85ffd49d9..436e3c90eccfe2f9efb75ea621c9a6f0940cfd63 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub3-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/jhmdb/td-hm_res50_8xb64-20e_jhmdb-sub3-256x256.py @@ -1,85 +1,68 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=20, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - 
dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=20, - milestones=[8, 15], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=20, milestones=[8, 15], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ResNet', depth=50), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ResNet", depth=50), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=15, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=15, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) -load_from = 'https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth' # noqa: E501 + ), +) +load_from = "https://download.openmmlab.com/mmpose/top_down/resnet/res50_mpii_256x256-418ffc88_20200812.pth" # noqa: E501 # base dataset settings -dataset_type = 'JhmdbDataset' -data_mode = 'topdown' -data_root = 'data/jhmdb/' +dataset_type = "JhmdbDataset" +data_mode = "topdown" +data_root = "data/jhmdb/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,34 +70,36 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub3_train.json', - data_prefix=dict(img=''), + 
ann_file="annotations/Sub3_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/Sub3_test.json', - data_prefix=dict(img=''), + ann_file="annotations/Sub3_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='JhmdbPCKAccuracy', thr=0.2, norm_item=['bbox', 'torso']), + dict(type="JhmdbPCKAccuracy", thr=0.2, norm_item=["bbox", "torso"]), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/cspnext-m_udp_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/cspnext-m_udp_8xb64-210e_mpii-256x256.py index fc8d6fdcea8d717c9ecbc70fb966364dea14257e..d6f123a61c794a748df018a47fa10f549ebafe0f 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/cspnext-m_udp_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/cspnext-m_udp_8xb64-210e_mpii-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -9,80 +9,71 @@ train_cfg = dict(max_epochs=max_epochs, val_interval=10) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning policy param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 210 to 420 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=1024) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - 
checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=768, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=768, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=False, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -92,68 +83,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + 
dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -161,50 +133,43 @@ train_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file=f'{data_root}/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file=f"{data_root}/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='PCK', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_cpm_8xb64-210e_mpii-368x368.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_cpm_8xb64-210e_mpii-368x368.py index 794c49420ab69ae202685bb70c6d8ec8e1b2a02b..2eaf3176b248b559314c8b4085ad12f4696c2f9b 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_cpm_8xb64-210e_mpii-368x368.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_cpm_8xb64-210e_mpii-368x368.py @@ -1,91 +1,73 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - 
dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(368, 368), heatmap_size=(46, 46), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(368, 368), heatmap_size=(46, 46), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict( - type='CPM', - in_channels=3, - out_channels=16, - feat_channels=128, - num_stages=6), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="CPM", in_channels=3, out_channels=16, feat_channels=128, num_stages=6), head=dict( - type='CPMHead', + type="CPMHead", in_channels=16, out_channels=16, num_stages=6, deconv_out_channels=None, final_layer=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_prob=0, - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -93,33 +75,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) 
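# NOTE: the hunks in this file change formatting only (quote style, trailing
# commas, joined continuation lines); no training option is touched. One
# convention worth spelling out here: the `8xb64` in the file name means
# 8 GPUs x 64 samples per GPU = 512, which is exactly the
# `auto_scale_lr = dict(base_batch_size=512)` declared above. When training
# is launched with MMEngine's LR auto-scaling enabled, the optimizer LR is
# rescaled linearly from that reference batch size to the actual effective
# batch size. A minimal sketch of the rule (illustrative arithmetic only,
# with a hypothetical launch size -- not the MMEngine internals):
#
#     num_gpus, samples_per_gpu = 4, 64           # hypothetical 4-GPU launch
#     effective_bs = num_gpus * samples_per_gpu   # 256
#     scaled_lr = 5e-4 * effective_bs / 512       # -> 2.5e-4
#
# At the reference size (8 x 64 = 512) the configured lr=5e-4 is used as-is.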
val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hourglass52_8xb32-210e_mpii-384x384.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hourglass52_8xb32-210e_mpii-384x384.py index e9546504e0d3ead0b6977c33a4172a2581532a7f..a92c39aec3ca0760c5e0434f32d355c1ec1d3cd8 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hourglass52_8xb32-210e_mpii-384x384.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hourglass52_8xb32-210e_mpii-384x384.py @@ -1,84 +1,75 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(384, 384), heatmap_size=(96, 96), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(384, 384), heatmap_size=(96, 96), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HourglassNet', + type="HourglassNet", num_stacks=1, ), head=dict( - type='CPMHead', + type="CPMHead", in_channels=256, out_channels=16, num_stages=1, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines 
train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,33 +77,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hourglass52_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hourglass52_8xb64-210e_mpii-256x256.py index cd854a40a3f5d6def990488b5967058997d2348f..3e9f132554ee85dca473c421e9fca90083c4c8b1 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hourglass52_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hourglass52_8xb64-210e_mpii-256x256.py @@ -1,84 +1,75 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training 
batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HourglassNet', + type="HourglassNet", num_stacks=1, ), head=dict( - type='CPMHead', + type="CPMHead", in_channels=256, out_channels=16, num_stages=1, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,33 +77,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators 
-val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w32_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w32_8xb64-210e_mpii-256x256.py index 459f24f3bdbbdc4a93e43e16382b023d6ff76e50..88d718a18f0628f2b415239e083d4e0c6e0c2ad0 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w32_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w32_8xb64-210e_mpii-256x256.py @@ -1,112 +1,81 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, 
out_channels=16, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -114,33 +83,35 @@ train_dataloader = dict( batch_size=16, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=16, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w32_dark-8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w32_dark-8xb64-210e_mpii-256x256.py index 5d47ed6fdc161e019c94b1ab64751a096a0d4537..16f156d8442493888c86517746f600adabbe1906 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w32_dark-8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w32_dark-8xb64-210e_mpii-256x256.py @@ -1,116 +1,81 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + 
optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(256, 256), - heatmap_size=(64, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=16, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + 
dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -118,33 +83,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w48_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w48_8xb64-210e_mpii-256x256.py index 4e3fce96000a2ff5e1165a88a322e1cfd1226c0a..57618c52e2f50fbe021d14ea029c920a84230abe 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w48_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w48_8xb64-210e_mpii-256x256.py @@ -1,112 +1,81 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", 
input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=16, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -114,33 +83,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - 
ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w48_dark-8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w48_dark-8xb64-210e_mpii-256x256.py index 18b31539a33d542ae8a0f83b42835a7cf97ec5c2..4ce8f0834e979b1e36a9de1303cc4a58006b4ed4 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w48_dark-8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_hrnet-w48_dark-8xb64-210e_mpii-256x256.py @@ -1,116 +1,81 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(256, 256), - heatmap_size=(64, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 
192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=16, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -118,33 +83,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = 
dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_litehrnet-18_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_litehrnet-18_8xb64-210e_mpii-256x256.py index bdab446f5038c6d86231d27284aee3b3723bea14..b3d570dbad464569d77e996a7cfbf50ad7b10aae 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_litehrnet-18_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_litehrnet-18_8xb64-210e_mpii-256x256.py @@ -1,48 +1,37 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='LiteHRNet', + type="LiteHRNet", in_channels=3, extra=dict( stem=dict(stem_channels=32, out_channels=32, expand_ratio=1), @@ -51,53 +40,53 @@ model = dict( num_modules=(2, 4, 2), num_branches=(2, 3, 4), num_blocks=(2, 2, 2), - module_type=('LITE', 'LITE', 'LITE'), + module_type=("LITE", "LITE", "LITE"), with_fuse=(True, True, True), reduce_ratios=(8, 8, 8), num_channels=( (40, 80), (40, 80, 160), (40, 80, 160, 320), - )), + ), + ), with_head=True, - )), + ), + ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=40, out_channels=16, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_prob=0, - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', 
input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -105,33 +94,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_litehrnet-30_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_litehrnet-30_8xb64-210e_mpii-256x256.py index 84089add2a0d2c6c7d7d8b75275cf097a9b68e7f..0b827473f4e8b7625df3251f982bd38043f42921 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_litehrnet-30_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_litehrnet-30_8xb64-210e_mpii-256x256.py @@ -1,48 +1,37 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = 
dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='LiteHRNet', + type="LiteHRNet", in_channels=3, extra=dict( stem=dict(stem_channels=32, out_channels=32, expand_ratio=1), @@ -51,53 +40,53 @@ model = dict( num_modules=(3, 8, 3), num_branches=(2, 3, 4), num_blocks=(2, 2, 2), - module_type=('LITE', 'LITE', 'LITE'), + module_type=("LITE", "LITE", "LITE"), with_fuse=(True, True, True), reduce_ratios=(8, 8, 8), num_channels=( (40, 80), (40, 80, 160), (40, 80, 160, 320), - )), + ), + ), with_head=True, - )), + ), + ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=40, out_channels=16, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_prob=0, - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -105,33 +94,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - 
ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_mobilenetv2_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_mobilenetv2_8xb64-210e_mpii-256x256.py index 41b9d3ba9ba964f34f1204d185e36dcbcb3821e0..59a160da28dbbe65f953b4632c535e87b8bc324c 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_mobilenetv2_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_mobilenetv2_8xb64-210e_mpii-256x256.py @@ -1,84 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MobileNetV2', - widen_factor=1., - out_indices=(7, ), - init_cfg=dict(type='Pretrained', checkpoint='mmcls://mobilenet_v2'), + type="MobileNetV2", + widen_factor=1.0, + out_indices=(7,), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://mobilenet_v2"), ), head=dict( - type='HeatmapHead', - in_channels=1280, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=1280, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - 
dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,33 +73,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res101_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res101_8xb64-210e_mpii-256x256.py index def5d2fd1681262689afd40b20a0299e64118136..4bb7b6f9e4fdabffec8fa407aff6a21fd86af42c 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res101_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res101_8xb64-210e_mpii-256x256.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = 
dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) 
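# A minimal usage sketch (an editor-added illustration, not part of the upstream
# configs): these MPII top-down files are plain MMEngine dict-configs, so they are
# consumed by parsing the file and building a Runner from it. The config path and
# work_dir below are illustrative assumptions; any of the td-hm_*_mpii-256x256.py
# files in this directory works the same way, provided mmpose and mmengine are
# installed and data/mpii/ is laid out as the dataset settings above expect.
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile(
    "mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/"
    "td-hm_res101_8xb64-210e_mpii-256x256.py"
)
cfg.work_dir = "work_dirs/td-hm_res101_mpii"  # hypothetical output directory
runner = Runner.from_cfg(cfg)  # builds model, dataloaders, and hooks from the dicts above
runner.train()                 # runs the 210-epoch schedule with validation every 10 epochs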
test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res152_8xb32-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res152_8xb32-210e_mpii-256x256.py index bf515d0d21e6796af7fc79fb39ec27cd0fb0c7b0..85b9690508868ac2e28a0e5d9f3e73e10f7df770 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res152_8xb32-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res152_8xb32-210e_mpii-256x256.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + 
dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,33 +72,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res50_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res50_8xb64-210e_mpii-256x256.py index dee56ae77b0c7b7fa40690e712e7c7ad4648f279..99c5ebf97464a6310db3475dd7f482cfd3b3e896 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res50_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_res50_8xb64-210e_mpii-256x256.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - 
type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d101_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d101_8xb64-210e_mpii-256x256.py index 
0cbf684e38c1358cd939621294765249e1e5d68e..015ca00125b8a4fdfa4783ff47826d78d08252a3 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d101_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d101_8xb64-210e_mpii-256x256.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNetV1d', + type="ResNetV1d", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnet101_v1d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnet101_v1d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + 
dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d152_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d152_8xb64-210e_mpii-256x256.py index 24653a9e56b982b150ced4157c486428a34f9d04..900353ee59874b2bbb75041725bfe8210e61c58f 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d152_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d152_8xb64-210e_mpii-256x256.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNetV1d', + type="ResNetV1d", depth=152, - 
init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnet152_v1d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnet152_v1d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d50_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d50_8xb64-210e_mpii-256x256.py index 48bcfec5eb5017036168fae73396d809fcb3f567..aaffcbf92de72c6f4a2a1b54d34811f272767833 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d50_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnetv1d50_8xb64-210e_mpii-256x256.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ 
= ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNetV1d', + type="ResNetV1d", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://resnet50_v1d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnet50_v1d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - 
ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnext152_8xb32-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnext152_8xb32-210e_mpii-256x256.py index 30afb101037cc31d9dd51ac02487e5ef749921c7..3d48c62c359f9f69dd6c3aa79ba5dbe4725821e4 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnext152_8xb32-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_resnext152_8xb32-210e_mpii-256x256.py @@ -1,84 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=152, - init_cfg=dict( - type='Pretrained', checkpoint='mmcls://resnext152_32x4d'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://resnext152_32x4d"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", 
use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,33 +72,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_scnet101_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_scnet101_8xb64-210e_mpii-256x256.py index fb5c6b702c28300525db4137973889967af9d09c..ddeeb59925679833f985aaf9db80dec3748ae4b7 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_scnet101_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_scnet101_8xb64-210e_mpii-256x256.py @@ -1,86 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - 
type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SCNet', + type="SCNet", depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/scnet101-94250a77.pth'), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/scnet101-94250a77.pth"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -88,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, 
drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_scnet50_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_scnet50_8xb64-210e_mpii-256x256.py index c2f7723724b80d730f70d00f7649adb5935a10fc..2e5dbd8484bb71b1065dde6316e34319fd6d9986 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_scnet50_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_scnet50_8xb64-210e_mpii-256x256.py @@ -1,86 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SCNet', + type="SCNet", depth=50, - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/scnet50-7ef0a199.pth'), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/scnet50-7ef0a199.pth"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 
'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -88,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet101_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet101_8xb64-210e_mpii-256x256.py index 56b7fccb2e121fdd9734f9a43963f7fe1cc7511c..27380ba746d4d134de26fa4b33288e2f39a666c8 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet101_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet101_8xb64-210e_mpii-256x256.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # 
warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SEResNet', + type="SEResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://se-resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://se-resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - 
headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet152_8xb32-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet152_8xb32-210e_mpii-256x256.py index 79bb29e4b34fba243bca0635df2d8548e19ed76b..18525a690d5c786d048af3568733c3edf196f4fb 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet152_8xb32-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet152_8xb32-210e_mpii-256x256.py @@ -1,82 +1,69 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SEResNet', + type="SEResNet", depth=152, ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + 
dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -84,33 +71,35 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet50_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet50_8xb64-210e_mpii-256x256.py index 257dc360ad1ea41cec56d57bd4de19a59146a7a5..cc20f5162e1daea23dbb66757bbf50b0eb171201 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet50_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_seresnet50_8xb64-210e_mpii-256x256.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", 
input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='SEResNet', + type="SEResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://se-resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://se-resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git 
a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_shufflenetv1_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_shufflenetv1_8xb64-210e_mpii-256x256.py index 83eaca208f237d6eff8b7930e36bc91213af4fdf..4d82d08469b61116cc6f5a0096733ac4ba9ab40f 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_shufflenetv1_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_shufflenetv1_8xb64-210e_mpii-256x256.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ShuffleNetV1', + type="ShuffleNetV1", groups=3, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://shufflenet_v1'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://shufflenet_v1"), ), head=dict( - type='HeatmapHead', - in_channels=960, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=960, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - 
dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_shufflenetv2_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_shufflenetv2_8xb64-210e_mpii-256x256.py index cd05c23596c21c7aa2f491c7e95399f2ec1126c7..1be4b5d62d256c7cf196aafe6d0580ac3450f00c 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_shufflenetv2_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/mpii/td-hm_shufflenetv2_8xb64-210e_mpii-256x256.py @@ -1,83 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + 
type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ShuffleNetV2', + type="ShuffleNetV2", widen_factor=1.0, - init_cfg=dict(type='Pretrained', checkpoint='mmcls://shufflenet_v2'), + init_cfg=dict(type="Pretrained", checkpoint="mmcls://shufflenet_v2"), ), head=dict( - type='HeatmapHead', - in_channels=1024, - out_channels=16, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=1024, out_channels=16, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -85,33 +72,35 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file='data/mpii/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file="data/mpii/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w32_8xb64-20e_posetrack18-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w32_8xb64-20e_posetrack18-256x192.py index fe8e385f1daac0ac4df7a805203a88e87f487730..3b125ec7215a2a6dbfe2d5697a778b0a632ed8f1 100644 --- 
a/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w32_8xb64-20e_posetrack18-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w32_8xb64-20e_posetrack18-256x192.py @@ -1,116 +1,86 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=20, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=20, - milestones=[10, 15], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=20, milestones=[10, 15], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict( - save_best='posetrack18/Total AP', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="posetrack18/Total AP", rule="greater", interval=1)) # load from the pretrained model -load_from = 'https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-256x192-81c58e40_20220909.pth' # noqa: E501 +load_from = "https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-256x192-81c58e40_20220909.pth" # noqa: E501 # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings -norm_cfg = dict(type='SyncBN', requires_grad=True) +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + 
loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'PoseTrack18Dataset' -data_mode = 'topdown' -data_root = 'data/posetrack18/' +dataset_type = "PoseTrack18Dataset" +data_mode = "topdown" +data_root = "data/posetrack18/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -118,38 +88,39 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/posetrack18_train.json', - data_prefix=dict(img=''), + ann_file="annotations/posetrack18_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/posetrack18_val.json', + ann_file="annotations/posetrack18_val.json", # comment `bbox_file` and '`filter_cfg` if use gt bbox for evaluation - bbox_file='data/posetrack18/annotations/' - 'posetrack18_val_human_detections.json', + bbox_file="data/posetrack18/annotations/" "posetrack18_val_human_detections.json", filter_cfg=dict(bbox_score_thr=0.4), - data_prefix=dict(img=''), + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='PoseTrack18Metric', - ann_file=data_root + 'annotations/posetrack18_val.json', + type="PoseTrack18Metric", + ann_file=data_root + "annotations/posetrack18_val.json", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w32_8xb64-20e_posetrack18-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w32_8xb64-20e_posetrack18-384x288.py index 513207441068ff0dcf37a98e995d3be47baf4817..aa619eec45b72de546f8e61bdc7c2208c23ed8a0 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w32_8xb64-20e_posetrack18-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w32_8xb64-20e_posetrack18-384x288.py @@ -1,116 +1,86 @@ -_base_ = 
['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=20, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=20, - milestones=[10, 15], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=20, milestones=[10, 15], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict( - save_best='posetrack18/Total AP', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="posetrack18/Total AP", rule="greater", interval=1)) # load from the pretrained model -load_from = 'https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-384x288-ca5956af_20220909.pth' # noqa: E501 +load_from = "https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-384x288-ca5956af_20220909.pth" # noqa: E501 # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings -norm_cfg = dict(type='SyncBN', requires_grad=True) +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'PoseTrack18Dataset' -data_mode = 
'topdown' -data_root = 'data/posetrack18/' +dataset_type = "PoseTrack18Dataset" +data_mode = "topdown" +data_root = "data/posetrack18/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -118,38 +88,39 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/posetrack18_train.json', - data_prefix=dict(img=''), + ann_file="annotations/posetrack18_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/posetrack18_val.json', + ann_file="annotations/posetrack18_val.json", # comment out `bbox_file` and `filter_cfg` to use gt bboxes for evaluation - bbox_file='data/posetrack18/annotations/' - 'posetrack18_val_human_detections.json', + bbox_file="data/posetrack18/annotations/" "posetrack18_val_human_detections.json", filter_cfg=dict(bbox_score_thr=0.4), - data_prefix=dict(img=''), + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='PoseTrack18Metric', - ann_file=data_root + 'annotations/posetrack18_val.json', + type="PoseTrack18Metric", + ann_file=data_root + "annotations/posetrack18_val.json", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w48_8xb64-20e_posetrack18-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w48_8xb64-20e_posetrack18-256x192.py index cac23f14e47b4ba1f6ed5cb6c43ea6c11c5e89ad..14f5d9c781e4fd5df376ad9f033c5ff62aba219d 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w48_8xb64-20e_posetrack18-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w48_8xb64-20e_posetrack18-256x192.py @@ -1,116 +1,86 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=20, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, +
) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=20, - milestones=[10, 15], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=20, milestones=[10, 15], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict( - save_best='posetrack18/Total AP', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="posetrack18/Total AP", rule="greater", interval=1)) # load from the pretrained model -load_from = 'https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-256x192-0e67c616_20220913.pth' # noqa: E501 +load_from = "https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-256x192-0e67c616_20220913.pth" # noqa: E501 # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings -norm_cfg = dict(type='SyncBN', requires_grad=True) +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'PoseTrack18Dataset' -data_mode = 'topdown' -data_root = 'data/posetrack18/' +dataset_type = "PoseTrack18Dataset" +data_mode = "topdown" +data_root = "data/posetrack18/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - 
dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -118,38 +88,39 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/posetrack18_train.json', - data_prefix=dict(img=''), + ann_file="annotations/posetrack18_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/posetrack18_val.json', + ann_file="annotations/posetrack18_val.json", # comment `bbox_file` and '`filter_cfg` if use gt bbox for evaluation - bbox_file='data/posetrack18/annotations/' - 'posetrack18_val_human_detections.json', + bbox_file="data/posetrack18/annotations/" "posetrack18_val_human_detections.json", filter_cfg=dict(bbox_score_thr=0.4), - data_prefix=dict(img=''), + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='PoseTrack18Metric', - ann_file=data_root + 'annotations/posetrack18_val.json', + type="PoseTrack18Metric", + ann_file=data_root + "annotations/posetrack18_val.json", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w48_8xb64-20e_posetrack18-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w48_8xb64-20e_posetrack18-384x288.py index 7ee99469fed8ae914e7aa91b3a32281f9f18ca1b..140ec0d6731b91eab076d08433efb880912308ee 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w48_8xb64-20e_posetrack18-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_hrnet-w48_8xb64-20e_posetrack18-384x288.py @@ -1,116 +1,86 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=20, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=20, - milestones=[10, 15], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, 
end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=20, milestones=[10, 15], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict( - save_best='posetrack18/Total AP', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="posetrack18/Total AP", rule="greater", interval=1)) # load from the pretrained model -load_from = 'https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-384x288-c161b7de_20220915.pth' # noqa: E501 +load_from = "https://download.openmmlab.com/mmpose/v1/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-384x288-c161b7de_20220915.pth" # noqa: E501 # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings -norm_cfg = dict(type='SyncBN', requires_grad=True) +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=17, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'PoseTrack18Dataset' -data_mode = 'topdown' -data_root = 'data/posetrack18/' +dataset_type = "PoseTrack18Dataset" +data_mode = "topdown" +data_root = "data/posetrack18/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", 
direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -118,38 +88,39 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/posetrack18_train.json', - data_prefix=dict(img=''), + ann_file="annotations/posetrack18_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/posetrack18_val.json', + ann_file="annotations/posetrack18_val.json", # comment `bbox_file` and '`filter_cfg` if use gt bbox for evaluation - bbox_file='data/posetrack18/annotations/' - 'posetrack18_val_human_detections.json', + bbox_file="data/posetrack18/annotations/" "posetrack18_val_human_detections.json", filter_cfg=dict(bbox_score_thr=0.4), - data_prefix=dict(img=''), + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='PoseTrack18Metric', - ann_file=data_root + 'annotations/posetrack18_val.json', + type="PoseTrack18Metric", + ann_file=data_root + "annotations/posetrack18_val.json", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_res50_8xb64-20e_posetrack18-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_res50_8xb64-20e_posetrack18-256x192.py index f8e529d120733235c82e8088cb983127cf35f95d..798ec1ab3c925541ec8e107e3e1183fb029d9e9a 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_res50_8xb64-20e_posetrack18-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_heatmap/posetrack18/td-hm_res50_8xb64-20e_posetrack18-256x192.py @@ -1,91 +1,76 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=20, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=20, - milestones=[10, 15], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=20, milestones=[10, 15], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict( - 
save_best='posetrack18/Total AP', rule='greater', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="posetrack18/Total AP", rule="greater", interval=1)) # load from the pretrained model -load_from = 'https://download.openmmlab.com/mmpose/top_down/resnet/res50_coco_256x192-ec54d7f3_20200709.pth' # noqa: E501 +load_from = "https://download.openmmlab.com/mmpose/top_down/resnet/res50_coco_256x192-ec54d7f3_20200709.pth" # noqa: E501 # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings -norm_cfg = dict(type='SyncBN', requires_grad=True) +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=17, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=17, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'PoseTrack18Dataset' -data_mode = 'topdown' -data_root = 'data/posetrack18/' +dataset_type = "PoseTrack18Dataset" +data_mode = "topdown" +data_root = "data/posetrack18/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -93,34 +78,36 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/posetrack18_train.json', - data_prefix=dict(img=''), + ann_file="annotations/posetrack18_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', 
shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/posetrack18_val.json', - data_prefix=dict(img=''), + ann_file="annotations/posetrack18_val.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='PoseTrack18Metric', - ann_file=data_root + 'annotations/posetrack18_val.json', + type="PoseTrack18Metric", + ann_file=data_root + "annotations/posetrack18_val.json", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_mobilenetv2_rle-pretrained-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_mobilenetv2_rle-pretrained-8xb64-210e_coco-256x192.py index 97f5d926c66be84ef1bc8fb8f1f187730cebd46d..3df1ffac3fd1fbcf082a4647bae20f739bcaeb0c 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_mobilenetv2_rle-pretrained-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_mobilenetv2_rle-pretrained-8xb64-210e_coco-256x192.py @@ -1,58 +1,44 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(192, 256)) +codec = dict(type="RegressionLabel", input_size=(192, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MobileNetV2', - widen_factor=1., - out_indices=(7, ), + type="MobileNetV2", + widen_factor=1.0, + out_indices=(7,), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/top_down/' - 'mobilenetv2/mobilenetv2_coco_256x192-d1e58e7b_20200727.pth')), - neck=dict(type='GlobalAveragePooling'), - head=dict( - type='RLEHead', - in_channels=1280, - num_joints=17, - loss=dict(type='RLELoss', use_target_weight=True), - decoder=codec), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/top_down/" "mobilenetv2/mobilenetv2_coco_256x192-d1e58e7b_20200727.pth", + ), + ), + neck=dict(type="GlobalAveragePooling"), + head=dict(type="RLEHead", in_channels=1280, num_joints=17, loss=dict(type="RLELoss", use_target_weight=True), decoder=codec), test_cfg=dict( flip_test=True, shift_coords=True, @@ 
-60,26 +46,26 @@ model = dict( ) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,40 +73,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=f'{data_root}annotations/person_keypoints_val2017.json', - score_mode='bbox_rle') +val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json", score_mode="bbox_rle") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res101_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res101_8xb64-210e_coco-256x192.py index 94f35d0fc36c749638ff397f5af5eb50a006894f..08aee513fd5601b34861c60411261da42e61939a 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res101_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res101_8xb64-210e_coco-256x192.py @@ -1,80 +1,68 @@ -_base_ = 
['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(192, 256)) +codec = dict(type="RegressionLabel", input_size=(192, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), - neck=dict(type='GlobalAveragePooling'), + neck=dict(type="GlobalAveragePooling"), head=dict( - type='RegressionHead', - in_channels=2048, - num_joints=17, - loss=dict(type='SmoothL1Loss', use_target_weight=True), - decoder=codec), + type="RegressionHead", in_channels=2048, num_joints=17, loss=dict(type="SmoothL1Loss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -82,39 +70,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - 
data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=f'{data_root}annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res101_rle-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res101_rle-8xb64-210e_coco-256x192.py index 21b4a3cdcbab80fa080ca90581a6ab3ee44fdbe4..384d334a0e2ce6f66b6e02b0b1fb143e14ddba6d 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res101_rle-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res101_rle-8xb64-210e_coco-256x192.py @@ -1,80 +1,66 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(192, 256)) +codec = dict(type="RegressionLabel", input_size=(192, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), - 
neck=dict(type='GlobalAveragePooling'), - head=dict( - type='RLEHead', - in_channels=2048, - num_joints=17, - loss=dict(type='RLELoss', use_target_weight=True), - decoder=codec), + neck=dict(type="GlobalAveragePooling"), + head=dict(type="RLEHead", in_channels=2048, num_joints=17, loss=dict(type="RLELoss", use_target_weight=True), decoder=codec), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -82,40 +68,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=f'{data_root}annotations/person_keypoints_val2017.json', - score_mode='bbox_rle') +val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json", score_mode="bbox_rle") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_8xb64-210e_coco-256x192.py 
b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_8xb64-210e_coco-256x192.py index fa56fba4987e9f4c6c4f0e284e5949c0c6f46d6c..f0fa9aca8bd8e939d09868055cfb0b861c58ae5b 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_8xb64-210e_coco-256x192.py @@ -1,80 +1,68 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(192, 256)) +codec = dict(type="RegressionLabel", input_size=(192, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), - neck=dict(type='GlobalAveragePooling'), + neck=dict(type="GlobalAveragePooling"), head=dict( - type='RegressionHead', - in_channels=2048, - num_joints=17, - loss=dict(type='SmoothL1Loss', use_target_weight=True), - decoder=codec), + type="RegressionHead", in_channels=2048, num_joints=17, loss=dict(type="SmoothL1Loss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + 
dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -82,39 +70,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=f'{data_root}annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_rle-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_rle-8xb64-210e_coco-256x192.py index e2a832b652b33aaa629fdb4a07863f223051461f..f9c1606b26e8ce85a575d3f0e04e1379e8d8c5c0 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_rle-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_rle-8xb64-210e_coco-256x192.py @@ -1,80 +1,66 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(192, 256)) +codec = dict(type="RegressionLabel", input_size=(192, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - 
std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), - neck=dict(type='GlobalAveragePooling'), - head=dict( - type='RLEHead', - in_channels=2048, - num_joints=17, - loss=dict(type='RLELoss', use_target_weight=True), - decoder=codec), + neck=dict(type="GlobalAveragePooling"), + head=dict(type="RLEHead", in_channels=2048, num_joints=17, loss=dict(type="RLELoss", use_target_weight=True), decoder=codec), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -82,40 +68,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - 
ann_file=f'{data_root}annotations/person_keypoints_val2017.json', - score_mode='bbox_rle') +val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json", score_mode="bbox_rle") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_rle-8xb64-210e_coco-384x288.py b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_rle-8xb64-210e_coco-384x288.py index 6d319e927eb21ddcb71e40ecae1050c3421871d2..ee46c8049d93ede48f39e987df924c32a260c9a0 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_rle-8xb64-210e_coco-384x288.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res152_rle-8xb64-210e_coco-384x288.py @@ -1,80 +1,66 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(288, 384)) +codec = dict(type="RegressionLabel", input_size=(288, 384)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), - neck=dict(type='GlobalAveragePooling'), - head=dict( - type='RLEHead', - in_channels=2048, - num_joints=17, - loss=dict(type='RLELoss', use_target_weight=True), - decoder=codec), + neck=dict(type="GlobalAveragePooling"), + head=dict(type="RLEHead", in_channels=2048, num_joints=17, loss=dict(type="RLELoss", use_target_weight=True), decoder=codec), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", 
input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -82,40 +68,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=f'{data_root}annotations/person_keypoints_val2017.json', - score_mode='bbox_rle') +val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json", score_mode="bbox_rle") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_8xb64-210e_coco-256x192.py index fa7e487acf470dfbd988979ffb7570f72d409df0..1a33e148e197c7a641c70cb2b9393b4901100f1a 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_8xb64-210e_coco-256x192.py @@ -1,80 +1,68 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on 
the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(192, 256)) +codec = dict(type="RegressionLabel", input_size=(192, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), - neck=dict(type='GlobalAveragePooling'), + neck=dict(type="GlobalAveragePooling"), head=dict( - type='RegressionHead', - in_channels=2048, - num_joints=17, - loss=dict(type='SmoothL1Loss', use_target_weight=True), - decoder=codec), + type="RegressionHead", in_channels=2048, num_joints=17, loss=dict(type="SmoothL1Loss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -82,39 +70,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" 
"COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=f'{data_root}annotations/person_keypoints_val2017.json') +val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_rle-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_rle-8xb64-210e_coco-256x192.py index db530f6ec4f065fa16228ee66fee33db5afddc4f..e1e888bd9cb2832185ad613cdb473ba2f506982a 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_rle-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_rle-8xb64-210e_coco-256x192.py @@ -1,80 +1,66 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=train_cfg['max_epochs'], - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(192, 256)) +codec = dict(type="RegressionLabel", input_size=(192, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), - neck=dict(type='GlobalAveragePooling'), - head=dict( - type='RLEHead', - in_channels=2048, - num_joints=17, - loss=dict(type='RLELoss', use_target_weight=True), - decoder=codec), + neck=dict(type="GlobalAveragePooling"), + head=dict(type="RLEHead", in_channels=2048, num_joints=17, loss=dict(type="RLELoss", use_target_weight=True), decoder=codec), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - 
dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -82,40 +68,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/person_keypoints_train2017.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/person_keypoints_val2017.json', - bbox_file=f'{data_root}person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/person_keypoints_val2017.json", + bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater")) # evaluators -val_evaluator = dict( - type='CocoMetric', - ann_file=f'{data_root}annotations/person_keypoints_val2017.json', - score_mode='bbox_rle') +val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json", score_mode="bbox_rle") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_rle-pretrained-8xb64-210e_coco-256x192.py b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_rle-pretrained-8xb64-210e_coco-256x192.py index 6b74aba7f3c138901d35652c9b7f19bebf23cceb..26236d6e0e60b037cc8cd1dd932211e9ee58c0f5 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_rle-pretrained-8xb64-210e_coco-256x192.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/coco/td-reg_res50_rle-pretrained-8xb64-210e_coco-256x192.py @@ -1,84 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=1e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=1e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, 
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False), # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=train_cfg['max_epochs'],
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up
+    dict(type="MultiStepLR", begin=0, end=train_cfg["max_epochs"], milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 # codec settings
-codec = dict(type='RegressionLabel', input_size=(192, 256))
+codec = dict(type="RegressionLabel", input_size=(192, 256))
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=50,
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/'
-            'pretrain_models/td-hm_res50_8xb64-210e_coco-256x192.pth'),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/td-hm_res50_8xb64-210e_coco-256x192.pth",
+        ),
     ),
-    neck=dict(type='GlobalAveragePooling'),
-    head=dict(
-        type='RLEHead',
-        in_channels=2048,
-        num_joints=17,
-        loss=dict(type='RLELoss', use_target_weight=True),
-        decoder=codec),
+    neck=dict(type="GlobalAveragePooling"),
+    head=dict(type="RLEHead", in_channels=2048, num_joints=17, loss=dict(type="RLELoss", use_target_weight=True), decoder=codec),
     test_cfg=dict(
         flip_test=True,
         shift_coords=True,
-    ))
+    ),
+)
 # base dataset settings
-dataset_type = 'CocoDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomHalfBody'),
-    dict(type='RandomBBoxTransform'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomHalfBody"),
+    dict(type="RandomBBoxTransform"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 test_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 # data loaders
@@ -86,40 +72,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_train2017.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/person_keypoints_train2017.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/person_keypoints_val2017.json',
-        bbox_file=f'{data_root}person_detection_results/'
-        'COCO_val2017_detections_AP_H_56_person.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/person_keypoints_val2017.json",
+        bbox_file=f"{data_root}person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=test_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="coco/AP", rule="greater"))
 # evaluators
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=f'{data_root}annotations/person_keypoints_val2017.json',
-    score_mode='bbox_rle')
+val_evaluator = dict(type="CocoMetric", ann_file=f"{data_root}annotations/person_keypoints_val2017.json", score_mode="bbox_rle")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res101_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res101_8xb64-210e_mpii-256x256.py
index 6c7821f91b1161491ad2166b36bd582e194f384b..a0b4c42591e479c0efcbda5c0c635dd7d5dc440c 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res101_8xb64-210e_mpii-256x256.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res101_8xb64-210e_mpii-256x256.py
@@ -1,79 +1,67 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False), # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 # codec settings
-codec = dict(type='RegressionLabel', input_size=(256, 256))
+codec = dict(type="RegressionLabel", input_size=(256, 256))
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=101,
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'),
+        init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"),
     ),
-    neck=dict(type='GlobalAveragePooling'),
+    neck=dict(type="GlobalAveragePooling"),
     head=dict(
-        type='RegressionHead',
-        in_channels=2048,
-        num_joints=16,
-        loss=dict(type='SmoothL1Loss', use_target_weight=True),
-        decoder=codec),
type="RegressionHead", in_channels=2048, num_joints=16, loss=dict(type="SmoothL1Loss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'MpiiDataset' -data_mode = 'topdown' -data_root = 'data/mpii/' +dataset_type = "MpiiDataset" +data_mode = "topdown" +data_root = "data/mpii/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform', shift_prob=0), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -81,36 +69,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file=f'{data_root}/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file=f"{data_root}/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res152_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res152_8xb64-210e_mpii-256x256.py index c1a19b0d6e720c9f60e19d62a8712e532390cc84..f19b675164cf9de1ccf22c6f8d849da81058b2aa 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res152_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res152_8xb64-210e_mpii-256x256.py @@ -1,81 +1,69 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) 
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False), # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 # codec settings
-codec = dict(type='RegressionLabel', input_size=(256, 256))
+codec = dict(type="RegressionLabel", input_size=(256, 256))
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=152,
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'),
+        init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"),
     ),
-    neck=dict(type='GlobalAveragePooling'),
+    neck=dict(type="GlobalAveragePooling"),
     head=dict(
-        type='RegressionHead',
-        in_channels=2048,
-        num_joints=16,
-        loss=dict(type='SmoothL1Loss', use_target_weight=True),
-        decoder=codec),
+        type="RegressionHead", in_channels=2048, num_joints=16, loss=dict(type="SmoothL1Loss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
         shift_coords=True,
-    ))
+    ),
+)
 # base dataset settings
-dataset_type = 'MpiiDataset'
-data_mode = 'topdown'
-data_root = 'data/mpii/'
+dataset_type = "MpiiDataset"
+data_mode = "topdown"
+data_root = "data/mpii/"
-file_client_args = dict(backend='disk')
+file_client_args = dict(backend="disk")
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage', file_client_args=file_client_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomBBoxTransform', shift_prob=0),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", file_client_args=file_client_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", shift_prob=0),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage', file_client_args=file_client_args),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage", file_client_args=file_client_args),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 # data loaders
@@ -83,36 +71,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/mpii_train.json',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/mpii_train.json",
data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/mpii_val.json', - headbox_file=f'{data_root}/annotations/mpii_gt_val.mat', - data_prefix=dict(img='images/'), + ann_file="annotations/mpii_val.json", + headbox_file=f"{data_root}/annotations/mpii_gt_val.mat", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater")) # evaluators -val_evaluator = dict(type='MpiiPCKAccuracy') +val_evaluator = dict(type="MpiiPCKAccuracy") test_evaluator = val_evaluator diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res50_8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res50_8xb64-210e_mpii-256x256.py index 901fd4b8d61c2aa7a5cc920d1590acf8a4ece88d..7469608e5d651b4619526abb88083fc7d9e5ddf4 100644 --- a/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res50_8xb64-210e_mpii-256x256.py +++ b/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res50_8xb64-210e_mpii-256x256.py @@ -1,79 +1,67 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(256, 256)) +codec = dict(type="RegressionLabel", input_size=(256, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), - neck=dict(type='GlobalAveragePooling'), + neck=dict(type="GlobalAveragePooling"), head=dict( - type='RegressionHead', - in_channels=2048, - num_joints=16, - loss=dict(type='SmoothL1Loss', use_target_weight=True), - decoder=codec), + type="RegressionHead", in_channels=2048, num_joints=16, loss=dict(type="SmoothL1Loss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset 
 # base dataset settings
-dataset_type = 'MpiiDataset'
-data_mode = 'topdown'
-data_root = 'data/mpii/'
+dataset_type = "MpiiDataset"
+data_mode = "topdown"
+data_root = "data/mpii/"
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomBBoxTransform', shift_prob=0),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", shift_prob=0),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 # data loaders
@@ -81,36 +69,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/mpii_train.json',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/mpii_train.json",
+        data_prefix=dict(img="images/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/mpii_val.json',
-        headbox_file=f'{data_root}/annotations/mpii_gt_val.mat',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/mpii_val.json",
+        headbox_file=f"{data_root}/annotations/mpii_gt_val.mat",
+        data_prefix=dict(img="images/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater"))
 # evaluators
-val_evaluator = dict(type='MpiiPCKAccuracy')
+val_evaluator = dict(type="MpiiPCKAccuracy")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res50_rle-8xb64-210e_mpii-256x256.py b/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res50_rle-8xb64-210e_mpii-256x256.py
index 9d46484755dec533fe5519a782de86404bf9986e..be3f5559ae1a66e9ced57cca838b2227fc99f5dc 100644
--- a/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res50_rle-8xb64-210e_mpii-256x256.py
+++ b/mmpose/configs/body_2d_keypoint/topdown_regression/mpii/td-reg_res50_rle-8xb64-210e_mpii-256x256.py
@@ -1,79 +1,65 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False), # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 # codec settings
-codec = dict(type='RegressionLabel', input_size=(256, 256))
+codec = dict(type="RegressionLabel", input_size=(256, 256))
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=50,
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
+        init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"),
     ),
-    neck=dict(type='GlobalAveragePooling'),
-    head=dict(
-        type='RLEHead',
-        in_channels=2048,
-        num_joints=16,
-        loss=dict(type='RLELoss', use_target_weight=True),
-        decoder=codec),
+    neck=dict(type="GlobalAveragePooling"),
+    head=dict(type="RLEHead", in_channels=2048, num_joints=16, loss=dict(type="RLELoss", use_target_weight=True), decoder=codec),
     test_cfg=dict(
         flip_test=True,
         shift_coords=True,
-    ))
+    ),
+)
 # base dataset settings
-dataset_type = 'MpiiDataset'
-data_mode = 'topdown'
-data_root = 'data/mpii/'
+dataset_type = "MpiiDataset"
+data_mode = "topdown"
+data_root = "data/mpii/"
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='RandomBBoxTransform', shift_prob=0),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", shift_prob=0),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 # data loaders
@@ -81,36 +67,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/mpii_train.json',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/mpii_train.json",
+        data_prefix=dict(img="images/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/mpii_val.json',
-        headbox_file=f'{data_root}/annotations/mpii_gt_val.mat',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/mpii_val.json",
+        headbox_file=f"{data_root}/annotations/mpii_gt_val.mat",
+        data_prefix=dict(img="images/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="PCK", rule="greater"))
 # evaluators
-val_evaluator = dict(type='MpiiPCKAccuracy')
+val_evaluator = dict(type="MpiiPCKAccuracy")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_l_8xb32-300e_coco-640.py b/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_l_8xb32-300e_coco-640.py
index db61ea854ac674b9d5e441e5620cca6042f0a3aa..b343474e20785ac94e43924c9f849755a6287ed7 100644
--- a/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_l_8xb32-300e_coco-640.py
+++ b/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_l_8xb32-300e_coco-640.py
@@ -1,9 +1,10 @@
-_base_ = './yoloxpose_s_8xb32-300e_coco-640.py'
+_base_ = "./yoloxpose_s_8xb32-300e_coco-640.py"
 widen_factor = 1
 deepen_factor = 1
-checkpoint = 'https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_' \
-    'l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth'
+checkpoint = (
+    "https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_" "l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth"
+)
 # model settings
 model = dict(
@@ -12,6 +13,6 @@ model = dict(
         widen_factor=widen_factor,
         init_cfg=dict(checkpoint=checkpoint),
     ),
-    neck=dict(
-        in_channels=[256, 512, 1024], out_channels=256, num_csp_blocks=3),
-    head=dict(head_module_cfg=dict(widen_factor=widen_factor)))
+    neck=dict(in_channels=[256, 512, 1024], out_channels=256, num_csp_blocks=3),
+    head=dict(head_module_cfg=dict(widen_factor=widen_factor)),
+)
diff --git a/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_m_8xb32-300e_coco-640.py b/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_m_8xb32-300e_coco-640.py
index 1fa895bc54c05e47d713609313b5b8d43220765a..6a3c585d6717c9926c5df67a5be5d0fed2adb48c 100644
--- a/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_m_8xb32-300e_coco-640.py
+++ b/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_m_8xb32-300e_coco-640.py
@@ -1,9 +1,8 @@
-_base_ = './yoloxpose_s_8xb32-300e_coco-640.py'
+_base_ = "./yoloxpose_s_8xb32-300e_coco-640.py"
 widen_factor = 0.75
 deepen_factor = 0.67
-checkpoint = 'https://download.openmmlab.com/mmpose/v1/pretrained_models/' \
-    'yolox_m_8x8_300e_coco_20230829.pth'
+checkpoint = "https://download.openmmlab.com/mmpose/v1/pretrained_models/" "yolox_m_8x8_300e_coco_20230829.pth"
 # model settings
 model = dict(
@@ -13,4 +12,5 @@ model = dict(
         init_cfg=dict(checkpoint=checkpoint),
     ),
     neck=dict(in_channels=[192, 384, 768], out_channels=192, num_csp_blocks=2),
-    head=dict(head_module_cfg=dict(widen_factor=widen_factor)))
+    head=dict(head_module_cfg=dict(widen_factor=widen_factor)),
+)
diff --git a/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_s_8xb32-300e_coco-640.py b/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_s_8xb32-300e_coco-640.py
index 948a916b06707cfc73c1c9f1ac97a9ef928e23c4..3bf5b0e857c3e5802fc586f21a4bfe863f711ada 100644
--- a/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_s_8xb32-300e_coco-640.py
+++ b/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_s_8xb32-300e_coco-640.py
@@ -1,44 +1,27 @@
-_base_ = '../../../_base_/default_runtime.py'
+_base_ = "../../../_base_/default_runtime.py"
 # runtime
-train_cfg = dict(
-    _delete_=True,
-    type='EpochBasedTrainLoop',
-    max_epochs=300,
-    val_interval=10,
-    dynamic_intervals=[(280, 1)])
+train_cfg = dict(_delete_=True, type="EpochBasedTrainLoop", max_epochs=300, val_interval=10, dynamic_intervals=[(280, 1)])
 auto_scale_lr = dict(base_batch_size=256)
-default_hooks = dict(
-    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3))
+default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=10, max_keep_ckpts=3))
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=0.004, weight_decay=0.05),
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=0.004, weight_decay=0.05),
     paramwise_cfg=dict(
         norm_decay_mult=0,
         bias_decay_mult=0,
         bypass_duplicate=True,
     ),
-    clip_grad=dict(max_norm=0.1, norm_type=2))
+    clip_grad=dict(max_norm=0.1, norm_type=2),
+)
 param_scheduler = [
-    dict(
-        type='QuadraticWarmupLR',
-        by_epoch=True,
-        begin=0,
-        end=5,
-        convert_to_iter_based=True),
-    dict(
-        type='CosineAnnealingLR',
-        eta_min=0.0002,
-        begin=5,
-        T_max=280,
-        end=280,
-        by_epoch=True,
-        convert_to_iter_based=True),
-    dict(type='ConstantLR', by_epoch=True, factor=1, begin=280, end=300),
+    dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True),
+    dict(type="CosineAnnealingLR", eta_min=0.0002, begin=5, T_max=280, end=280, by_epoch=True, convert_to_iter_based=True),
+    dict(type="ConstantLR", by_epoch=True, factor=1, begin=280, end=300),
 ]
 # model
@@ -46,52 +29,45 @@ widen_factor = 0.5
 deepen_factor = 0.33
 model = dict(
-    type='BottomupPoseEstimator',
-    init_cfg=dict(
-        type='Kaiming',
-        layer='Conv2d',
-        a=2.23606797749979,
-        distribution='uniform',
-        mode='fan_in',
-        nonlinearity='leaky_relu'),
+    type="BottomupPoseEstimator",
+    init_cfg=dict(type="Kaiming", layer="Conv2d", a=2.23606797749979, distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"),
     data_preprocessor=dict(
-        type='PoseDataPreprocessor',
+        type="PoseDataPreprocessor",
         pad_size_divisor=32,
         mean=[0, 0, 0],
         std=[1, 1, 1],
         batch_augments=[
-            dict(
-                type='BatchSyncRandomResize',
-                random_size_range=(480, 800),
-                size_divisor=32,
-                interval=1),
-        ]),
+            dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=1),
+        ],
+    ),
     backbone=dict(
-        type='CSPDarknet',
+        type="CSPDarknet",
         deepen_factor=deepen_factor,
         widen_factor=widen_factor,
         out_indices=(2, 3, 4),
         spp_kernal_sizes=(5, 9, 13),
-        norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
-        act_cfg=dict(type='Swish'),
+        norm_cfg=dict(type="BN", momentum=0.03, eps=0.001),
+        act_cfg=dict(type="Swish"),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://download.openmmlab.com/mmdetection/v2.0/'
-            'yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_'
-            '20211121_095711-4592a793.pth',
-            prefix='backbone.',
-        )),
+            type="Pretrained",
+            checkpoint="https://download.openmmlab.com/mmdetection/v2.0/"
+            "yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_"
+            "20211121_095711-4592a793.pth",
+            prefix="backbone.",
+        ),
+    ),
     neck=dict(
-        type='YOLOXPAFPN',
+        type="YOLOXPAFPN",
         in_channels=[128, 256, 512],
         out_channels=128,
         num_csp_blocks=1,
         use_depthwise=False,
-        upsample_cfg=dict(scale_factor=2, mode='nearest'),
-        norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
-        act_cfg=dict(type='Swish')),
+        upsample_cfg=dict(scale_factor=2, mode="nearest"),
+        norm_cfg=dict(type="BN", momentum=0.03, eps=0.001),
+        act_cfg=dict(type="Swish"),
+    ),
     head=dict(
-        type='YOLOXPoseHead',
+        type="YOLOXPoseHead",
         num_keypoints=17,
         featmap_strides=(8, 16, 32),
         head_module_cfg=dict(
@@ -100,107 +76,89 @@ model = dict(
             feat_channels=256,
             widen_factor=widen_factor,
             stacked_convs=2,
-            norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
-            act_cfg=dict(type='Swish')),
-        prior_generator=dict(
-            type='MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
-        assigner=dict(type='SimOTAAssigner', dynamic_k_indicator='oks'),
+            norm_cfg=dict(type="BN", momentum=0.03, eps=0.001),
+            act_cfg=dict(type="Swish"),
+        ),
+        prior_generator=dict(type="MlvlPointGenerator", offset=0, strides=[8, 16, 32]),
+        assigner=dict(type="SimOTAAssigner", dynamic_k_indicator="oks"),
         overlaps_power=0.5,
-        loss_cls=dict(type='BCELoss', reduction='sum', loss_weight=1.0),
-        loss_bbox=dict(
-            type='IoULoss',
-            mode='square',
-            eps=1e-16,
-            reduction='sum',
-            loss_weight=5.0),
-        loss_obj=dict(
-            type='BCELoss',
-            use_target_weight=True,
-            reduction='sum',
-            loss_weight=1.0),
+        loss_cls=dict(type="BCELoss", reduction="sum", loss_weight=1.0),
+        loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0),
+        loss_obj=dict(type="BCELoss", use_target_weight=True, reduction="sum", loss_weight=1.0),
         loss_oks=dict(
-            type='OKSLoss',
-            reduction='none',
-            metainfo='configs/_base_/datasets/coco.py',
-            norm_target_weight=True,
-            loss_weight=30.0),
-        loss_vis=dict(
-            type='BCELoss',
-            use_target_weight=True,
-            reduction='mean',
-            loss_weight=1.0),
-        loss_bbox_aux=dict(type='L1Loss', reduction='sum', loss_weight=1.0),
+            type="OKSLoss", reduction="none", metainfo="configs/_base_/datasets/coco.py", norm_target_weight=True, loss_weight=30.0
+        ),
+        loss_vis=dict(type="BCELoss", use_target_weight=True, reduction="mean", loss_weight=1.0),
+        loss_bbox_aux=dict(type="L1Loss", reduction="sum", loss_weight=1.0),
     ),
     test_cfg=dict(
         score_thr=0.01,
         nms_thr=0.65,
-    ))
+    ),
+)
 # data
 input_size = (640, 640)
-codec = dict(type='YOLOXPoseAnnotationProcessor', input_size=input_size)
+codec = dict(type="YOLOXPoseAnnotationProcessor", input_size=input_size)
 train_pipeline_stage1 = [
-    dict(type='LoadImage', backend_args=None),
+    dict(type="LoadImage", backend_args=None),
+    dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]),
     dict(
-        type='Mosaic',
-        img_scale=(640, 640),
-        pad_val=114.0,
-        pre_transform=[dict(type='LoadImage', backend_args=None)]),
-    dict(
-        type='BottomupRandomAffine',
+        type="BottomupRandomAffine",
         input_size=(640, 640),
         shift_factor=0.1,
         rotate_factor=10,
         scale_factor=(0.75, 1.0),
         pad_val=114,
-        distribution='uniform',
-        transform_mode='perspective',
+        distribution="uniform",
+        transform_mode="perspective",
         bbox_keep_corner=False,
         clip_border=True,
     ),
     dict(
-        type='YOLOXMixUp',
+        type="YOLOXMixUp",
         img_scale=(640, 640),
         ratio_range=(0.8, 1.6),
         pad_val=114.0,
-        pre_transform=[dict(type='LoadImage', backend_args=None)]),
-    dict(type='YOLOXHSVRandomAug'),
-    dict(type='RandomFlip'),
-    dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs'),
+        pre_transform=[dict(type="LoadImage", backend_args=None)],
+    ),
+    dict(type="YOLOXHSVRandomAug"),
+    dict(type="RandomFlip"),
+    dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 train_pipeline_stage2 = [
-    dict(type='LoadImage'),
+    dict(type="LoadImage"),
     dict(
-        type='BottomupRandomAffine',
+        type="BottomupRandomAffine",
         input_size=(640, 640),
         shift_prob=0,
         rotate_prob=0,
         scale_prob=0,
-        scale_type='long',
+        scale_type="long",
         pad_val=(114, 114, 114),
         bbox_keep_corner=False,
         clip_border=True,
     ),
-    dict(type='YOLOXHSVRandomAug'),
-    dict(type='RandomFlip'),
-    dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs'),
+    dict(type="YOLOXHSVRandomAug"),
+    dict(type="RandomFlip"),
+    dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
-data_mode = 'bottomup'
-data_root = 'data/'
+data_mode = "bottomup"
+data_root = "data/"
 dataset_coco = dict(
-    type='CocoDataset',
+    type="CocoDataset",
     data_root=data_root,
     data_mode=data_mode,
     filter_cfg=dict(filter_empty_gt=False, min_size=32),
-    ann_file='coco/annotations/person_keypoints_train2017.json',
-    data_prefix=dict(img='coco/train2017/'),
+    ann_file="coco/annotations/person_keypoints_train2017.json",
+    data_prefix=dict(img="coco/train2017/"),
     pipeline=train_pipeline_stage1,
 )
@@ -209,17 +167,16 @@ train_dataloader = dict(
     num_workers=8,
     persistent_workers=True,
     pin_memory=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
-    dataset=dataset_coco)
+    sampler=dict(type="DefaultSampler", shuffle=True),
+    dataset=dataset_coco,
+)
 val_pipeline = [
-    dict(type='LoadImage'),
+    dict(type="LoadImage"),
+    dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)),
     dict(
-        type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)),
-    dict(
-        type='PackPoseInputs',
a/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_tiny_4xb64-300e_coco-416.py b/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_tiny_4xb64-300e_coco-416.py index d13d104e02bbff1497bb1629ff94b0172672984e..e78d7a2021fc423d025ad8931780f56e8eb28de9 100644 --- a/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_tiny_4xb64-300e_coco-416.py +++ b/mmpose/configs/body_2d_keypoint/yoloxpose/coco/yoloxpose_tiny_4xb64-300e_coco-416.py @@ -1,19 +1,19 @@ -_base_ = './yoloxpose_s_8xb32-300e_coco-640.py' +_base_ = "./yoloxpose_s_8xb32-300e_coco-640.py" # model settings widen_factor = 0.375 deepen_factor = 0.33 -checkpoint = 'https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_' \ - 'tiny_8x8_300e_coco/yolox_tiny_8x8_300e_coco_20211124_171234-b4047906.pth' +checkpoint = ( + "https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_" + "tiny_8x8_300e_coco/yolox_tiny_8x8_300e_coco_20211124_171234-b4047906.pth" +) model = dict( - data_preprocessor=dict(batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(320, 640), - size_divisor=32, - interval=1), - ]), + data_preprocessor=dict( + batch_augments=[ + dict(type="BatchSyncRandomResize", random_size_range=(320, 640), size_divisor=32, interval=1), + ] + ), backbone=dict( deepen_factor=deepen_factor, widen_factor=widen_factor, @@ -23,55 +23,56 @@ model = dict( in_channels=[96, 192, 384], out_channels=96, ), - head=dict(head_module_cfg=dict(widen_factor=widen_factor), )) + head=dict( + head_module_cfg=dict(widen_factor=widen_factor), + ), +) # dataset settings train_pipeline_stage1 = [ - dict(type='LoadImage', backend_args=None), - dict( - type='Mosaic', - img_scale=_base_.input_size, - pad_val=114.0, - pre_transform=[dict(type='LoadImage', backend_args=None)]), + dict(type="LoadImage", backend_args=None), + dict(type="Mosaic", img_scale=_base_.input_size, pad_val=114.0, pre_transform=[dict(type="LoadImage", backend_args=None)]), dict( - type='BottomupRandomAffine', + type="BottomupRandomAffine", input_size=_base_.input_size, shift_factor=0.1, rotate_factor=10, scale_factor=(0.75, 1.0), pad_val=114, - distribution='uniform', - transform_mode='perspective', + distribution="uniform", + transform_mode="perspective", bbox_keep_corner=False, clip_border=True, ), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip'), - dict(type='FilterAnnotations', by_kpt=True, by_box=True, keep_empty=False), - dict(type='GenerateTarget', encoder=_base_.codec), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip"), + dict(type="FilterAnnotations", by_kpt=True, by_box=True, keep_empty=False), + dict(type="GenerateTarget", encoder=_base_.codec), dict( - type='PackPoseInputs', + type="PackPoseInputs", extra_mapping_labels={ - 'bbox': 'bboxes', - 'bbox_labels': 'labels', - 'keypoints': 'keypoints', - 'keypoints_visible': 'keypoints_visible', - 'area': 'areas' - }), + "bbox": "bboxes", + "bbox_labels": "labels", + "keypoints": "keypoints", + "keypoints_visible": "keypoints_visible", + "area": "areas", + }, + ), ] -train_dataloader = dict( - batch_size=64, dataset=dict(pipeline=train_pipeline_stage1)) +train_dataloader = dict(batch_size=64, dataset=dict(pipeline=train_pipeline_stage1)) input_size = (416, 416) val_pipeline = [ - dict(type='LoadImage'), - dict( - type='BottomupResize', input_size=input_size, pad_val=(114, 114, 114)), + dict(type="LoadImage"), + dict(type="BottomupResize", input_size=input_size, pad_val=(114, 114, 114)), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'ori_shape', 
'img_shape', - 'input_size', 'input_center', 'input_scale')) + type="PackPoseInputs", meta_keys=("id", "img_id", "img_path", "ori_shape", "img_shape", "input_size", "input_center", "input_scale") + ), ] -val_dataloader = dict(dataset=dict(pipeline=val_pipeline, )) +val_dataloader = dict( + dataset=dict( + pipeline=val_pipeline, + ) +) test_dataloader = val_dataloader diff --git a/mmpose/configs/body_3d_keypoint/image_pose_lift/h36m/image-pose-lift_tcn_8xb64-200e_h36m.py b/mmpose/configs/body_3d_keypoint/image_pose_lift/h36m/image-pose-lift_tcn_8xb64-200e_h36m.py index b3c1c2db806fc0c5b0a0d726f1ff066bb2bd1313..1cbd01c02e7774f15ad0abb9ed3c4c678af33af1 100644 --- a/mmpose/configs/body_3d_keypoint/image_pose_lift/h36m/image-pose-lift_tcn_8xb64-200e_h36m.py +++ b/mmpose/configs/body_3d_keypoint/image_pose_lift/h36m/image-pose-lift_tcn_8xb64-200e_h36m.py @@ -1,102 +1,123 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = dict(max_epochs=200, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict(type='Adam', lr=1e-3)) +optim_wrapper = dict(optimizer=dict(type="Adam", lr=1e-3)) # learning policy -param_scheduler = [ - dict(type='StepLR', step_size=100000, gamma=0.96, end=80, by_epoch=False) -] +param_scheduler = [dict(type="StepLR", step_size=100000, gamma=0.96, end=80, by_epoch=False)] auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - save_best='MPJPE', - rule='less', - max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1)) # codec settings # 3D keypoint normalization parameters # From file: '{data_root}/annotation_body3d/fps50/joint3d_rel_stats.pkl' -target_mean = [[-2.55652589e-04, -7.11960570e-03, -9.81433052e-04], - [-5.65463051e-03, 3.19636009e-01, 7.19329269e-02], - [-1.01705840e-02, 6.91147892e-01, 1.55352986e-01], - [2.55651315e-04, 7.11954606e-03, 9.81423866e-04], - [-5.09729780e-03, 3.27040413e-01, 7.22258095e-02], - [-9.99656606e-03, 7.08277383e-01, 1.58016408e-01], - [2.90583676e-03, -2.11363307e-01, -4.74210915e-02], - [5.67537804e-03, -4.35088906e-01, -9.76974016e-02], - [5.93884964e-03, -4.91891970e-01, -1.10666618e-01], - [7.37352083e-03, -5.83948619e-01, -1.31171400e-01], - [5.41920653e-03, -3.83931702e-01, -8.68145417e-02], - [2.95964662e-03, -1.87567488e-01, -4.34536934e-02], - [1.26585822e-03, -1.20170579e-01, -2.82526049e-02], - [4.67186639e-03, -3.83644089e-01, -8.55125784e-02], - [1.67648571e-03, -1.97007177e-01, -4.31368364e-02], - [8.70569015e-04, -1.68664569e-01, -3.73902498e-02]], -target_std = [[0.11072244, 0.02238818, 0.07246294], - [0.15856311, 0.18933832, 0.20880479], - [0.19179935, 0.24320062, 0.24756193], - [0.11072181, 0.02238805, 0.07246253], - [0.15880454, 0.19977188, 0.2147063], - [0.18001944, 0.25052739, 0.24853247], - [0.05210694, 0.05211406, 0.06908241], - [0.09515367, 0.10133032, 0.12899733], - [0.11742458, 0.12648469, 0.16465091], - [0.12360297, 0.13085539, 0.16433336], - [0.14602232, 0.09707956, 0.13952731], - [0.24347532, 0.12982249, 0.20230181], - [0.2446877, 0.21501816, 0.23938235], - [0.13876084, 0.1008926, 0.1424411], - [0.23687529, 0.14491219, 
-              [0.23687529, 0.14491219, 0.20980829],
-              [0.24400695, 0.23975028, 0.25520584]]
+target_mean = (
+    [
+        [-2.55652589e-04, -7.11960570e-03, -9.81433052e-04],
+        [-5.65463051e-03, 3.19636009e-01, 7.19329269e-02],
+        [-1.01705840e-02, 6.91147892e-01, 1.55352986e-01],
+        [2.55651315e-04, 7.11954606e-03, 9.81423866e-04],
+        [-5.09729780e-03, 3.27040413e-01, 7.22258095e-02],
+        [-9.99656606e-03, 7.08277383e-01, 1.58016408e-01],
+        [2.90583676e-03, -2.11363307e-01, -4.74210915e-02],
+        [5.67537804e-03, -4.35088906e-01, -9.76974016e-02],
+        [5.93884964e-03, -4.91891970e-01, -1.10666618e-01],
+        [7.37352083e-03, -5.83948619e-01, -1.31171400e-01],
+        [5.41920653e-03, -3.83931702e-01, -8.68145417e-02],
+        [2.95964662e-03, -1.87567488e-01, -4.34536934e-02],
+        [1.26585822e-03, -1.20170579e-01, -2.82526049e-02],
+        [4.67186639e-03, -3.83644089e-01, -8.55125784e-02],
+        [1.67648571e-03, -1.97007177e-01, -4.31368364e-02],
+        [8.70569015e-04, -1.68664569e-01, -3.73902498e-02],
+    ],
+)
+target_std = [
+    [0.11072244, 0.02238818, 0.07246294],
+    [0.15856311, 0.18933832, 0.20880479],
+    [0.19179935, 0.24320062, 0.24756193],
+    [0.11072181, 0.02238805, 0.07246253],
+    [0.15880454, 0.19977188, 0.2147063],
+    [0.18001944, 0.25052739, 0.24853247],
+    [0.05210694, 0.05211406, 0.06908241],
+    [0.09515367, 0.10133032, 0.12899733],
+    [0.11742458, 0.12648469, 0.16465091],
+    [0.12360297, 0.13085539, 0.16433336],
+    [0.14602232, 0.09707956, 0.13952731],
+    [0.24347532, 0.12982249, 0.20230181],
+    [0.2446877, 0.21501816, 0.23938235],
+    [0.13876084, 0.1008926, 0.1424411],
+    [0.23687529, 0.14491219, 0.20980829],
+    [0.24400695, 0.23975028, 0.25520584],
+]
 # 2D keypoint normalization parameters
 # From file: '{data_root}/annotation_body3d/fps50/joint2d_stats.pkl'
-keypoints_mean = [[532.08351635, 419.74137558], [531.80953144, 418.2607141],
-                  [530.68456967, 493.54259285], [529.36968722, 575.96448516],
-                  [532.29767646, 421.28483336], [531.93946631, 494.72186795],
-                  [529.71984447, 578.96110365], [532.93699382, 370.65225054],
-                  [534.1101856, 317.90342311], [534.55416813, 304.24143901],
-                  [534.86955004, 282.31030885], [534.11308566, 330.11296796],
-                  [533.53637525, 376.2742511], [533.49380107, 391.72324565],
-                  [533.52579142, 330.09494668], [532.50804964, 374.190479],
-                  [532.72786934, 380.61615716]],
-keypoints_std = [[107.73640054, 63.35908715], [119.00836213, 64.1215443],
-                 [119.12412107, 50.53806215], [120.61688045, 56.38444891],
-                 [101.95735275, 62.89636486], [106.24832897, 48.41178119],
-                 [108.46734966, 54.58177071], [109.07369806, 68.70443672],
-                 [111.20130351, 74.87287863], [111.63203838, 77.80542514],
-                 [113.22330788, 79.90670556], [105.7145833, 73.27049436],
-                 [107.05804267, 73.93175781], [107.97449418, 83.30391802],
-                 [121.60675105, 74.25691526], [134.34378973, 77.48125087],
-                 [131.79990652, 89.86721124]]
+keypoints_mean = (
+    [
+        [532.08351635, 419.74137558],
+        [531.80953144, 418.2607141],
+        [530.68456967, 493.54259285],
+        [529.36968722, 575.96448516],
+        [532.29767646, 421.28483336],
+        [531.93946631, 494.72186795],
+        [529.71984447, 578.96110365],
+        [532.93699382, 370.65225054],
+        [534.1101856, 317.90342311],
+        [534.55416813, 304.24143901],
+        [534.86955004, 282.31030885],
+        [534.11308566, 330.11296796],
+        [533.53637525, 376.2742511],
+        [533.49380107, 391.72324565],
+        [533.52579142, 330.09494668],
+        [532.50804964, 374.190479],
+        [532.72786934, 380.61615716],
+    ],
+)
+keypoints_std = [
+    [107.73640054, 63.35908715],
+    [119.00836213, 64.1215443],
+    [119.12412107, 50.53806215],
+    [120.61688045, 56.38444891],
+    [101.95735275, 62.89636486],
+    [106.24832897, 48.41178119],
+    [108.46734966, 54.58177071],
+    [109.07369806, 68.70443672],
+    [111.20130351, 74.87287863],
+    [111.63203838, 77.80542514],
+    [113.22330788, 79.90670556],
+    [105.7145833, 73.27049436],
+    [107.05804267, 73.93175781],
+    [107.97449418, 83.30391802],
+    [121.60675105, 74.25691526],
+    [134.34378973, 77.48125087],
+    [131.79990652, 89.86721124],
+]
 codec = dict(
-    type='ImagePoseLifting',
+    type="ImagePoseLifting",
     num_keypoints=17,
     root_index=0,
     remove_root=True,
     target_mean=target_mean,
     target_std=target_std,
     keypoints_mean=keypoints_mean,
-    keypoints_std=keypoints_std)
+    keypoints_std=keypoints_std,
+)
 # model settings
 model = dict(
-    type='PoseLifter',
+    type="PoseLifter",
     backbone=dict(
-        type='TCN',
+        type="TCN",
         in_channels=2 * 17,
         stem_channels=1024,
         num_blocks=2,
@@ -104,25 +125,25 @@ model = dict(
         dropout=0.5,
     ),
     head=dict(
-        type='TemporalRegressionHead',
+        type="TemporalRegressionHead",
         in_channels=1024,
         num_joints=16,
-        loss=dict(type='MSELoss'),
+        loss=dict(type="MSELoss"),
         decoder=codec,
-    ))
+    ),
+)
 # base dataset settings
-dataset_type = 'Human36mDataset'
-data_root = 'data/h36m/'
+dataset_type = "Human36mDataset"
+data_root = "data/h36m/"
 # pipelines
 train_pipeline = [
-    dict(type='GenerateTarget', encoder=codec),
+    dict(type="GenerateTarget", encoder=codec),
     dict(
-        type='PackPoseInputs',
-        meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices',
-                   'target_root', 'target_root_index', 'target_mean',
-                   'target_std'))
+        type="PackPoseInputs",
+        meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root", "target_root_index", "target_mean", "target_std"),
+    ),
 ]
 val_pipeline = train_pipeline
@@ -131,38 +152,37 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
-        ann_file='annotation_body3d/fps50/h36m_train.npz',
+        ann_file="annotation_body3d/fps50/h36m_train.npz",
         seq_len=1,
         causal=True,
-        keypoint_2d_src='gt',
+        keypoint_2d_src="gt",
         data_root=data_root,
-        data_prefix=dict(img='images/'),
+        data_prefix=dict(img="images/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
-        ann_file='annotation_body3d/fps50/h36m_test.npz',
+        ann_file="annotation_body3d/fps50/h36m_test.npz",
         seq_len=1,
         causal=True,
-        keypoint_2d_src='gt',
+        keypoint_2d_src="gt",
         data_root=data_root,
-        data_prefix=dict(img='images/'),
+        data_prefix=dict(img="images/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # evaluators
-val_evaluator = [
-    dict(type='MPJPE', mode='mpjpe'),
-    dict(type='MPJPE', mode='p-mpjpe')
-]
+val_evaluator = [dict(type="MPJPE", mode="mpjpe"), dict(type="MPJPE", mode="p-mpjpe")]
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-243frm_8xb32-240e_h36m-original.py b/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-243frm_8xb32-240e_h36m-original.py
index caf2e56530384f118062055711305881fa5505c2..f34aaf90787f830355cd57309cc2f63873a5a02b 100644
--- a/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-243frm_8xb32-240e_h36m-original.py
+++ b/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-243frm_8xb32-240e_h36m-original.py
@@ -1,46 +1,36 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 vis_backends = [
-    dict(type='LocalVisBackend'),
+    dict(type="LocalVisBackend"),
 ]
-visualizer = dict(
-    type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer')
+visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer")
 # runtime
 train_cfg = dict(max_epochs=240, val_interval=10)
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.01))
+optim_wrapper = dict(optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.01))
 # learning policy
-param_scheduler = [
-    dict(type='ExponentialLR', gamma=0.99, end=120, by_epoch=True)
-]
+param_scheduler = [dict(type="ExponentialLR", gamma=0.99, end=120, by_epoch=True)]
 auto_scale_lr = dict(base_batch_size=512)
 # hooks
 default_hooks = dict(
-    checkpoint=dict(
-        type='CheckpointHook',
-        save_best='MPJPE',
-        rule='less',
-        max_keep_ckpts=1),
-    logger=dict(type='LoggerHook', interval=20),
+    checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1),
+    logger=dict(type="LoggerHook", interval=20),
 )
 # codec settings
-train_codec = dict(
-    type='MotionBERTLabel', num_keypoints=17, concat_vis=True, mode='train')
-val_codec = dict(
-    type='MotionBERTLabel', num_keypoints=17, concat_vis=True, rootrel=True)
+train_codec = dict(type="MotionBERTLabel", num_keypoints=17, concat_vis=True, mode="train")
+val_codec = dict(type="MotionBERTLabel", num_keypoints=17, concat_vis=True, rootrel=True)
 # model settings
 model = dict(
-    type='PoseLifter',
+    type="PoseLifter",
     backbone=dict(
-        type='DSTFormer',
+        type="DSTFormer",
         in_channels=3,
         feat_size=512,
         depth=5,
@@ -50,38 +40,34 @@ model = dict(
         att_fuse=True,
     ),
     head=dict(
-        type='MotionRegressionHead',
+        type="MotionRegressionHead",
         in_channels=512,
         out_channels=3,
         embedding_size=512,
-        loss=dict(type='MPJPEVelocityJointLoss'),
+        loss=dict(type="MPJPEVelocityJointLoss"),
         decoder=val_codec,
     ),
-    test_cfg=dict(flip_test=True))
+    test_cfg=dict(flip_test=True),
+)
 # base dataset settings
-dataset_type = 'Human36mDataset'
-data_root = 'data/h36m/'
+dataset_type = "Human36mDataset"
+data_root = "data/h36m/"
 # pipelines
 train_pipeline = [
-    dict(type='GenerateTarget', encoder=train_codec),
-    dict(
-        type='RandomFlipAroundRoot',
-        keypoints_flip_cfg=dict(center_mode='static', center_x=0.),
-        target_flip_cfg=dict(center_mode='static', center_x=0.),
-        flip_label=True),
+    dict(type="GenerateTarget", encoder=train_codec),
     dict(
-        type='PackPoseInputs',
-        meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices',
-                   'factor', 'camera_param'))
+        type="RandomFlipAroundRoot",
+        keypoints_flip_cfg=dict(center_mode="static", center_x=0.0),
+        target_flip_cfg=dict(center_mode="static", center_x=0.0),
+        flip_label=True,
+    ),
+    dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "factor", "camera_param")),
 ]
 val_pipeline = [
-    dict(type='GenerateTarget', encoder=val_codec),
-    dict(
-        type='PackPoseInputs',
-        meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices',
-                   'factor', 'camera_param'))
+    dict(type="GenerateTarget", encoder=val_codec),
+    dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "factor", "camera_param")),
 ]
 # data loaders
@@ -91,18 +77,19 @@ train_dataloader = dict(
     batch_size=32,
     pin_memory=True,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
-        ann_file='annotation_body3d/fps50/h36m_train_original.npz',
+        ann_file="annotation_body3d/fps50/h36m_train_original.npz",
         seq_len=1,
         multiple_target=243,
         multiple_target_step=81,
-        camera_param_file='annotation_body3d/cameras.pkl',
+        camera_param_file="annotation_body3d/cameras.pkl",
         data_root=data_root,
-        data_prefix=dict(img='images/'),
+        data_prefix=dict(img="images/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
@@ -110,28 +97,24 @@ val_dataloader = dict(
     pin_memory=True,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
-        ann_file='annotation_body3d/fps50/h36m_test_original.npz',
-        factor_file='annotation_body3d/fps50/h36m_factors.npy',
+        ann_file="annotation_body3d/fps50/h36m_test_original.npz",
+        factor_file="annotation_body3d/fps50/h36m_factors.npy",
         seq_len=1,
         seq_step=1,
         multiple_target=243,
-        camera_param_file='annotation_body3d/cameras.pkl',
+        camera_param_file="annotation_body3d/cameras.pkl",
         data_root=data_root,
-        data_prefix=dict(img='images/'),
+        data_prefix=dict(img="images/"),
         pipeline=val_pipeline,
         test_mode=True,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 # evaluators
-skip_list = [
-    'S9_Greet', 'S9_SittingDown', 'S9_Wait_1', 'S9_Greeting', 'S9_Waiting_1'
-]
-val_evaluator = [
-    dict(type='MPJPE', mode='mpjpe', skip_list=skip_list),
-    dict(type='MPJPE', mode='p-mpjpe', skip_list=skip_list)
-]
+skip_list = ["S9_Greet", "S9_SittingDown", "S9_Wait_1", "S9_Greeting", "S9_Waiting_1"]
+val_evaluator = [dict(type="MPJPE", mode="mpjpe", skip_list=skip_list), dict(type="MPJPE", mode="p-mpjpe", skip_list=skip_list)]
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-243frm_8xb32-240e_h36m.py b/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-243frm_8xb32-240e_h36m.py
index ea91556198fd56f978e988311ad803a4a2193ab5..12f602548698276b358a08f8da19a47c72b3e112 100644
--- a/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-243frm_8xb32-240e_h36m.py
+++ b/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-243frm_8xb32-240e_h36m.py
@@ -1,46 +1,36 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 vis_backends = [
-    dict(type='LocalVisBackend'),
+    dict(type="LocalVisBackend"),
 ]
-visualizer = dict(
-    type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer')
+visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer")
 # runtime
 train_cfg = dict(max_epochs=240, val_interval=10)
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.01))
+optim_wrapper = dict(optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.01))
 # learning policy
-param_scheduler = [
-    dict(type='ExponentialLR', gamma=0.99, end=120, by_epoch=True)
-]
+param_scheduler = [dict(type="ExponentialLR", gamma=0.99, end=120, by_epoch=True)]
 auto_scale_lr = dict(base_batch_size=512)
 # hooks
 default_hooks = dict(
-    checkpoint=dict(
-        type='CheckpointHook',
-        save_best='MPJPE',
-        rule='less',
-        max_keep_ckpts=1),
-    logger=dict(type='LoggerHook', interval=20),
save_best="MPJPE", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings -train_codec = dict( - type='MotionBERTLabel', num_keypoints=17, concat_vis=True, mode='train') -val_codec = dict( - type='MotionBERTLabel', num_keypoints=17, concat_vis=True, rootrel=True) +train_codec = dict(type="MotionBERTLabel", num_keypoints=17, concat_vis=True, mode="train") +val_codec = dict(type="MotionBERTLabel", num_keypoints=17, concat_vis=True, rootrel=True) # model settings model = dict( - type='PoseLifter', + type="PoseLifter", backbone=dict( - type='DSTFormer', + type="DSTFormer", in_channels=3, feat_size=512, depth=5, @@ -50,38 +40,34 @@ model = dict( att_fuse=True, ), head=dict( - type='MotionRegressionHead', + type="MotionRegressionHead", in_channels=512, out_channels=3, embedding_size=512, - loss=dict(type='MPJPEVelocityJointLoss'), + loss=dict(type="MPJPEVelocityJointLoss"), decoder=val_codec, ), - test_cfg=dict(flip_test=True)) + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'Human36mDataset' -data_root = 'data/h36m/' +dataset_type = "Human36mDataset" +data_root = "data/h36m/" # pipelines train_pipeline = [ - dict(type='GenerateTarget', encoder=train_codec), - dict( - type='RandomFlipAroundRoot', - keypoints_flip_cfg=dict(center_mode='static', center_x=0.), - target_flip_cfg=dict(center_mode='static', center_x=0.), - flip_label=True), + dict(type="GenerateTarget", encoder=train_codec), dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'factor', 'camera_param')) + type="RandomFlipAroundRoot", + keypoints_flip_cfg=dict(center_mode="static", center_x=0.0), + target_flip_cfg=dict(center_mode="static", center_x=0.0), + flip_label=True, + ), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "factor", "camera_param")), ] val_pipeline = [ - dict(type='GenerateTarget', encoder=val_codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'factor', 'camera_param')) + dict(type="GenerateTarget", encoder=val_codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "factor", "camera_param")), ] # data loaders @@ -91,18 +77,19 @@ train_dataloader = dict( pin_memory=True, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_train.npz', + ann_file="annotation_body3d/fps50/h36m_train.npz", seq_len=1, multiple_target=243, multiple_target_step=81, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, @@ -110,27 +97,23 @@ val_dataloader = dict( pin_memory=True, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_test.npz', + ann_file="annotation_body3d/fps50/h36m_test.npz", seq_len=1, seq_step=1, multiple_target=243, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - 
data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # evaluators -skip_list = [ - 'S9_Greet', 'S9_SittingDown', 'S9_Wait_1', 'S9_Greeting', 'S9_Waiting_1' -] -val_evaluator = [ - dict(type='MPJPE', mode='mpjpe', skip_list=skip_list), - dict(type='MPJPE', mode='p-mpjpe', skip_list=skip_list) -] +skip_list = ["S9_Greet", "S9_SittingDown", "S9_Wait_1", "S9_Greeting", "S9_Waiting_1"] +val_evaluator = [dict(type="MPJPE", mode="mpjpe", skip_list=skip_list), dict(type="MPJPE", mode="p-mpjpe", skip_list=skip_list)] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-ft-243frm_8xb32-120e_h36m-original.py b/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-ft-243frm_8xb32-120e_h36m-original.py index 555fd8ae0e7b9a9a0b4e6b2743ff581f382096d6..316d1fccfa48d9e9904a33f9b427a908c284b3c1 100644 --- a/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-ft-243frm_8xb32-120e_h36m-original.py +++ b/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-ft-243frm_8xb32-120e_h36m-original.py @@ -1,46 +1,36 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = dict(max_epochs=120, val_interval=10) # optimizer -optim_wrapper = dict( - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.01)) +optim_wrapper = dict(optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.01)) # learning policy -param_scheduler = [ - dict(type='ExponentialLR', gamma=0.99, end=60, by_epoch=True) -] +param_scheduler = [dict(type="ExponentialLR", gamma=0.99, end=60, by_epoch=True)] auto_scale_lr = dict(base_batch_size=512) # hooks default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - save_best='MPJPE', - rule='less', - max_keep_ckpts=1), - logger=dict(type='LoggerHook', interval=20), + checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings -train_codec = dict( - type='MotionBERTLabel', num_keypoints=17, concat_vis=True, mode='train') -val_codec = dict( - type='MotionBERTLabel', num_keypoints=17, concat_vis=True, rootrel=True) +train_codec = dict(type="MotionBERTLabel", num_keypoints=17, concat_vis=True, mode="train") +val_codec = dict(type="MotionBERTLabel", num_keypoints=17, concat_vis=True, rootrel=True) # model settings model = dict( - type='PoseLifter', + type="PoseLifter", backbone=dict( - type='DSTFormer', + type="DSTFormer", in_channels=3, feat_size=512, depth=5, @@ -50,43 +40,39 @@ model = dict( att_fuse=True, ), head=dict( - type='MotionRegressionHead', + type="MotionRegressionHead", in_channels=512, out_channels=3, embedding_size=512, - loss=dict(type='MPJPEVelocityJointLoss'), + loss=dict(type="MPJPEVelocityJointLoss"), decoder=val_codec, ), test_cfg=dict(flip_test=True), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/v1/body_3d_keypoint/' - 'pose_lift/h36m/motionbert_pretrain_h36m-29ffebf5_20230719.pth'), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmpose/v1/body_3d_keypoint/" + 
"pose_lift/h36m/motionbert_pretrain_h36m-29ffebf5_20230719.pth", + ), ) # base dataset settings -dataset_type = 'Human36mDataset' -data_root = 'data/h36m/' +dataset_type = "Human36mDataset" +data_root = "data/h36m/" # pipelines train_pipeline = [ - dict(type='GenerateTarget', encoder=train_codec), - dict( - type='RandomFlipAroundRoot', - keypoints_flip_cfg=dict(center_mode='static', center_x=0.), - target_flip_cfg=dict(center_mode='static', center_x=0.), - flip_label=True), + dict(type="GenerateTarget", encoder=train_codec), dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'factor', 'camera_param')) + type="RandomFlipAroundRoot", + keypoints_flip_cfg=dict(center_mode="static", center_x=0.0), + target_flip_cfg=dict(center_mode="static", center_x=0.0), + flip_label=True, + ), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "factor", "camera_param")), ] val_pipeline = [ - dict(type='GenerateTarget', encoder=val_codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'factor', 'camera_param')) + dict(type="GenerateTarget", encoder=val_codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "factor", "camera_param")), ] # data loaders @@ -96,18 +82,19 @@ train_dataloader = dict( pin_memory=True, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_train_original.npz', + ann_file="annotation_body3d/fps50/h36m_train_original.npz", seq_len=1, multiple_target=243, multiple_target_step=81, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, @@ -115,28 +102,24 @@ val_dataloader = dict( pin_memory=True, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_test_original.npz', - factor_file='annotation_body3d/fps50/h36m_factors.npy', + ann_file="annotation_body3d/fps50/h36m_test_original.npz", + factor_file="annotation_body3d/fps50/h36m_factors.npy", seq_len=1, seq_step=1, multiple_target=243, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # evaluators -skip_list = [ - 'S9_Greet', 'S9_SittingDown', 'S9_Wait_1', 'S9_Greeting', 'S9_Waiting_1' -] -val_evaluator = [ - dict(type='MPJPE', mode='mpjpe', skip_list=skip_list), - dict(type='MPJPE', mode='p-mpjpe', skip_list=skip_list) -] +skip_list = ["S9_Greet", "S9_SittingDown", "S9_Wait_1", "S9_Greeting", "S9_Waiting_1"] +val_evaluator = [dict(type="MPJPE", mode="mpjpe", skip_list=skip_list), dict(type="MPJPE", mode="p-mpjpe", skip_list=skip_list)] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-ft-243frm_8xb32-120e_h36m.py 
b/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-ft-243frm_8xb32-120e_h36m.py index 256a765539674749d5fa5d67f33a4468454fe4b8..079573737c9487a1904ec28583c316dcaa90ed51 100644 --- a/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-ft-243frm_8xb32-120e_h36m.py +++ b/mmpose/configs/body_3d_keypoint/motionbert/h36m/motionbert_dstformer-ft-243frm_8xb32-120e_h36m.py @@ -1,46 +1,36 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = dict(max_epochs=120, val_interval=10) # optimizer -optim_wrapper = dict( - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.01)) +optim_wrapper = dict(optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.01)) # learning policy -param_scheduler = [ - dict(type='ExponentialLR', gamma=0.99, end=60, by_epoch=True) -] +param_scheduler = [dict(type="ExponentialLR", gamma=0.99, end=60, by_epoch=True)] auto_scale_lr = dict(base_batch_size=512) # hooks default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - save_best='MPJPE', - rule='less', - max_keep_ckpts=1), - logger=dict(type='LoggerHook', interval=20), + checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings -train_codec = dict( - type='MotionBERTLabel', num_keypoints=17, concat_vis=True, mode='train') -val_codec = dict( - type='MotionBERTLabel', num_keypoints=17, concat_vis=True, rootrel=True) +train_codec = dict(type="MotionBERTLabel", num_keypoints=17, concat_vis=True, mode="train") +val_codec = dict(type="MotionBERTLabel", num_keypoints=17, concat_vis=True, rootrel=True) # model settings model = dict( - type='PoseLifter', + type="PoseLifter", backbone=dict( - type='DSTFormer', + type="DSTFormer", in_channels=3, feat_size=512, depth=5, @@ -50,43 +40,39 @@ model = dict( att_fuse=True, ), head=dict( - type='MotionRegressionHead', + type="MotionRegressionHead", in_channels=512, out_channels=3, embedding_size=512, - loss=dict(type='MPJPEVelocityJointLoss'), + loss=dict(type="MPJPEVelocityJointLoss"), decoder=val_codec, ), test_cfg=dict(flip_test=True), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/v1/body_3d_keypoint/' - 'pose_lift/h36m/motionbert_pretrain_h36m-29ffebf5_20230719.pth'), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmpose/v1/body_3d_keypoint/" + "pose_lift/h36m/motionbert_pretrain_h36m-29ffebf5_20230719.pth", + ), ) # base dataset settings -dataset_type = 'Human36mDataset' -data_root = 'data/h36m/' +dataset_type = "Human36mDataset" +data_root = "data/h36m/" # pipelines train_pipeline = [ - dict(type='GenerateTarget', encoder=train_codec), - dict( - type='RandomFlipAroundRoot', - keypoints_flip_cfg=dict(center_mode='static', center_x=0.), - target_flip_cfg=dict(center_mode='static', center_x=0.), - flip_label=True), + dict(type="GenerateTarget", encoder=train_codec), dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'factor', 'camera_param')) + type="RandomFlipAroundRoot", + keypoints_flip_cfg=dict(center_mode="static", center_x=0.0), + target_flip_cfg=dict(center_mode="static", center_x=0.0), + 
flip_label=True, + ), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "factor", "camera_param")), ] val_pipeline = [ - dict(type='GenerateTarget', encoder=val_codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'factor', 'camera_param')) + dict(type="GenerateTarget", encoder=val_codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "factor", "camera_param")), ] # data loaders @@ -96,18 +82,19 @@ train_dataloader = dict( pin_memory=True, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_train.npz', + ann_file="annotation_body3d/fps50/h36m_train.npz", seq_len=1, multiple_target=243, multiple_target_step=81, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, @@ -115,27 +102,23 @@ val_dataloader = dict( pin_memory=True, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_test.npz', + ann_file="annotation_body3d/fps50/h36m_test.npz", seq_len=1, seq_step=1, multiple_target=243, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # evaluators -skip_list = [ - 'S9_Greet', 'S9_SittingDown', 'S9_Wait_1', 'S9_Greeting', 'S9_Waiting_1' -] -val_evaluator = [ - dict(type='MPJPE', mode='mpjpe', skip_list=skip_list), - dict(type='MPJPE', mode='p-mpjpe', skip_list=skip_list) -] +skip_list = ["S9_Greet", "S9_SittingDown", "S9_Wait_1", "S9_Greeting", "S9_Waiting_1"] +val_evaluator = [dict(type="MPJPE", mode="mpjpe", skip_list=skip_list), dict(type="MPJPE", mode="p-mpjpe", skip_list=skip_list)] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-1frm-supv-cpn-ft_8xb128-160e_h36m.py b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-1frm-supv-cpn-ft_8xb128-160e_h36m.py index c1190fe83ef95895726dadd9314db8907be9559e..29af70c413b393942b50e1c31d1b2cad86a76631 100644 --- a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-1frm-supv-cpn-ft_8xb128-160e_h36m.py +++ b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-1frm-supv-cpn-ft_8xb128-160e_h36m.py @@ -1,47 +1,35 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = dict(max_epochs=160, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict(type='Adam', lr=1e-4)) +optim_wrapper = dict(optimizer=dict(type="Adam", 
lr=1e-4)) # learning policy -param_scheduler = [ - dict(type='ExponentialLR', gamma=0.98, end=80, by_epoch=True) -] +param_scheduler = [dict(type="ExponentialLR", gamma=0.98, end=80, by_epoch=True)] auto_scale_lr = dict(base_batch_size=1024) # hooks default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - save_best='MPJPE', - rule='less', - max_keep_ckpts=1), - logger=dict(type='LoggerHook', interval=20), + checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings -codec = dict( - type='VideoPoseLifting', - num_keypoints=17, - zero_center=True, - root_index=0, - remove_root=False) +codec = dict(type="VideoPoseLifting", num_keypoints=17, zero_center=True, root_index=0, remove_root=False) # model settings model = dict( - type='PoseLifter', + type="PoseLifter", backbone=dict( - type='TCN', + type="TCN", in_channels=2 * 17, stem_channels=1024, num_blocks=4, @@ -50,36 +38,31 @@ model = dict( use_stride_conv=True, ), head=dict( - type='TemporalRegressionHead', + type="TemporalRegressionHead", in_channels=1024, num_joints=17, - loss=dict(type='MPJPELoss'), + loss=dict(type="MPJPELoss"), decoder=codec, - )) + ), +) # base dataset settings -dataset_type = 'Human36mDataset' -data_root = 'data/h36m/' +dataset_type = "Human36mDataset" +data_root = "data/h36m/" # pipelines train_pipeline = [ dict( - type='RandomFlipAroundRoot', + type="RandomFlipAroundRoot", keypoints_flip_cfg=dict(), target_flip_cfg=dict(), ), - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] val_pipeline = [ - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] # data loaders @@ -87,18 +70,18 @@ train_dataloader = dict( batch_size=128, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_train.npz', + ann_file="annotation_body3d/fps50/h36m_train.npz", seq_len=1, causal=False, pad_video_seq=False, - keypoint_2d_src='detection', - keypoint_2d_det_file='joint_2d_det_files/cpn_ft_h36m_dbb_train.npy', - camera_param_file='annotation_body3d/cameras.pkl', + keypoint_2d_src="detection", + keypoint_2d_det_file="joint_2d_det_files/cpn_ft_h36m_dbb_train.npy", + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=train_pipeline, ), ) @@ -107,26 +90,24 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_test.npz', + ann_file="annotation_body3d/fps50/h36m_test.npz", seq_len=1, causal=False, pad_video_seq=False, - keypoint_2d_src='detection', - 
keypoint_2d_det_file='joint_2d_det_files/cpn_ft_h36m_dbb_test.npy', - camera_param_file='annotation_body3d/cameras.pkl', + keypoint_2d_src="detection", + keypoint_2d_det_file="joint_2d_det_files/cpn_ft_h36m_dbb_test.npy", + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [ - dict(type='MPJPE', mode='mpjpe'), - dict(type='MPJPE', mode='p-mpjpe') -] +val_evaluator = [dict(type="MPJPE", mode="mpjpe"), dict(type="MPJPE", mode="p-mpjpe")] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-243frm-supv-cpn-ft_8xb128-200e_h36m.py b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-243frm-supv-cpn-ft_8xb128-200e_h36m.py index 3ef3df570b0bab3b66027c5c54acb0edd3ef694f..505851ab89feab76692713acbab9d7b84e64a573 100644 --- a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-243frm-supv-cpn-ft_8xb128-200e_h36m.py +++ b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-243frm-supv-cpn-ft_8xb128-200e_h36m.py @@ -1,47 +1,35 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = dict(max_epochs=200, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict(type='Adam', lr=1e-4)) +optim_wrapper = dict(optimizer=dict(type="Adam", lr=1e-4)) # learning policy -param_scheduler = [ - dict(type='ExponentialLR', gamma=0.98, end=200, by_epoch=True) -] +param_scheduler = [dict(type="ExponentialLR", gamma=0.98, end=200, by_epoch=True)] auto_scale_lr = dict(base_batch_size=1024) # hooks default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - save_best='MPJPE', - rule='less', - max_keep_ckpts=1), - logger=dict(type='LoggerHook', interval=20), + checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings -codec = dict( - type='VideoPoseLifting', - num_keypoints=17, - zero_center=True, - root_index=0, - remove_root=False) +codec = dict(type="VideoPoseLifting", num_keypoints=17, zero_center=True, root_index=0, remove_root=False) # model settings model = dict( - type='PoseLifter', + type="PoseLifter", backbone=dict( - type='TCN', + type="TCN", in_channels=2 * 17, stem_channels=1024, num_blocks=4, @@ -50,36 +38,31 @@ model = dict( use_stride_conv=True, ), head=dict( - type='TemporalRegressionHead', + type="TemporalRegressionHead", in_channels=1024, num_joints=17, - loss=dict(type='MPJPELoss'), + loss=dict(type="MPJPELoss"), decoder=codec, - )) + ), +) # base dataset settings -dataset_type = 'Human36mDataset' -data_root = 'data/h36m/' +dataset_type = "Human36mDataset" +data_root = "data/h36m/" # pipelines train_pipeline = [ dict( - type='RandomFlipAroundRoot', + type="RandomFlipAroundRoot", keypoints_flip_cfg=dict(), target_flip_cfg=dict(), ), - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", 
encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] val_pipeline = [ - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] # data loaders @@ -87,18 +70,18 @@ train_dataloader = dict( batch_size=128, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_train.npz', + ann_file="annotation_body3d/fps50/h36m_train.npz", seq_len=243, causal=False, pad_video_seq=True, - keypoint_2d_src='detection', - keypoint_2d_det_file='joint_2d_det_files/cpn_ft_h36m_dbb_train.npy', - camera_param_file='annotation_body3d/cameras.pkl', + keypoint_2d_src="detection", + keypoint_2d_det_file="joint_2d_det_files/cpn_ft_h36m_dbb_train.npy", + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=train_pipeline, ), ) @@ -107,26 +90,24 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_test.npz', + ann_file="annotation_body3d/fps50/h36m_test.npz", seq_len=243, causal=False, pad_video_seq=True, - keypoint_2d_src='detection', - keypoint_2d_det_file='joint_2d_det_files/cpn_ft_h36m_dbb_test.npy', - camera_param_file='annotation_body3d/cameras.pkl', + keypoint_2d_src="detection", + keypoint_2d_det_file="joint_2d_det_files/cpn_ft_h36m_dbb_test.npy", + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [ - dict(type='MPJPE', mode='mpjpe'), - dict(type='MPJPE', mode='p-mpjpe') -] +val_evaluator = [dict(type="MPJPE", mode="mpjpe"), dict(type="MPJPE", mode="p-mpjpe")] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-243frm-supv_8xb128-160e_h36m.py b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-243frm-supv_8xb128-160e_h36m.py index 0d241c498f98e3f2e5e10c4e5434a82d218ab371..043bd83476a1358762114087423beb30f68b20f7 100644 --- a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-243frm-supv_8xb128-160e_h36m.py +++ b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-243frm-supv_8xb128-160e_h36m.py @@ -1,47 +1,35 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = dict(max_epochs=160, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict(type='Adam', lr=1e-3)) +optim_wrapper = 
dict(optimizer=dict(type="Adam", lr=1e-3)) # learning policy -param_scheduler = [ - dict(type='ExponentialLR', gamma=0.975, end=80, by_epoch=True) -] +param_scheduler = [dict(type="ExponentialLR", gamma=0.975, end=80, by_epoch=True)] auto_scale_lr = dict(base_batch_size=1024) # hooks default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - save_best='MPJPE', - rule='less', - max_keep_ckpts=1), - logger=dict(type='LoggerHook', interval=20), + checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings -codec = dict( - type='VideoPoseLifting', - num_keypoints=17, - zero_center=True, - root_index=0, - remove_root=False) +codec = dict(type="VideoPoseLifting", num_keypoints=17, zero_center=True, root_index=0, remove_root=False) # model settings model = dict( - type='PoseLifter', + type="PoseLifter", backbone=dict( - type='TCN', + type="TCN", in_channels=2 * 17, stem_channels=1024, num_blocks=4, @@ -50,36 +38,31 @@ model = dict( use_stride_conv=True, ), head=dict( - type='TemporalRegressionHead', + type="TemporalRegressionHead", in_channels=1024, num_joints=17, - loss=dict(type='MPJPELoss'), + loss=dict(type="MPJPELoss"), decoder=codec, - )) + ), +) # base dataset settings -dataset_type = 'Human36mDataset' -data_root = 'data/h36m/' +dataset_type = "Human36mDataset" +data_root = "data/h36m/" # pipelines train_pipeline = [ dict( - type='RandomFlipAroundRoot', + type="RandomFlipAroundRoot", keypoints_flip_cfg=dict(), target_flip_cfg=dict(), ), - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] val_pipeline = [ - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] # data loaders @@ -87,16 +70,16 @@ train_dataloader = dict( batch_size=128, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_train.npz', + ann_file="annotation_body3d/fps50/h36m_train.npz", seq_len=243, causal=False, pad_video_seq=True, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=train_pipeline, ), ) @@ -105,24 +88,22 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_test.npz', + ann_file="annotation_body3d/fps50/h36m_test.npz", seq_len=243, causal=False, pad_video_seq=True, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) 
test_dataloader = val_dataloader # evaluators -val_evaluator = [ - dict(type='MPJPE', mode='mpjpe'), - dict(type='MPJPE', mode='p-mpjpe') -] +val_evaluator = [dict(type="MPJPE", mode="mpjpe"), dict(type="MPJPE", mode="p-mpjpe")] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-semi-supv-cpn-ft_8xb64-200e_h36m.py b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-semi-supv-cpn-ft_8xb64-200e_h36m.py index 08bcda8ed76ebd08b8e525f904c41abb91d9a21e..4e423c8263cb0708f6797542835853653c6783c9 100644 --- a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-semi-supv-cpn-ft_8xb64-200e_h36m.py +++ b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-semi-supv-cpn-ft_8xb64-200e_h36m.py @@ -1,10 +1,9 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = None @@ -17,27 +16,18 @@ auto_scale_lr = dict(base_batch_size=1024) # hooks default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - save_best='MPJPE', - rule='less', - max_keep_ckpts=1), - logger=dict(type='LoggerHook', interval=20), + checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings -codec = dict( - type='VideoPoseLifting', - num_keypoints=17, - zero_center=True, - root_index=0, - remove_root=False) +codec = dict(type="VideoPoseLifting", num_keypoints=17, zero_center=True, root_index=0, remove_root=False) # model settings model = dict( - type='PoseLifter', + type="PoseLifter", backbone=dict( - type='TCN', + type="TCN", in_channels=2 * 17, stem_channels=1024, num_blocks=2, @@ -46,14 +36,14 @@ model = dict( use_stride_conv=True, ), head=dict( - type='TemporalRegressionHead', + type="TemporalRegressionHead", in_channels=1024, num_joints=17, - loss=dict(type='MPJPELoss'), + loss=dict(type="MPJPELoss"), decoder=codec, ), traj_backbone=dict( - type='TCN', + type="TCN", in_channels=2 * 17, stem_channels=1024, num_blocks=2, @@ -62,29 +52,27 @@ model = dict( use_stride_conv=True, ), traj_head=dict( - type='TrajectoryRegressionHead', + type="TrajectoryRegressionHead", in_channels=1024, num_joints=1, - loss=dict(type='MPJPELoss', use_target_weight=True), + loss=dict(type="MPJPELoss", use_target_weight=True), decoder=codec, ), semi_loss=dict( - type='SemiSupervisionLoss', + type="SemiSupervisionLoss", joint_parents=[0, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15], - warmup_iterations=1311376 // 64 // 8 * 5), + warmup_iterations=1311376 // 64 // 8 * 5, + ), ) # base dataset settings -dataset_type = 'Human36mDataset' -data_root = 'data/h36m/' +dataset_type = "Human36mDataset" +data_root = "data/h36m/" # pipelines val_pipeline = [ - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] # data loaders @@ -93,27 +81,24 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, 
drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_test.npz', + ann_file="annotation_body3d/fps50/h36m_test.npz", seq_len=27, causal=False, pad_video_seq=True, - keypoint_2d_src='detection', - keypoint_2d_det_file='joint_2d_det_files/cpn_ft_h36m_dbb_test.npy', - camera_param_file='annotation_body3d/cameras.pkl', + keypoint_2d_src="detection", + keypoint_2d_det_file="joint_2d_det_files/cpn_ft_h36m_dbb_test.npy", + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [ - dict(type='MPJPE', mode='mpjpe'), - dict(type='MPJPE', mode='p-mpjpe'), - dict(type='MPJPE', mode='n-mpjpe') -] +val_evaluator = [dict(type="MPJPE", mode="mpjpe"), dict(type="MPJPE", mode="p-mpjpe"), dict(type="MPJPE", mode="n-mpjpe")] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-semi-supv_8xb64-200e_h36m.py b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-semi-supv_8xb64-200e_h36m.py index d145f05b17e917885bf76e7c51ed628b5b096d27..763215f4c0a9029253e19a1c6f52015985f63a4c 100644 --- a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-semi-supv_8xb64-200e_h36m.py +++ b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-semi-supv_8xb64-200e_h36m.py @@ -1,10 +1,9 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = None @@ -17,27 +16,18 @@ auto_scale_lr = dict(base_batch_size=1024) # hooks default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - save_best='MPJPE', - rule='less', - max_keep_ckpts=1), - logger=dict(type='LoggerHook', interval=20), + checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings -codec = dict( - type='VideoPoseLifting', - num_keypoints=17, - zero_center=True, - root_index=0, - remove_root=False) +codec = dict(type="VideoPoseLifting", num_keypoints=17, zero_center=True, root_index=0, remove_root=False) # model settings model = dict( - type='PoseLifter', + type="PoseLifter", backbone=dict( - type='TCN', + type="TCN", in_channels=2 * 17, stem_channels=1024, num_blocks=2, @@ -46,14 +36,14 @@ model = dict( use_stride_conv=True, ), head=dict( - type='TemporalRegressionHead', + type="TemporalRegressionHead", in_channels=1024, num_joints=17, - loss=dict(type='MPJPELoss'), + loss=dict(type="MPJPELoss"), decoder=codec, ), traj_backbone=dict( - type='TCN', + type="TCN", in_channels=2 * 17, stem_channels=1024, num_blocks=2, @@ -62,29 +52,27 @@ model = dict( use_stride_conv=True, ), traj_head=dict( - type='TrajectoryRegressionHead', + type="TrajectoryRegressionHead", in_channels=1024, num_joints=1, - loss=dict(type='MPJPELoss', use_target_weight=True), + loss=dict(type="MPJPELoss", use_target_weight=True), decoder=codec, ), 
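Both semi-supervised configs (one above, one just below) hard-code `warmup_iterations=1311376 // 64 // 8 * 5` in `SemiSupervisionLoss`. A plausible reading of that expression, with hypothetical variable names (only the constants come from the config):

```python
# Hypothetical decomposition of the config's warmup_iterations expression.
num_samples = 1311376    # presumably the number of H3.6M training samples
batch_size = 64          # per-GPU batch size (the "8xb64" in the file name)
num_gpus = 8
warmup_epochs = 5

iters_per_epoch = num_samples // batch_size // num_gpus   # 2561
warmup_iterations = iters_per_epoch * warmup_epochs       # 12805
```

In other words, the unsupervised loss terms appear to be warmed up over roughly five epochs of training.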
semi_loss=dict( - type='SemiSupervisionLoss', + type="SemiSupervisionLoss", joint_parents=[0, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15], - warmup_iterations=1311376 // 64 // 8 * 5), + warmup_iterations=1311376 // 64 // 8 * 5, + ), ) # base dataset settings -dataset_type = 'Human36mDataset' -data_root = 'data/h36m/' +dataset_type = "Human36mDataset" +data_root = "data/h36m/" # pipelines val_pipeline = [ - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] # data loaders @@ -93,25 +81,22 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_test.npz', + ann_file="annotation_body3d/fps50/h36m_test.npz", seq_len=27, causal=False, pad_video_seq=True, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [ - dict(type='MPJPE', mode='mpjpe'), - dict(type='MPJPE', mode='p-mpjpe'), - dict(type='MPJPE', mode='n-mpjpe') -] +val_evaluator = [dict(type="MPJPE", mode="mpjpe"), dict(type="MPJPE", mode="p-mpjpe"), dict(type="MPJPE", mode="n-mpjpe")] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-supv_8xb128-160e_h36m.py b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-supv_8xb128-160e_h36m.py index 803f907b7bdc1d4cb0fe3496ad05322c48533cf9..6396dd9b6cd3578ad62c88404af211b9271bb867 100644 --- a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-supv_8xb128-160e_h36m.py +++ b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-supv_8xb128-160e_h36m.py @@ -1,47 +1,35 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = dict(max_epochs=160, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict(type='Adam', lr=1e-3)) +optim_wrapper = dict(optimizer=dict(type="Adam", lr=1e-3)) # learning policy -param_scheduler = [ - dict(type='ExponentialLR', gamma=0.975, end=80, by_epoch=True) -] +param_scheduler = [dict(type="ExponentialLR", gamma=0.975, end=80, by_epoch=True)] auto_scale_lr = dict(base_batch_size=1024) # hooks default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - save_best='MPJPE', - rule='less', - max_keep_ckpts=1), - logger=dict(type='LoggerHook', interval=20), + checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings -codec = dict( - type='VideoPoseLifting', - num_keypoints=17, - zero_center=True, - 
root_index=0, - remove_root=False) +codec = dict(type="VideoPoseLifting", num_keypoints=17, zero_center=True, root_index=0, remove_root=False) # model settings model = dict( - type='PoseLifter', + type="PoseLifter", backbone=dict( - type='TCN', + type="TCN", in_channels=2 * 17, stem_channels=1024, num_blocks=2, @@ -50,36 +38,31 @@ model = dict( use_stride_conv=True, ), head=dict( - type='TemporalRegressionHead', + type="TemporalRegressionHead", in_channels=1024, num_joints=17, - loss=dict(type='MPJPELoss'), + loss=dict(type="MPJPELoss"), decoder=codec, - )) + ), +) # base dataset settings -dataset_type = 'Human36mDataset' -data_root = 'data/h36m/' +dataset_type = "Human36mDataset" +data_root = "data/h36m/" # pipelines train_pipeline = [ dict( - type='RandomFlipAroundRoot', + type="RandomFlipAroundRoot", keypoints_flip_cfg=dict(), target_flip_cfg=dict(), ), - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] val_pipeline = [ - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] # data loaders @@ -87,16 +70,16 @@ train_dataloader = dict( batch_size=128, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_train.npz', + ann_file="annotation_body3d/fps50/h36m_train.npz", seq_len=27, causal=False, pad_video_seq=True, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=train_pipeline, ), ) @@ -105,24 +88,22 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_test.npz', + ann_file="annotation_body3d/fps50/h36m_test.npz", seq_len=27, causal=False, pad_video_seq=True, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [ - dict(type='MPJPE', mode='mpjpe'), - dict(type='MPJPE', mode='p-mpjpe') -] +val_evaluator = [dict(type="MPJPE", mode="mpjpe"), dict(type="MPJPE", mode="p-mpjpe")] test_evaluator = val_evaluator diff --git a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-81frm-supv_8xb128-160e_h36m.py b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-81frm-supv_8xb128-160e_h36m.py index 4b370fe76eb80b292ef59a435c0cc0aa2d48f4b3..aa33ac2f89f04edc16418316943269ed95d4c33c 100644 --- a/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-81frm-supv_8xb128-160e_h36m.py +++ 
b/mmpose/configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-81frm-supv_8xb128-160e_h36m.py @@ -1,47 +1,35 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = dict(max_epochs=160, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict(type='Adam', lr=1e-3)) +optim_wrapper = dict(optimizer=dict(type="Adam", lr=1e-3)) # learning policy -param_scheduler = [ - dict(type='ExponentialLR', gamma=0.975, end=80, by_epoch=True) -] +param_scheduler = [dict(type="ExponentialLR", gamma=0.975, end=80, by_epoch=True)] auto_scale_lr = dict(base_batch_size=1024) # hooks default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - save_best='MPJPE', - rule='less', - max_keep_ckpts=1), - logger=dict(type='LoggerHook', interval=20), + checkpoint=dict(type="CheckpointHook", save_best="MPJPE", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings -codec = dict( - type='VideoPoseLifting', - num_keypoints=17, - zero_center=True, - root_index=0, - remove_root=False) +codec = dict(type="VideoPoseLifting", num_keypoints=17, zero_center=True, root_index=0, remove_root=False) # model settings model = dict( - type='PoseLifter', + type="PoseLifter", backbone=dict( - type='TCN', + type="TCN", in_channels=2 * 17, stem_channels=1024, num_blocks=3, @@ -50,36 +38,31 @@ model = dict( use_stride_conv=True, ), head=dict( - type='TemporalRegressionHead', + type="TemporalRegressionHead", in_channels=1024, num_joints=17, - loss=dict(type='MPJPELoss'), + loss=dict(type="MPJPELoss"), decoder=codec, - )) + ), +) # base dataset settings -dataset_type = 'Human36mDataset' -data_root = 'data/h36m/' +dataset_type = "Human36mDataset" +data_root = "data/h36m/" # pipelines train_pipeline = [ dict( - type='RandomFlipAroundRoot', + type="RandomFlipAroundRoot", keypoints_flip_cfg=dict(), target_flip_cfg=dict(), ), - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] val_pipeline = [ - dict(type='GenerateTarget', encoder=codec), - dict( - type='PackPoseInputs', - meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices', - 'target_root')) + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs", meta_keys=("id", "category_id", "target_img_path", "flip_indices", "target_root")), ] # data loaders @@ -87,16 +70,16 @@ train_dataloader = dict( batch_size=128, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_train.npz', + ann_file="annotation_body3d/fps50/h36m_train.npz", seq_len=81, causal=False, pad_video_seq=True, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=train_pipeline, ), ) @@ -105,24 +88,22 @@ 
val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotation_body3d/fps50/h36m_test.npz', + ann_file="annotation_body3d/fps50/h36m_test.npz", seq_len=81, causal=False, pad_video_seq=True, - camera_param_file='annotation_body3d/cameras.pkl', + camera_param_file="annotation_body3d/cameras.pkl", data_root=data_root, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = val_dataloader # evaluators -val_evaluator = [ - dict(type='MPJPE', mode='mpjpe'), - dict(type='MPJPE', mode='p-mpjpe') -] +val_evaluator = [dict(type="MPJPE", mode="mpjpe"), dict(type="MPJPE", mode="p-mpjpe")] test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/rtmpose/coco_wholebody_face/rtmpose-m_8xb32-60e_coco-wholebody-face-256x256.py b/mmpose/configs/face_2d_keypoint/rtmpose/coco_wholebody_face/rtmpose-m_8xb32-60e_coco-wholebody-face-256x256.py index 958a361c07a9dbfc45daabcab2fb08ba889e9525..676ea082a889faa9a71b55736adfea82ceaa3265 100644 --- a/mmpose/configs/face_2d_keypoint/rtmpose/coco_wholebody_face/rtmpose-m_8xb32-60e_coco-wholebody-face-256x256.py +++ b/mmpose/configs/face_2d_keypoint/rtmpose/coco_wholebody_face/rtmpose-m_8xb32-60e_coco-wholebody-face-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 60 @@ -10,97 +10,78 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(256, 256), - sigma=(5.66, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(256, 256), sigma=(5.66, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + 
norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=68, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyFaceDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyFaceDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -110,68 +91,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), # dict(type='RandomHalfBody'), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", 
input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), # dict(type='RandomHalfBody'), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -179,53 +141,45 @@ train_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='NME', rule='less', max_keep_ckpts=1, interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", max_keep_ckpts=1, interval=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-m_8xb256-120e_face6-256x256.py 
b/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-m_8xb256-120e_face6-256x256.py index abbb2ce985129538b7ecef8e5b1995bee1effa3a..0eab83e07cae54cbdc18e150a95917bc6c137326 100644 --- a/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-m_8xb256-120e_face6-256x256.py +++ b/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-m_8xb256-120e_face6-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # lapa coco wflw 300w cofw halpe @@ -12,169 +12,125 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.005, begin=30, end=max_epochs, T_max=max_epochs - 30, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(256, 256), - sigma=(5.66, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(256, 256), sigma=(5.66, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=106, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - 
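# ---------------------------------------------------------------------------
# [editor's note -- illustrative sketch, not part of the patch]
# "face6" trains one 106-keypoint face model on six merged datasets (LaPa,
# COCO-WholeBody-Face, WFLW, 300W, COFW, Halpe; see the comment at the top
# of the file). A minimal way to launch this config programmatically with
# the standard MMPose 1.x / mmengine workflow (the work_dir value is an
# assumption for the example):
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile(
    "mmpose/configs/face_2d_keypoint/rtmpose/face6/"
    "rtmpose-m_8xb256-120e_face6-256x256.py")
cfg.work_dir = "work_dirs/rtmpose-m_face6"  # assumed output directory
Runner.from_cfg(cfg).train()
# ---------------------------------------------------------------------------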
type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'LapaDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "LapaDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.2), - dict(type='MedianBlur', p=0.2), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.2), + dict(type="MedianBlur", p=0.2), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - 
type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # train dataset @@ -182,8 +138,8 @@ dataset_lapa = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_trainval.json', - data_prefix=dict(img='pose/LaPa/'), + ann_file="LaPa/annotations/lapa_trainval.json", + data_prefix=dict(img="pose/LaPa/"), pipeline=[], ) @@ -261,7 +217,7 @@ kpt_68_to_106 = [ (64, 100), (65, 101), (66, 102), - (67, 103) + (67, 103), ] mapping_halpe = [ @@ -338,7 +294,7 @@ mapping_halpe = [ (90, 100), (91, 101), (92, 102), - (93, 103) + (93, 103), ] mapping_wflw = [ @@ -449,7 +405,7 @@ mapping_wflw = [ # (96, 104), # - (97, 105) + (97, 105), ] mapping_cofw = [ @@ -486,66 +442,51 @@ mapping_cofw = [ (26, 102), (27, 93), # - (28, 16) + (28, 16), ] dataset_coco = dict( - type='CocoWholeBodyFaceDataset', + type="CocoWholeBodyFaceDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) dataset_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_wflw) - ], + ann_file="wflw/annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="pose/WFLW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_wflw)], ) dataset_300w = dict( - type='Face300WDataset', + type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_train.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="300w/annotations/face_landmarks_300w_train.json", + data_prefix=dict(img="pose/300w/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) dataset_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_train.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_cofw) - ], + ann_file="cofw/annotations/cofw_train.json", + data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_cofw)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_133kpt.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_halpe) - ], + 
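# ---------------------------------------------------------------------------
# [editor's note -- illustrative sketch, not part of the patch]
# Each per-dataset pipeline above consists of a single KeypointConverter:
# the (source_index, target_index) pairs in kpt_68_to_106, mapping_halpe,
# mapping_wflw and mapping_cofw remap that dataset's native landmarks into
# the shared 106-point LaPa layout; unmapped targets stay empty. A minimal
# sketch of the remapping idea (the real mmpose transform also carries
# visibility flags and per-keypoint weights):
import numpy as np

def remap_keypoints(kpts, mapping, num_keypoints=106):
    """kpts: (N, K_src, 2) array; mapping: list of (src, dst) index pairs."""
    out = np.zeros((kpts.shape[0], num_keypoints, 2), dtype=kpts.dtype)
    for src, dst in mapping:
        out[:, dst] = kpts[:, src]
    return out
# ---------------------------------------------------------------------------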
ann_file="halpe/annotations/halpe_train_133kpt.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_halpe)], ) # data loaders @@ -553,101 +494,85 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/lapa.py'), - datasets=[ - dataset_lapa, dataset_coco, dataset_wflw, dataset_300w, - dataset_cofw, dataset_halpe - ], + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/lapa.py"), + datasets=[dataset_lapa, dataset_coco, dataset_wflw, dataset_300w, dataset_cofw, dataset_halpe], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_test.json', - data_prefix=dict(img='pose/LaPa/'), + ann_file="LaPa/annotations/lapa_test.json", + data_prefix=dict(img="pose/LaPa/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) # test dataset val_lapa = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_test.json', - data_prefix=dict(img='pose/LaPa/'), + ann_file="LaPa/annotations/lapa_test.json", + data_prefix=dict(img="pose/LaPa/"), pipeline=[], ) val_coco = dict( - type='CocoWholeBodyFaceDataset', + type="CocoWholeBodyFaceDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) val_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_test.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_wflw) - ], + ann_file="wflw/annotations/face_landmarks_wflw_test.json", + data_prefix=dict(img="pose/WFLW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_wflw)], ) val_300w = dict( - type='Face300WDataset', + type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_test.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="300w/annotations/face_landmarks_300w_test.json", + data_prefix=dict(img="pose/300w/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) val_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_test.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_cofw) - ], + 
ann_file="cofw/annotations/cofw_test.json", + data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_cofw)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_halpe) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_halpe)], ) test_dataloader = dict( @@ -655,36 +580,27 @@ test_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/lapa.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/lapa.py"), datasets=[val_lapa, val_coco, val_wflw, val_300w, val_cofw, val_halpe], pipeline=val_pipeline, test_mode=True, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict( - save_best='NME', rule='less', max_keep_ckpts=1, interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", max_keep_ckpts=1, interval=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-s_8xb256-120e_face6-256x256.py b/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-s_8xb256-120e_face6-256x256.py index 62fa305115e48a619966cdaa2ac9f03cce38bfa9..e2bb1f52193a13bc999f68e8332cf6826090c0c5 100644 --- a/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-s_8xb256-120e_face6-256x256.py +++ b/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-s_8xb256-120e_face6-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # lapa coco wflw 300w cofw halpe @@ -12,177 +12,133 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.005, begin=30, end=max_epochs, T_max=max_epochs - 30, by_epoch=True, - 
convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(256, 256), - sigma=(5.66, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(256, 256), sigma=(5.66, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.33, widen_factor=0.5, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/cspnext-s_imagenet_600e-ea671761.pth') + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/cspnext-s_imagenet_600e-ea671761.pth", + ), ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=512, out_channels=106, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'LapaDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "LapaDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - 
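# ---------------------------------------------------------------------------
# [editor's note -- illustrative sketch, not part of the patch]
# The -m/-s/-t variants in this directory differ mainly in CSPNeXt scaling
# and the resulting head input width; values as they appear in these
# configs:
cspnext_variants = {
    #     (deepen_factor, widen_factor, head in_channels)
    "m": (0.67, 0.75, 768),
    "s": (0.33, 0.50, 512),
    "t": (0.167, 0.375, 384),
}
# Each variant also pulls a matching ImageNet(-RSB) pre-trained backbone via
# init_cfg; prefix="backbone." loads only the backbone sub-weights from the
# full-model checkpoint.
# ---------------------------------------------------------------------------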
dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.2), - dict(type='MedianBlur', p=0.2), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.2), + dict(type="MedianBlur", p=0.2), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # train dataset dataset_lapa = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_trainval.json', - data_prefix=dict(img='pose/LaPa/'), + ann_file="LaPa/annotations/lapa_trainval.json", + data_prefix=dict(img="pose/LaPa/"), pipeline=[], ) @@ -260,7 +216,7 @@ kpt_68_to_106 = [ (64, 100), (65, 101), (66, 102), - (67, 103) + (67, 103), ] mapping_halpe = [ @@ -337,7 +293,7 @@ mapping_halpe = [ (90, 100), (91, 101), (92, 102), - (93, 103) + (93, 103), ] mapping_wflw = [ @@ -448,7 +404,7 @@ mapping_wflw = [ # (96, 104), # - (97, 105) + (97, 105), ] mapping_cofw = [ @@ -485,66 +441,51 @@ mapping_cofw = [ (26, 102), (27, 93), # - (28, 16) + (28, 16), ] dataset_coco = dict( - type='CocoWholeBodyFaceDataset', + type="CocoWholeBodyFaceDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - 
data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) dataset_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_wflw) - ], + ann_file="wflw/annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="pose/WFLW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_wflw)], ) dataset_300w = dict( - type='Face300WDataset', + type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_train.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="300w/annotations/face_landmarks_300w_train.json", + data_prefix=dict(img="pose/300w/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) dataset_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_train.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_cofw) - ], + ann_file="cofw/annotations/cofw_train.json", + data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_cofw)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_133kpt.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_halpe) - ], + ann_file="halpe/annotations/halpe_train_133kpt.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_halpe)], ) # data loaders @@ -553,102 +494,86 @@ train_dataloader = dict( num_workers=10, pin_memory=True, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/lapa.py'), - datasets=[ - dataset_lapa, dataset_coco, dataset_wflw, dataset_300w, - dataset_cofw, dataset_halpe - ], + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/lapa.py"), + datasets=[dataset_lapa, dataset_coco, dataset_wflw, dataset_300w, dataset_cofw, dataset_halpe], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, pin_memory=True, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_test.json', - data_prefix=dict(img='pose/LaPa/'), + ann_file="LaPa/annotations/lapa_test.json", + 
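# ---------------------------------------------------------------------------
# [editor's note -- not part of the patch]
# Small pre-existing inconsistency, untouched by this reformatting diff: the
# -s dataloaders set pin_memory=True, while the sibling -m and -t configs do
# not. Harmless, but worth unifying in a follow-up.
# ---------------------------------------------------------------------------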
data_prefix=dict(img="pose/LaPa/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) # test dataset val_lapa = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_test.json', - data_prefix=dict(img='pose/LaPa/'), + ann_file="LaPa/annotations/lapa_test.json", + data_prefix=dict(img="pose/LaPa/"), pipeline=[], ) val_coco = dict( - type='CocoWholeBodyFaceDataset', + type="CocoWholeBodyFaceDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) val_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_test.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_wflw) - ], + ann_file="wflw/annotations/face_landmarks_wflw_test.json", + data_prefix=dict(img="pose/WFLW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_wflw)], ) val_300w = dict( - type='Face300WDataset', + type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_test.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="300w/annotations/face_landmarks_300w_test.json", + data_prefix=dict(img="pose/300w/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) val_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_test.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_cofw) - ], + ann_file="cofw/annotations/cofw_test.json", + data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_cofw)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_halpe) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_halpe)], ) test_dataloader = dict( @@ -656,36 +581,27 @@ test_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/lapa.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/lapa.py"), datasets=[val_lapa, val_coco, val_wflw, val_300w, val_cofw, val_halpe], pipeline=val_pipeline, test_mode=True, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict( - save_best='NME', rule='less', max_keep_ckpts=1, 
interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", max_keep_ckpts=1, interval=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-t_8xb256-120e_face6-256x256.py b/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-t_8xb256-120e_face6-256x256.py index 751bedffe77aa1dc08bf5360a1f3b5ea9781f209..59d4f0ecd097186c74e5ceeac8c0396fed1dd39f 100644 --- a/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-t_8xb256-120e_face6-256x256.py +++ b/mmpose/configs/face_2d_keypoint/rtmpose/face6/rtmpose-t_8xb256-120e_face6-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # lapa coco wflw 300w cofw halpe @@ -12,177 +12,125 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.0), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', - eta_min=base_lr * 0.005, - begin=30, - end=max_epochs, - T_max=90, - by_epoch=True, - convert_to_iter_based=True), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), + dict(type="CosineAnnealingLR", eta_min=base_lr * 0.005, begin=30, end=max_epochs, T_max=90, by_epoch=True, convert_to_iter_based=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(256, 256), - sigma=(5.66, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(256, 256), sigma=(5.66, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.167, widen_factor=0.375, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - 
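# ---------------------------------------------------------------------------
# [editor's note -- illustrative sketch, not part of the patch]
# Two scheduler details in the -t config above:
#   * CosineAnnealingLR hard-codes T_max=90 where the -m/-s configs write
#     max_epochs - 30; the two agree only because max_epochs is 120 here,
#     so the explicit expression would be the safer form.
#   * auto_scale_lr only takes effect when LR auto-scaling is enabled at
#     launch; it then scales the base LR linearly against the reference
#     batch size. Sketch with an assumed base_lr (set earlier in the file):
base_lr, base_batch_size = 4e-3, 512                   # base_lr assumed for illustration
actual_batch = 8 * 256                                 # 8 GPUs x 256 per GPU, per the file name
scaled_lr = base_lr * actual_batch / base_batch_size   # linear scaling rule -> 0.016
# ---------------------------------------------------------------------------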
checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e-3a2dd350.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e-3a2dd350.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=384, out_channels=106, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'LapaDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "LapaDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.2), - dict(type='MedianBlur', p=0.2), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.2), + dict(type="MedianBlur", p=0.2), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - 
dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # train dataset dataset_lapa = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_trainval.json', - data_prefix=dict(img='pose/LaPa/'), + ann_file="LaPa/annotations/lapa_trainval.json", + data_prefix=dict(img="pose/LaPa/"), pipeline=[], ) @@ -260,7 +208,7 @@ kpt_68_to_106 = [ (64, 100), (65, 101), (66, 102), - (67, 103) + (67, 103), ] mapping_halpe = [ @@ -337,7 +285,7 @@ mapping_halpe = [ (90, 100), (91, 101), (92, 102), - (93, 103) + (93, 103), ] mapping_wflw = [ @@ -448,7 +396,7 @@ mapping_wflw = [ # (96, 104), # - (97, 105) + (97, 105), ] mapping_cofw = [ @@ -485,66 +433,51 @@ mapping_cofw = [ (26, 102), (27, 93), # - (28, 16) + (28, 16), ] dataset_coco = dict( - type='CocoWholeBodyFaceDataset', + type="CocoWholeBodyFaceDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) dataset_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_wflw) - ], + ann_file="wflw/annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="pose/WFLW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_wflw)], ) dataset_300w = dict( - type='Face300WDataset', + type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_train.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="300w/annotations/face_landmarks_300w_train.json", + data_prefix=dict(img="pose/300w/images/"), + 
pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) dataset_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_train.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_cofw) - ], + ann_file="cofw/annotations/cofw_train.json", + data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_cofw)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_133kpt.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_halpe) - ], + ann_file="halpe/annotations/halpe_train_133kpt.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_halpe)], ) # data loaders @@ -552,101 +485,85 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/lapa.py'), - datasets=[ - dataset_lapa, dataset_coco, dataset_wflw, dataset_300w, - dataset_cofw, dataset_halpe - ], + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/lapa.py"), + datasets=[dataset_lapa, dataset_coco, dataset_wflw, dataset_300w, dataset_cofw, dataset_halpe], pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_test.json', - data_prefix=dict(img='pose/LaPa/'), + ann_file="LaPa/annotations/lapa_test.json", + data_prefix=dict(img="pose/LaPa/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) # test dataset val_lapa = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_test.json', - data_prefix=dict(img='pose/LaPa/'), + ann_file="LaPa/annotations/lapa_test.json", + data_prefix=dict(img="pose/LaPa/"), pipeline=[], ) val_coco = dict( - type='CocoWholeBodyFaceDataset', + type="CocoWholeBodyFaceDataset", data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) val_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_test.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_wflw) - ], + ann_file="wflw/annotations/face_landmarks_wflw_test.json", + data_prefix=dict(img="pose/WFLW/images/"), + 
pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_wflw)], ) val_300w = dict( - type='Face300WDataset', + type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_test.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=kpt_68_to_106) - ], + ann_file="300w/annotations/face_landmarks_300w_test.json", + data_prefix=dict(img="pose/300w/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=kpt_68_to_106)], ) val_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_test.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_cofw) - ], + ann_file="cofw/annotations/cofw_test.json", + data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_cofw)], ) val_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), - pipeline=[ - dict( - type='KeypointConverter', num_keypoints=106, mapping=mapping_halpe) - ], + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=106, mapping=mapping_halpe)], ) test_dataloader = dict( @@ -654,19 +571,18 @@ test_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/lapa.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/lapa.py"), datasets=[val_lapa, val_coco, val_wflw, val_300w, val_cofw, val_halpe], pipeline=val_pipeline, test_mode=True, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict( - save_best='NME', rule='less', max_keep_ckpts=1, interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", max_keep_ckpts=1, interval=1)) custom_hooks = [ # dict( @@ -675,15 +591,12 @@ custom_hooks = [ # momentum=0.0002, # update_buffers=True, # priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2) ] # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/rtmpose/lapa/rtmpose-m_8xb64-120e_lapa-256x256.py b/mmpose/configs/face_2d_keypoint/rtmpose/lapa/rtmpose-m_8xb64-120e_lapa-256x256.py index fee1201db1f56efd162292dbb2b6155b7865dced..75498d2e553721fd2a8925b00318784b9bd7f648 100644 --- a/mmpose/configs/face_2d_keypoint/rtmpose/lapa/rtmpose-m_8xb64-120e_lapa-256x256.py +++ b/mmpose/configs/face_2d_keypoint/rtmpose/lapa/rtmpose-m_8xb64-120e_lapa-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 120 @@ -10,97 +10,78 @@ randomness = dict(seed=21) # optimizer optim_wrapper 
= dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(256, 256), - sigma=(5.66, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(256, 256), sigma=(5.66, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=106, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'LapaDataset' -data_mode = 'topdown' -data_root = 'data/LaPa/' +dataset_type = "LapaDataset" +data_mode = "topdown" +data_root = "data/LaPa/" -backend_args = 
dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -110,69 +91,50 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), + dict(type="PhotometricDistortion"), dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict(type='PhotometricDistortion'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.2), - dict(type='MedianBlur', p=0.2), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.2), + dict(type="MedianBlur", p=0.2), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), # dict(type='RandomHalfBody'), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -180,67 +142,60 @@ train_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + 
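# ---------------------------------------------------------------------------
# [editor's note -- illustrative sketch, not part of the patch]
# The LaPa stage-1 pipeline stacks two color-jitter transforms
# (mmdet.YOLOXHSVRandomAug, then PhotometricDistortion) on top of the usual
# geometric augmentations and keeps RandomHalfBody; stage 2 drops both
# PhotometricDistortion and RandomHalfBody and softens the rest. Order of
# the stage-1 transforms, for reference:
lapa_stage1_order = [
    "LoadImage", "GetBBoxCenterScale", "RandomFlip", "RandomHalfBody",
    "RandomBBoxTransform", "TopdownAffine", "mmdet.YOLOXHSVRandomAug",
    "PhotometricDistortion", "Albumentation", "GenerateTarget",
    "PackPoseInputs",
]
# ---------------------------------------------------------------------------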
sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/lapa_train.json', - data_prefix=dict(img=''), + ann_file="annotations/lapa_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/lapa_val.json', - data_prefix=dict(img=''), + ann_file="annotations/lapa_val.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/lapa_test.json', - data_prefix=dict(img=''), + ann_file="annotations/lapa_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) # hooks -default_hooks = dict( - checkpoint=dict( - save_best='NME', rule='less', max_keep_ckpts=1, interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", max_keep_ckpts=1, interval=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/rtmpose/wflw/rtmpose-m_8xb64-60e_wflw-256x256.py b/mmpose/configs/face_2d_keypoint/rtmpose/wflw/rtmpose-m_8xb64-60e_wflw-256x256.py index cbfd788d6062dc70aa3716920a189e681a393497..ccca69ab5696f25ad61b5386db46fad61d5067bb 100644 --- a/mmpose/configs/face_2d_keypoint/rtmpose/wflw/rtmpose-m_8xb64-60e_wflw-256x256.py +++ b/mmpose/configs/face_2d_keypoint/rtmpose/wflw/rtmpose-m_8xb64-60e_wflw-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 60 @@ -10,97 +10,78 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - 
convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(256, 256), - sigma=(5.66, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(256, 256), sigma=(5.66, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=98, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'WFLWDataset' -data_mode = 'topdown' -data_root = 'data/wflw/' +dataset_type = "WFLWDataset" +data_mode = "topdown" +data_root = "data/wflw/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -110,68 +91,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), # dict(type='RandomHalfBody'), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', 
scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), # dict(type='RandomHalfBody'), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -179,53 +141,45 @@ train_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_test.json", + data_prefix=dict(img="images/"), 
test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='NME', rule='less', max_keep_ckpts=1, interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", max_keep_ckpts=1, interval=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/300w/td-hm_hrnetv2-w18_8xb64-60e_300w-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/300w/td-hm_hrnetv2-w18_8xb64-60e_300w-256x256.py index 52473a4664cca8266f603729d1a631aa6dc5b4ca..f4a6a617681b953440268e524aaa1bc148d0b108 100644 --- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/300w/td-hm_hrnetv2-w18_8xb64-60e_300w-256x256.py +++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/300w/td-hm_hrnetv2-w18_8xb64-60e_300w-256x256.py @@ -1,125 +1,95 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=60, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=2e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=2e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=60, - milestones=[40, 55], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=60, milestones=[40, 55], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(256, 256), - heatmap_size=(64, 64), - sigma=1.5) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=1.5) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(18, 36)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(18, 36, 72)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", 
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18'),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"),
     ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=68,
         deconv_out_channels=None,
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'Face300WDataset'
-data_mode = 'topdown'
-data_root = 'data/300w/'
+dataset_type = "Face300WDataset"
+data_mode = "topdown"
+data_root = "data/300w/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        shift_prob=0,
-        rotate_factor=60,
-        scale_factor=(0.75, 1.25)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -127,35 +97,37 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/face_landmarks_300w_train.json',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/face_landmarks_300w_train.json",
+        data_prefix=dict(img="images/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/face_landmarks_300w_valid.json',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/face_landmarks_300w_valid.json",
+        data_prefix=dict(img="images/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
 val_evaluator = dict(
-    type='NME',
-    norm_mode='keypoint_distance',
+    type="NME",
+    norm_mode="keypoint_distance",
 )
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/300wlp/td-hm_hrnetv2-w18_8xb64-60e_300wlp-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/300wlp/td-hm_hrnetv2-w18_8xb64-60e_300wlp-256x256.py
index e96a6bf0ebbc5055d6f19ca7803eec647dc28448..a6c60bd6de49ed95fa751ada0e6d4670f9011643 100644
--- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/300wlp/td-hm_hrnetv2-w18_8xb64-60e_300wlp-256x256.py
+++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/300wlp/td-hm_hrnetv2-w18_8xb64-60e_300wlp-256x256.py
@@ -1,124 +1,94 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=60, val_interval=1)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=2e-3,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=2e-3,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=60,
-        milestones=[40, 55],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=60, milestones=[40, 55], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1))
+default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap',
-    input_size=(256, 256),
-    heatmap_size=(64, 64),
-    sigma=1.5)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=1.5)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
    backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18'),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"),
     ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=68,
         deconv_out_channels=None,
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'Face300WLPDataset'
-data_mode = 'topdown'
-data_root = 'data/300wlp/'
+dataset_type = "Face300WLPDataset"
+data_mode = "topdown"
+data_root = "data/300wlp/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        shift_prob=0,
-        rotate_factor=60,
-        scale_factor=(0.75, 1.25)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -126,35 +96,37 @@ train_dataloader = dict(
     batch_size=2,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/face_landmarks_300wlp_train.json',
-        data_prefix=dict(img='train/'),
+        ann_file="annotations/face_landmarks_300wlp_train.json",
+        data_prefix=dict(img="train/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/face_landmarks_300wlp_valid.json',
-        data_prefix=dict(img='val/'),
+        ann_file="annotations/face_landmarks_300wlp_valid.json",
+        data_prefix=dict(img="val/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
 val_evaluator = dict(
-    type='NME',
-    norm_mode='keypoint_distance',
+    type="NME",
+    norm_mode="keypoint_distance",
 )
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/aflw/td-hm_hrnetv2-w18_8xb64-60e_aflw-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/aflw/td-hm_hrnetv2-w18_8xb64-60e_aflw-256x256.py
index a157a01442f155d34f7fd330014028bc77c4f888..54c4b7adcca8eb85b723edf9db69230c62286c1b 100644
--- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/aflw/td-hm_hrnetv2-w18_8xb64-60e_aflw-256x256.py
+++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/aflw/td-hm_hrnetv2-w18_8xb64-60e_aflw-256x256.py
@@ -1,122 +1,95 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=60, val_interval=1)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=2e-3,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=2e-3,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=60,
-        milestones=[40, 55],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=60, milestones=[40, 55], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1))
+default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18'),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"),
     ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=19,
         deconv_out_channels=None,
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'AFLWDataset'
-data_mode = 'topdown'
-data_root = 'data/aflw/'
+dataset_type = "AFLWDataset"
+data_mode = "topdown"
+data_root = "data/aflw/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        shift_prob=0,
-        rotate_factor=60,
-        scale_factor=(0.75, 1.25)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -124,33 +97,34 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/face_landmarks_aflw_train.json',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/face_landmarks_aflw_train.json",
+        data_prefix=dict(img="images/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/face_landmarks_aflw_test.json',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/face_landmarks_aflw_test.json",
+        data_prefix=dict(img="images/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='NME', norm_mode='use_norm_item', norm_item='bbox_size')
+val_evaluator = dict(type="NME", norm_mode="use_norm_item", norm_item="bbox_size")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/aflw/td-hm_hrnetv2-w18_dark-8xb64-60e_aflw-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/aflw/td-hm_hrnetv2-w18_dark-8xb64-60e_aflw-256x256.py
index 44100cebe60bbe023837dba7586f2c913b731918..175126264ba42c752e91f88d1578f59fa007e5e9 100644
--- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/aflw/td-hm_hrnetv2-w18_dark-8xb64-60e_aflw-256x256.py
+++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/aflw/td-hm_hrnetv2-w18_dark-8xb64-60e_aflw-256x256.py
@@ -1,126 +1,95 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=60, val_interval=1)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=2e-3,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=2e-3,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=60,
-        milestones=[40, 55],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=60, milestones=[40, 55], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1))
+default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap',
-    input_size=(256, 256),
-    heatmap_size=(64, 64),
-    sigma=2,
-    unbiased=True)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2, unbiased=True)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18'),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"),
     ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=19,
         deconv_out_channels=None,
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'AFLWDataset'
-data_mode = 'topdown'
-data_root = 'data/aflw/'
+dataset_type = "AFLWDataset"
+data_mode = "topdown"
+data_root = "data/aflw/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        shift_prob=0,
-        rotate_factor=60,
-        scale_factor=(0.75, 1.25)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -128,33 +97,34 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/face_landmarks_aflw_train.json',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/face_landmarks_aflw_train.json",
+        data_prefix=dict(img="images/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/face_landmarks_aflw_test.json',
-        data_prefix=dict(img='images/'),
+        ann_file="annotations/face_landmarks_aflw_test.json",
+        data_prefix=dict(img="images/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
-val_evaluator = dict(
-    type='NME', norm_mode='use_norm_item', norm_item='bbox_size')
+val_evaluator = dict(type="NME", norm_mode="use_norm_item", norm_item="bbox_size")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hourglass52_8xb32-60e_coco-wholebody-face-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hourglass52_8xb32-60e_coco-wholebody-face-256x256.py
index 0e6f5c5c9084bf03ec95e203c57bad4a91ce7179..18d78d47639e59c68ab518ca3624f4882f9289ae 100644
--- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hourglass52_8xb32-60e_coco-wholebody-face-256x256.py
+++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hourglass52_8xb32-60e_coco-wholebody-face-256x256.py
@@ -1,87 +1,75 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=60, val_interval=1)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=2e-3,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=2e-3,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[40, 55],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[40, 55], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1))
+default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HourglassNet',
+        type="HourglassNet",
         num_stacks=1,
     ),
     head=dict(
-        type='CPMHead',
+        type="CPMHead",
         in_channels=256,
         out_channels=68,
         num_stages=1,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoWholeBodyFaceDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoWholeBodyFaceDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        rotate_factor=60,
-        scale_factor=(0.75, 1.25)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -89,35 +77,37 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_train_v1.0.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/coco_wholebody_train_v1.0.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_val_v1.0.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/coco_wholebody_val_v1.0.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
 val_evaluator = dict(
-    type='NME',
-    norm_mode='keypoint_distance',
+    type="NME",
+    norm_mode="keypoint_distance",
 )
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hrnetv2-w18_8xb32-60e_coco-wholebody-face-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hrnetv2-w18_8xb32-60e_coco-wholebody-face-256x256.py
index dfeac90ced1307eeaa8fe9c83c59a3ae67b1cb23..7c2ea191ab1596ede83f17d53743547ec3c58566 100644
--- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hrnetv2-w18_8xb32-60e_coco-wholebody-face-256x256.py
+++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hrnetv2-w18_8xb32-60e_coco-wholebody-face-256x256.py
@@ -1,120 +1,95 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=60, val_interval=1)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=2e-3,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=2e-3,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[40, 55],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[40, 55], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1))
+default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"),
+    ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=68,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoWholeBodyFaceDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoWholeBodyFaceDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        rotate_factor=60,
-        scale_factor=(0.75, 1.25)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -122,35 +97,37 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_train_v1.0.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/coco_wholebody_train_v1.0.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
        data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_val_v1.0.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/coco_wholebody_val_v1.0.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
 val_evaluator = dict(
-    type='NME',
-    norm_mode='keypoint_distance',
+    type="NME",
+    norm_mode="keypoint_distance",
 )
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hrnetv2-w18_dark-8xb32-60e_coco-wholebody-face-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hrnetv2-w18_dark-8xb32-60e_coco-wholebody-face-256x256.py
index 3c34f9aa5dc733f6dd1363212791b7f2c5b7f447..fb8145c0031b8bf922a33d8c31b14d4325e1f70b 100644
--- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hrnetv2-w18_dark-8xb32-60e_coco-wholebody-face-256x256.py
+++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_hrnetv2-w18_dark-8xb32-60e_coco-wholebody-face-256x256.py
@@ -1,124 +1,95 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=60, val_interval=1)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=2e-3,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=2e-3,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[40, 55],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[40, 55], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1))
+default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap',
-    input_size=(256, 256),
-    heatmap_size=(64, 64),
-    sigma=2,
-    unbiased=True)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2, unbiased=True)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"),
+    ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=68,
         deconv_out_channels=None,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        decoder=codec),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoWholeBodyFaceDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoWholeBodyFaceDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        rotate_factor=60,
-        scale_factor=(0.75, 1.25)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -126,35 +97,37 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_train_v1.0.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/coco_wholebody_train_v1.0.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_val_v1.0.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/coco_wholebody_val_v1.0.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
 val_evaluator = dict(
-    type='NME',
-    norm_mode='keypoint_distance',
+    type="NME",
+    norm_mode="keypoint_distance",
 )
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_mobilenetv2_8xb32-60e_coco-wholebody-face-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_mobilenetv2_8xb32-60e_coco-wholebody-face-256x256.py
index 6f1a8629fc7448a4edc5e3a98b554b615efb7102..fa45ace8604ddeec1676b24c1fd49933370e62c8 100644
--- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_mobilenetv2_8xb32-60e_coco-wholebody-face-256x256.py
+++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_mobilenetv2_8xb32-60e_coco-wholebody-face-256x256.py
@@ -1,86 +1,68 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=60, val_interval=1)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=2e-3,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=2e-3,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[40, 55],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[40, 55], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1))
+default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='MobileNetV2',
-        widen_factor=1.,
-        out_indices=(7, ),
-        init_cfg=dict(type='Pretrained', checkpoint='mmcls://mobilenet_v2')),
+        type="MobileNetV2", widen_factor=1.0, out_indices=(7,), init_cfg=dict(type="Pretrained", checkpoint="mmcls://mobilenet_v2")
+    ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=1280,
-        out_channels=68,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=1280, out_channels=68, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoWholeBodyFaceDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoWholeBodyFaceDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        rotate_factor=60,
-        scale_factor=(0.75, 1.25)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -88,35 +70,37 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_train_v1.0.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/coco_wholebody_train_v1.0.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_val_v1.0.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/coco_wholebody_val_v1.0.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
 val_evaluator = dict(
-    type='NME',
-    norm_mode='keypoint_distance',
+    type="NME",
+    norm_mode="keypoint_distance",
 )
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_res50_8xb32-60e_coco-wholebody-face-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_res50_8xb32-60e_coco-wholebody-face-256x256.py
index 0070e55d69d26b5e50edfef7868dc4faa5b0b5f4..1754b55f69fc7c56416b233adc0d4c1017bbfbe8 100644
--- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_res50_8xb32-60e_coco-wholebody-face-256x256.py
+++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_res50_8xb32-60e_coco-wholebody-face-256x256.py
@@ -1,85 +1,66 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=60, val_interval=1)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=2e-3,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=2e-3,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[40, 55],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[40, 55], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1))
+default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
-    backbone=dict(
-        type='ResNet',
-        depth=50,
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
+    backbone=dict(type="ResNet", depth=50, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50")),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=68,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=68, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoWholeBodyFaceDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoWholeBodyFaceDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        rotate_factor=60,
-        scale_factor=(0.75, 1.25)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -87,35 +68,37 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_train_v1.0.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/coco_wholebody_train_v1.0.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_val_v1.0.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/coco_wholebody_val_v1.0.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader
 
 # evaluators
 val_evaluator = dict(
-    type='NME',
-    norm_mode='keypoint_distance',
+    type="NME",
+    norm_mode="keypoint_distance",
 )
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_scnet50_8xb32-60e_coco-wholebody-face-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_scnet50_8xb32-60e_coco-wholebody-face-256x256.py
index 8f79f4b1d362b527cd684ae927e61cf17ec821cd..3ee7f7464c9e45c34d7a50e0ddeed9bbd53cc256 100644
--- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_scnet50_8xb32-60e_coco-wholebody-face-256x256.py
+++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/coco_wholebody_face/td-hm_scnet50_8xb32-60e_coco-wholebody-face-256x256.py
@@ -1,88 +1,70 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 train_cfg = dict(max_epochs=60, val_interval=1)
 
 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=2e-3,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=2e-3,
+    )
+)
 
 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[40, 55],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[40, 55], gamma=0.1, by_epoch=True),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)
 
 # hooks
-default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1))
+default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1))
 
 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='SCNet',
+        type="SCNet",
         depth=50,
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://download.openmmlab.com/mmpose/'
-            'pretrain_models/scnet50-7ef0a199.pth')),
+        init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/scnet50-7ef0a199.pth"),
+    ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=68,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=68, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
    test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)
 
 # base dataset settings
-dataset_type = 'CocoWholeBodyFaceDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoWholeBodyFaceDataset"
+data_mode = "topdown"
+data_root = "data/coco/"
 
 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        rotate_factor=60,
-        scale_factor=(0.75, 1.25)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]
 
 # data loaders
@@ -90,35 +72,37 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+
sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/cofw/td-hm_hrnetv2-w18_8xb64-60e_cofw-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/cofw/td-hm_hrnetv2-w18_8xb64-60e_cofw-256x256.py index 7c52342e950246755a9e5c0ed60302da936bb6fe..d8c644396a06afc2f99657ce44c40b281fc51f9a 100644 --- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/cofw/td-hm_hrnetv2-w18_8xb64-60e_cofw-256x256.py +++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/cofw/td-hm_hrnetv2-w18_8xb64-60e_cofw-256x256.py @@ -1,125 +1,95 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=60, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=2e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=2e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=50, - milestones=[40, 55], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=50, milestones=[40, 55], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(256, 256), - heatmap_size=(64, 64), - sigma=1.5) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=1.5) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(18, 36)), - stage3=dict( - num_modules=4, - 
num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(18, 36, 72)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(18, 36, 72, 144), - multiscale_output=True), - upsample=dict(mode='bilinear', align_corners=False)), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18'), + multiscale_output=True, + ), + upsample=dict(mode="bilinear", align_corners=False), + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=270, out_channels=29, deconv_out_channels=None, - conv_out_channels=(270, ), - conv_kernel_sizes=(1, ), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + conv_out_channels=(270,), + conv_kernel_sizes=(1,), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'COFWDataset' -data_mode = 'topdown' -data_root = 'data/cofw/' +dataset_type = "COFWDataset" +data_mode = "topdown" +data_root = "data/cofw/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_prob=0, - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -127,35 +97,37 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/cofw_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/cofw_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/cofw_test.json', - 
data_prefix=dict(img='images/'), + ann_file="annotations/cofw_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_8xb64-60e_wflw-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_8xb64-60e_wflw-256x256.py index ae373c816aec3e09f7780f304ce11687e48b8e32..10272c3830af6efef239a32a88cc614beb95bc26 100644 --- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_8xb64-60e_wflw-256x256.py +++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_8xb64-60e_wflw-256x256.py @@ -1,122 +1,95 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=60, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=2e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=2e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=60, - milestones=[40, 55], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=60, milestones=[40, 55], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(18, 36)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(18, 36, 72)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(18, 36, 72, 144), - multiscale_output=True), - upsample=dict(mode='bilinear', align_corners=False)), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18'), + multiscale_output=True, + ), + 
upsample=dict(mode="bilinear", align_corners=False), + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=270, out_channels=98, deconv_out_channels=None, - conv_out_channels=(270, ), - conv_kernel_sizes=(1, ), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + conv_out_channels=(270,), + conv_kernel_sizes=(1,), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'WFLWDataset' -data_mode = 'topdown' -data_root = 'data/wflw/' +dataset_type = "WFLWDataset" +data_mode = "topdown" +data_root = "data/wflw/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_prob=0, - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -124,35 +97,37 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_awing-8xb64-60e_wflw-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_awing-8xb64-60e_wflw-256x256.py index ada24a97bb7954d42b2d300ea9e8a14b494da938..d61c8949452074394f91926d05756074b076a2ce 100644 --- 
a/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_awing-8xb64-60e_wflw-256x256.py +++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_awing-8xb64-60e_wflw-256x256.py @@ -1,122 +1,95 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=60, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=2e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=2e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=60, - milestones=[40, 55], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=60, milestones=[40, 55], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(18, 36)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(18, 36, 72)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(18, 36, 72, 144), - multiscale_output=True), - upsample=dict(mode='bilinear', align_corners=False)), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18'), + multiscale_output=True, + ), + upsample=dict(mode="bilinear", align_corners=False), + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=270, out_channels=98, deconv_out_channels=None, - conv_out_channels=(270, ), - conv_kernel_sizes=(1, ), - loss=dict(type='AdaptiveWingLoss', use_target_weight=True), - decoder=codec), + conv_out_channels=(270,), + conv_kernel_sizes=(1,), + loss=dict(type="AdaptiveWingLoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + 
flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'WFLWDataset' -data_mode = 'topdown' -data_root = 'data/wflw/' +dataset_type = "WFLWDataset" +data_mode = "topdown" +data_root = "data/wflw/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_prob=0, - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -124,35 +97,37 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_dark-8xb64-60e_wflw-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_dark-8xb64-60e_wflw-256x256.py index 973a850f3fdf2ab6300e8e56c4e1b92b15d3f63a..0f4c68d82e0b8eca4b1c37a7c1c75d0bc3f795a8 100644 --- a/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_dark-8xb64-60e_wflw-256x256.py +++ b/mmpose/configs/face_2d_keypoint/topdown_heatmap/wflw/td-hm_hrnetv2-w18_dark-8xb64-60e_wflw-256x256.py @@ -1,126 +1,95 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=60, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=2e-3, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=2e-3, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - 
type='MultiStepLR', - begin=0, - end=60, - milestones=[40, 55], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=60, milestones=[40, 55], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='NME', rule='less', interval=1)) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less", interval=1)) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(256, 256), - heatmap_size=(64, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(18, 36)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(18, 36, 72)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(18, 36, 72, 144), - multiscale_output=True), - upsample=dict(mode='bilinear', align_corners=False)), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18'), + multiscale_output=True, + ), + upsample=dict(mode="bilinear", align_corners=False), + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=270, out_channels=98, deconv_out_channels=None, - conv_out_channels=(270, ), - conv_kernel_sizes=(1, ), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + conv_out_channels=(270,), + conv_kernel_sizes=(1,), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'WFLWDataset' -data_mode = 'topdown' -data_root = 'data/wflw/' +dataset_type = "WFLWDataset" +data_mode = "topdown" +data_root = "data/wflw/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - shift_prob=0, - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + 
dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", shift_prob=0, rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -128,35 +97,37 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_8xb64-210e_wflw-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_8xb64-210e_wflw-256x256.py index 2742f497b8fbdd7889281c660b9ccd804ccf754d..8763d262adf3fd87fe90137df2239cdb7d2180aa 100644 --- a/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_8xb64-210e_wflw-256x256.py +++ b/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_8xb64-210e_wflw-256x256.py @@ -1,83 +1,68 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(256, 256)) +codec = dict(type="RegressionLabel", input_size=(256, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 
116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), - neck=dict(type='GlobalAveragePooling'), + neck=dict(type="GlobalAveragePooling"), head=dict( - type='RegressionHead', - in_channels=2048, - num_joints=98, - loss=dict(type='SmoothL1Loss', use_target_weight=True), - decoder=codec), + type="RegressionHead", in_channels=2048, num_joints=98, loss=dict(type="SmoothL1Loss", use_target_weight=True), decoder=codec + ), train_cfg=dict(), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'WFLWDataset' -data_mode = 'topdown' -data_root = 'data/wflw/' +dataset_type = "WFLWDataset" +data_mode = "topdown" +data_root = "data/wflw/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # dataloaders @@ -85,38 +70,40 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='NME', rule='less')) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less")) # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git 
a/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_softwingloss_8xb64-210e_wflw-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_softwingloss_8xb64-210e_wflw-256x256.py index eb4199073d712024f0495746ad902f4ea4dd9052..e583257ebed9f71cb6cdaf7a85e957ca864a823d 100644 --- a/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_softwingloss_8xb64-210e_wflw-256x256.py +++ b/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_softwingloss_8xb64-210e_wflw-256x256.py @@ -1,83 +1,68 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(256, 256)) +codec = dict(type="RegressionLabel", input_size=(256, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), - neck=dict(type='GlobalAveragePooling'), + neck=dict(type="GlobalAveragePooling"), head=dict( - type='RegressionHead', - in_channels=2048, - num_joints=98, - loss=dict(type='SoftWingLoss', use_target_weight=True), - decoder=codec), + type="RegressionHead", in_channels=2048, num_joints=98, loss=dict(type="SoftWingLoss", use_target_weight=True), decoder=codec + ), train_cfg=dict(), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'WFLWDataset' -data_mode = 'topdown' -data_root = 'data/wflw/' +dataset_type = "WFLWDataset" +data_mode = "topdown" +data_root = "data/wflw/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - 
dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # dataloaders @@ -85,38 +70,40 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='NME', rule='less')) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less")) # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_wingloss_8xb64-210e_wflw-256x256.py b/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_wingloss_8xb64-210e_wflw-256x256.py index ab519cd401bbd07212e4834c9a5d655418b49fb1..963f4708b3cc495f46fa56d273e098b2504c2186 100644 --- a/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_wingloss_8xb64-210e_wflw-256x256.py +++ b/mmpose/configs/face_2d_keypoint/topdown_regression/wflw/td-reg_res50_wingloss_8xb64-210e_wflw-256x256.py @@ -1,83 +1,66 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict(type='RegressionLabel', input_size=(256, 256)) +codec = dict(type="RegressionLabel", input_size=(256, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], 
bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), - neck=dict(type='GlobalAveragePooling'), - head=dict( - type='RegressionHead', - in_channels=2048, - num_joints=98, - loss=dict(type='WingLoss', use_target_weight=True), - decoder=codec), + neck=dict(type="GlobalAveragePooling"), + head=dict(type="RegressionHead", in_channels=2048, num_joints=98, loss=dict(type="WingLoss", use_target_weight=True), decoder=codec), train_cfg=dict(), test_cfg=dict( flip_test=True, shift_coords=True, - )) + ), +) # base dataset settings -dataset_type = 'WFLWDataset' -data_mode = 'topdown' -data_root = 'data/wflw/' +dataset_type = "WFLWDataset" +data_mode = "topdown" +data_root = "data/wflw/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # dataloaders @@ -85,38 +68,40 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="images/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/face_landmarks_wflw_test.json', - data_prefix=dict(img='images/'), + ann_file="annotations/face_landmarks_wflw_test.json", + data_prefix=dict(img="images/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict(checkpoint=dict(save_best='NME', rule='less')) +default_hooks = dict(checkpoint=dict(save_best="NME", rule="less")) # evaluators val_evaluator = dict( - type='NME', - norm_mode='keypoint_distance', + type="NME", + norm_mode="keypoint_distance", ) test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_full-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_full-256x192.py index 
4a30ead782415117e309c6bba4da904740e6c884..abc2b228c072b5bdae72b7e131899ea365618648 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_full-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_full-256x192.py @@ -1,114 +1,81 @@ -_base_ = '../../../_base_/default_runtime.py' +_base_ = "../../../_base_/default_runtime.py" # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=8, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), 
test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashionDataset' -data_mode = 'topdown' -data_root = 'data/fld/' +dataset_type = "DeepFashionDataset" +data_mode = "topdown" +data_root = "data/fld/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] test_pipeline = val_pipeline @@ -117,53 +84,56 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - subset='full', + subset="full", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_full_train.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_full_train.json", + data_prefix=dict(img="img/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='full', + subset="full", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_full_val.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_full_val.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='full', + subset="full", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_full_test.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_full_test.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_lower-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_lower-256x192.py index 0a86c38ba8161bdb129db09e87d64c964242877a..155ae73f10ab997978ff505f7297a275dace1abc 100644 --- 
a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_lower-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_lower-256x192.py @@ -1,114 +1,81 @@ -_base_ = '../../../_base_/default_runtime.py' +_base_ = "../../../_base_/default_runtime.py" # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=4, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", 
shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashionDataset' -data_mode = 'topdown' -data_root = 'data/fld/' +dataset_type = "DeepFashionDataset" +data_mode = "topdown" +data_root = "data/fld/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] test_pipeline = val_pipeline @@ -117,53 +84,56 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - subset='lower', + subset="lower", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_lower_train.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_lower_train.json", + data_prefix=dict(img="img/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='lower', + subset="lower", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_lower_val.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_lower_val.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='lower', + subset="lower", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_lower_test.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_lower_test.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_upper-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_upper-256x192.py index 7d6af18fd947fb480b6977aa9f3b28ee0e6c1e30..251cdd0d592e75cb0ff5f90dec60972cf09fe15e 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_upper-256x192.py +++ 
b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_8xb64-210e_deepfashion_upper-256x192.py @@ -1,114 +1,81 @@ -_base_ = '../../../_base_/default_runtime.py' +_base_ = "../../../_base_/default_runtime.py" # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=6, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashionDataset' -data_mode = 'topdown' -data_root = 
'data/fld/' +dataset_type = "DeepFashionDataset" +data_mode = "topdown" +data_root = "data/fld/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] test_pipeline = val_pipeline @@ -117,53 +84,56 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - subset='upper', + subset="upper", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_upper_train.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_upper_train.json", + data_prefix=dict(img="img/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='upper', + subset="upper", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_upper_val.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_upper_val.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='upper', + subset="upper", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_upper_test.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_upper_test.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_full-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_full-256x192.py index 8977c25b56930ca89774a4147edc385b994eeb7f..4b202f15743b970f624cd0be007eeba1334f8a91 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_full-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_full-256x192.py @@ -1,26 +1,24 @@ -_base_ = 
'./td-hm_hrnet-w32_8xb64-210e_deepfashion_full-256x192.py' +_base_ = "./td-hm_hrnet-w32_8xb64-210e_deepfashion_full-256x192.py" # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) -model = dict( - test_cfg=dict(flip_test=True, flip_mode='heatmap', shift_heatmap=False)) +model = dict(test_cfg=dict(flip_test=True, flip_mode="heatmap", shift_heatmap=False)) # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_lower-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_lower-256x192.py index 595035b132f954cfcd58c8b23de410ef6f710e8b..9536ab5e8d6d9d058f8c6542a315bcb71efc2936 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_lower-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_lower-256x192.py @@ -1,26 +1,24 @@ -_base_ = './td-hm_hrnet-w32_8xb64-210e_deepfashion_lower-256x192.py' +_base_ = "./td-hm_hrnet-w32_8xb64-210e_deepfashion_lower-256x192.py" # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) -model = dict( - test_cfg=dict(flip_test=True, flip_mode='heatmap', shift_heatmap=False)) +model = dict(test_cfg=dict(flip_test=True, flip_mode="heatmap", shift_heatmap=False)) # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - 
dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_upper-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_upper-256x192.py index 777ffddb22047daf7b1183f530f8509495fc92ce..430c37dd09a50258b826716ad4dcc1cb2cd0e639 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_upper-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w32_udp_8xb64-210e_deepfashion_upper-256x192.py @@ -1,26 +1,24 @@ -_base_ = './td-hm_hrnet-w32_8xb64-210e_deepfashion_upper-256x192.py' +_base_ = "./td-hm_hrnet-w32_8xb64-210e_deepfashion_upper-256x192.py" # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) -model = dict( - test_cfg=dict(flip_test=True, flip_mode='heatmap', shift_heatmap=False)) +model = dict(test_cfg=dict(flip_test=True, flip_mode="heatmap", shift_heatmap=False)) # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_full-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_full-256x192.py index bf7a80d59f8c81ffdbd2f8e3764c32019561812b..5207ac67f27460ebd4b31f2abe31c4ccb81cb9fc 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_full-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_full-256x192.py @@ -1,42 +1,21 @@ -_base_ = './td-hm_hrnet-w32_8xb64-210e_deepfashion_full-256x192.py' +_base_ = "./td-hm_hrnet-w32_8xb64-210e_deepfashion_full-256x192.py" # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) model = dict( backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - 
num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), - head=dict(in_channels=48)) + head=dict(in_channels=48), +) train_dataloader = dict(batch_size=32) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_lower-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_lower-256x192.py index a26e3f0cd43950ec6e56f1ef02b8c726f2abce4b..354808131a99e73e3fd3f4a272a1faf645136cc1 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_lower-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_lower-256x192.py @@ -1,42 +1,21 @@ -_base_ = './td-hm_hrnet-w32_8xb64-210e_deepfashion_lower-256x192.py' # noqa +_base_ = "./td-hm_hrnet-w32_8xb64-210e_deepfashion_lower-256x192.py" # noqa # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) model = dict( backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), - head=dict(in_channels=48)) + head=dict(in_channels=48), +) train_dataloader = dict(batch_size=32) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_upper-256x192.py 
b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_upper-256x192.py index cd619bd96307420cbf6885a73df2b4f5b2635783..35e2469a490ae827d87eb7e4496d2d3d593bf0ad 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_upper-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_8xb32-210e_deepfashion_upper-256x192.py @@ -1,42 +1,21 @@ -_base_ = './td-hm_hrnet-w32_8xb64-210e_deepfashion_upper-256x192.py' # noqa +_base_ = "./td-hm_hrnet-w32_8xb64-210e_deepfashion_upper-256x192.py" # noqa # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) model = dict( backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), - head=dict(in_channels=48)) + head=dict(in_channels=48), +) train_dataloader = dict(batch_size=32) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_full-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_full-256x192.py index 5445d7d377ddae776760b81c6e12249d697a1928..45428b510bf2d325d39d0aa11b039c6d76664faf 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_full-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_full-256x192.py @@ -1,31 +1,29 @@ -_base_ = './td-hm_hrnet-w48_8xb32-210e_deepfashion_full-256x192.py' +_base_ = "./td-hm_hrnet-w48_8xb32-210e_deepfashion_full-256x192.py" # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) -model = dict( - test_cfg=dict(flip_test=True, flip_mode='heatmap', shift_heatmap=False)) +model = dict(test_cfg=dict(flip_test=True, flip_mode="heatmap", shift_heatmap=False)) # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - 
dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_dataloader = dict(batch_size=32) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_lower-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_lower-256x192.py index c7c5c0966653b59f4a5e84f188bd226793ec8ab6..f53a402dc89f2fbabb9f4eaa27b76a5ef39bf688 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_lower-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_lower-256x192.py @@ -1,31 +1,29 @@ -_base_ = './td-hm_hrnet-w48_8xb32-210e_deepfashion_lower-256x192.py' +_base_ = "./td-hm_hrnet-w48_8xb32-210e_deepfashion_lower-256x192.py" # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) -model = dict( - test_cfg=dict(flip_test=True, flip_mode='heatmap', shift_heatmap=False)) +model = dict(test_cfg=dict(flip_test=True, flip_mode="heatmap", shift_heatmap=False)) # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_dataloader = dict(batch_size=32) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_upper-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_upper-256x192.py index 
706a87da84d006582078f34774b70a70e38d553f..b4d8b0c0753c9be1c1de8290057b2efbf86f7c7e 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_upper-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_hrnet-w48_udp_8xb32-210e_deepfashion_upper-256x192.py @@ -1,31 +1,29 @@ -_base_ = './td-hm_hrnet-w48_8xb32-210e_deepfashion_upper-256x192.py' +_base_ = "./td-hm_hrnet-w48_8xb32-210e_deepfashion_upper-256x192.py" # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) -model = dict( - test_cfg=dict(flip_test=True, flip_mode='heatmap', shift_heatmap=False)) +model = dict(test_cfg=dict(flip_test=True, flip_mode="heatmap", shift_heatmap=False)) # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size'], use_udp=True), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"], use_udp=True), + dict(type="PackPoseInputs"), ] train_dataloader = dict(batch_size=32) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_full-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_full-256x192.py index 57e9558f7602b07b26fbb198e4d6fb3233e2e9e8..11d570475ce9ca4a14c1ff000e1c904d3f3e2c75 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_full-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_full-256x192.py @@ -1,8 +1,3 @@ -_base_ = './td-hm_res50_8xb64-210e_deepfashion_full-256x192.py' +_base_ = "./td-hm_res50_8xb64-210e_deepfashion_full-256x192.py" -model = dict( - backbone=dict( - type='ResNet', - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(type="ResNet", depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_lower-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_lower-256x192.py index 0073adfdfbcc9cd1695f6fc28776da6df4fa110b..0899d9b6a80973e40818526ea7e01086949cd3e4 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_lower-256x192.py +++ 
b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_lower-256x192.py @@ -1,8 +1,3 @@ -_base_ = './td-hm_res50_8xb64-210e_deepfashion_lower-256x192.py' +_base_ = "./td-hm_res50_8xb64-210e_deepfashion_lower-256x192.py" -model = dict( - backbone=dict( - type='ResNet', - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(type="ResNet", depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_upper-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_upper-256x192.py index cf2198fa2804547f17afd037e4f4e282f4ca2b63..122af7b568b10c45e16ad9a4b9ffe137d70b2f21 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_upper-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res101_8xb64-210e_deepfashion_upper-256x192.py @@ -1,8 +1,3 @@ -_base_ = './td-hm_res50_8xb64-210e_deepfashion_upper-256x192.py' +_base_ = "./td-hm_res50_8xb64-210e_deepfashion_upper-256x192.py" -model = dict( - backbone=dict( - type='ResNet', - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(type="ResNet", depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_full-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_full-256x192.py index 04dee6d3a5f4811588fa42d6fa821e9d1883b52e..2935fe8760c9eee5cdb73d791cc40d1fcc14167c 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_full-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_full-256x192.py @@ -1,13 +1,8 @@ -_base_ = './td-hm_res50_8xb64-210e_deepfashion_full-256x192.py' +_base_ = "./td-hm_res50_8xb64-210e_deepfashion_full-256x192.py" # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) -model = dict( - backbone=dict( - type='ResNet', - depth=152, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet152'))) +model = dict(backbone=dict(type="ResNet", depth=152, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"))) train_dataloader = dict(batch_size=32) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_lower-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_lower-256x192.py index ef4b3d57d300f7645c08ba5e3f4de378b449c80f..31b43192e6bfbe20567f2e83a5e5db6a2c050370 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_lower-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_lower-256x192.py @@ -1,13 +1,8 @@ -_base_ = './td-hm_res50_8xb64-210e_deepfashion_lower-256x192.py' +_base_ = "./td-hm_res50_8xb64-210e_deepfashion_lower-256x192.py" # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) -model = dict( - backbone=dict( 
- type='ResNet', - depth=152, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet152'))) +model = dict(backbone=dict(type="ResNet", depth=152, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"))) train_dataloader = dict(batch_size=32) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_upper-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_upper-256x192.py index 122ad6817ac9d7d763b99d72f01e4f69b1721953..c51a2d2c1b662b14c7de94a868e5d418137c9f00 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_upper-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res152_8xb32-210e_deepfashion_upper-256x192.py @@ -1,13 +1,8 @@ -_base_ = './td-hm_res50_8xb64-210e_deepfashion_upper-256x192.py' +_base_ = "./td-hm_res50_8xb64-210e_deepfashion_upper-256x192.py" # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) -model = dict( - backbone=dict( - type='ResNet', - depth=152, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet152'))) +model = dict(backbone=dict(type="ResNet", depth=152, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"))) train_dataloader = dict(batch_size=32) diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_full-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_full-256x192.py index 292e83cb12dfcfeb4193d2be8d9844cf11816a4f..c61b402b2cac0a6d762e259e4df6948c2ffb55d6 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_full-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_full-256x192.py @@ -1,85 +1,70 @@ -_base_ = '../../../_base_/default_runtime.py' +_base_ = "../../../_base_/default_runtime.py" # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + 
data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=8, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=8, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashionDataset' -data_mode = 'topdown' -data_root = 'data/fld/' +dataset_type = "DeepFashionDataset" +data_mode = "topdown" +data_root = "data/fld/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] test_pipeline = val_pipeline @@ -88,53 +73,56 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - subset='full', + subset="full", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_full_train.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_full_train.json", + data_prefix=dict(img="img/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='full', + subset="full", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_full_val.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_full_val.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='full', + subset="full", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_full_test.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_full_test.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) # evaluators val_evaluator = [ - dict(type='PCKAccuracy', 
thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_lower-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_lower-256x192.py index 51e4ddfcbd251a47925682edd4017a18a8af0f03..128f971c0892961afe415f5a4a3087b0b5fead49 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_lower-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_lower-256x192.py @@ -1,85 +1,70 @@ -_base_ = '../../../_base_/default_runtime.py' +_base_ = "../../../_base_/default_runtime.py" # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=64) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=4, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=4, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashionDataset' -data_mode = 'topdown' -data_root = 'data/fld/' +dataset_type = "DeepFashionDataset" +data_mode = "topdown" +data_root = "data/fld/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), 
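The evaluator stack is identical in every file touched here: `PCKAccuracy` with `thr=0.2` (a keypoint counts as correct when its error is within 20% of a per-instance normalization length), `AUC` (PCK integrated over a sweep of thresholds), and `EPE` (mean pixel distance). A rough sketch of the first and last metrics, assuming a bbox-derived normalizer; the exact normalization item is configurable and not fixed by these configs:

```python
import numpy as np

def pck(pred, gt, norm_len, thr=0.2):
    # pred, gt: (N, K, 2) keypoint coordinates; norm_len: (N,) per-instance
    # normalization lengths (assumed bbox-derived in this sketch).
    dist = np.linalg.norm(pred - gt, axis=-1)            # (N, K) pixel errors
    return float((dist <= thr * norm_len[:, None]).mean())

def epe(pred, gt):
    # Mean end-point error: average L2 distance in pixels.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

pred = np.random.rand(10, 8, 2) * 192  # 8 keypoints = the "full" subset head
gt = np.random.rand(10, 8, 2) * 192
print(pck(pred, gt, norm_len=np.full(10, 200.0)), epe(pred, gt))
```

Note how the head `out_channels` in these files tracks the subset: 8 for `full`, 6 for `upper`, 4 for `lower`, and 294 for the DeepFashion2 configs further below.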
+ dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] test_pipeline = val_pipeline @@ -88,53 +73,56 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - subset='lower', + subset="lower", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_lower_train.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_lower_train.json", + data_prefix=dict(img="img/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='lower', + subset="lower", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_lower_val.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_lower_val.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='lower', + subset="lower", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_lower_test.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_lower_test.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_upper-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_upper-256x192.py index 29663966904fbd994d274b4a5ecf8c6393ec8ad5..99e998667c931fb3b277cdc8fcc0eb7de119ad1f 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_upper-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion/td-hm_res50_8xb64-210e_deepfashion_upper-256x192.py @@ -1,85 +1,70 @@ -_base_ = '../../../_base_/default_runtime.py' +_base_ = "../../../_base_/default_runtime.py" # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - 
milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=64) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=6, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=6, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashionDataset' -data_mode = 'topdown' -data_root = 'data/fld/' +dataset_type = "DeepFashionDataset" +data_mode = "topdown" +data_root = "data/fld/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] test_pipeline = val_pipeline @@ -88,53 +73,56 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - subset='upper', + subset="upper", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_upper_train.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_upper_train.json", + data_prefix=dict(img="img/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - 
sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='upper', + subset="upper", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_upper_val.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_upper_val.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - subset='upper', + subset="upper", data_root=data_root, data_mode=data_mode, - ann_file='annotations/fld_upper_test.json', - data_prefix=dict(img='img/'), + ann_file="annotations/fld_upper_test.json", + data_prefix=dict(img="img/"), test_mode=True, pipeline=test_pipeline, - )) + ), +) # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-long-sleeved-dress-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-long-sleeved-dress-256x192.py index 09dfaaa390bb2020e4a511d6ba111d35d5fa4378..6aaf4c8ad1d0062e1ce36b3b6c8c1d7cb283f019 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-long-sleeved-dress-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-long-sleeved-dress-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=64) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), 
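The `PoseDataPreprocessor` values repeated in each model block are the standard ImageNet channel statistics rescaled to the 0-255 range (e.g. 0.485 * 255 = 123.675), with `bgr_to_rgb=True` converting OpenCV-ordered input. A self-contained sketch of the equivalent per-image transform:

```python
import numpy as np

# ImageNet channel statistics on the 0-255 scale, e.g. 0.485 * 255 = 123.675
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(img_bgr):
    # img_bgr: (H, W, 3) uint8 crop, as loaded by OpenCV.
    img = img_bgr[:, :, ::-1].astype(np.float32)  # bgr_to_rgb=True
    img = (img - MEAN) / STD                      # per-channel normalization
    return img.transpose(2, 0, 1)                 # HWC -> CHW for the model

crop = np.random.randint(0, 256, (256, 192, 3), dtype=np.uint8)
x = preprocess(crop)  # (3, 256, 192), matching input_size=(192, 256) as (W, H)
```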
backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_long_sleeved_dress_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_long_sleeved_dress_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_long_sleeved_dress_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_long_sleeved_dress_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-skirt-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-skirt-256x192.py index f0e6f0c63218874f4e40bdd06eb0cbc57b9365a7..053f73a2880b81cffc72bc7f72ad1c7a872843c0 100644 --- 
a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-skirt-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-skirt-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=64) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + 
dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_skirt_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_skirt_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_skirt_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_skirt_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-vest-dress-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-vest-dress-256x192.py index 9bed7421991041145f028e2b91689b8c5125d205..49e5ca179ba4724451fa33f9be7adf1b8fdcfa5a 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-vest-dress-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_1xb64-210e_deepfasion2-vest-dress-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=64) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 
57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_vest_dress_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_vest_dress_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_vest_dress_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_vest_dress_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_2xb64-210e_deepfasion2-trousers-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_2xb64-210e_deepfasion2-trousers-256x192.py 
index 617e59ae74be40511256c2b9e358300ea2348f27..ee60d051cd01d4a4a5d67f5426c8501336dbd03b 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_2xb64-210e_deepfasion2-trousers-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_2xb64-210e_deepfasion2-trousers-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=128) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', 
input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_trousers_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_trousers_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_trousers_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_trousers_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_3xb64-210e_deepfasion2-shorts-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_3xb64-210e_deepfasion2-shorts-256x192.py index aa3b2774fcaedf9c7ace5a335775011e6c0a7d29..c9f7fae57a2db2d253463e2ab4d6ccf36093aae0 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_3xb64-210e_deepfasion2-shorts-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_3xb64-210e_deepfasion2-shorts-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=192) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - 
data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_shorts_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_shorts_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_shorts_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_shorts_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-short-sleeved-dress-256x192.py 
b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-short-sleeved-dress-256x192.py index 0bfcabaa5478596cc026309e5f57e6ea5db83abc..f956a211ac5bf83964d8fbb5e80b8f2e963a7ef2 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-short-sleeved-dress-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-short-sleeved-dress-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", 
encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_short_sleeved_dress_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_short_sleeved_dress_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_short_sleeved_dress_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_short_sleeved_dress_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-sling-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-sling-256x192.py index f627eb182c90b57ae53a4a9141f00ed333d3e229..8c48557ad331437b772d8da2e84ea000c1241aec 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-sling-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-sling-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), 
heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_sling_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_sling_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_sling_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_sling_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] 
test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-sling-dress-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-sling-dress-256x192.py index 8b59607060c41a8ddbb4d38c5acc41e243cd2e96..6143247d61615959b9de9513959e48accf87fa20 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-sling-dress-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-sling-dress-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", 
direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_sling_dress_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_sling_dress_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_sling_dress_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_sling_dress_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-vest-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-vest-256x192.py index 4249d5a8971e80a4e068e51543b9191f36488542..b8296c1c8cc903fa9bc3b8f527fb3af9f739b3c5 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-vest-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_4xb64-210e_deepfasion2-vest-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), 
checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_vest_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_vest_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_vest_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_vest_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - 
dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_6xb64-210e_deepfasion2-short-sleeved-shirt-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_6xb64-210e_deepfasion2-short-sleeved-shirt-256x192.py index 4161952dcf31904e8df8c70ff25ca207c1cea2ae..cfade3ad2a6593b57ab42ef734e5e5baced595dd 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_6xb64-210e_deepfasion2-short-sleeved-shirt-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_6xb64-210e_deepfasion2-short-sleeved-shirt-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=384) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - 
dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_short_sleeved_shirt_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_short_sleeved_shirt_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_short_sleeved_shirt_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_short_sleeved_shirt_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-long-sleeved-outwear-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-long-sleeved-outwear-256x192.py index 36e0318bf7a954fdbd35a8b59219a6cde2396df2..96b7661c3301ff2a55a197b5bb565b7e823c9d36 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-long-sleeved-outwear-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-long-sleeved-outwear-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size 
auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,37 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_long_sleeved_outwear_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_long_sleeved_outwear_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/' - 'deepfashion2_long_sleeved_outwear_validation.json', - 
data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_long_sleeved_outwear_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-long-sleeved-shirt-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-long-sleeved-shirt-256x192.py index f82e3cb5fb04011130521a35080b00f01a70ac68..b5fdaf0dc53afe40a8a95cc9567f957483ddeea2 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-long-sleeved-shirt-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-long-sleeved-shirt-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset"
+data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_long_sleeved_shirt_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_long_sleeved_shirt_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/deepfashion2_long_sleeved_shirt_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/deepfashion2_long_sleeved_shirt_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-short-sleeved-outwear-256x192.py b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-short-sleeved-outwear-256x192.py index 30db99de9e96eaede42332daae3d55f578b941f2..abb3881d0574f7240ba75ce1beea78b60c699327 100644 --- a/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-short-sleeved-outwear-256x192.py +++ b/mmpose/configs/fashion_2d_keypoint/topdown_heatmap/deepfashion2/td-hm_res50_8xb64-210e_deepfasion2-short-sleeved-outwear-256x192.py @@ -1,85 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - 
milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - logger=dict(type='LoggerHook', interval=10), - checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(logger=dict(type="LoggerHook", interval=10), checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=294, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=294, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'DeepFashion2Dataset' -data_mode = 'topdown' -data_root = 'data/deepfasion2/' +dataset_type = "DeepFashion2Dataset" +data_mode = "topdown" +data_root = "data/deepfasion2/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,37 +72,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='train/deepfashion2_short_sleeved_outwear_train.json', - data_prefix=dict(img='train/image/'), + ann_file="train/deepfashion2_short_sleeved_outwear_train.json", + data_prefix=dict(img="train/image/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - 
sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='validation/' - 'deepfashion2_short_sleeved_outwear_validation.json', - data_prefix=dict(img='validation/image/'), + ann_file="validation/" "deepfashion2_short_sleeved_outwear_validation.json", + data_prefix=dict(img="validation/image/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/rtmpose/coco_wholebody_hand/rtmpose-m_8xb32-210e_coco-wholebody-hand-256x256.py b/mmpose/configs/hand_2d_keypoint/rtmpose/coco_wholebody_hand/rtmpose-m_8xb32-210e_coco-wholebody-hand-256x256.py index 48c719339443eac75dfb4849553294751fc2f62d..fa777ee80df160b2b8e7323a87aac4f3606b852e 100644 --- a/mmpose/configs/hand_2d_keypoint/rtmpose/coco_wholebody_hand/rtmpose-m_8xb32-210e_coco-wholebody-hand-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/rtmpose/coco_wholebody_hand/rtmpose-m_8xb32-210e_coco-wholebody-hand-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,97 +10,78 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(256, 256), - sigma=(5.66, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(256, 256), sigma=(5.66, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - 
checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=21, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyHandDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyHandDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -110,69 +91,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), # dict(type='RandomHalfBody'), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=180), + dict(type="RandomFlip", direction="horizontal"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], - rotate_factor=180), - dict(type='RandomFlip', direction='horizontal'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', 
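The RTMPose hand heads above are SimCC-style: each keypoint coordinate is predicted as a 1-D classification over sub-pixel bins, where the bin count per axis is `input_size * simcc_split_ratio` (512 bins of 0.5 px for the 256x256 input with `simcc_split_ratio=2.0` used here). Below is a minimal decoding sketch of the SimCC idea; it is my own illustration, not the MMPose `SimCCLabel` decoder.

```python
import numpy as np

# SimCC geometry from these configs: 256x256 input, simcc_split_ratio=2.0,
# so each axis is discretised into 256 * 2 = 512 bins of 0.5 px each.
INPUT_SIZE = (256, 256)
SPLIT_RATIO = 2.0
BINS = int(INPUT_SIZE[0] * SPLIT_RATIO)  # 512


def decode_simcc(simcc_x: np.ndarray, simcc_y: np.ndarray) -> np.ndarray:
    """Decode (K, BINS) per-axis scores into (K, 2) pixel coordinates."""
    xs = simcc_x.argmax(axis=1) / SPLIT_RATIO   # bin index -> pixels
    ys = simcc_y.argmax(axis=1) / SPLIT_RATIO
    return np.stack([xs, ys], axis=1)


# Toy check with 21 hand keypoints (out_channels=21 in these configs):
rng = np.random.default_rng(0)
kpts = decode_simcc(rng.random((21, BINS)), rng.random((21, BINS)))
print(kpts.shape, kpts.min() >= 0.0, kpts.max() <= 256.0)
```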
backend_args=backend_args), - dict(type='GetBBoxCenterScale'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), # dict(type='RandomHalfBody'), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=180), + dict(type="RandomFlip", direction="horizontal"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=180), - dict(type='RandomFlip', direction='horizontal'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -180,53 +141,42 @@ train_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE') -] +val_evaluator = [dict(type="PCKAccuracy", thr=0.2), dict(type="AUC"), dict(type="EPE")] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/rtmpose/hand5/rtmpose-m_8xb256-210e_hand5-256x256.py b/mmpose/configs/hand_2d_keypoint/rtmpose/hand5/rtmpose-m_8xb256-210e_hand5-256x256.py index 
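These RTMPose configs train in two stages: `mmdet.PipelineSwitchHook` swaps the heavy stage-1 augmentations (rotation up to 180 degrees, scale 0.5-1.5, `CoarseDropout` with `p=1.0`) for the milder `train_pipeline_stage2` during the final `stage2_num_epochs` epochs, while `EMAHook` maintains an exponential moving average of the weights for evaluation. A small sketch of the switching logic these hooks declare; `stage2_num_epochs` is set near the top of each config, and the value 30 below is only a hypothetical stand-in.

```python
# Sketch of the two-stage schedule declared via mmdet.PipelineSwitchHook.
max_epochs = 210
stage2_num_epochs = 30                          # hypothetical value for illustration
switch_epoch = max_epochs - stage2_num_epochs   # 180 under this assumption


def pipeline_for(epoch: int) -> str:
    # Stage 1: strong augmentation; stage 2: milder augmentation
    # (no shift, scale 0.75-1.25, CoarseDropout p=0.5 in these configs).
    return "train_pipeline" if epoch < switch_epoch else "train_pipeline_stage2"


assert pipeline_for(0) == "train_pipeline"
assert pipeline_for(switch_epoch) == "train_pipeline_stage2"
```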
f329f1cb1df65ad7cff2ed255d6d89e859e78ea2..0ed57cefdd2903e5059e02d78adb35a0d93beb1b 100644 --- a/mmpose/configs/hand_2d_keypoint/rtmpose/hand5/rtmpose-m_8xb256-210e_hand5-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/rtmpose/hand5/rtmpose-m_8xb256-210e_hand5-256x256.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # coco-hand onehand10k freihand2d rhd2d halpehand @@ -12,163 +12,124 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(256, 256), - sigma=(5.66, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(256, 256), sigma=(5.66, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmpose/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmpose/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=21, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, 
)) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyHandDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyHandDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), # dict(type='RandomHalfBody'), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=180), + dict(type="RandomFlip", direction="horizontal"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], - rotate_factor=180), - dict(type='RandomFlip', direction='horizontal'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), # dict(type='RandomHalfBody'), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=180), + dict(type="RandomFlip", direction="horizontal"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=180), - dict(type='RandomFlip', direction='horizontal'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.2), - dict(type='MedianBlur', p=0.2), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.2), + dict(type="MedianBlur", p=0.2), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, 
min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # train datasets @@ -176,38 +137,38 @@ dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_onehand10k = dict( - type='OneHand10KDataset', + type="OneHand10KDataset", data_root=data_root, data_mode=data_mode, - ann_file='onehand10k/annotations/onehand10k_train.json', - data_prefix=dict(img='pose/OneHand10K/'), + ann_file="onehand10k/annotations/onehand10k_train.json", + data_prefix=dict(img="pose/OneHand10K/"), pipeline=[], ) dataset_freihand = dict( - type='FreiHandDataset', + type="FreiHandDataset", data_root=data_root, data_mode=data_mode, - ann_file='freihand/annotations/freihand_train.json', - data_prefix=dict(img='pose/FreiHand/'), + ann_file="freihand/annotations/freihand_train.json", + data_prefix=dict(img="pose/FreiHand/"), pipeline=[], ) dataset_rhd = dict( - type='Rhd2DDataset', + type="Rhd2DDataset", data_root=data_root, data_mode=data_mode, - ann_file='rhd/annotations/rhd_train.json', - data_prefix=dict(img='pose/RHD/'), + ann_file="rhd/annotations/rhd_train.json", + data_prefix=dict(img="pose/RHD/"), pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=21, mapping=[ (0, 0), @@ -231,16 +192,17 @@ dataset_rhd = dict( (18, 19), (19, 18), (20, 17), - ]) + ], + ) ], ) dataset_halpehand = dict( - type='HalpeHandDataset', + type="HalpeHandDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015/'), + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015/"), pipeline=[], ) @@ -249,56 +211,53 @@ train_dataloader = dict( batch_size=256, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict( - from_file='configs/_base_/datasets/coco_wholebody_hand.py'), - datasets=[ - dataset_coco, dataset_onehand10k, dataset_freihand, dataset_rhd, - dataset_halpehand - ], + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody_hand.py"), + datasets=[dataset_coco, dataset_onehand10k, dataset_freihand, dataset_rhd, dataset_halpehand], pipeline=train_pipeline, test_mode=False, - )) + ), +) # test datasets val_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="detection/coco/val2017/"), pipeline=[], ) val_onehand10k = dict( - type='OneHand10KDataset', + type="OneHand10KDataset", data_root=data_root, data_mode=data_mode, - ann_file='onehand10k/annotations/onehand10k_test.json', - data_prefix=dict(img='pose/OneHand10K/'), + ann_file="onehand10k/annotations/onehand10k_test.json", + data_prefix=dict(img="pose/OneHand10K/"), pipeline=[], ) val_freihand = dict( - type='FreiHandDataset', + type="FreiHandDataset", data_root=data_root, data_mode=data_mode, - ann_file='freihand/annotations/freihand_test.json', - 
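RHD orders hand keypoints differently from the COCO-WholeBody convention used as the target here, so the RHD train and test entries are wrapped in a `KeypointConverter` whose `mapping` is a list of `(source_index, target_index)` pairs. A minimal re-indexing sketch of what such a mapping does follows; it is illustrative only (the real transform also remaps visibility flags and weights).

```python
import numpy as np


def convert_keypoints(kpts: np.ndarray,
                      mapping: list[tuple[int, int]],
                      num_keypoints: int = 21) -> np.ndarray:
    """Re-index (K_src, 2) keypoints into the target convention."""
    out = np.zeros((num_keypoints, 2), dtype=kpts.dtype)
    for src, dst in mapping:
        out[dst] = kpts[src]
    return out


# A few pairs from the RHD mapping in this config: the wrist stays at 0,
# while joints are reordered within each finger (e.g. source 20 -> target 17).
mapping = [(0, 0), (18, 19), (19, 18), (20, 17)]
kpts = np.arange(21 * 2, dtype=float).reshape(21, 2)
print(convert_keypoints(kpts, mapping)[17])  # filled from source index 20
```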
data_prefix=dict(img='pose/FreiHand/'), + ann_file="freihand/annotations/freihand_test.json", + data_prefix=dict(img="pose/FreiHand/"), pipeline=[], ) val_rhd = dict( - type='Rhd2DDataset', + type="Rhd2DDataset", data_root=data_root, data_mode=data_mode, - ann_file='rhd/annotations/rhd_test.json', - data_prefix=dict(img='pose/RHD/'), + ann_file="rhd/annotations/rhd_test.json", + data_prefix=dict(img="pose/RHD/"), pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=21, mapping=[ (0, 0), @@ -322,16 +281,17 @@ val_rhd = dict( (18, 19), (19, 18), (20, 17), - ]) + ], + ) ], ) val_halpehand = dict( - type='HalpeHandDataset', + type="HalpeHandDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_val_v1.json', - data_prefix=dict(img='detection/coco/val2017/'), + ann_file="halpe/annotations/halpe_val_v1.json", + data_prefix=dict(img="detection/coco/val2017/"), pipeline=[], ) @@ -340,41 +300,26 @@ test_dataloader = dict( num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CombinedDataset', - metainfo=dict( - from_file='configs/_base_/datasets/coco_wholebody_hand.py'), - datasets=[ - val_coco, val_onehand10k, val_freihand, val_rhd, val_halpehand - ], + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody_hand.py"), + datasets=[val_coco, val_onehand10k, val_freihand, val_rhd, val_halpehand], pipeline=val_pipeline, test_mode=True, - )) + ), +) val_dataloader = test_dataloader # hooks -default_hooks = dict( - checkpoint=dict(save_best='AUC', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE') -] +val_evaluator = [dict(type="PCKAccuracy", thr=0.2), dict(type="AUC"), dict(type="EPE")] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hourglass52_8xb32-210e_coco-wholebody-hand-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hourglass52_8xb32-210e_coco-wholebody-hand-256x256.py index e0bc1c8739c9d8ea1fc585882abe5b8189087e2a..d7ce323d1597c664df9f88d7ffd5cfe865a8d510 100644 --- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hourglass52_8xb32-210e_coco-wholebody-hand-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hourglass52_8xb32-210e_coco-wholebody-hand-256x256.py @@ -1,87 +1,75 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy 
param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HourglassNet', + type="HourglassNet", num_stacks=1, ), head=dict( - type='CPMHead', + type="CPMHead", in_channels=256, out_channels=21, num_stages=1, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyHandDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyHandDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', - rotate_factor=180.0, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=180.0, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,35 +77,33 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, 
drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE') -] +val_evaluator = [dict(type="PCKAccuracy", thr=0.2), dict(type="AUC"), dict(type="EPE")] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hrnetv2-w18_8xb32-210e_coco-wholebody-hand-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hrnetv2-w18_8xb32-210e_coco-wholebody-hand-256x256.py index a9b9f0f281b9bc72598b9e1ffacd99f58248175d..2a8d6ba3bdab2877bb84203693cc2693bb516a32 100644 --- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hrnetv2-w18_8xb32-210e_coco-wholebody-hand-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hrnetv2-w18_8xb32-210e_coco-wholebody-hand-256x256.py @@ -1,118 +1,94 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(18, 36)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(18, 36, 72)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), 
num_channels=(18, 36)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(18, 36, 72, 144), - multiscale_output=True), - upsample=dict(mode='bilinear', align_corners=False)), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')), + multiscale_output=True, + ), + upsample=dict(mode="bilinear", align_corners=False), + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), + ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=270, out_channels=21, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - conv_out_channels=(270, ), - conv_kernel_sizes=(1, ), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + conv_out_channels=(270,), + conv_kernel_sizes=(1,), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyHandDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyHandDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -120,35 +96,33 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = [ - 
dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE') -] +val_evaluator = [dict(type="PCKAccuracy", thr=0.2), dict(type="AUC"), dict(type="EPE")] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hrnetv2-w18_dark-8xb32-210e_coco-wholebody-hand-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hrnetv2-w18_dark-8xb32-210e_coco-wholebody-hand-256x256.py index 5d67f393f6612aab494a47c15bd9ce7b68fc8b4d..f26fe6500611e11621bfa695ecdb64c084f564b9 100644 --- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hrnetv2-w18_dark-8xb32-210e_coco-wholebody-hand-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_hrnetv2-w18_dark-8xb32-210e_coco-wholebody-hand-256x256.py @@ -1,122 +1,94 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(256, 256), - heatmap_size=(64, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(18, 36)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(18, 36, 72)), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)), stage4=dict( num_modules=3, num_branches=4, - block='BASIC', + block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(18, 36, 72, 144), - multiscale_output=True), - upsample=dict(mode='bilinear', align_corners=False)), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')), + multiscale_output=True, + ), + 
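All of the heatmap heads in this diff share the `MSRAHeatmap` codec: a 256x256 crop is encoded as 64x64 Gaussian target maps (stride 4) with `sigma=2`, and the `dark` variant above adds `unbiased=True` for DARK-style sub-pixel encoding and decoding. A minimal sketch of the standard (biased) Gaussian target, for intuition only and not the MMPose encoder:

```python
import numpy as np


def msra_target(kpt_xy, input_size=(256, 256), heatmap_size=(64, 64), sigma=2.0):
    """Encode one keypoint (x, y) in input pixels as a Gaussian heatmap."""
    stride = input_size[0] / heatmap_size[0]          # 4.0 in these configs
    cx, cy = kpt_xy[0] / stride, kpt_xy[1] / stride   # keypoint in heatmap coords
    ys, xs = np.mgrid[0:heatmap_size[1], 0:heatmap_size[0]]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))


hm = msra_target((128.0, 96.0))
print(hm.shape, np.unravel_index(hm.argmax(), hm.shape))  # (64, 64), (24, 32)
```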
upsample=dict(mode="bilinear", align_corners=False), + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), + ), neck=dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=270, out_channels=21, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - conv_out_channels=(270, ), - conv_kernel_sizes=(1, ), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + conv_out_channels=(270,), + conv_kernel_sizes=(1,), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyHandDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyHandDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict( - type='RandomBBoxTransform', rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='RandomFlip', direction='horizontal'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="RandomFlip", direction="horizontal"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -124,35 +96,33 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE') -] +val_evaluator = [dict(type="PCKAccuracy", thr=0.2), dict(type="AUC"), dict(type="EPE")] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_litehrnet-w18_8xb32-210e_coco-wholebody-hand-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_litehrnet-w18_8xb32-210e_coco-wholebody-hand-256x256.py 
index f3a6150e49e687bb3d510bd7139d66bd8ac8b37f..fa36eeb64c5bf253907156c09e8a518295de1645 100644 --- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_litehrnet-w18_8xb32-210e_coco-wholebody-hand-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_litehrnet-w18_8xb32-210e_coco-wholebody-hand-256x256.py @@ -1,47 +1,36 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='LiteHRNet', + type="LiteHRNet", in_channels=3, extra=dict( stem=dict(stem_channels=32, out_channels=32, expand_ratio=1), @@ -50,51 +39,53 @@ model = dict( num_modules=(2, 4, 2), num_branches=(2, 3, 4), num_blocks=(2, 2, 2), - module_type=('LITE', 'LITE', 'LITE'), + module_type=("LITE", "LITE", "LITE"), with_fuse=(True, True, True), reduce_ratios=(8, 8, 8), num_channels=( (40, 80), (40, 80, 160), (40, 80, 160, 320), - )), + ), + ), with_head=True, - )), + ), + ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=40, out_channels=21, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyHandDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyHandDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict( - type='RandomBBoxTransform', rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='RandomFlip', direction='horizontal'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)), + 
dict(type="RandomFlip", direction="horizontal"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -102,35 +93,33 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE') -] +val_evaluator = [dict(type="PCKAccuracy", thr=0.2), dict(type="AUC"), dict(type="EPE")] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_mobilenetv2_8xb32-210e_coco-wholebody-hand-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_mobilenetv2_8xb32-210e_coco-wholebody-hand-256x256.py index dba8538a5fe7b4313b888cae5a21f0c55b58c340..83ba16869b3953adfc59a8fee72aea8c93742943 100644 --- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_mobilenetv2_8xb32-210e_coco-wholebody-hand-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_mobilenetv2_8xb32-210e_coco-wholebody-hand-256x256.py @@ -1,84 +1,67 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', 
input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MobileNetV2', - widen_factor=1., - out_indices=(7, ), - init_cfg=dict(type='Pretrained', checkpoint='mmcls://mobilenet_v2')), + type="MobileNetV2", widen_factor=1.0, out_indices=(7,), init_cfg=dict(type="Pretrained", checkpoint="mmcls://mobilenet_v2") + ), head=dict( - type='HeatmapHead', - in_channels=1280, - out_channels=21, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=1280, out_channels=21, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyHandDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyHandDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict( - type='RandomBBoxTransform', rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='RandomFlip', direction='horizontal'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="RandomFlip", direction="horizontal"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -86,35 +69,33 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = [ - 
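Every hand and fashion config in this diff evaluates with the same trio: `PCKAccuracy` with `thr=0.2` (fraction of keypoints within 0.2 of the normalisation size), `AUC` (PCK integrated over a range of thresholds), and `EPE` (mean end-point error in pixels). A compact NumPy sketch of the three metrics, assuming bbox-normalised PCK as in MMPose's defaults; thresholds and normalisers are simplified for illustration.

```python
import numpy as np


def epe(pred, gt):
    """Mean end-point error in pixels over (K, 2) arrays."""
    return np.linalg.norm(pred - gt, axis=-1).mean()


def pck(pred, gt, norm_size, thr=0.2):
    """Fraction of keypoints within thr * norm_size of ground truth."""
    dist = np.linalg.norm(pred - gt, axis=-1) / norm_size
    return (dist <= thr).mean()


def auc(pred, gt, norm_size, thrs=np.linspace(0.0, 1.0, 20)):
    """PCK averaged over a sweep of thresholds."""
    return float(np.mean([pck(pred, gt, norm_size, t) for t in thrs]))


rng = np.random.default_rng(0)
gt = rng.uniform(0, 256, (21, 2))
pred = gt + rng.normal(0, 4, gt.shape)
print(epe(pred, gt), pck(pred, gt, norm_size=256.0), auc(pred, gt, 256.0))
```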
dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE') -] +val_evaluator = [dict(type="PCKAccuracy", thr=0.2), dict(type="AUC"), dict(type="EPE")] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_res50_8xb32-210e_coco-wholebody-hand-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_res50_8xb32-210e_coco-wholebody-hand-256x256.py index c04950bfaabcfebc806ce541e8d5285d0bca75be..58536912a5f2c003d1ff5493c6e70f4533c15406 100644 --- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_res50_8xb32-210e_coco-wholebody-hand-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_res50_8xb32-210e_coco-wholebody-hand-256x256.py @@ -1,83 +1,65 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict( - type='ResNet', - depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ResNet", depth=50, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50")), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=21, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=21, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyHandDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyHandDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict( - type='RandomBBoxTransform', rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='RandomFlip', direction='horizontal'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', 
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_res50_8xb32-210e_coco-wholebody-hand-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_res50_8xb32-210e_coco-wholebody-hand-256x256.py
index c04950bfaabcfebc806ce541e8d5285d0bca75be..58536912a5f2c003d1ff5493c6e70f4533c15406 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_res50_8xb32-210e_coco-wholebody-hand-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_res50_8xb32-210e_coco-wholebody-hand-256x256.py
@@ -1,83 +1,65 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
-    backbone=dict(
-        type='ResNet',
-        depth=50,
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
+    backbone=dict(type="ResNet", depth=50, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50")),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=21,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=21, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoWholeBodyHandDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoWholeBodyHandDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(
-        type='RandomBBoxTransform', rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -85,35 +67,33 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_train_v1.0.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/coco_wholebody_train_v1.0.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_val_v1.0.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/coco_wholebody_val_v1.0.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

-val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE')
-]
+val_evaluator = [dict(type="PCKAccuracy", thr=0.2), dict(type="AUC"), dict(type="EPE")]
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_scnet50_8xb32-210e_coco-wholebody-hand-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_scnet50_8xb32-210e_coco-wholebody-hand-256x256.py
index f596227c5c109fe51b0fd822c1f2b26b4abaae83..c18835558f196c5d49b0804603b2cef8ba2e1c3e 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_scnet50_8xb32-210e_coco-wholebody-hand-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/coco_wholebody_hand/td-hm_scnet50_8xb32-210e_coco-wholebody-hand-256x256.py
@@ -1,86 +1,69 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=256)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='SCNet',
+        type="SCNet",
         depth=50,
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='https://download.openmmlab.com/mmpose/'
-            'pretrain_models/scnet50-7ef0a199.pth')),
+        init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/scnet50-7ef0a199.pth"),
+    ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=21,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=21, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'CocoWholeBodyHandDataset'
-data_mode = 'topdown'
-data_root = 'data/coco/'
+dataset_type = "CocoWholeBodyHandDataset"
+data_mode = "topdown"
+data_root = "data/coco/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(
-        type='RandomBBoxTransform', rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -88,35 +71,33 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_train_v1.0.json',
-        data_prefix=dict(img='train2017/'),
+        ann_file="annotations/coco_wholebody_train_v1.0.json",
+        data_prefix=dict(img="train2017/"),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/coco_wholebody_val_v1.0.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/coco_wholebody_val_v1.0.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

-val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE')
-]
+val_evaluator = [dict(type="PCKAccuracy", thr=0.2), dict(type="AUC"), dict(type="EPE")]
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/freihand2d/td-hm_res50_8xb64-100e_freihand2d-224x224.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/freihand2d/td-hm_res50_8xb64-100e_freihand2d-224x224.py
index cd1750cdebc9d977ae917432b3e714ee1275f3d8..5d62d7dd487b11676844acdb7d2a33d19a11aa1b 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/freihand2d/td-hm_res50_8xb64-100e_freihand2d-224x224.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/freihand2d/td-hm_res50_8xb64-100e_freihand2d-224x224.py
@@ -1,87 +1,66 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=100, val_interval=1)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=100,
-        milestones=[50, 70],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=100, milestones=[50, 70], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(
-    checkpoint=dict(save_best='AUC', rule='greater', interval=1))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater", interval=1))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(224, 224), heatmap_size=(56, 56), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(224, 224), heatmap_size=(56, 56), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
-    backbone=dict(
-        type='ResNet',
-        depth=50,
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
+    backbone=dict(type="ResNet", depth=50, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50")),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=21,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=21, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'FreiHandDataset'
-data_mode = 'topdown'
-data_root = 'data/freihand/'
+dataset_type = "FreiHandDataset"
+data_mode = "topdown"
+data_root = "data/freihand/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale', padding=0.8),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform',
-        shift_factor=0.25,
-        rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale", padding=0.8),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", shift_factor=0.25, rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale', padding=0.8),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale", padding=0.8),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -89,50 +68,53 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/freihand_train.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/freihand_train.json",
+        data_prefix=dict(img=""),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/freihand_val.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/freihand_val.json",
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/freihand_test.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/freihand_test.json",
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)

 # evaluators
 val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE'),
+    dict(type="PCKAccuracy", thr=0.2),
+    dict(type="AUC"),
+    dict(type="EPE"),
 ]
 test_evaluator = val_evaluator
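The FreiHand config above swaps in a 224x224 input with 56x56 heatmaps. Below is a minimal sketch of what the MSRAHeatmap codec produces at training time, assuming the `mmpose.codecs` API; the keypoint locations are random dummies, not data from this repo.

```python
# Sketch of the MSRAHeatmap encode step, assuming mmpose.codecs.MSRAHeatmap.
import numpy as np
from mmpose.codecs import MSRAHeatmap

codec = MSRAHeatmap(input_size=(224, 224), heatmap_size=(56, 56), sigma=2)

keypoints = np.random.rand(1, 21, 2) * 224  # 21 hand keypoints, input space
encoded = codec.encode(keypoints, keypoints_visible=np.ones((1, 21)))

# One Gaussian blob per keypoint; the 4x stride matches 224 / 56.
print(encoded["heatmaps"].shape)  # (21, 56, 56)
```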
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_8xb64-210e_onehand10k-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_8xb64-210e_onehand10k-256x256.py
index 99419065aa879884fddb6afa568257fb1b9fe340..428717bcc7d1e008aa35d2e2ccad903352b18893 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_8xb64-210e_onehand10k-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_8xb64-210e_onehand10k-256x256.py
@@ -1,121 +1,98 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://msra/hrnetv2_w18',
-        )),
+            type="Pretrained",
+            checkpoint="open-mmlab://msra/hrnetv2_w18",
+        ),
+    ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=21,
         deconv_out_channels=None,
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'OneHand10KDataset'
-data_mode = 'topdown'
-data_root = 'data/onehand10k/'
+dataset_type = "OneHand10KDataset"
+data_mode = "topdown"
+data_root = "data/onehand10k/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform', rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -123,36 +100,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/onehand10k_train.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/onehand10k_train.json",
+        data_prefix=dict(img=""),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/onehand10k_test.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/onehand10k_test.json",
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
 val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE'),
+    dict(type="PCKAccuracy", thr=0.2),
+    dict(type="AUC"),
+    dict(type="EPE"),
 ]
 test_evaluator = val_evaluator
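A note on `in_channels=270` in the HRNetV2 heads above: with `concat=True`, the FeatureMapProcessor neck upsamples all four stage-4 branch outputs to the highest resolution and concatenates them along the channel axis, so the head sees the sum of the branch widths. A two-line check, using only numbers from the config:

```python
# Why in_channels=270: sum of HRNetV2-W18 stage4 branch channels after the
# concatenating FeatureMapProcessor neck.
branch_channels = (18, 36, 72, 144)  # num_channels of stage4 above
assert sum(branch_channels) == 270   # matches head.in_channels
```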
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_dark-8xb64-210e_onehand10k-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_dark-8xb64-210e_onehand10k-256x256.py
index 610e9d149b658166a37d1fa1a028efab32d0637d..857e0f65bb9a4c7fc59266cfa4ad5578046df2cc 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_dark-8xb64-210e_onehand10k-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_dark-8xb64-210e_onehand10k-256x256.py
@@ -1,125 +1,98 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap',
-    input_size=(256, 256),
-    heatmap_size=(64, 64),
-    sigma=2,
-    unbiased=True)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2, unbiased=True)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://msra/hrnetv2_w18',
-        )),
+            type="Pretrained",
+            checkpoint="open-mmlab://msra/hrnetv2_w18",
+        ),
+    ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=21,
         deconv_out_channels=None,
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'OneHand10KDataset'
-data_mode = 'topdown'
-data_root = 'data/onehand10k/'
+dataset_type = "OneHand10KDataset"
+data_mode = "topdown"
+data_root = "data/onehand10k/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform', rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -127,36 +100,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/onehand10k_train.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/onehand10k_train.json",
+        data_prefix=dict(img=""),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/onehand10k_test.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/onehand10k_test.json",
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
 val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE'),
+    dict(type="PCKAccuracy", thr=0.2),
+    dict(type="AUC"),
+    dict(type="EPE"),
 ]
 test_evaluator = val_evaluator
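Reading the diff above against the plain OneHand10K config, the `*_dark-*` variant differs only in the codec: `unbiased=True` turns on DarkPose's unbiased Gaussian encoding/decoding. A side-by-side sketch of that single delta (restating the configs, not introducing new settings):

```python
# Only functional delta between the plain and *_dark-* OneHand10K configs.
codec_plain = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
codec_dark = dict(**codec_plain, unbiased=True)  # DarkPose-style unbiased en/decoding
```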
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_udp-8xb64-210e_onehand10k-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_udp-8xb64-210e_onehand10k-256x256.py
index 54e2220d636601ac4a19a116aa6d0aabe138dbef..5e2ef730951ff36c3f24f05559c754d668f4c91b 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_udp-8xb64-210e_onehand10k-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_hrnetv2-w18_udp-8xb64-210e_onehand10k-256x256.py
@@ -1,121 +1,98 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec = dict(
-    type='UDPHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="UDPHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://msra/hrnetv2_w18',
-        )),
+            type="Pretrained",
+            checkpoint="open-mmlab://msra/hrnetv2_w18",
+        ),
+    ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=21,
         deconv_out_channels=None,
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=False,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'OneHand10KDataset'
-data_mode = 'topdown'
-data_root = 'data/onehand10k/'
+dataset_type = "OneHand10KDataset"
+data_mode = "topdown"
+data_root = "data/onehand10k/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform', rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -123,36 +100,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/onehand10k_train.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/onehand10k_train.json",
+        data_prefix=dict(img=""),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
    dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/onehand10k_test.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/onehand10k_test.json",
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
 val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE'),
+    dict(type="PCKAccuracy", thr=0.2),
+    dict(type="AUC"),
+    dict(type="EPE"),
 ]
 test_evaluator = val_evaluator
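Likewise, the `*_udp-*` variant above only swaps the codec to UDPHeatmap and keeps `shift_heatmap=False` at test time: UDP's unbiased coordinate transform replaces the empirical one-pixel heatmap shift used with the MSRA codec. The delta, restated from the diff:

```python
# Functional delta of the *_udp-* variant, read from the diff above.
codec = dict(type="UDPHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
test_cfg = dict(flip_test=True, flip_mode="heatmap", shift_heatmap=False)  # no shift with UDP
```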
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_mobilenetv2_8xb64-210e_onehand10k-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_mobilenetv2_8xb64-210e_onehand10k-256x256.py
index 1f4e61c37c5692b62d407f642e23b38197e23d47..2f80c647809aca6d47d05966e9abf4bc5730c911 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_mobilenetv2_8xb64-210e_onehand10k-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_mobilenetv2_8xb64-210e_onehand10k-256x256.py
@@ -1,88 +1,74 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='MobileNetV2',
-        widen_factor=1.,
-        out_indices=(7, ),
+        type="MobileNetV2",
+        widen_factor=1.0,
+        out_indices=(7,),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='mmcls://mobilenet_v2',
-        )),
+            type="Pretrained",
+            checkpoint="mmcls://mobilenet_v2",
+        ),
+    ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=1280,
-        out_channels=21,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=1280, out_channels=21, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'OneHand10KDataset'
-data_mode = 'topdown'
-data_root = 'data/onehand10k/'
+dataset_type = "OneHand10KDataset"
+data_mode = "topdown"
+data_root = "data/onehand10k/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform', rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -90,36 +76,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/onehand10k_train.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/onehand10k_train.json",
+        data_prefix=dict(img=""),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/onehand10k_test.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/onehand10k_test.json",
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
 val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE'),
+    dict(type="PCKAccuracy", thr=0.2),
+    dict(type="AUC"),
+    dict(type="EPE"),
 ]
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_res50_8xb32-210e_onehand10k-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_res50_8xb32-210e_onehand10k-256x256.py
index 36589d899ddd930143749845b5fd5650917d23ec..090c6277418d3350ee238445dba6c23b1ed05840 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_res50_8xb32-210e_onehand10k-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/onehand10k/td-hm_res50_8xb32-210e_onehand10k-256x256.py
@@ -1,87 +1,73 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=50,
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='torchvision://resnet50',
-        )),
+            type="Pretrained",
+            checkpoint="torchvision://resnet50",
+        ),
+    ),
     head=dict(
-        type='HeatmapHead',
-        in_channels=2048,
-        out_channels=21,
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        type="HeatmapHead", in_channels=2048, out_channels=21, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'OneHand10KDataset'
-data_mode = 'topdown'
-data_root = 'data/onehand10k/'
+dataset_type = "OneHand10KDataset"
+data_mode = "topdown"
+data_root = "data/onehand10k/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform', rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -89,36 +75,38 @@ train_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/onehand10k_train.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/onehand10k_train.json",
+        data_prefix=dict(img=""),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/onehand10k_test.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/onehand10k_test.json",
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
 val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE'),
+    dict(type="PCKAccuracy", thr=0.2),
+    dict(type="AUC"),
+    dict(type="EPE"),
 ]
 test_evaluator = val_evaluator
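On `auto_scale_lr`: when training is launched with automatic LR scaling enabled (e.g. the `--auto-scale-lr` flag of the training script), the runner rescales the optimizer LR linearly by the ratio of the actual global batch size to `base_batch_size`. Worked through for the res50 OneHand10K config above:

```python
# Linear LR scaling for the "8xb32" res50 OneHand10K config above.
base_lr = 5e-4          # optimizer lr in the config
base_batch_size = 512   # auto_scale_lr.base_batch_size
actual = 8 * 32         # 8 GPUs x batch_size=32
print(base_lr * actual / base_batch_size)  # 0.00025 -- lr halved at this setup
```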
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_8xb64-210e_rhd2d-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_8xb64-210e_rhd2d-256x256.py
index 4a9bcc9b896ae499e034605209f1c7eb14ba7b39..34bd107dece2e356153ab347d2e348c6c8a7fbad 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_8xb64-210e_rhd2d-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_8xb64-210e_rhd2d-256x256.py
@@ -1,121 +1,98 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://msra/hrnetv2_w18',
-        )),
+            type="Pretrained",
+            checkpoint="open-mmlab://msra/hrnetv2_w18",
+        ),
+    ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=21,
         deconv_out_channels=None,
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'Rhd2DDataset'
-data_mode = 'topdown'
-data_root = 'data/rhd/'
+dataset_type = "Rhd2DDataset"
+data_mode = "topdown"
+data_root = "data/rhd/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform', rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -123,36 +100,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/rhd_train.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/rhd_train.json",
+        data_prefix=dict(img=""),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/rhd_test.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/rhd_test.json",
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
 val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE'),
+    dict(type="PCKAccuracy", thr=0.2),
+    dict(type="AUC"),
+    dict(type="EPE"),
 ]
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_dark-8xb64-210e_rhd2d-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_dark-8xb64-210e_rhd2d-256x256.py
index 44b8dc0f5a1c55d10293c40e5b8314fca6aa9b9c..6ed7c02b64c8ec32f15b9a01b86f66fc08bb452c 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_dark-8xb64-210e_rhd2d-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_dark-8xb64-210e_rhd2d-256x256.py
@@ -1,125 +1,98 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec = dict(
-    type='MSRAHeatmap',
-    input_size=(256, 256),
-    heatmap_size=(64, 64),
-    sigma=2,
-    unbiased=True)
+codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2, unbiased=True)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://msra/hrnetv2_w18',
-        )),
+            type="Pretrained",
+            checkpoint="open-mmlab://msra/hrnetv2_w18",
+        ),
+    ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=21,
         deconv_out_channels=None,
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=True,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'Rhd2DDataset'
-data_mode = 'topdown'
-data_root = 'data/rhd/'
+dataset_type = "Rhd2DDataset"
+data_mode = "topdown"
+data_root = "data/rhd/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform', rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -127,36 +100,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/rhd_train.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/rhd_train.json",
+        data_prefix=dict(img=""),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/rhd_test.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/rhd_test.json",
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
 val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE'),
+    dict(type="PCKAccuracy", thr=0.2),
+    dict(type="AUC"),
+    dict(type="EPE"),
 ]
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_udp-8xb64-210e_rhd2d-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_udp-8xb64-210e_rhd2d-256x256.py
index d1c796234dd22760f6f52dcb97d05ad0410ceabb..bd8c51ba18cf02964c92b088dfaf1ace36c383d2 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_udp-8xb64-210e_rhd2d-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_hrnetv2-w18_udp-8xb64-210e_rhd2d-256x256.py
@@ -1,121 +1,98 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec = dict(
-    type='UDPHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2)
+codec = dict(type="UDPHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2)

 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         in_channels=3,
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(18, 36)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(18, 36, 72)),
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(18, 36)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(18, 36, 72)),
             stage4=dict(
                 num_modules=3,
                 num_branches=4,
-                block='BASIC',
+                block="BASIC",
                 num_blocks=(4, 4, 4, 4),
                 num_channels=(18, 36, 72, 144),
-                multiscale_output=True),
-            upsample=dict(mode='bilinear', align_corners=False)),
+                multiscale_output=True,
+            ),
+            upsample=dict(mode="bilinear", align_corners=False),
+        ),
         init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://msra/hrnetv2_w18',
-        )),
+            type="Pretrained",
+            checkpoint="open-mmlab://msra/hrnetv2_w18",
+        ),
+    ),
     neck=dict(
-        type='FeatureMapProcessor',
+        type="FeatureMapProcessor",
         concat=True,
     ),
     head=dict(
-        type='HeatmapHead',
+        type="HeatmapHead",
         in_channels=270,
         out_channels=21,
         deconv_out_channels=None,
-        conv_out_channels=(270, ),
-        conv_kernel_sizes=(1, ),
-        loss=dict(type='KeypointMSELoss', use_target_weight=True),
-        decoder=codec),
+        conv_out_channels=(270,),
+        conv_kernel_sizes=(1,),
+        loss=dict(type="KeypointMSELoss", use_target_weight=True),
+        decoder=codec,
+    ),
     test_cfg=dict(
         flip_test=True,
-        flip_mode='heatmap',
+        flip_mode="heatmap",
         shift_heatmap=False,
-    ))
+    ),
+)

 # base dataset settings
-dataset_type = 'Rhd2DDataset'
-data_mode = 'topdown'
-data_root = 'data/rhd/'
+dataset_type = "Rhd2DDataset"
+data_mode = "topdown"
+data_root = "data/rhd/"

 # pipelines
 train_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='RandomFlip', direction='horizontal'),
-    dict(
-        type='RandomBBoxTransform', rotate_factor=180,
-        scale_factor=(0.7, 1.3)),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='GenerateTarget', encoder=codec),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="RandomFlip", direction="horizontal"),
+    dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="GenerateTarget", encoder=codec),
+    dict(type="PackPoseInputs"),
 ]
 val_pipeline = [
-    dict(type='LoadImage'),
-    dict(type='GetBBoxCenterScale'),
-    dict(type='TopdownAffine', input_size=codec['input_size']),
-    dict(type='PackPoseInputs')
+    dict(type="LoadImage"),
+    dict(type="GetBBoxCenterScale"),
+    dict(type="TopdownAffine", input_size=codec["input_size"]),
+    dict(type="PackPoseInputs"),
 ]

 # data loaders
@@ -123,36 +100,38 @@ train_dataloader = dict(
     batch_size=64,
     num_workers=2,
     persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
+    sampler=dict(type="DefaultSampler", shuffle=True),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/rhd_train.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/rhd_train.json",
+        data_prefix=dict(img=""),
         pipeline=train_pipeline,
-    ))
+    ),
+)
 val_dataloader = dict(
     batch_size=32,
     num_workers=2,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    sampler=dict(type="DefaultSampler", shuffle=False, round_up=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
         data_mode=data_mode,
-        ann_file='annotations/rhd_test.json',
-        data_prefix=dict(img=''),
+        ann_file="annotations/rhd_test.json",
+        data_prefix=dict(img=""),
         test_mode=True,
         pipeline=val_pipeline,
-    ))
+    ),
+)
 test_dataloader = val_dataloader

 # evaluators
 val_evaluator = [
-    dict(type='PCKAccuracy', thr=0.2),
-    dict(type='AUC'),
-    dict(type='EPE'),
+    dict(type="PCKAccuracy", thr=0.2),
+    dict(type="AUC"),
+    dict(type="EPE"),
 ]
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_mobilenetv2_8xb64-210e_rhd2d-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_mobilenetv2_8xb64-210e_rhd2d-256x256.py
index d7176bacd73cad44295c84ae8f4b9b1d1201bf35..bd2228d7a71390df2a11b8cbce3a5f589f77a29d 100644
--- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_mobilenetv2_8xb64-210e_rhd2d-256x256.py
+++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_mobilenetv2_8xb64-210e_rhd2d-256x256.py
@@ -1,88 +1,74 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]

 # runtime
 train_cfg = dict(max_epochs=210, val_interval=10)

 # optimizer
-optim_wrapper = dict(optimizer=dict(
-    type='Adam',
-    lr=5e-4,
-))
+optim_wrapper = dict(
+    optimizer=dict(
+        type="Adam",
+        lr=5e-4,
+    )
+)

 # learning policy
 param_scheduler = [
-    dict(
-        type='LinearLR', begin=0, end=500, start_factor=0.001,
-        by_epoch=False),  # warm-up
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=210,
-        milestones=[170, 200],
-        gamma=0.1,
-        by_epoch=True)
+    dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False),  # warm-up
+    dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True),
 ]

 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)

 # hooks
-default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater'))
+default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater"))

 # codec settings
-codec
= dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='MobileNetV2', - widen_factor=1., - out_indices=(7, ), + type="MobileNetV2", + widen_factor=1.0, + out_indices=(7,), init_cfg=dict( - type='Pretrained', - checkpoint='mmcls://mobilenet_v2', - )), + type="Pretrained", + checkpoint="mmcls://mobilenet_v2", + ), + ), head=dict( - type='HeatmapHead', - in_channels=1280, - out_channels=21, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=1280, out_channels=21, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'Rhd2DDataset' -data_mode = 'topdown' -data_root = 'data/rhd/' +dataset_type = "Rhd2DDataset" +data_mode = "topdown" +data_root = "data/rhd/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -90,36 +76,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/rhd_train.json', - data_prefix=dict(img=''), + ann_file="annotations/rhd_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/rhd_test.json', - data_prefix=dict(img=''), + ann_file="annotations/rhd_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + 
dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_res50_8xb64-210e_rhd2d-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_res50_8xb64-210e_rhd2d-256x256.py index da5556802891a6d742527e7889d0f31161223eef..0f01ac6a7cadfadf8f4f1e92bb7bff3f9396219a 100644 --- a/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_res50_8xb64-210e_rhd2d-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/topdown_heatmap/rhd2d/td-hm_res50_8xb64-210e_rhd2d-256x256.py @@ -1,87 +1,73 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(256, 256), heatmap_size=(64, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(256, 256), heatmap_size=(64, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, init_cfg=dict( - type='Pretrained', - checkpoint='torchvision://resnet50', - )), + type="Pretrained", + checkpoint="torchvision://resnet50", + ), + ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=21, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=21, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'Rhd2DDataset' -data_mode = 'topdown' -data_root = 'data/rhd/' +dataset_type = "Rhd2DDataset" +data_mode = "topdown" +data_root = "data/rhd/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)), + 
dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,36 +75,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/rhd_train.json', - data_prefix=dict(img=''), + ann_file="annotations/rhd_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/rhd_test.json', - data_prefix=dict(img=''), + ann_file="annotations/rhd_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/topdown_regression/onehand10k/td-reg_res50_8xb64-210e_onehand10k-256x256.py b/mmpose/configs/hand_2d_keypoint/topdown_regression/onehand10k/td-reg_res50_8xb64-210e_onehand10k-256x256.py index ee1556d45e18e3253f421948d1affd4eabfe673f..2caa90a5d5fa10730e1f8424e1650bc2cbc7d4b7 100644 --- a/mmpose/configs/hand_2d_keypoint/topdown_regression/onehand10k/td-reg_res50_8xb64-210e_onehand10k-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/topdown_regression/onehand10k/td-reg_res50_8xb64-210e_onehand10k-256x256.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict(type='RegressionLabel', input_size=(256, 256)) +codec = dict(type="RegressionLabel", input_size=(256, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - 
std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), - neck=dict(type='GlobalAveragePooling'), + neck=dict(type="GlobalAveragePooling"), head=dict( - type='RegressionHead', - in_channels=2048, - num_joints=21, - loss=dict(type='SmoothL1Loss', use_target_weight=True), - decoder=codec), + type="RegressionHead", in_channels=2048, num_joints=21, loss=dict(type="SmoothL1Loss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'OneHand10KDataset' -data_mode = 'topdown' -data_root = 'data/onehand10k/' +dataset_type = "OneHand10KDataset" +data_mode = "topdown" +data_root = "data/onehand10k/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +73,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/onehand10k_train.json', - data_prefix=dict(img=''), + ann_file="annotations/onehand10k_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/onehand10k_test.json', - data_prefix=dict(img=''), + ann_file="annotations/onehand10k_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_2d_keypoint/topdown_regression/rhd2d/td-reg_res50_8xb64-210e_rhd2d-256x256.py 
b/mmpose/configs/hand_2d_keypoint/topdown_regression/rhd2d/td-reg_res50_8xb64-210e_rhd2d-256x256.py index a350c24bfe2d3d7246ed63c78f737414ee5f247e..e03a1e9df0cf0be95a7aa54a38896aafa72660ce 100644 --- a/mmpose/configs/hand_2d_keypoint/topdown_regression/rhd2d/td-reg_res50_8xb64-210e_rhd2d-256x256.py +++ b/mmpose/configs/hand_2d_keypoint/topdown_regression/rhd2d/td-reg_res50_8xb64-210e_rhd2d-256x256.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict(checkpoint=dict(save_best='AUC', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="AUC", rule="greater")) # codec settings -codec = dict(type='RegressionLabel', input_size=(256, 256)) +codec = dict(type="RegressionLabel", input_size=(256, 256)) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), - neck=dict(type='GlobalAveragePooling'), + neck=dict(type="GlobalAveragePooling"), head=dict( - type='RegressionHead', - in_channels=2048, - num_joints=21, - loss=dict(type='SmoothL1Loss', use_target_weight=True), - decoder=codec), + type="RegressionHead", in_channels=2048, num_joints=21, loss=dict(type="SmoothL1Loss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'Rhd2DDataset' -data_mode = 'topdown' -data_root = 'data/rhd/' +dataset_type = "Rhd2DDataset" +data_mode = "topdown" +data_root = "data/rhd/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict( - type='RandomBBoxTransform', rotate_factor=180, - scale_factor=(0.7, 1.3)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomBBoxTransform", rotate_factor=180, scale_factor=(0.7, 1.3)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - 
dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,36 +73,38 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/rhd_train.json', - data_prefix=dict(img=''), + ann_file="annotations/rhd_train.json", + data_prefix=dict(img=""), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/rhd_test.json', - data_prefix=dict(img=''), + ann_file="annotations/rhd_test.json", + data_prefix=dict(img=""), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # evaluators val_evaluator = [ - dict(type='PCKAccuracy', thr=0.2), - dict(type='AUC'), - dict(type='EPE'), + dict(type="PCKAccuracy", thr=0.2), + dict(type="AUC"), + dict(type="EPE"), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py b/mmpose/configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py index ffe9f0f051cce39c54cecf22a1b0f38983d84ce6..d4b8d8a84e96a13268a1248970699159f8e7a681 100644 --- a/mmpose/configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py +++ b/mmpose/configs/hand_3d_keypoint/internet/interhand3d/internet_res50_4xb16-20e_interhand3d-256x256.py @@ -1,116 +1,120 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # visualization vis_backends = [ - dict(type='LocalVisBackend'), + dict(type="LocalVisBackend"), ] -visualizer = dict( - type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer') +visualizer = dict(type="Pose3dLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime train_cfg = dict(max_epochs=20, val_interval=1) # optimizer -optim_wrapper = dict(optimizer=dict(type='Adam', lr=0.0002)) +optim_wrapper = dict(optimizer=dict(type="Adam", lr=0.0002)) # learning policy -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=20, - milestones=[15, 17], - gamma=0.1, - by_epoch=True) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=20, milestones=[15, 17], gamma=0.1, by_epoch=True)] auto_scale_lr = dict(base_batch_size=128) # hooks default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - interval=1, - save_best='MPJPE_all', - rule='less', - max_keep_ckpts=1), - logger=dict(type='LoggerHook', interval=20), + checkpoint=dict(type="CheckpointHook", interval=1, save_best="MPJPE_all", rule="less", max_keep_ckpts=1), + logger=dict(type="LoggerHook", interval=20), ) # codec settings codec = dict( - type='Hand3DHeatmap', - image_size=[256, 256], - root_heatmap_size=64, - heatmap_size=[64, 64, 64], - sigma=2.5, - max_bound=255, - depth_size=64) + type="Hand3DHeatmap", image_size=[256, 256], root_heatmap_size=64, heatmap_size=[64, 64, 
64], sigma=2.5, max_bound=255, depth_size=64 +) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict( - type='ResNet', - depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ResNet", depth=50, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50")), head=dict( - type='InternetHead', + type="InternetHead", keypoint_head_cfg=dict( in_channels=2048, out_channels=21 * 64, - depth_size=codec['depth_size'], + depth_size=codec["depth_size"], deconv_out_channels=(256, 256, 256), deconv_kernel_sizes=(4, 4, 4), ), root_head_cfg=dict( in_channels=2048, - heatmap_size=codec['root_heatmap_size'], - hidden_dims=(512, ), + heatmap_size=codec["root_heatmap_size"], + hidden_dims=(512,), ), hand_type_head_cfg=dict( in_channels=2048, num_labels=2, - hidden_dims=(512, ), + hidden_dims=(512,), ), - decoder=codec), - test_cfg=dict(flip_test=False)) + decoder=codec, + ), + test_cfg=dict(flip_test=False), +) # base dataset settings -dataset_type = 'InterHand3DDataset' -data_mode = 'topdown' -data_root = 'data/interhand2.6m/' +dataset_type = "InterHand3DDataset" +data_mode = "topdown" +data_root = "data/interhand2.6m/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='HandRandomFlip', prob=0.5), - dict(type='RandomBBoxTransform', rotate_factor=90.0), - dict(type='TopdownAffine', input_size=codec['image_size']), - dict(type='GenerateTarget', encoder=codec), + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="HandRandomFlip", prob=0.5), + dict(type="RandomBBoxTransform", rotate_factor=90.0), + dict(type="TopdownAffine", input_size=codec["image_size"]), + dict(type="GenerateTarget", encoder=codec), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'rotation', 'img_shape', - 'focal', 'principal_pt', 'input_size', 'input_center', - 'input_scale', 'hand_type', 'hand_type_valid', 'flip', - 'flip_indices', 'abs_depth')) + type="PackPoseInputs", + meta_keys=( + "id", + "img_id", + "img_path", + "rotation", + "img_shape", + "focal", + "principal_pt", + "input_size", + "input_center", + "input_scale", + "hand_type", + "hand_type_valid", + "flip", + "flip_indices", + "abs_depth", + ), + ), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['image_size']), + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["image_size"]), dict( - type='PackPoseInputs', - meta_keys=('id', 'img_id', 'img_path', 'rotation', 'img_shape', - 'focal', 'principal_pt', 'input_size', 'input_center', - 'input_scale', 'hand_type', 'hand_type_valid', 'flip', - 'flip_indices', 'abs_depth')) + type="PackPoseInputs", + meta_keys=( + "id", + "img_id", + "img_path", + "rotation", + "img_shape", + "focal", + "principal_pt", + "input_size", + "input_center", + "input_scale", + "hand_type", + "hand_type_valid", + "flip", + "flip_indices", + "abs_depth", + ), + ), ] # data loaders @@ -119,64 +123,61 @@ train_dataloader = dict( num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=True), + 
sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, - ann_file='annotations/all/InterHand2.6M_train_data.json', - camera_param_file='annotations/all/InterHand2.6M_train_camera.json', - joint_file='annotations/all/InterHand2.6M_train_joint_3d.json', + ann_file="annotations/all/InterHand2.6M_train_data.json", + camera_param_file="annotations/all/InterHand2.6M_train_camera.json", + joint_file="annotations/all/InterHand2.6M_train_joint_3d.json", use_gt_root_depth=True, rootnet_result_file=None, data_mode=data_mode, data_root=data_root, - data_prefix=dict(img='images/train/'), + data_prefix=dict(img="images/train/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=16, num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotations/machine_annot/InterHand2.6M_val_data.json', - camera_param_file='annotations/machine_annot/' - 'InterHand2.6M_val_camera.json', - joint_file='annotations/machine_annot/InterHand2.6M_val_joint_3d.json', + ann_file="annotations/machine_annot/InterHand2.6M_val_data.json", + camera_param_file="annotations/machine_annot/" "InterHand2.6M_val_camera.json", + joint_file="annotations/machine_annot/InterHand2.6M_val_joint_3d.json", use_gt_root_depth=True, rootnet_result_file=None, data_mode=data_mode, data_root=data_root, - data_prefix=dict(img='images/val/'), + data_prefix=dict(img="images/val/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) test_dataloader = dict( batch_size=16, num_workers=1, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, - ann_file='annotations/all/' - 'InterHand2.6M_test_data.json', - camera_param_file='annotations/all/' - 'InterHand2.6M_test_camera.json', - joint_file='annotations/all/' - 'InterHand2.6M_test_joint_3d.json', + ann_file="annotations/all/" "InterHand2.6M_test_data.json", + camera_param_file="annotations/all/" "InterHand2.6M_test_camera.json", + joint_file="annotations/all/" "InterHand2.6M_test_joint_3d.json", use_gt_root_depth=True, rootnet_result_file=None, data_mode=data_mode, data_root=data_root, - data_prefix=dict(img='images/test/'), + data_prefix=dict(img="images/test/"), pipeline=val_pipeline, test_mode=True, - )) + ), +) # evaluators -val_evaluator = [ - dict(type='InterHandMetric', modes=['MPJPE', 'MRRPE', 'HandednessAcc']) -] +val_evaluator = [dict(type="InterHandMetric", modes=["MPJPE", "MRRPE", "HandednessAcc"])] test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/CO-DETR/codetr/__init__.py b/mmpose/configs/mmdet/CO-DETR/codetr/__init__.py index 2ca4c02d9f7b71643b3b63ef4df254b87d4f9661..7b3a4d061846b7b1a87b21bc39ee66279577751a 100644 --- a/mmpose/configs/mmdet/CO-DETR/codetr/__init__.py +++ b/mmpose/configs/mmdet/CO-DETR/codetr/__init__.py @@ -3,11 +3,15 @@ from .co_atss_head import CoATSSHead from .co_dino_head import CoDINOHead from .co_roi_head import CoStandardRoIHead from .codetr import CoDETR -from .transformer import (CoDinoTransformer, DetrTransformerDecoderLayer, - DetrTransformerEncoder, DinoTransformerDecoder) +from .transformer import CoDinoTransformer, DetrTransformerDecoderLayer, DetrTransformerEncoder, DinoTransformerDecoder __all__ = [ - 'CoDETR', 'CoDinoTransformer', 
-    'CoATSSHead', 'CoStandardRoIHead', 'DetrTransformerEncoder',
-    'DetrTransformerDecoderLayer'
+    "CoDETR",
+    "CoDinoTransformer",
+    "DinoTransformerDecoder",
+    "CoDINOHead",
+    "CoATSSHead",
+    "CoStandardRoIHead",
+    "DetrTransformerEncoder",
+    "DetrTransformerDecoderLayer",
 ]
diff --git a/mmpose/configs/mmdet/CO-DETR/codetr/co_atss_head.py b/mmpose/configs/mmdet/CO-DETR/codetr/co_atss_head.py
index c6ae0180da7be292b67a5bb83c1ad34b848ff17a..2df8f1e093c0698e9d956d9a6360107495a13c55 100644
--- a/mmpose/configs/mmdet/CO-DETR/codetr/co_atss_head.py
+++ b/mmpose/configs/mmdet/CO-DETR/codetr/co_atss_head.py
@@ -1,25 +1,25 @@
 from typing import List
 import torch
-from torch import Tensor
-
 from mmdet.models.dense_heads import ATSSHead
 from mmdet.models.utils import images_to_levels, multi_apply
 from mmdet.registry import MODELS
 from mmdet.utils import InstanceList, OptInstanceList, reduce_mean
+from torch import Tensor
 @MODELS.register_module()
 class CoATSSHead(ATSSHead):
     def loss_by_feat(
-            self,
-            cls_scores: List[Tensor],
-            bbox_preds: List[Tensor],
-            centernesses: List[Tensor],
-            batch_gt_instances: InstanceList,
-            batch_img_metas: List[dict],
-            batch_gt_instances_ignore: OptInstanceList = None) -> dict:
+        self,
+        cls_scores: List[Tensor],
+        bbox_preds: List[Tensor],
+        centernesses: List[Tensor],
+        batch_gt_instances: InstanceList,
+        batch_img_metas: List[dict],
+        batch_gt_instances_ignore: OptInstanceList = None,
+    ) -> dict:
         """Calculate the loss based on the features extracted by the
         detection head.
@@ -47,54 +47,55 @@ class CoATSSHead(ATSSHead):
         assert len(featmap_sizes) == self.prior_generator.num_levels
         device = cls_scores[0].device
-        anchor_list, valid_flag_list = self.get_anchors(
-            featmap_sizes, batch_img_metas, device=device)
+        anchor_list, valid_flag_list = self.get_anchors(featmap_sizes, batch_img_metas, device=device)
         cls_reg_targets = self.get_targets(
+            anchor_list, valid_flag_list, batch_gt_instances, batch_img_metas, batch_gt_instances_ignore=batch_gt_instances_ignore
+        )
+
+        (
             anchor_list,
-            valid_flag_list,
-            batch_gt_instances,
-            batch_img_metas,
-            batch_gt_instances_ignore=batch_gt_instances_ignore)
-
-        (anchor_list, labels_list, label_weights_list, bbox_targets_list,
-         bbox_weights_list, avg_factor, ori_anchors, ori_labels,
-         ori_bbox_targets) = cls_reg_targets
-
-        avg_factor = reduce_mean(
-            torch.tensor(avg_factor, dtype=torch.float, device=device)).item()
-
-        losses_cls, losses_bbox, loss_centerness, \
-            bbox_avg_factor = multi_apply(
-                self.loss_by_feat_single,
-                anchor_list,
-                cls_scores,
-                bbox_preds,
-                centernesses,
-                labels_list,
-                label_weights_list,
-                bbox_targets_list,
-                avg_factor=avg_factor)
+            labels_list,
+            label_weights_list,
+            bbox_targets_list,
+            bbox_weights_list,
+            avg_factor,
+            ori_anchors,
+            ori_labels,
+            ori_bbox_targets,
+        ) = cls_reg_targets
+
+        avg_factor = reduce_mean(torch.tensor(avg_factor, dtype=torch.float, device=device)).item()
+
+        losses_cls, losses_bbox, loss_centerness, bbox_avg_factor = multi_apply(
+            self.loss_by_feat_single,
+            anchor_list,
+            cls_scores,
+            bbox_preds,
+            centernesses,
+            labels_list,
+            label_weights_list,
+            bbox_targets_list,
+            avg_factor=avg_factor,
+        )
         bbox_avg_factor = sum(bbox_avg_factor)
         bbox_avg_factor = reduce_mean(bbox_avg_factor).clamp_(min=1).item()
         losses_bbox = list(map(lambda x: x / bbox_avg_factor, losses_bbox))
         # diff
-        pos_coords = (ori_anchors, ori_labels, ori_bbox_targets, 'atss')
-        return dict(
-            loss_cls=losses_cls,
-            loss_bbox=losses_bbox,
-
loss_centerness=loss_centerness, - pos_coords=pos_coords) - - def get_targets(self, - anchor_list: List[List[Tensor]], - valid_flag_list: List[List[Tensor]], - batch_gt_instances: InstanceList, - batch_img_metas: List[dict], - batch_gt_instances_ignore: OptInstanceList = None, - unmap_outputs: bool = True) -> tuple: + pos_coords = (ori_anchors, ori_labels, ori_bbox_targets, "atss") + return dict(loss_cls=losses_cls, loss_bbox=losses_bbox, loss_centerness=loss_centerness, pos_coords=pos_coords) + + def get_targets( + self, + anchor_list: List[List[Tensor]], + valid_flag_list: List[List[Tensor]], + batch_gt_instances: InstanceList, + batch_img_metas: List[dict], + batch_gt_instances_ignore: OptInstanceList = None, + unmap_outputs: bool = True, + ) -> tuple: """Get targets for ATSS head. This method is almost the same as `AnchorHead.get_targets()`. Besides @@ -117,37 +118,49 @@ class CoATSSHead(ATSSHead): # compute targets for each image if batch_gt_instances_ignore is None: batch_gt_instances_ignore = [None] * num_imgs - (all_anchors, all_labels, all_label_weights, all_bbox_targets, - all_bbox_weights, pos_inds_list, neg_inds_list, - sampling_results_list) = multi_apply( - self._get_targets_single, - anchor_list, - valid_flag_list, - num_level_anchors_list, - batch_gt_instances, - batch_img_metas, - batch_gt_instances_ignore, - unmap_outputs=unmap_outputs) + ( + all_anchors, + all_labels, + all_label_weights, + all_bbox_targets, + all_bbox_weights, + pos_inds_list, + neg_inds_list, + sampling_results_list, + ) = multi_apply( + self._get_targets_single, + anchor_list, + valid_flag_list, + num_level_anchors_list, + batch_gt_instances, + batch_img_metas, + batch_gt_instances_ignore, + unmap_outputs=unmap_outputs, + ) # Get `avg_factor` of all images, which calculate in `SamplingResult`. # When using sampling method, avg_factor is usually the sum of # positive and negative priors. When using `PseudoSampler`, # `avg_factor` is usually equal to the number of positive priors. - avg_factor = sum( - [results.avg_factor for results in sampling_results_list]) + avg_factor = sum([results.avg_factor for results in sampling_results_list]) # split targets to a list w.r.t. 
multiple levels anchors_list = images_to_levels(all_anchors, num_level_anchors) labels_list = images_to_levels(all_labels, num_level_anchors) - label_weights_list = images_to_levels(all_label_weights, - num_level_anchors) - bbox_targets_list = images_to_levels(all_bbox_targets, - num_level_anchors) - bbox_weights_list = images_to_levels(all_bbox_weights, - num_level_anchors) + label_weights_list = images_to_levels(all_label_weights, num_level_anchors) + bbox_targets_list = images_to_levels(all_bbox_targets, num_level_anchors) + bbox_weights_list = images_to_levels(all_bbox_weights, num_level_anchors) # diff ori_anchors = all_anchors ori_labels = all_labels ori_bbox_targets = all_bbox_targets - return (anchors_list, labels_list, label_weights_list, - bbox_targets_list, bbox_weights_list, avg_factor, ori_anchors, - ori_labels, ori_bbox_targets) + return ( + anchors_list, + labels_list, + label_weights_list, + bbox_targets_list, + bbox_weights_list, + avg_factor, + ori_anchors, + ori_labels, + ori_bbox_targets, + ) diff --git a/mmpose/configs/mmdet/CO-DETR/codetr/co_dino_head.py b/mmpose/configs/mmdet/CO-DETR/codetr/co_dino_head.py index 192acf97d86c5d24b623608a46d564a8753b5b7b..2a58f21f185a98c7cc93c08a5fd40c449958422f 100644 --- a/mmpose/configs/mmdet/CO-DETR/codetr/co_dino_head.py +++ b/mmpose/configs/mmdet/CO-DETR/codetr/co_dino_head.py @@ -7,36 +7,33 @@ import torch.nn as nn import torch.nn.functional as F from mmcv.cnn import Linear from mmcv.ops import batched_nms -from mmengine.structures import InstanceData -from torch import Tensor - from mmdet.models import DINOHead from mmdet.models.layers import CdnQueryGenerator from mmdet.models.layers.transformer import inverse_sigmoid from mmdet.models.utils import multi_apply from mmdet.registry import MODELS from mmdet.structures import SampleList -from mmdet.structures.bbox import (bbox_cxcywh_to_xyxy, bbox_overlaps, - bbox_xyxy_to_cxcywh) +from mmdet.structures.bbox import bbox_cxcywh_to_xyxy, bbox_overlaps, bbox_xyxy_to_cxcywh from mmdet.utils import InstanceList, reduce_mean +from mmengine.structures import InstanceData +from torch import Tensor @MODELS.register_module() class CoDINOHead(DINOHead): - def __init__(self, - *args, - num_query=900, - transformer=None, - in_channels=2048, - max_pos_coords=300, - dn_cfg=None, - use_zero_padding=False, - positional_encoding=dict( - type='SinePositionalEncoding', - num_feats=128, - normalize=True), - **kwargs): + def __init__( + self, + *args, + num_query=900, + transformer=None, + in_channels=2048, + max_pos_coords=300, + dn_cfg=None, + use_zero_padding=False, + positional_encoding=dict(type="SinePositionalEncoding", num_feats=128, normalize=True), + **kwargs, + ): self.with_box_refine = True self.mixed_selection = True self.in_channels = in_channels @@ -45,17 +42,15 @@ class CoDINOHead(DINOHead): self.num_query = num_query self.use_zero_padding = use_zero_padding - if 'two_stage_num_proposals' in transformer: - assert transformer['two_stage_num_proposals'] == num_query, \ - 'two_stage_num_proposals must be equal to num_query for DINO' + if "two_stage_num_proposals" in transformer: + assert transformer["two_stage_num_proposals"] == num_query, "two_stage_num_proposals must be equal to num_query for DINO" else: - transformer['two_stage_num_proposals'] = num_query - transformer['as_two_stage'] = True + transformer["two_stage_num_proposals"] = num_query + transformer["as_two_stage"] = True if self.mixed_selection: - transformer['mixed_selection'] = self.mixed_selection + 
transformer["mixed_selection"] = self.mixed_selection self.transformer = transformer - self.act_cfg = transformer.get('act_cfg', - dict(type='ReLU', inplace=True)) + self.act_cfg = transformer.get("act_cfg", dict(type="ReLU", inplace=True)) super().__init__(*args, **kwargs) @@ -66,11 +61,11 @@ class CoDINOHead(DINOHead): def _init_layers(self): self.transformer = MODELS.build(self.transformer) self.embed_dims = self.transformer.embed_dims - assert hasattr(self.positional_encoding, 'num_feats') + assert hasattr(self.positional_encoding, "num_feats") num_feats = self.positional_encoding.num_feats - assert num_feats * 2 == self.embed_dims, 'embed_dims should' \ - f' be exactly 2 times of num_feats. Found {self.embed_dims}' \ - f' and {num_feats}.' + assert num_feats * 2 == self.embed_dims, ( + "embed_dims should" f" be exactly 2 times of num_feats. Found {self.embed_dims}" f" and {num_feats}." + ) """Initialize classification branch and regression branch of head.""" fc_cls = Linear(self.embed_dims, self.cls_out_channels) reg_branch = [] @@ -85,63 +80,48 @@ class CoDINOHead(DINOHead): # last reg_branch is used to generate proposal from # encode feature map when as_two_stage is True. - num_pred = (self.transformer.decoder.num_layers + 1) if \ - self.as_two_stage else self.transformer.decoder.num_layers + num_pred = (self.transformer.decoder.num_layers + 1) if self.as_two_stage else self.transformer.decoder.num_layers self.cls_branches = _get_clones(fc_cls, num_pred) self.reg_branches = _get_clones(reg_branch, num_pred) self.downsample = nn.Sequential( - nn.Conv2d( - self.embed_dims, - self.embed_dims, - kernel_size=3, - stride=2, - padding=1), nn.GroupNorm(32, self.embed_dims)) + nn.Conv2d(self.embed_dims, self.embed_dims, kernel_size=3, stride=2, padding=1), nn.GroupNorm(32, self.embed_dims) + ) def init_denoising(self, dn_cfg): if dn_cfg is not None: - dn_cfg['num_classes'] = self.num_classes - dn_cfg['num_matching_queries'] = self.num_query - dn_cfg['embed_dims'] = self.embed_dims + dn_cfg["num_classes"] = self.num_classes + dn_cfg["num_matching_queries"] = self.num_query + dn_cfg["embed_dims"] = self.embed_dims self.dn_generator = CdnQueryGenerator(**dn_cfg) - def forward(self, - mlvl_feats, - img_metas, - dn_label_query=None, - dn_bbox_query=None, - attn_mask=None): + def forward(self, mlvl_feats, img_metas, dn_label_query=None, dn_bbox_query=None, attn_mask=None): batch_size = mlvl_feats[0].size(0) - input_img_h, input_img_w = img_metas[0]['batch_input_shape'] - img_masks = mlvl_feats[0].new_ones( - (batch_size, input_img_h, input_img_w)) + input_img_h, input_img_w = img_metas[0]["batch_input_shape"] + img_masks = mlvl_feats[0].new_ones((batch_size, input_img_h, input_img_w)) for img_id in range(batch_size): - img_h, img_w = img_metas[img_id]['img_shape'] + img_h, img_w = img_metas[img_id]["img_shape"] img_masks[img_id, :img_h, :img_w] = 0 mlvl_masks = [] mlvl_positional_encodings = [] for feat in mlvl_feats: - mlvl_masks.append( - F.interpolate(img_masks[None], - size=feat.shape[-2:]).to(torch.bool).squeeze(0)) - mlvl_positional_encodings.append( - self.positional_encoding(mlvl_masks[-1])) + mlvl_masks.append(F.interpolate(img_masks[None], size=feat.shape[-2:]).to(torch.bool).squeeze(0)) + mlvl_positional_encodings.append(self.positional_encoding(mlvl_masks[-1])) query_embeds = None - hs, inter_references, topk_score, topk_anchor, enc_outputs = \ - self.transformer( - mlvl_feats, - mlvl_masks, - query_embeds, - mlvl_positional_encodings, - dn_label_query, - dn_bbox_query, - attn_mask, - 
reg_branches=self.reg_branches if self.with_box_refine else None, # noqa:E501 - cls_branches=self.cls_branches if self.as_two_stage else None # noqa:E501 - ) + hs, inter_references, topk_score, topk_anchor, enc_outputs = self.transformer( + mlvl_feats, + mlvl_masks, + query_embeds, + mlvl_positional_encodings, + dn_label_query, + dn_bbox_query, + attn_mask, + reg_branches=self.reg_branches if self.with_box_refine else None, # noqa:E501 + cls_branches=self.cls_branches if self.as_two_stage else None, # noqa:E501 + ) outs = [] num_level = len(mlvl_feats) start = 0 @@ -183,28 +163,15 @@ class CoDINOHead(DINOHead): return outputs_classes, outputs_coords, topk_score, topk_anchor, outs - def predict(self, - feats: List[Tensor], - batch_data_samples: SampleList, - rescale: bool = True) -> InstanceList: - batch_img_metas = [ - data_samples.metainfo for data_samples in batch_data_samples - ] + def predict(self, feats: List[Tensor], batch_data_samples: SampleList, rescale: bool = True) -> InstanceList: + batch_img_metas = [data_samples.metainfo for data_samples in batch_data_samples] outs = self.forward(feats, batch_img_metas) - predictions = self.predict_by_feat( - *outs, batch_img_metas=batch_img_metas, rescale=rescale) + predictions = self.predict_by_feat(*outs, batch_img_metas=batch_img_metas, rescale=rescale) return predictions - def predict_by_feat(self, - all_cls_scores, - all_bbox_preds, - enc_cls_scores, - enc_bbox_preds, - enc_outputs, - batch_img_metas, - rescale=True): + def predict_by_feat(self, all_cls_scores, all_bbox_preds, enc_cls_scores, enc_bbox_preds, enc_outputs, batch_img_metas, rescale=True): cls_scores = all_cls_scores[-1] bbox_preds = all_bbox_preds[-1] @@ -214,16 +181,11 @@ class CoDINOHead(DINOHead): cls_score = cls_scores[img_id] bbox_pred = bbox_preds[img_id] img_meta = batch_img_metas[img_id] - results = self._predict_by_feat_single(cls_score, bbox_pred, - img_meta, rescale) + results = self._predict_by_feat_single(cls_score, bbox_pred, img_meta, rescale) result_list.append(results) return result_list - def _predict_by_feat_single(self, - cls_score: Tensor, - bbox_pred: Tensor, - img_meta: dict, - rescale: bool = True) -> InstanceData: + def _predict_by_feat_single(self, cls_score: Tensor, bbox_pred: Tensor, img_meta: dict, rescale: bool = True) -> InstanceData: """Transform outputs from the last decoder layer into bbox predictions for each image. @@ -250,11 +212,11 @@ class CoDINOHead(DINOHead): the last dimension 4 arrange as (x1, y1, x2, y2). 
""" assert len(cls_score) == len(bbox_pred) # num_queries - max_per_img = self.test_cfg.get('max_per_img', self.num_query) - score_thr = self.test_cfg.get('score_thr', 0) - with_nms = self.test_cfg.get('nms', None) + max_per_img = self.test_cfg.get("max_per_img", self.num_query) + score_thr = self.test_cfg.get("score_thr", 0) + with_nms = self.test_cfg.get("nms", None) - img_shape = img_meta['img_shape'] + img_shape = img_meta["img_shape"] # exclude background if self.loss_cls.use_sigmoid: cls_score = cls_score.sigmoid() @@ -280,9 +242,8 @@ class CoDINOHead(DINOHead): det_bboxes[:, 0::2].clamp_(min=0, max=img_shape[1]) det_bboxes[:, 1::2].clamp_(min=0, max=img_shape[0]) if rescale: - assert img_meta.get('scale_factor') is not None - det_bboxes /= det_bboxes.new_tensor( - img_meta['scale_factor']).repeat((1, 2)) + assert img_meta.get("scale_factor") is not None + det_bboxes /= det_bboxes.new_tensor(img_meta["scale_factor"]).repeat((1, 2)) results = InstanceData() results.bboxes = det_bboxes @@ -290,9 +251,7 @@ class CoDINOHead(DINOHead): results.labels = det_labels if with_nms and results.bboxes.numel() > 0: - det_bboxes, keep_idxs = batched_nms(results.bboxes, results.scores, - results.labels, - self.test_cfg.nms) + det_bboxes, keep_idxs = batched_nms(results.bboxes, results.scores, results.labels, self.test_cfg.nms) results = results[keep_idxs] results.scores = det_bboxes[:, -1] results = results[:max_per_img] @@ -308,14 +267,11 @@ class CoDINOHead(DINOHead): batch_img_metas.append(data_sample.metainfo) batch_gt_instances.append(data_sample.gt_instances) - dn_label_query, dn_bbox_query, attn_mask, dn_meta = \ - self.dn_generator(batch_data_samples) + dn_label_query, dn_bbox_query, attn_mask, dn_meta = self.dn_generator(batch_data_samples) - outs = self(x, batch_img_metas, dn_label_query, dn_bbox_query, - attn_mask) + outs = self(x, batch_img_metas, dn_label_query, dn_bbox_query, attn_mask) - loss_inputs = outs[:-1] + (batch_gt_instances, batch_img_metas, - dn_meta) + loss_inputs = outs[:-1] + (batch_gt_instances, batch_img_metas, dn_meta) losses = self.loss_by_feat(*loss_inputs) enc_outputs = outs[-1] return losses, enc_outputs @@ -345,24 +301,19 @@ class CoDINOHead(DINOHead): as_two_stage is True it would be returned, otherwise \ `None` would be returned. 
""" - aux_coords, aux_labels, aux_targets, aux_label_weights, \ - aux_bbox_weights, aux_feats, attn_masks = aux_targets + aux_coords, aux_labels, aux_targets, aux_label_weights, aux_bbox_weights, aux_feats, attn_masks = aux_targets batch_size = mlvl_feats[0].size(0) - input_img_h, input_img_w = img_metas[0]['batch_input_shape'] - img_masks = mlvl_feats[0].new_ones( - (batch_size, input_img_h, input_img_w)) + input_img_h, input_img_w = img_metas[0]["batch_input_shape"] + img_masks = mlvl_feats[0].new_ones((batch_size, input_img_h, input_img_w)) for img_id in range(batch_size): - img_h, img_w = img_metas[img_id]['img_shape'] + img_h, img_w = img_metas[img_id]["img_shape"] img_masks[img_id, :img_h, :img_w] = 0 mlvl_masks = [] mlvl_positional_encodings = [] for feat in mlvl_feats: - mlvl_masks.append( - F.interpolate(img_masks[None], - size=feat.shape[-2:]).to(torch.bool).squeeze(0)) - mlvl_positional_encodings.append( - self.positional_encoding(mlvl_masks[-1])) + mlvl_masks.append(F.interpolate(img_masks[None], size=feat.shape[-2:]).to(torch.bool).squeeze(0)) + mlvl_positional_encodings.append(self.positional_encoding(mlvl_masks[-1])) query_embeds = None hs, inter_references = self.transformer.forward_aux( @@ -376,7 +327,8 @@ class CoDINOHead(DINOHead): cls_branches=self.cls_branches if self.as_two_stage else None, return_encoder_output=True, attn_masks=attn_masks, - head_idx=head_idx) + head_idx=head_idx, + ) hs = hs.permute(0, 2, 1, 3) outputs_classes = [] @@ -401,11 +353,7 @@ class CoDINOHead(DINOHead): return outputs_classes, outputs_coords, None, None - def loss_aux(self, - x, - pos_coords=None, - head_idx=0, - batch_data_samples=None): + def loss_aux(self, x, pos_coords=None, head_idx=0, batch_data_samples=None): batch_gt_instances = [] batch_img_metas = [] for data_sample in batch_data_samples: @@ -415,8 +363,7 @@ class CoDINOHead(DINOHead): gt_bboxes = [b.bboxes for b in batch_gt_instances] gt_labels = [b.labels for b in batch_gt_instances] - aux_targets = self.get_aux_targets(pos_coords, batch_img_metas, x, - head_idx) + aux_targets = self.get_aux_targets(pos_coords, batch_img_metas, x, head_idx) outs = self.forward_aux(x[:-1], batch_img_metas, aux_targets, head_idx) outs = outs + aux_targets if gt_labels is None: @@ -434,13 +381,10 @@ class CoDINOHead(DINOHead): all_feats = [] for i in range(bs): label = labels[i] - feats = [ - feat[i].reshape(c, -1).transpose(1, 0) for feat in mlvl_feats - ] + feats = [feat[i].reshape(c, -1).transpose(1, 0) for feat in mlvl_feats] feats = torch.cat(feats, dim=0) bg_class_ind = self.num_classes - pos_inds = ((label >= 0) - & (label < bg_class_ind)).nonzero().squeeze(1) + pos_inds = ((label >= 0) & (label < bg_class_ind)).nonzero().squeeze(1) max_num_coords = max(max_num_coords, len(pos_inds)) all_feats.append(feats) max_num_coords = min(self.max_pos_coords, max_num_coords) @@ -459,25 +403,21 @@ class CoDINOHead(DINOHead): for i in range(bs): coord, label, target = coords[i], labels[i], targets[i] feats = all_feats[i] - if 'rcnn' in head_name: + if "rcnn" in head_name: feats = pos_coords[-2][i] num_coords_per_point = 1 else: num_coords_per_point = coord.shape[0] // feats.shape[0] feats = feats.unsqueeze(1).repeat(1, num_coords_per_point, 1) - feats = feats.reshape(feats.shape[0] * num_coords_per_point, - feats.shape[-1]) + feats = feats.reshape(feats.shape[0] * num_coords_per_point, feats.shape[-1]) img_meta = img_metas[i] - img_h, img_w = img_meta['img_shape'] - factor = coord.new_tensor([img_w, img_h, img_w, - img_h]).unsqueeze(0) + img_h, img_w = 
img_meta["img_shape"] + factor = coord.new_tensor([img_w, img_h, img_w, img_h]).unsqueeze(0) bg_class_ind = self.num_classes - pos_inds = ((label >= 0) - & (label < bg_class_ind)).nonzero().squeeze(1) + pos_inds = ((label >= 0) & (label < bg_class_ind)).nonzero().squeeze(1) neg_inds = (label == bg_class_ind).nonzero().squeeze(1) if pos_inds.shape[0] > max_num_coords: - indices = torch.randperm( - pos_inds.shape[0])[:max_num_coords].cuda() + indices = torch.randperm(pos_inds.shape[0])[:max_num_coords].cuda() pos_inds = pos_inds[indices] coord = bbox_xyxy_to_cxcywh(coord[pos_inds] / factor) @@ -486,34 +426,42 @@ class CoDINOHead(DINOHead): feat = feats[pos_inds] if self.use_zero_padding: - label_weights[i][:len(label)] = 1 - bbox_weights[i][:len(label)] = 1 - attn_mask = torch.zeros([ - max_num_coords, - max_num_coords, - ]).bool().to(coord.device) + label_weights[i][: len(label)] = 1 + bbox_weights[i][: len(label)] = 1 + attn_mask = ( + torch.zeros( + [ + max_num_coords, + max_num_coords, + ] + ) + .bool() + .to(coord.device) + ) else: - bbox_weights[i][:len(label)] = 1 + bbox_weights[i][: len(label)] = 1 if coord.shape[0] < max_num_coords: padding_shape = max_num_coords - coord.shape[0] if self.use_zero_padding: padding_coord = coord.new_zeros([padding_shape, 4]) - padding_label = label.new_ones([padding_shape - ]) * self.num_classes + padding_label = label.new_ones([padding_shape]) * self.num_classes padding_target = target.new_zeros([padding_shape, 4]) padding_feat = feat.new_zeros([padding_shape, c]) - attn_mask[coord.shape[0]:, 0:coord.shape[0], ] = True - attn_mask[:, coord.shape[0]:, ] = True + attn_mask[ + coord.shape[0] :, + 0 : coord.shape[0], + ] = True + attn_mask[ + :, + coord.shape[0] :, + ] = True else: - indices = torch.randperm( - neg_inds.shape[0])[:padding_shape].cuda() + indices = torch.randperm(neg_inds.shape[0])[:padding_shape].cuda() neg_inds = neg_inds[indices] - padding_coord = bbox_xyxy_to_cxcywh(coords[i][neg_inds] / - factor) + padding_coord = bbox_xyxy_to_cxcywh(coords[i][neg_inds] / factor) padding_label = labels[i][neg_inds] - padding_target = bbox_xyxy_to_cxcywh(targets[i][neg_inds] / - factor) + padding_target = bbox_xyxy_to_cxcywh(targets[i][neg_inds] / factor) padding_feat = feats[neg_inds] coord = torch.cat((coord, padding_coord), dim=0) label = torch.cat((label, padding_label), dim=0) @@ -527,10 +475,8 @@ class CoDINOHead(DINOHead): aux_feats.append(feat.unsqueeze(0)) if self.use_zero_padding: - attn_masks = torch.cat( - attn_masks, dim=0).unsqueeze(1).repeat(1, 8, 1, 1) - attn_masks = attn_masks.reshape(bs * 8, max_num_coords, - max_num_coords) + attn_masks = torch.cat(attn_masks, dim=0).unsqueeze(1).repeat(1, 8, 1, 1) + attn_masks = attn_masks.reshape(bs * 8, max_num_coords, max_num_coords) else: attn_masks = None @@ -540,67 +486,65 @@ class CoDINOHead(DINOHead): aux_feats = torch.cat(aux_feats, dim=0) aux_label_weights = label_weights aux_bbox_weights = bbox_weights - return (aux_coords, aux_labels, aux_targets, aux_label_weights, - aux_bbox_weights, aux_feats, attn_masks) - - def loss_aux_by_feat(self, - all_cls_scores, - all_bbox_preds, - enc_cls_scores, - enc_bbox_preds, - aux_coords, - aux_labels, - aux_targets, - aux_label_weights, - aux_bbox_weights, - aux_feats, - attn_masks, - gt_bboxes_list, - gt_labels_list, - img_metas, - gt_bboxes_ignore=None): + return (aux_coords, aux_labels, aux_targets, aux_label_weights, aux_bbox_weights, aux_feats, attn_masks) + + def loss_aux_by_feat( + self, + all_cls_scores, + all_bbox_preds, + enc_cls_scores, + 
enc_bbox_preds, + aux_coords, + aux_labels, + aux_targets, + aux_label_weights, + aux_bbox_weights, + aux_feats, + attn_masks, + gt_bboxes_list, + gt_labels_list, + img_metas, + gt_bboxes_ignore=None, + ): num_dec_layers = len(all_cls_scores) all_labels = [aux_labels for _ in range(num_dec_layers)] all_label_weights = [aux_label_weights for _ in range(num_dec_layers)] all_bbox_targets = [aux_targets for _ in range(num_dec_layers)] all_bbox_weights = [aux_bbox_weights for _ in range(num_dec_layers)] img_metas_list = [img_metas for _ in range(num_dec_layers)] - all_gt_bboxes_ignore_list = [ - gt_bboxes_ignore for _ in range(num_dec_layers) - ] + all_gt_bboxes_ignore_list = [gt_bboxes_ignore for _ in range(num_dec_layers)] losses_cls, losses_bbox, losses_iou = multi_apply( - self._loss_aux_by_feat_single, all_cls_scores, all_bbox_preds, - all_labels, all_label_weights, all_bbox_targets, all_bbox_weights, - img_metas_list, all_gt_bboxes_ignore_list) + self._loss_aux_by_feat_single, + all_cls_scores, + all_bbox_preds, + all_labels, + all_label_weights, + all_bbox_targets, + all_bbox_weights, + img_metas_list, + all_gt_bboxes_ignore_list, + ) loss_dict = dict() # loss of proposal generated from encode feature map. # loss from the last decoder layer - loss_dict['loss_cls_aux'] = losses_cls[-1] - loss_dict['loss_bbox_aux'] = losses_bbox[-1] - loss_dict['loss_iou_aux'] = losses_iou[-1] + loss_dict["loss_cls_aux"] = losses_cls[-1] + loss_dict["loss_bbox_aux"] = losses_bbox[-1] + loss_dict["loss_iou_aux"] = losses_iou[-1] # loss from other decoder layers num_dec_layer = 0 - for loss_cls_i, loss_bbox_i, loss_iou_i in zip(losses_cls[:-1], - losses_bbox[:-1], - losses_iou[:-1]): - loss_dict[f'd{num_dec_layer}.loss_cls_aux'] = loss_cls_i - loss_dict[f'd{num_dec_layer}.loss_bbox_aux'] = loss_bbox_i - loss_dict[f'd{num_dec_layer}.loss_iou_aux'] = loss_iou_i + for loss_cls_i, loss_bbox_i, loss_iou_i in zip(losses_cls[:-1], losses_bbox[:-1], losses_iou[:-1]): + loss_dict[f"d{num_dec_layer}.loss_cls_aux"] = loss_cls_i + loss_dict[f"d{num_dec_layer}.loss_bbox_aux"] = loss_bbox_i + loss_dict[f"d{num_dec_layer}.loss_iou_aux"] = loss_iou_i num_dec_layer += 1 return loss_dict - def _loss_aux_by_feat_single(self, - cls_scores, - bbox_preds, - labels, - label_weights, - bbox_targets, - bbox_weights, - img_metas, - gt_bboxes_ignore_list=None): + def _loss_aux_by_feat_single( + self, cls_scores, bbox_preds, labels, label_weights, bbox_targets, bbox_weights, img_metas, gt_bboxes_ignore_list=None + ): num_imgs = cls_scores.size(0) num_q = cls_scores.size(1) @@ -610,40 +554,29 @@ class CoDINOHead(DINOHead): bbox_targets = bbox_targets.reshape(num_imgs * num_q, 4) bbox_weights = bbox_weights.reshape(num_imgs * num_q, 4) except Exception: - return cls_scores.mean() * 0, cls_scores.mean( - ) * 0, cls_scores.mean() * 0 + return cls_scores.mean() * 0, cls_scores.mean() * 0, cls_scores.mean() * 0 bg_class_ind = self.num_classes - num_total_pos = len( - ((labels >= 0) & (labels < bg_class_ind)).nonzero().squeeze(1)) + num_total_pos = len(((labels >= 0) & (labels < bg_class_ind)).nonzero().squeeze(1)) num_total_neg = num_imgs * num_q - num_total_pos # classification loss cls_scores = cls_scores.reshape(-1, self.cls_out_channels) # construct weighted avg_factor to match with the official DETR repo - cls_avg_factor = num_total_pos * 1.0 + \ - num_total_neg * self.bg_cls_weight + cls_avg_factor = num_total_pos * 1.0 + num_total_neg * self.bg_cls_weight if self.sync_cls_avg_factor: - cls_avg_factor = reduce_mean( - 
cls_scores.new_tensor([cls_avg_factor])) + cls_avg_factor = reduce_mean(cls_scores.new_tensor([cls_avg_factor])) cls_avg_factor = max(cls_avg_factor, 1) bg_class_ind = self.num_classes - pos_inds = ((labels >= 0) - & (labels < bg_class_ind)).nonzero().squeeze(1) + pos_inds = ((labels >= 0) & (labels < bg_class_ind)).nonzero().squeeze(1) scores = label_weights.new_zeros(labels.shape) pos_bbox_targets = bbox_targets[pos_inds] pos_decode_bbox_targets = bbox_cxcywh_to_xyxy(pos_bbox_targets) pos_bbox_pred = bbox_preds.reshape(-1, 4)[pos_inds] pos_decode_bbox_pred = bbox_cxcywh_to_xyxy(pos_bbox_pred) - scores[pos_inds] = bbox_overlaps( - pos_decode_bbox_pred.detach(), - pos_decode_bbox_targets, - is_aligned=True) - loss_cls = self.loss_cls( - cls_scores, (labels, scores), - weight=label_weights, - avg_factor=cls_avg_factor) + scores[pos_inds] = bbox_overlaps(pos_decode_bbox_pred.detach(), pos_decode_bbox_targets, is_aligned=True) + loss_cls = self.loss_cls(cls_scores, (labels, scores), weight=label_weights, avg_factor=cls_avg_factor) # Compute the average number of gt boxes across all gpus, for # normalization purposes @@ -653,10 +586,8 @@ class CoDINOHead(DINOHead): # construct factors used for rescale bboxes factors = [] for img_meta, bbox_pred in zip(img_metas, bbox_preds): - img_h, img_w = img_meta['img_shape'] - factor = bbox_pred.new_tensor([img_w, img_h, img_w, - img_h]).unsqueeze(0).repeat( - bbox_pred.size(0), 1) + img_h, img_w = img_meta["img_shape"] + factor = bbox_pred.new_tensor([img_w, img_h, img_w, img_h]).unsqueeze(0).repeat(bbox_pred.size(0), 1) factors.append(factor) factors = torch.cat(factors, 0) @@ -668,10 +599,8 @@ class CoDINOHead(DINOHead): bboxes_gt = bbox_cxcywh_to_xyxy(bbox_targets) * factors # regression IoU loss, defaultly GIoU loss - loss_iou = self.loss_iou( - bboxes, bboxes_gt, bbox_weights, avg_factor=num_total_pos) + loss_iou = self.loss_iou(bboxes, bboxes_gt, bbox_weights, avg_factor=num_total_pos) # regression L1 loss - loss_bbox = self.loss_bbox( - bbox_preds, bbox_targets, bbox_weights, avg_factor=num_total_pos) + loss_bbox = self.loss_bbox(bbox_preds, bbox_targets, bbox_weights, avg_factor=num_total_pos) return loss_cls, loss_bbox, loss_iou diff --git a/mmpose/configs/mmdet/CO-DETR/codetr/co_roi_head.py b/mmpose/configs/mmdet/CO-DETR/codetr/co_roi_head.py index 9aafb53beddf07428e59d83e9de832ff5102821a..dcd276c07e97bf3a764ae028607a338af149375c 100644 --- a/mmpose/configs/mmdet/CO-DETR/codetr/co_roi_head.py +++ b/mmpose/configs/mmdet/CO-DETR/codetr/co_roi_head.py @@ -1,8 +1,6 @@ from typing import List, Tuple import torch -from torch import Tensor - from mmdet.models.roi_heads import StandardRoIHead from mmdet.models.task_modules.samplers import SamplingResult from mmdet.models.utils import unpack_gt_instances @@ -10,13 +8,13 @@ from mmdet.registry import MODELS from mmdet.structures import DetDataSample from mmdet.structures.bbox import bbox2roi from mmdet.utils import InstanceList +from torch import Tensor @MODELS.register_module() class CoStandardRoIHead(StandardRoIHead): - def loss(self, x: Tuple[Tensor], rpn_results_list: InstanceList, - batch_data_samples: List[DetDataSample]) -> dict: + def loss(self, x: Tuple[Tensor], rpn_results_list: InstanceList, batch_data_samples: List[DetDataSample]) -> dict: max_proposal = 2000 assert len(rpn_results_list) == len(batch_data_samples) @@ -29,37 +27,32 @@ class CoStandardRoIHead(StandardRoIHead): for i in range(num_imgs): # rename rpn_results.bboxes to rpn_results.priors rpn_results = rpn_results_list[i] - 
rpn_results.priors = rpn_results.pop('bboxes') + rpn_results.priors = rpn_results.pop("bboxes") - assign_result = self.bbox_assigner.assign( - rpn_results, batch_gt_instances[i], - batch_gt_instances_ignore[i]) + assign_result = self.bbox_assigner.assign(rpn_results, batch_gt_instances[i], batch_gt_instances_ignore[i]) sampling_result = self.bbox_sampler.sample( - assign_result, - rpn_results, - batch_gt_instances[i], - feats=[lvl_feat[i][None] for lvl_feat in x]) + assign_result, rpn_results, batch_gt_instances[i], feats=[lvl_feat[i][None] for lvl_feat in x] + ) sampling_results.append(sampling_result) losses = dict() # bbox head forward and loss if self.with_bbox: bbox_results = self.bbox_loss(x, sampling_results) - losses.update(bbox_results['loss_bbox']) + losses.update(bbox_results["loss_bbox"]) - bbox_targets = bbox_results['bbox_targets'] + bbox_targets = bbox_results["bbox_targets"] for res in sampling_results: max_proposal = min(max_proposal, res.bboxes.shape[0]) ori_coords = bbox2roi([res.bboxes for res in sampling_results]) - ori_proposals, ori_labels, \ - ori_bbox_targets, ori_bbox_feats = [], [], [], [] + ori_proposals, ori_labels, ori_bbox_targets, ori_bbox_feats = [], [], [], [] for i in range(num_imgs): idx = (ori_coords[:, 0] == i).nonzero().squeeze(1) idx = idx[:max_proposal] ori_proposal = ori_coords[idx][:, 1:].unsqueeze(0) ori_label = bbox_targets[0][idx].unsqueeze(0) ori_bbox_target = bbox_targets[2][idx].unsqueeze(0) - ori_bbox_feat = bbox_results['bbox_feats'].mean(-1).mean(-1) + ori_bbox_feat = bbox_results["bbox_feats"].mean(-1).mean(-1) ori_bbox_feat = ori_bbox_feat[idx].unsqueeze(0) ori_proposals.append(ori_proposal) ori_labels.append(ori_label) @@ -69,14 +62,12 @@ class CoStandardRoIHead(StandardRoIHead): ori_labels = torch.cat(ori_labels, dim=0) ori_bbox_targets = torch.cat(ori_bbox_targets, dim=0) ori_bbox_feats = torch.cat(ori_bbox_feats, dim=0) - pos_coords = (ori_coords, ori_labels, ori_bbox_targets, - ori_bbox_feats, 'rcnn') + pos_coords = (ori_coords, ori_labels, ori_bbox_targets, ori_bbox_feats, "rcnn") losses.update(pos_coords=pos_coords) return losses - def bbox_loss(self, x: Tuple[Tensor], - sampling_results: List[SamplingResult]) -> dict: + def bbox_loss(self, x: Tuple[Tensor], sampling_results: List[SamplingResult]) -> dict: """Perform forward propagation and loss calculation of the bbox head on the features of the upstream network. 
@@ -96,13 +87,14 @@ class CoStandardRoIHead(StandardRoIHead): bbox_results = self._bbox_forward(x, rois) bbox_loss_and_target = self.bbox_head.loss_and_target( - cls_score=bbox_results['cls_score'], - bbox_pred=bbox_results['bbox_pred'], + cls_score=bbox_results["cls_score"], + bbox_pred=bbox_results["bbox_pred"], rois=rois, sampling_results=sampling_results, - rcnn_train_cfg=self.train_cfg) + rcnn_train_cfg=self.train_cfg, + ) - bbox_results.update(loss_bbox=bbox_loss_and_target['loss_bbox']) + bbox_results.update(loss_bbox=bbox_loss_and_target["loss_bbox"]) # diff - bbox_results.update(bbox_targets=bbox_loss_and_target['bbox_targets']) + bbox_results.update(bbox_targets=bbox_loss_and_target["bbox_targets"]) return bbox_results diff --git a/mmpose/configs/mmdet/CO-DETR/codetr/codetr.py b/mmpose/configs/mmdet/CO-DETR/codetr/codetr.py index 82826f641075c0af7eebd322b6b36b53390cc648..3d9e0e7fa6c2b9c5ec017ab811ee6955d8f8de80 100644 --- a/mmpose/configs/mmdet/CO-DETR/codetr/codetr.py +++ b/mmpose/configs/mmdet/CO-DETR/codetr/codetr.py @@ -3,42 +3,41 @@ from typing import Tuple, Union import torch import torch.nn as nn -from torch import Tensor - from mmdet.models.detectors.base import BaseDetector from mmdet.registry import MODELS from mmdet.structures import OptSampleList, SampleList from mmdet.utils import InstanceList, OptConfigType, OptMultiConfig +from torch import Tensor @MODELS.register_module() class CoDETR(BaseDetector): def __init__( - self, - backbone, - neck=None, - query_head=None, # detr head - rpn_head=None, # two-stage rpn - roi_head=[None], # two-stage - bbox_head=[None], # one-stage - train_cfg=[None, None], - test_cfg=[None, None], - # Control whether to consider positive samples - # from the auxiliary head as additional positive queries. - with_pos_coord=True, - use_lsj=True, - eval_module='detr', - # Evaluate the Nth head. - eval_index=0, - data_preprocessor: OptConfigType = None, - init_cfg: OptMultiConfig = None): - super(CoDETR, self).__init__( - data_preprocessor=data_preprocessor, init_cfg=init_cfg) + self, + backbone, + neck=None, + query_head=None, # detr head + rpn_head=None, # two-stage rpn + roi_head=[None], # two-stage + bbox_head=[None], # one-stage + train_cfg=[None, None], + test_cfg=[None, None], + # Control whether to consider positive samples + # from the auxiliary head as additional positive queries. + with_pos_coord=True, + use_lsj=True, + eval_module="detr", + # Evaluate the Nth head. 
+ eval_index=0, + data_preprocessor: OptConfigType = None, + init_cfg: OptMultiConfig = None, + ): + super(CoDETR, self).__init__(data_preprocessor=data_preprocessor, init_cfg=init_cfg) self.with_pos_coord = with_pos_coord self.use_lsj = use_lsj - assert eval_module in ['detr', 'one-stage', 'two-stage'] + assert eval_module in ["detr", "one-stage", "two-stage"] self.eval_module = eval_module self.backbone = MODELS.build(backbone) @@ -48,30 +47,23 @@ class CoDETR(BaseDetector): self.eval_index = eval_index head_idx = 0 if query_head is not None: - query_head.update(train_cfg=train_cfg[head_idx] if ( - train_cfg is not None and train_cfg[head_idx] is not None - ) else None) + query_head.update(train_cfg=train_cfg[head_idx] if (train_cfg is not None and train_cfg[head_idx] is not None) else None) query_head.update(test_cfg=test_cfg[head_idx]) self.query_head = MODELS.build(query_head) self.query_head.init_weights() head_idx += 1 if rpn_head is not None: - rpn_train_cfg = train_cfg[head_idx].rpn if ( - train_cfg is not None - and train_cfg[head_idx] is not None) else None + rpn_train_cfg = train_cfg[head_idx].rpn if (train_cfg is not None and train_cfg[head_idx] is not None) else None rpn_head_ = rpn_head.copy() - rpn_head_.update( - train_cfg=rpn_train_cfg, test_cfg=test_cfg[head_idx].rpn) + rpn_head_.update(train_cfg=rpn_train_cfg, test_cfg=test_cfg[head_idx].rpn) self.rpn_head = MODELS.build(rpn_head_) self.rpn_head.init_weights() self.roi_head = nn.ModuleList() for i in range(len(roi_head)): if roi_head[i]: - rcnn_train_cfg = train_cfg[i + head_idx].rcnn if ( - train_cfg - and train_cfg[i + head_idx] is not None) else None + rcnn_train_cfg = train_cfg[i + head_idx].rcnn if (train_cfg and train_cfg[i + head_idx] is not None) else None roi_head[i].update(train_cfg=rcnn_train_cfg) roi_head[i].update(test_cfg=test_cfg[i + head_idx].rcnn) self.roi_head.append(MODELS.build(roi_head[i])) @@ -81,12 +73,13 @@ class CoDETR(BaseDetector): for i in range(len(bbox_head)): if bbox_head[i]: bbox_head[i].update( - train_cfg=train_cfg[i + head_idx + len(self.roi_head)] if ( - train_cfg and train_cfg[i + head_idx + - len(self.roi_head)] is not None - ) else None) - bbox_head[i].update(test_cfg=test_cfg[i + head_idx + - len(self.roi_head)]) + train_cfg=( + train_cfg[i + head_idx + len(self.roi_head)] + if (train_cfg and train_cfg[i + head_idx + len(self.roi_head)] is not None) + else None + ) + ) + bbox_head[i].update(test_cfg=test_cfg[i + head_idx + len(self.roi_head)]) self.bbox_head.append(MODELS.build(bbox_head[i])) self.bbox_head[-1].init_weights() @@ -97,31 +90,29 @@ class CoDETR(BaseDetector): @property def with_rpn(self): """bool: whether the detector has RPN""" - return hasattr(self, 'rpn_head') and self.rpn_head is not None + return hasattr(self, "rpn_head") and self.rpn_head is not None @property def with_query_head(self): """bool: whether the detector has a RoI head""" - return hasattr(self, 'query_head') and self.query_head is not None + return hasattr(self, "query_head") and self.query_head is not None @property def with_roi_head(self): """bool: whether the detector has a RoI head""" - return hasattr(self, 'roi_head') and self.roi_head is not None and len( - self.roi_head) > 0 + return hasattr(self, "roi_head") and self.roi_head is not None and len(self.roi_head) > 0 @property def with_shared_head(self): """bool: whether the detector has a shared head in the RoI Head""" - return hasattr(self, 'roi_head') and self.roi_head[0].with_shared_head + return hasattr(self, "roi_head") and 
self.roi_head[0].with_shared_head @property def with_bbox(self): """bool: whether the detector has a bbox head""" - return ((hasattr(self, 'roi_head') and self.roi_head is not None - and len(self.roi_head) > 0) - or (hasattr(self, 'bbox_head') and self.bbox_head is not None - and len(self.bbox_head) > 0)) + return (hasattr(self, "roi_head") and self.roi_head is not None and len(self.roi_head) > 0) or ( + hasattr(self, "bbox_head") and self.bbox_head is not None and len(self.bbox_head) > 0 + ) def extract_feat(self, batch_inputs: Tensor) -> Tuple[Tensor]: """Extract features. @@ -138,19 +129,16 @@ class CoDETR(BaseDetector): x = self.neck(x) return x - def _forward(self, - batch_inputs: Tensor, - batch_data_samples: OptSampleList = None): + def _forward(self, batch_inputs: Tensor, batch_data_samples: OptSampleList = None): pass - def loss(self, batch_inputs: Tensor, - batch_data_samples: SampleList) -> Union[dict, list]: + def loss(self, batch_inputs: Tensor, batch_data_samples: SampleList) -> Union[dict, list]: batch_input_shape = batch_data_samples[0].batch_input_shape if self.use_lsj: for data_samples in batch_data_samples: img_metas = data_samples.metainfo input_img_h, input_img_w = batch_input_shape - img_metas['img_shape'] = [input_img_h, input_img_w] + img_metas["img_shape"] = [input_img_h, input_img_w] x = self.extract_feat(batch_inputs) @@ -159,7 +147,7 @@ class CoDETR(BaseDetector): def upd_loss(losses, idx, weight=1): new_losses = dict() for k, v in losses.items(): - new_k = '{}{}'.format(k, idx) + new_k = "{}{}".format(k, idx) if isinstance(v, list) or isinstance(v, tuple): new_losses[new_k] = [i * weight for i in v] else: @@ -173,69 +161,59 @@ class CoDETR(BaseDetector): # RPN forward and loss if self.with_rpn: - proposal_cfg = self.train_cfg[self.head_idx].get( - 'rpn_proposal', self.test_cfg[self.head_idx].rpn) + proposal_cfg = self.train_cfg[self.head_idx].get("rpn_proposal", self.test_cfg[self.head_idx].rpn) rpn_data_samples = copy.deepcopy(batch_data_samples) # set cat_id of gt_labels to 0 in RPN for data_sample in rpn_data_samples: - data_sample.gt_instances.labels = \ - torch.zeros_like(data_sample.gt_instances.labels) + data_sample.gt_instances.labels = torch.zeros_like(data_sample.gt_instances.labels) - rpn_losses, proposal_list = self.rpn_head.loss_and_predict( - x, rpn_data_samples, proposal_cfg=proposal_cfg) + rpn_losses, proposal_list = self.rpn_head.loss_and_predict(x, rpn_data_samples, proposal_cfg=proposal_cfg) # avoid get same name with roi_head loss keys = rpn_losses.keys() for key in list(keys): - if 'loss' in key and 'rpn' not in key: - rpn_losses[f'rpn_{key}'] = rpn_losses.pop(key) + if "loss" in key and "rpn" not in key: + rpn_losses[f"rpn_{key}"] = rpn_losses.pop(key) losses.update(rpn_losses) else: - assert batch_data_samples[0].get('proposals', None) is not None + assert batch_data_samples[0].get("proposals", None) is not None # use pre-defined proposals in InstanceData for the second stage # to extract ROI features. 
- proposal_list = [ - data_sample.proposals for data_sample in batch_data_samples - ] + proposal_list = [data_sample.proposals for data_sample in batch_data_samples] positive_coords = [] for i in range(len(self.roi_head)): - roi_losses = self.roi_head[i].loss(x, proposal_list, - batch_data_samples) + roi_losses = self.roi_head[i].loss(x, proposal_list, batch_data_samples) if self.with_pos_coord: - positive_coords.append(roi_losses.pop('pos_coords')) + positive_coords.append(roi_losses.pop("pos_coords")) else: - if 'pos_coords' in roi_losses.keys(): - roi_losses.pop('pos_coords') + if "pos_coords" in roi_losses.keys(): + roi_losses.pop("pos_coords") roi_losses = upd_loss(roi_losses, idx=i) losses.update(roi_losses) for i in range(len(self.bbox_head)): bbox_losses = self.bbox_head[i].loss(x, batch_data_samples) if self.with_pos_coord: - pos_coords = bbox_losses.pop('pos_coords') + pos_coords = bbox_losses.pop("pos_coords") positive_coords.append(pos_coords) else: - if 'pos_coords' in bbox_losses.keys(): - bbox_losses.pop('pos_coords') + if "pos_coords" in bbox_losses.keys(): + bbox_losses.pop("pos_coords") bbox_losses = upd_loss(bbox_losses, idx=i + len(self.roi_head)) losses.update(bbox_losses) if self.with_pos_coord and len(positive_coords) > 0: for i in range(len(positive_coords)): - bbox_losses = self.query_head.loss_aux(x, positive_coords[i], - i, batch_data_samples) + bbox_losses = self.query_head.loss_aux(x, positive_coords[i], i, batch_data_samples) bbox_losses = upd_loss(bbox_losses, idx=i) losses.update(bbox_losses) return losses - def predict(self, - batch_inputs: Tensor, - batch_data_samples: SampleList, - rescale: bool = True) -> SampleList: + def predict(self, batch_inputs: Tensor, batch_data_samples: SampleList, rescale: bool = True) -> SampleList: """Predict results from a batch of inputs and data samples with post- processing. @@ -259,62 +237,41 @@ class CoDETR(BaseDetector): - bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2). 
""" - assert self.eval_module in ['detr', 'one-stage', 'two-stage'] + assert self.eval_module in ["detr", "one-stage", "two-stage"] if self.use_lsj: for data_samples in batch_data_samples: img_metas = data_samples.metainfo - input_img_h, input_img_w = img_metas['batch_input_shape'] - img_metas['img_shape'] = [input_img_h, input_img_w] + input_img_h, input_img_w = img_metas["batch_input_shape"] + img_metas["img_shape"] = [input_img_h, input_img_w] img_feats = self.extract_feat(batch_inputs) - if self.with_bbox and self.eval_module == 'one-stage': - results_list = self.predict_bbox_head( - img_feats, batch_data_samples, rescale=rescale) - elif self.with_roi_head and self.eval_module == 'two-stage': - results_list = self.predict_roi_head( - img_feats, batch_data_samples, rescale=rescale) + if self.with_bbox and self.eval_module == "one-stage": + results_list = self.predict_bbox_head(img_feats, batch_data_samples, rescale=rescale) + elif self.with_roi_head and self.eval_module == "two-stage": + results_list = self.predict_roi_head(img_feats, batch_data_samples, rescale=rescale) else: - results_list = self.predict_query_head( - img_feats, batch_data_samples, rescale=rescale) + results_list = self.predict_query_head(img_feats, batch_data_samples, rescale=rescale) - batch_data_samples = self.add_pred_to_datasample( - batch_data_samples, results_list) + batch_data_samples = self.add_pred_to_datasample(batch_data_samples, results_list) return batch_data_samples - def predict_query_head(self, - mlvl_feats: Tuple[Tensor], - batch_data_samples: SampleList, - rescale: bool = True) -> InstanceList: - return self.query_head.predict( - mlvl_feats, batch_data_samples=batch_data_samples, rescale=rescale) - - def predict_roi_head(self, - mlvl_feats: Tuple[Tensor], - batch_data_samples: SampleList, - rescale: bool = True) -> InstanceList: - assert self.with_bbox, 'Bbox head must be implemented.' + def predict_query_head(self, mlvl_feats: Tuple[Tensor], batch_data_samples: SampleList, rescale: bool = True) -> InstanceList: + return self.query_head.predict(mlvl_feats, batch_data_samples=batch_data_samples, rescale=rescale) + + def predict_roi_head(self, mlvl_feats: Tuple[Tensor], batch_data_samples: SampleList, rescale: bool = True) -> InstanceList: + assert self.with_bbox, "Bbox head must be implemented." if self.with_query_head: - batch_img_metas = [ - data_samples.metainfo for data_samples in batch_data_samples - ] + batch_img_metas = [data_samples.metainfo for data_samples in batch_data_samples] results = self.query_head.forward(mlvl_feats, batch_img_metas) mlvl_feats = results[-1] - rpn_results_list = self.rpn_head.predict( - mlvl_feats, batch_data_samples, rescale=False) - return self.roi_head[self.eval_index].predict( - mlvl_feats, rpn_results_list, batch_data_samples, rescale=rescale) - - def predict_bbox_head(self, - mlvl_feats: Tuple[Tensor], - batch_data_samples: SampleList, - rescale: bool = True) -> InstanceList: - assert self.with_bbox, 'Bbox head must be implemented.' + rpn_results_list = self.rpn_head.predict(mlvl_feats, batch_data_samples, rescale=False) + return self.roi_head[self.eval_index].predict(mlvl_feats, rpn_results_list, batch_data_samples, rescale=rescale) + + def predict_bbox_head(self, mlvl_feats: Tuple[Tensor], batch_data_samples: SampleList, rescale: bool = True) -> InstanceList: + assert self.with_bbox, "Bbox head must be implemented." 
if self.with_query_head: - batch_img_metas = [ - data_samples.metainfo for data_samples in batch_data_samples - ] + batch_img_metas = [data_samples.metainfo for data_samples in batch_data_samples] results = self.query_head.forward(mlvl_feats, batch_img_metas) mlvl_feats = results[-1] - return self.bbox_head[self.eval_index].predict( - mlvl_feats, batch_data_samples, rescale=rescale) + return self.bbox_head[self.eval_index].predict(mlvl_feats, batch_data_samples, rescale=rescale) diff --git a/mmpose/configs/mmdet/CO-DETR/codetr/transformer.py b/mmpose/configs/mmdet/CO-DETR/codetr/transformer.py index 009f94a8bcc88c584b336bab272a48b4960202de..935c26323ceed840a456ac4444a916e191495081 100644 --- a/mmpose/configs/mmdet/CO-DETR/codetr/transformer.py +++ b/mmpose/configs/mmdet/CO-DETR/codetr/transformer.py @@ -4,17 +4,14 @@ import warnings import torch import torch.nn as nn from mmcv.cnn import build_norm_layer -from mmcv.cnn.bricks.transformer import (BaseTransformerLayer, - TransformerLayerSequence, - build_transformer_layer_sequence) +from mmcv.cnn.bricks.transformer import BaseTransformerLayer, TransformerLayerSequence, build_transformer_layer_sequence from mmcv.ops import MultiScaleDeformableAttention +from mmdet.models.layers.transformer import inverse_sigmoid +from mmdet.registry import MODELS from mmengine.model import BaseModule from mmengine.model.weight_init import xavier_init from torch.nn.init import normal_ -from mmdet.models.layers.transformer import inverse_sigmoid -from mmdet.registry import MODELS - try: from fairscale.nn.checkpoint import checkpoint_wrapper except Exception: @@ -55,8 +52,8 @@ class Transformer(BaseModule): def init_weights(self): # follow the official DETR to init parameters for m in self.modules(): - if hasattr(m, 'weight') and m.weight.dim() > 1: - xavier_init(m, distribution='uniform') + if hasattr(m, "weight") and m.weight.dim() > 1: + xavier_init(m, distribution="uniform") self._is_init = True def forward(self, x, mask, query_embed, pos_embed): @@ -86,24 +83,12 @@ class Transformer(BaseModule): # use `view` instead of `flatten` for dynamically exporting to ONNX x = x.view(bs, c, -1).permute(2, 0, 1) # [bs, c, h, w] -> [h*w, bs, c] pos_embed = pos_embed.view(bs, c, -1).permute(2, 0, 1) - query_embed = query_embed.unsqueeze(1).repeat( - 1, bs, 1) # [num_query, dim] -> [num_query, bs, dim] + query_embed = query_embed.unsqueeze(1).repeat(1, bs, 1) # [num_query, dim] -> [num_query, bs, dim] mask = mask.view(bs, -1) # [bs, h, w] -> [bs, h*w] - memory = self.encoder( - query=x, - key=None, - value=None, - query_pos=pos_embed, - query_key_padding_mask=mask) + memory = self.encoder(query=x, key=None, value=None, query_pos=pos_embed, query_key_padding_mask=mask) target = torch.zeros_like(query_embed) # out_dec: [num_layers, num_query, bs, dim] - out_dec = self.decoder( - query=target, - key=memory, - value=memory, - key_pos=pos_embed, - query_pos=query_embed, - key_padding_mask=mask) + out_dec = self.decoder(query=target, key=memory, value=memory, key_pos=pos_embed, query_pos=query_embed, key_padding_mask=mask) out_dec = out_dec.transpose(1, 2) memory = memory.permute(1, 2, 0).reshape(bs, c, h, w) return out_dec, memory @@ -124,13 +109,7 @@ class DeformableDetrTransformerDecoder(TransformerLayerSequence): super(DeformableDetrTransformerDecoder, self).__init__(*args, **kwargs) self.return_intermediate = return_intermediate - def forward(self, - query, - *args, - reference_points=None, - valid_ratios=None, - reg_branches=None, - **kwargs): + def forward(self, query, 
*args, reference_points=None, valid_ratios=None, reg_branches=None, **kwargs): """Forward function for `TransformerDecoder`. Args: @@ -158,30 +137,22 @@ class DeformableDetrTransformerDecoder(TransformerLayerSequence): intermediate_reference_points = [] for lid, layer in enumerate(self.layers): if reference_points.shape[-1] == 4: - reference_points_input = reference_points[:, :, None] * \ - torch.cat([valid_ratios, valid_ratios], -1)[:, None] + reference_points_input = reference_points[:, :, None] * torch.cat([valid_ratios, valid_ratios], -1)[:, None] else: assert reference_points.shape[-1] == 2 - reference_points_input = reference_points[:, :, None] * \ - valid_ratios[:, None] - output = layer( - output, - *args, - reference_points=reference_points_input, - **kwargs) + reference_points_input = reference_points[:, :, None] * valid_ratios[:, None] + output = layer(output, *args, reference_points=reference_points_input, **kwargs) output = output.permute(1, 0, 2) if reg_branches is not None: tmp = reg_branches[lid](output) if reference_points.shape[-1] == 4: - new_reference_points = tmp + inverse_sigmoid( - reference_points) + new_reference_points = tmp + inverse_sigmoid(reference_points) new_reference_points = new_reference_points.sigmoid() else: assert reference_points.shape[-1] == 2 new_reference_points = tmp - new_reference_points[..., :2] = tmp[ - ..., :2] + inverse_sigmoid(reference_points) + new_reference_points[..., :2] = tmp[..., :2] + inverse_sigmoid(reference_points) new_reference_points = new_reference_points.sigmoid() reference_points = new_reference_points.detach() @@ -191,8 +162,7 @@ class DeformableDetrTransformerDecoder(TransformerLayerSequence): intermediate_reference_points.append(reference_points) if self.return_intermediate: - return torch.stack(intermediate), torch.stack( - intermediate_reference_points) + return torch.stack(intermediate), torch.stack(intermediate_reference_points) return output, reference_points @@ -210,11 +180,7 @@ class DeformableDetrTransformer(Transformer): `as_two_stage` as True. Default: 300. """ - def __init__(self, - as_two_stage=False, - num_feature_levels=4, - two_stage_num_proposals=300, - **kwargs): + def __init__(self, as_two_stage=False, num_feature_levels=4, two_stage_num_proposals=300, **kwargs): super(DeformableDetrTransformer, self).__init__(**kwargs) self.as_two_stage = as_two_stage self.num_feature_levels = num_feature_levels @@ -224,14 +190,12 @@ class DeformableDetrTransformer(Transformer): def init_layers(self): """Initialize layers of the DeformableDetrTransformer.""" - self.level_embeds = nn.Parameter( - torch.Tensor(self.num_feature_levels, self.embed_dims)) + self.level_embeds = nn.Parameter(torch.Tensor(self.num_feature_levels, self.embed_dims)) if self.as_two_stage: self.enc_output = nn.Linear(self.embed_dims, self.embed_dims) self.enc_output_norm = nn.LayerNorm(self.embed_dims) - self.pos_trans = nn.Linear(self.embed_dims * 2, - self.embed_dims * 2) + self.pos_trans = nn.Linear(self.embed_dims * 2, self.embed_dims * 2) self.pos_trans_norm = nn.LayerNorm(self.embed_dims * 2) else: self.reference_points = nn.Linear(self.embed_dims, 2) @@ -245,11 +209,10 @@ class DeformableDetrTransformer(Transformer): if isinstance(m, MultiScaleDeformableAttention): m.init_weights() if not self.as_two_stage: - xavier_init(self.reference_points, distribution='uniform', bias=0.) 
+ xavier_init(self.reference_points, distribution="uniform", bias=0.0) normal_(self.level_embeds) - def gen_encoder_output_proposals(self, memory, memory_padding_mask, - spatial_shapes): + def gen_encoder_output_proposals(self, memory, memory_padding_mask, spatial_shapes): """Generate proposals from encoded memory. Args: @@ -278,40 +241,31 @@ class DeformableDetrTransformer(Transformer): proposals = [] _cur = 0 for lvl, (H, W) in enumerate(spatial_shapes): - mask_flatten_ = memory_padding_mask[:, _cur:(_cur + H * W)].view( - N, H, W, 1) + mask_flatten_ = memory_padding_mask[:, _cur : (_cur + H * W)].view(N, H, W, 1) valid_H = torch.sum(~mask_flatten_[:, :, 0, 0], 1) valid_W = torch.sum(~mask_flatten_[:, 0, :, 0], 1) grid_y, grid_x = torch.meshgrid( - torch.linspace( - 0, H - 1, H, dtype=torch.float32, device=memory.device), - torch.linspace( - 0, W - 1, W, dtype=torch.float32, device=memory.device)) + torch.linspace(0, H - 1, H, dtype=torch.float32, device=memory.device), + torch.linspace(0, W - 1, W, dtype=torch.float32, device=memory.device), + ) grid = torch.cat([grid_x.unsqueeze(-1), grid_y.unsqueeze(-1)], -1) - scale = torch.cat([valid_W.unsqueeze(-1), - valid_H.unsqueeze(-1)], 1).view(N, 1, 1, 2) + scale = torch.cat([valid_W.unsqueeze(-1), valid_H.unsqueeze(-1)], 1).view(N, 1, 1, 2) grid = (grid.unsqueeze(0).expand(N, -1, -1, -1) + 0.5) / scale wh = torch.ones_like(grid) * 0.05 * (2.0**lvl) proposal = torch.cat((grid, wh), -1).view(N, -1, 4) proposals.append(proposal) - _cur += (H * W) + _cur += H * W output_proposals = torch.cat(proposals, 1) - output_proposals_valid = ((output_proposals > 0.01) & - (output_proposals < 0.99)).all( - -1, keepdim=True) + output_proposals_valid = ((output_proposals > 0.01) & (output_proposals < 0.99)).all(-1, keepdim=True) output_proposals = torch.log(output_proposals / (1 - output_proposals)) - output_proposals = output_proposals.masked_fill( - memory_padding_mask.unsqueeze(-1), float('inf')) - output_proposals = output_proposals.masked_fill( - ~output_proposals_valid, float('inf')) + output_proposals = output_proposals.masked_fill(memory_padding_mask.unsqueeze(-1), float("inf")) + output_proposals = output_proposals.masked_fill(~output_proposals_valid, float("inf")) output_memory = memory - output_memory = output_memory.masked_fill( - memory_padding_mask.unsqueeze(-1), float(0)) - output_memory = output_memory.masked_fill(~output_proposals_valid, - float(0)) + output_memory = output_memory.masked_fill(memory_padding_mask.unsqueeze(-1), float(0)) + output_memory = output_memory.masked_fill(~output_proposals_valid, float(0)) output_memory = self.enc_output_norm(self.enc_output(output_memory)) return output_memory, output_proposals @@ -335,14 +289,11 @@ class DeformableDetrTransformer(Transformer): reference_points_list = [] for lvl, (H, W) in enumerate(spatial_shapes): ref_y, ref_x = torch.meshgrid( - torch.linspace( - 0.5, H - 0.5, H, dtype=torch.float32, device=device), - torch.linspace( - 0.5, W - 0.5, W, dtype=torch.float32, device=device)) - ref_y = ref_y.reshape(-1)[None] / ( - valid_ratios[:, None, lvl, 1] * H) - ref_x = ref_x.reshape(-1)[None] / ( - valid_ratios[:, None, lvl, 0] * W) + torch.linspace(0.5, H - 0.5, H, dtype=torch.float32, device=device), + torch.linspace(0.5, W - 0.5, W, dtype=torch.float32, device=device), + ) + ref_y = ref_y.reshape(-1)[None] / (valid_ratios[:, None, lvl, 1] * H) + ref_x = ref_x.reshape(-1)[None] / (valid_ratios[:, None, lvl, 0] * W) ref = torch.stack((ref_x, ref_y), -1) reference_points_list.append(ref) 
reference_points = torch.cat(reference_points_list, 1) @@ -359,32 +310,20 @@ class DeformableDetrTransformer(Transformer): valid_ratio = torch.stack([valid_ratio_w, valid_ratio_h], -1) return valid_ratio - def get_proposal_pos_embed(self, - proposals, - num_pos_feats=128, - temperature=10000): + def get_proposal_pos_embed(self, proposals, num_pos_feats=128, temperature=10000): """Get the position embedding of proposal.""" scale = 2 * math.pi - dim_t = torch.arange( - num_pos_feats, dtype=torch.float32, device=proposals.device) - dim_t = temperature**(2 * (dim_t // 2) / num_pos_feats) + dim_t = torch.arange(num_pos_feats, dtype=torch.float32, device=proposals.device) + dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats) # N, L, 4 proposals = proposals.sigmoid() * scale # N, L, 4, 128 pos = proposals[:, :, :, None] / dim_t # N, L, 4, 64, 2 - pos = torch.stack((pos[:, :, :, 0::2].sin(), pos[:, :, :, 1::2].cos()), - dim=4).flatten(2) + pos = torch.stack((pos[:, :, :, 0::2].sin(), pos[:, :, :, 1::2].cos()), dim=4).flatten(2) return pos - def forward(self, - mlvl_feats, - mlvl_masks, - query_embed, - mlvl_pos_embeds, - reg_branches=None, - cls_branches=None, - **kwargs): + def forward(self, mlvl_feats, mlvl_masks, query_embed, mlvl_pos_embeds, reg_branches=None, cls_branches=None, **kwargs): """Forward function for `Transformer`. Args: @@ -439,8 +378,7 @@ class DeformableDetrTransformer(Transformer): mask_flatten = [] lvl_pos_embed_flatten = [] spatial_shapes = [] - for lvl, (feat, mask, pos_embed) in enumerate( - zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): + for lvl, (feat, mask, pos_embed) in enumerate(zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): bs, c, h, w = feat.shape spatial_shape = (h, w) spatial_shapes.append(spatial_shape) @@ -454,21 +392,14 @@ class DeformableDetrTransformer(Transformer): feat_flatten = torch.cat(feat_flatten, 1) mask_flatten = torch.cat(mask_flatten, 1) lvl_pos_embed_flatten = torch.cat(lvl_pos_embed_flatten, 1) - spatial_shapes = torch.as_tensor( - spatial_shapes, dtype=torch.long, device=feat_flatten.device) - level_start_index = torch.cat((spatial_shapes.new_zeros( - (1, )), spatial_shapes.prod(1).cumsum(0)[:-1])) - valid_ratios = torch.stack( - [self.get_valid_ratio(m) for m in mlvl_masks], 1) - - reference_points = \ - self.get_reference_points(spatial_shapes, - valid_ratios, - device=feat.device) + spatial_shapes = torch.as_tensor(spatial_shapes, dtype=torch.long, device=feat_flatten.device) + level_start_index = torch.cat((spatial_shapes.new_zeros((1,)), spatial_shapes.prod(1).cumsum(0)[:-1])) + valid_ratios = torch.stack([self.get_valid_ratio(m) for m in mlvl_masks], 1) + + reference_points = self.get_reference_points(spatial_shapes, valid_ratios, device=feat.device) feat_flatten = feat_flatten.permute(1, 0, 2) # (H*W, bs, embed_dims) - lvl_pos_embed_flatten = lvl_pos_embed_flatten.permute( - 1, 0, 2) # (H*W, bs, embed_dims) + lvl_pos_embed_flatten = lvl_pos_embed_flatten.permute(1, 0, 2) # (H*W, bs, embed_dims) memory = self.encoder( query=feat_flatten, key=None, @@ -479,19 +410,15 @@ class DeformableDetrTransformer(Transformer): reference_points=reference_points, level_start_index=level_start_index, valid_ratios=valid_ratios, - **kwargs) + **kwargs, + ) memory = memory.permute(1, 0, 2) bs, _, c = memory.shape if self.as_two_stage: - output_memory, output_proposals = \ - self.gen_encoder_output_proposals( - memory, mask_flatten, spatial_shapes) - enc_outputs_class = cls_branches[self.decoder.num_layers]( - output_memory) - enc_outputs_coord_unact = 
\ - reg_branches[ - self.decoder.num_layers](output_memory) + output_proposals + output_memory, output_proposals = self.gen_encoder_output_proposals(memory, mask_flatten, spatial_shapes) + enc_outputs_class = cls_branches[self.decoder.num_layers](output_memory) + enc_outputs_coord_unact = reg_branches[self.decoder.num_layers](output_memory) + output_proposals topk = self.two_stage_num_proposals # We only use the first channel in enc_outputs_class as foreground, @@ -502,16 +429,12 @@ class DeformableDetrTransformer(Transformer): # num_classes (similar convention in RPN). # See https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/dense_heads/deformable_detr_head.py#L241 # noqa # This follows the official implementation of Deformable DETR. - topk_proposals = torch.topk( - enc_outputs_class[..., 0], topk, dim=1)[1] - topk_coords_unact = torch.gather( - enc_outputs_coord_unact, 1, - topk_proposals.unsqueeze(-1).repeat(1, 1, 4)) + topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1] + topk_coords_unact = torch.gather(enc_outputs_coord_unact, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4)) topk_coords_unact = topk_coords_unact.detach() reference_points = topk_coords_unact.sigmoid() init_reference_out = reference_points - pos_trans_out = self.pos_trans_norm( - self.pos_trans(self.get_proposal_pos_embed(topk_coords_unact))) + pos_trans_out = self.pos_trans_norm(self.pos_trans(self.get_proposal_pos_embed(topk_coords_unact))) query_pos, query = torch.split(pos_trans_out, c, dim=2) else: query_pos, query = torch.split(query_embed, c, dim=1) @@ -535,15 +458,13 @@ class DeformableDetrTransformer(Transformer): level_start_index=level_start_index, valid_ratios=valid_ratios, reg_branches=reg_branches, - **kwargs) + **kwargs, + ) inter_references_out = inter_references if self.as_two_stage: - return inter_states, init_reference_out,\ - inter_references_out, enc_outputs_class,\ - enc_outputs_coord_unact - return inter_states, init_reference_out, \ - inter_references_out, None, None + return inter_states, init_reference_out, inter_references_out, enc_outputs_class, enc_outputs_coord_unact + return inter_states, init_reference_out, inter_references_out, None, None @MODELS.register_module() @@ -556,24 +477,13 @@ class CoDeformableDetrTransformerDecoder(TransformerLayerSequence): `LN`. """ - def __init__(self, - *args, - return_intermediate=False, - look_forward_twice=False, - **kwargs): + def __init__(self, *args, return_intermediate=False, look_forward_twice=False, **kwargs): - super(CoDeformableDetrTransformerDecoder, - self).__init__(*args, **kwargs) + super(CoDeformableDetrTransformerDecoder, self).__init__(*args, **kwargs) self.return_intermediate = return_intermediate self.look_forward_twice = look_forward_twice - def forward(self, - query, - *args, - reference_points=None, - valid_ratios=None, - reg_branches=None, - **kwargs): + def forward(self, query, *args, reference_points=None, valid_ratios=None, reg_branches=None, **kwargs): """Forward function for `TransformerDecoder`. 
Args: @@ -601,42 +511,31 @@ class CoDeformableDetrTransformerDecoder(TransformerLayerSequence): intermediate_reference_points = [] for lid, layer in enumerate(self.layers): if reference_points.shape[-1] == 4: - reference_points_input = reference_points[:, :, None] * \ - torch.cat([valid_ratios, valid_ratios], -1)[:, None] + reference_points_input = reference_points[:, :, None] * torch.cat([valid_ratios, valid_ratios], -1)[:, None] else: assert reference_points.shape[-1] == 2 - reference_points_input = reference_points[:, :, None] * \ - valid_ratios[:, None] - output = layer( - output, - *args, - reference_points=reference_points_input, - **kwargs) + reference_points_input = reference_points[:, :, None] * valid_ratios[:, None] + output = layer(output, *args, reference_points=reference_points_input, **kwargs) output = output.permute(1, 0, 2) if reg_branches is not None: tmp = reg_branches[lid](output) if reference_points.shape[-1] == 4: - new_reference_points = tmp + inverse_sigmoid( - reference_points) + new_reference_points = tmp + inverse_sigmoid(reference_points) new_reference_points = new_reference_points.sigmoid() else: assert reference_points.shape[-1] == 2 new_reference_points = tmp - new_reference_points[..., :2] = tmp[ - ..., :2] + inverse_sigmoid(reference_points) + new_reference_points[..., :2] = tmp[..., :2] + inverse_sigmoid(reference_points) new_reference_points = new_reference_points.sigmoid() reference_points = new_reference_points.detach() output = output.permute(1, 0, 2) if self.return_intermediate: intermediate.append(output) - intermediate_reference_points.append( - new_reference_points if self. - look_forward_twice else reference_points) + intermediate_reference_points.append(new_reference_points if self.look_forward_twice else reference_points) if self.return_intermediate: - return torch.stack(intermediate), torch.stack( - intermediate_reference_points) + return torch.stack(intermediate), torch.stack(intermediate_reference_points) return output, reference_points @@ -644,12 +543,7 @@ class CoDeformableDetrTransformerDecoder(TransformerLayerSequence): @MODELS.register_module() class CoDeformableDetrTransformer(DeformableDetrTransformer): - def __init__(self, - mixed_selection=True, - with_pos_coord=True, - with_coord_feat=True, - num_co_heads=1, - **kwargs): + def __init__(self, mixed_selection=True, with_pos_coord=True, with_coord_feat=True, num_co_heads=1, **kwargs): self.mixed_selection = mixed_selection self.with_pos_coord = with_pos_coord self.with_coord_feat = with_coord_feat @@ -666,52 +560,44 @@ class CoDeformableDetrTransformer(DeformableDetrTransformer): # we keep this bug for reproducing our results with ResNet-50. # You can fix this bug when reproducing results with # swin transformer. 
- self.head_pos_embed = nn.Embedding(self.num_co_heads, 1, 1, - self.embed_dims) + self.head_pos_embed = nn.Embedding(self.num_co_heads, 1, 1, self.embed_dims) self.aux_pos_trans = nn.ModuleList() self.aux_pos_trans_norm = nn.ModuleList() self.pos_feats_trans = nn.ModuleList() self.pos_feats_norm = nn.ModuleList() for i in range(self.num_co_heads): - self.aux_pos_trans.append( - nn.Linear(self.embed_dims * 2, self.embed_dims * 2)) - self.aux_pos_trans_norm.append( - nn.LayerNorm(self.embed_dims * 2)) + self.aux_pos_trans.append(nn.Linear(self.embed_dims * 2, self.embed_dims * 2)) + self.aux_pos_trans_norm.append(nn.LayerNorm(self.embed_dims * 2)) if self.with_coord_feat: - self.pos_feats_trans.append( - nn.Linear(self.embed_dims, self.embed_dims)) - self.pos_feats_norm.append( - nn.LayerNorm(self.embed_dims)) - - def get_proposal_pos_embed(self, - proposals, - num_pos_feats=128, - temperature=10000): + self.pos_feats_trans.append(nn.Linear(self.embed_dims, self.embed_dims)) + self.pos_feats_norm.append(nn.LayerNorm(self.embed_dims)) + + def get_proposal_pos_embed(self, proposals, num_pos_feats=128, temperature=10000): """Get the position embedding of proposal.""" num_pos_feats = self.embed_dims // 2 scale = 2 * math.pi - dim_t = torch.arange( - num_pos_feats, dtype=torch.float32, device=proposals.device) - dim_t = temperature**(2 * (dim_t // 2) / num_pos_feats) + dim_t = torch.arange(num_pos_feats, dtype=torch.float32, device=proposals.device) + dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats) # N, L, 4 proposals = proposals.sigmoid() * scale # N, L, 4, 128 pos = proposals[:, :, :, None] / dim_t # N, L, 4, 64, 2 - pos = torch.stack((pos[:, :, :, 0::2].sin(), pos[:, :, :, 1::2].cos()), - dim=4).flatten(2) + pos = torch.stack((pos[:, :, :, 0::2].sin(), pos[:, :, :, 1::2].cos()), dim=4).flatten(2) return pos - def forward(self, - mlvl_feats, - mlvl_masks, - query_embed, - mlvl_pos_embeds, - reg_branches=None, - cls_branches=None, - return_encoder_output=False, - attn_masks=None, - **kwargs): + def forward( + self, + mlvl_feats, + mlvl_masks, + query_embed, + mlvl_pos_embeds, + reg_branches=None, + cls_branches=None, + return_encoder_output=False, + attn_masks=None, + **kwargs, + ): """Forward function for `Transformer`. 
Args: @@ -766,8 +652,7 @@ class CoDeformableDetrTransformer(DeformableDetrTransformer): mask_flatten = [] lvl_pos_embed_flatten = [] spatial_shapes = [] - for lvl, (feat, mask, pos_embed) in enumerate( - zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): + for lvl, (feat, mask, pos_embed) in enumerate(zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): bs, c, h, w = feat.shape spatial_shape = (h, w) spatial_shapes.append(spatial_shape) @@ -781,21 +666,14 @@ class CoDeformableDetrTransformer(DeformableDetrTransformer): feat_flatten = torch.cat(feat_flatten, 1) mask_flatten = torch.cat(mask_flatten, 1) lvl_pos_embed_flatten = torch.cat(lvl_pos_embed_flatten, 1) - spatial_shapes = torch.as_tensor( - spatial_shapes, dtype=torch.long, device=feat_flatten.device) - level_start_index = torch.cat((spatial_shapes.new_zeros( - (1, )), spatial_shapes.prod(1).cumsum(0)[:-1])) - valid_ratios = torch.stack( - [self.get_valid_ratio(m) for m in mlvl_masks], 1) - - reference_points = \ - self.get_reference_points(spatial_shapes, - valid_ratios, - device=feat.device) + spatial_shapes = torch.as_tensor(spatial_shapes, dtype=torch.long, device=feat_flatten.device) + level_start_index = torch.cat((spatial_shapes.new_zeros((1,)), spatial_shapes.prod(1).cumsum(0)[:-1])) + valid_ratios = torch.stack([self.get_valid_ratio(m) for m in mlvl_masks], 1) + + reference_points = self.get_reference_points(spatial_shapes, valid_ratios, device=feat.device) feat_flatten = feat_flatten.permute(1, 0, 2) # (H*W, bs, embed_dims) - lvl_pos_embed_flatten = lvl_pos_embed_flatten.permute( - 1, 0, 2) # (H*W, bs, embed_dims) + lvl_pos_embed_flatten = lvl_pos_embed_flatten.permute(1, 0, 2) # (H*W, bs, embed_dims) memory = self.encoder( query=feat_flatten, key=None, @@ -806,32 +684,24 @@ class CoDeformableDetrTransformer(DeformableDetrTransformer): reference_points=reference_points, level_start_index=level_start_index, valid_ratios=valid_ratios, - **kwargs) + **kwargs, + ) memory = memory.permute(1, 0, 2) bs, _, c = memory.shape if self.as_two_stage: - output_memory, output_proposals = \ - self.gen_encoder_output_proposals( - memory, mask_flatten, spatial_shapes) - enc_outputs_class = cls_branches[self.decoder.num_layers]( - output_memory) - enc_outputs_coord_unact = \ - reg_branches[ - self.decoder.num_layers](output_memory) + output_proposals + output_memory, output_proposals = self.gen_encoder_output_proposals(memory, mask_flatten, spatial_shapes) + enc_outputs_class = cls_branches[self.decoder.num_layers](output_memory) + enc_outputs_coord_unact = reg_branches[self.decoder.num_layers](output_memory) + output_proposals topk = self.two_stage_num_proposals topk = query_embed.shape[0] - topk_proposals = torch.topk( - enc_outputs_class[..., 0], topk, dim=1)[1] - topk_coords_unact = torch.gather( - enc_outputs_coord_unact, 1, - topk_proposals.unsqueeze(-1).repeat(1, 1, 4)) + topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1] + topk_coords_unact = torch.gather(enc_outputs_coord_unact, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4)) topk_coords_unact = topk_coords_unact.detach() reference_points = topk_coords_unact.sigmoid() init_reference_out = reference_points - pos_trans_out = self.pos_trans_norm( - self.pos_trans(self.get_proposal_pos_embed(topk_coords_unact))) + pos_trans_out = self.pos_trans_norm(self.pos_trans(self.get_proposal_pos_embed(topk_coords_unact))) if not self.mixed_selection: query_pos, query = torch.split(pos_trans_out, c, dim=2) @@ -862,41 +732,37 @@ class 
CoDeformableDetrTransformer(DeformableDetrTransformer): valid_ratios=valid_ratios, reg_branches=reg_branches, attn_masks=attn_masks, - **kwargs) + **kwargs, + ) inter_references_out = inter_references if self.as_two_stage: if return_encoder_output: - return inter_states, init_reference_out,\ - inter_references_out, enc_outputs_class,\ - enc_outputs_coord_unact, memory - return inter_states, init_reference_out,\ - inter_references_out, enc_outputs_class,\ - enc_outputs_coord_unact + return inter_states, init_reference_out, inter_references_out, enc_outputs_class, enc_outputs_coord_unact, memory + return inter_states, init_reference_out, inter_references_out, enc_outputs_class, enc_outputs_coord_unact if return_encoder_output: - return inter_states, init_reference_out, \ - inter_references_out, None, None, memory - return inter_states, init_reference_out, \ - inter_references_out, None, None - - def forward_aux(self, - mlvl_feats, - mlvl_masks, - query_embed, - mlvl_pos_embeds, - pos_anchors, - pos_feats=None, - reg_branches=None, - cls_branches=None, - return_encoder_output=False, - attn_masks=None, - head_idx=0, - **kwargs): + return inter_states, init_reference_out, inter_references_out, None, None, memory + return inter_states, init_reference_out, inter_references_out, None, None + + def forward_aux( + self, + mlvl_feats, + mlvl_masks, + query_embed, + mlvl_pos_embeds, + pos_anchors, + pos_feats=None, + reg_branches=None, + cls_branches=None, + return_encoder_output=False, + attn_masks=None, + head_idx=0, + **kwargs, + ): feat_flatten = [] mask_flatten = [] spatial_shapes = [] - for lvl, (feat, mask, pos_embed) in enumerate( - zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): + for lvl, (feat, mask, pos_embed) in enumerate(zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): bs, c, h, w = feat.shape spatial_shape = (h, w) spatial_shapes.append(spatial_shape) @@ -906,12 +772,9 @@ class CoDeformableDetrTransformer(DeformableDetrTransformer): mask_flatten.append(mask) feat_flatten = torch.cat(feat_flatten, 1) mask_flatten = torch.cat(mask_flatten, 1) - spatial_shapes = torch.as_tensor( - spatial_shapes, dtype=torch.long, device=feat_flatten.device) - level_start_index = torch.cat((spatial_shapes.new_zeros( - (1, )), spatial_shapes.prod(1).cumsum(0)[:-1])) - valid_ratios = torch.stack( - [self.get_valid_ratio(m) for m in mlvl_masks], 1) + spatial_shapes = torch.as_tensor(spatial_shapes, dtype=torch.long, device=feat_flatten.device) + level_start_index = torch.cat((spatial_shapes.new_zeros((1,)), spatial_shapes.prod(1).cumsum(0)[:-1])) + valid_ratios = torch.stack([self.get_valid_ratio(m) for m in mlvl_masks], 1) feat_flatten = feat_flatten.permute(1, 0, 2) # (H*W, bs, embed_dims) @@ -923,13 +786,10 @@ class CoDeformableDetrTransformer(DeformableDetrTransformer): reference_points = pos_anchors init_reference_out = reference_points if self.num_co_heads > 0: - pos_trans_out = self.aux_pos_trans_norm[head_idx]( - self.aux_pos_trans[head_idx]( - self.get_proposal_pos_embed(topk_coords_unact))) + pos_trans_out = self.aux_pos_trans_norm[head_idx](self.aux_pos_trans[head_idx](self.get_proposal_pos_embed(topk_coords_unact))) query_pos, query = torch.split(pos_trans_out, c, dim=2) if self.with_coord_feat: - query = query + self.pos_feats_norm[head_idx]( - self.pos_feats_trans[head_idx](pos_feats)) + query = query + self.pos_feats_norm[head_idx](self.pos_feats_trans[head_idx](pos_feats)) query_pos = query_pos + self.head_pos_embed.weight[head_idx] # decoder @@ -948,16 +808,15 @@ class 
CoDeformableDetrTransformer(DeformableDetrTransformer): valid_ratios=valid_ratios, reg_branches=reg_branches, attn_masks=attn_masks, - **kwargs) + **kwargs, + ) inter_references_out = inter_references - return inter_states, init_reference_out, \ - inter_references_out + return inter_states, init_reference_out, inter_references_out def build_MLP(input_dim, hidden_dim, output_dim, num_layers): - assert num_layers > 1, \ - f'num_layers should be greater than 1 but got {num_layers}' + assert num_layers > 1, f"num_layers should be greater than 1 but got {num_layers}" h = [hidden_dim] * (num_layers - 1) layers = list() for n, k in zip([input_dim] + h[:-1], h): @@ -977,8 +836,7 @@ class DinoTransformerDecoder(DeformableDetrTransformerDecoder): self._init_layers() def _init_layers(self): - self.ref_point_head = build_MLP(self.embed_dims * 2, self.embed_dims, - self.embed_dims, 2) + self.ref_point_head = build_MLP(self.embed_dims * 2, self.embed_dims, self.embed_dims, 2) self.norm = nn.LayerNorm(self.embed_dims) @staticmethod @@ -986,76 +844,52 @@ class DinoTransformerDecoder(DeformableDetrTransformerDecoder): # n_query, bs, _ = pos_tensor.size() # sineembed_tensor = torch.zeros(n_query, bs, 256) scale = 2 * math.pi - dim_t = torch.arange( - pos_feat, dtype=torch.float32, device=pos_tensor.device) - dim_t = 10000**(2 * (dim_t // 2) / pos_feat) + dim_t = torch.arange(pos_feat, dtype=torch.float32, device=pos_tensor.device) + dim_t = 10000 ** (2 * (dim_t // 2) / pos_feat) x_embed = pos_tensor[:, :, 0] * scale y_embed = pos_tensor[:, :, 1] * scale pos_x = x_embed[:, :, None] / dim_t pos_y = y_embed[:, :, None] / dim_t - pos_x = torch.stack((pos_x[:, :, 0::2].sin(), pos_x[:, :, 1::2].cos()), - dim=3).flatten(2) - pos_y = torch.stack((pos_y[:, :, 0::2].sin(), pos_y[:, :, 1::2].cos()), - dim=3).flatten(2) + pos_x = torch.stack((pos_x[:, :, 0::2].sin(), pos_x[:, :, 1::2].cos()), dim=3).flatten(2) + pos_y = torch.stack((pos_y[:, :, 0::2].sin(), pos_y[:, :, 1::2].cos()), dim=3).flatten(2) if pos_tensor.size(-1) == 2: pos = torch.cat((pos_y, pos_x), dim=2) elif pos_tensor.size(-1) == 4: w_embed = pos_tensor[:, :, 2] * scale pos_w = w_embed[:, :, None] / dim_t - pos_w = torch.stack( - (pos_w[:, :, 0::2].sin(), pos_w[:, :, 1::2].cos()), - dim=3).flatten(2) + pos_w = torch.stack((pos_w[:, :, 0::2].sin(), pos_w[:, :, 1::2].cos()), dim=3).flatten(2) h_embed = pos_tensor[:, :, 3] * scale pos_h = h_embed[:, :, None] / dim_t - pos_h = torch.stack( - (pos_h[:, :, 0::2].sin(), pos_h[:, :, 1::2].cos()), - dim=3).flatten(2) + pos_h = torch.stack((pos_h[:, :, 0::2].sin(), pos_h[:, :, 1::2].cos()), dim=3).flatten(2) pos = torch.cat((pos_y, pos_x, pos_w, pos_h), dim=2) else: - raise ValueError('Unknown pos_tensor shape(-1):{}'.format( - pos_tensor.size(-1))) + raise ValueError("Unknown pos_tensor shape(-1):{}".format(pos_tensor.size(-1))) return pos - def forward(self, - query, - *args, - reference_points=None, - valid_ratios=None, - reg_branches=None, - **kwargs): + def forward(self, query, *args, reference_points=None, valid_ratios=None, reg_branches=None, **kwargs): output = query intermediate = [] intermediate_reference_points = [reference_points] for lid, layer in enumerate(self.layers): if reference_points.shape[-1] == 4: - reference_points_input = \ - reference_points[:, :, None] * torch.cat( - [valid_ratios, valid_ratios], -1)[:, None] + reference_points_input = reference_points[:, :, None] * torch.cat([valid_ratios, valid_ratios], -1)[:, None] else: assert reference_points.shape[-1] == 2 - reference_points_input = 
\ - reference_points[:, :, None] * valid_ratios[:, None] + reference_points_input = reference_points[:, :, None] * valid_ratios[:, None] - query_sine_embed = self.gen_sineembed_for_position( - reference_points_input[:, :, 0, :], self.embed_dims // 2) + query_sine_embed = self.gen_sineembed_for_position(reference_points_input[:, :, 0, :], self.embed_dims // 2) query_pos = self.ref_point_head(query_sine_embed) query_pos = query_pos.permute(1, 0, 2) - output = layer( - output, - *args, - query_pos=query_pos, - reference_points=reference_points_input, - **kwargs) + output = layer(output, *args, query_pos=query_pos, reference_points=reference_points_input, **kwargs) output = output.permute(1, 0, 2) if reg_branches is not None: tmp = reg_branches[lid](output) assert reference_points.shape[-1] == 4 - new_reference_points = tmp + inverse_sigmoid( - reference_points, eps=1e-3) + new_reference_points = tmp + inverse_sigmoid(reference_points, eps=1e-3) new_reference_points = new_reference_points.sigmoid() reference_points = new_reference_points.detach() @@ -1067,8 +901,7 @@ class DinoTransformerDecoder(DeformableDetrTransformerDecoder): # in the DeformDETR, reference_points was appended. if self.return_intermediate: - return torch.stack(intermediate), torch.stack( - intermediate_reference_points) + return torch.stack(intermediate), torch.stack(intermediate_reference_points) return output, reference_points @@ -1081,12 +914,10 @@ class CoDinoTransformer(CoDeformableDetrTransformer): def init_layers(self): """Initialize layers of the DinoTransformer.""" - self.level_embeds = nn.Parameter( - torch.Tensor(self.num_feature_levels, self.embed_dims)) + self.level_embeds = nn.Parameter(torch.Tensor(self.num_feature_levels, self.embed_dims)) self.enc_output = nn.Linear(self.embed_dims, self.embed_dims) self.enc_output_norm = nn.LayerNorm(self.embed_dims) - self.query_embed = nn.Embedding(self.two_stage_num_proposals, - self.embed_dims) + self.query_embed = nn.Embedding(self.two_stage_num_proposals, self.embed_dims) def _init_layers(self): if self.with_pos_coord: @@ -1096,40 +927,36 @@ class CoDinoTransformer(CoDeformableDetrTransformer): self.pos_feats_trans = nn.ModuleList() self.pos_feats_norm = nn.ModuleList() for i in range(self.num_co_heads): - self.aux_pos_trans.append( - nn.Linear(self.embed_dims * 2, self.embed_dims)) - self.aux_pos_trans_norm.append( - nn.LayerNorm(self.embed_dims)) + self.aux_pos_trans.append(nn.Linear(self.embed_dims * 2, self.embed_dims)) + self.aux_pos_trans_norm.append(nn.LayerNorm(self.embed_dims)) if self.with_coord_feat: - self.pos_feats_trans.append( - nn.Linear(self.embed_dims, self.embed_dims)) - self.pos_feats_norm.append( - nn.LayerNorm(self.embed_dims)) + self.pos_feats_trans.append(nn.Linear(self.embed_dims, self.embed_dims)) + self.pos_feats_norm.append(nn.LayerNorm(self.embed_dims)) def init_weights(self): super().init_weights() nn.init.normal_(self.query_embed.weight.data) - def forward(self, - mlvl_feats, - mlvl_masks, - query_embed, - mlvl_pos_embeds, - dn_label_query, - dn_bbox_query, - attn_mask, - reg_branches=None, - cls_branches=None, - **kwargs): - assert self.as_two_stage and query_embed is None, \ - 'as_two_stage must be True for DINO' + def forward( + self, + mlvl_feats, + mlvl_masks, + query_embed, + mlvl_pos_embeds, + dn_label_query, + dn_bbox_query, + attn_mask, + reg_branches=None, + cls_branches=None, + **kwargs, + ): + assert self.as_two_stage and query_embed is None, "as_two_stage must be True for DINO" feat_flatten = [] mask_flatten = [] 
lvl_pos_embed_flatten = [] spatial_shapes = [] - for lvl, (feat, mask, pos_embed) in enumerate( - zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): + for lvl, (feat, mask, pos_embed) in enumerate(zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): bs, c, h, w = feat.shape spatial_shape = (h, w) spatial_shapes.append(spatial_shape) @@ -1143,19 +970,14 @@ class CoDinoTransformer(CoDeformableDetrTransformer): feat_flatten = torch.cat(feat_flatten, 1) mask_flatten = torch.cat(mask_flatten, 1) lvl_pos_embed_flatten = torch.cat(lvl_pos_embed_flatten, 1) - spatial_shapes = torch.as_tensor( - spatial_shapes, dtype=torch.long, device=feat_flatten.device) - level_start_index = torch.cat((spatial_shapes.new_zeros( - (1, )), spatial_shapes.prod(1).cumsum(0)[:-1])) - valid_ratios = torch.stack( - [self.get_valid_ratio(m) for m in mlvl_masks], 1) + spatial_shapes = torch.as_tensor(spatial_shapes, dtype=torch.long, device=feat_flatten.device) + level_start_index = torch.cat((spatial_shapes.new_zeros((1,)), spatial_shapes.prod(1).cumsum(0)[:-1])) + valid_ratios = torch.stack([self.get_valid_ratio(m) for m in mlvl_masks], 1) - reference_points = self.get_reference_points( - spatial_shapes, valid_ratios, device=feat.device) + reference_points = self.get_reference_points(spatial_shapes, valid_ratios, device=feat.device) feat_flatten = feat_flatten.permute(1, 0, 2) # (H*W, bs, embed_dims) - lvl_pos_embed_flatten = lvl_pos_embed_flatten.permute( - 1, 0, 2) # (H*W, bs, embed_dims) + lvl_pos_embed_flatten = lvl_pos_embed_flatten.permute(1, 0, 2) # (H*W, bs, embed_dims) memory = self.encoder( query=feat_flatten, key=None, @@ -1166,40 +988,32 @@ class CoDinoTransformer(CoDeformableDetrTransformer): reference_points=reference_points, level_start_index=level_start_index, valid_ratios=valid_ratios, - **kwargs) + **kwargs, + ) memory = memory.permute(1, 0, 2) bs, _, c = memory.shape - output_memory, output_proposals = self.gen_encoder_output_proposals( - memory, mask_flatten, spatial_shapes) - enc_outputs_class = cls_branches[self.decoder.num_layers]( - output_memory) - enc_outputs_coord_unact = reg_branches[self.decoder.num_layers]( - output_memory) + output_proposals + output_memory, output_proposals = self.gen_encoder_output_proposals(memory, mask_flatten, spatial_shapes) + enc_outputs_class = cls_branches[self.decoder.num_layers](output_memory) + enc_outputs_coord_unact = reg_branches[self.decoder.num_layers](output_memory) + output_proposals cls_out_features = cls_branches[self.decoder.num_layers].out_features topk = self.two_stage_num_proposals # NOTE In DeformDETR, enc_outputs_class[..., 0] is used for topk topk_indices = torch.topk(enc_outputs_class.max(-1)[0], topk, dim=1)[1] - topk_score = torch.gather( - enc_outputs_class, 1, - topk_indices.unsqueeze(-1).repeat(1, 1, cls_out_features)) - topk_coords_unact = torch.gather( - enc_outputs_coord_unact, 1, - topk_indices.unsqueeze(-1).repeat(1, 1, 4)) + topk_score = torch.gather(enc_outputs_class, 1, topk_indices.unsqueeze(-1).repeat(1, 1, cls_out_features)) + topk_coords_unact = torch.gather(enc_outputs_coord_unact, 1, topk_indices.unsqueeze(-1).repeat(1, 1, 4)) topk_anchor = topk_coords_unact.sigmoid() topk_coords_unact = topk_coords_unact.detach() - query = self.query_embed.weight[:, None, :].repeat(1, bs, - 1).transpose(0, 1) + query = self.query_embed.weight[:, None, :].repeat(1, bs, 1).transpose(0, 1) # NOTE the query_embed here is not spatial query as in DETR. 
# It is actually content query, which is named tgt in other # DETR-like models if dn_label_query is not None: query = torch.cat([dn_label_query, query], dim=1) if dn_bbox_query is not None: - reference_points = torch.cat([dn_bbox_query, topk_coords_unact], - dim=1) + reference_points = torch.cat([dn_bbox_query, topk_coords_unact], dim=1) else: reference_points = topk_coords_unact reference_points = reference_points.sigmoid() @@ -1217,31 +1031,32 @@ class CoDinoTransformer(CoDeformableDetrTransformer): level_start_index=level_start_index, valid_ratios=valid_ratios, reg_branches=reg_branches, - **kwargs) + **kwargs, + ) inter_references_out = inter_references - return inter_states, inter_references_out, \ - topk_score, topk_anchor, memory - - def forward_aux(self, - mlvl_feats, - mlvl_masks, - query_embed, - mlvl_pos_embeds, - pos_anchors, - pos_feats=None, - reg_branches=None, - cls_branches=None, - return_encoder_output=False, - attn_masks=None, - head_idx=0, - **kwargs): + return inter_states, inter_references_out, topk_score, topk_anchor, memory + + def forward_aux( + self, + mlvl_feats, + mlvl_masks, + query_embed, + mlvl_pos_embeds, + pos_anchors, + pos_feats=None, + reg_branches=None, + cls_branches=None, + return_encoder_output=False, + attn_masks=None, + head_idx=0, + **kwargs, + ): feat_flatten = [] mask_flatten = [] spatial_shapes = [] - for lvl, (feat, mask, pos_embed) in enumerate( - zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): + for lvl, (feat, mask, pos_embed) in enumerate(zip(mlvl_feats, mlvl_masks, mlvl_pos_embeds)): bs, c, h, w = feat.shape spatial_shape = (h, w) spatial_shapes.append(spatial_shape) @@ -1251,12 +1066,9 @@ class CoDinoTransformer(CoDeformableDetrTransformer): mask_flatten.append(mask) feat_flatten = torch.cat(feat_flatten, 1) mask_flatten = torch.cat(mask_flatten, 1) - spatial_shapes = torch.as_tensor( - spatial_shapes, dtype=torch.long, device=feat_flatten.device) - level_start_index = torch.cat((spatial_shapes.new_zeros( - (1, )), spatial_shapes.prod(1).cumsum(0)[:-1])) - valid_ratios = torch.stack( - [self.get_valid_ratio(m) for m in mlvl_masks], 1) + spatial_shapes = torch.as_tensor(spatial_shapes, dtype=torch.long, device=feat_flatten.device) + level_start_index = torch.cat((spatial_shapes.new_zeros((1,)), spatial_shapes.prod(1).cumsum(0)[:-1])) + valid_ratios = torch.stack([self.get_valid_ratio(m) for m in mlvl_masks], 1) feat_flatten = feat_flatten.permute(1, 0, 2) # (H*W, bs, embed_dims) @@ -1267,13 +1079,10 @@ class CoDinoTransformer(CoDeformableDetrTransformer): topk_coords_unact = inverse_sigmoid(pos_anchors) reference_points = pos_anchors if self.num_co_heads > 0: - pos_trans_out = self.aux_pos_trans_norm[head_idx]( - self.aux_pos_trans[head_idx]( - self.get_proposal_pos_embed(topk_coords_unact))) + pos_trans_out = self.aux_pos_trans_norm[head_idx](self.aux_pos_trans[head_idx](self.get_proposal_pos_embed(topk_coords_unact))) query = pos_trans_out if self.with_coord_feat: - query = query + self.pos_feats_norm[head_idx]( - self.pos_feats_trans[head_idx](pos_feats)) + query = query + self.pos_feats_norm[head_idx](self.pos_feats_trans[head_idx](pos_feats)) # decoder query = query.permute(1, 0, 2) @@ -1289,7 +1098,8 @@ class CoDinoTransformer(CoDeformableDetrTransformer): level_start_index=level_start_index, valid_ratios=valid_ratios, reg_branches=reg_branches, - **kwargs) + **kwargs, + ) inter_references_out = inter_references @@ -1305,26 +1115,21 @@ class DetrTransformerEncoder(TransformerLayerSequence): `LN`. 
Only used when `self.pre_norm` is `True` """ - def __init__(self, - *args, - post_norm_cfg=dict(type='LN'), - with_cp=-1, - **kwargs): + def __init__(self, *args, post_norm_cfg=dict(type="LN"), with_cp=-1, **kwargs): super(DetrTransformerEncoder, self).__init__(*args, **kwargs) if post_norm_cfg is not None: - self.post_norm = build_norm_layer( - post_norm_cfg, self.embed_dims)[1] if self.pre_norm else None + self.post_norm = build_norm_layer(post_norm_cfg, self.embed_dims)[1] if self.pre_norm else None else: - assert not self.pre_norm, f'Use prenorm in ' \ - f'{self.__class__.__name__},' \ - f'Please specify post_norm_cfg' + assert not self.pre_norm, f"Use prenorm in " f"{self.__class__.__name__}," f"Please specify post_norm_cfg" self.post_norm = None self.with_cp = with_cp if self.with_cp > 0: if checkpoint_wrapper is None: - warnings.warn('If you want to reduce GPU memory usage, \ + warnings.warn( + "If you want to reduce GPU memory usage, \ please install fairscale by executing the \ - following command: pip install fairscale.') + following command: pip install fairscale." + ) return for i in range(self.with_cp): self.layers[i] = checkpoint_wrapper(self.layers[i]) @@ -1353,15 +1158,17 @@ class DetrTransformerDecoderLayer(BaseTransformerLayer): Default๏ผš2. """ - def __init__(self, - attn_cfgs, - feedforward_channels, - ffn_dropout=0.0, - operation_order=None, - act_cfg=dict(type='ReLU', inplace=True), - norm_cfg=dict(type='LN'), - ffn_num_fcs=2, - **kwargs): + def __init__( + self, + attn_cfgs, + feedforward_channels, + ffn_dropout=0.0, + operation_order=None, + act_cfg=dict(type="ReLU", inplace=True), + norm_cfg=dict(type="LN"), + ffn_num_fcs=2, + **kwargs, + ): super(DetrTransformerDecoderLayer, self).__init__( attn_cfgs=attn_cfgs, feedforward_channels=feedforward_channels, @@ -1370,7 +1177,7 @@ class DetrTransformerDecoderLayer(BaseTransformerLayer): act_cfg=act_cfg, norm_cfg=norm_cfg, ffn_num_fcs=ffn_num_fcs, - **kwargs) + **kwargs, + ) assert len(operation_order) == 6 - assert set(operation_order) == set( - ['self_attn', 'norm', 'cross_attn', 'ffn']) + assert set(operation_order) == set(["self_attn", "norm", "cross_attn", "ffn"]) diff --git a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_8xb2_1x_coco.py b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_8xb2_1x_coco.py index 1a4130437666428213eb3250f8eee9d2a4d1442b..f9c70e9bd2f0b2907e1175863fa683586904dcb4 100644 --- a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_8xb2_1x_coco.py +++ b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_8xb2_1x_coco.py @@ -1,46 +1,65 @@ -_base_ = './co_dino_5scale_r50_lsj_8xb2_1x_coco.py' +_base_ = "./co_dino_5scale_r50_lsj_8xb2_1x_coco.py" -model = dict( - use_lsj=False, data_preprocessor=dict(pad_mask=False, batch_augments=None)) +model = dict(use_lsj=False, data_preprocessor=dict(pad_mask=False, batch_augments=None)) # train_pipeline, NOTE the img_scale and the Pad's size_divisor is different # from the default setting in mmdet. 
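# NOTE: the `RandomChoice` transform in the pipeline below samples one of two
# augmentation branches per image: (a) a single multi-scale resize, or (b) a
# coarse resize followed by a random crop and a second multi-scale resize,
# following the original DETR recipe. A rough sketch of the control flow,
# with hypothetical helper names rather than the mmdet transforms:
#
#     >>> import random
#     >>> def augment(results):
#     ...     if random.random() < 0.5:
#     ...         return multi_scale_resize(results)  # branch (a)
#     ...     return multi_scale_resize(random_crop(coarse_resize(results)))  # branch (b)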
train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="PackDetInputs"), ] train_dataloader = dict( @@ -48,20 +67,19 @@ train_dataloader = dict( _delete_=True, type=_base_.dataset_type, data_root=_base_.data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=False, min_size=32), pipeline=train_pipeline, - backend_args=_base_.backend_args)) + backend_args=_base_.backend_args, + ) +) test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) diff --git a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_1x_coco.py b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_1x_coco.py index 876b90f89c8795186d830689c9bdb420b0cfbb18..e08956bbfdb48f6ab60bc120024acb1dcf281f13 100644 --- a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_1x_coco.py +++ b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_1x_coco.py @@ -1,7 +1,6 @@ -_base_ = 'mmdet::common/ssj_scp_270k_coco-instance.py' +_base_ = "mmdet::common/ssj_scp_270k_coco-instance.py" -custom_imports = dict( - 
imports=['projects.CO-DETR.codetr'], allow_failed_imports=False) +custom_imports = dict(imports=["projects.CO-DETR.codetr"], allow_failed_imports=False) # model settings num_dec_layer = 6 @@ -9,314 +8,226 @@ loss_lambda = 2.0 num_classes = 80 image_size = (1024, 1024) -batch_augments = [ - dict(type='BatchFixedSizePad', size=image_size, pad_mask=True) -] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size, pad_mask=True)] model = dict( - type='CoDETR', + type="CoDETR", # If using the lsj augmentation, # it is recommended to set it to True. use_lsj=True, # detr: 52.1 # one-stage: 49.4 # two-stage: 47.9 - eval_module='detr', # in ['detr', 'one-stage', 'two-stage'] + eval_module="detr", # in ['detr', 'one-stage', 'two-stage'] data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=True, - batch_augments=batch_augments), + batch_augments=batch_augments, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 512, 1024, 2048], kernel_size=1, out_channels=256, act_cfg=None, - norm_cfg=dict(type='GN', num_groups=32), - num_outs=5), + norm_cfg=dict(type="GN", num_groups=32), + num_outs=5, + ), query_head=dict( - type='CoDINOHead', + type="CoDINOHead", num_query=900, num_classes=num_classes, in_channels=2048, as_two_stage=True, - dn_cfg=dict( - label_noise_scale=0.5, - box_noise_scale=1.0, - group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)), + dn_cfg=dict(label_noise_scale=0.5, box_noise_scale=1.0, group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)), transformer=dict( - type='CoDinoTransformer', + type="CoDinoTransformer", with_coord_feat=False, num_co_heads=2, # ATSS Aux Head + Faster RCNN Aux Head num_feature_levels=5, encoder=dict( - type='DetrTransformerEncoder', + type="DetrTransformerEncoder", num_layers=6, # number of layers that use checkpoint. # The maximum value for the setting is num_layers. # FairScale must be installed for it to work. 
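# NOTE: layers wrapped in activation checkpointing recompute their
# activations during the backward pass instead of caching them, trading extra
# forward compute for lower peak GPU memory. A minimal sketch of what the
# encoder does with this value, mirroring the wrapping loop in the
# transformer code above:
#
#     >>> from fairscale.nn.checkpoint import checkpoint_wrapper
#     >>> for i in range(with_cp):  # with_cp must not exceed num_layers
#     ...     layers[i] = checkpoint_wrapper(layers[i])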
with_cp=4, transformerlayers=dict( - type='BaseTransformerLayer', - attn_cfgs=dict( - type='MultiScaleDeformableAttention', - embed_dims=256, - num_levels=5, - dropout=0.0), + type="BaseTransformerLayer", + attn_cfgs=dict(type="MultiScaleDeformableAttention", embed_dims=256, num_levels=5, dropout=0.0), feedforward_channels=2048, ffn_dropout=0.0, - operation_order=('self_attn', 'norm', 'ffn', 'norm'))), + operation_order=("self_attn", "norm", "ffn", "norm"), + ), + ), decoder=dict( - type='DinoTransformerDecoder', + type="DinoTransformerDecoder", num_layers=6, return_intermediate=True, transformerlayers=dict( - type='DetrTransformerDecoderLayer', + type="DetrTransformerDecoderLayer", attn_cfgs=[ - dict( - type='MultiheadAttention', - embed_dims=256, - num_heads=8, - dropout=0.0), - dict( - type='MultiScaleDeformableAttention', - embed_dims=256, - num_levels=5, - dropout=0.0), + dict(type="MultiheadAttention", embed_dims=256, num_heads=8, dropout=0.0), + dict(type="MultiScaleDeformableAttention", embed_dims=256, num_levels=5, dropout=0.0), ], feedforward_channels=2048, ffn_dropout=0.0, - operation_order=('self_attn', 'norm', 'cross_attn', 'norm', - 'ffn', 'norm')))), - positional_encoding=dict( - type='SinePositionalEncoding', - num_feats=128, - temperature=20, - normalize=True), - loss_cls=dict( # Different from the DINO - type='QualityFocalLoss', - use_sigmoid=True, - beta=2.0, - loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0)), + operation_order=("self_attn", "norm", "cross_attn", "norm", "ffn", "norm"), + ), + ), + ), + positional_encoding=dict(type="SinePositionalEncoding", num_feats=128, temperature=20, normalize=True), + loss_cls=dict(type="QualityFocalLoss", use_sigmoid=True, beta=2.0, loss_weight=1.0), # Different from the DINO + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + ), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=4, - scales_per_octave=3, - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64, 128]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - loss_weight=1.0 * num_dec_layer * loss_lambda), - loss_bbox=dict( - type='L1Loss', loss_weight=1.0 * num_dec_layer * loss_lambda)), + type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64, 128] + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0 * num_dec_layer * loss_lambda), + loss_bbox=dict(type="L1Loss", loss_weight=1.0 * num_dec_layer * loss_lambda), + ), roi_head=[ dict( - type='CoStandardRoIHead', + type="CoStandardRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict( - type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32, 64], - finest_scale=56), + finest_scale=56, + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=num_classes, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 
0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, reg_decoded_bbox=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0 * num_dec_layer * loss_lambda), - loss_bbox=dict( - type='GIoULoss', - loss_weight=10.0 * num_dec_layer * loss_lambda))) + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0 * num_dec_layer * loss_lambda), + loss_bbox=dict(type="GIoULoss", loss_weight=10.0 * num_dec_layer * loss_lambda), + ), + ) ], bbox_head=[ dict( - type='CoATSSHead', + type="CoATSSHead", num_classes=num_classes, in_channels=256, stacked_convs=1, feat_channels=256, anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[4, 8, 16, 32, 64, 128]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0 * num_dec_layer * loss_lambda), - loss_bbox=dict( - type='GIoULoss', - loss_weight=2.0 * num_dec_layer * loss_lambda), - loss_centerness=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - loss_weight=1.0 * num_dec_layer * loss_lambda)), + type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[4, 8, 16, 32, 64, 128] + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0 * num_dec_layer * loss_lambda), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0 * num_dec_layer * loss_lambda), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0 * num_dec_layer * loss_lambda), + ), ], # model training and testing settings train_cfg=[ dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='FocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), + dict(type="FocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=4000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=4000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, 
match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False)), - dict( - assigner=dict(type='ATSSAssigner', topk=9), - allowed_border=-1, - pos_weight=-1, - debug=False) + debug=False, + ), + ), + dict(assigner=dict(type="ATSSAssigner", topk=9), allowed_border=-1, pos_weight=-1, debug=False), ], test_cfg=[ # Different from DINO, we use NMS. dict( max_per_img=300, # NMS can improve the mAP by 0.2. - nms=dict(type='soft_nms', iou_threshold=0.8)), + nms=dict(type="soft_nms", iou_threshold=0.8), + ), dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.0, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100)), + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.0, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), + ), dict( # atss bbox head: nms_pre=1000, min_bbox_size=0, score_thr=0.0, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100), + nms=dict(type="nms", iou_threshold=0.6), + max_per_img=100, + ), # soft-nms is also supported for rcnn testing # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05) - ]) + ], +) # LSJ + CopyPaste load_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=image_size, - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=image_size, pad_val=dict(img=(114, 114, 114))), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomResize", scale=image_size, ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=image_size, recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=image_size, pad_val=dict(img=(114, 114, 114))), ] -train_pipeline = [ - dict(type='CopyPaste', max_num_pasted=100), - dict(type='PackDetInputs') -] +train_pipeline = [dict(type="CopyPaste", max_num_pasted=100), dict(type="PackDetInputs")] train_dataloader = dict( - sampler=dict(type='DefaultSampler', shuffle=True), - dataset=dict( - pipeline=train_pipeline, - dataset=dict( - filter_cfg=dict(filter_empty_gt=False), pipeline=load_pipeline))) + sampler=dict(type="DefaultSampler", shuffle=True), + dataset=dict(pipeline=train_pipeline, dataset=dict(filter_cfg=dict(filter_empty_gt=False), pipeline=load_pipeline)), +) # follow ViTDet test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=image_size, keep_ratio=True), # diff - dict(type='Pad', size=image_size, pad_val=dict(img=(114, 114, 114))), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=image_size, keep_ratio=True), # diff + dict(type="Pad", size=image_size, pad_val=dict(img=(114, 114, 114))), + dict(type="LoadAnnotations", with_bbox=True,
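# NOTE: the soft-NMS selected in `test_cfg` above decays the scores of
# overlapping boxes instead of discarding them outright. A minimal sketch of
# one suppression step in the linear variant (not the mmcv implementation):
#
#     >>> import torch
#     >>> def soft_nms_step(scores, ious, iou_threshold=0.8):
#     ...     # decay every box whose IoU with the current top box is too high
#     ...     decay = torch.where(ious > iou_threshold, 1 - ious, torch.ones_like(ious))
#     ...     return scores * decay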
with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) @@ -324,33 +235,21 @@ test_dataloader = val_dataloader optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=2e-4, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=2e-4, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.1)})) + paramwise_cfg=dict(custom_keys={"backbone": dict(lr_mult=0.1)}), +) -val_evaluator = dict(metric='bbox') +val_evaluator = dict(metric="bbox") test_evaluator = val_evaluator max_epochs = 12 -train_cfg = dict( - _delete_=True, - type='EpochBasedTrainLoop', - max_epochs=max_epochs, - val_interval=1) +train_cfg = dict(_delete_=True, type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[11], gamma=0.1)] -default_hooks = dict( - checkpoint=dict(by_epoch=True, interval=1, max_keep_ckpts=3)) +default_hooks = dict(checkpoint=dict(by_epoch=True, interval=1, max_keep_ckpts=3)) log_processor = dict(by_epoch=True) # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_3x_coco.py b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_3x_coco.py index 9a9fc34f680a3de3f96a548817f3d4e37983fee7..81afc6479f9ca2cf51c16b580acc7ee9c95fe6dd 100644 --- a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_3x_coco.py +++ b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_3x_coco.py @@ -1,4 +1,4 @@ -_base_ = ['co_dino_5scale_r50_lsj_8xb2_1x_coco.py'] +_base_ = ["co_dino_5scale_r50_lsj_8xb2_1x_coco.py"] param_scheduler = [dict(milestones=[30])] train_cfg = dict(max_epochs=36) diff --git a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_16e_o365tococo.py b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_16e_o365tococo.py index 77821c380f3407c2288377dc78232fd12205fc76..789562fce97291f2a8eee3aa545cf74c92a0da15 100644 --- a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_16e_o365tococo.py +++ b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_16e_o365tococo.py @@ -1,13 +1,13 @@ -_base_ = ['co_dino_5scale_r50_8xb2_1x_coco.py'] +_base_ = ["co_dino_5scale_r50_8xb2_1x_coco.py"] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/codetr/co_dino_5scale_swin_large_16e_o365tococo-614254c9.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth" # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/codetr/co_dino_5scale_swin_large_16e_o365tococo-614254c9.pth" # noqa # model settings model = dict( backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=192, depths=[2, 2, 18, 2], @@ -16,8 +16,8 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + 
attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(0, 1, 2, 3), @@ -25,75 +25,123 @@ model = dict( # in FPN, otherwise some parameter will not be used with_cp=True, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), neck=dict(in_channels=[192, 384, 768, 1536]), - query_head=dict( - dn_cfg=dict(box_noise_scale=0.4, group_cfg=dict(num_dn_queries=500)), - transformer=dict(encoder=dict(with_cp=6)))) + query_head=dict(dn_cfg=dict(box_noise_scale=0.4, group_cfg=dict(num_dn_queries=500)), transformer=dict(encoder=dict(with_cp=6))), +) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 2048), (512, 2048), (544, 2048), (576, 2048), - (608, 2048), (640, 2048), (672, 2048), (704, 2048), - (736, 2048), (768, 2048), (800, 2048), (832, 2048), - (864, 2048), (896, 2048), (928, 2048), (960, 2048), - (992, 2048), (1024, 2048), (1056, 2048), - (1088, 2048), (1120, 2048), (1152, 2048), - (1184, 2048), (1216, 2048), (1248, 2048), - (1280, 2048), (1312, 2048), (1344, 2048), - (1376, 2048), (1408, 2048), (1440, 2048), - (1472, 2048), (1504, 2048), (1536, 2048)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 2048), + (512, 2048), + (544, 2048), + (576, 2048), + (608, 2048), + (640, 2048), + (672, 2048), + (704, 2048), + (736, 2048), + (768, 2048), + (800, 2048), + (832, 2048), + (864, 2048), + (896, 2048), + (928, 2048), + (960, 2048), + (992, 2048), + (1024, 2048), + (1056, 2048), + (1088, 2048), + (1120, 2048), + (1152, 2048), + (1184, 2048), + (1216, 2048), + (1248, 2048), + (1280, 2048), + (1312, 2048), + (1344, 2048), + (1376, 2048), + (1408, 2048), + (1440, 2048), + (1472, 2048), + (1504, 2048), + (1536, 2048), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 2048), (512, 2048), (544, 2048), (576, 2048), - (608, 2048), (640, 2048), (672, 2048), (704, 2048), - (736, 2048), (768, 2048), (800, 2048), (832, 2048), - (864, 2048), (896, 2048), (928, 2048), (960, 2048), - (992, 2048), (1024, 2048), (1056, 2048), - (1088, 2048), (1120, 2048), (1152, 2048), - (1184, 2048), (1216, 2048), (1248, 2048), - (1280, 2048), (1312, 2048), (1344, 2048), - (1376, 2048), (1408, 2048), (1440, 2048), - (1472, 2048), (1504, 2048), (1536, 2048)], - keep_ratio=True) - ] - ]), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 2048), + (512, 2048), + (544, 2048), + (576, 2048), + (608, 2048), + (640, 2048), + (672, 2048), + (704, 2048), + (736, 2048), + (768, 2048), + (800, 2048), + (832, 2048), + (864, 2048), + (896, 2048), + (928, 2048), + (960, 2048), + (992, 2048), + (1024, 2048), + (1056, 2048), + (1088, 2048), + (1120, 2048), + (1152, 2048), + (1184, 2048), + (1216, 2048), + 
(1248, 2048), + (1280, 2048), + (1312, 2048), + (1344, 2048), + (1376, 2048), + (1408, 2048), + (1440, 2048), + (1472, 2048), + (1504, 2048), + (1536, 2048), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="PackDetInputs"), ] -train_dataloader = dict( - batch_size=1, num_workers=1, dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=1, num_workers=1, dataset=dict(pipeline=train_pipeline)) test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=(2048, 1280), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=(2048, 1280), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) @@ -104,12 +152,4 @@ optim_wrapper = dict(optimizer=dict(lr=1e-4)) max_epochs = 16 train_cfg = dict(max_epochs=max_epochs) -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[8], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[8], gamma=0.1)] diff --git a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_1x_coco.py b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_1x_coco.py index d4a873464d422334a42d72543bbccc3b344aa97e..52c18694a234ad87631f22548138757f99dd7e1f 100644 --- a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_1x_coco.py +++ b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_1x_coco.py @@ -1,12 +1,12 @@ -_base_ = ['co_dino_5scale_r50_8xb2_1x_coco.py'] +_base_ = ["co_dino_5scale_r50_8xb2_1x_coco.py"] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth" # noqa # model settings model = dict( backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=192, depths=[2, 2, 18, 2], @@ -15,8 +15,8 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(0, 1, 2, 3), @@ -24,8 +24,10 @@ model = dict( # in FPN, otherwise some parameter will not be used with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), neck=dict(in_channels=[192, 384, 768, 1536]), - query_head=dict(transformer=dict(encoder=dict(with_cp=6)))) + query_head=dict(transformer=dict(encoder=dict(with_cp=6))), +) train_dataloader = dict(batch_size=1, num_workers=1) diff --git a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_3x_coco.py b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_3x_coco.py index c2fce29b98b5ffe7e51396b8b88b289fc4c8ffbc..d017dfe39c23f31343de6bb90b770ddf48b13361 100644 --- a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_3x_coco.py +++ b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_16xb1_3x_coco.py @@ -1,4 +1,4 @@ -_base_ = 
['co_dino_5scale_swin_l_16xb1_1x_coco.py'] +_base_ = ["co_dino_5scale_swin_l_16xb1_1x_coco.py"] # model settings model = dict(backbone=dict(drop_path_rate=0.6)) diff --git a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_lsj_16xb1_1x_coco.py b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_lsj_16xb1_1x_coco.py index 4a9b3688b8ebf6525f4d96526dd543576ae6253b..5b19ad5e7e6b36802ad654126f16c09e2ee9fd1a 100644 --- a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_lsj_16xb1_1x_coco.py +++ b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_lsj_16xb1_1x_coco.py @@ -1,17 +1,15 @@ -_base_ = ['co_dino_5scale_r50_lsj_8xb2_1x_coco.py'] +_base_ = ["co_dino_5scale_r50_lsj_8xb2_1x_coco.py"] image_size = (1280, 1280) -batch_augments = [ - dict(type='BatchFixedSizePad', size=image_size, pad_mask=True) -] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa +batch_augments = [dict(type="BatchFixedSizePad", size=image_size, pad_mask=True)] +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth" # noqa # model settings model = dict( data_preprocessor=dict(batch_augments=batch_augments), backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=192, depths=[2, 2, 18, 2], @@ -20,8 +18,8 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(0, 1, 2, 3), @@ -29,43 +27,30 @@ model = dict( # in FPN, otherwise some parameter will not be used with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), neck=dict(in_channels=[192, 384, 768, 1536]), - query_head=dict(transformer=dict(encoder=dict(with_cp=6)))) + query_head=dict(transformer=dict(encoder=dict(with_cp=6))), +) load_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=image_size, - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=image_size, pad_val=dict(img=(114, 114, 114))), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomResize", scale=image_size, ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=image_size, recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=image_size, pad_val=dict(img=(114, 114, 114))), ] -train_dataloader = dict( - batch_size=1, - num_workers=1, - dataset=dict(dataset=dict(pipeline=load_pipeline))) +train_dataloader = dict(batch_size=1, num_workers=1, dataset=dict(dataset=dict(pipeline=load_pipeline))) test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=image_size, keep_ratio=True), - dict(type='Pad', size=image_size, pad_val=dict(img=(114, 114, 114))), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - 
type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=image_size, keep_ratio=True), + dict(type="Pad", size=image_size, pad_val=dict(img=(114, 114, 114))), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) diff --git a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_lsj_16xb1_3x_coco.py b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_lsj_16xb1_3x_coco.py index bf9cd4f439287d7174f9b773b7177ade179cd536..ef8f902c52187d3dd617d9cd2240a3f8571d391e 100644 --- a/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_lsj_16xb1_3x_coco.py +++ b/mmpose/configs/mmdet/CO-DETR/configs/codino/co_dino_5scale_swin_l_lsj_16xb1_3x_coco.py @@ -1,7 +1,7 @@ -_base_ = ['co_dino_5scale_swin_l_lsj_16xb1_1x_coco.py'] +_base_ = ["co_dino_5scale_swin_l_lsj_16xb1_1x_coco.py"] model = dict(backbone=dict(drop_path_rate=0.5)) -param_scheduler = [dict(type='MultiStepLR', milestones=[30])] +param_scheduler = [dict(type="MultiStepLR", milestones=[30])] train_cfg = dict(max_epochs=36) diff --git a/mmpose/configs/mmdet/_base_/datasets/ade20k_instance.py b/mmpose/configs/mmdet/_base_/datasets/ade20k_instance.py index 57f657aa67f34830515f410425eccc96cb065af4..e8c64322b0d9d42aae3a436cf4de6258f9c08e9d 100644 --- a/mmpose/configs/mmdet/_base_/datasets/ade20k_instance.py +++ b/mmpose/configs/mmdet/_base_/datasets/ade20k_instance.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'ADE20KInstanceDataset' -data_root = 'data/ADEChallengeData2016/' +dataset_type = "ADE20KInstanceDataset" +data_root = "data/ADEChallengeData2016/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,14 +18,11 @@ data_root = 'data/ADEChallengeData2016/' backend_args = None test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(2560, 640), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(2560, 640), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] val_dataloader = dict( @@ -33,21 +30,24 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='ade20k_instance_val.json', - data_prefix=dict(img='images/validation'), + ann_file="ade20k_instance_val.json", + data_prefix=dict(img="images/validation"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'ade20k_instance_val.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "ade20k_instance_val.json", + metric=["bbox", "segm"], format_only=False, - 
backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/ade20k_panoptic.py b/mmpose/configs/mmdet/_base_/datasets/ade20k_panoptic.py index 7be5ddd7f0732193f4f92bc49e52493602928162..5e5b6295e5b935a0cd37baced3026d3bd312042b 100644 --- a/mmpose/configs/mmdet/_base_/datasets/ade20k_panoptic.py +++ b/mmpose/configs/mmdet/_base_/datasets/ade20k_panoptic.py @@ -1,17 +1,14 @@ # dataset settings -dataset_type = 'ADE20KPanopticDataset' -data_root = 'data/ADEChallengeData2016/' +dataset_type = "ADE20KPanopticDataset" +data_root = "data/ADEChallengeData2016/" backend_args = None test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(2560, 640), keep_ratio=True), - dict(type='LoadPanopticAnnotations', backend_args=backend_args), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(2560, 640), keep_ratio=True), + dict(type="LoadPanopticAnnotations", backend_args=backend_args), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] val_dataloader = dict( @@ -19,20 +16,23 @@ val_dataloader = dict( num_workers=0, persistent_workers=False, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='ade20k_panoptic_val.json', - data_prefix=dict(img='images/validation/', seg='ade20k_panoptic_val/'), + ann_file="ade20k_panoptic_val.json", + data_prefix=dict(img="images/validation/", seg="ade20k_panoptic_val/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoPanopticMetric', - ann_file=data_root + 'ade20k_panoptic_val.json', - seg_prefix=data_root + 'ade20k_panoptic_val/', - backend_args=backend_args) + type="CocoPanopticMetric", + ann_file=data_root + "ade20k_panoptic_val.json", + seg_prefix=data_root + "ade20k_panoptic_val/", + backend_args=backend_args, +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/ade20k_semantic.py b/mmpose/configs/mmdet/_base_/datasets/ade20k_semantic.py index 522a775704182ededaa36f318cd1eb185784918f..d46150b6f061e662ee20f602c4b3fe0a688e7d74 100644 --- a/mmpose/configs/mmdet/_base_/datasets/ade20k_semantic.py +++ b/mmpose/configs/mmdet/_base_/datasets/ade20k_semantic.py @@ -1,5 +1,5 @@ -dataset_type = 'ADE20KSegDataset' -data_root = 'data/ADEChallengeData2016/' +dataset_type = "ADE20KSegDataset" +data_root = "data/ADEChallengeData2016/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -17,16 +17,10 @@ data_root = 'data/ADEChallengeData2016/' backend_args = None test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(2048, 512), keep_ratio=True), - dict( - type='LoadAnnotations', - with_bbox=False, - with_mask=False, - with_seg=True, - reduce_zero_label=True), - dict( - type='PackDetInputs', meta_keys=('img_path', 'ori_shape', 'img_shape')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(2048, 512), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=False, with_mask=False, with_seg=True, 
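# NOTE: `reduce_zero_label` below remaps ADE20K annotations so class ids
# start at 0 and the original label 0 (background) becomes the ignore index;
# a minimal sketch of the remapping, assuming 255 is the ignore index:
#
#     >>> seg_map[seg_map == 0] = 255  # drop the background class
#     >>> seg_map = seg_map - 1        # shift the remaining ids down by one
#     >>> seg_map[seg_map == 254] = 255  # keep ignored pixels at 255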
reduce_zero_label=True), + dict(type="PackDetInputs", meta_keys=("img_path", "ori_shape", "img_shape")), ] val_dataloader = dict( @@ -34,15 +28,15 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - data_prefix=dict( - img_path='images/validation', - seg_map_path='annotations/validation'), - pipeline=test_pipeline)) + data_prefix=dict(img_path="images/validation", seg_map_path="annotations/validation"), + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader -val_evaluator = dict(type='SemSegMetric', iou_metrics=['mIoU']) +val_evaluator = dict(type="SemSegMetric", iou_metrics=["mIoU"]) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/cityscapes_detection.py b/mmpose/configs/mmdet/_base_/datasets/cityscapes_detection.py index caeba6bfcd26d8954fc9d499446e93323e372959..09087096554d7a3250efb53051409b71c5224dc7 100644 --- a/mmpose/configs/mmdet/_base_/datasets/cityscapes_detection.py +++ b/mmpose/configs/mmdet/_base_/datasets/cityscapes_detection.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'CityscapesDataset' -data_root = 'data/cityscapes/' +dataset_type = "CityscapesDataset" +data_root = "data/cityscapes/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,67 +18,64 @@ data_root = 'data/cityscapes/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=[(2048, 800), (2048, 1024)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=[(2048, 800), (2048, 1024)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(2048, 1024), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(2048, 1024), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=8, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instancesonly_filtered_gtFine_train.json', - data_prefix=dict(img='leftImg8bit/train/'), + ann_file="annotations/instancesonly_filtered_gtFine_train.json", + data_prefix=dict(img="leftImg8bit/train/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=1, 
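# NOTE: the `RepeatDataset` wrapper in `train_dataloader` above repeats the
# underlying dataset `times` (here 8) within every epoch, which amortises
# per-epoch overhead on small datasets such as Cityscapes. A minimal sketch
# of the idea (not the mmengine implementation):
#
#     >>> class RepeatDataset:
#     ...     def __init__(self, dataset, times):
#     ...         self.dataset, self.times = dataset, times
#     ...     def __len__(self):
#     ...         return self.times * len(self.dataset)
#     ...     def __getitem__(self, idx):
#     ...         return self.dataset[idx % len(self.dataset)]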
num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instancesonly_filtered_gtFine_val.json', - data_prefix=dict(img='leftImg8bit/val/'), + ann_file="annotations/instancesonly_filtered_gtFine_val.json", + data_prefix=dict(img="leftImg8bit/val/"), test_mode=True, filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instancesonly_filtered_gtFine_val.json', - metric='bbox', - backend_args=backend_args) + type="CocoMetric", ann_file=data_root + "annotations/instancesonly_filtered_gtFine_val.json", metric="bbox", backend_args=backend_args +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/cityscapes_instance.py b/mmpose/configs/mmdet/_base_/datasets/cityscapes_instance.py index 136403136c67a6726662832b66f56701ff5aba8a..131ebfa019451421156fd480e27f19ea7594407b 100644 --- a/mmpose/configs/mmdet/_base_/datasets/cityscapes_instance.py +++ b/mmpose/configs/mmdet/_base_/datasets/cityscapes_instance.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'CityscapesDataset' -data_root = 'data/cityscapes/' +dataset_type = "CityscapesDataset" +data_root = "data/cityscapes/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,75 +18,75 @@ data_root = 'data/cityscapes/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomResize', - scale=[(2048, 800), (2048, 1024)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomResize", scale=[(2048, 800), (2048, 1024)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(2048, 1024), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(2048, 1024), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=8, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instancesonly_filtered_gtFine_train.json', - data_prefix=dict(img='leftImg8bit/train/'), + ann_file="annotations/instancesonly_filtered_gtFine_train.json", + 
data_prefix=dict(img="leftImg8bit/train/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instancesonly_filtered_gtFine_val.json', - data_prefix=dict(img='leftImg8bit/val/'), + ann_file="annotations/instancesonly_filtered_gtFine_val.json", + data_prefix=dict(img="leftImg8bit/val/"), test_mode=True, filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = [ dict( - type='CocoMetric', - ann_file=data_root + - 'annotations/instancesonly_filtered_gtFine_val.json', - metric=['bbox', 'segm'], - backend_args=backend_args), + type="CocoMetric", + ann_file=data_root + "annotations/instancesonly_filtered_gtFine_val.json", + metric=["bbox", "segm"], + backend_args=backend_args, + ), dict( - type='CityScapesMetric', - seg_prefix=data_root + 'gtFine/val', - outfile_prefix='./work_dirs/cityscapes_metric/instance', - backend_args=backend_args) + type="CityScapesMetric", + seg_prefix=data_root + "gtFine/val", + outfile_prefix="./work_dirs/cityscapes_metric/instance", + backend_args=backend_args, + ), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/coco_caption.py b/mmpose/configs/mmdet/_base_/datasets/coco_caption.py index a1bd898313927e4fca336dfa10f05e78b9fb7162..102fa56918b1348b999d13d7186dc53e61e4adb3 100644 --- a/mmpose/configs/mmdet/_base_/datasets/coco_caption.py +++ b/mmpose/configs/mmdet/_base_/datasets/coco_caption.py @@ -1,7 +1,7 @@ # data settings -dataset_type = 'CocoCaptionDataset' -data_root = 'data/coco/' +dataset_type = "CocoCaptionDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -19,16 +19,9 @@ data_root = 'data/coco/' backend_args = None test_pipeline = [ - dict( - type='LoadImageFromFile', - imdecode_backend='pillow', - backend_args=backend_args), - dict( - type='Resize', - scale=(224, 224), - interpolation='bicubic', - backend='pillow'), - dict(type='PackInputs', meta_keys=['image_id']), + dict(type="LoadImageFromFile", imdecode_backend="pillow", backend_args=backend_args), + dict(type="Resize", scale=(224, 224), interpolation="bicubic", backend="pillow"), + dict(type="PackInputs", meta_keys=["image_id"]), ] # ann_file download from @@ -42,17 +35,18 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/coco_karpathy_val.json', + ann_file="annotations/coco_karpathy_val.json", pipeline=test_pipeline, - )) + ), +) val_evaluator = dict( - type='COCOCaptionMetric', - ann_file=data_root + 'annotations/coco_karpathy_val_gt.json', + type="COCOCaptionMetric", + ann_file=data_root + "annotations/coco_karpathy_val_gt.json", ) # # If you want standard test, please manually configure the test dataset diff --git a/mmpose/configs/mmdet/_base_/datasets/coco_detection.py b/mmpose/configs/mmdet/_base_/datasets/coco_detection.py index 
fdf8dfad9476b1d7b7a4e8c3e2832f115a1ea7f2..02aa42873896def9e3776c2a487c1b0f9541196d 100644 --- a/mmpose/configs/mmdet/_base_/datasets/coco_detection.py +++ b/mmpose/configs/mmdet/_base_/datasets/coco_detection.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,58 +18,60 @@ data_root = 'data/coco/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox', + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric="bbox", format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator # inference on test dataset and diff --git a/mmpose/configs/mmdet/_base_/datasets/coco_human_instance.py b/mmpose/configs/mmdet/_base_/datasets/coco_human_instance.py index 7e0d886d407c94daa5c61543e3149f22a8986f36..a07e6ea85a38cc4b5bda6133b09f03c5f6aa513b 100644 --- a/mmpose/configs/mmdet/_base_/datasets/coco_human_instance.py +++ 
b/mmpose/configs/mmdet/_base_/datasets/coco_human_instance.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'CocoHumanDataset' -data_root = "/datagrid/personal/purkrmir/data/COCO/original/" +dataset_type = "CocoHumanDataset" +data_root = "/path/to/COCO/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,58 +18,60 @@ data_root = "/datagrid/personal/purkrmir/data/COCO/original/" backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric=["bbox", "segm"], format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator # inference on test dataset and diff --git a/mmpose/configs/mmdet/_base_/datasets/coco_instance.py b/mmpose/configs/mmdet/_base_/datasets/coco_instance.py index e91cb354038db4df3b990b307a5da9d77f341a88..e656dc7346f9c5d50baa7d56b7a79e20193627a1 100644 --- a/mmpose/configs/mmdet/_base_/datasets/coco_instance.py +++ 
b/mmpose/configs/mmdet/_base_/datasets/coco_instance.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,58 +18,60 @@ data_root = 'data/coco/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric=["bbox", "segm"], format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator # inference on test dataset and diff --git a/mmpose/configs/mmdet/_base_/datasets/coco_instance_semantic.py b/mmpose/configs/mmdet/_base_/datasets/coco_instance_semantic.py index cc961863306690c056e564b542d518c0ebfbb7e2..992f33ac96651bf080427a7ca31c422dc339f669 100644 --- a/mmpose/configs/mmdet/_base_/datasets/coco_instance_semantic.py +++ b/mmpose/configs/mmdet/_base_/datasets/coco_instance_semantic.py @@ 
-1,6 +1,6 @@ # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,61 +18,61 @@ data_root = 'data/coco/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict( - type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, with_seg=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict( - type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, with_seg=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/', seg='stuffthingmaps/train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/", seg="stuffthingmaps/train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric=["bbox", "segm"], format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/coco_panoptic.py b/mmpose/configs/mmdet/_base_/datasets/coco_panoptic.py index 0b95b619e68ed531d361bbd11a2382852c13446e..07c50f36b0bd9b42078bea57768da6a6a10d4b08 100644 --- a/mmpose/configs/mmdet/_base_/datasets/coco_panoptic.py +++ b/mmpose/configs/mmdet/_base_/datasets/coco_panoptic.py 
@@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'CocoPanopticDataset' -data_root = 'data/coco/' +dataset_type = "CocoPanopticDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,58 +18,59 @@ data_root = 'data/coco/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadPanopticAnnotations', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadPanopticAnnotations", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadPanopticAnnotations', backend_args=backend_args), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadPanopticAnnotations", backend_args=backend_args), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/panoptic_train2017.json', - data_prefix=dict( - img='train2017/', seg='annotations/panoptic_train2017/'), + ann_file="annotations/panoptic_train2017.json", + data_prefix=dict(img="train2017/", seg="annotations/panoptic_train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/panoptic_val2017.json', - data_prefix=dict(img='val2017/', seg='annotations/panoptic_val2017/'), + ann_file="annotations/panoptic_val2017.json", + data_prefix=dict(img="val2017/", seg="annotations/panoptic_val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoPanopticMetric', - ann_file=data_root + 'annotations/panoptic_val2017.json', - seg_prefix=data_root + 'annotations/panoptic_val2017/', - backend_args=backend_args) + type="CocoPanopticMetric", + ann_file=data_root + "annotations/panoptic_val2017.json", + seg_prefix=data_root + "annotations/panoptic_val2017/", + backend_args=backend_args, +) test_evaluator = val_evaluator # inference on test dataset and diff --git a/mmpose/configs/mmdet/_base_/datasets/coco_semantic.py b/mmpose/configs/mmdet/_base_/datasets/coco_semantic.py index 944bbbaeaeb6f10f0946bd1fc828bb01ea6c1fc3..bdf198c64b2032d7af80f8b3fc6bc03f070d3a06 100644 --- 
a/mmpose/configs/mmdet/_base_/datasets/coco_semantic.py +++ b/mmpose/configs/mmdet/_base_/datasets/coco_semantic.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'CocoSegDataset' -data_root = 'data/coco/' +dataset_type = "CocoSegDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,28 +18,18 @@ data_root = 'data/coco/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict( - type='LoadAnnotations', - with_bbox=False, - with_label=False, - with_seg=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=False, with_label=False, with_seg=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict( - type='LoadAnnotations', - with_bbox=False, - with_label=False, - with_seg=True), - dict( - type='PackDetInputs', - meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=False, with_label=False, with_seg=True), + dict(type="PackDetInputs", meta_keys=("img_path", "ori_shape", "img_shape", "scale_factor")), ] # For stuffthingmaps_semseg, please refer to @@ -48,31 +38,31 @@ train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - data_prefix=dict( - img_path='train2017/', - seg_map_path='stuffthingmaps_semseg/train2017/'), - pipeline=train_pipeline)) + data_prefix=dict(img_path="train2017/", seg_map_path="stuffthingmaps_semseg/train2017/"), + pipeline=train_pipeline, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - data_prefix=dict( - img_path='val2017/', - seg_map_path='stuffthingmaps_semseg/val2017/'), - pipeline=test_pipeline)) + data_prefix=dict(img_path="val2017/", seg_map_path="stuffthingmaps_semseg/val2017/"), + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader -val_evaluator = dict(type='SemSegMetric', iou_metrics=['mIoU']) +val_evaluator = dict(type="SemSegMetric", iou_metrics=["mIoU"]) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/deepfashion.py b/mmpose/configs/mmdet/_base_/datasets/deepfashion.py index a93dc7152f7a2e28ab726c79f9398a1034b7b4a1..232737fcd68968f9642d2bca0e3706bd3df08463 100644 --- a/mmpose/configs/mmdet/_base_/datasets/deepfashion.py +++ b/mmpose/configs/mmdet/_base_/datasets/deepfashion.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'DeepFashionDataset' -data_root = 'data/DeepFashion/In-shop/' +dataset_type = "DeepFashionDataset" +data_root = "data/DeepFashion/In-shop/" # Example to use different file client # 
Method 1: simply set the data root and let the file I/O module @@ -18,78 +18,82 @@ data_root = 'data/DeepFashion/In-shop/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='Resize', scale=(750, 1101), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="Resize", scale=(750, 1101), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(750, 1101), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(750, 1101), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='Anno/segmentation/DeepFashion_segmentation_train.json', - data_prefix=dict(img='Img/'), + ann_file="Anno/segmentation/DeepFashion_segmentation_train.json", + data_prefix=dict(img="Img/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='Anno/segmentation/DeepFashion_segmentation_query.json', - data_prefix=dict(img='Img/'), + ann_file="Anno/segmentation/DeepFashion_segmentation_query.json", + data_prefix=dict(img="Img/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='Anno/segmentation/DeepFashion_segmentation_gallery.json', - data_prefix=dict(img='Img/'), + ann_file="Anno/segmentation/DeepFashion_segmentation_gallery.json", + data_prefix=dict(img="Img/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + - 'Anno/segmentation/DeepFashion_segmentation_query.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "Anno/segmentation/DeepFashion_segmentation_query.json", + metric=["bbox", "segm"], format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = dict( - 
type='CocoMetric', - ann_file=data_root + - 'Anno/segmentation/DeepFashion_segmentation_gallery.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "Anno/segmentation/DeepFashion_segmentation_gallery.json", + metric=["bbox", "segm"], format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) diff --git a/mmpose/configs/mmdet/_base_/datasets/dsdl.py b/mmpose/configs/mmdet/_base_/datasets/dsdl.py index 1f19e5e498b18a404f3c4e6419316b5f9981e811..209afda7882fd9c9df218532cf48c965354df075 100644 --- a/mmpose/configs/mmdet/_base_/datasets/dsdl.py +++ b/mmpose/configs/mmdet/_base_/datasets/dsdl.py @@ -1,7 +1,7 @@ -dataset_type = 'DSDLDetDataset' -data_root = 'path to dataset folder' -train_ann = 'path to train yaml file' -val_ann = 'path to val yaml file' +dataset_type = "DSDLDetDataset" +data_root = "path to dataset folder" +train_ann = "path to train yaml file" +val_ann = "path to val yaml file" backend_args = None # backend_args = dict( @@ -12,51 +12,46 @@ backend_args = None # })) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'instances')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "instances")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, ann_file=train_ann, filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), - dataset=dict( - type=dataset_type, - data_root=data_root, - ann_file=val_ann, - test_mode=True, - pipeline=test_pipeline)) + sampler=dict(type="DefaultSampler", shuffle=False), + dataset=dict(type=dataset_type, data_root=data_root, ann_file=val_ann, test_mode=True, pipeline=test_pipeline), +) test_dataloader = val_dataloader -val_evaluator = dict(type='CocoMetric', metric='bbox') +val_evaluator = dict(type="CocoMetric", metric="bbox") # val_evaluator = dict(type='VOCMetric', metric='mAP', eval_mode='11points') test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/isaid_instance.py b/mmpose/configs/mmdet/_base_/datasets/isaid_instance.py index 
09ddcab02bdd52374d5093d446abb0e34751f7a3..eb7fe7ed0ab325d480c29dd330b502bcbbfbd969 100644 --- a/mmpose/configs/mmdet/_base_/datasets/isaid_instance.py +++ b/mmpose/configs/mmdet/_base_/datasets/isaid_instance.py @@ -1,59 +1,61 @@ # dataset settings -dataset_type = 'iSAIDDataset' -data_root = 'data/iSAID/' +dataset_type = "iSAIDDataset" +data_root = "data/iSAID/" backend_args = None # Please see `projects/iSAID/README.md` for data preparation train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='Resize', scale=(800, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="Resize", scale=(800, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(800, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(800, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='train/instancesonly_filtered_train.json', - data_prefix=dict(img='train/images/'), + ann_file="train/instancesonly_filtered_train.json", + data_prefix=dict(img="train/images/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='val/instancesonly_filtered_val.json', - data_prefix=dict(img='val/images/'), + ann_file="val/instancesonly_filtered_val.json", + data_prefix=dict(img="val/images/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'val/instancesonly_filtered_val.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "val/instancesonly_filtered_val.json", + metric=["bbox", "segm"], format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/lvis_v0.5_instance.py b/mmpose/configs/mmdet/_base_/datasets/lvis_v0.5_instance.py index d0ca44efb6d31aae5f6426a1c8b89d2e9be2104f..da938dc7f30a0c6be38e54be87ae5f07e6aaa49f 100644 --- a/mmpose/configs/mmdet/_base_/datasets/lvis_v0.5_instance.py +++ b/mmpose/configs/mmdet/_base_/datasets/lvis_v0.5_instance.py @@ -1,6 +1,6 @@ # 
dataset settings -dataset_type = 'LVISV05Dataset' -data_root = 'data/lvis_v0.5/' +dataset_type = "LVISV05Dataset" +data_root = "data/lvis_v0.5/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,62 +18,58 @@ data_root = 'data/lvis_v0.5/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='ClassBalancedDataset', + type="ClassBalancedDataset", oversample_thr=1e-3, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/lvis_v0.5_train.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/lvis_v0.5_train.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/lvis_v0.5_val.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/lvis_v0.5_val.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='LVISMetric', - ann_file=data_root + 'annotations/lvis_v0.5_val.json', - metric=['bbox', 'segm'], - backend_args=backend_args) + type="LVISMetric", ann_file=data_root + "annotations/lvis_v0.5_val.json", metric=["bbox", "segm"], backend_args=backend_args +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/lvis_v1_instance.py b/mmpose/configs/mmdet/_base_/datasets/lvis_v1_instance.py index 0413f370a2b635362a60c20881769064bac9a603..cd188f93b1447390d85519113cef5128a4aa8c28 100644 --- a/mmpose/configs/mmdet/_base_/datasets/lvis_v1_instance.py +++ 
b/mmpose/configs/mmdet/_base_/datasets/lvis_v1_instance.py @@ -1,22 +1,15 @@ # dataset settings -_base_ = 'lvis_v0.5_instance.py' -dataset_type = 'LVISV1Dataset' -data_root = 'data/lvis_v1/' +_base_ = "lvis_v0.5_instance.py" +dataset_type = "LVISV1Dataset" +data_root = "data/lvis_v1/" train_dataloader = dict( - dataset=dict( - dataset=dict( - type=dataset_type, - data_root=data_root, - ann_file='annotations/lvis_v1_train.json', - data_prefix=dict(img='')))) + dataset=dict(dataset=dict(type=dataset_type, data_root=data_root, ann_file="annotations/lvis_v1_train.json", data_prefix=dict(img=""))) +) val_dataloader = dict( - dataset=dict( - type=dataset_type, - data_root=data_root, - ann_file='annotations/lvis_v1_val.json', - data_prefix=dict(img=''))) + dataset=dict(type=dataset_type, data_root=data_root, ann_file="annotations/lvis_v1_val.json", data_prefix=dict(img="")) +) test_dataloader = val_dataloader -val_evaluator = dict(ann_file=data_root + 'annotations/lvis_v1_val.json') +val_evaluator = dict(ann_file=data_root + "annotations/lvis_v1_val.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/mot_challenge.py b/mmpose/configs/mmdet/_base_/datasets/mot_challenge.py index ce2828ef70a34c123792d252bf992f423049d065..0e9708a7253e4da1496eddc7e75bfa51f9c6a602 100644 --- a/mmpose/configs/mmdet/_base_/datasets/mot_challenge.py +++ b/mmpose/configs/mmdet/_base_/datasets/mot_challenge.py @@ -1,56 +1,48 @@ # dataset settings -dataset_type = 'MOTChallengeDataset' -data_root = 'data/MOT17/' +dataset_type = "MOTChallengeDataset" +data_root = "data/MOT17/" img_scale = (1088, 1088) backend_args = None # data pipeline train_pipeline = [ + dict(type="UniformRefFrameSample", num_ref_imgs=1, frame_range=10, filter_key_img=True), dict( - type='UniformRefFrameSample', - num_ref_imgs=1, - frame_range=10, - filter_key_img=True), - dict( - type='TransformBroadcaster', + type="TransformBroadcaster", share_random_params=True, transforms=[ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadTrackAnnotations'), - dict( - type='RandomResize', - scale=img_scale, - ratio_range=(0.8, 1.2), - keep_ratio=True, - clip_object_border=False), - dict(type='PhotoMetricDistortion') - ]), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadTrackAnnotations"), + dict(type="RandomResize", scale=img_scale, ratio_range=(0.8, 1.2), keep_ratio=True, clip_object_border=False), + dict(type="PhotoMetricDistortion"), + ], + ), dict( - type='TransformBroadcaster', + type="TransformBroadcaster", # different cropped positions for different frames share_random_params=False, - transforms=[ - dict( - type='RandomCrop', crop_size=img_scale, bbox_clip_border=False) - ]), + transforms=[dict(type="RandomCrop", crop_size=img_scale, bbox_clip_border=False)], + ), dict( - type='TransformBroadcaster', + type="TransformBroadcaster", share_random_params=True, transforms=[ - dict(type='RandomFlip', prob=0.5), - ]), - dict(type='PackTrackInputs') + dict(type="RandomFlip", prob=0.5), + ], + ), + dict(type="PackTrackInputs"), ] test_pipeline = [ dict( - type='TransformBroadcaster', + type="TransformBroadcaster", transforms=[ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=img_scale, keep_ratio=True), - dict(type='LoadTrackAnnotations') - ]), - dict(type='PackTrackInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=img_scale, keep_ratio=True), + dict(type="LoadTrackAnnotations"), + ], + ), + 
dict(type="PackTrackInputs"), ] # dataloader @@ -58,15 +50,17 @@ train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='TrackImgSampler'), # image-based sampling + sampler=dict(type="TrackImgSampler"), # image-based sampling dataset=dict( type=dataset_type, data_root=data_root, visibility_thr=-1, - ann_file='annotations/half-train_cocoformat.json', - data_prefix=dict(img_path='train'), - metainfo=dict(classes=('pedestrian', )), - pipeline=train_pipeline)) + ann_file="annotations/half-train_cocoformat.json", + data_prefix=dict(img_path="train"), + metainfo=dict(classes=("pedestrian",)), + pipeline=train_pipeline, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, @@ -74,17 +68,18 @@ val_dataloader = dict( # Now we support two ways to test, image_based and video_based # if you want to use video_based sampling, you can use as follows # sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), - sampler=dict(type='TrackImgSampler'), # image-based sampling + sampler=dict(type="TrackImgSampler"), # image-based sampling dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/half-val_cocoformat.json', - data_prefix=dict(img_path='train'), + ann_file="annotations/half-val_cocoformat.json", + data_prefix=dict(img_path="train"), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader # evaluator -val_evaluator = dict( - type='MOTChallengeMetric', metric=['HOTA', 'CLEAR', 'Identity']) +val_evaluator = dict(type="MOTChallengeMetric", metric=["HOTA", "CLEAR", "Identity"]) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/mot_challenge_det.py b/mmpose/configs/mmdet/_base_/datasets/mot_challenge_det.py index a988572c3837eb2a8a6bf7b9eca06f3d82abdfda..98c7457cd1560e67026492e556220b0e9d6c18b2 100644 --- a/mmpose/configs/mmdet/_base_/datasets/mot_challenge_det.py +++ b/mmpose/configs/mmdet/_base_/datasets/mot_challenge_det.py @@ -1,66 +1,58 @@ # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/MOT17/' +dataset_type = "CocoDataset" +data_root = "data/MOT17/" backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args, to_float32=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=(1088, 1088), - ratio_range=(0.8, 1.2), - keep_ratio=True, - clip_object_border=False), - dict(type='PhotoMetricDistortion'), - dict(type='RandomCrop', crop_size=(1088, 1088), bbox_clip_border=False), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args, to_float32=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=(1088, 1088), ratio_range=(0.8, 1.2), keep_ratio=True, clip_object_border=False), + dict(type="PhotoMetricDistortion"), + dict(type="RandomCrop", crop_size=(1088, 1088), bbox_clip_border=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1088, 1088), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1088, 1088), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", 
meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/half-train_cocoformat.json', - data_prefix=dict(img='train/'), - metainfo=dict(classes=('pedestrian', )), + ann_file="annotations/half-train_cocoformat.json", + data_prefix=dict(img="train/"), + metainfo=dict(classes=("pedestrian",)), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/half-val_cocoformat.json', - data_prefix=dict(img='train/'), - metainfo=dict(classes=('pedestrian', )), + ann_file="annotations/half-val_cocoformat.json", + data_prefix=dict(img="train/"), + metainfo=dict(classes=("pedestrian",)), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/half-val_cocoformat.json', - metric='bbox', - format_only=False) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/half-val_cocoformat.json", metric="bbox", format_only=False) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/mot_challenge_reid.py b/mmpose/configs/mmdet/_base_/datasets/mot_challenge_reid.py index 57a95b531f3591e60daaabc5eea6f11c7424215b..eccbb19f814fcd726812ed9dea1d94530d68eac8 100644 --- a/mmpose/configs/mmdet/_base_/datasets/mot_challenge_reid.py +++ b/mmpose/configs/mmdet/_base_/datasets/mot_challenge_reid.py @@ -1,31 +1,25 @@ # dataset settings -dataset_type = 'ReIDDataset' -data_root = 'data/MOT17/' +dataset_type = "ReIDDataset" +data_root = "data/MOT17/" backend_args = None # data pipeline train_pipeline = [ dict( - type='TransformBroadcaster', + type="TransformBroadcaster", share_random_params=False, transforms=[ - dict( - type='LoadImageFromFile', - backend_args=backend_args, - to_float32=True), - dict( - type='Resize', - scale=(128, 256), - keep_ratio=False, - clip_object_border=False), - dict(type='RandomFlip', prob=0.5, direction='horizontal'), - ]), - dict(type='PackReIDInputs', meta_keys=('flip', 'flip_direction')) + dict(type="LoadImageFromFile", backend_args=backend_args, to_float32=True), + dict(type="Resize", scale=(128, 256), keep_ratio=False, clip_object_border=False), + dict(type="RandomFlip", prob=0.5, direction="horizontal"), + ], + ), + dict(type="PackReIDInputs", meta_keys=("flip", "flip_direction")), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args, to_float32=True), - dict(type='Resize', scale=(128, 256), keep_ratio=False), - dict(type='PackReIDInputs') + dict(type="LoadImageFromFile", backend_args=backend_args, to_float32=True), + dict(type="Resize", scale=(128, 256), keep_ratio=False), + dict(type="PackReIDInputs"), ] # dataloader @@ -33,29 +27,33 @@ train_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + 
sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, triplet_sampler=dict(num_ids=8, ins_per_id=4), - data_prefix=dict(img_path='reid/imgs'), - ann_file='reid/meta/train_80.txt', - pipeline=train_pipeline)) + data_prefix=dict(img_path="reid/imgs"), + ann_file="reid/meta/train_80.txt", + pipeline=train_pipeline, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, triplet_sampler=None, - data_prefix=dict(img_path='reid/imgs'), - ann_file='reid/meta/val_20.txt', - pipeline=test_pipeline)) + data_prefix=dict(img_path="reid/imgs"), + ann_file="reid/meta/val_20.txt", + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader # evaluator -val_evaluator = dict(type='ReIDMetrics', metric=['mAP', 'CMC']) +val_evaluator = dict(type="ReIDMetrics", metric=["mAP", "CMC"]) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/objects365v1_detection.py b/mmpose/configs/mmdet/_base_/datasets/objects365v1_detection.py index ee398698608543e13188452a816283e9a2563390..74f092dbc33f0b9230357dbfa9afa5cc663895f9 100644 --- a/mmpose/configs/mmdet/_base_/datasets/objects365v1_detection.py +++ b/mmpose/configs/mmdet/_base_/datasets/objects365v1_detection.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'Objects365V1Dataset' -data_root = 'data/Objects365/Obj365_v1/' +dataset_type = "Objects365V1Dataset" +data_root = "data/Objects365/Obj365_v1/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,57 +18,59 @@ data_root = 'data/Objects365/Obj365_v1/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/objects365_train.json', - data_prefix=dict(img='train/'), + ann_file="annotations/objects365_train.json", + data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - 
backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/objects365_val.json', - data_prefix=dict(img='val/'), + ann_file="annotations/objects365_val.json", + data_prefix=dict(img="val/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/objects365_val.json', - metric='bbox', + type="CocoMetric", + ann_file=data_root + "annotations/objects365_val.json", + metric="bbox", sort_categories=True, format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/objects365v2_detection.py b/mmpose/configs/mmdet/_base_/datasets/objects365v2_detection.py index b25a7ba901befa8d61e3cdae8a7c68fb8a9c5aef..3b050fac787a42b9991cacb57e6eb75af60c44e4 100644 --- a/mmpose/configs/mmdet/_base_/datasets/objects365v2_detection.py +++ b/mmpose/configs/mmdet/_base_/datasets/objects365v2_detection.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'Objects365V2Dataset' -data_root = 'data/Objects365/Obj365_v2/' +dataset_type = "Objects365V2Dataset" +data_root = "data/Objects365/Obj365_v2/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,56 +18,58 @@ data_root = 'data/Objects365/Obj365_v2/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/zhiyuan_objv2_train.json', - data_prefix=dict(img='train/'), + ann_file="annotations/zhiyuan_objv2_train.json", + data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, 
persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/zhiyuan_objv2_val.json', - data_prefix=dict(img='val/'), + ann_file="annotations/zhiyuan_objv2_val.json", + data_prefix=dict(img="val/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/zhiyuan_objv2_val.json', - metric='bbox', + type="CocoMetric", + ann_file=data_root + "annotations/zhiyuan_objv2_val.json", + metric="bbox", format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/openimages_detection.py b/mmpose/configs/mmdet/_base_/datasets/openimages_detection.py index 129661b405c70d3e2d0d2c4741e3a59333dd960c..0c3d2238c6431705b6bd1b907c3cfd58daca41e5 100644 --- a/mmpose/configs/mmdet/_base_/datasets/openimages_detection.py +++ b/mmpose/configs/mmdet/_base_/datasets/openimages_detection.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'OpenImagesDataset' -data_root = 'data/OpenImages/' +dataset_type = "OpenImagesDataset" +data_root = "data/OpenImages/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,64 +18,61 @@ data_root = 'data/OpenImages/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1024, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1024, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1024, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1024, 800), keep_ratio=True), # avoid bboxes being resized - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadAnnotations", with_bbox=True), # TODO: find a better way to collect image_level_labels dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'instances', 'image_level_labels')) + type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "instances", "image_level_labels") + ), ] train_dataloader = dict( batch_size=2, num_workers=0, # num_workers > 0 may cause out-of-memory errors persistent_workers=False, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/oidv6-train-annotations-bbox.csv', - data_prefix=dict(img='OpenImages/train/'), - label_file='annotations/class-descriptions-boxable.csv', - hierarchy_file='annotations/bbox_labels_600_hierarchy.json', - meta_file='annotations/train-image-metas.pkl', + ann_file="annotations/oidv6-train-annotations-bbox.csv", + data_prefix=dict(img="OpenImages/train/"), +
label_file="annotations/class-descriptions-boxable.csv", + hierarchy_file="annotations/bbox_labels_600_hierarchy.json", + meta_file="annotations/train-image-metas.pkl", pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=0, persistent_workers=False, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/validation-annotations-bbox.csv', - data_prefix=dict(img='OpenImages/validation/'), - label_file='annotations/class-descriptions-boxable.csv', - hierarchy_file='annotations/bbox_labels_600_hierarchy.json', - meta_file='annotations/validation-image-metas.pkl', - image_level_ann_file='annotations/validation-' - 'annotations-human-imagelabels-boxable.csv', + ann_file="annotations/validation-annotations-bbox.csv", + data_prefix=dict(img="OpenImages/validation/"), + label_file="annotations/class-descriptions-boxable.csv", + hierarchy_file="annotations/bbox_labels_600_hierarchy.json", + meta_file="annotations/validation-image-metas.pkl", + image_level_ann_file="annotations/validation-" "annotations-human-imagelabels-boxable.csv", pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='OpenImagesMetric', - iou_thrs=0.5, - ioa_thrs=0.5, - use_group_of=True, - get_supercategory=True) +val_evaluator = dict(type="OpenImagesMetric", iou_thrs=0.5, ioa_thrs=0.5, use_group_of=True, get_supercategory=True) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/refcoco+.py b/mmpose/configs/mmdet/_base_/datasets/refcoco+.py index ae0278ddf6c30fda6e4fb42aed1cb1b9a55109ec..ee822300080e8c0452fb67f2492c44c4ca92fe7c 100644 --- a/mmpose/configs/mmdet/_base_/datasets/refcoco+.py +++ b/mmpose/configs/mmdet/_base_/datasets/refcoco+.py @@ -1,22 +1,14 @@ # dataset settings -dataset_type = 'RefCocoDataset' -data_root = 'data/coco/' +dataset_type = "RefCocoDataset" +data_root = "data/coco/" backend_args = None test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict( - type='LoadAnnotations', - with_mask=True, - with_bbox=False, - with_seg=False, - with_label=False), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'gt_masks', 'text')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_mask=True, with_bbox=False, with_seg=False, with_label=False), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "gt_masks", "text")), ] val_dataloader = dict( @@ -24,32 +16,36 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - data_prefix=dict(img_path='train2014/'), - ann_file='refcoco+/instances.json', - split_file='refcoco+/refs(unc).p', - split='val', - text_mode='select_first', - pipeline=test_pipeline)) + data_prefix=dict(img_path="train2014/"), + ann_file="refcoco+/instances.json", + split_file="refcoco+/refs(unc).p", + split="val", + text_mode="select_first", + 
pipeline=test_pipeline, + ), +) test_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - data_prefix=dict(img_path='train2014/'), - ann_file='refcoco+/instances.json', - split_file='refcoco+/refs(unc).p', - split='testA', # or 'testB' - text_mode='select_first', - pipeline=test_pipeline)) + data_prefix=dict(img_path="train2014/"), + ann_file="refcoco+/instances.json", + split_file="refcoco+/refs(unc).p", + split="testA", # or 'testB' + text_mode="select_first", + pipeline=test_pipeline, + ), +) -val_evaluator = dict(type='RefSegMetric', metric=['cIoU', 'mIoU']) +val_evaluator = dict(type="RefSegMetric", metric=["cIoU", "mIoU"]) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/refcoco.py b/mmpose/configs/mmdet/_base_/datasets/refcoco.py index 7b6caefa9a4bbfabdb49689588821f99d882a80f..f242bd8beda8ae44d308d6b985102cae086b36b2 100644 --- a/mmpose/configs/mmdet/_base_/datasets/refcoco.py +++ b/mmpose/configs/mmdet/_base_/datasets/refcoco.py @@ -1,22 +1,14 @@ # dataset settings -dataset_type = 'RefCocoDataset' -data_root = 'data/coco/' +dataset_type = "RefCocoDataset" +data_root = "data/coco/" backend_args = None test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict( - type='LoadAnnotations', - with_mask=True, - with_bbox=False, - with_seg=False, - with_label=False), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'gt_masks', 'text')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_mask=True, with_bbox=False, with_seg=False, with_label=False), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "gt_masks", "text")), ] val_dataloader = dict( @@ -24,32 +16,36 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - data_prefix=dict(img_path='train2014/'), - ann_file='refcoco/instances.json', - split_file='refcoco/refs(unc).p', - split='val', - text_mode='select_first', - pipeline=test_pipeline)) + data_prefix=dict(img_path="train2014/"), + ann_file="refcoco/instances.json", + split_file="refcoco/refs(unc).p", + split="val", + text_mode="select_first", + pipeline=test_pipeline, + ), +) test_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - data_prefix=dict(img_path='train2014/'), - ann_file='refcoco/instances.json', - split_file='refcoco/refs(unc).p', - split='testA', # or 'testB' - text_mode='select_first', - pipeline=test_pipeline)) + data_prefix=dict(img_path="train2014/"), + ann_file="refcoco/instances.json", + split_file="refcoco/refs(unc).p", + split="testA", # or 'testB' + text_mode="select_first", + pipeline=test_pipeline, + ), +) -val_evaluator = dict(type='RefSegMetric', metric=['cIoU', 'mIoU']) +val_evaluator = dict(type="RefSegMetric", metric=["cIoU", "mIoU"]) 
test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/refcocog.py b/mmpose/configs/mmdet/_base_/datasets/refcocog.py index 19dbeef1cde79fcb2aa80bb9936a60cc30089963..bb856b5e23a983937ed691c9124136e9baec2d60 100644 --- a/mmpose/configs/mmdet/_base_/datasets/refcocog.py +++ b/mmpose/configs/mmdet/_base_/datasets/refcocog.py @@ -1,22 +1,14 @@ # dataset settings -dataset_type = 'RefCocoDataset' -data_root = 'data/coco/' +dataset_type = "RefCocoDataset" +data_root = "data/coco/" backend_args = None test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict( - type='LoadAnnotations', - with_mask=True, - with_bbox=False, - with_seg=False, - with_label=False), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'gt_masks', 'text')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_mask=True, with_bbox=False, with_seg=False, with_label=False), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "gt_masks", "text")), ] val_dataloader = dict( @@ -24,32 +16,36 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - data_prefix=dict(img_path='train2014/'), - ann_file='refcocog/instances.json', - split_file='refcocog/refs(umd).p', - split='val', - text_mode='select_first', - pipeline=test_pipeline)) + data_prefix=dict(img_path="train2014/"), + ann_file="refcocog/instances.json", + split_file="refcocog/refs(umd).p", + split="val", + text_mode="select_first", + pipeline=test_pipeline, + ), +) test_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - data_prefix=dict(img_path='train2014/'), - ann_file='refcocog/instances.json', - split_file='refcocog/refs(umd).p', - split='test', - text_mode='select_first', - pipeline=test_pipeline)) + data_prefix=dict(img_path="train2014/"), + ann_file="refcocog/instances.json", + split_file="refcocog/refs(umd).p", + split="test", + text_mode="select_first", + pipeline=test_pipeline, + ), +) -val_evaluator = dict(type='RefSegMetric', metric=['cIoU', 'mIoU']) +val_evaluator = dict(type="RefSegMetric", metric=["cIoU", "mIoU"]) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/semi_coco_detection.py b/mmpose/configs/mmdet/_base_/datasets/semi_coco_detection.py index 694f25f841e06dbb59a699dfe13c18e34dbdce9f..37d9e8804aa9bdc3dd8ee7161889fc00c396105b 100644 --- a/mmpose/configs/mmdet/_base_/datasets/semi_coco_detection.py +++ b/mmpose/configs/mmdet/_base_/datasets/semi_coco_detection.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,94 +18,87 @@ data_root = 'data/coco/' backend_args = None color_space = [ - [dict(type='ColorTransform')], - [dict(type='AutoContrast')], - [dict(type='Equalize')], - [dict(type='Sharpness')], - 
[dict(type='Posterize')], - [dict(type='Solarize')], - [dict(type='Color')], - [dict(type='Contrast')], - [dict(type='Brightness')], + [dict(type="ColorTransform")], + [dict(type="AutoContrast")], + [dict(type="Equalize")], + [dict(type="Sharpness")], + [dict(type="Posterize")], + [dict(type="Solarize")], + [dict(type="Color")], + [dict(type="Contrast")], + [dict(type="Brightness")], ] geometric = [ - [dict(type='Rotate')], - [dict(type='ShearX')], - [dict(type='ShearY')], - [dict(type='TranslateX')], - [dict(type='TranslateY')], + [dict(type="Rotate")], + [dict(type="ShearX")], + [dict(type="ShearY")], + [dict(type="TranslateX")], + [dict(type="TranslateY")], ] scale = [(1333, 400), (1333, 1200)] -branch_field = ['sup', 'unsup_teacher', 'unsup_student'] +branch_field = ["sup", "unsup_teacher", "unsup_student"] # pipeline used to augment labeled data, # which will be sent to student model for supervised training. sup_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomResize', scale=scale, keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='RandAugment', aug_space=color_space, aug_num=1), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict( - type='MultiBranch', - branch_field=branch_field, - sup=dict(type='PackDetInputs')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=scale, keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="RandAugment", aug_space=color_space, aug_num=1), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="MultiBranch", branch_field=branch_field, sup=dict(type="PackDetInputs")), ] # pipeline used to augment unlabeled data weakly, # which will be sent to teacher model for predicting pseudo instances. weak_pipeline = [ - dict(type='RandomResize', scale=scale, keep_ratio=True), - dict(type='RandomFlip', prob=0.5), + dict(type="RandomResize", scale=scale, keep_ratio=True), + dict(type="RandomFlip", prob=0.5), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', - 'homography_matrix')), + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "homography_matrix"), + ), ] # pipeline used to augment unlabeled data strongly, # which will be sent to student model for unsupervised training. 
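# (Note: the weak/strong pair follows the usual teacher-student recipe: the
# teacher predicts pseudo boxes on the weakly augmented view, and those boxes
# supervise the student on the strongly augmented view. The "homography_matrix"
# meta key packed above records each view's geometric transform, so pseudo
# boxes can be warped from the teacher's view into the student's view.)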
strong_pipeline = [ - dict(type='RandomResize', scale=scale, keep_ratio=True), - dict(type='RandomFlip', prob=0.5), + dict(type="RandomResize", scale=scale, keep_ratio=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomOrder', + type="RandomOrder", transforms=[ - dict(type='RandAugment', aug_space=color_space, aug_num=1), - dict(type='RandAugment', aug_space=geometric, aug_num=1), - ]), - dict(type='RandomErasing', n_patches=(1, 5), ratio=(0, 0.2)), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandAugment", aug_space=color_space, aug_num=1), + dict(type="RandAugment", aug_space=geometric, aug_num=1), + ], + ), + dict(type="RandomErasing", n_patches=(1, 5), ratio=(0, 0.2)), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', - 'homography_matrix')), + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "homography_matrix"), + ), ] # pipeline used to augment unlabeled data into different views unsup_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadEmptyAnnotations'), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadEmptyAnnotations"), dict( - type='MultiBranch', + type="MultiBranch", branch_field=branch_field, unsup_teacher=weak_pipeline, unsup_student=strong_pipeline, - ) + ), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] batch_size = 5 @@ -126,53 +119,55 @@ num_workers = 5 labeled_dataset = dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=sup_pipeline, - backend_args=backend_args) + backend_args=backend_args, +) unlabeled_dataset = dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_unlabeled2017.json', - data_prefix=dict(img='unlabeled2017/'), + ann_file="annotations/instances_unlabeled2017.json", + data_prefix=dict(img="unlabeled2017/"), filter_cfg=dict(filter_empty_gt=False), pipeline=unsup_pipeline, - backend_args=backend_args) + backend_args=backend_args, +) train_dataloader = dict( batch_size=batch_size, num_workers=num_workers, persistent_workers=True, - sampler=dict( - type='GroupMultiSourceSampler', - batch_size=batch_size, - source_ratio=[1, 4]), - dataset=dict( - type='ConcatDataset', datasets=[labeled_dataset, unlabeled_dataset])) + sampler=dict(type="GroupMultiSourceSampler", batch_size=batch_size, source_ratio=[1, 4]), + dataset=dict(type="ConcatDataset", datasets=[labeled_dataset, unlabeled_dataset]), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - 
ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox', + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric="bbox", format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/v3det.py b/mmpose/configs/mmdet/_base_/datasets/v3det.py index 38ccbf864b6248192dfbf4abaf4858b5f93d45e8..be43e16f8b0b0f5a9e7a0e72c8d5c8ce435b083d 100644 --- a/mmpose/configs/mmdet/_base_/datasets/v3det.py +++ b/mmpose/configs/mmdet/_base_/datasets/v3det.py @@ -1,69 +1,68 @@ # dataset settings -dataset_type = 'V3DetDataset' -data_root = 'data/V3Det/' +dataset_type = "V3DetDataset" +data_root = "data/V3Det/" backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='ClassBalancedDataset', + type="ClassBalancedDataset", oversample_thr=1e-3, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/v3det_2023_v1_train.json', - data_prefix=dict(img=''), + ann_file="annotations/v3det_2023_v1_train.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=True, min_size=4), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/v3det_2023_v1_val.json', - data_prefix=dict(img=''), + ann_file="annotations/v3det_2023_v1_val.json", + 
data_prefix=dict(img=""), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/v3det_2023_v1_val.json', - metric='bbox', + type="CocoMetric", + ann_file=data_root + "annotations/v3det_2023_v1_val.json", + metric="bbox", format_only=False, backend_args=backend_args, use_mp_eval=True, - proposal_nums=[300]) + proposal_nums=[300], +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/voc0712.py b/mmpose/configs/mmdet/_base_/datasets/voc0712.py index 47f5e6563b7f47dd6cfec02248d4c8decd32afe4..44a231eaaa777eb5dd49f2ff92a734168f5c002a 100644 --- a/mmpose/configs/mmdet/_base_/datasets/voc0712.py +++ b/mmpose/configs/mmdet/_base_/datasets/voc0712.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'VOCDataset' -data_root = 'data/VOCdevkit/' +dataset_type = "VOCDataset" +data_root = "data/VOCdevkit/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -18,75 +18,77 @@ data_root = 'data/VOCdevkit/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1000, 600), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1000, 600), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1000, 600), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1000, 600), keep_ratio=True), # avoid bboxes being resized - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=3, dataset=dict( - type='ConcatDataset', + type="ConcatDataset", # VOCDataset will add different `dataset_type` in dataset.metainfo, # which will get error if using ConcatDataset. Adding # `ignore_keys` can avoid this error. 
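# (Note: ConcatDataset checks that all sub-datasets share the same metainfo;
# keys listed in `ignore_keys` are skipped during that check, which is what
# allows mixing VOC2007 and VOC2012 here despite their differing
# `dataset_type` values.)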
- ignore_keys=['dataset_type'], + ignore_keys=["dataset_type"], datasets=[ dict( type=dataset_type, data_root=data_root, - ann_file='VOC2007/ImageSets/Main/trainval.txt', - data_prefix=dict(sub_data_root='VOC2007/'), - filter_cfg=dict( - filter_empty_gt=True, min_size=32, bbox_min_size=32), + ann_file="VOC2007/ImageSets/Main/trainval.txt", + data_prefix=dict(sub_data_root="VOC2007/"), + filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), pipeline=train_pipeline, - backend_args=backend_args), + backend_args=backend_args, + ), dict( type=dataset_type, data_root=data_root, - ann_file='VOC2012/ImageSets/Main/trainval.txt', - data_prefix=dict(sub_data_root='VOC2012/'), - filter_cfg=dict( - filter_empty_gt=True, min_size=32, bbox_min_size=32), + ann_file="VOC2012/ImageSets/Main/trainval.txt", + data_prefix=dict(sub_data_root="VOC2012/"), + filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), pipeline=train_pipeline, - backend_args=backend_args) - ]))) + backend_args=backend_args, + ), + ], + ), + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='VOC2007/ImageSets/Main/test.txt', - data_prefix=dict(sub_data_root='VOC2007/'), + ann_file="VOC2007/ImageSets/Main/test.txt", + data_prefix=dict(sub_data_root="VOC2007/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader # Pascal VOC2007 uses `11points` as default evaluate mode, while PASCAL # VOC2012 defaults to use 'area'. -val_evaluator = dict(type='VOCMetric', metric='mAP', eval_mode='11points') +val_evaluator = dict(type="VOCMetric", metric="mAP", eval_mode="11points") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/wider_face.py b/mmpose/configs/mmdet/_base_/datasets/wider_face.py index 7042bc46e877ed899969730325143307e15adf64..a52d82fa281982877f259933a6274f24e7bbee90 100644 --- a/mmpose/configs/mmdet/_base_/datasets/wider_face.py +++ b/mmpose/configs/mmdet/_base_/datasets/wider_face.py @@ -1,6 +1,6 @@ # dataset settings -dataset_type = 'WIDERFaceDataset' -data_root = 'data/WIDERFace/' +dataset_type = "WIDERFaceDataset" +data_root = "data/WIDERFace/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module # automatically infer from prefix (not support LMDB and Memcache yet) @@ -19,20 +19,17 @@ backend_args = None img_scale = (640, 640) # VGA resolution train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=img_scale, keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=img_scale, keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=img_scale, keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=img_scale, 
keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( @@ -40,34 +37,39 @@ train_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='train.txt', - data_prefix=dict(img='WIDER_train'), + ann_file="train.txt", + data_prefix=dict(img="WIDER_train"), filter_cfg=dict(filter_empty_gt=True, bbox_min_size=17, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='val.txt', - data_prefix=dict(img='WIDER_val'), + ann_file="val.txt", + data_prefix=dict(img="WIDER_val"), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader val_evaluator = dict( # TODO: support WiderFace-Evaluation for easy, medium, hard cases - type='VOCMetric', - metric='mAP', - eval_mode='11points') + type="VOCMetric", + metric="mAP", + eval_mode="11points", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/_base_/datasets/youtube_vis.py b/mmpose/configs/mmdet/_base_/datasets/youtube_vis.py index ece07cc3879e512082e302c2e3f76108c57a0234..a7fcc14f42a2bb6462faa92f5e79847044873203 100644 --- a/mmpose/configs/mmdet/_base_/datasets/youtube_vis.py +++ b/mmpose/configs/mmdet/_base_/datasets/youtube_vis.py @@ -1,37 +1,35 @@ -dataset_type = 'YouTubeVISDataset' -data_root = 'data/youtube_vis_2019/' +dataset_type = "YouTubeVISDataset" +data_root = "data/youtube_vis_2019/" dataset_version = data_root[-5:-1] # 2019 or 2021 backend_args = None # dataset settings train_pipeline = [ + dict(type="UniformRefFrameSample", num_ref_imgs=1, frame_range=100, filter_key_img=True), dict( - type='UniformRefFrameSample', - num_ref_imgs=1, - frame_range=100, - filter_key_img=True), - dict( - type='TransformBroadcaster', + type="TransformBroadcaster", share_random_params=True, transforms=[ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadTrackAnnotations', with_mask=True), - dict(type='Resize', scale=(640, 360), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - ]), - dict(type='PackTrackInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadTrackAnnotations", with_mask=True), + dict(type="Resize", scale=(640, 360), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + ], + ), + dict(type="PackTrackInputs"), ] test_pipeline = [ dict( - type='TransformBroadcaster', + type="TransformBroadcaster", transforms=[ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(640, 360), keep_ratio=True), - dict(type='LoadTrackAnnotations', with_mask=True), - ]), - dict(type='PackTrackInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(640, 360), keep_ratio=True), + dict(type="LoadTrackAnnotations", with_mask=True), + ], + ), + dict(type="PackTrackInputs"), ] # dataloader @@ -40,27 +38,31 @@ train_dataloader = dict( num_workers=2, 
persistent_workers=True, # sampler=dict(type='TrackImgSampler'), # image-based sampling - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='TrackAspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="TrackAspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2019_train.json', - data_prefix=dict(img_path='train/JPEGImages'), - pipeline=train_pipeline)) + ann_file="annotations/youtube_vis_2019_train.json", + data_prefix=dict(img_path="train/JPEGImages"), + pipeline=train_pipeline, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2019_valid.json', - data_prefix=dict(img_path='valid/JPEGImages'), + ann_file="annotations/youtube_vis_2019_valid.json", + data_prefix=dict(img_path="valid/JPEGImages"), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/_base_/default_runtime.py b/mmpose/configs/mmdet/_base_/default_runtime.py index 870e5614c86d7e1bbdad13d77a0db03a46ce717a..8d28f3fa4f12f6e4078b98a257c5ee56d2ba59c9 100644 --- a/mmpose/configs/mmdet/_base_/default_runtime.py +++ b/mmpose/configs/mmdet/_base_/default_runtime.py @@ -1,24 +1,24 @@ -default_scope = 'mmdet' +default_scope = "mmdet" default_hooks = dict( - timer=dict(type='IterTimerHook'), - logger=dict(type='LoggerHook', interval=50), - param_scheduler=dict(type='ParamSchedulerHook'), - checkpoint=dict(type='CheckpointHook', interval=1), - sampler_seed=dict(type='DistSamplerSeedHook'), - visualization=dict(type='DetVisualizationHook')) + timer=dict(type="IterTimerHook"), + logger=dict(type="LoggerHook", interval=50), + param_scheduler=dict(type="ParamSchedulerHook"), + checkpoint=dict(type="CheckpointHook", interval=1), + sampler_seed=dict(type="DistSamplerSeedHook"), + visualization=dict(type="DetVisualizationHook"), +) env_cfg = dict( cudnn_benchmark=False, - mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), - dist_cfg=dict(backend='nccl'), + mp_cfg=dict(mp_start_method="fork", opencv_num_threads=0), + dist_cfg=dict(backend="nccl"), ) -vis_backends = [dict(type='LocalVisBackend')] -visualizer = dict( - type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer') -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True) +vis_backends = [dict(type="LocalVisBackend")] +visualizer = dict(type="DetLocalVisualizer", vis_backends=vis_backends, name="visualizer") +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=True) -log_level = 'INFO' +log_level = "INFO" load_from = None resume = False diff --git a/mmpose/configs/mmdet/_base_/models/cascade-mask-rcnn_r50_fpn.py b/mmpose/configs/mmdet/_base_/models/cascade-mask-rcnn_r50_fpn.py index c5167f7a02e66c80bd8ec8cc7572acb22eaadba5..7f3cbdeef90e10888418a1cc43cbb3d3d2b54436 100644 --- a/mmpose/configs/mmdet/_base_/models/cascade-mask-rcnn_r50_fpn.py +++ b/mmpose/configs/mmdet/_base_/models/cascade-mask-rcnn_r50_fpn.py @@ -1,203 +1,139 @@ # model settings model = dict( - type='CascadeRCNN', + type="CascadeRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', + 
type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=True, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0), + ), roi_head=dict( - type='CascadeRoIHead', + type="CascadeRoIHead", num_stages=3, stage_loss_weights=[1, 0.5, 0.25], bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=[ dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.05, 0.05, 0.1, 0.1]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, 
fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.033, 0.033, 0.067, 0.067]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), ], mask_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), mask_head=dict( - type='FCNMaskHead', + type="FCNMaskHead", num_convs=4, in_channels=256, conv_out_channels=256, num_classes=80, - loss_mask=dict( - type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), + loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0), + ), + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=[ dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, - debug=False), + debug=False, + ), dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.6, - neg_iou_thr=0.6, - min_pos_iou=0.6, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, - debug=False), + debug=False, + ), dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.7, - min_pos_iou=0.7, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.7, 
neg_iou_thr=0.7, min_pos_iou=0.7, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, - debug=False) - ]), + debug=False, + ), + ], + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100, - mask_thr_binary=0.5))) + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5), + ), +) diff --git a/mmpose/configs/mmdet/_base_/models/cascade-rcnn_r50_fpn.py b/mmpose/configs/mmdet/_base_/models/cascade-rcnn_r50_fpn.py index 50c57f01ca3a6ea827f71801b0c233af268914f9..7901fbea7f163de97b29d221e721a1c749fedfbb 100644 --- a/mmpose/configs/mmdet/_base_/models/cascade-rcnn_r50_fpn.py +++ b/mmpose/configs/mmdet/_base_/models/cascade-rcnn_r50_fpn.py @@ -1,185 +1,117 @@ # model settings model = dict( - type='CascadeRCNN', + type="CascadeRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0), + ), roi_head=dict( - type='CascadeRoIHead', + type="CascadeRoIHead", num_stages=3, stage_loss_weights=[1, 0.5, 0.25], bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=[ dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - 
type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.05, 0.05, 0.1, 0.1]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.033, 0.033, 0.067, 0.067]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) - ]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), + ], + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=[ dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False), + debug=False, + ), dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.6, - neg_iou_thr=0.6, - min_pos_iou=0.6, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - 
type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False), + debug=False, + ), dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.7, - min_pos_iou=0.7, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.7, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False) - ]), + debug=False, + ), + ], + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100))) + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), + ), +) diff --git a/mmpose/configs/mmdet/_base_/models/fast-rcnn_r50_fpn.py b/mmpose/configs/mmdet/_base_/models/fast-rcnn_r50_fpn.py index 2bd45e9266b01df302b78e50258fa1572144cb21..d3463394a0fe5fb54c43c391a795966f0d399652 100644 --- a/mmpose/configs/mmdet/_base_/models/fast-rcnn_r50_fpn.py +++ b/mmpose/configs/mmdet/_base_/models/fast-rcnn_r50_fpn.py @@ -1,68 +1,51 @@ # model settings model = dict( - type='FastRCNN', + type="FastRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, 
loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), + ), # model training and testing settings train_cfg=dict( rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False)), - test_cfg=dict( - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100))) + debug=False, + ) + ), + test_cfg=dict(rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100)), +) diff --git a/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50-caffe-c4.py b/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50-caffe-c4.py index 15d2db72e48790505c2a1e4e7d184c1803f7ab31..1f7e40074ac2b97159350d0821db551fa2636765 100644 --- a/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50-caffe-c4.py +++ b/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50-caffe-c4.py @@ -1,123 +1,86 @@ # model settings -norm_cfg = dict(type='BN', requires_grad=False) +norm_cfg = dict(type="BN", requires_grad=False) model = dict( - type='FasterRCNN', + type="FasterRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=3, strides=(1, 2, 2), dilations=(1, 1, 1), - out_indices=(2, ), + out_indices=(2,), frozen_stages=1, norm_cfg=norm_cfg, norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=1024, feat_channels=1024, - anchor_generator=dict( - type='AnchorGenerator', - scales=[2, 4, 8, 16, 32], - ratios=[0.5, 1.0, 2.0], - strides=[16]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[2, 4, 8, 16, 32], ratios=[0.5, 1.0, 2.0], strides=[16]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", shared_head=dict( - type='ResLayer', + type="ResLayer", depth=50, stage=3, stride=2, dilation=1, - style='caffe', + style="caffe", norm_cfg=norm_cfg, norm_eval=True, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), + init_cfg=dict(type="Pretrained", 
checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0), out_channels=1024, - featmap_strides=[16]), + featmap_strides=[16], + ), bbox_head=dict( - type='BBoxHead', + type="BBoxHead", with_avg_pool=True, roi_feat_size=7, in_channels=2048, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=12000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=12000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms_pre=6000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100))) + rpn=dict(nms_pre=6000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), + ), +) diff --git a/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50-caffe-dc5.py b/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50-caffe-dc5.py index 189915e3d9ce7239493da6465931f91e2d9d664f..9155f234e57b7cc664d95e905076baf31457de12 100644 --- a/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50-caffe-dc5.py +++ b/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50-caffe-dc5.py @@ -1,111 +1,75 @@ # model settings -norm_cfg = dict(type='BN', requires_grad=False) +norm_cfg = dict(type="BN", requires_grad=False) model = dict( - type='FasterRCNN', + type="FasterRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - 
pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, strides=(1, 2, 2, 1), dilations=(1, 1, 1, 2), - out_indices=(3, ), + out_indices=(3,), frozen_stages=1, norm_cfg=norm_cfg, norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=2048, feat_channels=2048, - anchor_generator=dict( - type='AnchorGenerator', - scales=[2, 4, 8, 16, 32], - ratios=[0.5, 1.0, 2.0], - strides=[16]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[2, 4, 8, 16, 32], ratios=[0.5, 1.0, 2.0], strides=[16]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=2048, - featmap_strides=[16]), + featmap_strides=[16], + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=2048, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=12000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=12000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - 
add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms=dict(type='nms', iou_threshold=0.7), - nms_pre=6000, - max_per_img=1000, - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100))) + rpn=dict(nms=dict(type="nms", iou_threshold=0.7), nms_pre=6000, max_per_img=1000, min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), + ), +) diff --git a/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50_fpn.py b/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50_fpn.py index 31aa1461799a988a11adb901306a063fd3f0b951..76eee4aff8263b849687f4c7541928df439f3c09 100644 --- a/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50_fpn.py +++ b/mmpose/configs/mmdet/_base_/models/faster-rcnn_r50_fpn.py @@ -1,114 +1,75 @@ # model settings model = dict( - type='FasterRCNN', + type="FasterRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + 
bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100) + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), # soft-nms is also supported for rcnn testing # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05) - )) + ), +) diff --git a/mmpose/configs/mmdet/_base_/models/mask-rcnn_r50-caffe-c4.py b/mmpose/configs/mmdet/_base_/models/mask-rcnn_r50-caffe-c4.py index de1131b24893ae24bd99923895fd844837c9b46d..bb993ae59736c597f1874222b8d02a7bb2e9c1da 100644 --- a/mmpose/configs/mmdet/_base_/models/mask-rcnn_r50-caffe-c4.py +++ b/mmpose/configs/mmdet/_base_/models/mask-rcnn_r50-caffe-c4.py @@ -1,132 +1,91 @@ # model settings -norm_cfg = dict(type='BN', requires_grad=False) +norm_cfg = dict(type="BN", requires_grad=False) model = dict( - type='MaskRCNN', + type="MaskRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_mask=True, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=3, strides=(1, 2, 2), dilations=(1, 1, 1), - out_indices=(2, ), + out_indices=(2,), frozen_stages=1, norm_cfg=norm_cfg, norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), + style="caffe", + init_cfg=dict(type="Pretrained", 
checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=1024, feat_channels=1024, - anchor_generator=dict( - type='AnchorGenerator', - scales=[2, 4, 8, 16, 32], - ratios=[0.5, 1.0, 2.0], - strides=[16]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[2, 4, 8, 16, 32], ratios=[0.5, 1.0, 2.0], strides=[16]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', - shared_head=dict( - type='ResLayer', - depth=50, - stage=3, - stride=2, - dilation=1, - style='caffe', - norm_cfg=norm_cfg, - norm_eval=True), + type="StandardRoIHead", + shared_head=dict(type="ResLayer", depth=50, stage=3, stride=2, dilation=1, style="caffe", norm_cfg=norm_cfg, norm_eval=True), bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0), out_channels=1024, - featmap_strides=[16]), + featmap_strides=[16], + ), bbox_head=dict( - type='BBoxHead', + type="BBoxHead", with_avg_pool=True, roi_feat_size=7, in_channels=2048, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), mask_roi_extractor=None, mask_head=dict( - type='FCNMaskHead', + type="FCNMaskHead", num_convs=0, in_channels=2048, conv_out_channels=256, num_classes=80, - loss_mask=dict( - type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), + loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0), + ), + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=12000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=12000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - 
pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=14, pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms_pre=6000, - nms=dict(type='nms', iou_threshold=0.7), - max_per_img=1000, - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100, - mask_thr_binary=0.5))) + rpn=dict(nms_pre=6000, nms=dict(type="nms", iou_threshold=0.7), max_per_img=1000, min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5), + ), +) diff --git a/mmpose/configs/mmdet/_base_/models/mask-rcnn_r50_fpn.py b/mmpose/configs/mmdet/_base_/models/mask-rcnn_r50_fpn.py index b4ff7a49d0a2f0abd4823ef89ad957d9708085e7..8866f65d9cf2bff407955b4adf9eed4523319cfd 100644 --- a/mmpose/configs/mmdet/_base_/models/mask-rcnn_r50_fpn.py +++ b/mmpose/configs/mmdet/_base_/models/mask-rcnn_r50_fpn.py @@ -1,127 +1,93 @@ # model settings model = dict( - type='MaskRCNN', + type="MaskRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=True, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 
0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), mask_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), mask_head=dict( - type='FCNMaskHead', + type="FCNMaskHead", num_convs=4, in_channels=256, conv_out_channels=256, num_classes=80, - loss_mask=dict( - type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), + loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0), + ), + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100, - mask_thr_binary=0.5))) + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5), + ), +) diff --git a/mmpose/configs/mmdet/_base_/models/retinanet_r50_fpn.py b/mmpose/configs/mmdet/_base_/models/retinanet_r50_fpn.py index 53662c9f1390af22b15c5591e122b0aa0b2d6c92..dbb630e28712a89c9e871f3799f28d9345255110 100644 --- a/mmpose/configs/mmdet/_base_/models/retinanet_r50_fpn.py +++ b/mmpose/configs/mmdet/_base_/models/retinanet_r50_fpn.py @@ -1,68 +1,41 @@ # model settings model = dict( - type='RetinaNet', + type="RetinaNet", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", 
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_input', - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_input", num_outs=5), bbox_head=dict( - type='RetinaHead', + type="RetinaHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=4, - scales_per_octave=3, - ratios=[0.5, 1.0, 2.0], - strides=[8, 16, 32, 64, 128]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128] + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), # model training and testing settings train_cfg=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - sampler=dict( - type='PseudoSampler'), # Focal loss should use PseudoSampler + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0, ignore_iof_thr=-1), + sampler=dict(type="PseudoSampler"), # Focal loss should use PseudoSampler allowed_border=-1, pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), +) diff --git a/mmpose/configs/mmdet/_base_/models/rpn_r50-caffe-c4.py b/mmpose/configs/mmdet/_base_/models/rpn_r50-caffe-c4.py index ed1dbe746d432d96d70e7dc9048c9e1b1727c938..333cb9088f76400930cefefb202044b78cd155ff 100644 --- a/mmpose/configs/mmdet/_base_/models/rpn_r50-caffe-c4.py +++ b/mmpose/configs/mmdet/_base_/models/rpn_r50-caffe-c4.py @@ -1,64 +1,41 @@ # model settings model = dict( - type='RPN', + type="RPN", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=3, strides=(1, 2, 2), dilations=(1, 1, 1), - out_indices=(2, ), + out_indices=(2,), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', 
- checkpoint='open-mmlab://detectron2/resnet50_caffe')), + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), neck=None, rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=1024, feat_channels=1024, - anchor_generator=dict( - type='AnchorGenerator', - scales=[2, 4, 8, 16, 32], - ratios=[0.5, 1.0, 2.0], - strides=[16]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[2, 4, 8, 16, 32], ratios=[0.5, 1.0, 2.0], strides=[16]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), # model training and testing settings train_cfg=dict( rpn=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, ignore_iof_thr=-1), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False)), - test_cfg=dict( - rpn=dict( - nms_pre=12000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0))) + debug=False, + ) + ), + test_cfg=dict(rpn=dict(nms_pre=12000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0)), +) diff --git a/mmpose/configs/mmdet/_base_/models/rpn_r50_fpn.py b/mmpose/configs/mmdet/_base_/models/rpn_r50_fpn.py index 6bc4790434a368d0728d74dcd7ba79e665aae276..59075a15331fc3e47b6086fcae4b06c339608005 100644 --- a/mmpose/configs/mmdet/_base_/models/rpn_r50_fpn.py +++ b/mmpose/configs/mmdet/_base_/models/rpn_r50_fpn.py @@ -1,64 +1,39 @@ # model settings model = dict( - type='RPN', + type="RPN", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, 
loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), # model training and testing settings train_cfg=dict( rpn=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, ignore_iof_thr=-1), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False)), - test_cfg=dict( - rpn=dict( - nms_pre=2000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0))) + debug=False, + ) + ), + test_cfg=dict(rpn=dict(nms_pre=2000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0)), +) diff --git a/mmpose/configs/mmdet/_base_/models/ssd300.py b/mmpose/configs/mmdet/_base_/models/ssd300.py index fd113c7cbc41494eabb6a56061f8a90343ac9efd..1541058a25fa353bc30b3906bfd62cb09fafc1c5 100644 --- a/mmpose/configs/mmdet/_base_/models/ssd300.py +++ b/mmpose/configs/mmdet/_base_/models/ssd300.py @@ -1,63 +1,49 @@ # model settings input_size = 300 model = dict( - type='SingleStageDetector', - data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[1, 1, 1], - bgr_to_rgb=True, - pad_size_divisor=1), + type="SingleStageDetector", + data_preprocessor=dict(type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[1, 1, 1], bgr_to_rgb=True, pad_size_divisor=1), backbone=dict( - type='SSDVGG', + type="SSDVGG", depth=16, with_last_pool=False, ceil_mode=True, out_indices=(3, 4), out_feature_indices=(22, 34), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://vgg16_caffe')), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://vgg16_caffe"), + ), neck=dict( - type='SSDNeck', + type="SSDNeck", in_channels=(512, 1024), out_channels=(512, 1024, 512, 256, 256, 256), level_strides=(2, 2, 1, 1), level_paddings=(1, 1, 0, 0), - l2_norm_scale=20), + l2_norm_scale=20, + ), bbox_head=dict( - type='SSDHead', + type="SSDHead", in_channels=(512, 1024, 512, 256, 256, 256), num_classes=80, anchor_generator=dict( - type='SSDAnchorGenerator', + type="SSDAnchorGenerator", scale_major=False, input_size=input_size, basesize_ratio_range=(0.15, 0.9), strides=[8, 16, 32, 64, 100, 300], - ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2])), + ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]], + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + ), # model training and testing settings train_cfg=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0., - ignore_iof_thr=-1, - gt_max_assign_all=False), - sampler=dict(type='PseudoSampler'), - smoothl1_beta=1., + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.0, ignore_iof_thr=-1, 
gt_max_assign_all=False), + sampler=dict(type="PseudoSampler"), + smoothl1_beta=1.0, allowed_border=-1, pos_weight=-1, neg_pos_ratio=3, - debug=False), - test_cfg=dict( - nms_pre=1000, - nms=dict(type='nms', iou_threshold=0.45), - min_bbox_size=0, - score_thr=0.02, - max_per_img=200)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, nms=dict(type="nms", iou_threshold=0.45), min_bbox_size=0, score_thr=0.02, max_per_img=200), +) cudnn_benchmark = True diff --git a/mmpose/configs/mmdet/_base_/schedules/schedule_1x.py b/mmpose/configs/mmdet/_base_/schedules/schedule_1x.py index 95f30be74ff37080ba0d227d55bbd587feeaa892..e7b67f51c758804abb9b44db4efdb4fd95b57050 100644 --- a/mmpose/configs/mmdet/_base_/schedules/schedule_1x.py +++ b/mmpose/configs/mmdet/_base_/schedules/schedule_1x.py @@ -1,25 +1,16 @@ # training schedule for 1x -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=12, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/_base_/schedules/schedule_20e.py b/mmpose/configs/mmdet/_base_/schedules/schedule_20e.py index 75f958b0ed11d77ae3aebff6b7a5d8cb80797d9f..82e95ffc08cb5e5c3daf2ee53e92c2ebd8b6fde2 100644 --- a/mmpose/configs/mmdet/_base_/schedules/schedule_20e.py +++ b/mmpose/configs/mmdet/_base_/schedules/schedule_20e.py @@ -1,25 +1,16 @@ # training schedule for 20e -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=20, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=20, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=20, - by_epoch=True, - milestones=[16, 19], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=20, by_epoch=True, milestones=[16, 19], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/_base_/schedules/schedule_2x.py b/mmpose/configs/mmdet/_base_/schedules/schedule_2x.py index 5b7b241de6f3285e0f127f3c0581c8c84de463e4..4c7aa383fbcfe01c3821db18e80e770a580f3864 100644 --- a/mmpose/configs/mmdet/_base_/schedules/schedule_2x.py +++ 
b/mmpose/configs/mmdet/_base_/schedules/schedule_2x.py @@ -1,25 +1,16 @@ # training schedule for 2x -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=24, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=24, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=24, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=24, by_epoch=True, milestones=[16, 22], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/albu_example/mask-rcnn_r50_fpn_albu-1x_coco.py b/mmpose/configs/mmdet/albu_example/mask-rcnn_r50_fpn_albu-1x_coco.py index b8a2780e99b88c78adbe74c024fcd2d693817030..bcd77efec957413ade76082266d35d36e511f6f8 100644 --- a/mmpose/configs/mmdet/albu_example/mask-rcnn_r50_fpn_albu-1x_coco.py +++ b/mmpose/configs/mmdet/albu_example/mask-rcnn_r50_fpn_albu-1x_coco.py @@ -1,66 +1,39 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" albu_train_transforms = [ + dict(type="ShiftScaleRotate", shift_limit=0.0625, scale_limit=0.0, rotate_limit=0, interpolation=1, p=0.5), + dict(type="RandomBrightnessContrast", brightness_limit=[0.1, 0.3], contrast_limit=[0.1, 0.3], p=0.2), dict( - type='ShiftScaleRotate', - shift_limit=0.0625, - scale_limit=0.0, - rotate_limit=0, - interpolation=1, - p=0.5), - dict( - type='RandomBrightnessContrast', - brightness_limit=[0.1, 0.3], - contrast_limit=[0.1, 0.3], - p=0.2), - dict( - type='OneOf', - transforms=[ - dict( - type='RGBShift', - r_shift_limit=10, - g_shift_limit=10, - b_shift_limit=10, - p=1.0), - dict( - type='HueSaturationValue', - hue_shift_limit=20, - sat_shift_limit=30, - val_shift_limit=20, - p=1.0) - ], - p=0.1), - dict(type='JpegCompression', quality_lower=85, quality_upper=95, p=0.2), - dict(type='ChannelShuffle', p=0.1), - dict( - type='OneOf', + type="OneOf", transforms=[ - dict(type='Blur', blur_limit=3, p=1.0), - dict(type='MedianBlur', blur_limit=3, p=1.0) + dict(type="RGBShift", r_shift_limit=10, g_shift_limit=10, b_shift_limit=10, p=1.0), + dict(type="HueSaturationValue", hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, p=1.0), ], - p=0.1), + p=0.1, + ), + dict(type="JpegCompression", quality_lower=85, quality_upper=95, p=0.2), + dict(type="ChannelShuffle", p=0.1), + dict(type="OneOf", transforms=[dict(type="Blur", blur_limit=3, p=1.0), dict(type="MedianBlur", blur_limit=3, p=1.0)], p=0.1), ] train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), dict( - type='Albu', + type="Albu", 
transforms=albu_train_transforms, bbox_params=dict( - type='BboxParams', - format='pascal_voc', - label_fields=['gt_bboxes_labels', 'gt_ignore_flags'], + type="BboxParams", + format="pascal_voc", + label_fields=["gt_bboxes_labels", "gt_ignore_flags"], min_visibility=0.0, - filter_lost_elements=True), - keymap={ - 'img': 'image', - 'gt_masks': 'masks', - 'gt_bboxes': 'bboxes' - }, - skip_img_without_anno=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + filter_lost_elements=True, + ), + keymap={"img": "image", "gt_masks": "masks", "gt_bboxes": "bboxes"}, + skip_img_without_anno=True, + ), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/atss/atss_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/atss/atss_r101_fpn_1x_coco.py index 5225d2ab672738d4d427eba252e92bd554252476..127ccd4d1525371658829edb6c51a77946da81a6 100644 --- a/mmpose/configs/mmdet/atss/atss_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/atss/atss_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './atss_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./atss_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/atss/atss_r101_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/atss/atss_r101_fpn_8xb8-amp-lsj-200e_coco.py index 69999ce45aee9c76dcc4af974e6e9baabbd5b44b..1eb2b476f3ac9a49a3fe93a92cac823024822912 100644 --- a/mmpose/configs/mmdet/atss/atss_r101_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/atss/atss_r101_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,3 @@ -_base_ = './atss_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./atss_r50_fpn_8xb8-amp-lsj-200e_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/atss/atss_r18_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/atss/atss_r18_fpn_8xb8-amp-lsj-200e_coco.py index 12d9f13263619333391befd6692c83622091ef4e..09590ee2bba3c906c07b201ca6caaf8a99eb6ed6 100644 --- a/mmpose/configs/mmdet/atss/atss_r18_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/atss/atss_r18_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,6 @@ -_base_ = './atss_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./atss_r50_fpn_8xb8-amp-lsj-200e_coco.py" model = dict( - backbone=dict( - depth=18, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[64, 128, 256, 512])) + backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")), + neck=dict(in_channels=[64, 128, 256, 512]), +) diff --git a/mmpose/configs/mmdet/atss/atss_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/atss/atss_r50_fpn_1x_coco.py index 306435d7d2fc645f1c2deae784c1875cc4ceaf98..032dbd92de87b4c1c041e164a21f18c5bd846012 100644 --- a/mmpose/configs/mmdet/atss/atss_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/atss/atss_r50_fpn_1x_coco.py @@ -1,71 +1,38 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # 
model settings model = dict( - type='ATSS', + type="ATSS", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), bbox_head=dict( - type='ATSSHead', + type="ATSSHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), # training and testing settings - train_cfg=dict( - assigner=dict(type='ATSSAssigner', topk=9), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + train_cfg=dict(assigner=dict(type="ATSSAssigner", topk=9), allowed_border=-1, pos_weight=-1, debug=False), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # optimizer -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/atss/atss_r50_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/atss/atss_r50_fpn_8xb8-amp-lsj-200e_coco.py index e3b3c46f4b926b82fbab438d6d50eb6c079dabc3..627cb4f6d1f545211d0d78d26a26a7c678017566 100644 --- a/mmpose/configs/mmdet/atss/atss_r50_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/atss/atss_r50_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,79 +1,51 @@ -_base_ = '../common/lsj-200e_coco-detection.py' +_base_ = "../common/lsj-200e_coco-detection.py" image_size = (1024, 1024) -batch_augments = [dict(type='BatchFixedSizePad', size=image_size)] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size)] 
model = dict( - type='ATSS', + type="ATSS", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32, - batch_augments=batch_augments), + batch_augments=batch_augments, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), bbox_head=dict( - type='ATSSHead', + type="ATSSHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), # training and testing settings - train_cfg=dict( - assigner=dict(type='ATSSAssigner', topk=9), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + train_cfg=dict(assigner=dict(type="ATSSAssigner", topk=9), allowed_border=-1, pos_weight=-1, debug=False), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) train_dataloader = dict(batch_size=8, num_workers=4) # Enable automatic-mixed-precision training with AmpOptimWrapper. -optim_wrapper = dict( - type='AmpOptimWrapper', - optimizer=dict( - type='SGD', lr=0.01 * 4, momentum=0.9, weight_decay=0.00004)) +optim_wrapper = dict(type="AmpOptimWrapper", optimizer=dict(type="SGD", lr=0.01 * 4, momentum=0.9, weight_decay=0.00004)) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
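All of the mmdet config hunks in this patch are mechanical reformats: single quotes become double quotes and black-style wrapping is applied, with no key or value changes. A quick way to confirm a reformatted config still parses and builds — a minimal sketch, not part of the patch itself, assuming mmdet >= 3.0 and mmengine are installed and run from the repo root (the path is one of the files touched above):

```python
# Sanity-check sketch: a reformatted config should be semantically unchanged.
# Assumes mmdet>=3.0 and mmengine are importable; the path below is one of
# the files reformatted in this diff.
from mmengine.config import Config
from mmdet.registry import MODELS
from mmdet.utils import register_all_modules

register_all_modules()  # populate mmdet's registries before building

cfg = Config.fromfile("mmpose/configs/mmdet/atss/atss_r50_fpn_1x_coco.py")
print(cfg.model.type)  # -> "ATSS"; quoting style never reaches parsed values

# Building the model raises immediately if the reformat broke the config.
model = MODELS.build(cfg.model)
```

Because `Config.fromfile` resolves the `_base_` chain before returning, a clean parse also exercises every reformatted `_base_` file the config inherits from.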
diff --git a/mmpose/configs/mmdet/autoassign/autoassign_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/autoassign/autoassign_r50-caffe_fpn_1x_coco.py index 76a361952d95b655451186ef1cb39df2f24ae305..f904684cb083071dba59056260bd124601c8e8ca 100644 --- a/mmpose/configs/mmdet/autoassign/autoassign_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/autoassign/autoassign_r50-caffe_fpn_1x_coco.py @@ -1,69 +1,51 @@ # We follow the original implementation which # adopts the Caffe pre-trained backbone. -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings model = dict( - type='AutoAssign', + type="AutoAssign", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[102.9801, 115.9465, 122.7717], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), neck=dict( - type='FPN', + type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs=True, num_outs=5, relu_before_extra_convs=True, - init_cfg=dict(type='Caffe2Xavier', layer='Conv2d')), + init_cfg=dict(type="Caffe2Xavier", layer="Conv2d"), + ), bbox_head=dict( - type='AutoAssignHead', + type="AutoAssignHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, strides=[8, 16, 32, 64, 128], - loss_bbox=dict(type='GIoULoss', loss_weight=5.0)), + loss_bbox=dict(type="GIoULoss", loss_weight=5.0), + ), train_cfg=None, - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # optimizer -optim_wrapper = dict( - optimizer=dict(lr=0.01), paramwise_cfg=dict(norm_decay_mult=0.)) +optim_wrapper = dict(optimizer=dict(lr=0.01), paramwise_cfg=dict(norm_decay_mult=0.0)) diff --git a/mmpose/configs/mmdet/boxinst/boxinst_r101_fpn_ms-90k_coco.py b/mmpose/configs/mmdet/boxinst/boxinst_r101_fpn_ms-90k_coco.py index ab2b11628a79aee7f6f6403cecf8f7b1d0526d69..ee8413d2e8705cb32e708dfef42baec21f133136 100644 --- a/mmpose/configs/mmdet/boxinst/boxinst_r101_fpn_ms-90k_coco.py +++ b/mmpose/configs/mmdet/boxinst/boxinst_r101_fpn_ms-90k_coco.py @@ -1,8 +1,4 @@ -_base_ = './boxinst_r50_fpn_ms-90k_coco.py' +_base_ = "./boxinst_r50_fpn_ms-90k_coco.py" # model settings -model = dict( - backbone=dict( - depth=101, - 
init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/boxinst/boxinst_r50_fpn_ms-90k_coco.py b/mmpose/configs/mmdet/boxinst/boxinst_r50_fpn_ms-90k_coco.py index 371f252a153855e19f3a3bb25cd42c83a4bb77fd..ef1317da8d04f6d64d40f645fc3b8a4e09120bee 100644 --- a/mmpose/configs/mmdet/boxinst/boxinst_r50_fpn_ms-90k_coco.py +++ b/mmpose/configs/mmdet/boxinst/boxinst_r50_fpn_ms-90k_coco.py @@ -1,10 +1,10 @@ -_base_ = '../common/ms-90k_coco.py' +_base_ = "../common/ms-90k_coco.py" # model settings model = dict( - type='BoxInst', + type="BoxInst", data_preprocessor=dict( - type='BoxInstDataPreprocessor', + type="BoxInstDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, @@ -13,27 +13,30 @@ model = dict( pairwise_size=3, pairwise_dilation=2, pairwise_color_thresh=0.3, - bottom_pixels_removed=10), + bottom_pixels_removed=10, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), - style='pytorch'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + style="pytorch", + ), neck=dict( - type='FPN', + type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, - add_extra_convs='on_output', # use P5 + add_extra_convs="on_output", # use P5 num_outs=5, - relu_before_extra_convs=True), + relu_before_extra_convs=True, + ), bbox_head=dict( - type='BoxInstBboxHead', + type="BoxInstBboxHead", num_params=593, num_classes=80, in_channels=256, @@ -45,17 +48,12 @@ model = dict( dcn_on_last_conv=False, center_sampling=True, conv_bias=True, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=1.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=1.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), mask_head=dict( - type='BoxInstMaskHead', + type="BoxInstMaskHead", num_layers=3, feat_channels=16, size_of_interest=8, @@ -69,25 +67,17 @@ model = dict( out_channels=16, mask_stride=8, num_stacked_convs=4, - norm_cfg=dict(type='BN', requires_grad=True)), - loss_mask=dict( - type='DiceLoss', - use_sigmoid=True, - activate=True, - eps=5e-6, - loss_weight=1.0)), + norm_cfg=dict(type="BN", requires_grad=True), + ), + loss_mask=dict(type="DiceLoss", use_sigmoid=True, activate=True, eps=5e-6, loss_weight=1.0), + ), # model training and testing settings - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100, - mask_thr=0.5)) + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100, mask_thr=0.5), +) # optimizer optim_wrapper = dict(optimizer=dict(lr=0.01)) # evaluator -val_evaluator = dict(metric=['bbox', 'segm']) +val_evaluator = dict(metric=["bbox", "segm"]) test_evaluator = val_evaluator diff --git 
a/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py b/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py index 24b3f7841947204f2ecea385dcfa8b97fa0c6e85..c3dc5395b1bd110e50768e0e24fe3c5e84e9e6f1 100644 --- a/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py +++ b/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py @@ -1,91 +1,68 @@ -_base_ = ['../yolox/yolox_x_8xb8-300e_coco.py'] +_base_ = ["../yolox/yolox_x_8xb8-300e_coco.py"] -dataset_type = 'MOTChallengeDataset' -data_root = 'data/MOT17/' +dataset_type = "MOTChallengeDataset" +data_root = "data/MOT17/" img_scale = (1440, 800) # width, height batch_size = 4 detector = _base_.model -detector.pop('data_preprocessor') +detector.pop("data_preprocessor") detector.bbox_head.update(dict(num_classes=1)) detector.test_cfg.nms.update(dict(iou_threshold=0.7)) -detector['init_cfg'] = dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth' # noqa: E501 +detector["init_cfg"] = dict( + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth", # noqa: E501 ) del _base_.model model = dict( - type='ByteTrack', + type="ByteTrack", data_preprocessor=dict( - type='TrackDataPreprocessor', + type="TrackDataPreprocessor", pad_size_divisor=32, # in bytetrack, we provide joint train detector and evaluate tracking # performance, use_det_processor means use independent detector # data_preprocessor.
of course, you can train detector independently # like strongsort use_det_processor=True, - batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(576, 1024), - size_divisor=32, - interval=10) - ]), + batch_augments=[dict(type="BatchSyncRandomResize", random_size_range=(576, 1024), size_divisor=32, interval=10)], + ), detector=detector, tracker=dict( - type='ByteTracker', - motion=dict(type='KalmanFilter'), + type="ByteTracker", + motion=dict(type="KalmanFilter"), obj_score_thrs=dict(high=0.6, low=0.1), init_track_thr=0.7, weight_iou_with_det_scores=True, match_iou_thrs=dict(high=0.1, low=0.5, tentative=0.3), - num_frames_retain=30)) + num_frames_retain=30, + ), +) train_pipeline = [ - dict( - type='Mosaic', - img_scale=img_scale, - pad_val=114.0, - bbox_clip_border=False), - dict( - type='RandomAffine', - scaling_ratio_range=(0.1, 2), - border=(-img_scale[0] // 2, -img_scale[1] // 2), - bbox_clip_border=False), - dict( - type='MixUp', - img_scale=img_scale, - ratio_range=(0.8, 1.6), - pad_val=114.0, - bbox_clip_border=False), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict( - type='Resize', - scale=img_scale, - keep_ratio=True, - clip_object_border=False), - dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False), - dict(type='PackDetInputs') + dict(type="Mosaic", img_scale=img_scale, pad_val=114.0, bbox_clip_border=False), + dict(type="RandomAffine", scaling_ratio_range=(0.1, 2), border=(-img_scale[0] // 2, -img_scale[1] // 2), bbox_clip_border=False), + dict(type="MixUp", img_scale=img_scale, ratio_range=(0.8, 1.6), pad_val=114.0, bbox_clip_border=False), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Resize", scale=img_scale, keep_ratio=True, clip_object_border=False), + dict(type="Pad", size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1), keep_empty=False), + dict(type="PackDetInputs"), ] test_pipeline = [ dict( - type='TransformBroadcaster', + type="TransformBroadcaster", transforms=[ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='Resize', scale=img_scale, keep_ratio=True), - dict( - type='Pad', - size_divisor=32, - pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='LoadTrackAnnotations'), - ]), - dict(type='PackTrackInputs') + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="Resize", scale=img_scale, keep_ratio=True), + dict(type="Pad", size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="LoadTrackAnnotations"), + ], + ), + dict(type="PackTrackInputs"), ] train_dataloader = dict( _delete_=True, @@ -93,53 +70,53 @@ train_dataloader = dict( num_workers=4, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='MultiImageMixDataset', + type="MultiImageMixDataset", dataset=dict( - type='ConcatDataset', + type="ConcatDataset", datasets=[ dict( - type='CocoDataset', - data_root='data/MOT17', - ann_file='annotations/half-train_cocoformat.json', - data_prefix=dict(img='train'), + type="CocoDataset", + data_root="data/MOT17", + ann_file="annotations/half-train_cocoformat.json", + data_prefix=dict(img="train"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), 
pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), dict( - type='CocoDataset', - data_root='data/crowdhuman', - ann_file='annotations/crowdhuman_train.json', - data_prefix=dict(img='train'), + type="CocoDataset", + data_root="data/crowdhuman", + ann_file="annotations/crowdhuman_train.json", + data_prefix=dict(img="train"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), dict( - type='CocoDataset', - data_root='data/crowdhuman', - ann_file='annotations/crowdhuman_val.json', - data_prefix=dict(img='val'), + type="CocoDataset", + data_root="data/crowdhuman", + ann_file="annotations/crowdhuman_val.json", + data_prefix=dict(img="val"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), - ]), - pipeline=train_pipeline)) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), + ], + ), + pipeline=train_pipeline, + ), +) val_dataloader = dict( _delete_=True, @@ -150,14 +127,16 @@ val_dataloader = dict( drop_last=False, # video_based # sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), - sampler=dict(type='TrackImgSampler'), # image_based + sampler=dict(type="TrackImgSampler"), # image_based dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/half-val_cocoformat.json', - data_prefix=dict(img_path='train'), + ann_file="annotations/half-val_cocoformat.json", + data_prefix=dict(img_path="train"), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader # optimizer @@ -171,71 +150,59 @@ max_epochs = 80 num_last_epochs = 10 interval = 5 -train_cfg = dict( - type='EpochBasedTrainLoop', - max_epochs=max_epochs, - val_begin=70, - val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_begin=70, val_interval=1) # learning policy param_scheduler = [ dict( # use quadratic formula to warm up 1 epochs - type='QuadraticWarmupLR', + type="QuadraticWarmupLR", by_epoch=True, begin=0, end=1, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), dict( # use cosine lr from 1 to 70 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=1, T_max=max_epochs - num_last_epochs, end=max_epochs - num_last_epochs, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), dict( # use fixed lr during last 10 epochs - type='ConstantLR', + type="ConstantLR", by_epoch=True, factor=1, begin=max_epochs - num_last_epochs, end=max_epochs, - ) + ), ] custom_hooks = [ - dict( - type='YOLOXModeSwitchHook', - num_last_epochs=num_last_epochs, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0001, - update_buffers=True, - 
priority=49) + dict(type="YOLOXModeSwitchHook", num_last_epochs=num_last_epochs, priority=48), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0001, update_buffers=True, priority=49), ] default_hooks = dict( - checkpoint=dict( - _delete_=True, type='CheckpointHook', interval=1, max_keep_ckpts=10), - visualization=dict(type='TrackVisualizationHook', draw=False)) + checkpoint=dict(_delete_=True, type="CheckpointHook", interval=1, max_keep_ckpts=10), + visualization=dict(type="TrackVisualizationHook", draw=False), +) -vis_backends = [dict(type='LocalVisBackend')] -visualizer = dict( - type='TrackLocalVisualizer', vis_backends=vis_backends, name='visualizer') +vis_backends = [dict(type="LocalVisBackend")] +visualizer = dict(type="TrackLocalVisualizer", vis_backends=vis_backends, name="visualizer") # evaluator val_evaluator = dict( _delete_=True, - type='MOTChallengeMetric', - metric=['HOTA', 'CLEAR', 'Identity'], - postprocess_tracklet_cfg=[ - dict(type='InterpolateTracklets', min_num_frames=5, max_num_frames=20) - ]) + type="MOTChallengeMetric", + metric=["HOTA", "CLEAR", "Identity"], + postprocess_tracklet_cfg=[dict(type="InterpolateTracklets", min_num_frames=5, max_num_frames=20)], +) test_evaluator = val_evaluator # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py b/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py index 9202f5fbda29d2a1d4cc81322c99d638ebf475d6..c238fa334c7762f5bc45e052d08e9bf0935c918f 100644 --- a/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py +++ b/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py @@ -1,127 +1,101 @@ -_base_ = [ - './bytetrack_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_' - 'test-mot17halfval.py' -] +_base_ = ["./bytetrack_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_" "test-mot17halfval.py"] -dataset_type = 'MOTChallengeDataset' +dataset_type = "MOTChallengeDataset" img_scale = (1600, 896) # weight, height model = dict( data_preprocessor=dict( - type='TrackDataPreprocessor', + type="TrackDataPreprocessor", use_det_processor=True, pad_size_divisor=32, - batch_augments=[ - dict(type='BatchSyncRandomResize', random_size_range=(640, 1152)) - ]), + batch_augments=[dict(type="BatchSyncRandomResize", random_size_range=(640, 1152))], + ), tracker=dict( weight_iou_with_det_scores=False, match_iou_thrs=dict(high=0.3), - )) + ), +) train_pipeline = [ - dict( - type='Mosaic', - img_scale=img_scale, - pad_val=114.0, - bbox_clip_border=True), - dict( - type='RandomAffine', - scaling_ratio_range=(0.1, 2), - border=(-img_scale[0] // 2, -img_scale[1] // 2), - bbox_clip_border=True), - dict( - type='MixUp', - img_scale=img_scale, - ratio_range=(0.8, 1.6), - pad_val=114.0, - bbox_clip_border=True), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict( - type='Resize', - scale=img_scale, - keep_ratio=True, - clip_object_border=True), - dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False), - dict(type='PackDetInputs') + dict(type="Mosaic", img_scale=img_scale, pad_val=114.0, bbox_clip_border=True), + dict(type="RandomAffine", scaling_ratio_range=(0.1, 2), border=(-img_scale[0] // 2, -img_scale[1] // 2), bbox_clip_border=True), + 
dict(type="MixUp", img_scale=img_scale, ratio_range=(0.8, 1.6), pad_val=114.0, bbox_clip_border=True), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Resize", scale=img_scale, keep_ratio=True, clip_object_border=True), + dict(type="Pad", size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1), keep_empty=False), + dict(type="PackDetInputs"), ] test_pipeline = [ dict( - type='TransformBroadcaster', + type="TransformBroadcaster", transforms=[ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='Resize', scale=img_scale, keep_ratio=True), - dict( - type='Pad', - size_divisor=32, - pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='LoadTrackAnnotations'), - ]), - dict(type='PackTrackInputs') + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="Resize", scale=img_scale, keep_ratio=True), + dict(type="Pad", size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="LoadTrackAnnotations"), + ], + ), + dict(type="PackTrackInputs"), ] train_dataloader = dict( dataset=dict( - type='MultiImageMixDataset', + type="MultiImageMixDataset", dataset=dict( - type='ConcatDataset', + type="ConcatDataset", datasets=[ dict( - type='CocoDataset', - data_root='data/MOT20', - ann_file='annotations/train_cocoformat.json', + type="CocoDataset", + data_root="data/MOT20", + ann_file="annotations/train_cocoformat.json", # TODO: mmdet use img as key, but img_path is needed - data_prefix=dict(img='train'), + data_prefix=dict(img="train"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), dict( - type='CocoDataset', - data_root='data/crowdhuman', - ann_file='annotations/crowdhuman_train.json', - data_prefix=dict(img='train'), + type="CocoDataset", + data_root="data/crowdhuman", + ann_file="annotations/crowdhuman_train.json", + data_prefix=dict(img="train"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), dict( - type='CocoDataset', - data_root='data/crowdhuman', - ann_file='annotations/crowdhuman_val.json', - data_prefix=dict(img='val'), + type="CocoDataset", + data_root="data/crowdhuman", + ann_file="annotations/crowdhuman_val.json", + data_prefix=dict(img="val"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), - ]), - pipeline=train_pipeline)) -val_dataloader = dict( - dataset=dict(ann_file='annotations/train_cocoformat.json')) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), + ], + ), + pipeline=train_pipeline, + ) +) +val_dataloader = 
dict(dataset=dict(ann_file="annotations/train_cocoformat.json"))

-test_dataloader = dict(
-    dataset=dict(
-        data_root='data/MOT20', ann_file='annotations/test_cocoformat.json'))
+test_dataloader = dict(dataset=dict(data_root="data/MOT20", ann_file="annotations/test_cocoformat.json"))

 test_evaluator = dict(
-    type='MOTChallengeMetrics',
-    postprocess_tracklet_cfg=[
-        dict(type='InterpolateTracklets', min_num_frames=5, max_num_frames=20)
-    ],
+    type="MOTChallengeMetric",
+    postprocess_tracklet_cfg=[dict(type="InterpolateTracklets", min_num_frames=5, max_num_frames=20)],
     format_only=True,
-    outfile_prefix='./mot_20_test_res')
+    outfile_prefix="./mot_20_test_res",
+)
diff --git a/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py b/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py
index 9c2119203a46e76cd8b6cc8f755334f58ffb086d..ac09a25f259f371bfbf9ae3d607a70182c4d41c5 100644
--- a/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py
+++ b/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py
@@ -1,9 +1,6 @@
-_base_ = [
-    './bytetrack_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_'
-    'test-mot17halfval.py'
-]
+_base_ = ["./bytetrack_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_" "test-mot17halfval.py"]

 # fp16 settings
-optim_wrapper = dict(type='AmpOptimWrapper', loss_scale='dynamic')
-val_cfg = dict(type='ValLoop', fp16=True)
-test_cfg = dict(type='TestLoop', fp16=True)
+optim_wrapper = dict(type="AmpOptimWrapper", loss_scale="dynamic")
+val_cfg = dict(type="ValLoop", fp16=True)
+test_cfg = dict(type="TestLoop", fp16=True)
diff --git a/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17test.py b/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17test.py
index 3f4427c18bff66ab1fa2a9ba22517989722d0625..1ef17ec6395e1f9de1aa6e941013a6330e8589eb 100644
--- a/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17test.py
+++ b/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17test.py
@@ -1,17 +1,11 @@
-_base_ = [
-    './bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-'
-    'mot17halftrain_test-mot17halfval.py'
-]
+_base_ = ["./bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-" "mot17halftrain_test-mot17halfval.py"]

 test_dataloader = dict(
-    dataset=dict(
-        data_root='data/MOT17/',
-        ann_file='annotations/test_cocoformat.json',
-        data_prefix=dict(img_path='test')))
+    dataset=dict(data_root="data/MOT17/", ann_file="annotations/test_cocoformat.json", data_prefix=dict(img_path="test"))
+)
 test_evaluator = dict(
-    type='MOTChallengeMetrics',
-    postprocess_tracklet_cfg=[
-        dict(type='InterpolateTracklets', min_num_frames=5, max_num_frames=20)
-    ],
+    type="MOTChallengeMetric",
+    postprocess_tracklet_cfg=[dict(type="InterpolateTracklets", min_num_frames=5, max_num_frames=20)],
     format_only=True,
-    outfile_prefix='./mot_17_test_res')
+    outfile_prefix="./mot_17_test_res",
+)
diff --git a/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot20train_test-mot20test.py b/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot20train_test-mot20test.py
index 1016999729263d72bbd75019be4968bc3960e368..36a088effb6538e493b45bd226d2fccf067f9a51 100644
---
a/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot20train_test-mot20test.py +++ b/mmpose/configs/mmdet/bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot20train_test-mot20test.py @@ -1,8 +1,6 @@ -_base_ = [ - './bytetrack_yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py' -] +_base_ = ["./bytetrack_yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py"] # fp16 settings -optim_wrapper = dict(type='AmpOptimWrapper', loss_scale='dynamic') -val_cfg = dict(type='ValLoop', fp16=True) -test_cfg = dict(type='TestLoop', fp16=True) +optim_wrapper = dict(type="AmpOptimWrapper", loss_scale="dynamic") +val_cfg = dict(type="ValLoop", fp16=True) +test_cfg = dict(type="TestLoop", fp16=True) diff --git a/mmpose/configs/mmdet/bytetrack/yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py b/mmpose/configs/mmdet/bytetrack/yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py index 8fc3acd487211d04fb3d6e4504ded5235393e4a7..e59dace256229f91bfcadb235829c3c5f951b030 100644 --- a/mmpose/configs/mmdet/bytetrack/yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py +++ b/mmpose/configs/mmdet/bytetrack/yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py @@ -1,6 +1,4 @@ -_base_ = [ - '../strongsort/yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py' # noqa: E501 -] +_base_ = ["../strongsort/yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py"] # noqa: E501 # fp16 settings -optim_wrapper = dict(type='AmpOptimWrapper', loss_scale='dynamic') +optim_wrapper = dict(type="AmpOptimWrapper", loss_scale="dynamic") diff --git a/mmpose/configs/mmdet/carafe/faster-rcnn_r50_fpn-carafe_1x_coco.py b/mmpose/configs/mmdet/carafe/faster-rcnn_r50_fpn-carafe_1x_coco.py index 388305cceac2e81eb1b4df6eac36662df7b8bf0d..4678be0fa018332280ece7de121d1ca2b08d85ef 100644 --- a/mmpose/configs/mmdet/carafe/faster-rcnn_r50_fpn-carafe_1x_coco.py +++ b/mmpose/configs/mmdet/carafe/faster-rcnn_r50_fpn-carafe_1x_coco.py @@ -1,8 +1,8 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( data_preprocessor=dict(pad_size_divisor=64), neck=dict( - type='FPN_CARAFE', + type="FPN_CARAFE", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5, @@ -10,11 +10,7 @@ model = dict( end_level=-1, norm_cfg=None, act_cfg=None, - order=('conv', 'norm', 'act'), - upsample_cfg=dict( - type='carafe', - up_kernel=5, - up_group=1, - encoder_kernel=3, - encoder_dilation=1, - compressed_channels=64))) + order=("conv", "norm", "act"), + upsample_cfg=dict(type="carafe", up_kernel=5, up_group=1, encoder_kernel=3, encoder_dilation=1, compressed_channels=64), + ), +) diff --git a/mmpose/configs/mmdet/carafe/mask-rcnn_r50_fpn-carafe_1x_coco.py b/mmpose/configs/mmdet/carafe/mask-rcnn_r50_fpn-carafe_1x_coco.py index 6ce621de77aff60f39126136cb25ca9ca38a1c9f..f7b7fbd5c7556b543367d4b36bbc111e5a765b6f 100644 --- a/mmpose/configs/mmdet/carafe/mask-rcnn_r50_fpn-carafe_1x_coco.py +++ b/mmpose/configs/mmdet/carafe/mask-rcnn_r50_fpn-carafe_1x_coco.py @@ -1,8 +1,8 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" model = dict( data_preprocessor=dict(pad_size_divisor=64), neck=dict( - type='FPN_CARAFE', + type="FPN_CARAFE", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5, @@ -10,21 +10,14 @@ model = dict( end_level=-1, norm_cfg=None, act_cfg=None, - order=('conv', 'norm', 'act'), - upsample_cfg=dict( - 
type='carafe', - up_kernel=5, - up_group=1, - encoder_kernel=3, - encoder_dilation=1, - compressed_channels=64)), + order=("conv", "norm", "act"), + upsample_cfg=dict(type="carafe", up_kernel=5, up_group=1, encoder_kernel=3, encoder_dilation=1, compressed_channels=64), + ), roi_head=dict( mask_head=dict( upsample_cfg=dict( - type='carafe', - scale_factor=2, - up_kernel=5, - up_group=1, - encoder_kernel=3, - encoder_dilation=1, - compressed_channels=64)))) + type="carafe", scale_factor=2, up_kernel=5, up_group=1, encoder_kernel=3, encoder_dilation=1, compressed_channels=64 + ) + ) + ), +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101-caffe_fpn_1x_coco.py index 6d85340e1cb92c60293c3710d05ef708d3726fdd..d7bbda042ba906d0c4a759014dfbbbc5b777b134 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101-caffe_fpn_1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './cascade-mask-rcnn_r50-caffe_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./cascade-mask-rcnn_r50-caffe_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101-caffe_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101-caffe_fpn_ms-3x_coco.py index a6855ee8c6fffd5e8d48f6cc2bb41e9dde9f6516..9c6710edc09bf7140c6ee2e94e33efad169f13ab 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101-caffe_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101-caffe_fpn_ms-3x_coco.py @@ -1,7 +1,2 @@ -_base_ = './cascade-mask-rcnn_r50-caffe_fpn_ms-3x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./cascade-mask-rcnn_r50-caffe_fpn_ms-3x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_1x_coco.py index c3d962c229d2621e7364c13959e3c4c1137edef1..a33a91cd2dad9da3e9a7c6362cf05dccc43ec4b6 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './cascade-mask-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./cascade-mask-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_20e_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_20e_coco.py index 497148f513edb79ca58f719f242be6274f923a65..3d262f8a42497b3e1db2928ce1cc876cb8129463 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_20e_coco.py @@ -1,6 +1,2 @@ -_base_ = './cascade-mask-rcnn_r50_fpn_20e_coco.py' -model = dict( - 
backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./cascade-mask-rcnn_r50_fpn_20e_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_ms-3x_coco.py index 183b5c50ff5563d987b2937d27d6d02bdd6cc2bd..e6d9eae00dd00121e2ee60d3a6801631e1476cb6 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r101_fpn_ms-3x_coco.py @@ -1,6 +1,2 @@ -_base_ = './cascade-mask-rcnn_r50_fpn_ms-3x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./cascade-mask-rcnn_r50_fpn_ms-3x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50-caffe_fpn_1x_coco.py index 497f68c4ab458ec49ad1d0c89cabbb2c0eb444f3..6c8a2e6f8df954cb8f9cce5b4d557b71040847b7 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,14 +1,11 @@ -_base_ = ['./cascade-mask-rcnn_r50_fpn_1x_coco.py'] +_base_ = ["./cascade-mask-rcnn_r50_fpn_1x_coco.py"] model = dict( - data_preprocessor=dict( - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False), + data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False), backbone=dict( norm_cfg=dict(requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50-caffe_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50-caffe_fpn_ms-3x_coco.py index 6677a9fea501a7683475dc8b865659cef5485bbe..e00a1230ce5572ca4c8e00b82fd7ab067489063a 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50-caffe_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50-caffe_fpn_ms-3x_coco.py @@ -1,18 +1,12 @@ -_base_ = [ - '../common/ms_3x_coco-instance.py', - '../_base_/models/cascade-mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms_3x_coco-instance.py", "../_base_/models/cascade-mask-rcnn_r50_fpn.py"] model = dict( # use caffe img_norm - data_preprocessor=dict( - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False), + data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False), backbone=dict( norm_cfg=dict(requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py index 
f59bb94eaaf3e850e971268383cd0275bcddf54d..77b3ebac41ebf64e6113defd40d3179e5cb435eb 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/cascade-mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_20e_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_20e_coco.py index 35c8aa6748d25e4c9c834478488ee21b44c8f2bd..eb9ed4f3009e780456a889f5f01069de783b4ed9 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_20e_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/cascade-mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_20e.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_20e.py", + "../_base_/default_runtime.py", ] diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_ms-3x_coco.py index b15006f451f346216243dc61140e9907535f0b20..b4efc09e6349e0e98a21e2bab2f05b7ff2e67a43 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_r50_fpn_ms-3x_coco.py @@ -1,4 +1 @@ -_base_ = [ - '../common/ms_3x_coco-instance.py', - '../_base_/models/cascade-mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms_3x_coco-instance.py", "../_base_/models/cascade-mask-rcnn_r50_fpn.py"] diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py index 87a4cc325a10b01cbf5a91e336da2281bc19a728..538065afa502b92b6c3c474a3376a5fb6ecc4b35 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './cascade-mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "./cascade-mask-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_20e_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_20e_coco.py index 5e8dcaa6891877c89acb024b9811a4fe7a87bc3b..328e99efbb30e830141a36d1db6d4e4dc284d2ad 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_20e_coco.py @@ -1,14 +1,15 @@ -_base_ = './cascade-mask-rcnn_r50_fpn_20e_coco.py' +_base_ = 
"./cascade-mask-rcnn_r50_fpn_20e_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_ms-3x_coco.py index 3a0f61b9aee2b0ab80c5c9b998a73826e5ff45a6..44485685b51333e6e8c6575295ae32927c7c39e9 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_ms-3x_coco.py @@ -1,14 +1,15 @@ -_base_ = './cascade-mask-rcnn_r50_fpn_ms-3x_coco.py' +_base_ = "./cascade-mask-rcnn_r50_fpn_ms-3x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x8d_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x8d_fpn_ms-3x_coco.py index 8cf08306850bdaef776a0ce53b88b23b9013a1a0..91172f1257b2933b7d00adb2957a9851ebe031be 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x8d_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-32x8d_fpn_ms-3x_coco.py @@ -1,24 +1,21 @@ -_base_ = './cascade-mask-rcnn_r50_fpn_ms-3x_coco.py' +_base_ = "./cascade-mask-rcnn_r50_fpn_ms-3x_coco.py" model = dict( # ResNeXt-101-32x8d model trained with Caffe2 at FB, # so the mean and std need to be changed. 
data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[57.375, 57.120, 58.395], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[57.375, 57.120, 58.395], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=8, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), - style='pytorch', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnext101_32x8d'))) + norm_cfg=dict(type="BN", requires_grad=False), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnext101_32x8d"), + ), +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_1x_coco.py index fb2e6b6b9507dcf38403d38499e1d57bd792a4da..a69584ed461fbe29c233fa3c0d41402644b56ce4 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './cascade-mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "./cascade-mask-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_20e_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_20e_coco.py index cc20c171542b5d75634d99d9ed25eea3acf8df19..1cc1889dd046c2d3bfda80961e238a6fd87eb053 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_20e_coco.py @@ -1,14 +1,15 @@ -_base_ = './cascade-mask-rcnn_r50_fpn_20e_coco.py' +_base_ = "./cascade-mask-rcnn_r50_fpn_20e_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_ms-3x_coco.py index f4ecc42655903c271e7e181b719d09821118a204..93e67cb2fbd079a7002a8bd3002595237656fcbd 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-mask-rcnn_x101-64x4d_fpn_ms-3x_coco.py @@ -1,14 +1,15 @@ -_base_ = './cascade-mask-rcnn_r50_fpn_ms-3x_coco.py' +_base_ = "./cascade-mask-rcnn_r50_fpn_ms-3x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - 
norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101-caffe_fpn_1x_coco.py index b6eaee2db700b897255ed44a5fd30bc23929388f..ecc2da8385b7135c7a18fce2067f11034c05aede 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101-caffe_fpn_1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './cascade-rcnn_r50-caffe_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./cascade-rcnn_r50-caffe_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_1x_coco.py index 1cdf5108b7d2908e420c52c59f8a9805c7989702..1d59d82d3a9a48c41f285117cec64ff5fa3e788d 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './cascade-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./cascade-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_20e_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_20e_coco.py index 84c285fc9e59d4191e79dd337ece2baff3d38b02..ebe4863551fe86324f14ff45635abad1535efa0d 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_20e_coco.py @@ -1,6 +1,2 @@ -_base_ = './cascade-rcnn_r50_fpn_20e_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./cascade-rcnn_r50_fpn_20e_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py index 1fc52e9cb8e1e9c27d45e32200b0b72efa8c363d..30bb76abdac063d117708893cdd5fb515d7f8e5f 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,3 @@ -_base_ = './cascade-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./cascade-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py 
b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py index aa30a3d07f5644dfc6f79f0eafc374518149e777..979ab9d32fc1f94c7a0b41c9255afb717cb87904 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,6 @@ -_base_ = './cascade-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./cascade-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py" model = dict( - backbone=dict( - depth=18, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[64, 128, 256, 512])) + backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")), + neck=dict(in_channels=[64, 128, 256, 512]), +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50-caffe_fpn_1x_coco.py index ad90e259b2d8410309bfd877b74755524b94f788..5c8615234bdf7191ce4f586a7314313da2d05c95 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,16 +1,13 @@ -_base_ = './cascade-rcnn_r50_fpn_1x_coco.py' +_base_ = "./cascade-rcnn_r50_fpn_1x_coco.py" model = dict( # use caffe img_norm data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( norm_cfg=dict(requires_grad=False), - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_1x_coco.py index 1a07c8b2302b9c2337d4da2d32c388142ca1f748..1aeaa6a3a388d2b96805419e48e41eaf0107eb58 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_1x_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/cascade-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_20e_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_20e_coco.py index 30f3ff106018ba51173f018c196cf62a88fdb172..da82c03ca0178f35a7c0fa9698d28fc133663dd6 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_20e_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/cascade-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_20e.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_20e.py", + "../_base_/default_runtime.py", ] diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py 
b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py index cd25f02608c3f51a59e35185a41080c6e8e3a1ea..03c3ac236b51572ac7b5fba7d284dcf6c8058c81 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,21 +1,13 @@ -_base_ = [ - '../_base_/models/cascade-rcnn_r50_fpn.py', - '../common/lsj-200e_coco-detection.py' -] +_base_ = ["../_base_/models/cascade-rcnn_r50_fpn.py", "../common/lsj-200e_coco-detection.py"] image_size = (1024, 1024) -batch_augments = [dict(type='BatchFixedSizePad', size=image_size)] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size)] # disable allowed_border to avoid potential errors. -model = dict( - data_preprocessor=dict(batch_augments=batch_augments), - train_cfg=dict(rpn=dict(allowed_border=-1))) +model = dict(data_preprocessor=dict(batch_augments=batch_augments), train_cfg=dict(rpn=dict(allowed_border=-1))) train_dataloader = dict(batch_size=8, num_workers=4) # Enable automatic-mixed-precision training with AmpOptimWrapper. -optim_wrapper = dict( - type='AmpOptimWrapper', - optimizer=dict( - type='SGD', lr=0.02 * 4, momentum=0.9, weight_decay=0.00004)) +optim_wrapper = dict(type="AmpOptimWrapper", optimizer=dict(type="SGD", lr=0.02 * 4, momentum=0.9, weight_decay=0.00004)) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-32x4d_fpn_1x_coco.py index 50e0b9544592d61b3c14ec7f64f3e6eaa2e96a57..4114542a94ea8cd5e30fd2c60169ed685c9655db 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-32x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-32x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './cascade-rcnn_r50_fpn_1x_coco.py' +_base_ = "./cascade-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-32x4d_fpn_20e_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-32x4d_fpn_20e_coco.py index 6120189205d883d98b2d323a160ec54ea26aab13..e1ae1b0a9e54f3c4bcf6170d57006c5e1b95ddb3 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-32x4d_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-32x4d_fpn_20e_coco.py @@ -1,14 +1,15 @@ -_base_ = './cascade-rcnn_r50_fpn_20e_coco.py' +_base_ = "./cascade-rcnn_r50_fpn_20e_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-64x4d_fpn_1x_coco.py 
b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-64x4d_fpn_1x_coco.py index 29475e39273dccad13058e9114728770e77f71ef..72c4692e47454a67e472cb3a5632381444dd95e8 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101-64x4d_fpn_1x_coco.py @@ -1,15 +1,16 @@ -_base_ = './cascade-rcnn_r50_fpn_1x_coco.py' +_base_ = "./cascade-rcnn_r50_fpn_1x_coco.py" model = dict( - type='CascadeRCNN', + type="CascadeRCNN", backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ), +) diff --git a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101_64x4d_fpn_20e_coco.py b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101_64x4d_fpn_20e_coco.py index e2aa57eaaf43788fc3628f1463e94405279c7416..0b32327f58907577a4a1d4cc9b4c7ec172e720f7 100644 --- a/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101_64x4d_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/cascade_rcnn/cascade-rcnn_x101_64x4d_fpn_20e_coco.py @@ -1,15 +1,16 @@ -_base_ = './cascade-rcnn_r50_fpn_20e_coco.py' +_base_ = "./cascade-rcnn_r50_fpn_20e_coco.py" model = dict( - type='CascadeRCNN', + type="CascadeRCNN", backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ), +) diff --git a/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_fast-rcnn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_fast-rcnn_r50-caffe_fpn_1x_coco.py index ba23ce90652d2ab2e9362be9a6231742d1815a70..ebc55728e172cd60d699b75394a74254da337ae0 100644 --- a/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_fast-rcnn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_fast-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,27 +1,25 @@ -_base_ = '../fast_rcnn/fast-rcnn_r50-caffe_fpn_1x_coco.py' +_base_ = "../fast_rcnn/fast-rcnn_r50-caffe_fpn_1x_coco.py" model = dict( roi_head=dict( bbox_head=dict( bbox_coder=dict(target_stds=[0.04, 0.04, 0.08, 0.08]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.5), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.5), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ) + ), # model training and testing settings - train_cfg=dict( - rcnn=dict( - assigner=dict( - pos_iou_thr=0.65, neg_iou_thr=0.65, min_pos_iou=0.65), - sampler=dict(num=256))), - test_cfg=dict(rcnn=dict(score_thr=1e-3))) + train_cfg=dict(rcnn=dict(assigner=dict(pos_iou_thr=0.65, neg_iou_thr=0.65, min_pos_iou=0.65), sampler=dict(num=256))), + test_cfg=dict(rcnn=dict(score_thr=1e-3)), +) # MMEngine support the following two ways, users can choose # according to convenience # train_dataloader = dict(dataset=dict(proposal_file='proposals/crpn_r50_caffe_fpn_1x_train2017.pkl')) # noqa 
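# A minimal sketch of how either override style is consumed once `_base_` is
# resolved (mmengine's `Config.fromfile` is the real API; the printed value
# assumes the proposal .pkl named in this config has been generated):
#
#     from mmengine.config import Config
#     cfg = Config.fromfile(
#         'mmpose/configs/mmdet/cascade_rpn/'
#         'cascade-rpn_fast-rcnn_r50-caffe_fpn_1x_coco.py')
#     print(cfg.train_dataloader.dataset.proposal_file)
#     # -> 'proposals/crpn_r50_caffe_fpn_1x_train2017.pkl'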
-_base_.train_dataloader.dataset.proposal_file = 'proposals/crpn_r50_caffe_fpn_1x_train2017.pkl' # noqa +_base_.train_dataloader.dataset.proposal_file = "proposals/crpn_r50_caffe_fpn_1x_train2017.pkl" # noqa # val_dataloader = dict(dataset=dict(proposal_file='proposals/crpn_r50_caffe_fpn_1x_val2017.pkl')) # noqa # test_dataloader = val_dataloader -_base_.val_dataloader.dataset.proposal_file = 'proposals/crpn_r50_caffe_fpn_1x_val2017.pkl' # noqa +_base_.val_dataloader.dataset.proposal_file = "proposals/crpn_r50_caffe_fpn_1x_val2017.pkl" # noqa test_dataloader = _base_.val_dataloader optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_faster-rcnn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_faster-rcnn_r50-caffe_fpn_1x_coco.py index 2f7eced00144fb8fff1f234210a2b3f3fe475c8f..169f844976ec6b0e4262de7ed9f39d6e956fbdfd 100644 --- a/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_faster-rcnn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_faster-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,89 +1,59 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50-caffe_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50-caffe_fpn_1x_coco.py" rpn_weight = 0.7 model = dict( rpn_head=dict( _delete_=True, - type='CascadeRPNHead', + type="CascadeRPNHead", num_stages=2, stages=[ dict( - type='StageCascadeRPNHead', + type="StageCascadeRPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[1.0], - strides=[4, 8, 16, 32, 64]), - adapt_cfg=dict(type='dilation', dilation=3), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[1.0], strides=[4, 8, 16, 32, 64]), + adapt_cfg=dict(type="dilation", dilation=3), bridged_feature=True, with_cls=False, reg_decoded_bbox=True, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=(.0, .0, .0, .0), - target_stds=(0.1, 0.1, 0.5, 0.5)), - loss_bbox=dict( - type='IoULoss', linear=True, - loss_weight=10.0 * rpn_weight)), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=(0.0, 0.0, 0.0, 0.0), target_stds=(0.1, 0.1, 0.5, 0.5)), + loss_bbox=dict(type="IoULoss", linear=True, loss_weight=10.0 * rpn_weight), + ), dict( - type='StageCascadeRPNHead', + type="StageCascadeRPNHead", in_channels=256, feat_channels=256, - adapt_cfg=dict(type='offset'), + adapt_cfg=dict(type="offset"), bridged_feature=False, with_cls=True, reg_decoded_bbox=True, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=(.0, .0, .0, .0), - target_stds=(0.05, 0.05, 0.1, 0.1)), - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - loss_weight=1.0 * rpn_weight), - loss_bbox=dict( - type='IoULoss', linear=True, - loss_weight=10.0 * rpn_weight)) - ]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=(0.0, 0.0, 0.0, 0.0), target_stds=(0.05, 0.05, 0.1, 0.1)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0 * rpn_weight), + loss_bbox=dict(type="IoULoss", linear=True, loss_weight=10.0 * rpn_weight), + ), + ], + ), roi_head=dict( bbox_head=dict( bbox_coder=dict(target_stds=[0.04, 0.04, 0.08, 0.08]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.5), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.5), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ) + ), # model training and testing settings train_cfg=dict( rpn=[ + 
dict(assigner=dict(type="RegionAssigner", center_ratio=0.2, ignore_ratio=0.5), allowed_border=-1, pos_weight=-1, debug=False), dict( - assigner=dict( - type='RegionAssigner', center_ratio=0.2, ignore_ratio=0.5), - allowed_border=-1, - pos_weight=-1, - debug=False), - dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.7, - min_pos_iou=0.3, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.3, ignore_iof_thr=-1), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False) + debug=False, + ), ], rpn_proposal=dict(max_per_img=300, nms=dict(iou_threshold=0.8)), - rcnn=dict( - assigner=dict( - pos_iou_thr=0.65, neg_iou_thr=0.65, min_pos_iou=0.65), - sampler=dict(type='RandomSampler', num=256))), - test_cfg=dict( - rpn=dict(max_per_img=300, nms=dict(iou_threshold=0.8)), - rcnn=dict(score_thr=1e-3))) + rcnn=dict(assigner=dict(pos_iou_thr=0.65, neg_iou_thr=0.65, min_pos_iou=0.65), sampler=dict(type="RandomSampler", num=256)), + ), + test_cfg=dict(rpn=dict(max_per_img=300, nms=dict(iou_threshold=0.8)), rcnn=dict(score_thr=1e-3)), +) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_r50-caffe_fpn_1x_coco.py index 6eba24d11368ee0cdaae4fa316020ea3750be7f0..96c3e4f504b0bda51b3de6f8c6b4e4c5c0e300d7 100644 --- a/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/cascade_rpn/cascade-rpn_r50-caffe_fpn_1x_coco.py @@ -1,76 +1,57 @@ -_base_ = '../rpn/rpn_r50-caffe_fpn_1x_coco.py' +_base_ = "../rpn/rpn_r50-caffe_fpn_1x_coco.py" model = dict( rpn_head=dict( _delete_=True, - type='CascadeRPNHead', + type="CascadeRPNHead", num_stages=2, stages=[ dict( - type='StageCascadeRPNHead', + type="StageCascadeRPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[1.0], - strides=[4, 8, 16, 32, 64]), - adapt_cfg=dict(type='dilation', dilation=3), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[1.0], strides=[4, 8, 16, 32, 64]), + adapt_cfg=dict(type="dilation", dilation=3), bridged_feature=True, sampling=False, with_cls=False, reg_decoded_bbox=True, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=(.0, .0, .0, .0), - target_stds=(0.1, 0.1, 0.5, 0.5)), - loss_bbox=dict(type='IoULoss', linear=True, loss_weight=10.0)), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=(0.0, 0.0, 0.0, 0.0), target_stds=(0.1, 0.1, 0.5, 0.5)), + loss_bbox=dict(type="IoULoss", linear=True, loss_weight=10.0), + ), dict( - type='StageCascadeRPNHead', + type="StageCascadeRPNHead", in_channels=256, feat_channels=256, - adapt_cfg=dict(type='offset'), + adapt_cfg=dict(type="offset"), bridged_feature=False, sampling=True, with_cls=True, reg_decoded_bbox=True, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=(.0, .0, .0, .0), - target_stds=(0.05, 0.05, 0.1, 0.1)), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, - loss_weight=1.0), - loss_bbox=dict(type='IoULoss', linear=True, loss_weight=10.0)) - ]), - train_cfg=dict(rpn=[ - dict( - assigner=dict( - type='RegionAssigner', center_ratio=0.2, ignore_ratio=0.5), - allowed_border=-1, - 
pos_weight=-1, - debug=False), - dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.7, - min_pos_iou=0.3, - ignore_iof_thr=-1, - iou_calculator=dict(type='BboxOverlaps2D')), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), - allowed_border=-1, - pos_weight=-1, - debug=False) - ]), - test_cfg=dict( - rpn=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.8), - min_bbox_size=0))) + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=(0.0, 0.0, 0.0, 0.0), target_stds=(0.05, 0.05, 0.1, 0.1)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", linear=True, loss_weight=10.0), + ), + ], + ), + train_cfg=dict( + rpn=[ + dict(assigner=dict(type="RegionAssigner", center_ratio=0.2, ignore_ratio=0.5), allowed_border=-1, pos_weight=-1, debug=False), + dict( + assigner=dict( + type="MaxIoUAssigner", + pos_iou_thr=0.7, + neg_iou_thr=0.7, + min_pos_iou=0.3, + ignore_iof_thr=-1, + iou_calculator=dict(type="BboxOverlaps2D"), + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), + allowed_border=-1, + pos_weight=-1, + debug=False, + ), + ] + ), + test_cfg=dict(rpn=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.8), min_bbox_size=0)), +) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/centernet/centernet-update_r101_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/centernet/centernet-update_r101_fpn_8xb8-amp-lsj-200e_coco.py index 4fc65e0f8aeb1f02a0bea675146ced7a56800251..5770e6c966cde72781e71eb7f22cfb329a88ce10 100644 --- a/mmpose/configs/mmdet/centernet/centernet-update_r101_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/centernet/centernet-update_r101_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,3 @@ -_base_ = './centernet-update_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./centernet-update_r50_fpn_8xb8-amp-lsj-200e_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/centernet/centernet-update_r18_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/centernet/centernet-update_r18_fpn_8xb8-amp-lsj-200e_coco.py index ab3ae32ecd54cd08664e883a0888ef43040528d1..55231e4601b02785f8da1d290f4907a0d135f9d5 100644 --- a/mmpose/configs/mmdet/centernet/centernet-update_r18_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/centernet/centernet-update_r18_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,6 @@ -_base_ = './centernet-update_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./centernet-update_r50_fpn_8xb8-amp-lsj-200e_coco.py" model = dict( - backbone=dict( - depth=18, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[64, 128, 256, 512])) + backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")), + neck=dict(in_channels=[64, 128, 256, 512]), +) diff --git a/mmpose/configs/mmdet/centernet/centernet-update_r50-caffe_fpn_ms-1x_coco.py b/mmpose/configs/mmdet/centernet/centernet-update_r50-caffe_fpn_ms-1x_coco.py index 1f6e2b3919d6d2197c0ae9e1d721dc4eab00cf9c..6ed37187ba60bafc9581b8049701a7d77478d479 100644 --- 
a/mmpose/configs/mmdet/centernet/centernet-update_r50-caffe_fpn_ms-1x_coco.py +++ b/mmpose/configs/mmdet/centernet/centernet-update_r50-caffe_fpn_ms-1x_coco.py @@ -1,42 +1,36 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] model = dict( - type='CenterNet', + type="CenterNet", # use caffe img_norm data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), neck=dict( - type='FPN', + type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, - add_extra_convs='on_output', + add_extra_convs="on_output", num_outs=5, # There is a chance to get 40.3 after switching init_cfg, # otherwise it is about 39.9~40.1 - init_cfg=dict(type='Caffe2Xavier', layer='Conv2d'), - relu_before_extra_convs=True), + init_cfg=dict(type="Caffe2Xavier", layer="Conv2d"), + relu_before_extra_convs=True, + ), bbox_head=dict( - type='CenterNetUpdateHead', + type="CenterNetUpdateHead", num_classes=80, in_channels=256, stacked_convs=4, @@ -47,57 +41,35 @@ model = dict( more_pos_thresh=0.2, more_pos_topk=9, soft_weight_on_reg=False, - loss_cls=dict( - type='GaussianFocalLoss', - pos_weight=0.25, - neg_weight=0.75, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), + loss_cls=dict(type="GaussianFocalLoss", pos_weight=0.25, neg_weight=0.75, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), ), train_cfg=None, - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # single-scale training is about 39.3 train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=0.00025, - by_epoch=False, - begin=0, - end=4000), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.00025, 
by_epoch=False, begin=0, end=4000), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] optim_wrapper = dict( optimizer=dict(lr=0.01), # Experiments show that there is no need to turn on clip_grad. - paramwise_cfg=dict(norm_decay_mult=0.)) + paramwise_cfg=dict(norm_decay_mult=0.0), +) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/centernet/centernet-update_r50_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/centernet/centernet-update_r50_fpn_8xb8-amp-lsj-200e_coco.py index 34e0c680d39486467464f0ea7d6e1e08bf0c5240..545bdad7c43565d97040be6f30e373c5ffc38425 100644 --- a/mmpose/configs/mmdet/centernet/centernet-update_r50_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/centernet/centernet-update_r50_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,80 +1,64 @@ -_base_ = '../common/lsj-200e_coco-detection.py' +_base_ = "../common/lsj-200e_coco-detection.py" image_size = (1024, 1024) -batch_augments = [dict(type='BatchFixedSizePad', size=image_size)] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size)] model = dict( - type='CenterNet', + type="CenterNet", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32, - batch_augments=batch_augments), + batch_augments=batch_augments, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='FPN', + type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, - add_extra_convs='on_output', + add_extra_convs="on_output", num_outs=5, - init_cfg=dict(type='Caffe2Xavier', layer='Conv2d'), - relu_before_extra_convs=True), + init_cfg=dict(type="Caffe2Xavier", layer="Conv2d"), + relu_before_extra_convs=True, + ), bbox_head=dict( - type='CenterNetUpdateHead', + type="CenterNetUpdateHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, strides=[8, 16, 32, 64, 128], - loss_cls=dict( - type='GaussianFocalLoss', - pos_weight=0.25, - neg_weight=0.75, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), + loss_cls=dict(type="GaussianFocalLoss", pos_weight=0.25, neg_weight=0.75, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), ), train_cfg=None, - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) train_dataloader = dict(batch_size=8, num_workers=4) # Enable automatic-mixed-precision training with AmpOptimWrapper. 
optim_wrapper = dict( - type='AmpOptimWrapper', - optimizer=dict( - type='SGD', lr=0.01 * 4, momentum=0.9, weight_decay=0.00004), - paramwise_cfg=dict(norm_decay_mult=0.)) + type="AmpOptimWrapper", + optimizer=dict(type="SGD", lr=0.01 * 4, momentum=0.9, weight_decay=0.00004), + paramwise_cfg=dict(norm_decay_mult=0.0), +) param_scheduler = [ - dict( - type='LinearLR', - start_factor=0.00025, - by_epoch=False, - begin=0, - end=4000), - dict( - type='MultiStepLR', - begin=0, - end=25, - by_epoch=True, - milestones=[22, 24], - gamma=0.1) + dict(type="LinearLR", start_factor=0.00025, by_epoch=False, begin=0, end=4000), + dict(type="MultiStepLR", begin=0, end=25, by_epoch=True, milestones=[22, 24], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/centernet/centernet_r18-dcnv2_8xb16-crop512-140e_coco.py b/mmpose/configs/mmdet/centernet/centernet_r18-dcnv2_8xb16-crop512-140e_coco.py index 732a55d59ad7dee175d8b72f798f0be044f23326..d5bed4605d47765298eb98164bef90565db738f9 100644 --- a/mmpose/configs/mmdet/centernet/centernet_r18-dcnv2_8xb16-crop512-140e_coco.py +++ b/mmpose/configs/mmdet/centernet/centernet_r18-dcnv2_8xb16-crop512-140e_coco.py @@ -1,54 +1,44 @@ _base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py', - './centernet_tta.py' + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", + "./centernet_tta.py", ] -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # model settings model = dict( - type='CenterNet', - data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="CenterNet", + data_preprocessor=dict(type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=18, norm_eval=False, - norm_cfg=dict(type='BN'), - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict( - type='CTResNetNeck', - in_channels=512, - num_deconv_filters=(256, 128, 64), - num_deconv_kernels=(4, 4, 4), - use_dcn=True), + norm_cfg=dict(type="BN"), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18"), + ), + neck=dict(type="CTResNetNeck", in_channels=512, num_deconv_filters=(256, 128, 64), num_deconv_kernels=(4, 4, 4), use_dcn=True), bbox_head=dict( - type='CenterNetHead', + type="CenterNetHead", num_classes=80, in_channels=64, feat_channels=64, - loss_center_heatmap=dict(type='GaussianFocalLoss', loss_weight=1.0), - loss_wh=dict(type='L1Loss', loss_weight=0.1), - loss_offset=dict(type='L1Loss', loss_weight=1.0)), + loss_center_heatmap=dict(type="GaussianFocalLoss", loss_weight=1.0), + loss_wh=dict(type="L1Loss", loss_weight=0.1), + loss_offset=dict(type="L1Loss", loss_weight=1.0), + ), train_cfg=None, - test_cfg=dict(topk=100, local_maximum_kernel=3, max_per_img=100)) + test_cfg=dict(topk=100, local_maximum_kernel=3, max_per_img=100), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), dict( - 
type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict( - type='RandomCenterCropPad', + type="RandomCenterCropPad", # The cropped images are padded into squares during training, # but may be less than crop_size. crop_size=(512, 512), @@ -56,32 +46,29 @@ train_pipeline = [ mean=[0, 0, 0], std=[1, 1, 1], to_rgb=True, - test_pad_mode=None), + test_pad_mode=None, + ), # Make sure the output is always crop_size. - dict(type='Resize', scale=(512, 512), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="Resize", scale=(512, 512), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict( - type='LoadImageFromFile', - backend_args={{_base_.backend_args}}, - to_float32=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}, to_float32=True), # don't need Resize dict( - type='RandomCenterCropPad', + type="RandomCenterCropPad", ratios=None, border=None, mean=[0, 0, 0], std=[1, 1, 1], to_rgb=True, test_mode=True, - test_pad_mode=['logical_or', 31], - test_pad_add_pix=1), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'border')) + test_pad_mode=["logical_or", 31], + test_pad_add_pix=1, + ), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "border")), ] # Use RepeatDataset to speed up training @@ -89,20 +76,22 @@ train_dataloader = dict( batch_size=16, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( _delete_=True, - type='RepeatDataset', + type="RepeatDataset", times=5, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, backend_args={{_base_.backend_args}}, - ))) + ), + ), +) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader @@ -117,16 +106,8 @@ max_epochs = 28 # learning policy # Based on the default settings of modern detectors, we added warmup settings. 
param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[18, 24], # the real step is [18*5, 24*5] - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[18, 24], gamma=0.1), # the real step is [18*5, 24*5] ] train_cfg = dict(max_epochs=max_epochs) # the real epoch is 28*5=140 diff --git a/mmpose/configs/mmdet/centernet/centernet_r18_8xb16-crop512-140e_coco.py b/mmpose/configs/mmdet/centernet/centernet_r18_8xb16-crop512-140e_coco.py index 6094b64221bd91eaafc9868e01c718d4421b418a..ff83157d0ce19735be1d0a4b4b3f7a1f99cf4bc6 100644 --- a/mmpose/configs/mmdet/centernet/centernet_r18_8xb16-crop512-140e_coco.py +++ b/mmpose/configs/mmdet/centernet/centernet_r18_8xb16-crop512-140e_coco.py @@ -1,3 +1,3 @@ -_base_ = './centernet_r18-dcnv2_8xb16-crop512-140e_coco.py' +_base_ = "./centernet_r18-dcnv2_8xb16-crop512-140e_coco.py" model = dict(neck=dict(use_dcn=False)) diff --git a/mmpose/configs/mmdet/centernet/centernet_tta.py b/mmpose/configs/mmdet/centernet/centernet_tta.py index edd7b03ecdeb272870919dcbd4842d6b8e32d8d4..67f7fb00d0cbe78f1a1cef94dde67456db46e14d 100644 --- a/mmpose/configs/mmdet/centernet/centernet_tta.py +++ b/mmpose/configs/mmdet/centernet/centernet_tta.py @@ -1,39 +1,34 @@ # This is different from the TTA of official CenterNet. -tta_model = dict( - type='DetTTAModel', - tta_cfg=dict(nms=dict(type='nms', iou_threshold=0.5), max_per_img=100)) +tta_model = dict(type="DetTTAModel", tta_cfg=dict(nms=dict(type="nms", iou_threshold=0.5), max_per_img=100)) tta_pipeline = [ - dict(type='LoadImageFromFile', to_float32=True, backend_args=None), + dict(type="LoadImageFromFile", to_float32=True, backend_args=None), dict( - type='TestTimeAug', + type="TestTimeAug", transforms=[ [ # ``RandomFlip`` must be placed before ``RandomCenterCropPad``, # otherwise bounding box coordinates after flipping cannot be # recovered correctly. - dict(type='RandomFlip', prob=1.), - dict(type='RandomFlip', prob=0.) 
+ dict(type="RandomFlip", prob=1.0), + dict(type="RandomFlip", prob=0.0), ], [ dict( - type='RandomCenterCropPad', + type="RandomCenterCropPad", ratios=None, border=None, mean=[0, 0, 0], std=[1, 1, 1], to_rgb=True, test_mode=True, - test_pad_mode=['logical_or', 31], - test_pad_add_pix=1), + test_pad_mode=["logical_or", 31], + test_pad_add_pix=1, + ), ], - [dict(type='LoadAnnotations', with_bbox=True)], - [ - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'flip', 'flip_direction', 'border')) - ] - ]) + [dict(type="LoadAnnotations", with_bbox=True)], + [dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "flip", "flip_direction", "border"))], + ], + ), ] diff --git a/mmpose/configs/mmdet/centripetalnet/centripetalnet_hourglass104_16xb6-crop511-210e-mstest_coco.py b/mmpose/configs/mmdet/centripetalnet/centripetalnet_hourglass104_16xb6-crop511-210e-mstest_coco.py index b757ffd16dca2d2b51d27ad413fdba889252c87f..fe277cb6bdc6612fd9f32a16b37c371ec61755a3 100644 --- a/mmpose/configs/mmdet/centripetalnet/centripetalnet_hourglass104_16xb6-crop511-210e-mstest_coco.py +++ b/mmpose/configs/mmdet/centripetalnet/centripetalnet_hourglass104_16xb6-crop511-210e-mstest_coco.py @@ -1,38 +1,31 @@ -_base_ = [ - '../_base_/default_runtime.py', '../_base_/datasets/coco_detection.py' -] +_base_ = ["../_base_/default_runtime.py", "../_base_/datasets/coco_detection.py"] -data_preprocessor = dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True) +data_preprocessor = dict(type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True) # model settings model = dict( - type='CornerNet', + type="CornerNet", data_preprocessor=data_preprocessor, backbone=dict( - type='HourglassNet', + type="HourglassNet", downsample_times=5, num_stacks=2, stage_channels=[256, 256, 384, 384, 384, 512], stage_blocks=[2, 2, 2, 2, 2, 4], - norm_cfg=dict(type='BN', requires_grad=True)), + norm_cfg=dict(type="BN", requires_grad=True), + ), neck=None, bbox_head=dict( - type='CentripetalHead', + type="CentripetalHead", num_classes=80, in_channels=256, num_feat_levels=2, corner_emb_channels=0, - loss_heatmap=dict( - type='GaussianFocalLoss', alpha=2.0, gamma=4.0, loss_weight=1), - loss_offset=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1), - loss_guiding_shift=dict( - type='SmoothL1Loss', beta=1.0, loss_weight=0.05), - loss_centripetal_shift=dict( - type='SmoothL1Loss', beta=1.0, loss_weight=1)), + loss_heatmap=dict(type="GaussianFocalLoss", alpha=2.0, gamma=4.0, loss_weight=1), + loss_offset=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1), + loss_guiding_shift=dict(type="SmoothL1Loss", beta=1.0, loss_weight=0.05), + loss_centripetal_shift=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1), + ), # training and testing settings train_cfg=None, test_cfg=dict( @@ -41,141 +34,106 @@ model = dict( distance_threshold=0.5, score_thr=0.05, max_per_img=100, - nms=dict(type='soft_nms', iou_threshold=0.5, method='gaussian'))) + nms=dict(type="soft_nms", iou_threshold=0.5, method="gaussian"), + ), +) # data settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + 
dict(type="LoadAnnotations", with_bbox=True), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), dict( # The cropped images are padded into squares during training, # but may be smaller than crop_size. - type='RandomCenterCropPad', + type="RandomCenterCropPad", crop_size=(511, 511), ratios=(0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3), test_mode=False, test_pad_mode=None, - mean=data_preprocessor['mean'], - std=data_preprocessor['std'], + mean=data_preprocessor["mean"], + std=data_preprocessor["std"], # Image data is not converted to rgb. - to_rgb=data_preprocessor['bgr_to_rgb']), - dict(type='Resize', scale=(511, 511), keep_ratio=False), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs'), + to_rgb=data_preprocessor["bgr_to_rgb"], + ), + dict(type="Resize", scale=(511, 511), keep_ratio=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict( - type='LoadImageFromFile', - to_float32=True, - backend_args=_base_.backend_args), + dict(type="LoadImageFromFile", to_float32=True, backend_args=_base_.backend_args), # don't need Resize dict( - type='RandomCenterCropPad', + type="RandomCenterCropPad", crop_size=None, ratios=None, border=None, test_mode=True, - test_pad_mode=['logical_or', 127], - mean=data_preprocessor['mean'], - std=data_preprocessor['std'], + test_pad_mode=["logical_or", 127], + mean=data_preprocessor["mean"], + std=data_preprocessor["std"], # Image data is not converted to rgb. - to_rgb=data_preprocessor['bgr_to_rgb']), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'border')) + to_rgb=data_preprocessor["bgr_to_rgb"], + ), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "border")), ] -train_dataloader = dict( - batch_size=6, - num_workers=3, - batch_sampler=None, - dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=6, num_workers=3, batch_sampler=None, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='Adam', lr=0.0005), - clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="Adam", lr=0.0005), clip_grad=dict(max_norm=35, norm_type=2)) max_epochs = 210 # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 3, - by_epoch=False, - begin=0, - end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[190], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 3, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[190], gamma=0.1), ] -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
# base_batch_size = (16 GPUs) x (6 samples per GPU) auto_scale_lr = dict(base_batch_size=96) -tta_model = dict( - type='DetTTAModel', - tta_cfg=dict( - nms=dict(type='soft_nms', iou_threshold=0.5, method='gaussian'), - max_per_img=100)) +tta_model = dict(type="DetTTAModel", tta_cfg=dict(nms=dict(type="soft_nms", iou_threshold=0.5, method="gaussian"), max_per_img=100)) tta_pipeline = [ + dict(type="LoadImageFromFile", to_float32=True, backend_args=_base_.backend_args), dict( - type='LoadImageFromFile', - to_float32=True, - backend_args=_base_.backend_args), - dict( - type='TestTimeAug', + type="TestTimeAug", transforms=[ [ # ``RandomFlip`` must be placed before ``RandomCenterCropPad``, # otherwise bounding box coordinates after flipping cannot be # recovered correctly. - dict(type='RandomFlip', prob=1.), - dict(type='RandomFlip', prob=0.) + dict(type="RandomFlip", prob=1.0), + dict(type="RandomFlip", prob=0.0), ], [ dict( - type='RandomCenterCropPad', + type="RandomCenterCropPad", crop_size=None, ratios=None, border=None, test_mode=True, - test_pad_mode=['logical_or', 127], - mean=data_preprocessor['mean'], - std=data_preprocessor['std'], + test_pad_mode=["logical_or", 127], + mean=data_preprocessor["mean"], + std=data_preprocessor["std"], # Image data is not converted to rgb. - to_rgb=data_preprocessor['bgr_to_rgb']) + to_rgb=data_preprocessor["bgr_to_rgb"], + ) ], - [dict(type='LoadAnnotations', with_bbox=True)], - [ - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'flip', 'flip_direction', 'border')) - ] - ]) + [dict(type="LoadAnnotations", with_bbox=True)], + [dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "flip", "flip_direction", "border"))], + ], + ), ] diff --git a/mmpose/configs/mmdet/cityscapes/faster-rcnn_r50_fpn_1x_cityscapes.py b/mmpose/configs/mmdet/cityscapes/faster-rcnn_r50_fpn_1x_cityscapes.py index ccd0de2aff1c1f3071e70e67dbf94b1c1cfe7e8b..605f853ed4c21c3bafb2523275c34338e15b58c4 100644 --- a/mmpose/configs/mmdet/cityscapes/faster-rcnn_r50_fpn_1x_cityscapes.py +++ b/mmpose/configs/mmdet/cityscapes/faster-rcnn_r50_fpn_1x_cityscapes.py @@ -1,14 +1,13 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/cityscapes_detection.py', - '../_base_/default_runtime.py', '../_base_/schedules/schedule_1x.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/cityscapes_detection.py", + "../_base_/default_runtime.py", + "../_base_/schedules/schedule_1x.py", ] model = dict( backbone=dict(init_cfg=None), - roi_head=dict( - bbox_head=dict( - num_classes=8, - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)))) + roi_head=dict(bbox_head=dict(num_classes=8, loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0))), +) # optimizer # lr is set for a batch size of 8 @@ -16,23 +15,23 @@ optim_wrapper = dict(optimizer=dict(lr=0.01)) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', + type="MultiStepLR", begin=0, end=8, by_epoch=True, # [7] yields higher performance than [6] milestones=[7], - gamma=0.1) + gamma=0.1, + ), ] # actual epoch = 8 * 8 = 64 train_cfg = dict(max_epochs=8) # For better, more stable performance initialize from COCO -load_from = 
'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth" # noqa # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/cityscapes/mask-rcnn_r50_fpn_1x_cityscapes.py b/mmpose/configs/mmdet/cityscapes/mask-rcnn_r50_fpn_1x_cityscapes.py index 772268b121e7b8858c4cfcf3b6820e6146634d0d..eacc93fe403eb26f684a6db911048787e2ee5b42 100644 --- a/mmpose/configs/mmdet/cityscapes/mask-rcnn_r50_fpn_1x_cityscapes.py +++ b/mmpose/configs/mmdet/cityscapes/mask-rcnn_r50_fpn_1x_cityscapes.py @@ -1,16 +1,16 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/cityscapes_instance.py', - '../_base_/default_runtime.py', '../_base_/schedules/schedule_1x.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/cityscapes_instance.py", + "../_base_/default_runtime.py", + "../_base_/schedules/schedule_1x.py", ] model = dict( backbone=dict(init_cfg=None), roi_head=dict( - bbox_head=dict( - type='Shared2FCBBoxHead', - num_classes=8, - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)), - mask_head=dict(num_classes=8))) + bbox_head=dict(type="Shared2FCBBoxHead", num_classes=8, loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0)), + mask_head=dict(num_classes=8), + ), +) # optimizer # lr is set for a batch size of 8 @@ -18,23 +18,23 @@ optim_wrapper = dict(optimizer=dict(lr=0.01)) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', + type="MultiStepLR", begin=0, end=8, by_epoch=True, # [7] yields higher performance than [6] milestones=[7], - gamma=0.1) + gamma=0.1, + ), ] # actual epoch = 8 * 8 = 64 train_cfg = dict(max_epochs=8) # For better, more stable performance initialize from COCO -load_from = 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth" # noqa # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
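
The config reformatting above and below is behavior-neutral: switching string literals from single to double quotes and reflowing arguments changes nothing once Python parses the file. Since these are standard MMEngine configs, that is easy to sanity-check by loading one. A minimal sketch, assuming `mmengine` is installed and the snippet runs from the repository root; the expected values follow from the Cityscapes Mask R-CNN config above:

```python
from mmengine.config import Config

# `_base_` files are merged recursively; `_delete_=True` (see the
# CenterNet-R18 config above) replaces an inherited dict outright instead
# of merging into it, and `{{_base_.backend_args}}` interpolates a value
# from the base config at parse time.
cfg = Config.fromfile("mmpose/configs/mmdet/cityscapes/mask-rcnn_r50_fpn_1x_cityscapes.py")

print(cfg.model.roi_head.bbox_head.num_classes)  # 8, overriding the 80-class COCO base
print(cfg.train_cfg.max_epochs)                  # 8, overriding schedule_1x's 12
```
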
diff --git a/mmpose/configs/mmdet/common/lsj-100e_coco-detection.py b/mmpose/configs/mmdet/common/lsj-100e_coco-detection.py index bb631e5d5c1253cc3a5d81a8cdc6cd86133d9b53..50517bab538c9b651e55e9ed8a4946f4b4d6ad65 100644 --- a/mmpose/configs/mmdet/common/lsj-100e_coco-detection.py +++ b/mmpose/configs/mmdet/common/lsj-100e_coco-detection.py @@ -1,7 +1,7 @@ -_base_ = '../_base_/default_runtime.py' +_base_ = "../_base_/default_runtime.py" # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" image_size = (1024, 1024) # Example to use different file client @@ -20,31 +20,19 @@ image_size = (1024, 1024) backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=image_size, - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=image_size, ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=image_size, recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] # Use RepeatDataset to speed up training @@ -52,65 +40,61 @@ train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=4, # simply change this from 2 to 16 for 50e - 400e training. 
dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox', + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric="bbox", format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator max_epochs = 25 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=5) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=5) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # optimizer assumes bs=64 -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.00004)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.1, momentum=0.9, weight_decay=0.00004)) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.067, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[22, 24], - gamma=0.1) + dict(type="LinearLR", start_factor=0.067, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[22, 24], gamma=0.1), ] # only keep latest 2 checkpoints diff --git a/mmpose/configs/mmdet/common/lsj-100e_coco-instance.py b/mmpose/configs/mmdet/common/lsj-100e_coco-instance.py index 6e62729d639c7659115a7f5f6449fa9021338be6..acb398d09def4783f61fc17a22f909621a86d9ca 100644 --- a/mmpose/configs/mmdet/common/lsj-100e_coco-instance.py +++ b/mmpose/configs/mmdet/common/lsj-100e_coco-instance.py @@ -1,7 +1,7 @@ -_base_ = '../_base_/default_runtime.py' +_base_ = "../_base_/default_runtime.py" # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" image_size = (1024, 1024) # Example to use different file client @@ -20,31 +20,19 @@ image_size = (1024, 1024) backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=image_size, - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", 
backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomResize", scale=image_size, ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=image_size, recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] # Use RepeatDataset to speed up training @@ -52,65 +40,61 @@ train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=4, # simply change this from 2 to 16 for 50e - 400e training. dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric=["bbox", "segm"], format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator max_epochs = 25 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=5) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=5) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # optimizer assumes bs=64 -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.00004)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.1, momentum=0.9, weight_decay=0.00004)) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.067, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[22, 24], - gamma=0.1) + dict(type="LinearLR", 
start_factor=0.067, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[22, 24], gamma=0.1), ] # only keep latest 2 checkpoints diff --git a/mmpose/configs/mmdet/common/lsj-200e_coco-detection.py b/mmpose/configs/mmdet/common/lsj-200e_coco-detection.py index 83d12947fed900f05d748b6f90ef29cc5fbc407a..56a516f163fe60b4f945ba7e3458b9bacdd79b6d 100644 --- a/mmpose/configs/mmdet/common/lsj-200e_coco-detection.py +++ b/mmpose/configs/mmdet/common/lsj-200e_coco-detection.py @@ -1,18 +1,10 @@ -_base_ = './lsj-100e_coco-detection.py' +_base_ = "./lsj-100e_coco-detection.py" # 8x25=200e train_dataloader = dict(dataset=dict(times=8)) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.067, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=25, - by_epoch=True, - milestones=[22, 24], - gamma=0.1) + dict(type="LinearLR", start_factor=0.067, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=25, by_epoch=True, milestones=[22, 24], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/common/lsj-200e_coco-instance.py b/mmpose/configs/mmdet/common/lsj-200e_coco-instance.py index af3e4bf160c01045c6e36d67bdee796e7bf96cd3..af81bb02848bf696fb30fd7f0a8bfa8ac8ef709c 100644 --- a/mmpose/configs/mmdet/common/lsj-200e_coco-instance.py +++ b/mmpose/configs/mmdet/common/lsj-200e_coco-instance.py @@ -1,18 +1,10 @@ -_base_ = './lsj-100e_coco-instance.py' +_base_ = "./lsj-100e_coco-instance.py" # 8x25=200e train_dataloader = dict(dataset=dict(times=8)) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.067, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=25, - by_epoch=True, - milestones=[22, 24], - gamma=0.1) + dict(type="LinearLR", start_factor=0.067, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=25, by_epoch=True, milestones=[22, 24], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/common/ms-90k_coco.py b/mmpose/configs/mmdet/common/ms-90k_coco.py index e2d6c3dafb61d59bbbe9d0c6188a1bbff3b736b3..c21daa5079d63ce128fbc341ec646aa0a565ff8e 100644 --- a/mmpose/configs/mmdet/common/ms-90k_coco.py +++ b/mmpose/configs/mmdet/common/ms-90k_coco.py @@ -1,8 +1,8 @@ -_base_ = '../_base_/default_runtime.py' +_base_ = "../_base_/default_runtime.py" # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module # automatically infer from prefix (not support LMDB and Memcache yet) @@ -19,99 +19,84 @@ data_root = 'data/coco/' backend_args = None # Align with Detectron2 -backend = 'pillow' +backend = "pillow" train_pipeline = [ + dict(type="LoadImageFromFile", backend_args=backend_args, imdecode_backend=backend), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='LoadImageFromFile', - backend_args=backend_args, - imdecode_backend=backend), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], + type="RandomChoiceResize", + scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True, - backend=backend), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + backend=backend, + ), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] 
test_pipeline = [ - dict( - type='LoadImageFromFile', - backend_args=backend_args, - imdecode_backend=backend), - dict(type='Resize', scale=(1333, 800), keep_ratio=True, backend=backend), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args, imdecode_backend=backend), + dict(type="Resize", scale=(1333, 800), keep_ratio=True, backend=backend), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, pin_memory=True, - sampler=dict(type='InfiniteSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="InfiniteSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox', + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric="bbox", format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator # training schedule for 90k max_iter = 90000 -train_cfg = dict( - type='IterBasedTrainLoop', max_iters=max_iter, val_interval=10000) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="IterBasedTrainLoop", max_iters=max_iter, val_interval=10000) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[60000, 80000], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[60000, 80000], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically # or not by default. 
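
Several of these configs end with an `auto_scale_lr` block (e.g. `auto_scale_lr = dict(base_batch_size=96)` in the CentripetalNet config above). When that mechanism is enabled, MMEngine rescales the optimizer LR linearly by the ratio of the actual effective batch size to `base_batch_size`. The sketch below shows the arithmetic only; the helper name is ours, not a library API:

```python
def linearly_scaled_lr(base_lr: float, base_batch_size: int,
                       num_gpus: int, samples_per_gpu: int) -> float:
    # Linear scaling rule: LR grows or shrinks with the effective batch size.
    effective_batch_size = num_gpus * samples_per_gpu
    return base_lr * effective_batch_size / base_batch_size

# CentripetalNet was tuned at (16 GPUs) x (6 samples per GPU) = 96 with
# Adam lr=0.0005; running the same config on 8 GPUs halves the LR:
print(linearly_scaled_lr(0.0005, base_batch_size=96, num_gpus=8, samples_per_gpu=6))
# -> 0.00025
```
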
diff --git a/mmpose/configs/mmdet/common/ms-poly-90k_coco-instance.py b/mmpose/configs/mmdet/common/ms-poly-90k_coco-instance.py index d5566b3c3b8bfe0a49c8c062fb0fc972d5ae1f55..2ddeb589c504caaba84a1df3327b40a37d829728 100644 --- a/mmpose/configs/mmdet/common/ms-poly-90k_coco-instance.py +++ b/mmpose/configs/mmdet/common/ms-poly-90k_coco-instance.py @@ -1,8 +1,8 @@ -_base_ = '../_base_/default_runtime.py' +_base_ = "../_base_/default_runtime.py" # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module # automatically infer from prefix (not support LMDB and Memcache yet) @@ -19,107 +19,84 @@ data_root = 'data/coco/' backend_args = None # Align with Detectron2 -backend = 'pillow' +backend = "pillow" train_pipeline = [ + dict(type="LoadImageFromFile", backend_args=backend_args, imdecode_backend=backend), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), dict( - type='LoadImageFromFile', - backend_args=backend_args, - imdecode_backend=backend), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], + type="RandomChoiceResize", + scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True, - backend=backend), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + backend=backend, + ), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict( - type='LoadImageFromFile', - backend_args=backend_args, - imdecode_backend=backend), - dict(type='Resize', scale=(1333, 800), keep_ratio=True, backend=backend), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args, imdecode_backend=backend), + dict(type="Resize", scale=(1333, 800), keep_ratio=True, backend=backend), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, pin_memory=True, - sampler=dict(type='InfiniteSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="InfiniteSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + 
data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric=["bbox", "segm"], format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator # training schedule for 90k max_iter = 90000 -train_cfg = dict( - type='IterBasedTrainLoop', max_iters=max_iter, val_interval=10000) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="IterBasedTrainLoop", max_iters=max_iter, val_interval=10000) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[60000, 80000], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[60000, 80000], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically # or not by default. diff --git a/mmpose/configs/mmdet/common/ms-poly_3x_coco-instance.py b/mmpose/configs/mmdet/common/ms-poly_3x_coco-instance.py index 04072f9b84c06d546767649f7e17736444db7ce2..7ed43725de9dc9f50dfd7e4a9dee736015ff0412 100644 --- a/mmpose/configs/mmdet/common/ms-poly_3x_coco-instance.py +++ b/mmpose/configs/mmdet/common/ms-poly_3x_coco-instance.py @@ -1,7 +1,7 @@ -_base_ = '../_base_/default_runtime.py' +_base_ = "../_base_/default_runtime.py" # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -21,95 +21,76 @@ backend_args = None # In mstrain 3x config, img_scale=[(1333, 640), (1333, 800)], # multiscale_mode='range' train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='RandomResize', scale=[(1333, 640), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs'), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="RandomResize", scale=[(1333, 640), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + 
dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=3, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric=['bbox', 'segm'], - backend_args=backend_args) + type="CocoMetric", ann_file=data_root + "annotations/instances_val2017.json", metric=["bbox", "segm"], backend_args=backend_args +) test_evaluator = val_evaluator # training schedule for 3x with `RepeatDataset` -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=12, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate # Experiments show that using milestones=[9, 11] has higher performance param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[9, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[9, 11], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/common/ms_3x_coco-instance.py b/mmpose/configs/mmdet/common/ms_3x_coco-instance.py index f80cf88e9b1e770dce3157abc852aea996eec624..c04280004b525f614322039ee48e34b58fa9bf5d 100644 --- a/mmpose/configs/mmdet/common/ms_3x_coco-instance.py +++ b/mmpose/configs/mmdet/common/ms_3x_coco-instance.py @@ -1,8 +1,8 @@ -_base_ = '../_base_/default_runtime.py' +_base_ = "../_base_/default_runtime.py" # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different 
file client # Method 1: simply set the data root and let the file I/O module @@ -20,86 +20,73 @@ data_root = 'data/coco/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomResize', scale=[(1333, 640), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomResize", scale=[(1333, 640), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=3, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox', - backend_args=backend_args) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/instances_val2017.json", metric="bbox", backend_args=backend_args) test_evaluator = val_evaluator # training schedule for 3x with `RepeatDataset` -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=12, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate # Experiments show that using milestones=[9, 11] has higher performance param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - 
milestones=[9, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[9, 11], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/common/ms_3x_coco.py b/mmpose/configs/mmdet/common/ms_3x_coco.py index facbb34cf05088d8832502d3c9a38d812d328308..1b6eeedd03f1ac1b8dc3e52c1c128ce21d9ec4a4 100644 --- a/mmpose/configs/mmdet/common/ms_3x_coco.py +++ b/mmpose/configs/mmdet/common/ms_3x_coco.py @@ -1,8 +1,8 @@ -_base_ = '../_base_/default_runtime.py' +_base_ = "../_base_/default_runtime.py" # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -20,86 +20,73 @@ data_root = 'data/coco/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', scale=[(1333, 640), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=[(1333, 640), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=3, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, 
pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox', - backend_args=backend_args) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/instances_val2017.json", metric="bbox", backend_args=backend_args) test_evaluator = val_evaluator # training schedule for 3x with `RepeatDataset` -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=12, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate # Experiments show that using milestones=[9, 11] has higher performance param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[9, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[9, 11], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/common/ssj_270k_coco-instance.py b/mmpose/configs/mmdet/common/ssj_270k_coco-instance.py index 7407644fd59bb03d6e0afde83f8893a351ddc356..d77bf018b769e9f62a4908dd0be666368db604a1 100644 --- a/mmpose/configs/mmdet/common/ssj_270k_coco-instance.py +++ b/mmpose/configs/mmdet/common/ssj_270k_coco-instance.py @@ -1,7 +1,7 @@ -_base_ = '../_base_/default_runtime.py' +_base_ = "../_base_/default_runtime.py" # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" image_size = (1024, 1024) @@ -23,97 +23,79 @@ backend_args = None # Standard Scale Jittering (SSJ) resizes and crops an image # with a resize range of 0.8 to 1.25 of the original image size. 
train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.8, 1.25), - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=image_size, - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomResize", scale=image_size, ratio_range=(0.8, 1.25), keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=image_size, recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='InfiniteSampler'), + sampler=dict(type="InfiniteSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric=["bbox", "segm"], format_only=False, - backend_args=backend_args) + backend_args=backend_args, +) test_evaluator = val_evaluator # The model is trained by 270k iterations with batch_size 64, # which is roughly equivalent to 144 epochs. 
max_iters = 270000 -train_cfg = dict( - type='IterBasedTrainLoop', max_iters=max_iters, val_interval=10000) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="IterBasedTrainLoop", max_iters=max_iters, val_interval=10000) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # optimizer assumes bs=64 -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.00004)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.1, momentum=0.9, weight_decay=0.00004)) # learning rate policy # lr steps at [0.9, 0.95, 0.975] of the maximum iterations param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=270000, - by_epoch=False, - milestones=[243000, 256500, 263250], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=270000, by_epoch=False, milestones=[243000, 256500, 263250], gamma=0.1), ] default_hooks = dict(checkpoint=dict(by_epoch=False, interval=10000)) diff --git a/mmpose/configs/mmdet/common/ssj_scp_270k_coco-instance.py b/mmpose/configs/mmdet/common/ssj_scp_270k_coco-instance.py index 06159dd40312ec935ac383701fa7b052b863e1bf..a55945441a9b74d42357f7c935a3aa764d923050 100644 --- a/mmpose/configs/mmdet/common/ssj_scp_270k_coco-instance.py +++ b/mmpose/configs/mmdet/common/ssj_scp_270k_coco-instance.py @@ -1,7 +1,7 @@ -_base_ = 'ssj_270k_coco-instance.py' +_base_ = "ssj_270k_coco-instance.py" # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" image_size = (1024, 1024) @@ -23,38 +23,29 @@ backend_args = None # Standard Scale Jittering (SSJ) resizes and crops an image # with a resize range of 0.8 to 1.25 of the original image size. 
load_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.8, 1.25), - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=image_size, - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=image_size), -] -train_pipeline = [ - dict(type='CopyPaste', max_num_pasted=100), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomResize", scale=image_size, ratio_range=(0.8, 1.25), keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=image_size, recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=image_size), ] +train_pipeline = [dict(type="CopyPaste", max_num_pasted=100), dict(type="PackDetInputs")] train_dataloader = dict( dataset=dict( _delete_=True, - type='MultiImageMixDataset', + type="MultiImageMixDataset", dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=load_pipeline, - backend_args=backend_args), - pipeline=train_pipeline)) + backend_args=backend_args, + ), + pipeline=train_pipeline, + ) +) diff --git a/mmpose/configs/mmdet/condinst/condinst_r50_fpn_ms-poly-90k_coco_instance.py b/mmpose/configs/mmdet/condinst/condinst_r50_fpn_ms-poly-90k_coco_instance.py index 39639d874cbeb54b64a2789f251f1f6dad585ce3..9197b2b8f328bfe61f4ca79cff7709b232119153 100644 --- a/mmpose/configs/mmdet/condinst/condinst_r50_fpn_ms-poly-90k_coco_instance.py +++ b/mmpose/configs/mmdet/condinst/condinst_r50_fpn_ms-poly-90k_coco_instance.py @@ -1,35 +1,38 @@ -_base_ = '../common/ms-poly-90k_coco-instance.py' +_base_ = "../common/ms-poly-90k_coco-instance.py" # model settings model = dict( - type='CondInst', + type="CondInst", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=True, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), - style='pytorch'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + style="pytorch", + ), neck=dict( - type='FPN', + type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, - add_extra_convs='on_output', # use P5 + add_extra_convs="on_output", # use P5 num_outs=5, - relu_before_extra_convs=True), + relu_before_extra_convs=True, + ), bbox_head=dict( - type='CondInstBboxHead', + type="CondInstBboxHead", num_params=169, num_classes=80, in_channels=256, @@ -41,17 +44,12 @@ model = dict( dcn_on_last_conv=False, center_sampling=True, conv_bias=True, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - 
alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=1.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=1.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), mask_head=dict( - type='CondInstMaskHead', + type="CondInstMaskHead", num_layers=3, feat_channels=8, size_of_interest=8, @@ -65,21 +63,13 @@ model = dict( out_channels=8, mask_stride=8, num_stacked_convs=4, - norm_cfg=dict(type='BN', requires_grad=True)), - loss_mask=dict( - type='DiceLoss', - use_sigmoid=True, - activate=True, - eps=5e-6, - loss_weight=1.0)), + norm_cfg=dict(type="BN", requires_grad=True), + ), + loss_mask=dict(type="DiceLoss", use_sigmoid=True, activate=True, eps=5e-6, loss_weight=1.0), + ), # model training and testing settings - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100, - mask_thr=0.5)) + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100, mask_thr=0.5), +) # optimizer optim_wrapper = dict(optimizer=dict(lr=0.01)) diff --git a/mmpose/configs/mmdet/conditional_detr/conditional-detr_r50_8xb2-50e_coco.py b/mmpose/configs/mmdet/conditional_detr/conditional-detr_r50_8xb2-50e_coco.py index a21476448d0cbab6b6e4b94aa46d686e38667879..d3fbe57433ad719e5b04538137d29aea197be174 100644 --- a/mmpose/configs/mmdet/conditional_detr/conditional-detr_r50_8xb2-50e_coco.py +++ b/mmpose/configs/mmdet/conditional_detr/conditional-detr_r50_8xb2-50e_coco.py @@ -1,42 +1,31 @@ -_base_ = ['../detr/detr_r50_8xb2-150e_coco.py'] +_base_ = ["../detr/detr_r50_8xb2-150e_coco.py"] model = dict( - type='ConditionalDETR', + type="ConditionalDETR", num_queries=300, decoder=dict( num_layers=6, layer_cfg=dict( - self_attn_cfg=dict( - _delete_=True, - embed_dims=256, - num_heads=8, - attn_drop=0.1, - cross_attn=False), - cross_attn_cfg=dict( - _delete_=True, - embed_dims=256, - num_heads=8, - attn_drop=0.1, - cross_attn=True))), + self_attn_cfg=dict(_delete_=True, embed_dims=256, num_heads=8, attn_drop=0.1, cross_attn=False), + cross_attn_cfg=dict(_delete_=True, embed_dims=256, num_heads=8, attn_drop=0.1, cross_attn=True), + ), + ), bbox_head=dict( - type='ConditionalDETRHead', - loss_cls=dict( - _delete_=True, - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=2.0)), + type="ConditionalDETRHead", loss_cls=dict(_delete_=True, type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=2.0) + ), # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='FocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ]))) + dict(type="FocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), +) # learning policy -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=50, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=50, val_interval=1) -param_scheduler = [dict(type='MultiStepLR', end=50, milestones=[40])] +param_scheduler = [dict(type="MultiStepLR", end=50, milestones=[40])] diff --git 
a/mmpose/configs/mmdet/convnext/cascade-mask-rcnn_convnext-s-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py b/mmpose/configs/mmdet/convnext/cascade-mask-rcnn_convnext-s-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py index 9a5fbedcaa78636f11a5718f1123d33e7e2ac273..3ffec5834a947d7ff06bc037f901483ec686a10c 100644 --- a/mmpose/configs/mmdet/convnext/cascade-mask-rcnn_convnext-s-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py +++ b/mmpose/configs/mmdet/convnext/cascade-mask-rcnn_convnext-s-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py @@ -1,26 +1,21 @@ -_base_ = './cascade-mask-rcnn_convnext-t-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py' # noqa +_base_ = "./cascade-mask-rcnn_convnext-t-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py" # noqa # please install mmpretrain # import mmpretrain.models to trigger register_module in mmpretrain -custom_imports = dict( - imports=['mmpretrain.models'], allow_failed_imports=False) -checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-small_3rdparty_32xb128-noema_in1k_20220301-303e75e3.pth' # noqa +custom_imports = dict(imports=["mmpretrain.models"], allow_failed_imports=False) +checkpoint_file = "https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-small_3rdparty_32xb128-noema_in1k_20220301-303e75e3.pth" # noqa model = dict( backbone=dict( _delete_=True, - type='mmpretrain.ConvNeXt', - arch='small', + type="mmpretrain.ConvNeXt", + arch="small", out_indices=[0, 1, 2, 3], drop_path_rate=0.6, layer_scale_init_value=1.0, gap_before_final_norm=False, - init_cfg=dict( - type='Pretrained', checkpoint=checkpoint_file, - prefix='backbone.'))) + init_cfg=dict(type="Pretrained", checkpoint=checkpoint_file, prefix="backbone."), + ) +) -optim_wrapper = dict(paramwise_cfg={ - 'decay_rate': 0.7, - 'decay_type': 'layer_wise', - 'num_layers': 12 -}) +optim_wrapper = dict(paramwise_cfg={"decay_rate": 0.7, "decay_type": "layer_wise", "num_layers": 12}) diff --git a/mmpose/configs/mmdet/convnext/cascade-mask-rcnn_convnext-t-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py b/mmpose/configs/mmdet/convnext/cascade-mask-rcnn_convnext-t-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py index c92f86838c31710dd550c36d9abc11d79bb6e2eb..80a9edd3be16531cde5dcf851640f2a4737bff23 100644 --- a/mmpose/configs/mmdet/convnext/cascade-mask-rcnn_convnext-t-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py +++ b/mmpose/configs/mmdet/convnext/cascade-mask-rcnn_convnext-t-p4-w7_fpn_4conv1fc-giou_amp-ms-crop-3x_coco.py @@ -1,122 +1,132 @@ _base_ = [ - '../_base_/models/cascade-mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # please install mmpretrain # import mmpretrain.models to trigger register_module in mmpretrain -custom_imports = dict( - imports=['mmpretrain.models'], allow_failed_imports=False) -checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-tiny_3rdparty_32xb128-noema_in1k_20220301-795e9634.pth' # noqa +custom_imports = dict(imports=["mmpretrain.models"], allow_failed_imports=False) +checkpoint_file = "https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-tiny_3rdparty_32xb128-noema_in1k_20220301-795e9634.pth" # noqa model = dict( backbone=dict( _delete_=True, - type='mmpretrain.ConvNeXt', - 
arch='tiny', + type="mmpretrain.ConvNeXt", + arch="tiny", out_indices=[0, 1, 2, 3], drop_path_rate=0.4, layer_scale_init_value=1.0, gap_before_final_norm=False, - init_cfg=dict( - type='Pretrained', checkpoint=checkpoint_file, - prefix='backbone.')), + init_cfg=dict(type="Pretrained", checkpoint=checkpoint_file, prefix="backbone."), + ), neck=dict(in_channels=[96, 192, 384, 768]), - roi_head=dict(bbox_head=[ - dict( - type='ConvFCBBoxHead', - num_shared_convs=4, - num_shared_fcs=1, - in_channels=256, - conv_out_channels=256, - fc_out_channels=1024, - roi_feat_size=7, - num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), - reg_class_agnostic=False, - reg_decoded_bbox=True, - norm_cfg=dict(type='SyncBN', requires_grad=True), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=10.0)), - dict( - type='ConvFCBBoxHead', - num_shared_convs=4, - num_shared_fcs=1, - in_channels=256, - conv_out_channels=256, - fc_out_channels=1024, - roi_feat_size=7, - num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.05, 0.05, 0.1, 0.1]), - reg_class_agnostic=False, - reg_decoded_bbox=True, - norm_cfg=dict(type='SyncBN', requires_grad=True), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=10.0)), - dict( - type='ConvFCBBoxHead', - num_shared_convs=4, - num_shared_fcs=1, - in_channels=256, - conv_out_channels=256, - fc_out_channels=1024, - roi_feat_size=7, - num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.033, 0.033, 0.067, 0.067]), - reg_class_agnostic=False, - reg_decoded_bbox=True, - norm_cfg=dict(type='SyncBN', requires_grad=True), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=10.0)) - ])) + roi_head=dict( + bbox_head=[ + dict( + type="ConvFCBBoxHead", + num_shared_convs=4, + num_shared_fcs=1, + in_channels=256, + conv_out_channels=256, + fc_out_channels=1024, + roi_feat_size=7, + num_classes=80, + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + reg_class_agnostic=False, + reg_decoded_bbox=True, + norm_cfg=dict(type="SyncBN", requires_grad=True), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=10.0), + ), + dict( + type="ConvFCBBoxHead", + num_shared_convs=4, + num_shared_fcs=1, + in_channels=256, + conv_out_channels=256, + fc_out_channels=1024, + roi_feat_size=7, + num_classes=80, + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), + reg_class_agnostic=False, + reg_decoded_bbox=True, + norm_cfg=dict(type="SyncBN", requires_grad=True), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=10.0), + ), + dict( + type="ConvFCBBoxHead", + num_shared_convs=4, + num_shared_fcs=1, + in_channels=256, + conv_out_channels=256, + fc_out_channels=1024, + roi_feat_size=7, + num_classes=80, + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), + reg_class_agnostic=False, + reg_decoded_bbox=True, + norm_cfg=dict(type="SyncBN", requires_grad=True), 
+ loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=10.0), + ), + ] + ), +) # augmentation strategy originates from DETR / Sparse RCNN train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', - transforms=[[ - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoice", + transforms=[ + [ + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) + ], + [ + dict(type="RandomChoiceResize", scales=[(400, 1333), (500, 1333), (600, 1333)], keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], ], - [ - dict( - type='RandomChoiceResize', - scales=[(400, 1333), (500, 1333), (600, 1333)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), - (576, 1333), (608, 1333), (640, 1333), - (672, 1333), (704, 1333), (736, 1333), - (768, 1333), (800, 1333)], - keep_ratio=True) - ]]), - dict(type='PackDetInputs') + ), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) @@ -125,30 +135,14 @@ train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[27, 33], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[27, 33], gamma=0.1), ] # Enable automatic-mixed-precision training with AmpOptimWrapper. 
optim_wrapper = dict( - type='AmpOptimWrapper', - constructor='LearningRateDecayOptimizerConstructor', - paramwise_cfg={ - 'decay_rate': 0.7, - 'decay_type': 'layer_wise', - 'num_layers': 6 - }, - optimizer=dict( - _delete_=True, - type='AdamW', - lr=0.0002, - betas=(0.9, 0.999), - weight_decay=0.05)) + type="AmpOptimWrapper", + constructor="LearningRateDecayOptimizerConstructor", + paramwise_cfg={"decay_rate": 0.7, "decay_type": "layer_wise", "num_layers": 6}, + optimizer=dict(_delete_=True, type="AdamW", lr=0.0002, betas=(0.9, 0.999), weight_decay=0.05), +) diff --git a/mmpose/configs/mmdet/convnext/mask-rcnn_convnext-t-p4-w7_fpn_amp-ms-crop-3x_coco.py b/mmpose/configs/mmdet/convnext/mask-rcnn_convnext-t-p4-w7_fpn_amp-ms-crop-3x_coco.py index 5792b5b5c5a03c85a7d69040dd9a0b5381bc7995..f53dfe063542a63c6b7084dab0f072f12f2cf312 100644 --- a/mmpose/configs/mmdet/convnext/mask-rcnn_convnext-t-p4-w7_fpn_amp-ms-crop-3x_coco.py +++ b/mmpose/configs/mmdet/convnext/mask-rcnn_convnext-t-p4-w7_fpn_amp-ms-crop-3x_coco.py @@ -1,63 +1,80 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # please install mmpretrain # import mmpretrain.models to trigger register_module in mmpretrain -custom_imports = dict( - imports=['mmpretrain.models'], allow_failed_imports=False) -checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-tiny_3rdparty_32xb128-noema_in1k_20220301-795e9634.pth' # noqa +custom_imports = dict(imports=["mmpretrain.models"], allow_failed_imports=False) +checkpoint_file = "https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-tiny_3rdparty_32xb128-noema_in1k_20220301-795e9634.pth" # noqa model = dict( backbone=dict( _delete_=True, - type='mmpretrain.ConvNeXt', - arch='tiny', + type="mmpretrain.ConvNeXt", + arch="tiny", out_indices=[0, 1, 2, 3], drop_path_rate=0.4, layer_scale_init_value=1.0, gap_before_final_norm=False, - init_cfg=dict( - type='Pretrained', checkpoint=checkpoint_file, - prefix='backbone.')), - neck=dict(in_channels=[96, 192, 384, 768])) + init_cfg=dict(type="Pretrained", checkpoint=checkpoint_file, prefix="backbone."), + ), + neck=dict(in_channels=[96, 192, 384, 768]), +) # augmentation strategy originates from DETR / Sparse RCNN train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', - transforms=[[ - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoice", + transforms=[ + [ + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) + ], + [ + dict(type="RandomChoiceResize", scales=[(400, 1333), (500, 1333), (600, 1333)], keep_ratio=True), + 
dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], ], - [ - dict( - type='RandomChoiceResize', - scales=[(400, 1333), (500, 1333), (600, 1333)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), - (576, 1333), (608, 1333), (640, 1333), - (672, 1333), (704, 1333), (736, 1333), - (768, 1333), (800, 1333)], - keep_ratio=True) - ]]), - dict(type='PackDetInputs') + ), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) @@ -66,31 +83,20 @@ train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[27, 33], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[27, 33], gamma=0.1), ] # Enable automatic-mixed-precision training with AmpOptimWrapper. optim_wrapper = dict( - type='AmpOptimWrapper', - constructor='LearningRateDecayOptimizerConstructor', - paramwise_cfg={ - 'decay_rate': 0.95, - 'decay_type': 'layer_wise', - 'num_layers': 6 - }, + type="AmpOptimWrapper", + constructor="LearningRateDecayOptimizerConstructor", + paramwise_cfg={"decay_rate": 0.95, "decay_type": "layer_wise", "num_layers": 6}, optimizer=dict( _delete_=True, - type='AdamW', + type="AdamW", lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05, - )) + ), +) diff --git a/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_10xb5-crop511-210e-mstest_coco.py b/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_10xb5-crop511-210e-mstest_coco.py index 76339163b618a5a9d41a542ec75192aedb409eea..0b6815c7003739c9e199fd3aac6415fec51a5eb5 100644 --- a/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_10xb5-crop511-210e-mstest_coco.py +++ b/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_10xb5-crop511-210e-mstest_coco.py @@ -1,4 +1,4 @@ -_base_ = './cornernet_hourglass104_8xb6-210e-mstest_coco.py' +_base_ = "./cornernet_hourglass104_8xb6-210e-mstest_coco.py" train_dataloader = dict(batch_size=5) diff --git a/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_32xb3-210e-mstest_coco.py b/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_32xb3-210e-mstest_coco.py index 51a4740318a1d85a62b6b4482c53808c98fb8a62..82b2cf31f717b44039e7b815f8bd04c32f19dfc7 100644 --- a/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_32xb3-210e-mstest_coco.py +++ b/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_32xb3-210e-mstest_coco.py @@ -1,4 +1,4 @@ -_base_ = './cornernet_hourglass104_8xb6-210e-mstest_coco.py' +_base_ = "./cornernet_hourglass104_8xb6-210e-mstest_coco.py" train_dataloader = dict(batch_size=3) diff --git a/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_8xb6-210e-mstest_coco.py b/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_8xb6-210e-mstest_coco.py index bdb46fff164f796d9333c123deb701c341bdc1e3..40fb4578a47379b5f9c6192834d94b24b033969d 100644 --- 
a/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_8xb6-210e-mstest_coco.py +++ b/mmpose/configs/mmdet/cornernet/cornernet_hourglass104_8xb6-210e-mstest_coco.py @@ -1,38 +1,30 @@ -_base_ = [ - '../_base_/default_runtime.py', '../_base_/datasets/coco_detection.py' -] +_base_ = ["../_base_/default_runtime.py", "../_base_/datasets/coco_detection.py"] -data_preprocessor = dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True) +data_preprocessor = dict(type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True) # model settings model = dict( - type='CornerNet', + type="CornerNet", data_preprocessor=data_preprocessor, backbone=dict( - type='HourglassNet', + type="HourglassNet", downsample_times=5, num_stacks=2, stage_channels=[256, 256, 384, 384, 384, 512], stage_blocks=[2, 2, 2, 2, 2, 4], - norm_cfg=dict(type='BN', requires_grad=True)), + norm_cfg=dict(type="BN", requires_grad=True), + ), neck=None, bbox_head=dict( - type='CornerHead', + type="CornerHead", num_classes=80, in_channels=256, num_feat_levels=2, corner_emb_channels=1, - loss_heatmap=dict( - type='GaussianFocalLoss', alpha=2.0, gamma=4.0, loss_weight=1), - loss_embedding=dict( - type='AssociativeEmbeddingLoss', - pull_weight=0.10, - push_weight=0.10), - loss_offset=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1)), + loss_heatmap=dict(type="GaussianFocalLoss", alpha=2.0, gamma=4.0, loss_weight=1), + loss_embedding=dict(type="AssociativeEmbeddingLoss", pull_weight=0.10, push_weight=0.10), + loss_offset=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1), + ), # training and testing settings train_cfg=None, test_cfg=dict( @@ -41,143 +33,111 @@ model = dict( distance_threshold=0.5, score_thr=0.05, max_per_img=100, - nms=dict(type='soft_nms', iou_threshold=0.5, method='gaussian'))) + nms=dict(type="soft_nms", iou_threshold=0.5, method="gaussian"), + ), +) # data settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), dict( # The cropped images are padded into squares during training, # but may be smaller than crop_size. - type='RandomCenterCropPad', + type="RandomCenterCropPad", crop_size=(511, 511), ratios=(0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3), test_mode=False, test_pad_mode=None, - mean=data_preprocessor['mean'], - std=data_preprocessor['std'], + mean=data_preprocessor["mean"], + std=data_preprocessor["std"], # Image data is not converted to rgb. - to_rgb=data_preprocessor['bgr_to_rgb']), + to_rgb=data_preprocessor["bgr_to_rgb"], + ), # Make sure the output is always crop_size. 
- dict(type='Resize', scale=(511, 511), keep_ratio=False), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs'), + dict(type="Resize", scale=(511, 511), keep_ratio=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ dict( - type='LoadImageFromFile', + type="LoadImageFromFile", to_float32=True, backend_args=_base_.backend_args, ), # don't need Resize dict( - type='RandomCenterCropPad', + type="RandomCenterCropPad", crop_size=None, ratios=None, border=None, test_mode=True, - test_pad_mode=['logical_or', 127], - mean=data_preprocessor['mean'], - std=data_preprocessor['std'], + test_pad_mode=["logical_or", 127], + mean=data_preprocessor["mean"], + std=data_preprocessor["std"], # Image data is not converted to rgb. - to_rgb=data_preprocessor['bgr_to_rgb']), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'border')) + to_rgb=data_preprocessor["bgr_to_rgb"], + ), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "border")), ] -train_dataloader = dict( - batch_size=6, - num_workers=3, - batch_sampler=None, - dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=6, num_workers=3, batch_sampler=None, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='Adam', lr=0.0005), - clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="Adam", lr=0.0005), clip_grad=dict(max_norm=35, norm_type=2)) max_epochs = 210 # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 3, - by_epoch=False, - begin=0, - end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[180], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 3, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[180], gamma=0.1), ] -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. # base_batch_size = (8 GPUs) x (6 samples per GPU) auto_scale_lr = dict(base_batch_size=48) -tta_model = dict( - type='DetTTAModel', - tta_cfg=dict( - nms=dict(type='soft_nms', iou_threshold=0.5, method='gaussian'), - max_per_img=100)) +tta_model = dict(type="DetTTAModel", tta_cfg=dict(nms=dict(type="soft_nms", iou_threshold=0.5, method="gaussian"), max_per_img=100)) tta_pipeline = [ + dict(type="LoadImageFromFile", to_float32=True, backend_args=_base_.backend_args), dict( - type='LoadImageFromFile', - to_float32=True, - backend_args=_base_.backend_args), - dict( - type='TestTimeAug', + type="TestTimeAug", transforms=[ [ # ``RandomFlip`` must be placed before ``RandomCenterCropPad``, # otherwise bounding box coordinates after flipping cannot be # recovered correctly. - dict(type='RandomFlip', prob=1.), - dict(type='RandomFlip', prob=0.) 
+ dict(type="RandomFlip", prob=1.0), + dict(type="RandomFlip", prob=0.0), ], [ dict( - type='RandomCenterCropPad', + type="RandomCenterCropPad", crop_size=None, ratios=None, border=None, test_mode=True, - test_pad_mode=['logical_or', 127], - mean=data_preprocessor['mean'], - std=data_preprocessor['std'], + test_pad_mode=["logical_or", 127], + mean=data_preprocessor["mean"], + std=data_preprocessor["std"], # Image data is not converted to rgb. - to_rgb=data_preprocessor['bgr_to_rgb']) + to_rgb=data_preprocessor["bgr_to_rgb"], + ) ], - [dict(type='LoadAnnotations', with_bbox=True)], - [ - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'flip', 'flip_direction', 'border')) - ] - ]) + [dict(type="LoadAnnotations", with_bbox=True)], + [dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "flip", "flip_direction", "border"))], + ], + ), ] diff --git a/mmpose/configs/mmdet/crowddet/crowddet-rcnn_r50_fpn_8xb2-30e_crowdhuman.py b/mmpose/configs/mmdet/crowddet/crowddet-rcnn_r50_fpn_8xb2-30e_crowdhuman.py index 8815be77d49cf77afff6f888ee225e928e43b402..4ebc6bf0fcd2a7a1795a7c71bc5956b3d86e6ee1 100644 --- a/mmpose/configs/mmdet/crowddet/crowddet-rcnn_r50_fpn_8xb2-30e_crowdhuman.py +++ b/mmpose/configs/mmdet/crowddet/crowddet-rcnn_r50_fpn_8xb2-30e_crowdhuman.py @@ -1,9 +1,9 @@ -_base_ = ['../_base_/default_runtime.py'] +_base_ = ["../_base_/default_runtime.py"] model = dict( - type='CrowdDet', + type="CrowdDet", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], bgr_to_rgb=False, @@ -11,127 +11,92 @@ model = dict( # This option is set according to https://github.com/Purkialo/CrowdDet/ # blob/master/lib/data/CrowdHuman.py The images in the entire batch are # resize together. 
- batch_augments=[ - dict(type='BatchResize', scale=(1400, 800), pad_size_divisor=64) - ]), + batch_augments=[dict(type="BatchResize", scale=(1400, 800), pad_size_divisor=64)], + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='FPN', + type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5, - upsample_cfg=dict(mode='bilinear', align_corners=False)), + upsample_cfg=dict(mode="bilinear", align_corners=False), + ), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, anchor_generator=dict( - type='AnchorGenerator', + type="AnchorGenerator", scales=[8], ratios=[1.0, 2.0, 3.0], strides=[4, 8, 16, 32, 64], - centers=[(8, 8), (8, 8), (8, 8), (8, 8), (8, 8)]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0.0, 0.0, 0.0, 0.0], - target_stds=[1.0, 1.0, 1.0, 1.0], - clip_border=False), - loss_cls=dict(type='CrossEntropyLoss', loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + centers=[(8, 8), (8, 8), (8, 8), (8, 8), (8, 8)], + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0], clip_border=False), + loss_cls=dict(type="CrossEntropyLoss", loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), roi_head=dict( - type='MultiInstanceRoIHead', + type="MultiInstanceRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict( - type='RoIAlign', - output_size=7, - sampling_ratio=-1, - aligned=True, - use_torchvision=True), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=-1, aligned=True, use_torchvision=True), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=dict( - type='MultiInstanceBBoxHead', + type="MultiInstanceBBoxHead", with_refine=False, num_shared_fcs=2, in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', - loss_weight=1.0, - use_sigmoid=False, - reduction='none'), - loss_bbox=dict( - type='SmoothL1Loss', loss_weight=1.0, reduction='none'))), + loss_cls=dict(type="CrossEntropyLoss", loss_weight=1.0, use_sigmoid=False, reduction="none"), + loss_bbox=dict(type="SmoothL1Loss", loss_weight=1.0, reduction="none"), + ), + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=(0.3, 0.7), - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=(0.3, 0.7), min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, 
- debug=False), - rpn_proposal=dict( - nms_pre=2400, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=2), + debug=False, + ), + rpn_proposal=dict(nms_pre=2400, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=2), rcnn=dict( assigner=dict( - type='MultiInstanceAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.3, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='MultiInsRandomSampler', - num=512, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MultiInstanceAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.3, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="MultiInsRandomSampler", num=512, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms_pre=1200, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=2), - rcnn=dict( - nms=dict(type='nms', iou_threshold=0.5), - score_thr=0.01, - max_per_img=500))) + rpn=dict(nms_pre=1200, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=2), + rcnn=dict(nms=dict(type="nms", iou_threshold=0.5), score_thr=0.01, max_per_img=500), + ), +) -dataset_type = 'CrowdHumanDataset' -data_root = 'data/CrowdHuman/' +dataset_type = "CrowdHumanDataset" +data_root = "data/CrowdHuman/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -149,79 +114,66 @@ data_root = 'data/CrowdHuman/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip', - 'flip_direction')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "flip", "flip_direction")), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1400, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1400, 800), keep_ratio=True), # avoid bboxes being resized - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), batch_sampler=None, # The 'batch_sampler' may decrease the precision dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotation_train.odgt', - data_prefix=dict(img='Images/'), + ann_file="annotation_train.odgt", + data_prefix=dict(img="Images/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( 
type=dataset_type, data_root=data_root, - ann_file='annotation_val.odgt', - data_prefix=dict(img='Images/'), + ann_file="annotation_val.odgt", + data_prefix=dict(img="Images/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader val_evaluator = dict( - type='CrowdHumanMetric', - ann_file=data_root + 'annotation_val.odgt', - metric=['AP', 'MR', 'JI'], - backend_args=backend_args) + type="CrowdHumanMetric", ann_file=data_root + "annotation_val.odgt", metric=["AP", "MR", "JI"], backend_args=backend_args +) test_evaluator = val_evaluator -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=30, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=30, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=800), - dict( - type='MultiStepLR', - begin=0, - end=30, - by_epoch=True, - milestones=[24, 27], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=800), + dict(type="MultiStepLR", begin=0, end=30, by_epoch=True, milestones=[24, 27], gamma=0.1), ] # optimizer auto_scale_lr = dict(base_batch_size=16) -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.002, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.002, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/crowddet/crowddet-rcnn_refine_r50_fpn_8xb2-30e_crowdhuman.py b/mmpose/configs/mmdet/crowddet/crowddet-rcnn_refine_r50_fpn_8xb2-30e_crowdhuman.py index 80277ce1c1436c37c4e2a4d13293d0ecb8ba4722..d6e1a1c744e056c1352147d5a63e393b9149d8ec 100644 --- a/mmpose/configs/mmdet/crowddet/crowddet-rcnn_refine_r50_fpn_8xb2-30e_crowdhuman.py +++ b/mmpose/configs/mmdet/crowddet/crowddet-rcnn_refine_r50_fpn_8xb2-30e_crowdhuman.py @@ -1,3 +1,3 @@ -_base_ = './crowddet-rcnn_r50_fpn_8xb2-30e_crowdhuman.py' +_base_ = "./crowddet-rcnn_r50_fpn_8xb2-30e_crowdhuman.py" model = dict(roi_head=dict(bbox_head=dict(with_refine=True))) diff --git a/mmpose/configs/mmdet/dab_detr/dab-detr_r50_8xb2-50e_coco.py b/mmpose/configs/mmdet/dab_detr/dab-detr_r50_8xb2-50e_coco.py index 314ed97e2d80ae3c95119abf9166f95d416c010e..2ec102b0886ec378df2e21a380223c850b06df99 100644 --- a/mmpose/configs/mmdet/dab_detr/dab-detr_r50_8xb2-50e_coco.py +++ b/mmpose/configs/mmdet/dab_detr/dab-detr_r50_8xb2-50e_coco.py @@ -1,157 +1,136 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/default_runtime.py"] model = dict( - type='DABDETR', + type="DABDETR", num_queries=300, with_random_refpoints=False, num_patterns=0, data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, - out_indices=(3, ), + out_indices=(3,), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - 
type='ChannelMapper', - in_channels=[2048], - kernel_size=1, - out_channels=256, - act_cfg=None, - norm_cfg=None, - num_outs=1), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="ChannelMapper", in_channels=[2048], kernel_size=1, out_channels=256, act_cfg=None, norm_cfg=None, num_outs=1), encoder=dict( num_layers=6, layer_cfg=dict( - self_attn_cfg=dict( - embed_dims=256, num_heads=8, dropout=0., batch_first=True), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, - num_fcs=2, - ffn_drop=0., - act_cfg=dict(type='PReLU')))), + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0, batch_first=True), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, num_fcs=2, ffn_drop=0.0, act_cfg=dict(type="PReLU")), + ), + ), decoder=dict( num_layers=6, query_dim=4, - query_scale_type='cond_elewise', + query_scale_type="cond_elewise", with_modulated_hw_attn=True, layer_cfg=dict( - self_attn_cfg=dict( - embed_dims=256, - num_heads=8, - attn_drop=0., - proj_drop=0., - cross_attn=False), - cross_attn_cfg=dict( - embed_dims=256, - num_heads=8, - attn_drop=0., - proj_drop=0., - cross_attn=True), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, - num_fcs=2, - ffn_drop=0., - act_cfg=dict(type='PReLU'))), - return_intermediate=True), + self_attn_cfg=dict(embed_dims=256, num_heads=8, attn_drop=0.0, proj_drop=0.0, cross_attn=False), + cross_attn_cfg=dict(embed_dims=256, num_heads=8, attn_drop=0.0, proj_drop=0.0, cross_attn=True), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, num_fcs=2, ffn_drop=0.0, act_cfg=dict(type="PReLU")), + ), + return_intermediate=True, + ), positional_encoding=dict(num_feats=128, temperature=20, normalize=True), bbox_head=dict( - type='DABDETRHead', + type="DABDETRHead", num_classes=80, embed_dims=256, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0)), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + ), # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='FocalLossCost', weight=2., eps=1e-8), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), - test_cfg=dict(max_per_img=300)) + dict(type="FocalLossCost", weight=2.0, eps=1e-8), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), + test_cfg=dict(max_per_img=300), +) # train_pipeline, NOTE the img_scale and the Pad's size_divisor is different # from the default setting in mmdet. 
train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', - transforms=[[ - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoice", + transforms=[ + [ + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) + ], + [ + dict(type="RandomChoiceResize", scales=[(400, 1333), (500, 1333), (600, 1333)], keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], ], - [ - dict( - type='RandomChoiceResize', - scales=[(400, 1333), (500, 1333), (600, 1333)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), - (576, 1333), (608, 1333), (640, 1333), - (672, 1333), (704, 1333), (736, 1333), - (768, 1333), (800, 1333)], - keep_ratio=True) - ]]), - dict(type='PackDetInputs') + ), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict( - custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=1.0)})) + paramwise_cfg=dict(custom_keys={"backbone": dict(lr_mult=0.1, decay_mult=1.0)}), +) # learning policy max_epochs = 50 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[40], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[40], gamma=0.1)] # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
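Several schedules in this patch end with an `auto_scale_lr = dict(base_batch_size=...)` block (for example, `base_batch_size=48` in the CornerNet config above, i.e. 8 GPUs x 6 samples per GPU). When auto-scaling is switched on, MMEngine multiplies the configured LR by the ratio of the actual effective batch size to `base_batch_size`. A minimal sketch of that linear scaling rule, written as a hypothetical standalone helper rather than the runner's actual code:

```python
# Hedged sketch of the linear LR scaling rule behind `auto_scale_lr`.
# `scale_lr` is a hypothetical helper for illustration; the real logic
# lives inside the MMEngine Runner and only runs when auto-scaling is
# explicitly enabled.

def scale_lr(base_lr: float, base_batch_size: int, num_gpus: int, samples_per_gpu: int) -> float:
    """Scale the configured LR linearly with the effective batch size."""
    effective_batch_size = num_gpus * samples_per_gpu
    return base_lr * effective_batch_size / base_batch_size

# The CornerNet schedule above assumes 8 GPUs x 6 samples (base_batch_size=48),
# so training the same config on 4 GPUs with batch_size=6 would halve lr=0.0005:
print(scale_lr(0.0005, base_batch_size=48, num_gpus=4, samples_per_gpu=6))  # 0.00025
```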
diff --git a/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py index 8c0ff9890e82bd0c1ee4e445e37d2c7afa534161..b95f7bdc883b65b8a2afcf49986ee6b4ce320daa 100644 --- a/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../cascade_rcnn/cascade-mask-rcnn_r101_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../cascade_rcnn/cascade-mask-rcnn_r101_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py index cfcc5e73cc508e11d77c5a3557f30632b545b803..ecffd068fccf5025a7b2aae52ee15dffab9176eb 100644 --- a/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py index 48b25f62125da09368c446bcd6ccff9b0219a7cc..3089aebe154ec2406433ab1008c90447d43109f0 100644 --- a/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/cascade-mask-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcn/cascade-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcn/cascade-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py index 8a942da754119b8d913f807907322a3d96c83ff8..36eca9df71542d1e680eeaa89348db452e6ad100 100644 --- a/mmpose/configs/mmdet/dcn/cascade-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/cascade-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../cascade_rcnn/cascade-rcnn_r101_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../cascade_rcnn/cascade-rcnn_r101_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcn/cascade-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py 
b/mmpose/configs/mmdet/dcn/cascade-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py index f6bf5b7998a972f41b52f90955ef52977adfd68c..f43b9abbecb21ebf84cfb1ed9a573d697e303be7 100644 --- a/mmpose/configs/mmdet/dcn/cascade-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/cascade-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../cascade_rcnn/cascade-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../cascade_rcnn/cascade-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcn/faster-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcn/faster-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py index db44e7e87b2d11555140ab2c8a19f32e1ce65770..f29e89aab25fb5185be0eb8360d063db5bfcf7cd 100644 --- a/mmpose/configs/mmdet/dcn/faster-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/faster-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcn/faster-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcn/faster-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py index 95f20467af60167a4a61f253e4354dadd832ccc7..c1b47e6ee1d65b45927401d7e6022fe4bd2938b3 100644 --- a/mmpose/configs/mmdet/dcn/faster-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/faster-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcn/faster-rcnn_r50_fpn_dpool_1x_coco.py b/mmpose/configs/mmdet/dcn/faster-rcnn_r50_fpn_dpool_1x_coco.py index c65ce5fd0267dc892455da6495cd3be9f1f99fcf..2ff00a54115caa6523d711718b6cf4f89ca6ad01 100644 --- a/mmpose/configs/mmdet/dcn/faster-rcnn_r50_fpn_dpool_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/faster-rcnn_r50_fpn_dpool_1x_coco.py @@ -1,12 +1,11 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( roi_head=dict( bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict( - _delete_=True, - type='DeformRoIPoolPack', - output_size=7, - output_channels=256), + type="SingleRoIExtractor", + roi_layer=dict(_delete_=True, type="DeformRoIPoolPack", output_size=7, output_channels=256), out_channels=256, - featmap_strides=[4, 8, 16, 32]))) + featmap_strides=[4, 8, 16, 32], + ) + ) +) diff --git a/mmpose/configs/mmdet/dcn/faster-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcn/faster-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py index e4ed832f5e7ff0d050be33e57d2fa611e9ae7e8e..7c5ba9c4b54994abbd9b98d74c399fa774b4b81d 100644 --- 
a/mmpose/configs/mmdet/dcn/faster-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/faster-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py @@ -1,16 +1,17 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/dcn/mask-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcn/mask-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py index 3f36714a5301823ca401820ab9d926374428ee70..698bdb57afa0a59e19159913cc22f5f3bc8dc472 100644 --- a/mmpose/configs/mmdet/dcn/mask-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/mask-rcnn_r101-dconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py index 0b281d417b4f6a7320201da261e5fdf6950556a1..46bd782cdf369b52d47e18f3bbd2e798045eb3f6 100644 --- a/mmpose/configs/mmdet/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_amp-1x_coco.py b/mmpose/configs/mmdet/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_amp-1x_coco.py index 9d01594314aad74bc47d7331c42a39f2ca453071..e75b13151834c31e2ab426218bbce289db011cd1 100644 --- a/mmpose/configs/mmdet/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_amp-1x_coco.py +++ b/mmpose/configs/mmdet/dcn/mask-rcnn_r50-dconv-c3-c5_fpn_amp-1x_coco.py @@ -1,10 +1,7 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) # MMEngine support the following two ways, users can choose # according to convenience # optim_wrapper = dict(type='AmpOptimWrapper') -_base_.optim_wrapper.type = 'AmpOptimWrapper' +_base_.optim_wrapper.type = "AmpOptimWrapper" 
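The DCN hunks above all follow the same overlay pattern: the child config names only the backbone keys it changes (`dcn`, `stage_with_dcn`) and inherits everything else from its `_base_`. Below is a simplified sketch of that merge rule as a plain recursive dict merge; it illustrates the idea only and is not mmengine's actual implementation:

```python
# Simplified illustration of how a child config overlays its _base_:
# nested dicts are merged key-by-key, everything else is replaced.
# A sketch of the merge rule, not mmengine's implementation.
def merge(base: dict, child: dict) -> dict:
    out = dict(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)  # recurse into nested dicts
        else:
            out[key] = value  # scalars and tuples replace outright
    return out

base_model = {"backbone": {"type": "ResNet", "depth": 50}}
child = {
    "backbone": {
        "dcn": {"type": "DCN", "deform_groups": 1, "fallback_on_stride": False},
        "stage_with_dcn": (False, True, True, True),  # enable DCN in stages c3-c5
    }
}

merged = merge(base_model, child)
assert merged["backbone"]["depth"] == 50  # inherited from the base
assert merged["backbone"]["dcn"]["type"] == "DCN"  # added by the child
```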
diff --git a/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50-mdconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50-mdconv-c3-c5_fpn_1x_coco.py index a7f7e4eecaf74418690975d54d09eeb0e31f9a1f..b3cdd5cb6ddfc0ff6525266df16bd3aea2a4c01b 100644 --- a/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50-mdconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50-mdconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCNv2", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50-mdconv-group4-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50-mdconv-group4-c3-c5_fpn_1x_coco.py index 5c58dbed3782403a5fac3c6809598372e47cd72c..6d95f62f30b62e7e07f454965564582cb61b9f03 100644 --- a/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50-mdconv-group4-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50-mdconv-group4-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCNv2', deform_groups=4, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCNv2", deform_groups=4, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50_fpn_mdpool_1x_coco.py b/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50_fpn_mdpool_1x_coco.py index 6198d6d7d72f8d012c777330f1116b46b89290be..100abd1017e9194462625ad0c904d61d63afe085 100644 --- a/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50_fpn_mdpool_1x_coco.py +++ b/mmpose/configs/mmdet/dcnv2/faster-rcnn_r50_fpn_mdpool_1x_coco.py @@ -1,12 +1,11 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( roi_head=dict( bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict( - _delete_=True, - type='ModulatedDeformRoIPoolPack', - output_size=7, - output_channels=256), + type="SingleRoIExtractor", + roi_layer=dict(_delete_=True, type="ModulatedDeformRoIPoolPack", output_size=7, output_channels=256), out_channels=256, - featmap_strides=[4, 8, 16, 32]))) + featmap_strides=[4, 8, 16, 32], + ) + ) +) diff --git a/mmpose/configs/mmdet/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_1x_coco.py index f7a90bbf31bea3663820caa4541de3ceafeb7366..8d0cd9224022f38f18c6fda12de612e1541d3fe5 100644 --- a/mmpose/configs/mmdet/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_1x_coco.py @@ -1,5 +1,2 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCNv2", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) diff --git a/mmpose/configs/mmdet/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_amp-1x_coco.py 
b/mmpose/configs/mmdet/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_amp-1x_coco.py index 3b3894c2d61ee3208170235ba1aa98def79a7120..a914884e760e72d7dd365753d3fed52af85d08c9 100644 --- a/mmpose/configs/mmdet/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_amp-1x_coco.py +++ b/mmpose/configs/mmdet/dcnv2/mask-rcnn_r50-mdconv-c3-c5_fpn_amp-1x_coco.py @@ -1,10 +1,7 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(dcn=dict(type="DCNv2", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True))) # MMEngine support the following two ways, users can choose # according to convenience # optim_wrapper = dict(type='AmpOptimWrapper') -_base_.optim_wrapper.type = 'AmpOptimWrapper' +_base_.optim_wrapper.type = "AmpOptimWrapper" diff --git a/mmpose/configs/mmdet/ddod/ddod_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/ddod/ddod_r50_fpn_1x_coco.py index fed1116b1f92e613517a57aa196839e4de3037dc..a33011fc65c7db7ff4f605cf9e6c9cf43e25c436 100644 --- a/mmpose/configs/mmdet/ddod/ddod_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/ddod/ddod_r50_fpn_1x_coco.py @@ -1,72 +1,44 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] model = dict( - type='DDOD', + type="DDOD", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), bbox_head=dict( - type='DDODHead', + type="DDODHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), - loss_iou=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, 
loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + loss_iou=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), train_cfg=dict( # assigner is mean cls_assigner - assigner=dict(type='ATSSAssigner', topk=9, alpha=0.8), - reg_assigner=dict(type='ATSSAssigner', topk=9, alpha=0.5), + assigner=dict(type="ATSSAssigner", topk=9, alpha=0.8), + reg_assigner=dict(type="ATSSAssigner", topk=9, alpha=0.5), allowed_border=-1, pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # optimizer -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/ddq/ddq-detr-4scale_r50_8xb2-12e_coco.py b/mmpose/configs/mmdet/ddq/ddq-detr-4scale_r50_8xb2-12e_coco.py index 5e64afc087e1ed68b8b5d1474127c832f893cb9b..fb9f51196a6689d5f768171c1e747aa4f4b6b8ab 100644 --- a/mmpose/configs/mmdet/ddq/ddq-detr-4scale_r50_8xb2-12e_coco.py +++ b/mmpose/configs/mmdet/ddq/ddq-detr-4scale_r50_8xb2-12e_coco.py @@ -1,47 +1,42 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/default_runtime.py"] model = dict( - type='DDQDETR', + type="DDQDETR", num_queries=900, # num_matching_queries # ratio of num_dense queries to num_queries dense_topk_ratio=1.5, with_box_refine=True, as_two_stage=True, data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[512, 1024, 2048], kernel_size=1, out_channels=256, act_cfg=None, - norm_cfg=dict(type='GN', num_groups=32), - num_outs=4), + norm_cfg=dict(type="GN", num_groups=32), + num_outs=4, + ), # encoder class name: DeformableDetrTransformerEncoder encoder=dict( num_layers=6, layer_cfg=dict( - self_attn_cfg=dict(embed_dims=256, num_levels=4, - dropout=0.0), # 0.1 for DeformDETR - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, # 1024 for DeformDETR - ffn_drop=0.0))), # 0.1 for DeformDETR + self_attn_cfg=dict(embed_dims=256, num_levels=4, dropout=0.0), # 0.1 for DeformDETR + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), # 1024 for DeformDETR + ), + ), # 0.1 for DeformDETR # decoder class name: DDQTransformerDecoder decoder=dict( # `num_layers` >= 2, because attention masks of the last @@ -49,119 +44,115 @@ model = dict( num_layers=6, return_intermediate=True, layer_cfg=dict( - self_attn_cfg=dict(embed_dims=256, num_heads=8, - dropout=0.0), # 0.1 for DeformDETR - cross_attn_cfg=dict(embed_dims=256, 
num_levels=4, - dropout=0.0), # 0.1 for DeformDETR - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, # 1024 for DeformDETR - ffn_drop=0.0)), # 0.1 for DeformDETR - post_norm_cfg=None), - positional_encoding=dict( - num_feats=128, - normalize=True, - offset=0.0, # -0.5 for DeformDETR - temperature=20), # 10000 for DeformDETR + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), # 0.1 for DeformDETR + cross_attn_cfg=dict(embed_dims=256, num_levels=4, dropout=0.0), # 0.1 for DeformDETR + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), # 1024 for DeformDETR + ), # 0.1 for DeformDETR + post_norm_cfg=None, + ), + positional_encoding=dict(num_feats=128, normalize=True, offset=0.0, temperature=20), # -0.5 for DeformDETR # 10000 for DeformDETR bbox_head=dict( - type='DDQDETRHead', + type="DDQDETRHead", num_classes=80, sync_cls_avg_factor=True, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0)), - dn_cfg=dict( - label_noise_scale=0.5, - box_noise_scale=1.0, - group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)), - dqs_cfg=dict(type='nms', iou_threshold=0.8), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + ), + dn_cfg=dict(label_noise_scale=0.5, box_noise_scale=1.0, group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)), + dqs_cfg=dict(type="nms", iou_threshold=0.8), # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='FocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), - test_cfg=dict(max_per_img=300)) + dict(type="FocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), + test_cfg=dict(max_per_img=300), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 
1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="PackDetInputs"), ] -train_dataloader = dict( - dataset=dict( - filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline)) +train_dataloader = dict(dataset=dict(filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline)) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.05), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.1)})) + paramwise_cfg=dict(custom_keys={"backbone": dict(lr_mult=0.1)}), +) # learning policy max_epochs = 12 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") param_scheduler = [ - dict( - type='LinearLR', - start_factor=0.0001, - by_epoch=False, - begin=0, - end=2000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.0001, by_epoch=False, begin=0, end=2000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[11], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/ddq/ddq-detr-4scale_swinl_8xb2-30e_coco.py b/mmpose/configs/mmdet/ddq/ddq-detr-4scale_swinl_8xb2-30e_coco.py index d863649411e3157373961b3da339990df1e6f267..c7c480daee44b65abf8bd3765f562a8e40d9a733 100644 --- a/mmpose/configs/mmdet/ddq/ddq-detr-4scale_swinl_8xb2-30e_coco.py +++ b/mmpose/configs/mmdet/ddq/ddq-detr-4scale_swinl_8xb2-30e_coco.py @@ -1,22 +1,17 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py' -] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa: E501 +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/default_runtime.py"] +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth" # noqa: E501 model = dict( - type='DDQDETR', + type="DDQDETR", num_queries=900, # num_matching_queries # ratio of num_dense queries to num_queries dense_topk_ratio=1.5, with_box_refine=True, as_two_stage=True, data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 + ), backbone=dict( - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=192, depths=[2, 2, 18, 2], @@ -25,150 +20,146 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(1, 2, 3), with_cp=False, 
convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), neck=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[384, 768, 1536], kernel_size=1, out_channels=256, act_cfg=None, - norm_cfg=dict(type='GN', num_groups=32), - num_outs=4), + norm_cfg=dict(type="GN", num_groups=32), + num_outs=4, + ), # encoder class name: DeformableDetrTransformerEncoder encoder=dict( num_layers=6, layer_cfg=dict( - self_attn_cfg=dict(embed_dims=256, num_levels=4, - dropout=0.0), # 0.1 for DeformDETR - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, # 1024 for DeformDETR - ffn_drop=0.0))), # 0.1 for DeformDETR + self_attn_cfg=dict(embed_dims=256, num_levels=4, dropout=0.0), # 0.1 for DeformDETR + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), # 1024 for DeformDETR + ), + ), # 0.1 for DeformDETR # decoder class name: DDQTransformerDecoder decoder=dict( num_layers=6, return_intermediate=True, layer_cfg=dict( - self_attn_cfg=dict(embed_dims=256, num_heads=8, - dropout=0.0), # 0.1 for DeformDETR - cross_attn_cfg=dict(embed_dims=256, num_levels=4, - dropout=0.0), # 0.1 for DeformDETR - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, # 1024 for DeformDETR - ffn_drop=0.0)), # 0.1 for DeformDETR - post_norm_cfg=None), - positional_encoding=dict( - num_feats=128, - normalize=True, - offset=0.0, # -0.5 for DeformDETR - temperature=20), # 10000 for DeformDETR + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), # 0.1 for DeformDETR + cross_attn_cfg=dict(embed_dims=256, num_levels=4, dropout=0.0), # 0.1 for DeformDETR + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), # 1024 for DeformDETR + ), # 0.1 for DeformDETR + post_norm_cfg=None, + ), + positional_encoding=dict(num_feats=128, normalize=True, offset=0.0, temperature=20), # -0.5 for DeformDETR # 10000 for DeformDETR bbox_head=dict( - type='DDQDETRHead', + type="DDQDETRHead", num_classes=80, sync_cls_avg_factor=True, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0)), - dn_cfg=dict( - label_noise_scale=0.5, - box_noise_scale=1.0, - group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)), - dqs_cfg=dict(type='nms', iou_threshold=0.8), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + ), + dn_cfg=dict(label_noise_scale=0.5, box_noise_scale=1.0, group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)), + dqs_cfg=dict(type="nms", iou_threshold=0.8), # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='FocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), - test_cfg=dict(max_per_img=300)) + dict(type="FocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), + test_cfg=dict(max_per_img=300), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", 
backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="PackDetInputs"), ] -train_dataloader = dict( - dataset=dict( - filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline)) +train_dataloader = dict(dataset=dict(filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline)) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.05), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.05)})) + paramwise_cfg=dict(custom_keys={"backbone": dict(lr_mult=0.05)}), +) # learning policy max_epochs = 30 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") param_scheduler = [ - dict( - type='LinearLR', - start_factor=0.0001, - by_epoch=False, - begin=0, - end=2000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[20, 26], - gamma=0.1) + dict(type="LinearLR", start_factor=0.0001, by_epoch=False, begin=0, end=2000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[20, 26], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/ddq/ddq-detr-5scale_r50_8xb2-12e_coco.py b/mmpose/configs/mmdet/ddq/ddq-detr-5scale_r50_8xb2-12e_coco.py index 3c38f553bdd46bc4e0611bbd0fd4bab0c1929825..426baeb68d01614f079adc7df0725957957bbca5 100644 --- a/mmpose/configs/mmdet/ddq/ddq-detr-5scale_r50_8xb2-12e_coco.py +++ b/mmpose/configs/mmdet/ddq/ddq-detr-5scale_r50_8xb2-12e_coco.py @@ -1,8 +1,6 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", 
"../_base_/default_runtime.py"] model = dict( - type='DDQDETR', + type="DDQDETR", num_queries=900, # num_matching_queries # ratio of num_dense queries to num_queries dense_topk_ratio=1.5, @@ -10,159 +8,152 @@ model = dict( as_two_stage=True, num_feature_levels=5, data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[256, 512, 1024, 2048], kernel_size=1, out_channels=256, act_cfg=None, - norm_cfg=dict(type='GN', num_groups=32), - num_outs=5), + norm_cfg=dict(type="GN", num_groups=32), + num_outs=5, + ), # encoder class name: DeformableDetrTransformerEncoder encoder=dict( num_layers=6, layer_cfg=dict( - self_attn_cfg=dict(embed_dims=256, num_levels=5, - dropout=0.0), # 0.1 for DeformDETR - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, # 1024 for DeformDETR - ffn_drop=0.0))), # 0.1 for DeformDETR + self_attn_cfg=dict(embed_dims=256, num_levels=5, dropout=0.0), # 0.1 for DeformDETR + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), # 1024 for DeformDETR + ), + ), # 0.1 for DeformDETR # decoder class name: DDQTransformerDecoder decoder=dict( num_layers=6, return_intermediate=True, layer_cfg=dict( - self_attn_cfg=dict(embed_dims=256, num_heads=8, - dropout=0.0), # 0.1 for DeformDETR - cross_attn_cfg=dict(embed_dims=256, num_levels=5, - dropout=0.0), # 0.1 for DeformDETR - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, # 1024 for DeformDETR - ffn_drop=0.0)), # 0.1 for DeformDETR - post_norm_cfg=None), - positional_encoding=dict( - num_feats=128, - normalize=True, - offset=0.0, # -0.5 for DeformDETR - temperature=20), # 10000 for DeformDETR + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), # 0.1 for DeformDETR + cross_attn_cfg=dict(embed_dims=256, num_levels=5, dropout=0.0), # 0.1 for DeformDETR + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), # 1024 for DeformDETR + ), # 0.1 for DeformDETR + post_norm_cfg=None, + ), + positional_encoding=dict(num_feats=128, normalize=True, offset=0.0, temperature=20), # -0.5 for DeformDETR # 10000 for DeformDETR bbox_head=dict( - type='DDQDETRHead', + type="DDQDETRHead", num_classes=80, sync_cls_avg_factor=True, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0)), - dn_cfg=dict( - label_noise_scale=0.5, - box_noise_scale=1.0, - group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)), - dqs_cfg=dict(type='nms', iou_threshold=0.8), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + ), + dn_cfg=dict(label_noise_scale=0.5, box_noise_scale=1.0, 
group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)), + dqs_cfg=dict(type="nms", iou_threshold=0.8), # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='FocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), - test_cfg=dict(max_per_img=300)) + dict(type="FocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), + test_cfg=dict(max_per_img=300), +) # train_pipeline, NOTE the img_scale and the Pad's size_divisor is different # from the default setting in mmdet. train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="PackDetInputs"), ] -train_dataloader = dict( - dataset=dict( - filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline)) +train_dataloader = dict(dataset=dict(filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline)) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.05), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.1)})) + paramwise_cfg=dict(custom_keys={"backbone": dict(lr_mult=0.1)}), +) # learning policy max_epochs = 12 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") param_scheduler = [ - dict( - 
type='LinearLR', - start_factor=0.0001, - by_epoch=False, - begin=0, - end=2000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.0001, by_epoch=False, begin=0, end=2000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[11], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/deepfashion/mask-rcnn_r50_fpn_15e_deepfashion.py b/mmpose/configs/mmdet/deepfashion/mask-rcnn_r50_fpn_15e_deepfashion.py index 403b18a4ca8ed61aedcb99218ecc79302826ff8c..10294cea09ab5d217b943c4f5a41e9584926a84a 100644 --- a/mmpose/configs/mmdet/deepfashion/mask-rcnn_r50_fpn_15e_deepfashion.py +++ b/mmpose/configs/mmdet/deepfashion/mask-rcnn_r50_fpn_15e_deepfashion.py @@ -1,23 +1,14 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/deepfashion.py', '../_base_/schedules/schedule_1x.py', - '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/deepfashion.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -model = dict( - roi_head=dict( - bbox_head=dict(num_classes=15), mask_head=dict(num_classes=15))) +model = dict(roi_head=dict(bbox_head=dict(num_classes=15), mask_head=dict(num_classes=15))) # runtime settings max_epochs = 15 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[8, 11], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/deepsort/deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py b/mmpose/configs/mmdet/deepsort/deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py index 70d3393829b422740bfba5d1746c7651e9c2d69c..5cb98e523bf394f1f2e1e8528ec686759ece59f1 100644 --- a/mmpose/configs/mmdet/deepsort/deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py +++ b/mmpose/configs/mmdet/deepsort/deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py @@ -1,85 +1,69 @@ -_base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/mot_challenge.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/models/faster-rcnn_r50_fpn.py", "../_base_/datasets/mot_challenge.py", "../_base_/default_runtime.py"] -default_hooks = dict( - logger=dict(type='LoggerHook', interval=1), - visualization=dict(type='TrackVisualizationHook', draw=False)) +default_hooks = dict(logger=dict(type="LoggerHook", interval=1), visualization=dict(type="TrackVisualizationHook", draw=False)) -vis_backends = [dict(type='LocalVisBackend')] -visualizer = dict( - type='TrackLocalVisualizer', vis_backends=vis_backends, name='visualizer') +vis_backends = [dict(type="LocalVisBackend")] +visualizer = dict(type="TrackLocalVisualizer", vis_backends=vis_backends, name="visualizer") # custom hooks custom_hooks = [ # Synchronize model buffers such as running_mean and running_var in BN # at the end of each epoch - dict(type='SyncBuffersHook') + dict(type="SyncBuffersHook") ] detector = _base_.model 
-detector.pop('data_preprocessor') +detector.pop("data_preprocessor") detector.rpn_head.bbox_coder.update(dict(clip_border=False)) detector.roi_head.bbox_head.update(dict(num_classes=1)) detector.roi_head.bbox_head.bbox_coder.update(dict(clip_border=False)) -detector['init_cfg'] = dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmtracking/mot/faster_rcnn/' - 'faster-rcnn_r50_fpn_4e_mot17-half-64ee2ed4.pth') +detector["init_cfg"] = dict( + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmtracking/mot/faster_rcnn/" "faster-rcnn_r50_fpn_4e_mot17-half-64ee2ed4.pth", # noqa: E251 +) del _base_.model model = dict( - type='DeepSORT', + type="DeepSORT", data_preprocessor=dict( - type='TrackDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="TrackDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), detector=detector, reid=dict( - type='BaseReID', - data_preprocessor=dict(type='mmpretrain.ClsDataPreprocessor'), - backbone=dict( - type='mmpretrain.ResNet', - depth=50, - num_stages=4, - out_indices=(3, ), - style='pytorch'), - neck=dict(type='GlobalAveragePooling', kernel_size=(8, 4), stride=1), + type="BaseReID", + data_preprocessor=dict(type="mmpretrain.ClsDataPreprocessor"), + backbone=dict(type="mmpretrain.ResNet", depth=50, num_stages=4, out_indices=(3,), style="pytorch"), + neck=dict(type="GlobalAveragePooling", kernel_size=(8, 4), stride=1), head=dict( - type='LinearReIDHead', + type="LinearReIDHead", num_fcs=1, in_channels=2048, fc_channels=1024, out_channels=128, num_classes=380, - loss_cls=dict(type='mmpretrain.CrossEntropyLoss', loss_weight=1.0), - loss_triplet=dict(type='TripletLoss', margin=0.3, loss_weight=1.0), - norm_cfg=dict(type='BN1d'), - act_cfg=dict(type='ReLU')), + loss_cls=dict(type="mmpretrain.CrossEntropyLoss", loss_weight=1.0), + loss_triplet=dict(type="TripletLoss", margin=0.3, loss_weight=1.0), + norm_cfg=dict(type="BN1d"), + act_cfg=dict(type="ReLU"), + ), init_cfg=dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmtracking/mot/reid/tracktor_reid_r50_iter25245-a452f51f.pth' # noqa: E501 - )), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmtracking/mot/reid/tracktor_reid_r50_iter25245-a452f51f.pth", # noqa: E251 # noqa: E501 + ), + ), tracker=dict( - type='SORTTracker', - motion=dict(type='KalmanFilter', center_only=False), + type="SORTTracker", + motion=dict(type="KalmanFilter", center_only=False), obj_score_thr=0.5, - reid=dict( - num_samples=10, - img_scale=(256, 128), - img_norm_cfg=None, - match_score_thr=2.0), + reid=dict(num_samples=10, img_scale=(256, 128), img_norm_cfg=None, match_score_thr=2.0), match_iou_thr=0.5, momentums=None, num_tentatives=2, - num_frames_retain=100)) + num_frames_retain=100, + ), +) train_dataloader = None train_cfg = None -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") diff --git a/mmpose/configs/mmdet/deepsort/deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17test.py b/mmpose/configs/mmdet/deepsort/deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17test.py index 687ce7adfcc1742bab75cca939a99df37b43689c..2a4145b97ab7dd5b5fd61a55db64647fc8cb4c72 100644 --- a/mmpose/configs/mmdet/deepsort/deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17test.py +++ 
b/mmpose/configs/mmdet/deepsort/deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17test.py @@ -1,15 +1,8 @@ -_base_ = [ - './deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain' - '_test-mot17halfval.py' -] +_base_ = ["./deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain" "_test-mot17halfval.py"] # dataloader -val_dataloader = dict( - dataset=dict(ann_file='annotations/train_cocoformat.json')) -test_dataloader = dict( - dataset=dict( - ann_file='annotations/test_cocoformat.json', - data_prefix=dict(img_path='test'))) +val_dataloader = dict(dataset=dict(ann_file="annotations/train_cocoformat.json")) +test_dataloader = dict(dataset=dict(ann_file="annotations/test_cocoformat.json", data_prefix=dict(img_path="test"))) # evaluator -test_evaluator = dict(format_only=True, outfile_prefix='./mot_17_test_res') +test_evaluator = dict(format_only=True, outfile_prefix="./mot_17_test_res") diff --git a/mmpose/configs/mmdet/deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco.py b/mmpose/configs/mmdet/deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco.py index eeb67fc98486cfd929a8177b9af6be3cdab9aa4b..1ff3acd6228a4d7ddfcbbbeb7952c404150847cb 100644 --- a/mmpose/configs/mmdet/deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco.py +++ b/mmpose/configs/mmdet/deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco.py @@ -1,2 +1,2 @@ -_base_ = 'deformable-detr-refine_r50_16xb2-50e_coco.py' +_base_ = "deformable-detr-refine_r50_16xb2-50e_coco.py" model = dict(as_two_stage=True) diff --git a/mmpose/configs/mmdet/deformable_detr/deformable-detr-refine_r50_16xb2-50e_coco.py b/mmpose/configs/mmdet/deformable_detr/deformable-detr-refine_r50_16xb2-50e_coco.py index b968674f4a9fc450803cdba018b0c4e9e6ca422a..f8ab9504fa83f8e7d2cab51f7d2a80a8bb7e9edf 100644 --- a/mmpose/configs/mmdet/deformable_detr/deformable-detr-refine_r50_16xb2-50e_coco.py +++ b/mmpose/configs/mmdet/deformable_detr/deformable-detr-refine_r50_16xb2-50e_coco.py @@ -1,2 +1,2 @@ -_base_ = 'deformable-detr_r50_16xb2-50e_coco.py' +_base_ = "deformable-detr_r50_16xb2-50e_coco.py" model = dict(with_box_refine=True) diff --git a/mmpose/configs/mmdet/deformable_detr/deformable-detr_r50_16xb2-50e_coco.py b/mmpose/configs/mmdet/deformable_detr/deformable-detr_r50_16xb2-50e_coco.py index e0dee411c8e27ab440ccc874e40f4207b24a21e7..99d922ea404ca7267761bcf4636127ea030b036c 100644 --- a/mmpose/configs/mmdet/deformable_detr/deformable-detr_r50_16xb2-50e_coco.py +++ b/mmpose/configs/mmdet/deformable_detr/deformable-detr_r50_16xb2-50e_coco.py @@ -1,154 +1,151 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/default_runtime.py"] model = dict( - type='DeformableDETR', + type="DeformableDETR", num_queries=300, num_feature_levels=4, with_box_refine=False, as_two_stage=False, data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + 
init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[512, 1024, 2048], kernel_size=1, out_channels=256, act_cfg=None, - norm_cfg=dict(type='GN', num_groups=32), - num_outs=4), + norm_cfg=dict(type="GN", num_groups=32), + num_outs=4, + ), encoder=dict( # DeformableDetrTransformerEncoder num_layers=6, layer_cfg=dict( # DeformableDetrTransformerEncoderLayer - self_attn_cfg=dict( # MultiScaleDeformableAttention - embed_dims=256, - batch_first=True), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=1024, ffn_drop=0.1))), + self_attn_cfg=dict(embed_dims=256, batch_first=True), # MultiScaleDeformableAttention + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.1), + ), + ), decoder=dict( # DeformableDetrTransformerDecoder num_layers=6, return_intermediate=True, layer_cfg=dict( # DeformableDetrTransformerDecoderLayer - self_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.1, - batch_first=True), - cross_attn_cfg=dict( # MultiScaleDeformableAttention - embed_dims=256, - batch_first=True), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=1024, ffn_drop=0.1)), - post_norm_cfg=None), + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.1, batch_first=True), # MultiheadAttention + cross_attn_cfg=dict(embed_dims=256, batch_first=True), # MultiScaleDeformableAttention + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.1), + ), + post_norm_cfg=None, + ), positional_encoding=dict(num_feats=128, normalize=True, offset=-0.5), bbox_head=dict( - type='DeformableDETRHead', + type="DeformableDETRHead", num_classes=80, sync_cls_avg_factor=True, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=2.0), - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0)), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=2.0), + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + ), # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='FocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), - test_cfg=dict(max_per_img=100)) + dict(type="FocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), + test_cfg=dict(max_per_img=100), +) # train_pipeline, NOTE the img_scale and the Pad's size_divisor is different # from the default setting in mmdet. 
train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="PackDetInputs"), ] -train_dataloader = dict( - dataset=dict( - filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline)) +train_dataloader = dict(dataset=dict(filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline)) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( - custom_keys={ - 'backbone': dict(lr_mult=0.1), - 'sampling_offsets': dict(lr_mult=0.1), - 'reference_points': dict(lr_mult=0.1) - })) + custom_keys={"backbone": dict(lr_mult=0.1), "sampling_offsets": dict(lr_mult=0.1), "reference_points": dict(lr_mult=0.1)} + ), +) # learning policy max_epochs = 50 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[40], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[40], gamma=0.1)] # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
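The Deformable-DETR hunk above leaves the training recipe untouched: AdamW at a base lr of 2e-4, a 0.1 `lr_mult` on the `backbone`, `sampling_offsets`, and `reference_points` parameter groups, and a single MultiStepLR drop at epoch 40 of 50. A back-of-the-envelope sketch of the effective learning rates this implies (standard PyTorch-style per-group scheduling, not MMEngine's scheduler code):

```python
# Effective learning rates implied by the Deformable-DETR config above:
# base AdamW lr 2e-4, lr_mult 0.1 for backbone / sampling_offsets /
# reference_points, one MultiStepLR drop (gamma=0.1) at epoch 40 of 50.
# Back-of-the-envelope sketch, not MMEngine's scheduler implementation.
BASE_LR, LR_MULT, GAMMA, MILESTONES = 2e-4, 0.1, 0.1, (40,)

def effective_lr(epoch: int, scaled_group: bool) -> float:
    lr = BASE_LR * (LR_MULT if scaled_group else 1.0)
    for milestone in MILESTONES:  # each passed milestone applies gamma once
        if epoch >= milestone:
            lr *= GAMMA
    return lr

for epoch in (0, 39, 40, 49):
    head, backbone = effective_lr(epoch, False), effective_lr(epoch, True)
    print(f"epoch {epoch:2d}: head lr {head:.1e}, backbone lr {backbone:.1e}")
# heads run at 2.0e-04 (then 2.0e-05 after epoch 40);
# the scaled groups run at 2.0e-05 (then 2.0e-06).
```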
diff --git a/mmpose/configs/mmdet/detectors/cascade-rcnn_r50-rfp_1x_coco.py b/mmpose/configs/mmdet/detectors/cascade-rcnn_r50-rfp_1x_coco.py index c30c84d74cf68bc4369db16b6b2602626acb6fdf..2870bf39e768d55fb02ee57185a2412f9d824865 100644 --- a/mmpose/configs/mmdet/detectors/cascade-rcnn_r50-rfp_1x_coco.py +++ b/mmpose/configs/mmdet/detectors/cascade-rcnn_r50-rfp_1x_coco.py @@ -1,28 +1,29 @@ _base_ = [ - '../_base_/models/cascade-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( - backbone=dict( - type='DetectoRS_ResNet', - conv_cfg=dict(type='ConvAWS'), - output_img=True), + backbone=dict(type="DetectoRS_ResNet", conv_cfg=dict(type="ConvAWS"), output_img=True), neck=dict( - type='RFP', + type="RFP", rfp_steps=2, aspp_out_channels=64, aspp_dilations=(1, 3, 6, 1), rfp_backbone=dict( rfp_inplanes=256, - type='DetectoRS_ResNet', + type="DetectoRS_ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - conv_cfg=dict(type='ConvAWS'), - pretrained='torchvision://resnet50', - style='pytorch'))) + conv_cfg=dict(type="ConvAWS"), + pretrained="torchvision://resnet50", + style="pytorch", + ), + ), +) diff --git a/mmpose/configs/mmdet/detectors/cascade-rcnn_r50-sac_1x_coco.py b/mmpose/configs/mmdet/detectors/cascade-rcnn_r50-sac_1x_coco.py index 24d6cd3a95ecf262caac667cfcc32d6885fa5880..4a07ab97f3090ea5f411e16359eb17ebcfacf30d 100644 --- a/mmpose/configs/mmdet/detectors/cascade-rcnn_r50-sac_1x_coco.py +++ b/mmpose/configs/mmdet/detectors/cascade-rcnn_r50-sac_1x_coco.py @@ -1,12 +1,15 @@ _base_ = [ - '../_base_/models/cascade-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( backbone=dict( - type='DetectoRS_ResNet', - conv_cfg=dict(type='ConvAWS'), - sac=dict(type='SAC', use_deform=True), - stage_with_sac=(False, True, True, True))) + type="DetectoRS_ResNet", + conv_cfg=dict(type="ConvAWS"), + sac=dict(type="SAC", use_deform=True), + stage_with_sac=(False, True, True, True), + ) +) diff --git a/mmpose/configs/mmdet/detectors/detectors_cascade-rcnn_r50_1x_coco.py b/mmpose/configs/mmdet/detectors/detectors_cascade-rcnn_r50_1x_coco.py index 19d13d9c8c38b666b7481a58a641918b5d20e0ad..b30f58ae7d2691855c55c9dcd12d49203d4f06c6 100644 --- a/mmpose/configs/mmdet/detectors/detectors_cascade-rcnn_r50_1x_coco.py +++ b/mmpose/configs/mmdet/detectors/detectors_cascade-rcnn_r50_1x_coco.py @@ -1,32 +1,37 @@ _base_ = [ - '../_base_/models/cascade-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( backbone=dict( - type='DetectoRS_ResNet', - conv_cfg=dict(type='ConvAWS'), - sac=dict(type='SAC', use_deform=True), + type="DetectoRS_ResNet", + conv_cfg=dict(type="ConvAWS"), + sac=dict(type="SAC", use_deform=True), 
stage_with_sac=(False, True, True, True), - output_img=True), + output_img=True, + ), neck=dict( - type='RFP', + type="RFP", rfp_steps=2, aspp_out_channels=64, aspp_dilations=(1, 3, 6, 1), rfp_backbone=dict( rfp_inplanes=256, - type='DetectoRS_ResNet', + type="DetectoRS_ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - conv_cfg=dict(type='ConvAWS'), - sac=dict(type='SAC', use_deform=True), + conv_cfg=dict(type="ConvAWS"), + sac=dict(type="SAC", use_deform=True), stage_with_sac=(False, True, True, True), - pretrained='torchvision://resnet50', - style='pytorch'))) + pretrained="torchvision://resnet50", + style="pytorch", + ), + ), +) diff --git a/mmpose/configs/mmdet/detectors/detectors_htc-r101_20e_coco.py b/mmpose/configs/mmdet/detectors/detectors_htc-r101_20e_coco.py index 93d7d2b1adeb3fbdb7bac0107edf4433669e8015..5a03c44e471b7cfba572574156b1d9935cc5d065 100644 --- a/mmpose/configs/mmdet/detectors/detectors_htc-r101_20e_coco.py +++ b/mmpose/configs/mmdet/detectors/detectors_htc-r101_20e_coco.py @@ -1,28 +1,32 @@ -_base_ = '../htc/htc_r101_fpn_20e_coco.py' +_base_ = "../htc/htc_r101_fpn_20e_coco.py" model = dict( backbone=dict( - type='DetectoRS_ResNet', - conv_cfg=dict(type='ConvAWS'), - sac=dict(type='SAC', use_deform=True), + type="DetectoRS_ResNet", + conv_cfg=dict(type="ConvAWS"), + sac=dict(type="SAC", use_deform=True), stage_with_sac=(False, True, True, True), - output_img=True), + output_img=True, + ), neck=dict( - type='RFP', + type="RFP", rfp_steps=2, aspp_out_channels=64, aspp_dilations=(1, 3, 6, 1), rfp_backbone=dict( rfp_inplanes=256, - type='DetectoRS_ResNet', + type="DetectoRS_ResNet", depth=101, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - conv_cfg=dict(type='ConvAWS'), - sac=dict(type='SAC', use_deform=True), + conv_cfg=dict(type="ConvAWS"), + sac=dict(type="SAC", use_deform=True), stage_with_sac=(False, True, True, True), - pretrained='torchvision://resnet101', - style='pytorch'))) + pretrained="torchvision://resnet101", + style="pytorch", + ), + ), +) diff --git a/mmpose/configs/mmdet/detectors/detectors_htc-r50_1x_coco.py b/mmpose/configs/mmdet/detectors/detectors_htc-r50_1x_coco.py index 0d2fc4f77fcca715c1dfb613306d214b636aa0c0..169c648109e971aca044c4bb6afb594bd12b1cc3 100644 --- a/mmpose/configs/mmdet/detectors/detectors_htc-r50_1x_coco.py +++ b/mmpose/configs/mmdet/detectors/detectors_htc-r50_1x_coco.py @@ -1,28 +1,32 @@ -_base_ = '../htc/htc_r50_fpn_1x_coco.py' +_base_ = "../htc/htc_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='DetectoRS_ResNet', - conv_cfg=dict(type='ConvAWS'), - sac=dict(type='SAC', use_deform=True), + type="DetectoRS_ResNet", + conv_cfg=dict(type="ConvAWS"), + sac=dict(type="SAC", use_deform=True), stage_with_sac=(False, True, True, True), - output_img=True), + output_img=True, + ), neck=dict( - type='RFP', + type="RFP", rfp_steps=2, aspp_out_channels=64, aspp_dilations=(1, 3, 6, 1), rfp_backbone=dict( rfp_inplanes=256, - type='DetectoRS_ResNet', + type="DetectoRS_ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - conv_cfg=dict(type='ConvAWS'), - sac=dict(type='SAC', use_deform=True), + conv_cfg=dict(type="ConvAWS"), + 
sac=dict(type="SAC", use_deform=True), stage_with_sac=(False, True, True, True), - pretrained='torchvision://resnet50', - style='pytorch'))) + pretrained="torchvision://resnet50", + style="pytorch", + ), + ), +) diff --git a/mmpose/configs/mmdet/detectors/htc_r50-rfp_1x_coco.py b/mmpose/configs/mmdet/detectors/htc_r50-rfp_1x_coco.py index 496104e12550a1985f9c9e3748a343f69d7df6d8..0dc005f4bb8684602f00842626651567b6ee175e 100644 --- a/mmpose/configs/mmdet/detectors/htc_r50-rfp_1x_coco.py +++ b/mmpose/configs/mmdet/detectors/htc_r50-rfp_1x_coco.py @@ -1,24 +1,24 @@ -_base_ = '../htc/htc_r50_fpn_1x_coco.py' +_base_ = "../htc/htc_r50_fpn_1x_coco.py" model = dict( - backbone=dict( - type='DetectoRS_ResNet', - conv_cfg=dict(type='ConvAWS'), - output_img=True), + backbone=dict(type="DetectoRS_ResNet", conv_cfg=dict(type="ConvAWS"), output_img=True), neck=dict( - type='RFP', + type="RFP", rfp_steps=2, aspp_out_channels=64, aspp_dilations=(1, 3, 6, 1), rfp_backbone=dict( rfp_inplanes=256, - type='DetectoRS_ResNet', + type="DetectoRS_ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - conv_cfg=dict(type='ConvAWS'), - pretrained='torchvision://resnet50', - style='pytorch'))) + conv_cfg=dict(type="ConvAWS"), + pretrained="torchvision://resnet50", + style="pytorch", + ), + ), +) diff --git a/mmpose/configs/mmdet/detectors/htc_r50-sac_1x_coco.py b/mmpose/configs/mmdet/detectors/htc_r50-sac_1x_coco.py index 72d4db963ffd95851b945911b3db9941426583ab..23acc2739661cd4562c225df7d80876875bb3bff 100644 --- a/mmpose/configs/mmdet/detectors/htc_r50-sac_1x_coco.py +++ b/mmpose/configs/mmdet/detectors/htc_r50-sac_1x_coco.py @@ -1,8 +1,10 @@ -_base_ = '../htc/htc_r50_fpn_1x_coco.py' +_base_ = "../htc/htc_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='DetectoRS_ResNet', - conv_cfg=dict(type='ConvAWS'), - sac=dict(type='SAC', use_deform=True), - stage_with_sac=(False, True, True, True))) + type="DetectoRS_ResNet", + conv_cfg=dict(type="ConvAWS"), + sac=dict(type="SAC", use_deform=True), + stage_with_sac=(False, True, True, True), + ) +) diff --git a/mmpose/configs/mmdet/detr/detr_r101_8xb2-500e_coco.py b/mmpose/configs/mmdet/detr/detr_r101_8xb2-500e_coco.py index 6661aacdc54e889aa38b2e759c40fd9797ae44ad..db75d7ccb2bcf70920a564f7a1c3f6d270d37d5a 100644 --- a/mmpose/configs/mmdet/detr/detr_r101_8xb2-500e_coco.py +++ b/mmpose/configs/mmdet/detr/detr_r101_8xb2-500e_coco.py @@ -1,7 +1,3 @@ -_base_ = './detr_r50_8xb2-500e_coco.py' +_base_ = "./detr_r50_8xb2-500e_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/detr/detr_r18_8xb2-500e_coco.py b/mmpose/configs/mmdet/detr/detr_r18_8xb2-500e_coco.py index 305b9d6fee8d75273b588f32b2e21582473cb137..f0ab6ecc8cb729a780d404ccb32a8294518ef854 100644 --- a/mmpose/configs/mmdet/detr/detr_r18_8xb2-500e_coco.py +++ b/mmpose/configs/mmdet/detr/detr_r18_8xb2-500e_coco.py @@ -1,7 +1,3 @@ -_base_ = './detr_r50_8xb2-500e_coco.py' +_base_ = "./detr_r50_8xb2-500e_coco.py" -model = dict( - backbone=dict( - depth=18, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[512])) +model = dict(backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")), 
neck=dict(in_channels=[512])) diff --git a/mmpose/configs/mmdet/detr/detr_r50_8xb2-150e_coco.py b/mmpose/configs/mmdet/detr/detr_r50_8xb2-150e_coco.py index aaa15410532e552cae387ef4eaa57227af1d855d..def744c5b1ecc64f13428d6dd0f1ae581abb1b97 100644 --- a/mmpose/configs/mmdet/detr/detr_r50_8xb2-150e_coco.py +++ b/mmpose/configs/mmdet/detr/detr_r50_8xb2-150e_coco.py @@ -1,153 +1,131 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/default_runtime.py"] model = dict( - type='DETR', + type="DETR", num_queries=100, data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, - out_indices=(3, ), + out_indices=(3,), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='ChannelMapper', - in_channels=[2048], - kernel_size=1, - out_channels=256, - act_cfg=None, - norm_cfg=None, - num_outs=1), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="ChannelMapper", in_channels=[2048], kernel_size=1, out_channels=256, act_cfg=None, norm_cfg=None, num_outs=1), encoder=dict( # DetrTransformerEncoder num_layers=6, layer_cfg=dict( # DetrTransformerEncoderLayer - self_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.1, - batch_first=True), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, - num_fcs=2, - ffn_drop=0.1, - act_cfg=dict(type='ReLU', inplace=True)))), + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.1, batch_first=True), # MultiheadAttention + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, num_fcs=2, ffn_drop=0.1, act_cfg=dict(type="ReLU", inplace=True)), + ), + ), decoder=dict( # DetrTransformerDecoder num_layers=6, layer_cfg=dict( # DetrTransformerDecoderLayer - self_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.1, - batch_first=True), - cross_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.1, - batch_first=True), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, - num_fcs=2, - ffn_drop=0.1, - act_cfg=dict(type='ReLU', inplace=True))), - return_intermediate=True), + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.1, batch_first=True), # MultiheadAttention + cross_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.1, batch_first=True), # MultiheadAttention + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, num_fcs=2, ffn_drop=0.1, act_cfg=dict(type="ReLU", inplace=True)), + ), + return_intermediate=True, + ), positional_encoding=dict(num_feats=128, normalize=True), bbox_head=dict( - type='DETRHead', + type="DETRHead", num_classes=80, embed_dims=256, - loss_cls=dict( - type='CrossEntropyLoss', - bg_cls_weight=0.1, - use_sigmoid=False, - loss_weight=1.0, - class_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0)), + loss_cls=dict(type="CrossEntropyLoss", bg_cls_weight=0.1, use_sigmoid=False, loss_weight=1.0, class_weight=1.0), + 
loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + ), # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='ClassificationCost', weight=1.), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), - test_cfg=dict(max_per_img=100)) + dict(type="ClassificationCost", weight=1.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), + test_cfg=dict(max_per_img=100), +) # train_pipeline, NOTE the img_scale and the Pad's size_divisor is different # from the default setting in mmdet. train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', - transforms=[[ - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoice", + transforms=[ + [ + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) + ], + [ + dict(type="RandomChoiceResize", scales=[(400, 1333), (500, 1333), (600, 1333)], keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], ], - [ - dict( - type='RandomChoiceResize', - scales=[(400, 1333), (500, 1333), (600, 1333)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), - (576, 1333), (608, 1333), (640, 1333), - (672, 1333), (704, 1333), (736, 1333), - (768, 1333), (800, 1333)], - keep_ratio=True) - ]]), - dict(type='PackDetInputs') + ), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict( - custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=1.0)})) + paramwise_cfg=dict(custom_keys={"backbone": dict(lr_mult=0.1, decay_mult=1.0)}), +) # learning policy max_epochs = 150 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[100], - 
gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[100], gamma=0.1)] # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/detr/detr_r50_8xb2-500e_coco.py b/mmpose/configs/mmdet/detr/detr_r50_8xb2-500e_coco.py index f07d5dce05b08c74aea2059989b45d5d275c53e0..d14729d2288c32ec6611e9449eaf61248ccaefc1 100644 --- a/mmpose/configs/mmdet/detr/detr_r50_8xb2-500e_coco.py +++ b/mmpose/configs/mmdet/detr/detr_r50_8xb2-500e_coco.py @@ -1,19 +1,10 @@ -_base_ = './detr_r50_8xb2-150e_coco.py' +_base_ = "./detr_r50_8xb2-150e_coco.py" # learning policy max_epochs = 500 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=10) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=10) -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[334], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[334], gamma=0.1)] # only keep latest 2 checkpoints default_hooks = dict(checkpoint=dict(max_keep_ckpts=2)) diff --git a/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-12e_coco.py b/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-12e_coco.py index 5831f898b4a706accb2b828b6194b2974e78d0fc..700b73ac68aed6b2b372bc90c6e95c71bb5dabbb 100644 --- a/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-12e_coco.py +++ b/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-12e_coco.py @@ -1,161 +1,152 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/default_runtime.py"] model = dict( - type='DINO', + type="DINO", num_queries=900, # num_matching_queries with_box_refine=True, as_two_stage=True, data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[512, 1024, 2048], kernel_size=1, out_channels=256, act_cfg=None, - norm_cfg=dict(type='GN', num_groups=32), - num_outs=4), + norm_cfg=dict(type="GN", num_groups=32), + num_outs=4, + ), encoder=dict( num_layers=6, layer_cfg=dict( - self_attn_cfg=dict(embed_dims=256, num_levels=4, - dropout=0.0), # 0.1 for DeformDETR - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, # 1024 for DeformDETR - ffn_drop=0.0))), # 0.1 for DeformDETR + self_attn_cfg=dict(embed_dims=256, num_levels=4, dropout=0.0), # 0.1 for DeformDETR + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), # 1024 for DeformDETR + ), + ), # 0.1 for DeformDETR decoder=dict( num_layers=6, return_intermediate=True, layer_cfg=dict( - self_attn_cfg=dict(embed_dims=256, num_heads=8, - dropout=0.0), # 0.1 for DeformDETR - cross_attn_cfg=dict(embed_dims=256, num_levels=4, - dropout=0.0), # 0.1 for 
DeformDETR - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, # 1024 for DeformDETR - ffn_drop=0.0)), # 0.1 for DeformDETR - post_norm_cfg=None), - positional_encoding=dict( - num_feats=128, - normalize=True, - offset=0.0, # -0.5 for DeformDETR - temperature=20), # 10000 for DeformDETR + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), # 0.1 for DeformDETR + cross_attn_cfg=dict(embed_dims=256, num_levels=4, dropout=0.0), # 0.1 for DeformDETR + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), # 1024 for DeformDETR + ), # 0.1 for DeformDETR + post_norm_cfg=None, + ), + positional_encoding=dict(num_feats=128, normalize=True, offset=0.0, temperature=20), # -0.5 for DeformDETR # 10000 for DeformDETR bbox_head=dict( - type='DINOHead', + type="DINOHead", num_classes=80, sync_cls_avg_factor=True, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), # 2.0 in DeformDETR - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0)), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), # 2.0 in DeformDETR + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + ), dn_cfg=dict( # TODO: Move to model.train_cfg ? - label_noise_scale=0.5, - box_noise_scale=1.0, # 0.4 for DN-DETR - group_cfg=dict(dynamic=True, num_groups=None, - num_dn_queries=100)), # TODO: half num_dn_queries + label_noise_scale=0.5, box_noise_scale=1.0, group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100) # 0.4 for DN-DETR + ), # TODO: half num_dn_queries # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='FocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), - test_cfg=dict(max_per_img=300)) # 100 for DeformDETR + dict(type="FocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), + test_cfg=dict(max_per_img=300), +) # 100 for DeformDETR # train_pipeline, NOTE the img_scale and the Pad's size_divisor is different # from the default setting in mmdet. 
train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="PackDetInputs"), ] -train_dataloader = dict( - dataset=dict( - filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline)) +train_dataloader = dict(dataset=dict(filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline)) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict( - type='AdamW', - lr=0.0001, # 0.0002 for DeformDETR - weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001), # 0.0002 for DeformDETR clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.1)}) + paramwise_cfg=dict(custom_keys={"backbone": dict(lr_mult=0.1)}), ) # custom_keys contains sampling_offsets and reference_points in DeformDETR # noqa # learning policy max_epochs = 12 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[11], gamma=0.1)] # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
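# ---------------------------------------------------------------------------
# [Editor's note] With the optimizer settings above, backbone parameters train
# at lr * 0.1 via `custom_keys`, and MultiStepLR multiplies every group by 0.1
# again at epoch 11. A minimal sketch of the resulting per-epoch learning
# rates (illustration only, not MMEngine's scheduler; `lr_at_epoch` is a
# hypothetical helper).
import math


def lr_at_epoch(epoch, base_lr=1e-4, milestones=(11,), gamma=0.1, lr_mult=1.0):
    """Effective LR of a parameter group at `epoch` under MultiStepLR."""
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * lr_mult * gamma**decays


assert math.isclose(lr_at_epoch(0), 1e-4)               # epochs 0-10
assert math.isclose(lr_at_epoch(11), 1e-5)              # after the milestone
assert math.isclose(lr_at_epoch(0, lr_mult=0.1), 1e-5)  # backbone from start
# ---------------------------------------------------------------------------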
diff --git a/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-24e_coco.py b/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-24e_coco.py index 8534ac6a7ccc7f3f8c081275b3567a0a0792b7a5..94abd3e572e53755f7640a47e00f6655f1ecc018 100644 --- a/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-24e_coco.py +++ b/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-24e_coco.py @@ -1,13 +1,4 @@ -_base_ = './dino-4scale_r50_8xb2-12e_coco.py' +_base_ = "./dino-4scale_r50_8xb2-12e_coco.py" max_epochs = 24 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[20], - gamma=0.1) -] +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[20], gamma=0.1)] diff --git a/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-36e_coco.py b/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-36e_coco.py index 1c2cf4602d358dfed5b737f8a74843c89a54702d..931b4343a559fd2929f1128c89124c57dab3a976 100644 --- a/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-36e_coco.py +++ b/mmpose/configs/mmdet/dino/dino-4scale_r50_8xb2-36e_coco.py @@ -1,13 +1,4 @@ -_base_ = './dino-4scale_r50_8xb2-12e_coco.py' +_base_ = "./dino-4scale_r50_8xb2-12e_coco.py" max_epochs = 36 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[30], - gamma=0.1) -] +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[30], gamma=0.1)] diff --git a/mmpose/configs/mmdet/dino/dino-4scale_r50_improved_8xb2-12e_coco.py b/mmpose/configs/mmdet/dino/dino-4scale_r50_improved_8xb2-12e_coco.py index 6a4a82bacc1f1e990d4720db81cae0af5c012557..118d420048f0e001b90d99c98ce11c7eef44e3b9 100644 --- a/mmpose/configs/mmdet/dino/dino-4scale_r50_improved_8xb2-12e_coco.py +++ b/mmpose/configs/mmdet/dino/dino-4scale_r50_improved_8xb2-12e_coco.py @@ -1,18 +1,17 @@ -_base_ = ['dino-4scale_r50_8xb2-12e_coco.py'] +_base_ = ["dino-4scale_r50_8xb2-12e_coco.py"] # from deformable detr hyper model = dict( backbone=dict(frozen_stages=-1), bbox_head=dict(loss_cls=dict(loss_weight=2.0)), positional_encoding=dict(offset=-0.5, temperature=10000), - dn_cfg=dict(group_cfg=dict(num_dn_queries=300))) + dn_cfg=dict(group_cfg=dict(num_dn_queries=300)), +) # optimizer optim_wrapper = dict( optimizer=dict(lr=0.0002), paramwise_cfg=dict( - custom_keys={ - 'backbone': dict(lr_mult=0.1), - 'sampling_offsets': dict(lr_mult=0.1), - 'reference_points': dict(lr_mult=0.1) - })) + custom_keys={"backbone": dict(lr_mult=0.1), "sampling_offsets": dict(lr_mult=0.1), "reference_points": dict(lr_mult=0.1)} + ), +) diff --git a/mmpose/configs/mmdet/dino/dino-5scale_swin-l_8xb2-12e_coco.py b/mmpose/configs/mmdet/dino/dino-5scale_swin-l_8xb2-12e_coco.py index 3d39f22f50926a11137d143976fe4033ec3a8640..982afc59fd29e15241e996be7bb99a887b6dcdff 100644 --- a/mmpose/configs/mmdet/dino/dino-5scale_swin-l_8xb2-12e_coco.py +++ b/mmpose/configs/mmdet/dino/dino-5scale_swin-l_8xb2-12e_coco.py @@ -1,12 +1,12 @@ -_base_ = './dino-4scale_r50_8xb2-12e_coco.py' +_base_ = "./dino-4scale_r50_8xb2-12e_coco.py" -pretrained = 
'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth" # noqa num_levels = 5 model = dict( num_feature_levels=num_levels, backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=192, depths=[2, 2, 18, 2], @@ -15,8 +15,8 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(0, 1, 2, 3), @@ -24,7 +24,9 @@ model = dict( # in FPN, otherwise some parameter will not be used with_cp=True, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), neck=dict(in_channels=[192, 384, 768, 1536], num_outs=num_levels), encoder=dict(layer_cfg=dict(self_attn_cfg=dict(num_levels=num_levels))), - decoder=dict(layer_cfg=dict(cross_attn_cfg=dict(num_levels=num_levels)))) + decoder=dict(layer_cfg=dict(cross_attn_cfg=dict(num_levels=num_levels))), +) diff --git a/mmpose/configs/mmdet/dino/dino-5scale_swin-l_8xb2-36e_coco.py b/mmpose/configs/mmdet/dino/dino-5scale_swin-l_8xb2-36e_coco.py index d55a38e61d411892c6de819cf46247ba4d41d427..602b50adced392b6c3038e959b3e04d7ba41b1c5 100644 --- a/mmpose/configs/mmdet/dino/dino-5scale_swin-l_8xb2-36e_coco.py +++ b/mmpose/configs/mmdet/dino/dino-5scale_swin-l_8xb2-36e_coco.py @@ -1,13 +1,4 @@ -_base_ = './dino-5scale_swin-l_8xb2-12e_coco.py' +_base_ = "./dino-5scale_swin-l_8xb2-12e_coco.py" max_epochs = 36 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[27, 33], - gamma=0.1) -] +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[27, 33], gamma=0.1)] diff --git a/mmpose/configs/mmdet/double_heads/dh-faster-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/double_heads/dh-faster-rcnn_r50_fpn_1x_coco.py index 6b9b6e69a12d978a55fbba049fc2b1c5229c1fc5..ee74dc7227f5515f1fee4003869c68ff09e67444 100644 --- a/mmpose/configs/mmdet/double_heads/dh-faster-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/double_heads/dh-faster-rcnn_r50_fpn_1x_coco.py @@ -1,11 +1,11 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( roi_head=dict( - type='DoubleHeadRoIHead', + type="DoubleHeadRoIHead", reg_roi_scale_factor=1.3, bbox_head=dict( _delete_=True, - type='DoubleConvFCBBoxHead', + type="DoubleConvFCBBoxHead", num_convs=4, num_fcs=2, in_channels=256, @@ -13,11 +13,10 @@ model = dict( fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=2.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=2.0)))) + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=2.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=2.0), + ), + 
) +) diff --git a/mmpose/configs/mmdet/dsdl/coco.py b/mmpose/configs/mmdet/dsdl/coco.py index 3c9e895e53c1588028cf6def2fe79d49fd98d6e1..c292ac6185cadaafecf023ba70c3cc248595f86a 100644 --- a/mmpose/configs/mmdet/dsdl/coco.py +++ b/mmpose/configs/mmdet/dsdl/coco.py @@ -1,18 +1,19 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py', - '../_base_/datasets/dsdl.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", + "../_base_/datasets/dsdl.py", ] # dsdl dataset settings # please visit our platform [OpenDataLab](https://opendatalab.com/) # to downloaded dsdl dataset. -data_root = 'data/COCO2017' -img_prefix = 'original' -train_ann = 'dsdl/set-train/train.yaml' -val_ann = 'dsdl/set-val/val.yaml' -specific_key_path = dict(ignore_flag='./annotations/*/iscrowd') +data_root = "data/COCO2017" +img_prefix = "original" +train_ann = "dsdl/set-train/train.yaml" +val_ann = "dsdl/set-val/val.yaml" +specific_key_path = dict(ignore_flag="./annotations/*/iscrowd") train_dataloader = dict( dataset=dict( @@ -21,7 +22,8 @@ train_dataloader = dict( ann_file=train_ann, data_prefix=dict(img_path=img_prefix), filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), - )) + ) +) val_dataloader = dict( dataset=dict( @@ -29,5 +31,6 @@ val_dataloader = dict( data_root=data_root, ann_file=val_ann, data_prefix=dict(img_path=img_prefix), - )) + ) +) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/dsdl/coco_instance.py b/mmpose/configs/mmdet/dsdl/coco_instance.py index e34f93c97f55f5eeef55f9de73f1a8389f8980c6..d9bab48d1dd4260b40415824ec8d03fa1c39ebf2 100644 --- a/mmpose/configs/mmdet/dsdl/coco_instance.py +++ b/mmpose/configs/mmdet/dsdl/coco_instance.py @@ -1,36 +1,34 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py', - '../_base_/datasets/dsdl.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", + "../_base_/datasets/dsdl.py", ] # dsdl dataset settings. # please visit our platform [OpenDataLab](https://opendatalab.com/) # to downloaded dsdl dataset. 
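# ---------------------------------------------------------------------------
# [Editor's note] The `specific_key_path` entries in these DSDL configs are
# XPath-like selectors into the annotation tree, e.g.
# "./annotations/*/iscrowd" reads the `iscrowd` field of every annotation.
# A toy resolver for that wildcard syntax is sketched below (illustration
# only; `resolve` is hypothetical and far simpler than the real DSDL parser).
def resolve(tree, path):
    """Resolve a './a/*/b'-style selector over nested dicts and lists."""
    parts = [p for p in path.split("/") if p not in ("", ".")]
    nodes = [tree]
    for part in parts:
        nxt = []
        for node in nodes:
            if part == "*":
                nxt.extend(node if isinstance(node, list) else node.values())
            else:
                nxt.append(node[part])
        nodes = nxt
    return nodes


ann = {"annotations": [{"iscrowd": 0}, {"iscrowd": 1}]}
assert resolve(ann, "./annotations/*/iscrowd") == [0, 1]
# ---------------------------------------------------------------------------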
-data_root = 'data/COCO2017' -img_prefix = 'original' -train_ann = 'dsdl/set-train/train.yaml' -val_ann = 'dsdl/set-val/val.yaml' -specific_key_path = dict(ignore_flag='./annotations/*/iscrowd') +data_root = "data/COCO2017" +img_prefix = "original" +train_ann = "dsdl/set-train/train.yaml" +val_ann = "dsdl/set-val/val.yaml" +specific_key_path = dict(ignore_flag="./annotations/*/iscrowd") backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'instances')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "instances")), ] train_dataloader = dict( @@ -42,7 +40,8 @@ train_dataloader = dict( data_prefix=dict(img_path=img_prefix), filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), pipeline=train_pipeline, - )) + ) +) val_dataloader = dict( dataset=dict( @@ -52,11 +51,11 @@ val_dataloader = dict( ann_file=val_ann, data_prefix=dict(img_path=img_prefix), pipeline=test_pipeline, - )) + ) +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', metric=['bbox', 'segm'], format_only=False) +val_evaluator = dict(type="CocoMetric", metric=["bbox", "segm"], format_only=False) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/dsdl/objects365v2.py b/mmpose/configs/mmdet/dsdl/objects365v2.py index d25a2323027c22eaf9777f6e62e4992880b29d2c..3d84f8e0730c571fb956d4b832d08bef380ee1c2 100644 --- a/mmpose/configs/mmdet/dsdl/objects365v2.py +++ b/mmpose/configs/mmdet/dsdl/objects365v2.py @@ -1,7 +1,8 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py', - '../_base_/datasets/dsdl.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", + "../_base_/datasets/dsdl.py", ] model = dict(roi_head=dict(bbox_head=dict(num_classes=365))) @@ -10,11 +11,11 @@ model = dict(roi_head=dict(bbox_head=dict(num_classes=365))) # please visit our platform [OpenDataLab](https://opendatalab.com/) # to downloaded dsdl dataset. 
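# ---------------------------------------------------------------------------
# [Editor's note] The schedule later in this file pairs a 500-iteration
# LinearLR warmup (start_factor=0.001, by_epoch=False) with an epoch-based
# MultiStepLR. A sketch of the warmup multiplier alone (illustration only,
# assuming linear interpolation between start_factor and 1.0; MMEngine's
# endpoint handling may differ slightly):
def warmup_factor(iteration, start_factor=0.001, end=500):
    """Linear ramp of the LR multiplier over the first `end` iterations."""
    if iteration >= end:
        return 1.0
    return start_factor + (1.0 - start_factor) * iteration / end


assert warmup_factor(0) == 0.001
assert warmup_factor(250) > 0.5 and warmup_factor(500) == 1.0
# ---------------------------------------------------------------------------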
-data_root = 'data/Objects365' -img_prefix = 'original' -train_ann = 'dsdl/set-train/train.yaml' -val_ann = 'dsdl/set-val/val.yaml' -specific_key_path = dict(ignore_flag='./annotations/*/iscrowd') +data_root = "data/Objects365" +img_prefix = "original" +train_ann = "dsdl/set-train/train.yaml" +val_ann = "dsdl/set-val/val.yaml" +specific_key_path = dict(ignore_flag="./annotations/*/iscrowd") train_dataloader = dict( dataset=dict( @@ -23,7 +24,8 @@ train_dataloader = dict( ann_file=train_ann, data_prefix=dict(img_path=img_prefix), filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), - )) + ) +) val_dataloader = dict( dataset=dict( @@ -32,23 +34,17 @@ val_dataloader = dict( ann_file=val_ann, data_prefix=dict(img_path=img_prefix), test_mode=True, - )) + ) +) test_dataloader = val_dataloader -default_hooks = dict(logger=dict(type='LoggerHook', interval=1000), ) -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=3, val_interval=1) +default_hooks = dict( + logger=dict(type="LoggerHook", interval=1000), +) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=3, val_interval=1) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[1, 2], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[1, 2], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/dsdl/openimagesv6.py b/mmpose/configs/mmdet/dsdl/openimagesv6.py index a65f942a0d4f8cfdaa3cfb712276d6de34d62a84..c7e07bd60e7fa34f88719882dfd2ebe27273526b 100644 --- a/mmpose/configs/mmdet/dsdl/openimagesv6.py +++ b/mmpose/configs/mmdet/dsdl/openimagesv6.py @@ -1,7 +1,7 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/schedules/schedule_1x.py', - '../_base_/default_runtime.py', + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict(roi_head=dict(bbox_head=dict(num_classes=601))) @@ -10,39 +10,36 @@ model = dict(roi_head=dict(bbox_head=dict(num_classes=601))) # please visit our platform [OpenDataLab](https://opendatalab.com/) # to downloaded dsdl dataset. 
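# ---------------------------------------------------------------------------
# [Editor's note] The `backend_args` below rewrite local paths onto object
# storage by prefix ('data/' -> 's3://open_dataset_original/'). A toy version
# of that substitution (illustration only; `map_path` is hypothetical and the
# real petrel backend handles more cases):
def map_path(path, path_mapping):
    """Apply the first matching prefix rewrite, once, to `path`."""
    for src, dst in path_mapping.items():
        if path.startswith(src):
            return dst + path[len(src):]
    return path


assert (
    map_path("data/OpenImages/img.jpg", {"data/": "s3://open_dataset_original/"})
    == "s3://open_dataset_original/OpenImages/img.jpg"
)
# ---------------------------------------------------------------------------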
-dataset_type = 'DSDLDetDataset' -data_root = 'data/OpenImages' -train_ann = 'dsdl/set-train/train.yaml' -val_ann = 'dsdl/set-val/val.yaml' +dataset_type = "DSDLDetDataset" +data_root = "data/OpenImages" +train_ann = "dsdl/set-train/train.yaml" +val_ann = "dsdl/set-val/val.yaml" specific_key_path = dict( - image_level_labels='./image_labels/*/label', - Label='./objects/*/label', - is_group_of='./objects/*/isgroupof', + image_level_labels="./image_labels/*/label", + Label="./objects/*/label", + is_group_of="./objects/*/isgroupof", ) -backend_args = dict( - backend='petrel', - path_mapping=dict({'data/': 's3://open_dataset_original/'})) +backend_args = dict(backend="petrel", path_mapping=dict({"data/": "s3://open_dataset_original/"})) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1024, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1024, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1024, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1024, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'instances', 'image_level_labels')) + type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "instances", "image_level_labels") + ), ] train_dataloader = dict( - sampler=dict(type='ClassAwareSampler', num_sample_class=1), + sampler=dict(type="ClassAwareSampler", num_sample_class=1), dataset=dict( type=dataset_type, with_imagelevel_label=True, @@ -51,7 +48,9 @@ train_dataloader = dict( data_root=data_root, ann_file=train_ann, filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + ), +) val_dataloader = dict( dataset=dict( @@ -62,33 +61,23 @@ val_dataloader = dict( data_root=data_root, ann_file=val_ann, test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ) +) test_dataloader = val_dataloader -default_hooks = dict(logger=dict(type='LoggerHook', interval=1000), ) -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=3, val_interval=1) +default_hooks = dict( + logger=dict(type="LoggerHook", interval=1000), +) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=3, val_interval=1) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[1, 2], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[1, 2], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) -val_evaluator = dict( - type='OpenImagesMetric', - iou_thrs=0.5, - ioa_thrs=0.5, - use_group_of=True, - 
get_supercategory=True) +val_evaluator = dict(type="OpenImagesMetric", iou_thrs=0.5, ioa_thrs=0.5, use_group_of=True, get_supercategory=True) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/dsdl/voc07.py b/mmpose/configs/mmdet/dsdl/voc07.py index b7b864714e4987ca9d31eda5fee746e741b7aa10..64ea0953410eda981cb152a6329c106d7d66322c 100644 --- a/mmpose/configs/mmdet/dsdl/voc07.py +++ b/mmpose/configs/mmdet/dsdl/voc07.py @@ -1,6 +1,4 @@ -_base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/models/faster-rcnn_r50_fpn.py", "../_base_/default_runtime.py"] # model setting model = dict(roi_head=dict(bbox_head=dict(num_classes=20))) @@ -9,32 +7,29 @@ model = dict(roi_head=dict(bbox_head=dict(num_classes=20))) # please visit our platform [OpenDataLab](https://opendatalab.com/) # to downloaded dsdl dataset. -dataset_type = 'DSDLDetDataset' -data_root = 'data/VOC07-det' -img_prefix = 'original' -train_ann = 'dsdl/set-train/train.yaml' -val_ann = 'dsdl/set-test/test.yaml' +dataset_type = "DSDLDetDataset" +data_root = "data/VOC07-det" +img_prefix = "original" +train_ann = "dsdl/set-train/train.yaml" +val_ann = "dsdl/set-test/test.yaml" -specific_key_path = dict(ignore_flag='./objects/*/difficult') +specific_key_path = dict(ignore_flag="./objects/*/difficult") backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1000, 600), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1000, 600), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1000, 600), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1000, 600), keep_ratio=True), # avoid bboxes being resized - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'instances')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "instances")), ] train_dataloader = dict( dataset=dict( @@ -44,7 +39,9 @@ train_dataloader = dict( ann_file=train_ann, data_prefix=dict(img_path=img_prefix), filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + ) +) val_dataloader = dict( dataset=dict( @@ -54,38 +51,29 @@ val_dataloader = dict( ann_file=val_ann, data_prefix=dict(img_path=img_prefix), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ) +) test_dataloader = val_dataloader # Pascal VOC2007 uses `11points` as default evaluate mode, while PASCAL # VOC2012 defaults to use 'area'. 
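# ---------------------------------------------------------------------------
# [Editor's note] The `11points` mode mentioned above averages interpolated
# precision at recalls {0.0, 0.1, ..., 1.0}, whereas `area` integrates the
# full PR curve. A compact sketch of 11-point AP from a PR curve
# (illustration only; `ap_11point` is a hypothetical helper):
def ap_11point(recalls, precisions):
    """VOC2007-style AP: mean of max precision at recall >= r, r in 0..1."""
    ap = 0.0
    for r in (i / 10 for i in range(11)):
        ap += max(
            (p for rec, p in zip(recalls, precisions) if rec >= r),
            default=0.0,
        ) / 11
    return ap


# A two-point PR curve: precision 1.0 up to recall 0.5, then 0.5.
print(ap_11point([0.5, 1.0], [1.0, 0.5]))  # (6 * 1.0 + 5 * 0.5) / 11 ~ 0.773
# ---------------------------------------------------------------------------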
-val_evaluator = dict(type='VOCMetric', metric='mAP', eval_mode='11points') +val_evaluator = dict(type="VOCMetric", metric="mAP", eval_mode="11points") # val_evaluator = dict(type='CocoMetric', metric='bbox') test_evaluator = val_evaluator # training schedule, voc dataset is repeated 3 times, in # `_base_/datasets/voc0712.py`, so the actual epoch = 4 * 3 = 12 max_epochs = 12 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=3) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=3) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[9], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[9], gamma=0.1)] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/dsdl/voc0712.py b/mmpose/configs/mmdet/dsdl/voc0712.py index 9ec1bb8f98e56d0402c9a80934c3b77bd7919fa4..7f973d60462036b11f625f85b4bb70dd92a04b06 100644 --- a/mmpose/configs/mmdet/dsdl/voc0712.py +++ b/mmpose/configs/mmdet/dsdl/voc0712.py @@ -1,7 +1,7 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/schedules/schedule_1x.py', - '../_base_/default_runtime.py', + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", # '../_base_/datasets/dsdl.py' ] @@ -12,42 +12,41 @@ model = dict(roi_head=dict(bbox_head=dict(num_classes=20))) # please visit our platform [OpenDataLab](https://opendatalab.com/) # to downloaded dsdl dataset. 
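# ---------------------------------------------------------------------------
# [Editor's note] This config wraps four VOC splits in a ConcatDataset inside
# RepeatDataset(times=3), which is why max_epochs=4 below amounts to 12
# effective passes over the data. A minimal sketch of RepeatDataset's
# indexing (illustration only, not MMEngine's implementation):
class RepeatDatasetSketch:
    def __init__(self, dataset, times):
        self.dataset, self.times = dataset, times

    def __len__(self):
        return self.times * len(self.dataset)

    def __getitem__(self, idx):
        # Indices simply wrap around the underlying dataset.
        return self.dataset[idx % len(self.dataset)]


ds = RepeatDatasetSketch(["a", "b"], times=3)
assert len(ds) == 6 and ds[4] == "a"
# ---------------------------------------------------------------------------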
-dataset_type = 'DSDLDetDataset' -data_root_07 = 'data/VOC07-det' -data_root_12 = 'data/VOC12-det' -img_prefix = 'original' +dataset_type = "DSDLDetDataset" +data_root_07 = "data/VOC07-det" +data_root_12 = "data/VOC12-det" +img_prefix = "original" -train_ann = 'dsdl/set-train/train.yaml' -val_ann = 'dsdl/set-val/val.yaml' -test_ann = 'dsdl/set-test/test.yaml' +train_ann = "dsdl/set-train/train.yaml" +val_ann = "dsdl/set-val/val.yaml" +test_ann = "dsdl/set-test/test.yaml" backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1000, 600), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1000, 600), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(1000, 600), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(1000, 600), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'instances')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "instances")), ] -specific_key_path = dict(ignore_flag='./objects/*/difficult', ) +specific_key_path = dict( + ignore_flag="./objects/*/difficult", +) train_dataloader = dict( dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=3, dataset=dict( - type='ConcatDataset', + type="ConcatDataset", datasets=[ dict( type=dataset_type, @@ -55,37 +54,40 @@ train_dataloader = dict( data_root=data_root_07, ann_file=train_ann, data_prefix=dict(img_path=img_prefix), - filter_cfg=dict( - filter_empty_gt=True, min_size=32, bbox_min_size=32), - pipeline=train_pipeline), + filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), + pipeline=train_pipeline, + ), dict( type=dataset_type, specific_key_path=specific_key_path, data_root=data_root_07, ann_file=val_ann, data_prefix=dict(img_path=img_prefix), - filter_cfg=dict( - filter_empty_gt=True, min_size=32, bbox_min_size=32), - pipeline=train_pipeline), + filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), + pipeline=train_pipeline, + ), dict( type=dataset_type, specific_key_path=specific_key_path, data_root=data_root_12, ann_file=train_ann, data_prefix=dict(img_path=img_prefix), - filter_cfg=dict( - filter_empty_gt=True, min_size=32, bbox_min_size=32), - pipeline=train_pipeline), + filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), + pipeline=train_pipeline, + ), dict( type=dataset_type, specific_key_path=specific_key_path, data_root=data_root_12, ann_file=val_ann, data_prefix=dict(img_path=img_prefix), - filter_cfg=dict( - filter_empty_gt=True, min_size=32, bbox_min_size=32), - pipeline=train_pipeline), - ]))) + filter_cfg=dict(filter_empty_gt=True, min_size=32, bbox_min_size=32), + pipeline=train_pipeline, + ), + ], + ), + ) +) val_dataloader = dict( dataset=dict( @@ -94,36 +96,27 @@ val_dataloader = dict( data_root=data_root_07, ann_file=test_ann, test_mode=True, - pipeline=test_pipeline)) + 
pipeline=test_pipeline, + ) +) test_dataloader = val_dataloader -val_evaluator = dict(type='CocoMetric', metric='bbox') +val_evaluator = dict(type="CocoMetric", metric="bbox") # val_evaluator = dict(type='VOCMetric', metric='mAP', eval_mode='11points') test_evaluator = val_evaluator # training schedule, voc dataset is repeated 3 times, in # `_base_/datasets/voc0712.py`, so the actual epoch = 4 * 3 = 12 max_epochs = 4 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[3], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[3], gamma=0.1)] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/dyhead/atss_r50-caffe_fpn_dyhead_1x_coco.py b/mmpose/configs/mmdet/dyhead/atss_r50-caffe_fpn_dyhead_1x_coco.py index 8716f1226cb0b37435d0318d62599a74e6126f19..534c3a38334febcc723099102056b13bb285a734 100644 --- a/mmpose/configs/mmdet/dyhead/atss_r50-caffe_fpn_dyhead_1x_coco.py +++ b/mmpose/configs/mmdet/dyhead/atss_r50-caffe_fpn_dyhead_1x_coco.py @@ -1,101 +1,66 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] model = dict( - type='ATSS', + type="ATSS", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=128), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=128 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), neck=[ + dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5), - dict( - type='DyHead', + type="DyHead", in_channels=256, out_channels=256, num_blocks=6, # disable zero_init_offset to follow official implementation - zero_init_offset=False) + zero_init_offset=False, + ), ], bbox_head=dict( - type='ATSSHead', + type="ATSSHead", num_classes=80, in_channels=256, pred_kernel_size=1, # follow DyHead official implementation stacked_convs=0, feat_channels=256, anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - 
strides=[8, 16, 32, 64, 128], - center_offset=0.5), # follow DyHead official implementation - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128], center_offset=0.5 + ), # follow DyHead official implementation + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), # training and testing settings - train_cfg=dict( - assigner=dict(type='ATSSAssigner', topk=9), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + train_cfg=dict(assigner=dict(type="ATSSAssigner", topk=9), allowed_border=-1, pos_weight=-1, debug=False), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # optimizer optim_wrapper = dict(optimizer=dict(lr=0.01)) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True, backend='pillow'), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True, backend="pillow"), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(1333, 800), keep_ratio=True, backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(1333, 800), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/dyhead/atss_r50_fpn_dyhead_1x_coco.py b/mmpose/configs/mmdet/dyhead/atss_r50_fpn_dyhead_1x_coco.py index 89e89b98ca437bb13fe5d01acc05cfdcd04e8fa0..5ef337bc7a044284d6773208e11fa9d8c45e61f2 100644 --- a/mmpose/configs/mmdet/dyhead/atss_r50_fpn_dyhead_1x_coco.py +++ b/mmpose/configs/mmdet/dyhead/atss_r50_fpn_dyhead_1x_coco.py @@ -1,72 +1,40 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] model = dict( - type='ATSS', + type="ATSS", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], 
- std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=[ - dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5), - dict(type='DyHead', in_channels=256, out_channels=256, num_blocks=6) + dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), + dict(type="DyHead", in_channels=256, out_channels=256, num_blocks=6), ], bbox_head=dict( - type='ATSSHead', + type="ATSSHead", num_classes=80, in_channels=256, stacked_convs=0, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), # training and testing settings - train_cfg=dict( - assigner=dict(type='ATSSAssigner', topk=9), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + train_cfg=dict(assigner=dict(type="ATSSAssigner", topk=9), allowed_border=-1, pos_weight=-1, debug=False), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # optimizer optim_wrapper = dict(optimizer=dict(lr=0.01)) diff --git a/mmpose/configs/mmdet/dyhead/atss_swin-l-p4-w12_fpn_dyhead_ms-2x_coco.py b/mmpose/configs/mmdet/dyhead/atss_swin-l-p4-w12_fpn_dyhead_ms-2x_coco.py index f537b9dc9b17aa50f0044b874585fe1e0ba15216..f98d920a820ec5bcdcee8307e1a816c00d8ed172 100644 --- a/mmpose/configs/mmdet/dyhead/atss_swin-l-p4-w12_fpn_dyhead_ms-2x_coco.py +++ b/mmpose/configs/mmdet/dyhead/atss_swin-l-p4-w12_fpn_dyhead_ms-2x_coco.py @@ -1,19 +1,13 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa 
+pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth" # noqa model = dict( - type='ATSS', + type="ATSS", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=128), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=128 + ), backbone=dict( - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=192, depths=[2, 2, 18, 2], @@ -22,8 +16,8 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(1, 2, 3), @@ -31,110 +25,83 @@ model = dict( # in FPN, otherwise some parameter will not be used with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), neck=[ + dict(type="FPN", in_channels=[384, 768, 1536], out_channels=256, start_level=0, add_extra_convs="on_output", num_outs=5), dict( - type='FPN', - in_channels=[384, 768, 1536], - out_channels=256, - start_level=0, - add_extra_convs='on_output', - num_outs=5), - dict( - type='DyHead', + type="DyHead", in_channels=256, out_channels=256, num_blocks=6, # disable zero_init_offset to follow official implementation - zero_init_offset=False) + zero_init_offset=False, + ), ], bbox_head=dict( - type='ATSSHead', + type="ATSSHead", num_classes=80, in_channels=256, pred_kernel_size=1, # follow DyHead official implementation stacked_convs=0, feat_channels=256, anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128], - center_offset=0.5), # follow DyHead official implementation - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128], center_offset=0.5 + ), # follow DyHead official implementation + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), # training and testing settings - train_cfg=dict( - assigner=dict(type='ATSSAssigner', topk=9), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + train_cfg=dict(assigner=dict(type="ATSSAssigner", topk=9), allowed_border=-1, pos_weight=-1, debug=False), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=[(2000, 480), (2000, 1200)], - 
keep_ratio=True, - backend='pillow'), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=[(2000, 480), (2000, 1200)], keep_ratio=True, backend="pillow"), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(2000, 1200), keep_ratio=True, backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(2000, 1200), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( type={{_base_.dataset_type}}, data_root={{_base_.data_root}}, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args={{_base_.backend_args}}))) + backend_args={{_base_.backend_args}}, + ), + ) +) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader # optimizer optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict( - type='AdamW', lr=0.00005, betas=(0.9, 0.999), weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.00005, betas=(0.9, 0.999), weight_decay=0.05), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'relative_position_bias_table': dict(decay_mult=0.), - 'norm': dict(decay_mult=0.) 
- }), - clip_grad=None) + "absolute_pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + } + ), + clip_grad=None, +) diff --git a/mmpose/configs/mmdet/dynamic_rcnn/dynamic-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/dynamic_rcnn/dynamic-rcnn_r50_fpn_1x_coco.py index f64dfa0b9102d5f7b32793b9d21e19c67afdfc2a..27d1f14f11dae4fc029d535be8e644491d3ddf10 100644 --- a/mmpose/configs/mmdet/dynamic_rcnn/dynamic-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/dynamic_rcnn/dynamic-rcnn_r50_fpn_1x_coco.py @@ -1,28 +1,22 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( roi_head=dict( - type='DynamicRoIHead', + type="DynamicRoIHead", bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), + ), train_cfg=dict( rpn_proposal=dict(nms=dict(iou_threshold=0.85)), - rcnn=dict( - dynamic_rcnn=dict( - iou_topk=75, - beta_topk=10, - update_iter_interval=100, - initial_iou=0.4, - initial_beta=1.0))), - test_cfg=dict(rpn=dict(nms=dict(iou_threshold=0.85)))) + rcnn=dict(dynamic_rcnn=dict(iou_topk=75, beta_topk=10, update_iter_interval=100, initial_iou=0.4, initial_beta=1.0)), + ), + test_cfg=dict(rpn=dict(nms=dict(iou_threshold=0.85))), +) diff --git a/mmpose/configs/mmdet/efficientnet/retinanet_effb3_fpn_8xb4-crop896-1x_coco.py b/mmpose/configs/mmdet/efficientnet/retinanet_effb3_fpn_8xb4-crop896-1x_coco.py index 2d0d9cefd0b565b2cce42117eb872ac9373ea4b9..1a993b213f99509a16df3a81be8b09837056f45e 100644 --- a/mmpose/configs/mmdet/efficientnet/retinanet_effb3_fpn_8xb4-crop896-1x_coco.py +++ b/mmpose/configs/mmdet/efficientnet/retinanet_effb3_fpn_8xb4-crop896-1x_coco.py @@ -1,87 +1,76 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/schedules/schedule_1x.py', - '../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/datasets/coco_detection.py", + "../_base_/default_runtime.py", ] image_size = (896, 896) -batch_augments = [dict(type='BatchFixedSizePad', size=image_size)] -norm_cfg = dict(type='BN', requires_grad=True) -checkpoint = 'https://download.openmmlab.com/mmclassification/v0/efficientnet/efficientnet-b3_3rdparty_8xb32-aa_in1k_20220119-5b4887a0.pth' # noqa +batch_augments = [dict(type="BatchFixedSizePad", size=image_size)] +norm_cfg = dict(type="BN", requires_grad=True) +checkpoint = ( + "https://download.openmmlab.com/mmclassification/v0/efficientnet/efficientnet-b3_3rdparty_8xb32-aa_in1k_20220119-5b4887a0.pth" # noqa +) model = dict( data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32, - batch_augments=batch_augments), + batch_augments=batch_augments, + ), 
backbone=dict( _delete_=True, - type='EfficientNet', - arch='b3', + type="EfficientNet", + arch="b3", drop_path_rate=0.2, out_indices=(3, 4, 5), frozen_stages=0, - norm_cfg=dict( - type='SyncBN', requires_grad=True, eps=1e-3, momentum=0.01), + norm_cfg=dict(type="SyncBN", requires_grad=True, eps=1e-3, momentum=0.01), norm_eval=False, - init_cfg=dict( - type='Pretrained', prefix='backbone', checkpoint=checkpoint)), + init_cfg=dict(type="Pretrained", prefix="backbone", checkpoint=checkpoint), + ), neck=dict( in_channels=[48, 136, 384], start_level=0, out_channels=256, relu_before_extra_convs=True, no_norm_on_lateral=True, - norm_cfg=norm_cfg), - bbox_head=dict(type='RetinaSepBNHead', num_ins=5, norm_cfg=norm_cfg), + norm_cfg=norm_cfg, + ), + bbox_head=dict(type="RetinaSepBNHead", num_ins=5, norm_cfg=norm_cfg), # training and testing settings - train_cfg=dict(assigner=dict(neg_iou_thr=0.5))) + train_cfg=dict(assigner=dict(neg_iou_thr=0.5)), +) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.8, 1.2), - keep_ratio=True), - dict(type='RandomCrop', crop_size=image_size), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=image_size, ratio_range=(0.8, 1.2), keep_ratio=True), + dict(type="RandomCrop", crop_size=image_size), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=image_size, keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=image_size, keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=4, num_workers=4, dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=4, num_workers=4, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader # optimizer -optim_wrapper = dict( - optimizer=dict(lr=0.04), - paramwise_cfg=dict(norm_decay_mult=0, bypass_duplicate=True)) +optim_wrapper = dict(optimizer=dict(lr=0.04), paramwise_cfg=dict(norm_decay_mult=0, bypass_duplicate=True)) # learning policy max_epochs = 12 param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[8, 11], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) diff --git a/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn0010-dcn_fpn_1x_coco.py b/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn0010-dcn_fpn_1x_coco.py index e1ae17a7ee4d3516e6aca90697fa165f592cf51e..c103e293e0b41817a5b5e6b3ce3e6836397a5a48 100644 --- 
a/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn0010-dcn_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn0010-dcn_fpn_1x_coco.py @@ -1,16 +1,14 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( plugins=[ dict( - cfg=dict( - type='GeneralizedAttention', - spatial_range=-1, - num_heads=8, - attention_type='0010', - kv_stride=2), + cfg=dict(type="GeneralizedAttention", spatial_range=-1, num_heads=8, attention_type="0010", kv_stride=2), stages=(False, False, True, True), - position='after_conv2') + position="after_conv2", + ) ], - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) + dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), + stage_with_dcn=(False, True, True, True), + ) +) diff --git a/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn0010_fpn_1x_coco.py b/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn0010_fpn_1x_coco.py index 7336d292eafe8c92407f831e712946a23e231db0..8eabf37240dc40a978dc5a5aaa0b24e091b8c5b6 100644 --- a/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn0010_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn0010_fpn_1x_coco.py @@ -1,13 +1,12 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( - backbone=dict(plugins=[ - dict( - cfg=dict( - type='GeneralizedAttention', - spatial_range=-1, - num_heads=8, - attention_type='0010', - kv_stride=2), - stages=(False, False, True, True), - position='after_conv2') - ])) + backbone=dict( + plugins=[ + dict( + cfg=dict(type="GeneralizedAttention", spatial_range=-1, num_heads=8, attention_type="0010", kv_stride=2), + stages=(False, False, True, True), + position="after_conv2", + ) + ] + ) +) diff --git a/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn1111-dcn_fpn_1x_coco.py b/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn1111-dcn_fpn_1x_coco.py index 980e23d4509a19fe438d5c8494e2905d940705b1..c4be9ccbea3f5a5e9cc4a209fe8c6465fa078e32 100644 --- a/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn1111-dcn_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn1111-dcn_fpn_1x_coco.py @@ -1,16 +1,14 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( plugins=[ dict( - cfg=dict( - type='GeneralizedAttention', - spatial_range=-1, - num_heads=8, - attention_type='1111', - kv_stride=2), + cfg=dict(type="GeneralizedAttention", spatial_range=-1, num_heads=8, attention_type="1111", kv_stride=2), stages=(False, False, True, True), - position='after_conv2') + position="after_conv2", + ) ], - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True))) + dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), + stage_with_dcn=(False, True, True, True), + ) +) diff --git a/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn1111_fpn_1x_coco.py b/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn1111_fpn_1x_coco.py index 426bc09fd64c16b43b33a5c797265aa9ec2c0c15..b421219828d12371a7954e9ddaec230f98947e4d 100644 --- a/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn1111_fpn_1x_coco.py +++ 
b/mmpose/configs/mmdet/empirical_attention/faster-rcnn_r50-attn1111_fpn_1x_coco.py @@ -1,13 +1,12 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( - backbone=dict(plugins=[ - dict( - cfg=dict( - type='GeneralizedAttention', - spatial_range=-1, - num_heads=8, - attention_type='1111', - kv_stride=2), - stages=(False, False, True, True), - position='after_conv2') - ])) + backbone=dict( + plugins=[ + dict( + cfg=dict(type="GeneralizedAttention", spatial_range=-1, num_heads=8, attention_type="1111", kv_stride=2), + stages=(False, False, True, True), + position="after_conv2", + ) + ] + ) +) diff --git a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101-caffe_fpn_1x_coco.py index 02c70296fca04d59b2b87801fa7834c0dc3d30f0..dc7a8bef165701f7217f8bb781a24a71d3d00c07 100644 --- a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101-caffe_fpn_1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './fast-rcnn_r50-caffe_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./fast-rcnn_r50-caffe_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101_fpn_1x_coco.py index 5af6b223c5bf66928a1d79ffba904d86006a3741..6682f9e5bc67a752d6193b55283c7eba6b6460cc 100644 --- a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './fast-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./fast-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101_fpn_2x_coco.py b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101_fpn_2x_coco.py index 73425cf1ac3be429c69f6cf6b482fee91a8e2782..2f83fbaf4dc635c3710ac54803b42b1925efc46c 100644 --- a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r101_fpn_2x_coco.py @@ -1,6 +1,2 @@ -_base_ = './fast-rcnn_r50_fpn_2x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./fast-rcnn_r50_fpn_2x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50-caffe_fpn_1x_coco.py index 3110f9fdf590ea665c9d7b7e28a56613cd79b786..e7679fa8ae64b0ca0bc7d2fe4c0427070baa01fd 100644 --- a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,16 +1,13 @@ -_base_ = './fast-rcnn_r50_fpn_1x_coco.py' +_base_ = "./fast-rcnn_r50_fpn_1x_coco.py" model = dict( data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", 
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - norm_cfg=dict(type='BN', requires_grad=False), - style='caffe', + norm_cfg=dict(type="BN", requires_grad=False), + style="caffe", norm_eval=True, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py index daefe2d2d287b865b925263a81c12a6e30c58c4d..b4492797504496d973f27b1fe2a534b54be827f8 100644 --- a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py @@ -1,39 +1,33 @@ _base_ = [ - '../_base_/models/fast-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/fast-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadProposals', num_max_proposals=2000), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadProposals", num_max_proposals=2000), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='ProposalBroadcaster', + type="ProposalBroadcaster", transforms=[ - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - ]), - dict(type='PackDetInputs') + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + ], + ), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadProposals', num_max_proposals=None), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadProposals", num_max_proposals=None), dict( - type='ProposalBroadcaster', + type="ProposalBroadcaster", transforms=[ - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - ]), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + ], + ), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - dataset=dict( - proposal_file='proposals/rpn_r50_fpn_1x_train2017.pkl', - pipeline=train_pipeline)) -val_dataloader = dict( - dataset=dict( - proposal_file='proposals/rpn_r50_fpn_1x_val2017.pkl', - pipeline=test_pipeline)) +train_dataloader = dict(dataset=dict(proposal_file="proposals/rpn_r50_fpn_1x_train2017.pkl", pipeline=train_pipeline)) +val_dataloader = dict(dataset=dict(proposal_file="proposals/rpn_r50_fpn_1x_val2017.pkl", pipeline=test_pipeline)) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50_fpn_2x_coco.py b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50_fpn_2x_coco.py index d609a7c02d657e15316a4c5747983a4d9a10fc7c..2a9e50bc6f598dc17a2cac3cfedcbac71a1a6962 100644 --- a/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/fast_rcnn/fast-rcnn_r50_fpn_2x_coco.py @@ -1,14 +1,7 @@ -_base_ = './fast-rcnn_r50_fpn_1x_coco.py' +_base_ = 
"./fast-rcnn_r50_fpn_1x_coco.py" train_cfg = dict(max_epochs=24) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=24, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=24, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101-caffe_fpn_1x_coco.py index a18f1ada31ed2a2d1023d16470a271ad49c3be2e..84ebcb1dbb524888af367d874f76d02301ba3a83 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101-caffe_fpn_1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './faster-rcnn_r50-caffe_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./faster-rcnn_r50-caffe_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101-caffe_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101-caffe_fpn_ms-3x_coco.py index 1cdb4d4973e364c4f37b80644388a4859f55772e..1ee8e1158b940832a2bdcbb3fc1fb41a8521dbfc 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101-caffe_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101-caffe_fpn_ms-3x_coco.py @@ -1,11 +1,11 @@ -_base_ = 'faster-rcnn_r50_fpn_ms-3x_coco.py' +_base_ = "faster-rcnn_r50_fpn_ms-3x_coco.py" model = dict( backbone=dict( depth=101, norm_cfg=dict(requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"), + ) +) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py index d113ae6295fdc3f3058ef498eb9b675154a05c12..7506d4fe973b42f0b05f84b4bcbb73c24b99fd3b 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './faster-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./faster-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_2x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_2x_coco.py index b471fb3cbd8a79165e0cd19afc3ba98bbcfeb74e..47b97c67a686dd0913c74eec1b9c76e86651fc9d 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_2x_coco.py @@ -1,6 +1,2 @@ -_base_ = './faster-rcnn_r50_fpn_2x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./faster-rcnn_r50_fpn_2x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git 
a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py index a71d4afd3246d083bdf0f5a84be2fbf2340f621f..895ebffde271751bbd028e35cf08640dcf1edbf4 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,3 @@ -_base_ = './faster-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./faster-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_ms-3x_coco.py index 8ef6d1f8ea6b45e9a4bfe438910da827d079479b..206ecd53d1e85329c30620487437d3fdc63047ba 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r101_fpn_ms-3x_coco.py @@ -1,7 +1,3 @@ -_base_ = 'faster-rcnn_r50_fpn_ms-3x_coco.py' +_base_ = "faster-rcnn_r50_fpn_ms-3x_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py index 65515c9ace8bf4445a77db2485fc8d3f95c263b9..f498760081c735e04ebfd329ec9603684a9feff0 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,6 @@ -_base_ = './faster-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./faster-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py" model = dict( - backbone=dict( - depth=18, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[64, 128, 256, 512])) + backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")), + neck=dict(in_channels=[64, 128, 256, 512]), +) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-c4_ms-1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-c4_ms-1x_coco.py index 7e231e865270acf0383e03a64f151efdbf88c29e..ec63499441515400c1364d866eac36290cb3b43d 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-c4_ms-1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-c4_ms-1x_coco.py @@ -1,14 +1,10 @@ -_base_ = './faster-rcnn_r50-caffe_c4-1x_coco.py' +_base_ = "./faster-rcnn_r50-caffe_c4-1x_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], 
keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] _base_.train_dataloader.dataset.pipeline = train_pipeline diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_1x_coco.py index 8952a5c9c6c2fe019711968fa2aa7ed2065b13f6..bfd26ef33e83916e0d55a7813aa2145b9579a89c 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_1x_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50-caffe-dc5.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50-caffe-dc5.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-1x_coco.py index 63a68859a85fe5556e927c04aae5cafbef1fc0b6..01130e9af0b3b64266e99f209e62a5e0c66b2f18 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-1x_coco.py @@ -1,14 +1,10 @@ -_base_ = 'faster-rcnn_r50-caffe-dc5_1x_coco.py' +_base_ = "faster-rcnn_r50-caffe-dc5_1x_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] _base_.train_dataloader.dataset.pipeline = train_pipeline diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-3x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-3x_coco.py index 27063468a70436a62a7cc54b8c8efc2de96ec33f..ec6e97e38d7981d0646ce26b016118d6905fa8cb 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-3x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe-dc5_ms-3x_coco.py @@ -1,4 +1,4 @@ -_base_ = './faster-rcnn_r50-caffe-dc5_ms-1x_coco.py' +_base_ = "./faster-rcnn_r50-caffe-dc5_ms-1x_coco.py" # MMEngine support the following two ways, users can choose # according to convenience diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_c4-1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_c4-1x_coco.py index 0888fc01790af82a4c7131280ca5f0247b28d9fd..bef90c91c19bc12bf6a63a55c477912ae73de4c0 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_c4-1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_c4-1x_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50-caffe-c4.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50-caffe-c4.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] 
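The faster_rcnn configs in this diff are thin overlays on a shared `_base_`: `faster-rcnn_r50-caffe_c4-1x_coco.py` above is nothing but a `_base_` list, the r101 variants override a single nested key, `_delete_=True` discards an inherited value instead of merging into it, and `{{_base_.backend_args}}` pulls a value straight from the base file. As a reading aid, the snippet below is a minimal, self-contained sketch of that merge behaviour; it is an illustration with made-up values, not MMEngine's actual implementation.

```python
# Minimal sketch of MMEngine-style config inheritance (illustration only).
# Child dicts merge recursively into the _base_ config; a child dict that
# carries _delete_=True replaces the inherited value outright.

def merge_cfg(base: dict, child: dict) -> dict:
    merged = dict(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # copy so the caller's dict is not mutated
            if value.pop("_delete_", False):
                merged[key] = value  # replace wholesale, do not merge
            elif isinstance(merged.get(key), dict):
                merged[key] = merge_cfg(merged[key], value)  # deep merge
            else:
                merged[key] = value
        else:
            merged[key] = value  # plain scalar override
    return merged

# Base config in the spirit of faster-rcnn_r50_fpn_1x_coco.py (values illustrative):
base = dict(model=dict(backbone=dict(type="ResNet", depth=50),
                       neck=dict(type="FPN", num_outs=5)))
# Override only the backbone depth, as the r101 variants above do:
child = dict(model=dict(backbone=dict(depth=101)))

cfg = merge_cfg(base, child)
assert cfg["model"]["backbone"] == {"type": "ResNet", "depth": 101}
assert cfg["model"]["neck"] == {"type": "FPN", "num_outs": 5}  # inherited untouched
```

The same rule explains the `_delete_=True` overrides elsewhere in this diff, e.g. the EfficientNet backbone swap above, which must replace the inherited ResNet dict rather than merge leftover ResNet keys into it.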
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_1x_coco.py index 9129a9583c52bf8ccab38a65f35c9f14bb128d07..07d3d02bdd8e39046705292d01b308c79a906373 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,15 +1,12 @@ -_base_ = './faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "./faster-rcnn_r50_fpn_1x_coco.py" model = dict( data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( norm_cfg=dict(requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_90k_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_90k_coco.py index 27f49355f3be8f6a53038894405c5f1b3d9b46fa..baa9ae5df4a06cb77316f6d501c71df5a793e627 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_90k_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_90k_coco.py @@ -1,22 +1,11 @@ -_base_ = 'faster-rcnn_r50-caffe_fpn_1x_coco.py' +_base_ = "faster-rcnn_r50-caffe_fpn_1x_coco.py" max_iter = 90000 param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[60000, 80000], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[60000, 80000], gamma=0.1), ] -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=10000) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=10000) default_hooks = dict(checkpoint=dict(by_epoch=False, interval=10000)) log_processor = dict(by_epoch=False) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco-person-bicycle-car.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco-person-bicycle-car.py index f36bb055f87aeadc43aa1233d1d3a7bdc33fbd80..078dfda4ec21e938e22ec4355d472cbedcf2d6c6 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco-person-bicycle-car.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco-person-bicycle-car.py @@ -1,16 +1,16 @@ -_base_ = './faster-rcnn_r50-caffe_fpn_ms-1x_coco.py' +_base_ = "./faster-rcnn_r50-caffe_fpn_ms-1x_coco.py" model = dict(roi_head=dict(bbox_head=dict(num_classes=3))) metainfo = { - 'classes': ('person', 'bicycle', 'car'), - 'palette': [ + "classes": ("person", "bicycle", "car"), + "palette": [ (220, 20, 60), (119, 11, 32), (0, 0, 142), - ] + ], } train_dataloader = dict(dataset=dict(metainfo=metainfo)) val_dataloader = dict(dataset=dict(metainfo=metainfo)) test_dataloader = dict(dataset=dict(metainfo=metainfo)) -load_from = 
'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_bbox_mAP-0.398_20200504_163323-30042637.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_bbox_mAP-0.398_20200504_163323-30042637.pth" # noqa diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco-person.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco-person.py index 9528b63f4deabb3610a26af59c856cee62c489c2..53a2ce8630f4626e294b063cc07057a184e4a4e5 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco-person.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco-person.py @@ -1,14 +1,14 @@ -_base_ = './faster-rcnn_r50-caffe_fpn_ms-1x_coco.py' +_base_ = "./faster-rcnn_r50-caffe_fpn_ms-1x_coco.py" model = dict(roi_head=dict(bbox_head=dict(num_classes=1))) metainfo = { - 'classes': ('person', ), - 'palette': [ + "classes": ("person",), + "palette": [ (220, 20, 60), - ] + ], } train_dataloader = dict(dataset=dict(metainfo=metainfo)) val_dataloader = dict(dataset=dict(metainfo=metainfo)) test_dataloader = dict(dataset=dict(metainfo=metainfo)) -load_from = 'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_bbox_mAP-0.398_20200504_163323-30042637.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco/faster_rcnn_r50_caffe_fpn_mstrain_3x_coco_bbox_mAP-0.398_20200504_163323-30042637.pth" # noqa diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco.py index 59f1633c807f3eb904657cfaf97113c355df3fca..0c41156f4cb889a3434aab0e867bda1943a96623 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-1x_coco.py @@ -1,29 +1,22 @@ -_base_ = './faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "./faster-rcnn_r50_fpn_1x_coco.py" model = dict( data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( norm_cfg=dict(requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] # MMEngine support 
the following two ways, users can choose # according to convenience diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-2x_coco.py index 44d320ea01ba53d591ab7db29742e7fffc7c81ce..1d4269a88eb633e40eddc8b6ecff3fe6d59f8538 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-2x_coco.py @@ -1,4 +1,4 @@ -_base_ = './faster-rcnn_r50-caffe_fpn_ms-1x_coco.py' +_base_ = "./faster-rcnn_r50-caffe_fpn_ms-1x_coco.py" # MMEngine support the following two ways, users can choose # according to convenience diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-3x_coco.py index 365f6439241c6374554af1fd58a114ef03448877..a45071d89bb555a27d73def4e89800bbd09c2c91 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-3x_coco.py @@ -1,15 +1,12 @@ -_base_ = 'faster-rcnn_r50_fpn_ms-3x_coco.py' +_base_ = "faster-rcnn_r50_fpn_ms-3x_coco.py" model = dict( data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( norm_cfg=dict(requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-90k_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-90k_coco.py index 6b9b3eb0e79b1ffb71d15c21274692d3b85e16ac..e2e2f361dfcc8b55ae0e493003488200a9d0b8fd 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-90k_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-caffe_fpn_ms-90k_coco.py @@ -1,23 +1,12 @@ -_base_ = 'faster-rcnn_r50-caffe_fpn_ms-1x_coco.py' +_base_ = "faster-rcnn_r50-caffe_fpn_ms-1x_coco.py" max_iter = 90000 param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[60000, 80000], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[60000, 80000], gamma=0.1), ] -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=10000) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=10000) default_hooks = dict(checkpoint=dict(by_epoch=False, interval=10000)) log_processor = dict(by_epoch=False) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-tnr-pre_fpn_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-tnr-pre_fpn_1x_coco.py index 7b3e5dedbe81b927492dd41b13f017bcc2bd4c92..4ad48af17414037ef172c322c326b0ae858471a6 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-tnr-pre_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50-tnr-pre_fpn_1x_coco.py @@ -1,14 +1,14 @@ _base_ = [ - 
'../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -checkpoint = 'https://download.pytorch.org/models/resnet50-11ad3fa6.pth' -model = dict( - backbone=dict(init_cfg=dict(type='Pretrained', checkpoint=checkpoint))) +checkpoint = "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" +model = dict(backbone=dict(init_cfg=dict(type="Pretrained", checkpoint=checkpoint))) # `lr` and `weight_decay` have been searched to be optimal. optim_wrapper = dict( - optimizer=dict(_delete_=True, type='AdamW', lr=0.0001, weight_decay=0.1), - paramwise_cfg=dict(norm_decay_mult=0., bypass_duplicate=True)) + optimizer=dict(_delete_=True, type="AdamW", lr=0.0001, weight_decay=0.1), paramwise_cfg=dict(norm_decay_mult=0.0, bypass_duplicate=True) +) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py index 8a45417fdd4566241114e20275990a5729486932..04c301579705313227b356d0466f9a20e0517ff8 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py index 2981c6fbe16eb7a8b6ca1202ebb6325e2324c040..1805d84b7d5c04b708058f89b5ed95c2a30e9efa 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_2x.py", + "../_base_/default_runtime.py", ] diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py index 3d366f3ba0e5ff098db3e409171a88860f1cf3af..e04c91bb1d329caee26966c3896aaa01b6fe32e7 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,18 +1,12 @@ -_base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../common/lsj-200e_coco-detection.py' -] +_base_ = ["../_base_/models/faster-rcnn_r50_fpn.py", "../common/lsj-200e_coco-detection.py"] image_size = (1024, 1024) -batch_augments = [dict(type='BatchFixedSizePad', size=image_size)] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size)] model = dict(data_preprocessor=dict(batch_augments=batch_augments)) train_dataloader = dict(batch_size=8, num_workers=4) # Enable automatic-mixed-precision training with AmpOptimWrapper. 
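For readers unfamiliar with the wrapper named in the comment above: switching `optim_wrapper` to `AmpOptimWrapper`, as the hunk below does, turns on automatic mixed precision for the training loop. The sketch that follows shows roughly what that automates in bare PyTorch; it is an illustration only, with a placeholder model and data, not MMEngine's implementation.

```python
# Bare-PyTorch sketch of the mixed-precision step that AmpOptimWrapper
# automates (illustration only; model, data, and step count are placeholders).
import torch
from torch import nn

model = nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.02 * 4, momentum=0.9,
                            weight_decay=0.00004)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

for _ in range(2):  # placeholder training steps
    x = torch.randn(8, 16, device="cuda")
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).square().mean()  # forward runs in fp16 where safe
    optimizer.zero_grad()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then steps
    scaler.update()                # adapts the scale factor
```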
-optim_wrapper = dict( - type='AmpOptimWrapper', - optimizer=dict( - type='SGD', lr=0.02 * 4, momentum=0.9, weight_decay=0.00004)) +optim_wrapper = dict(type="AmpOptimWrapper", optimizer=dict(type="SGD", lr=0.02 * 4, momentum=0.9, weight_decay=0.00004)) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_amp-1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_amp-1x_coco.py index f765deaef1db8a798c44d848c6f759755ccd4c45..8dd9411f6f6db1b839144a7fd9cc62d8f7a355f9 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_amp-1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_amp-1x_coco.py @@ -1,6 +1,6 @@ -_base_ = './faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "./faster-rcnn_r50_fpn_1x_coco.py" # MMEngine support the following two ways, users can choose # according to convenience # optim_wrapper = dict(type='AmpOptimWrapper') -_base_.optim_wrapper.type = 'AmpOptimWrapper' +_base_.optim_wrapper.type = "AmpOptimWrapper" diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_bounded-iou_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_bounded-iou_1x_coco.py index 7758ca80b372e7895be267cad8c4603778d160b3..e4719a1b0e85f9cd4f355c95d0a1f464e05a3e2b 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_bounded-iou_1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_bounded-iou_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './faster-rcnn_r50_fpn_1x_coco.py' -model = dict( - roi_head=dict( - bbox_head=dict( - reg_decoded_bbox=True, - loss_bbox=dict(type='BoundedIoULoss', loss_weight=10.0)))) +_base_ = "./faster-rcnn_r50_fpn_1x_coco.py" +model = dict(roi_head=dict(bbox_head=dict(reg_decoded_bbox=True, loss_bbox=dict(type="BoundedIoULoss", loss_weight=10.0)))) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ciou_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ciou_1x_coco.py index e8d8a3042750e8f5f9478b5e8c3111d8b7a10528..55ed78aaa9188b217cd0449f72b51932aea78c47 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ciou_1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ciou_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './faster-rcnn_r50_fpn_1x_coco.py' -model = dict( - roi_head=dict( - bbox_head=dict( - reg_decoded_bbox=True, - loss_bbox=dict(type='CIoULoss', loss_weight=12.0)))) +_base_ = "./faster-rcnn_r50_fpn_1x_coco.py" +model = dict(roi_head=dict(bbox_head=dict(reg_decoded_bbox=True, loss_bbox=dict(type="CIoULoss", loss_weight=12.0)))) diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_fcos-rpn_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_fcos-rpn_1x_coco.py index b5a34d9f74a60388fa60afd8255d470c45f209f7..4a88ccdeb0f111d825fe0081019bcf48ace7fa76 100644 --- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_fcos-rpn_1x_coco.py +++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_fcos-rpn_1x_coco.py @@ -1,18 +1,16 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( # copied from configs/fcos/fcos_r50-caffe_fpn_gn-head_1x_coco.py - neck=dict( - start_level=1, - add_extra_convs='on_output', # use P5 - 
-        relu_before_extra_convs=True),
+    neck=dict(start_level=1, add_extra_convs="on_output", relu_before_extra_convs=True),  # use P5
     rpn_head=dict(
         _delete_=True,  # ignore the unused old settings
-        type='FCOSHead',
+        type="FCOSHead",
         # num_classes = 1 for rpn,
         # if num_classes > 1, it will be set to 1 in
         # TwoStageDetector automatically
@@ -21,28 +19,15 @@ model = dict(
         stacked_convs=4,
         feat_channels=256,
         strides=[8, 16, 32, 64, 128],
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
-        loss_centerness=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
-    roi_head=dict(  # update featmap_strides
-        bbox_roi_extractor=dict(featmap_strides=[8, 16, 32, 64, 128])))
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox=dict(type="IoULoss", loss_weight=1.0),
+        loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0),
+    ),
+    roi_head=dict(bbox_roi_extractor=dict(featmap_strides=[8, 16, 32, 64, 128])),  # update featmap_strides
+)
 
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0,
-        end=1000),  # Slowly increase lr, otherwise loss becomes NAN
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=12,
-        by_epoch=True,
-        milestones=[8, 11],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000),  # Slowly increase lr, otherwise loss becomes NAN
+    dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_giou_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_giou_1x_coco.py
index 82b71d77bfc448eceadcd03a6c8cbc4c8f871109..de73f508c2fd0ff1b72f52230465b9d9bd89e4b5 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_giou_1x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_giou_1x_coco.py
@@ -1,6 +1,2 @@
-_base_ = './faster-rcnn_r50_fpn_1x_coco.py'
-model = dict(
-    roi_head=dict(
-        bbox_head=dict(
-            reg_decoded_bbox=True,
-            loss_bbox=dict(type='GIoULoss', loss_weight=10.0))))
+_base_ = "./faster-rcnn_r50_fpn_1x_coco.py"
+model = dict(roi_head=dict(bbox_head=dict(reg_decoded_bbox=True, loss_bbox=dict(type="GIoULoss", loss_weight=10.0))))
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_iou_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_iou_1x_coco.py
index e21c43640cb7004e8e4ef189ff8843ad39de3c6f..5facda625f459a3b778dade867655a8f66acdd24 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_iou_1x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_iou_1x_coco.py
@@ -1,6 +1,2 @@
-_base_ = './faster-rcnn_r50_fpn_1x_coco.py'
-model = dict(
-    roi_head=dict(
-        bbox_head=dict(
-            reg_decoded_bbox=True,
-            loss_bbox=dict(type='IoULoss', loss_weight=10.0))))
+_base_ = "./faster-rcnn_r50_fpn_1x_coco.py"
+model = dict(roi_head=dict(bbox_head=dict(reg_decoded_bbox=True, loss_bbox=dict(type="IoULoss", loss_weight=10.0))))
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ms-3x_coco.py
index 75dcfeb7a2310938c05cc103fadec6c6e119b90b..d16cfec32a4da42e8d2ed6c3e7e5f661b98e3d09 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ms-3x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ms-3x_coco.py
@@ -1 +1 @@
-_base_ = ['../common/ms_3x_coco.py', '../_base_/models/faster-rcnn_r50_fpn.py']
+_base_ = ["../common/ms_3x_coco.py", "../_base_/models/faster-rcnn_r50_fpn.py"]
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ohem_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ohem_1x_coco.py
index 4f804b9be283015d4ec349f0df664e9ca7326c96..4fd5d1eccb6f067135546fbff42a35b4134436b9 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ohem_1x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_ohem_1x_coco.py
@@ -1,2 +1,2 @@
-_base_ = './faster-rcnn_r50_fpn_1x_coco.py'
-model = dict(train_cfg=dict(rcnn=dict(sampler=dict(type='OHEMSampler'))))
+_base_ = "./faster-rcnn_r50_fpn_1x_coco.py"
+model = dict(train_cfg=dict(rcnn=dict(sampler=dict(type="OHEMSampler"))))
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_soft-nms_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_soft-nms_1x_coco.py
index 3775d8e447cb80c0fc28199be2abc4c23383eadd..14250d129aeba16926a0b5352955f3b9763a6ada 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_soft-nms_1x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_r50_fpn_soft-nms_1x_coco.py
@@ -1,12 +1,8 @@
 _base_ = [
-    '../_base_/models/faster-rcnn_r50_fpn.py',
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+    "../_base_/models/faster-rcnn_r50_fpn.py",
+    "../_base_/datasets/coco_detection.py",
+    "../_base_/schedules/schedule_1x.py",
+    "../_base_/default_runtime.py",
 ]
 
-model = dict(
-    test_cfg=dict(
-        rcnn=dict(
-            score_thr=0.05,
-            nms=dict(type='soft_nms', iou_threshold=0.5),
-            max_per_img=100)))
+model = dict(test_cfg=dict(rcnn=dict(score_thr=0.05, nms=dict(type="soft_nms", iou_threshold=0.5), max_per_img=100)))
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_1x_coco.py
index 395c98cd65cd5f883c9fe206a7b9c99e59acb32e..474b803a5bc799cbd0fb08902ef3a63b7032ec41 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_1x_coco.py
@@ -1,14 +1,15 @@
-_base_ = './faster-rcnn_r50_fpn_1x_coco.py'
+_base_ = "./faster-rcnn_r50_fpn_1x_coco.py"
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=32,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d')))
+        norm_cfg=dict(type="BN", requires_grad=True),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"),
+    )
+)
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_2x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_2x_coco.py
index 6232d0edba51f433a930c46d03c49fc27954303f..85e23aaebd824b8e224016bcb03292e518b0c783 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_2x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_2x_coco.py
@@ -1,14 +1,15 @@
-_base_ = './faster-rcnn_r50_fpn_2x_coco.py'
+_base_ = "./faster-rcnn_r50_fpn_2x_coco.py"
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=32,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d')))
+        norm_cfg=dict(type="BN", requires_grad=True),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"),
+    )
+)
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_ms-3x_coco.py
index 88cb40fd62a87a8af13e166df16a348c26e6d29e..f0608562012595304c0eeb07154ba3da52bf4dd5 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_ms-3x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x4d_fpn_ms-3x_coco.py
@@ -1,14 +1,15 @@
-_base_ = ['../common/ms_3x_coco.py', '../_base_/models/faster-rcnn_r50_fpn.py']
+_base_ = ["../common/ms_3x_coco.py", "../_base_/models/faster-rcnn_r50_fpn.py"]
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=32,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d')))
+        norm_cfg=dict(type="BN", requires_grad=True),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"),
+    )
+)
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x8d_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x8d_fpn_ms-3x_coco.py
index 28d6290be7a75b7cceef8957e872e221fd3e78f5..25f2ee5c4a5848a7e32d3fc5b920beafc3a99abe 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x8d_fpn_ms-3x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-32x8d_fpn_ms-3x_coco.py
@@ -1,23 +1,20 @@
-_base_ = ['../common/ms_3x_coco.py', '../_base_/models/faster-rcnn_r50_fpn.py']
+_base_ = ["../common/ms_3x_coco.py", "../_base_/models/faster-rcnn_r50_fpn.py"]
 model = dict(
     # ResNeXt-101-32x8d model trained with Caffe2 at FB,
     # so the mean and std need to be changed.
     data_preprocessor=dict(
-        type='DetDataPreprocessor',
-        mean=[103.530, 116.280, 123.675],
-        std=[57.375, 57.120, 58.395],
-        bgr_to_rgb=False,
-        pad_size_divisor=32),
+        type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[57.375, 57.120, 58.395], bgr_to_rgb=False, pad_size_divisor=32
+    ),
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=32,
         base_width=8,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=False),
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://detectron2/resnext101_32x8d')))
+        norm_cfg=dict(type="BN", requires_grad=False),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnext101_32x8d"),
+    ),
+)
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_1x_coco.py
index f39d6322fc3a4729ea7bbfefc207a6975efb4bf4..17b3e11036a5074cd6e6c2a14cb6fffaa3afdf67 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_1x_coco.py
@@ -1,14 +1,15 @@
-_base_ = './faster-rcnn_r50_fpn_1x_coco.py'
+_base_ = "./faster-rcnn_r50_fpn_1x_coco.py"
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=64,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))
+        norm_cfg=dict(type="BN", requires_grad=True),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"),
+    )
+)
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_2x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_2x_coco.py
index 97a3c1338fe294f66109fa92de0d8a48686b8a09..6dd7d84dc28b80d11e82357d21518fb3d60983b4 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_2x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_2x_coco.py
@@ -1,14 +1,15 @@
-_base_ = './faster-rcnn_r50_fpn_2x_coco.py'
+_base_ = "./faster-rcnn_r50_fpn_2x_coco.py"
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=64,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))
+        norm_cfg=dict(type="BN", requires_grad=True),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"),
+    )
+)
diff --git a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_ms-3x_coco.py
index eeaa218c9dc76123791d9e19b0ebae687cc296c9..bd4de1582ee1a0dd0b8b4b689c98ccc140ecb6c0 100644
--- a/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_ms-3x_coco.py
+++ b/mmpose/configs/mmdet/faster_rcnn/faster-rcnn_x101-64x4d_fpn_ms-3x_coco.py
@@ -1,14 +1,15 @@
-_base_ = ['../common/ms_3x_coco.py', '../_base_/models/faster-rcnn_r50_fpn.py']
+_base_ = ["../common/ms_3x_coco.py", "../_base_/models/faster-rcnn_r50_fpn.py"]
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=64,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))
+        norm_cfg=dict(type="BN", requires_grad=True),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"),
+    )
+)
diff --git a/mmpose/configs/mmdet/fcos/fcos_r101-caffe_fpn_gn-head-1x_coco.py b/mmpose/configs/mmdet/fcos/fcos_r101-caffe_fpn_gn-head-1x_coco.py
index 5380e87483e494b4c0bc6d8846c6892811d581d3..68a5d84f7f37ecf2c0f3d2b09daa60f50eaafb78 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r101-caffe_fpn_gn-head-1x_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r101-caffe_fpn_gn-head-1x_coco.py
@@ -1,9 +1,4 @@
-_base_ = './fcos_r50-caffe_fpn_gn-head_1x_coco.py'
+_base_ = "./fcos_r50-caffe_fpn_gn-head_1x_coco.py"
 
 # model settings
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://detectron/resnet101_caffe')))
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron/resnet101_caffe")))
diff --git a/mmpose/configs/mmdet/fcos/fcos_r101-caffe_fpn_gn-head_ms-640-800-2x_coco.py b/mmpose/configs/mmdet/fcos/fcos_r101-caffe_fpn_gn-head_ms-640-800-2x_coco.py
index 286a07a2db2c6fc423f6cf039b2609ac81ede73d..47916a935a12c938ca85dd45a2f8b05785c8d29d 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r101-caffe_fpn_gn-head_ms-640-800-2x_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r101-caffe_fpn_gn-head_ms-640-800-2x_coco.py
@@ -1,23 +1,15 @@
-_base_ = './fcos_r50-caffe_fpn_gn-head_1x_coco.py'
+_base_ = "./fcos_r50-caffe_fpn_gn-head_1x_coco.py"
 
 # model settings
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://detectron/resnet101_caffe')))
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron/resnet101_caffe")))
 
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='RandomChoiceResize',
-        scales=[(1333, 640), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
 
@@ -27,12 +19,6 @@ train_cfg = dict(max_epochs=max_epochs)
 
 # learning rate
 param_scheduler = [
-    dict(type='ConstantLR', factor=1.0 / 3, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="ConstantLR", factor=1.0 / 3, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/fcos/fcos_r101_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/fcos/fcos_r101_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py
index 77250e6917812d3494c8dabd52a3ed12f5f34483..ac94151e9b3c2922896525cc80be5fb90d10d97d 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r101_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r101_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py
@@ -1,7 +1,3 @@
-_base_ = './fcos_r50_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py'  # noqa
+_base_ = "./fcos_r50_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py"  # noqa
 
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')))
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")))
diff --git a/mmpose/configs/mmdet/fcos/fcos_r18_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/fcos/fcos_r18_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py
index 6f001024bb702c5ed0cb1103c5e10ae3cd7f599b..24601dcf28275f5fbadf7a326a413a6a91fe50d9 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r18_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r18_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py
@@ -1,7 +1,6 @@
-_base_ = './fcos_r50_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py'  # noqa
+_base_ = "./fcos_r50_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py"  # noqa
 
 model = dict(
-    backbone=dict(
-        depth=18,
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')),
-    neck=dict(in_channels=[64, 128, 256, 512]))
+    backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")),
+    neck=dict(in_channels=[64, 128, 256, 512]),
+)
diff --git a/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head-center-normbbox-centeronreg-giou_1x_coco.py b/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head-center-normbbox-centeronreg-giou_1x_coco.py
index 2a77641dd87142d5c6d508f2f4a4ba5b70db52c1..c1f1d13f4543c0cc2e03fe28e4be579074a857fd 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head-center-normbbox-centeronreg-giou_1x_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head-center-normbbox-centeronreg-giou_1x_coco.py
@@ -1,42 +1,27 @@
-_base_ = 'fcos_r50-caffe_fpn_gn-head_1x_coco.py'
+_base_ = "fcos_r50-caffe_fpn_gn-head_1x_coco.py"
 
 # model setting
 model = dict(
     data_preprocessor=dict(
-        type='DetDataPreprocessor',
-        mean=[103.530, 116.280, 123.675],
-        std=[1.0, 1.0, 1.0],
-        bgr_to_rgb=False,
-        pad_size_divisor=32),
-    backbone=dict(
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://detectron2/resnet50_caffe')),
+        type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32
+    ),
+    backbone=dict(init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe")),
     bbox_head=dict(
         norm_on_bbox=True,
         centerness_on_reg=True,
         dcn_on_last_conv=False,
         center_sampling=True,
         conv_bias=True,
-        loss_bbox=dict(type='GIoULoss', loss_weight=1.0)),
+        loss_bbox=dict(type="GIoULoss", loss_weight=1.0),
+    ),
     # training and testing settings
-    test_cfg=dict(nms=dict(type='nms', iou_threshold=0.6)))
+    test_cfg=dict(nms=dict(type="nms", iou_threshold=0.6)),
+)
 
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0 / 3.0,
-        by_epoch=False,
-        begin=0,
-        end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=12,
-        by_epoch=True,
-        milestones=[8, 11],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=1.0 / 3.0, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1),
 ]
 
 # optimizer
diff --git a/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head-center_1x_coco.py b/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head-center_1x_coco.py
index 9e4eb1d5981761fab8fe0bb876ff7ef243ac31f9..0e992eda0b923d0d57098e5030e878997ba97982 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head-center_1x_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head-center_1x_coco.py
@@ -1,4 +1,4 @@
-_base_ = './fcos_r50-caffe_fpn_gn-head_1x_coco.py'
+_base_ = "./fcos_r50-caffe_fpn_gn-head_1x_coco.py"
 
 # model settings
 model = dict(bbox_head=dict(center_sampling=True, center_sample_radius=1.5))
diff --git a/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_1x_coco.py b/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_1x_coco.py
index 928a9b4c92d217822179c0ae00ae50f6f74289b1..beadde832bd84e7658b82de9627e463fb6cf1fa4 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_1x_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_1x_coco.py
@@ -1,75 +1,53 @@
-_base_ = [
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
-]
+_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"]
 
 # model settings
 model = dict(
-    type='FCOS',
+    type="FCOS",
     data_preprocessor=dict(
-        type='DetDataPreprocessor',
-        mean=[102.9801, 115.9465, 122.7717],
-        std=[1.0, 1.0, 1.0],
-        bgr_to_rgb=False,
-        pad_size_divisor=32),
+        type="DetDataPreprocessor", mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32
+    ),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=50,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=False),
+        norm_cfg=dict(type="BN", requires_grad=False),
         norm_eval=True,
-        style='caffe',
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://detectron/resnet50_caffe')),
+        style="caffe",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron/resnet50_caffe"),
+    ),
     neck=dict(
-        type='FPN',
+        type="FPN",
         in_channels=[256, 512, 1024, 2048],
         out_channels=256,
         start_level=1,
-        add_extra_convs='on_output',  # use P5
+        add_extra_convs="on_output",  # use P5
         num_outs=5,
-        relu_before_extra_convs=True),
+        relu_before_extra_convs=True,
+    ),
     bbox_head=dict(
-        type='FCOSHead',
+        type="FCOSHead",
         num_classes=80,
         in_channels=256,
         stacked_convs=4,
         feat_channels=256,
         strides=[8, 16, 32, 64, 128],
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
-        loss_centerness=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox=dict(type="IoULoss", loss_weight=1.0),
+        loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0),
+    ),
    # testing settings
-    test_cfg=dict(
-        nms_pre=1000,
-        min_bbox_size=0,
-        score_thr=0.05,
-        nms=dict(type='nms', iou_threshold=0.5),
-        max_per_img=100))
+    test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100),
+)
 
 # learning rate
 param_scheduler = [
-    dict(type='ConstantLR', factor=1.0 / 3, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=12,
-        by_epoch=True,
-        milestones=[8, 11],
-        gamma=0.1)
+    dict(type="ConstantLR", factor=1.0 / 3, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1),
 ]
 
 # optimizer
 optim_wrapper = dict(
-    optimizer=dict(lr=0.01),
-    paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.),
-    clip_grad=dict(max_norm=35, norm_type=2))
+    optimizer=dict(lr=0.01), paramwise_cfg=dict(bias_lr_mult=2.0, bias_decay_mult=0.0), clip_grad=dict(max_norm=35, norm_type=2)
+)
diff --git a/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_4xb4-1x_coco.py b/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_4xb4-1x_coco.py
index 32358cd3c69800874aa77ba5746ffc0d6f3a219d..8ef08c886ab957ef7291f5f2c572e0d3e5ac545f 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_4xb4-1x_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_4xb4-1x_coco.py
@@ -1,5 +1,5 @@
 # TODO: Remove this config after benchmarking all related configs
-_base_ = 'fcos_r50-caffe_fpn_gn-head_1x_coco.py'
+_base_ = "fcos_r50-caffe_fpn_gn-head_1x_coco.py"
 
 # dataset settings
 train_dataloader = dict(batch_size=4, num_workers=4)
diff --git a/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_ms-640-800-2x_coco.py b/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_ms-640-800-2x_coco.py
index 4d50b4ec6c4a10b07cbf73475e7af545b058605c..3b3aea896c96507b0ef2f43468f0c9c57d13eef5 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_ms-640-800-2x_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r50-caffe_fpn_gn-head_ms-640-800-2x_coco.py
@@ -1,15 +1,12 @@
-_base_ = './fcos_r50-caffe_fpn_gn-head_1x_coco.py'
+_base_ = "./fcos_r50-caffe_fpn_gn-head_1x_coco.py"
 
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='RandomChoiceResize',
-        scales=[(1333, 640), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
 
@@ -19,12 +16,6 @@ train_cfg = dict(max_epochs=max_epochs)
 
 # learning rate
 param_scheduler = [
-    dict(type='ConstantLR', factor=1.0 / 3, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="ConstantLR", factor=1.0 / 3, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/fcos/fcos_r50-dcn-caffe_fpn_gn-head-center-normbbox-centeronreg-giou_1x_coco.py b/mmpose/configs/mmdet/fcos/fcos_r50-dcn-caffe_fpn_gn-head-center-normbbox-centeronreg-giou_1x_coco.py
index a6a6c44f9b4213601b447bc02720e24dc86a53d9..a48c927a1afd24a563368bb6255cff94f4ab277e 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r50-dcn-caffe_fpn_gn-head-center-normbbox-centeronreg-giou_1x_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r50-dcn-caffe_fpn_gn-head-center-normbbox-centeronreg-giou_1x_coco.py
@@ -1,44 +1,31 @@
-_base_ = 'fcos_r50-caffe_fpn_gn-head_1x_coco.py'
+_base_ = "fcos_r50-caffe_fpn_gn-head_1x_coco.py"
 
 # model settings
 model = dict(
     data_preprocessor=dict(
-        type='DetDataPreprocessor',
-        mean=[103.530, 116.280, 123.675],
-        std=[1.0, 1.0, 1.0],
-        bgr_to_rgb=False,
-        pad_size_divisor=32),
+        type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32
+    ),
     backbone=dict(
-        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
+        dcn=dict(type="DCNv2", deform_groups=1, fallback_on_stride=False),
         stage_with_dcn=(False, True, True, True),
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://detectron2/resnet50_caffe')),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"),
+    ),
     bbox_head=dict(
         norm_on_bbox=True,
         centerness_on_reg=True,
         dcn_on_last_conv=True,
         center_sampling=True,
         conv_bias=True,
-        loss_bbox=dict(type='GIoULoss', loss_weight=1.0)),
+        loss_bbox=dict(type="GIoULoss", loss_weight=1.0),
+    ),
     # training and testing settings
-    test_cfg=dict(nms=dict(type='nms', iou_threshold=0.6)))
+    test_cfg=dict(nms=dict(type="nms", iou_threshold=0.6)),
+)
 
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0 / 3.0,
-        by_epoch=False,
-        begin=0,
-        end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=12,
-        by_epoch=True,
-        milestones=[8, 11],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=1.0 / 3.0, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1),
 ]
 
 # optimizer
diff --git a/mmpose/configs/mmdet/fcos/fcos_r50_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/fcos/fcos_r50_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py
index b51556b8eb7f844866d7acff5c7b86c08cb2a054..bedcc29bcfcc3eebc70dec11b66bf3a21d67ebe2 100644
--- a/mmpose/configs/mmdet/fcos/fcos_r50_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_r50_fpn_gn-head-center-normbbox-centeronreg-giou_8xb8-amp-lsj-200e_coco.py
@@ -1,38 +1,41 @@
-_base_ = '../common/lsj-200e_coco-detection.py'
+_base_ = "../common/lsj-200e_coco-detection.py"
 
 image_size = (1024, 1024)
-batch_augments = [dict(type='BatchFixedSizePad', size=image_size)]
+batch_augments = [dict(type="BatchFixedSizePad", size=image_size)]
 
 # model settings
 model = dict(
-    type='FCOS',
+    type="FCOS",
     data_preprocessor=dict(
-        type='DetDataPreprocessor',
+        type="DetDataPreprocessor",
         mean=[123.675, 116.28, 103.53],
         std=[58.395, 57.12, 57.375],
         bgr_to_rgb=True,
         pad_size_divisor=32,
-        batch_augments=batch_augments),
+        batch_augments=batch_augments,
+    ),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=50,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch',
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"),
+    ),
     neck=dict(
-        type='FPN',
+        type="FPN",
         in_channels=[256, 512, 1024, 2048],
         out_channels=256,
         start_level=1,
-        add_extra_convs='on_output',  # use P5
+        add_extra_convs="on_output",  # use P5
         num_outs=5,
-        relu_before_extra_convs=True),
+        relu_before_extra_convs=True,
+    ),
     bbox_head=dict(
-        type='FCOSHead',
+        type="FCOSHead",
         num_classes=80,
         in_channels=256,
         stacked_convs=4,
@@ -43,31 +46,22 @@ model = dict(
         dcn_on_last_conv=False,
         center_sampling=True,
         conv_bias=True,
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox=dict(type='GIoULoss', loss_weight=1.0),
-        loss_centerness=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox=dict(type="GIoULoss", loss_weight=1.0),
+        loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0),
+    ),
     # testing settings
-    test_cfg=dict(
-        nms_pre=1000,
-        min_bbox_size=0,
-        score_thr=0.05,
-        nms=dict(type='nms', iou_threshold=0.6),
-        max_per_img=100))
+    test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100),
+)
 
 train_dataloader = dict(batch_size=8, num_workers=4)
 
 # Enable automatic-mixed-precision training with AmpOptimWrapper.
 optim_wrapper = dict(
-    type='AmpOptimWrapper',
-    optimizer=dict(
-        type='SGD', lr=0.01 * 4, momentum=0.9, weight_decay=0.00004),
-    paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.),
-    clip_grad=dict(max_norm=35, norm_type=2))
+    type="AmpOptimWrapper",
+    optimizer=dict(type="SGD", lr=0.01 * 4, momentum=0.9, weight_decay=0.00004),
+    paramwise_cfg=dict(bias_lr_mult=2.0, bias_decay_mult=0.0),
+    clip_grad=dict(max_norm=35, norm_type=2),
+)
 
 # NOTE: `auto_scale_lr` is for automatically scaling LR,
 # USER SHOULD NOT CHANGE ITS VALUES.
diff --git a/mmpose/configs/mmdet/fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py b/mmpose/configs/mmdet/fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py
index 503c0e1ce79bdbc9f2a32cc65f977b0f1e968927..6ac3fb027eeb0f7390f4f344a600bbf668cb434e 100644
--- a/mmpose/configs/mmdet/fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py
+++ b/mmpose/configs/mmdet/fcos/fcos_x101-64x4d_fpn_gn-head_ms-640-800-2x_coco.py
@@ -1,37 +1,32 @@
-_base_ = './fcos_r50-caffe_fpn_gn-head_1x_coco.py'
+_base_ = "./fcos_r50-caffe_fpn_gn-head_1x_coco.py"
 
 # model settings
 model = dict(
     data_preprocessor=dict(
-        type='DetDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True,
-        pad_size_divisor=32),
+        type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32
+    ),
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=64,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"),
+    ),
+)
 
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='RandomChoiceResize',
-        scales=[(1333, 640), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
 
@@ -41,12 +36,6 @@ train_cfg = dict(max_epochs=max_epochs)
 
 # learning rate
 param_scheduler = [
-    dict(type='ConstantLR', factor=1.0 / 3, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="ConstantLR", factor=1.0 / 3, by_epoch=False, begin=0, end=500),
dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_4xb4-1x_coco.py b/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_4xb4-1x_coco.py index 7e8ccf910e6317bf576463fa26bfcb330b6ff385..3d55c6076c913139aa37f5f0b0f2bef19f83cff9 100644 --- a/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_4xb4-1x_coco.py +++ b/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_4xb4-1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './fovea_r50_fpn_4xb4-1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./fovea_r50_fpn_4xb4-1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_4xb4-2x_coco.py b/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_4xb4-2x_coco.py index 0dc98515e62b2dba225e822850229f0a2f802d63..9a355072c18e5d82766d46f01d6466757fcacc09 100644 --- a/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_4xb4-2x_coco.py +++ b/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_4xb4-2x_coco.py @@ -1,6 +1,2 @@ -_base_ = './fovea_r50_fpn_4xb4-2x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./fovea_r50_fpn_4xb4-2x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_gn-head-align_4xb4-2x_coco.py b/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_gn-head-align_4xb4-2x_coco.py index 222671d49d1e3fbc31285e4f13487d86642ebbe3..3fe4329a41fc3751dffa959d36aa497d71acfb2c 100644 --- a/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_gn-head-align_4xb4-2x_coco.py +++ b/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_gn-head-align_4xb4-2x_coco.py @@ -1,23 +1,12 @@ -_base_ = './fovea_r50_fpn_4xb4-1x_coco.py' +_base_ = "./fovea_r50_fpn_4xb4-1x_coco.py" model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101')), - bbox_head=dict( - with_deform=True, - norm_cfg=dict(type='GN', num_groups=32, requires_grad=True))) + backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")), + bbox_head=dict(with_deform=True, norm_cfg=dict(type="GN", num_groups=32, requires_grad=True)), +) # learning policy max_epochs = 24 param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) diff --git a/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py b/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py index e1852d581fcbdd9a1459291fc7f65e51041aa4e6..dee320f624766b3773f271b625078acaa37eafcc 100644 --- a/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py +++ b/mmpose/configs/mmdet/foveabox/fovea_r101_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py @@ -1,34 +1,20 @@ -_base_ = './fovea_r50_fpn_4xb4-1x_coco.py' +_base_ = "./fovea_r50_fpn_4xb4-1x_coco.py" model = dict( - backbone=dict( - depth=101, - 
init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101')), - bbox_head=dict( - with_deform=True, - norm_cfg=dict(type='GN', num_groups=32, requires_grad=True))) + backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")), + bbox_head=dict(with_deform=True, norm_cfg=dict(type="GN", num_groups=32, requires_grad=True)), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) # learning policy max_epochs = 24 param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) diff --git a/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_4xb4-1x_coco.py b/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_4xb4-1x_coco.py index 13cf3ae92b0d2bfd1d84f032f7b202430f095a6a..aeeb479b53037bf3ebe9728fcc0e205a5df43411 100644 --- a/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_4xb4-1x_coco.py +++ b/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_4xb4-1x_coco.py @@ -1,35 +1,24 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings model = dict( - type='FOVEA', + type="FOVEA", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - num_outs=5, - add_extra_convs='on_input'), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, num_outs=5, add_extra_convs="on_input"), bbox_head=dict( - type='FoveaHead', + type="FoveaHead", num_classes=80, in_channels=256, stacked_convs=4, @@ -39,21 +28,13 @@ model = dict( scale_ranges=((1, 64), (32, 128), (64, 256), (128, 512), (256, 2048)), sigma=0.4, with_deform=False, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=1.50, - alpha=0.4, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', 
beta=0.11, loss_weight=1.0)), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=1.50, alpha=0.4, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=0.11, loss_weight=1.0), + ), # training and testing settings train_cfg=dict(), - test_cfg=dict( - nms_pre=1000, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100)) + test_cfg=dict(nms_pre=1000, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), +) train_dataloader = dict(batch_size=4, num_workers=4) # optimizer -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_4xb4-2x_coco.py b/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_4xb4-2x_coco.py index f9d06ef9f9ba89f202ef13176af39df7e89cb5e6..41f473d3cf9b5adb69c79604ecf48bffa2a055b7 100644 --- a/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_4xb4-2x_coco.py +++ b/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_4xb4-2x_coco.py @@ -1,15 +1,8 @@ -_base_ = './fovea_r50_fpn_4xb4-1x_coco.py' +_base_ = "./fovea_r50_fpn_4xb4-1x_coco.py" # learning policy max_epochs = 24 param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) diff --git a/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_gn-head-align_4xb4-2x_coco.py b/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_gn-head-align_4xb4-2x_coco.py index 877bb4fa4e1c03190a05da4e95558d8534e5e6e8..2998880434d0b8385ca4874c04856f19381726d5 100644 --- a/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_gn-head-align_4xb4-2x_coco.py +++ b/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_gn-head-align_4xb4-2x_coco.py @@ -1,20 +1,10 @@ -_base_ = './fovea_r50_fpn_4xb4-1x_coco.py' -model = dict( - bbox_head=dict( - with_deform=True, - norm_cfg=dict(type='GN', num_groups=32, requires_grad=True))) +_base_ = "./fovea_r50_fpn_4xb4-1x_coco.py" +model = dict(bbox_head=dict(with_deform=True, norm_cfg=dict(type="GN", num_groups=32, requires_grad=True))) # learning policy max_epochs = 24 param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py b/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py index 5690bcae08cd0e639afe3c832a46f78036324c08..da9b6e9a2228ed3ece3e1ac6520ec3753678430c 100644 --- a/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py +++ b/mmpose/configs/mmdet/foveabox/fovea_r50_fpn_gn-head-align_ms-640-800-4xb4-2x_coco.py @@ -1,30 +1,17 @@ -_base_ = './fovea_r50_fpn_4xb4-1x_coco.py' -model = dict( - bbox_head=dict( - with_deform=True, - norm_cfg=dict(type='GN', 
num_groups=32, requires_grad=True))) +_base_ = "./fovea_r50_fpn_4xb4-1x_coco.py" +model = dict(bbox_head=dict(with_deform=True, norm_cfg=dict(type="GN", num_groups=32, requires_grad=True))) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) # learning policy max_epochs = 24 param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) diff --git a/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpg-chn128_crop640-50e_coco.py b/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpg-chn128_crop640-50e_coco.py index cb9160f5cc7e118069d7172573018515aa406331..6a3cfbb07a0ff8e9d26f8489d182a5ef6325a861 100644 --- a/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpg-chn128_crop640-50e_coco.py +++ b/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpg-chn128_crop640-50e_coco.py @@ -1,9 +1,8 @@ -_base_ = 'faster-rcnn_r50_fpg_crop640-50e_coco.py' +_base_ = "faster-rcnn_r50_fpg_crop640-50e_coco.py" -norm_cfg = dict(type='BN', requires_grad=True) +norm_cfg = dict(type="BN", requires_grad=True) model = dict( neck=dict(out_channels=128, inter_channels=128), rpn_head=dict(in_channels=128), - roi_head=dict( - bbox_roi_extractor=dict(out_channels=128), - bbox_head=dict(in_channels=128))) + roi_head=dict(bbox_roi_extractor=dict(out_channels=128), bbox_head=dict(in_channels=128)), +) diff --git a/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpg_crop640-50e_coco.py b/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpg_crop640-50e_coco.py index d0d366f1f30e5bcc6d52010c46d60183b56386ea..50db0f5170cab7e657273c020e460d130df06dbf 100644 --- a/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpg_crop640-50e_coco.py +++ b/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpg_crop640-50e_coco.py @@ -1,48 +1,27 @@ -_base_ = 'faster-rcnn_r50_fpn_crop640-50e_coco.py' +_base_ = "faster-rcnn_r50_fpn_crop640-50e_coco.py" -norm_cfg = dict(type='BN', requires_grad=True) +norm_cfg = dict(type="BN", requires_grad=True) model = dict( neck=dict( - type='FPG', + type="FPG", in_channels=[256, 512, 1024, 2048], out_channels=256, inter_channels=256, num_outs=5, stack_times=9, - paths=['bu'] * 9, + paths=["bu"] * 9, same_down_trans=None, same_up_trans=dict( - type='conv', - kernel_size=3, - stride=2, - padding=1, - norm_cfg=norm_cfg, - inplace=False, - order=('act', 'conv', 'norm')), - across_lateral_trans=dict( - type='conv', - kernel_size=1, - norm_cfg=norm_cfg, - inplace=False, - order=('act', 'conv', 'norm')), + type="conv", kernel_size=3, stride=2, padding=1, norm_cfg=norm_cfg, inplace=False, order=("act", "conv", "norm") + ), + across_lateral_trans=dict(type="conv", kernel_size=1, norm_cfg=norm_cfg, inplace=False, order=("act", "conv", "norm")), 
across_down_trans=dict( - type='interpolation_conv', - mode='nearest', - kernel_size=3, - norm_cfg=norm_cfg, - order=('act', 'conv', 'norm'), - inplace=False), + type="interpolation_conv", mode="nearest", kernel_size=3, norm_cfg=norm_cfg, order=("act", "conv", "norm"), inplace=False + ), across_up_trans=None, - across_skip_trans=dict( - type='conv', - kernel_size=1, - norm_cfg=norm_cfg, - inplace=False, - order=('act', 'conv', 'norm')), - output_trans=dict( - type='last_conv', - kernel_size=3, - order=('act', 'conv', 'norm'), - inplace=False), + across_skip_trans=dict(type="conv", kernel_size=1, norm_cfg=norm_cfg, inplace=False, order=("act", "conv", "norm")), + output_trans=dict(type="last_conv", kernel_size=3, order=("act", "conv", "norm"), inplace=False), norm_cfg=norm_cfg, - skip_inds=[(0, 1, 2, 3), (0, 1, 2), (0, 1), (0, ), ()])) + skip_inds=[(0, 1, 2, 3), (0, 1, 2), (0, 1), (0,), ()], + ) +) diff --git a/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpn_crop640-50e_coco.py b/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpn_crop640-50e_coco.py index 46211de03f34e6a9709a9cfa8561b88a90f69581..42ab30d4763f0307fd95d11726a55e325811c5f6 100644 --- a/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpn_crop640-50e_coco.py +++ b/mmpose/configs/mmdet/fpg/faster-rcnn_r50_fpn_crop640-50e_coco.py @@ -1,48 +1,38 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -norm_cfg = dict(type='BN', requires_grad=True) +norm_cfg = dict(type="BN", requires_grad=True) image_size = (640, 640) -batch_augments = [dict(type='BatchFixedSizePad', size=image_size)] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size)] model = dict( data_preprocessor=dict(pad_size_divisor=64, batch_augments=batch_augments), backbone=dict(norm_cfg=norm_cfg, norm_eval=False), neck=dict(norm_cfg=norm_cfg), - roi_head=dict(bbox_head=dict(norm_cfg=norm_cfg))) -dataset_type = 'CocoDataset' -data_root = 'data/coco/' + roi_head=dict(bbox_head=dict(norm_cfg=norm_cfg)), +) +dataset_type = "CocoDataset" +data_root = "data/coco/" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.8, 1.2), - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=image_size, - allow_negative_crop=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=image_size, ratio_range=(0.8, 1.2), keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=image_size, allow_negative_crop=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=image_size, keep_ratio=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=image_size, keep_ratio=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", 
"img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=8, num_workers=4, dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=8, num_workers=4, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader @@ -50,22 +40,17 @@ test_dataloader = val_dataloader max_epochs = 50 train_cfg = dict(max_epochs=max_epochs, val_interval=2) param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[30, 40], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[30, 40], gamma=0.1), ] # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.08, momentum=0.9, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="SGD", lr=0.08, momentum=0.9, weight_decay=0.0001), paramwise_cfg=dict(norm_decay_mult=0, bypass_duplicate=True), - clip_grad=None) + clip_grad=None, +) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpg-chn128_crop640-50e_coco.py b/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpg-chn128_crop640-50e_coco.py index 804393966c6711a1e5261ace00e9b8b84283fde5..c3b1990e97b05a4806250654306214854f8a9dff 100644 --- a/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpg-chn128_crop640-50e_coco.py +++ b/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpg-chn128_crop640-50e_coco.py @@ -1,4 +1,4 @@ -_base_ = 'mask-rcnn_r50_fpg_crop640-50e_coco.py' +_base_ = "mask-rcnn_r50_fpg_crop640-50e_coco.py" model = dict( neck=dict(out_channels=128, inter_channels=128), @@ -7,4 +7,6 @@ model = dict( bbox_roi_extractor=dict(out_channels=128), bbox_head=dict(in_channels=128), mask_roi_extractor=dict(out_channels=128), - mask_head=dict(in_channels=128))) + mask_head=dict(in_channels=128), + ), +) diff --git a/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpg_crop640-50e_coco.py b/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpg_crop640-50e_coco.py index 135bb60bb340c40a47a9bd64e5a8afc57ede60db..4844a3428536764756e82554bbc2733f98d19001 100644 --- a/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpg_crop640-50e_coco.py +++ b/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpg_crop640-50e_coco.py @@ -1,48 +1,27 @@ -_base_ = 'mask-rcnn_r50_fpn_crop640-50e_coco.py' +_base_ = "mask-rcnn_r50_fpn_crop640-50e_coco.py" -norm_cfg = dict(type='BN', requires_grad=True) +norm_cfg = dict(type="BN", requires_grad=True) model = dict( neck=dict( - type='FPG', + type="FPG", in_channels=[256, 512, 1024, 2048], out_channels=256, inter_channels=256, num_outs=5, stack_times=9, - paths=['bu'] * 9, + paths=["bu"] * 9, same_down_trans=None, same_up_trans=dict( - type='conv', - kernel_size=3, - stride=2, - padding=1, - norm_cfg=norm_cfg, - inplace=False, - order=('act', 'conv', 'norm')), - across_lateral_trans=dict( - type='conv', - kernel_size=1, - norm_cfg=norm_cfg, - inplace=False, - order=('act', 'conv', 'norm')), + type="conv", kernel_size=3, stride=2, padding=1, norm_cfg=norm_cfg, inplace=False, order=("act", "conv", "norm") + ), + across_lateral_trans=dict(type="conv", kernel_size=1, norm_cfg=norm_cfg, inplace=False, order=("act", "conv", "norm")), across_down_trans=dict( - type='interpolation_conv', - mode='nearest', - kernel_size=3, - norm_cfg=norm_cfg, - order=('act', 'conv', 'norm'), - inplace=False), + 
type="interpolation_conv", mode="nearest", kernel_size=3, norm_cfg=norm_cfg, order=("act", "conv", "norm"), inplace=False + ), across_up_trans=None, - across_skip_trans=dict( - type='conv', - kernel_size=1, - norm_cfg=norm_cfg, - inplace=False, - order=('act', 'conv', 'norm')), - output_trans=dict( - type='last_conv', - kernel_size=3, - order=('act', 'conv', 'norm'), - inplace=False), + across_skip_trans=dict(type="conv", kernel_size=1, norm_cfg=norm_cfg, inplace=False, order=("act", "conv", "norm")), + output_trans=dict(type="last_conv", kernel_size=3, order=("act", "conv", "norm"), inplace=False), norm_cfg=norm_cfg, - skip_inds=[(0, 1, 2, 3), (0, 1, 2), (0, 1), (0, ), ()])) + skip_inds=[(0, 1, 2, 3), (0, 1, 2), (0, 1), (0,), ()], + ) +) diff --git a/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpn_crop640-50e_coco.py b/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpn_crop640-50e_coco.py index 08ca5b6ffd8b9d166857d3c27bb6f5bde91416cc..b98eba82f1c9754648e5640072eb3224f3673a12 100644 --- a/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpn_crop640-50e_coco.py +++ b/mmpose/configs/mmdet/fpg/mask-rcnn_r50_fpn_crop640-50e_coco.py @@ -1,54 +1,38 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -norm_cfg = dict(type='BN', requires_grad=True) +norm_cfg = dict(type="BN", requires_grad=True) image_size = (640, 640) -batch_augments = [dict(type='BatchFixedSizePad', size=image_size)] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size)] model = dict( data_preprocessor=dict(pad_size_divisor=64, batch_augments=batch_augments), backbone=dict(norm_cfg=norm_cfg, norm_eval=False), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - norm_cfg=norm_cfg, - num_outs=5), - roi_head=dict( - bbox_head=dict(norm_cfg=norm_cfg), mask_head=dict(norm_cfg=norm_cfg))) -dataset_type = 'CocoDataset' -data_root = 'data/coco/' + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, norm_cfg=norm_cfg, num_outs=5), + roi_head=dict(bbox_head=dict(norm_cfg=norm_cfg), mask_head=dict(norm_cfg=norm_cfg)), +) +dataset_type = "CocoDataset" +data_root = "data/coco/" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.8, 1.2), - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=image_size, - allow_negative_crop=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomResize", scale=image_size, ratio_range=(0.8, 1.2), keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=image_size, allow_negative_crop=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=image_size, keep_ratio=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + 
dict(type="Resize", scale=image_size, keep_ratio=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=8, num_workers=4, dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=8, num_workers=4, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader @@ -56,22 +40,17 @@ test_dataloader = val_dataloader max_epochs = 50 train_cfg = dict(max_epochs=max_epochs, val_interval=2) param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[30, 40], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[30, 40], gamma=0.1), ] # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.08, momentum=0.9, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="SGD", lr=0.08, momentum=0.9, weight_decay=0.0001), paramwise_cfg=dict(norm_decay_mult=0, bypass_duplicate=True), - clip_grad=None) + clip_grad=None, +) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/fpg/retinanet_r50_fpg-chn128_crop640_50e_coco.py b/mmpose/configs/mmdet/fpg/retinanet_r50_fpg-chn128_crop640_50e_coco.py index 9a6cf7e56a4f23a42d3905560a9b8035d6d935ff..a90dafa60a28835ed4772d6c37c0ab7060f2b4ae 100644 --- a/mmpose/configs/mmdet/fpg/retinanet_r50_fpg-chn128_crop640_50e_coco.py +++ b/mmpose/configs/mmdet/fpg/retinanet_r50_fpg-chn128_crop640_50e_coco.py @@ -1,5 +1,3 @@ -_base_ = 'retinanet_r50_fpg_crop640_50e_coco.py' +_base_ = "retinanet_r50_fpg_crop640_50e_coco.py" -model = dict( - neck=dict(out_channels=128, inter_channels=128), - bbox_head=dict(in_channels=128)) +model = dict(neck=dict(out_channels=128, inter_channels=128), bbox_head=dict(in_channels=128)) diff --git a/mmpose/configs/mmdet/fpg/retinanet_r50_fpg_crop640_50e_coco.py b/mmpose/configs/mmdet/fpg/retinanet_r50_fpg_crop640_50e_coco.py index e2aac283992ea9e4595e7594233b21208bd672f5..91c4fef5c77d70372e5567eeeb63270fc2eaa530 100644 --- a/mmpose/configs/mmdet/fpg/retinanet_r50_fpg_crop640_50e_coco.py +++ b/mmpose/configs/mmdet/fpg/retinanet_r50_fpg_crop640_50e_coco.py @@ -1,10 +1,10 @@ -_base_ = '../nas_fpn/retinanet_r50_nasfpn_crop640-50e_coco.py' +_base_ = "../nas_fpn/retinanet_r50_nasfpn_crop640-50e_coco.py" -norm_cfg = dict(type='BN', requires_grad=True) +norm_cfg = dict(type="BN", requires_grad=True) model = dict( neck=dict( _delete_=True, - type='FPG', + type="FPG", in_channels=[256, 512, 1024, 2048], out_channels=256, inter_channels=256, @@ -12,42 +12,21 @@ model = dict( add_extra_convs=True, start_level=1, stack_times=9, - paths=['bu'] * 9, + paths=["bu"] * 9, same_down_trans=None, same_up_trans=dict( - type='conv', - kernel_size=3, - stride=2, - padding=1, - norm_cfg=norm_cfg, - inplace=False, - order=('act', 'conv', 'norm')), - across_lateral_trans=dict( - type='conv', - kernel_size=1, - norm_cfg=norm_cfg, - inplace=False, - order=('act', 'conv', 'norm')), + type="conv", kernel_size=3, stride=2, padding=1, norm_cfg=norm_cfg, inplace=False, order=("act", "conv", "norm") + ), + across_lateral_trans=dict(type="conv", kernel_size=1, norm_cfg=norm_cfg, inplace=False, order=("act", "conv", "norm")), across_down_trans=dict( 
- type='interpolation_conv', - mode='nearest', - kernel_size=3, - norm_cfg=norm_cfg, - order=('act', 'conv', 'norm'), - inplace=False), + type="interpolation_conv", mode="nearest", kernel_size=3, norm_cfg=norm_cfg, order=("act", "conv", "norm"), inplace=False + ), across_up_trans=None, - across_skip_trans=dict( - type='conv', - kernel_size=1, - norm_cfg=norm_cfg, - inplace=False, - order=('act', 'conv', 'norm')), - output_trans=dict( - type='last_conv', - kernel_size=3, - order=('act', 'conv', 'norm'), - inplace=False), + across_skip_trans=dict(type="conv", kernel_size=1, norm_cfg=norm_cfg, inplace=False, order=("act", "conv", "norm")), + output_trans=dict(type="last_conv", kernel_size=3, order=("act", "conv", "norm"), inplace=False), norm_cfg=norm_cfg, - skip_inds=[(0, 1, 2, 3), (0, 1, 2), (0, 1), (0, ), ()])) + skip_inds=[(0, 1, 2, 3), (0, 1, 2), (0, 1), (0,), ()], + ) +) train_cfg = dict(val_interval=2) diff --git a/mmpose/configs/mmdet/free_anchor/freeanchor_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/free_anchor/freeanchor_r101_fpn_1x_coco.py index dc323d94f7aa20b38e2204a38ed8e234dd4eadd1..3e6fa2fa5b91b80bab508c484bf81a06fc53c85f 100644 --- a/mmpose/configs/mmdet/free_anchor/freeanchor_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/free_anchor/freeanchor_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './freeanchor_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./freeanchor_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/free_anchor/freeanchor_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/free_anchor/freeanchor_r50_fpn_1x_coco.py index 13f64d14a1ead0431549b8569d031f72669a2e84..5a1536213ee10d9d8c77f90a1601e94c85f47cdf 100644 --- a/mmpose/configs/mmdet/free_anchor/freeanchor_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/free_anchor/freeanchor_r50_fpn_1x_coco.py @@ -1,22 +1,18 @@ -_base_ = '../retinanet/retinanet_r50_fpn_1x_coco.py' +_base_ = "../retinanet/retinanet_r50_fpn_1x_coco.py" model = dict( bbox_head=dict( _delete_=True, - type='FreeAnchorRetinaHead', + type="FreeAnchorRetinaHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=4, - scales_per_octave=3, - ratios=[0.5, 1.0, 2.0], - strides=[8, 16, 32, 64, 128]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.75))) + type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128] + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_bbox=dict(type="SmoothL1Loss", beta=0.11, loss_weight=0.75), + ) +) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/free_anchor/freeanchor_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/free_anchor/freeanchor_x101-32x4d_fpn_1x_coco.py index 8e448bc1123115d37ef9f21a33c8a6b38cd821c3..0be8f0980e3eb5cc1dae46b2159f534552cf752a 100644 --- a/mmpose/configs/mmdet/free_anchor/freeanchor_x101-32x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/free_anchor/freeanchor_x101-32x4d_fpn_1x_coco.py @@ -1,13 +1,14 @@ -_base_ = './freeanchor_r50_fpn_1x_coco.py' +_base_ = "./freeanchor_r50_fpn_1x_coco.py" model = 
dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/fsaf/fsaf_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/fsaf/fsaf_r101_fpn_1x_coco.py index 12b49fed5b6cd617aa9c05d76ed737d755992a34..042a9b1fc2965d82ed5afca07f41b3f282325985 100644 --- a/mmpose/configs/mmdet/fsaf/fsaf_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/fsaf/fsaf_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './fsaf_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./fsaf_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/fsaf/fsaf_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/fsaf/fsaf_r50_fpn_1x_coco.py index e7165cd63c74ab27ff47f8255836f4c10158cf0e..b23b53e344c9e2eca73759d6b28411a9763b298d 100644 --- a/mmpose/configs/mmdet/fsaf/fsaf_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/fsaf/fsaf_r50_fpn_1x_coco.py @@ -1,9 +1,9 @@ -_base_ = '../retinanet/retinanet_r50_fpn_1x_coco.py' +_base_ = "../retinanet/retinanet_r50_fpn_1x_coco.py" # model settings model = dict( - type='FSAF', + type="FSAF", bbox_head=dict( - type='FSAFHead', + type="FSAFHead", num_classes=80, in_channels=256, stacked_convs=4, @@ -12,36 +12,18 @@ model = dict( # Only anchor-free branch is implemented. The anchor generator only # generates 1 anchor at each feature point, as a substitute of the # grid of features. 
- anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=1, - scales_per_octave=1, - ratios=[1.0], - strides=[8, 16, 32, 64, 128]), - bbox_coder=dict(_delete_=True, type='TBLRBBoxCoder', normalizer=4.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0, - reduction='none'), - loss_bbox=dict( - _delete_=True, - type='IoULoss', - eps=1e-6, - loss_weight=1.0, - reduction='none')), + anchor_generator=dict(type="AnchorGenerator", octave_base_scale=1, scales_per_octave=1, ratios=[1.0], strides=[8, 16, 32, 64, 128]), + bbox_coder=dict(_delete_=True, type="TBLRBBoxCoder", normalizer=4.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0, reduction="none"), + loss_bbox=dict(_delete_=True, type="IoULoss", eps=1e-6, loss_weight=1.0, reduction="none"), + ), # training and testing settings train_cfg=dict( - assigner=dict( - _delete_=True, - type='CenterRegionAssigner', - pos_scale=0.2, - neg_scale=0.2, - min_pos_iof=0.01), + assigner=dict(_delete_=True, type="CenterRegionAssigner", pos_scale=0.2, neg_scale=0.2, min_pos_iof=0.01), allowed_border=-1, pos_weight=-1, - debug=False)) + debug=False, + ), +) optim_wrapper = dict(clip_grad=dict(max_norm=10, norm_type=2)) diff --git a/mmpose/configs/mmdet/fsaf/fsaf_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/fsaf/fsaf_x101-64x4d_fpn_1x_coco.py index 89c0c6344aba6e6eae5657eff60745645dd1e8dc..4a11c73ccd35589f5253ce8d2b1bdebfd6bbd404 100644 --- a/mmpose/configs/mmdet/fsaf/fsaf_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/fsaf/fsaf_x101-64x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './fsaf_r50_fpn_1x_coco.py' +_base_ = "./fsaf_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5-r16-gcb-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5-r16-gcb-c3-c5_fpn_1x_coco.py index 6cf605b666e460aee48adc629b0604af4c64e306..5540a9dc426b8055d4ea5eeb0dbcc91da8f681df 100644 --- a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5-r16-gcb-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5-r16-gcb-c3-c5_fpn_1x_coco.py @@ -1,11 +1,8 @@ -_base_ = '../dcn/cascade-mask-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py' +_base_ = "../dcn/cascade-mask-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py" model = dict( backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False, - plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. 
/ 16), - stages=(False, True, True, True), - position='after_conv3') - ])) + plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 16), stages=(False, True, True, True), position="after_conv3")], + ) +) diff --git a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5-r4-gcb-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5-r4-gcb-c3-c5_fpn_1x_coco.py index 95fc687b664b25b754d4ba890ae9c9e982db65fb..d83fa60349d7a269679d07f1548acc0b1fbca23b 100644 --- a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5-r4-gcb-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5-r4-gcb-c3-c5_fpn_1x_coco.py @@ -1,11 +1,8 @@ -_base_ = '../dcn/cascade-mask-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py' +_base_ = "../dcn/cascade-mask-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py" model = dict( backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False, - plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. / 4), - stages=(False, True, True, True), - position='after_conv3') - ])) + plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 4), stages=(False, True, True, True), position="after_conv3")], + ) +) diff --git a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5_fpn_1x_coco.py index 9b77dc9315f52f9437eb1e39f6d518f1afaa41bb..a0161f66b41432710ec61971cb1cb30f80c81056 100644 --- a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-dconv-c3-c5_fpn_1x_coco.py @@ -1,4 +1,2 @@ -_base_ = '../dcn/cascade-mask-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py' -model = dict( - backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), norm_eval=False)) +_base_ = "../dcn/cascade-mask-rcnn_x101-32x4d-dconv-c3-c5_fpn_1x_coco.py" +model = dict(backbone=dict(norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False)) diff --git a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-r16-gcb-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-r16-gcb-c3-c5_fpn_1x_coco.py index 8f97972aa2b7d151d5824de40da9cedae9c57535..a22b9e87685939a7b611f6d2b17ca3a84dbdc31e 100644 --- a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-r16-gcb-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-r16-gcb-c3-c5_fpn_1x_coco.py @@ -1,11 +1,8 @@ -_base_ = '../cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py' +_base_ = "../cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py" model = dict( backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False, - plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. 
/ 16), - stages=(False, True, True, True), - position='after_conv3') - ])) + plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 16), stages=(False, True, True, True), position="after_conv3")], + ) +) diff --git a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-r4-gcb-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-r4-gcb-c3-c5_fpn_1x_coco.py index 8404cfdaf34e470d2bff57a707ca8183fe442131..72c95ababa854ea55c261649eccd9f4b3912786c 100644 --- a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-r4-gcb-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn-r4-gcb-c3-c5_fpn_1x_coco.py @@ -1,11 +1,8 @@ -_base_ = '../cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py' +_base_ = "../cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py" model = dict( backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False, - plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. / 4), - stages=(False, True, True, True), - position='after_conv3') - ])) + plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 4), stages=(False, True, True, True), position="after_conv3")], + ) +) diff --git a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn_fpn_1x_coco.py index 87667dee779ee8068075be17638a6d10a9985c7e..7a7df6d031649b772f5d8109584d4aec2aee2d15 100644 --- a/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/cascade-mask-rcnn_x101-32x4d-syncbn_fpn_1x_coco.py @@ -1,4 +1,2 @@ -_base_ = '../cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py' -model = dict( - backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), norm_eval=False)) +_base_ = "../cascade_rcnn/cascade-mask-rcnn_x101-32x4d_fpn_1x_coco.py" +model = dict(backbone=dict(norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False)) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-gcb-r16-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-gcb-r16-c3-c5_fpn_1x_coco.py index 447e2c6d858738db0f0d2e46e57e1fccd2233af3..3fd2e5ad2ad56125f39407765bb5e35e4c851d49 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-gcb-r16-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-gcb-r16-c3-c5_fpn_1x_coco.py @@ -1,8 +1,4 @@ -_base_ = '../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py" model = dict( - backbone=dict(plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. / 16), - stages=(False, True, True, True), - position='after_conv3') - ])) + backbone=dict(plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 16), stages=(False, True, True, True), position="after_conv3")]) +) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-gcb-r4-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-gcb-r4-c3-c5_fpn_1x_coco.py index 9c723a64b6f686b9dd0f8e7648c7b1b303205168..3207dfdc1b449ff00396cdea3e03316bf77ca294 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-gcb-r4-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-gcb-r4-c3-c5_fpn_1x_coco.py @@ -1,8 +1,4 @@ -_base_ = '../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py" model = dict( - backbone=dict(plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. 
/ 4), - stages=(False, True, True, True), - position='after_conv3') - ])) + backbone=dict(plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 4), stages=(False, True, True, True), position="after_conv3")]) +) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py index 6f9d03d3f8d94116b4814825ad8377b534a912b1..d625cb289c75bd47b060b7c93b5588b47473871e 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py @@ -1,11 +1,8 @@ -_base_ = '../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py" model = dict( backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False, - plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. / 16), - stages=(False, True, True, True), - position='after_conv3') - ])) + plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 16), stages=(False, True, True, True), position="after_conv3")], + ) +) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py index d07cb0d488c0df76a137bad54123a7583c7da87b..24ebe8c61831c5600a4ac005bd084e77a7e5a0f4 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py @@ -1,11 +1,8 @@ -_base_ = '../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py" model = dict( backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False, - plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. / 4), - stages=(False, True, True, True), - position='after_conv3') - ])) + plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 4), stages=(False, True, True, True), position="after_conv3")], + ) +) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn_fpn_1x_coco.py index 957bdf55470017d9ac9fa482b416c2206266af86..38d20f9e9a0d3ba583a99eac8cb0c7033a744239 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_r101-syncbn_fpn_1x_coco.py @@ -1,4 +1,2 @@ -_base_ = '../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py' -model = dict( - backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), norm_eval=False)) +_base_ = "../mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py" +model = dict(backbone=dict(norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False)) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-gcb-r16-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-gcb-r16-c3-c5_fpn_1x_coco.py index c9ec5ac3baf7c46ea95d4c3fcf4f5da4ad7a3dce..76e0095afb93e53b1ee9b05a41f45a6eab150fb6 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-gcb-r16-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-gcb-r16-c3-c5_fpn_1x_coco.py @@ -1,8 +1,4 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" model = dict( - backbone=dict(plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. 
/ 16), - stages=(False, True, True, True), - position='after_conv3') - ])) + backbone=dict(plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 16), stages=(False, True, True, True), position="after_conv3")]) +) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-gcb-r4-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-gcb-r4-c3-c5_fpn_1x_coco.py index 42474d5196a8a130999db735989b423664486304..dc1d63f58bf603c225f5bd6bf8be1378ac82a974 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-gcb-r4-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-gcb-r4-c3-c5_fpn_1x_coco.py @@ -1,8 +1,4 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" model = dict( - backbone=dict(plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. / 4), - stages=(False, True, True, True), - position='after_conv3') - ])) + backbone=dict(plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 4), stages=(False, True, True, True), position="after_conv3")]) +) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py index ac1928082405baebfe5ec483f37b9775da21d5ad..ebb0c098f9533cb4fd239cab24cb61a588210950 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py @@ -1,11 +1,8 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False, - plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. / 16), - stages=(False, True, True, True), - position='after_conv3') - ])) + plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 16), stages=(False, True, True, True), position="after_conv3")], + ) +) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py index ae29f0cebe4f9fe16f2fea3de53874914186da9b..d942557dcd91afba524e5ab3a2f6aaf3fd962c43 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py @@ -1,11 +1,8 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False, - plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. 
/ 4), - stages=(False, True, True, True), - position='after_conv3') - ])) + plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 4), stages=(False, True, True, True), position="after_conv3")], + ) +) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn_fpn_1x_coco.py index f8ef27bad9743cba8f7134f1a77a091af1bca093..de67f47f9b270f36cfc89430b0b21bf7cbde097a 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_r50-syncbn_fpn_1x_coco.py @@ -1,4 +1,2 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), norm_eval=False)) +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False)) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py index 1a2e2c9f26b25c5aefba912997cd01db60854a5e..c89f4669dbc1e3cb3e651c18c65866cbb1ef7d7a 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn-gcb-r16-c3-c5_fpn_1x_coco.py @@ -1,11 +1,8 @@ -_base_ = '../mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py" model = dict( backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False, - plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. / 16), - stages=(False, True, True, True), - position='after_conv3') - ])) + plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 16), stages=(False, True, True, True), position="after_conv3")], + ) +) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py index 65d3f9aadf5f79a4fb9fc9082dfabfdb3de08871..51a0f85fe4215a3d1765e224478658a6e72fec08 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py @@ -1,11 +1,8 @@ -_base_ = '../mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py" model = dict( backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False, - plugins=[ - dict( - cfg=dict(type='ContextBlock', ratio=1. 
/ 4), - stages=(False, True, True, True), - position='after_conv3') - ])) + plugins=[dict(cfg=dict(type="ContextBlock", ratio=1.0 / 4), stages=(False, True, True, True), position="after_conv3")], + ) +) diff --git a/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn_fpn_1x_coco.py b/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn_fpn_1x_coco.py index b5343a6d4596eb82245ef078d36a5a6ce5137aeb..d9966cc000769667645e2f22e98c9921e5da66d9 100644 --- a/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gcnet/mask-rcnn_x101-32x4d-syncbn_fpn_1x_coco.py @@ -1,4 +1,2 @@ -_base_ = '../mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py' -model = dict( - backbone=dict( - norm_cfg=dict(type='SyncBN', requires_grad=True), norm_eval=False)) +_base_ = "../mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py" +model = dict(backbone=dict(norm_cfg=dict(type="SyncBN", requires_grad=True), norm_eval=False)) diff --git a/mmpose/configs/mmdet/gfl/gfl_r101-dconv-c3-c5_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/gfl/gfl_r101-dconv-c3-c5_fpn_ms-2x_coco.py index 7f748935b62884fd501af7e6731ad3ef6ce0effb..2bfb68c0a527d2623f49e47404071c39991e1b31 100644 --- a/mmpose/configs/mmdet/gfl/gfl_r101-dconv-c3-c5_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/gfl/gfl_r101-dconv-c3-c5_fpn_ms-2x_coco.py @@ -1,15 +1,16 @@ -_base_ = './gfl_r50_fpn_ms-2x_coco.py' +_base_ = "./gfl_r50_fpn_ms-2x_coco.py" model = dict( backbone=dict( - type='ResNet', + type="ResNet", depth=101, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), + norm_cfg=dict(type="BN", requires_grad=True), + dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), + ) +) diff --git a/mmpose/configs/mmdet/gfl/gfl_r101_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/gfl/gfl_r101_fpn_ms-2x_coco.py index 10135f161b9e933612d961af12a8e30198cca484..15bf3c2a689361ab83e551ffe689fb7a5158ac51 100644 --- a/mmpose/configs/mmdet/gfl/gfl_r101_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/gfl/gfl_r101_fpn_ms-2x_coco.py @@ -1,13 +1,14 @@ -_base_ = './gfl_r50_fpn_ms-2x_coco.py' +_base_ = "./gfl_r50_fpn_ms-2x_coco.py" model = dict( backbone=dict( - type='ResNet', + type="ResNet", depth=101, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), + ) +) diff --git a/mmpose/configs/mmdet/gfl/gfl_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/gfl/gfl_r50_fpn_1x_coco.py index 902382552d58f124bbe2b8c2904ce74ec7b7a4d8..5399eb9959faf1e01fc1b484895b234aaf1f47f4 100644 --- a/mmpose/configs/mmdet/gfl/gfl_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/gfl/gfl_r50_fpn_1x_coco.py @@ -1,66 +1,37 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] model = dict( - type='GFL', + type="GFL", data_preprocessor=dict( - 
type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), bbox_head=dict( - type='GFLHead', + type="GFLHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128]), - loss_cls=dict( - type='QualityFocalLoss', - use_sigmoid=True, - beta=2.0, - loss_weight=1.0), - loss_dfl=dict(type='DistributionFocalLoss', loss_weight=0.25), + anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]), + loss_cls=dict(type="QualityFocalLoss", use_sigmoid=True, beta=2.0, loss_weight=1.0), + loss_dfl=dict(type="DistributionFocalLoss", loss_weight=0.25), reg_max=16, - loss_bbox=dict(type='GIoULoss', loss_weight=2.0)), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + ), # training and testing settings - train_cfg=dict( - assigner=dict(type='ATSSAssigner', topk=9), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + train_cfg=dict(assigner=dict(type="ATSSAssigner", topk=9), allowed_border=-1, pos_weight=-1, debug=False), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/gfl/gfl_r50_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/gfl/gfl_r50_fpn_ms-2x_coco.py index 22770eb101920f9daae750a1b72f5410be395743..e121b019d6c269aeaa78c4c2ac452842d3d2dc4f 100644 --- a/mmpose/configs/mmdet/gfl/gfl_r50_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/gfl/gfl_r50_fpn_ms-2x_coco.py @@ -1,28 +1,19 @@ -_base_ = './gfl_r50_fpn_1x_coco.py' +_base_ = "./gfl_r50_fpn_1x_coco.py" max_epochs = 24 # learning policy param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) # multi-scale training train_pipeline = [ - dict(type='LoadImageFromFile', 
backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', scale=[(1333, 480), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=[(1333, 480), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/gfl/gfl_x101-32x4d-dconv-c4-c5_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/gfl/gfl_x101-32x4d-dconv-c4-c5_fpn_ms-2x_coco.py index 6aa98eea2d0d25b4df1570aed97cce8475e9104d..5a652d539ba28f365d098cca37e6c59297623744 100644 --- a/mmpose/configs/mmdet/gfl/gfl_x101-32x4d-dconv-c4-c5_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/gfl/gfl_x101-32x4d-dconv-c4-c5_fpn_ms-2x_coco.py @@ -1,18 +1,19 @@ -_base_ = './gfl_r50_fpn_ms-2x_coco.py' +_base_ = "./gfl_r50_fpn_ms-2x_coco.py" model = dict( - type='GFL', + type="GFL", backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), + norm_cfg=dict(type="BN", requires_grad=True), + dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, False, True, True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ), +) diff --git a/mmpose/configs/mmdet/gfl/gfl_x101-32x4d_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/gfl/gfl_x101-32x4d_fpn_ms-2x_coco.py index ec629b1f0d5d3317dcb20f1244bc713818518d8a..fd5ce75e41f5ae555290cb9eef893985a5c9c79a 100644 --- a/mmpose/configs/mmdet/gfl/gfl_x101-32x4d_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/gfl/gfl_x101-32x4d_fpn_ms-2x_coco.py @@ -1,16 +1,17 @@ -_base_ = './gfl_r50_fpn_ms-2x_coco.py' +_base_ = "./gfl_r50_fpn_ms-2x_coco.py" model = dict( - type='GFL', + type="GFL", backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ), +) diff --git a/mmpose/configs/mmdet/ghm/retinanet_r101_fpn_ghm-1x_coco.py b/mmpose/configs/mmdet/ghm/retinanet_r101_fpn_ghm-1x_coco.py index 090221e68f68a95cfcf092b15f2636cd28fc9d87..b1c63f3b9959bde2698f13ca2e72610f295d9a66 100644 --- a/mmpose/configs/mmdet/ghm/retinanet_r101_fpn_ghm-1x_coco.py +++ b/mmpose/configs/mmdet/ghm/retinanet_r101_fpn_ghm-1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './retinanet_r50_fpn_ghm-1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./retinanet_r50_fpn_ghm-1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/ghm/retinanet_r50_fpn_ghm-1x_coco.py b/mmpose/configs/mmdet/ghm/retinanet_r50_fpn_ghm-1x_coco.py 
index 42b9aa6d05dc64f3045685a7c23d632a6041249c..3a0bc48df5548e6fd87d7f521e9c5149f2ef3dcd 100644 --- a/mmpose/configs/mmdet/ghm/retinanet_r50_fpn_ghm-1x_coco.py +++ b/mmpose/configs/mmdet/ghm/retinanet_r50_fpn_ghm-1x_coco.py @@ -1,18 +1,8 @@ -_base_ = '../retinanet/retinanet_r50_fpn_1x_coco.py' +_base_ = "../retinanet/retinanet_r50_fpn_1x_coco.py" model = dict( bbox_head=dict( - loss_cls=dict( - _delete_=True, - type='GHMC', - bins=30, - momentum=0.75, - use_sigmoid=True, - loss_weight=1.0), - loss_bbox=dict( - _delete_=True, - type='GHMR', - mu=0.02, - bins=10, - momentum=0.7, - loss_weight=10.0))) + loss_cls=dict(_delete_=True, type="GHMC", bins=30, momentum=0.75, use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(_delete_=True, type="GHMR", mu=0.02, bins=10, momentum=0.7, loss_weight=10.0), + ) +) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/ghm/retinanet_x101-32x4d_fpn_ghm-1x_coco.py b/mmpose/configs/mmdet/ghm/retinanet_x101-32x4d_fpn_ghm-1x_coco.py index 1240545a624a70c7122829e85b426cafcc3f42d2..5f128601c000e1e9493446390f2e2e0194977bed 100644 --- a/mmpose/configs/mmdet/ghm/retinanet_x101-32x4d_fpn_ghm-1x_coco.py +++ b/mmpose/configs/mmdet/ghm/retinanet_x101-32x4d_fpn_ghm-1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './retinanet_r50_fpn_ghm-1x_coco.py' +_base_ = "./retinanet_r50_fpn_ghm-1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/ghm/retinanet_x101-64x4d_fpn_ghm-1x_coco.py b/mmpose/configs/mmdet/ghm/retinanet_x101-64x4d_fpn_ghm-1x_coco.py index 689d2edcdf1bdffa52ee3aa3a8a4dac7988f6fa5..9856440d9c42a48929867fd46378d7753fe90853 100644 --- a/mmpose/configs/mmdet/ghm/retinanet_x101-64x4d_fpn_ghm-1x_coco.py +++ b/mmpose/configs/mmdet/ghm/retinanet_x101-64x4d_fpn_ghm-1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './retinanet_r50_fpn_ghm-1x_coco.py' +_base_ = "./retinanet_r50_fpn_ghm-1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/glip/flickr30k/glip_atss_swin-t_c_fpn_dyhead_pretrain_obj365-goldg_zeroshot_flickr30k.py b/mmpose/configs/mmdet/glip/flickr30k/glip_atss_swin-t_c_fpn_dyhead_pretrain_obj365-goldg_zeroshot_flickr30k.py index 14d6e8aaa6372a5272467dd46d33e80979298efc..6054433c03d7e2d06a7299684c0bcca0889266c0 100644 --- a/mmpose/configs/mmdet/glip/flickr30k/glip_atss_swin-t_c_fpn_dyhead_pretrain_obj365-goldg_zeroshot_flickr30k.py +++ b/mmpose/configs/mmdet/glip/flickr30k/glip_atss_swin-t_c_fpn_dyhead_pretrain_obj365-goldg_zeroshot_flickr30k.py @@ -1,61 +1,64 @@ -_base_ = '../glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py' +_base_ = "../glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py" -lang_model_name = 'bert-base-uncased' 
+lang_model_name = "bert-base-uncased" model = dict(bbox_head=dict(early_fuse=True)) -dataset_type = 'Flickr30kDataset' -data_root = 'data/flickr30k_entities/' +dataset_type = "Flickr30kDataset" +data_root = "data/flickr30k_entities/" test_pipeline = [ + dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='LoadImageFromFile', backend_args=None, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities', - 'tokens_positive', 'phrase_ids', 'phrases')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "text", + "custom_entities", + "tokens_positive", + "phrase_ids", + "phrases", + ), + ), ] dataset_Flickr30k_val = dict( type=dataset_type, data_root=data_root, - ann_file='final_flickr_separateGT_val.json', - data_prefix=dict(img='flickr30k_images/'), + ann_file="final_flickr_separateGT_val.json", + data_prefix=dict(img="flickr30k_images/"), pipeline=test_pipeline, ) dataset_Flickr30k_test = dict( type=dataset_type, data_root=data_root, - ann_file='final_flickr_separateGT_test.json', - data_prefix=dict(img='flickr30k_images/'), + ann_file="final_flickr_separateGT_test.json", + data_prefix=dict(img="flickr30k_images/"), pipeline=test_pipeline, ) -val_evaluator_Flickr30k = dict(type='Flickr30kMetric', ) +val_evaluator_Flickr30k = dict( + type="Flickr30kMetric", +) -test_evaluator_Flickr30k = dict(type='Flickr30kMetric', ) +test_evaluator_Flickr30k = dict( + type="Flickr30kMetric", +) # ----------Config---------- # -dataset_prefixes = ['Flickr30kVal', 'Flickr30kTest'] +dataset_prefixes = ["Flickr30kVal", "Flickr30kTest"] datasets = [dataset_Flickr30k_val, dataset_Flickr30k_test] metrics = [val_evaluator_Flickr30k, test_evaluator_Flickr30k] -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/glip/glip_atss_swin-l_fpn_dyhead_16xb2_ms-2x_funtune_coco.py b/mmpose/configs/mmdet/glip/glip_atss_swin-l_fpn_dyhead_16xb2_ms-2x_funtune_coco.py index 92a85a11d57b6d3d64bfed5f9a691bca739d7ce3..fa8b99e883fe65bdbcf61548c16837e7c50e8fc0 100644 --- a/mmpose/configs/mmdet/glip/glip_atss_swin-l_fpn_dyhead_16xb2_ms-2x_funtune_coco.py +++ b/mmpose/configs/mmdet/glip/glip_atss_swin-l_fpn_dyhead_16xb2_ms-2x_funtune_coco.py @@ -1,4 +1,4 @@ -_base_ = './glip_atss_swin-t_b_fpn_dyhead_16xb2_ms-2x_funtune_coco.py' +_base_ = "./glip_atss_swin-t_b_fpn_dyhead_16xb2_ms-2x_funtune_coco.py" model = dict( backbone=dict( @@ -9,6 +9,7 @@ model = dict( drop_path_rate=0.4, ), neck=dict(in_channels=[384, 768, 1536]), - bbox_head=dict(early_fuse=True, num_dyhead_blocks=8, use_checkpoint=True)) + bbox_head=dict(early_fuse=True, num_dyhead_blocks=8, use_checkpoint=True), +) -load_from = 
'https://download.openmmlab.com/mmdetection/v3.0/glip/glip_l_mmdet-abfe026b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/glip/glip_l_mmdet-abfe026b.pth" # noqa diff --git a/mmpose/configs/mmdet/glip/glip_atss_swin-l_fpn_dyhead_pretrain_mixeddata.py b/mmpose/configs/mmdet/glip/glip_atss_swin-l_fpn_dyhead_pretrain_mixeddata.py index 546ecfe1d513b4161322f5ffa0e51d01b2775780..ecf23e6b16ffe6435a90da111a7f8736065d7ac4 100644 --- a/mmpose/configs/mmdet/glip/glip_atss_swin-l_fpn_dyhead_pretrain_mixeddata.py +++ b/mmpose/configs/mmdet/glip/glip_atss_swin-l_fpn_dyhead_pretrain_mixeddata.py @@ -1,4 +1,4 @@ -_base_ = './glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py' +_base_ = "./glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py" model = dict( backbone=dict( @@ -9,4 +9,5 @@ model = dict( drop_path_rate=0.4, ), neck=dict(in_channels=[384, 768, 1536]), - bbox_head=dict(early_fuse=True, num_dyhead_blocks=8)) + bbox_head=dict(early_fuse=True, num_dyhead_blocks=8), +) diff --git a/mmpose/configs/mmdet/glip/glip_atss_swin-t_a_fpn_dyhead_16xb2_ms-2x_funtune_coco.py b/mmpose/configs/mmdet/glip/glip_atss_swin-t_a_fpn_dyhead_16xb2_ms-2x_funtune_coco.py index 4b280657b315c77dd118ab84880d97dc882102a1..6c5347b6fb0f33fe2da2d5f73114204160947f36 100644 --- a/mmpose/configs/mmdet/glip/glip_atss_swin-t_a_fpn_dyhead_16xb2_ms-2x_funtune_coco.py +++ b/mmpose/configs/mmdet/glip/glip_atss_swin-t_a_fpn_dyhead_16xb2_ms-2x_funtune_coco.py @@ -1,20 +1,14 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_a_mmdet-b3654169.pth' # noqa -lang_model_name = 'bert-base-uncased' +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] +load_from = "https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_a_mmdet-b3654169.pth" # noqa +lang_model_name = "bert-base-uncased" model = dict( - type='GLIP', + type="GLIP", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.53, 116.28, 123.675], - std=[57.375, 57.12, 58.395], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='SwinTransformer', + type="SwinTransformer", embed_dims=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], @@ -22,134 +16,107 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(1, 2, 3), with_cp=False, - convert_weights=False), + convert_weights=False, + ), neck=dict( - type='FPN_DropBlock', + type="FPN_DropBlock", in_channels=[192, 384, 768], out_channels=256, start_level=0, relu_before_extra_convs=True, - add_extra_convs='on_output', - num_outs=5), + add_extra_convs="on_output", + num_outs=5, + ), bbox_head=dict( - type='ATSSVLFusionHead', + type="ATSSVLFusionHead", lang_model_name=lang_model_name, num_classes=80, in_channels=256, feat_channels=256, anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128], - center_offset=0.5), - bbox_coder=dict( - type='DeltaXYWHBBoxCoderForGLIP', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - 
loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), - language_model=dict(type='BertModel', name=lang_model_name), + type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128], center_offset=0.5 + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoderForGLIP", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), + language_model=dict(type="BertModel", name=lang_model_name), train_cfg=dict( - assigner=dict( - type='ATSSAssigner', - topk=9, - iou_calculator=dict(type='BboxOverlaps2D_GLIP')), + assigner=dict(type="ATSSAssigner", topk=9, iou_calculator=dict(type="BboxOverlaps2D_GLIP")), allowed_border=-1, pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # dataset settings train_pipeline = [ + dict(type="LoadImageFromFile", imdecode_backend="pillow", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="GTBoxSubOne_GLIP"), dict( - type='LoadImageFromFile', - imdecode_backend='pillow', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='GTBoxSubOne_GLIP'), - dict( - type='RandomChoiceResize', - scales=[(1333, 480), (1333, 560), (1333, 640), (1333, 720), - (1333, 800)], + type="RandomChoiceResize", + scales=[(1333, 480), (1333, 560), (1333, 640), (1333, 720), (1333, 800)], keep_ratio=True, - resize_type='FixScaleResize', - backend='pillow'), - dict(type='RandomFlip_GLIP', prob=0.5), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)), + resize_type="FixScaleResize", + backend="pillow", + ), + dict(type="RandomFlip_GLIP", prob=0.5), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1)), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] test_pipeline = [ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities')) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities")), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( type=_base_.dataset_type, data_root=_base_.data_root, - 
ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, return_classes=True, - backend_args=_base_.backend_args))) + backend_args=_base_.backend_args, + ), + ) +) -val_dataloader = dict( - dataset=dict(pipeline=test_pipeline, return_classes=True)) +val_dataloader = dict(dataset=dict(pipeline=test_pipeline, return_classes=True)) test_dataloader = val_dataloader # We did not adopt the official 24e optimizer strategy # because the results indicate that the current strategy is superior. optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict( - type='AdamW', lr=0.00002, betas=(0.9, 0.999), weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.00002, betas=(0.9, 0.999), weight_decay=0.05), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'relative_position_bias_table': dict(decay_mult=0.), - 'norm': dict(decay_mult=0.) - }), - clip_grad=None) + "absolute_pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + } + ), + clip_grad=None, +) diff --git a/mmpose/configs/mmdet/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py b/mmpose/configs/mmdet/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py index 34a818caefcbfcdd9e51ec304fb94906c20ceb9a..735c05bb7dad26555002f6be32c0f095f151f108 100644 --- a/mmpose/configs/mmdet/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py +++ b/mmpose/configs/mmdet/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py @@ -1,20 +1,14 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] -lang_model_name = 'bert-base-uncased' +lang_model_name = "bert-base-uncased" model = dict( - type='GLIP', + type="GLIP", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.53, 116.28, 123.675], - std=[57.375, 57.12, 58.395], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='SwinTransformer', + type="SwinTransformer", embed_dims=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], @@ -22,69 +16,45 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(1, 2, 3), with_cp=False, - convert_weights=False), + convert_weights=False, + ), neck=dict( - type='FPN', + type="FPN", in_channels=[192, 384, 768], out_channels=256, start_level=0, relu_before_extra_convs=True, - add_extra_convs='on_output', - num_outs=5), + add_extra_convs="on_output", + num_outs=5, + ), bbox_head=dict( - type='ATSSVLFusionHead', + type="ATSSVLFusionHead", lang_model_name=lang_model_name, num_classes=80, in_channels=256, feat_channels=256, anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128], - center_offset=0.5), - bbox_coder=dict( - type='DeltaXYWHBBoxCoderForGLIP', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), + type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, 
scales_per_octave=1, strides=[8, 16, 32, 64, 128], center_offset=0.5 + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoderForGLIP", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), ), - language_model=dict(type='BertModel', name=lang_model_name), - train_cfg=dict( - assigner=dict(type='ATSSAssigner', topk=9), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + language_model=dict(type="BertModel", name=lang_model_name), + train_cfg=dict(assigner=dict(type="ATSSAssigner", topk=9), allowed_border=-1, pos_weight=-1, debug=False), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) test_pipeline = [ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities')) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities")), ] -val_dataloader = dict( - dataset=dict(pipeline=test_pipeline, return_classes=True)) +val_dataloader = dict(dataset=dict(pipeline=test_pipeline, return_classes=True)) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/glip/glip_atss_swin-t_b_fpn_dyhead_16xb2_ms-2x_funtune_coco.py b/mmpose/configs/mmdet/glip/glip_atss_swin-t_b_fpn_dyhead_16xb2_ms-2x_funtune_coco.py index 3487de3f3a24077f475e8451722d1b4d252a0084..d6be65df313f96cc9475c3de34369101b333eedf 100644 --- a/mmpose/configs/mmdet/glip/glip_atss_swin-t_b_fpn_dyhead_16xb2_ms-2x_funtune_coco.py +++ b/mmpose/configs/mmdet/glip/glip_atss_swin-t_b_fpn_dyhead_16xb2_ms-2x_funtune_coco.py @@ -1,9 +1,7 @@ -_base_ = './glip_atss_swin-t_a_fpn_dyhead_16xb2_ms-2x_funtune_coco.py' +_base_ = "./glip_atss_swin-t_a_fpn_dyhead_16xb2_ms-2x_funtune_coco.py" model = dict(bbox_head=dict(early_fuse=True, use_checkpoint=True)) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_b_mmdet-6dfbd102.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_b_mmdet-6dfbd102.pth" # noqa -optim_wrapper = dict( - optimizer=dict(lr=0.00001), - clip_grad=dict(_delete_=True, max_norm=1, norm_type=2)) +optim_wrapper = dict(optimizer=dict(lr=0.00001), clip_grad=dict(_delete_=True, max_norm=1, norm_type=2)) diff --git a/mmpose/configs/mmdet/glip/glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365.py b/mmpose/configs/mmdet/glip/glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365.py index 6334e5e3b4043a81d154fc03a94594d93d74aed5..b5ce238e9eebc00c369f4b041b539e510c05c6ad 100644 --- a/mmpose/configs/mmdet/glip/glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365.py +++ b/mmpose/configs/mmdet/glip/glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365.py @@ -1,3 +1,3 @@ -_base_ = './glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py' +_base_ = "./glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py" model = dict(bbox_head=dict(early_fuse=True)) diff --git 
a/mmpose/configs/mmdet/glip/glip_atss_swin-t_c_fpn_dyhead_16xb2_ms-2x_funtune_coco.py b/mmpose/configs/mmdet/glip/glip_atss_swin-t_c_fpn_dyhead_16xb2_ms-2x_funtune_coco.py index 5c315e490e7a7e05a6334d4d38ce9be9b70851b3..901dd862694e7324a6ffc045c0d7324056c85b50 100644 --- a/mmpose/configs/mmdet/glip/glip_atss_swin-t_c_fpn_dyhead_16xb2_ms-2x_funtune_coco.py +++ b/mmpose/configs/mmdet/glip/glip_atss_swin-t_c_fpn_dyhead_16xb2_ms-2x_funtune_coco.py @@ -1,3 +1,3 @@ -_base_ = './glip_atss_swin-t_b_fpn_dyhead_16xb2_ms-2x_funtune_coco.py' +_base_ = "./glip_atss_swin-t_b_fpn_dyhead_16xb2_ms-2x_funtune_coco.py" -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_c_mmdet-2fc427dd.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_c_mmdet-2fc427dd.pth" # noqa diff --git a/mmpose/configs/mmdet/glip/glip_atss_swin-t_c_fpn_dyhead_pretrain_obj365-goldg.py b/mmpose/configs/mmdet/glip/glip_atss_swin-t_c_fpn_dyhead_pretrain_obj365-goldg.py index 24898f4df532cc2e2728265800d2f6a030e8efe0..c227603f57d89ea808daf0919a68c4f47878da47 100644 --- a/mmpose/configs/mmdet/glip/glip_atss_swin-t_c_fpn_dyhead_pretrain_obj365-goldg.py +++ b/mmpose/configs/mmdet/glip/glip_atss_swin-t_c_fpn_dyhead_pretrain_obj365-goldg.py @@ -1 +1 @@ -_base_ = './glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365.py' +_base_ = "./glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365.py" diff --git a/mmpose/configs/mmdet/glip/glip_atss_swin-t_fpn_dyhead_16xb2_ms-2x_funtune_coco.py b/mmpose/configs/mmdet/glip/glip_atss_swin-t_fpn_dyhead_16xb2_ms-2x_funtune_coco.py index 3391272e608e8098773a6435550e578f462ed886..7eb1e94330ede2ee2ff342d0aa7b85646bc72ece 100644 --- a/mmpose/configs/mmdet/glip/glip_atss_swin-t_fpn_dyhead_16xb2_ms-2x_funtune_coco.py +++ b/mmpose/configs/mmdet/glip/glip_atss_swin-t_fpn_dyhead_16xb2_ms-2x_funtune_coco.py @@ -1,3 +1,3 @@ -_base_ = './glip_atss_swin-t_b_fpn_dyhead_16xb2_ms-2x_funtune_coco.py' +_base_ = "./glip_atss_swin-t_b_fpn_dyhead_16xb2_ms-2x_funtune_coco.py" -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_mmdet-c24ce662.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_mmdet-c24ce662.pth" # noqa diff --git a/mmpose/configs/mmdet/glip/glip_atss_swin-t_fpn_dyhead_pretrain_obj365-goldg-cc3m-sub.py b/mmpose/configs/mmdet/glip/glip_atss_swin-t_fpn_dyhead_pretrain_obj365-goldg-cc3m-sub.py index 24898f4df532cc2e2728265800d2f6a030e8efe0..c227603f57d89ea808daf0919a68c4f47878da47 100644 --- a/mmpose/configs/mmdet/glip/glip_atss_swin-t_fpn_dyhead_pretrain_obj365-goldg-cc3m-sub.py +++ b/mmpose/configs/mmdet/glip/glip_atss_swin-t_fpn_dyhead_pretrain_obj365-goldg-cc3m-sub.py @@ -1 +1 @@ -_base_ = './glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365.py' +_base_ = "./glip_atss_swin-t_b_fpn_dyhead_pretrain_obj365.py" diff --git a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-l_fpn_dyhead_pretrain_zeroshot_lvis.py b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-l_fpn_dyhead_pretrain_zeroshot_lvis.py index 1f79e447d3f24e364739740be504bb234adc1e98..2c912b60e0ee6bd3049970b3fd37cd60aeb36a06 100644 --- a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-l_fpn_dyhead_pretrain_zeroshot_lvis.py +++ b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-l_fpn_dyhead_pretrain_zeroshot_lvis.py @@ -1,4 +1,4 @@ -_base_ = './glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_lvis.py' +_base_ = "./glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_lvis.py" model = dict( backbone=dict( @@ -9,4 +9,5 @@ model = dict( drop_path_rate=0.4, ), 
neck=dict(in_channels=[384, 768, 1536]), - bbox_head=dict(early_fuse=True, num_dyhead_blocks=8)) + bbox_head=dict(early_fuse=True, num_dyhead_blocks=8), +) diff --git a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-l_fpn_dyhead_pretrain_zeroshot_mini-lvis.py b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-l_fpn_dyhead_pretrain_zeroshot_mini-lvis.py index 13f1a69082b670632dfe3eb8dc50826549dcf59f..60f084debc72b0cb43dca7b5e905f0d25876fb9f 100644 --- a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-l_fpn_dyhead_pretrain_zeroshot_mini-lvis.py +++ b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-l_fpn_dyhead_pretrain_zeroshot_mini-lvis.py @@ -1,4 +1,4 @@ -_base_ = './glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_mini-lvis.py' +_base_ = "./glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_mini-lvis.py" model = dict( backbone=dict( @@ -9,4 +9,5 @@ model = dict( drop_path_rate=0.4, ), neck=dict(in_channels=[384, 768, 1536]), - bbox_head=dict(early_fuse=True, num_dyhead_blocks=8)) + bbox_head=dict(early_fuse=True, num_dyhead_blocks=8), +) diff --git a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_lvis.py b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_lvis.py index 4d526d59008b39996a147a2852a44d2e936113d2..da6cdb33185f53731b33ca21e86000b0ac335507 100644 --- a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_lvis.py +++ b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_lvis.py @@ -1,24 +1,20 @@ -_base_ = '../glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py' +_base_ = "../glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py" -model = dict(test_cfg=dict( - max_per_img=300, - chunked_size=40, -)) +model = dict( + test_cfg=dict( + max_per_img=300, + chunked_size=40, + ) +) -dataset_type = 'LVISV1Dataset' -data_root = 'data/coco/' +dataset_type = "LVISV1Dataset" +data_root = "data/coco/" val_dataloader = dict( - dataset=dict( - data_root=data_root, - type=dataset_type, - ann_file='annotations/lvis_od_val.json', - data_prefix=dict(img=''))) + dataset=dict(data_root=data_root, type=dataset_type, ann_file="annotations/lvis_od_val.json", data_prefix=dict(img="")) +) test_dataloader = val_dataloader # numpy < 1.24.0 -val_evaluator = dict( - _delete_=True, - type='LVISFixedAPMetric', - ann_file=data_root + 'annotations/lvis_od_val.json') +val_evaluator = dict(_delete_=True, type="LVISFixedAPMetric", ann_file=data_root + "annotations/lvis_od_val.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_mini-lvis.py b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_mini-lvis.py index 70a57a3f581ca1c374dbae71059c7049a20d3a47..9ec831bf58a3fc953375c161f5d8de8f4c289d83 100644 --- a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_mini-lvis.py +++ b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_mini-lvis.py @@ -1,25 +1,22 @@ -_base_ = '../glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py' +_base_ = "../glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py" -model = dict(test_cfg=dict( - max_per_img=300, - chunked_size=40, -)) +model = dict( + test_cfg=dict( + max_per_img=300, + chunked_size=40, + ) +) -dataset_type = 'LVISV1Dataset' -data_root = 'data/coco/' +dataset_type = "LVISV1Dataset" +data_root = "data/coco/" val_dataloader = dict( dataset=dict( - data_root=data_root, - type=dataset_type, - 
ann_file='annotations/lvis_v1_minival_inserted_image_name.json', - data_prefix=dict(img=''))) + data_root=data_root, type=dataset_type, ann_file="annotations/lvis_v1_minival_inserted_image_name.json", data_prefix=dict(img="") + ) +) test_dataloader = val_dataloader # numpy < 1.24.0 -val_evaluator = dict( - _delete_=True, - type='LVISFixedAPMetric', - ann_file=data_root + - 'annotations/lvis_v1_minival_inserted_image_name.json') +val_evaluator = dict(_delete_=True, type="LVISFixedAPMetric", ann_file=data_root + "annotations/lvis_v1_minival_inserted_image_name.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_bc_fpn_dyhead_pretrain_zeroshot_lvis.py b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_bc_fpn_dyhead_pretrain_zeroshot_lvis.py index 6dc712b3bcb4f8dd1018b175d3a4e7f59be3a990..079e2e6dbebc2c4b7a2a455835c173313d6bec19 100644 --- a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_bc_fpn_dyhead_pretrain_zeroshot_lvis.py +++ b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_bc_fpn_dyhead_pretrain_zeroshot_lvis.py @@ -1,3 +1,3 @@ -_base_ = './glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_lvis.py' +_base_ = "./glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_lvis.py" model = dict(bbox_head=dict(early_fuse=True)) diff --git a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_bc_fpn_dyhead_pretrain_zeroshot_mini-lvis.py b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_bc_fpn_dyhead_pretrain_zeroshot_mini-lvis.py index 3babb91101a6dc283ada78911672c7c7433f67ac..48bf0d370077e858d7874ab1b9d17d4defeff9f8 100644 --- a/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_bc_fpn_dyhead_pretrain_zeroshot_mini-lvis.py +++ b/mmpose/configs/mmdet/glip/lvis/glip_atss_swin-t_bc_fpn_dyhead_pretrain_zeroshot_mini-lvis.py @@ -1,3 +1,3 @@ -_base_ = './glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_mini-lvis.py' +_base_ = "./glip_atss_swin-t_a_fpn_dyhead_pretrain_zeroshot_mini-lvis.py" model = dict(bbox_head=dict(early_fuse=True)) diff --git a/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw13.py b/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw13.py index d38effba8c1333a2403c6bc0f20b7fde21c4c47d..b74a99e5b024179587263955e0b05c9be8b1ac55 100644 --- a/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw13.py +++ b/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw13.py @@ -1,36 +1,42 @@ -_base_ = '../glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py' +_base_ = "../glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py" -dataset_type = 'CocoDataset' -data_root = 'data/odinw/' +dataset_type = "CocoDataset" +data_root = "data/odinw/" base_test_pipeline = _base_.test_pipeline -base_test_pipeline[-1]['meta_keys'] = ('img_id', 'img_path', 'ori_shape', - 'img_shape', 'scale_factor', 'text', - 'custom_entities', 'caption_prompt') +base_test_pipeline[-1]["meta_keys"] = ( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "text", + "custom_entities", + "caption_prompt", +) # ---------------------1 AerialMaritimeDrone---------------------# -class_name = ('boat', 'car', 'dock', 'jetski', 'lift') +class_name = ("boat", "car", "dock", "jetski", "lift") metainfo = dict(classes=class_name) -_data_root = data_root + 'AerialMaritimeDrone/large/' +_data_root = data_root + "AerialMaritimeDrone/large/" dataset_AerialMaritimeDrone = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - 
data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), test_mode=True, pipeline=base_test_pipeline, - return_classes=True) + return_classes=True, +) val_evaluator_AerialMaritimeDrone = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------2 Aquarium---------------------# -class_name = ('fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', - 'stingray') +class_name = ("fish", "jellyfish", "penguin", "puffin", "shark", "starfish", "stingray") metainfo = dict(classes=class_name) -_data_root = data_root + 'Aquarium/Aquarium Combined.v2-raw-1024.coco/' +_data_root = data_root + "Aquarium/Aquarium Combined.v2-raw-1024.coco/" caption_prompt = None # caption_prompt = { @@ -48,21 +54,19 @@ dataset_Aquarium = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_Aquarium = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Aquarium = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------3 CottontailRabbits---------------------# -class_name = ('Cottontail-Rabbit', ) +class_name = ("Cottontail-Rabbit",) metainfo = dict(classes=class_name) -_data_root = data_root + 'CottontailRabbits/' +_data_root = data_root + "CottontailRabbits/" caption_prompt = None # caption_prompt = {'Cottontail-Rabbit': {'name': 'rabbit'}} @@ -71,21 +75,19 @@ dataset_CottontailRabbits = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_CottontailRabbits = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_CottontailRabbits = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------4 EgoHands---------------------# -class_name = ('hand', ) +class_name = ("hand",) metainfo = dict(classes=class_name) -_data_root = data_root + 'EgoHands/generic/' +_data_root = data_root + "EgoHands/generic/" caption_prompt = None # caption_prompt = {'hand': {'suffix': ' of a person'}} @@ -94,21 +96,19 @@ dataset_EgoHands = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_EgoHands = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) 
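+# editor's note (assumption): when caption_prompt is not None (see the commented
+# example above), GLIP appears to build each class's text query as
+# prefix + name + suffix, e.g. "hand" -> "hand of a person".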
+val_evaluator_EgoHands = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------5 NorthAmericaMushrooms---------------------# -class_name = ('CoW', 'chanterelle') +class_name = ("CoW", "chanterelle") metainfo = dict(classes=class_name) -_data_root = data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/' # noqa +_data_root = data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/" # noqa caption_prompt = None # caption_prompt = { @@ -124,21 +124,21 @@ dataset_NorthAmericaMushrooms = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_NorthAmericaMushrooms = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------6 Packages---------------------# -class_name = ('package', ) +class_name = ("package",) metainfo = dict(classes=class_name) -_data_root = data_root + 'Packages/Raw/' +_data_root = data_root + "Packages/Raw/" caption_prompt = None # caption_prompt = { @@ -152,60 +152,72 @@ dataset_Packages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_Packages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Packages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------7 PascalVOC---------------------# -class_name = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', - 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', - 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', - 'tvmonitor') +class_name = ( + "aeroplane", + "bicycle", + "bird", + "boat", + "bottle", + "bus", + "car", + "cat", + "chair", + "cow", + "diningtable", + "dog", + "horse", + "motorbike", + "person", + "pottedplant", + "sheep", + "sofa", + "train", + "tvmonitor", +) metainfo = dict(classes=class_name) -_data_root = data_root + 'PascalVOC/' +_data_root = data_root + "PascalVOC/" dataset_PascalVOC = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_PascalVOC = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_PascalVOC = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------8 pistols---------------------# -class_name = ('pistol', ) 
+class_name = ("pistol",) metainfo = dict(classes=class_name) -_data_root = data_root + 'pistols/export/' +_data_root = data_root + "pistols/export/" dataset_pistols = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pistols = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pistols = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------9 pothole---------------------# -class_name = ('pothole', ) +class_name = ("pothole",) metainfo = dict(classes=class_name) -_data_root = data_root + 'pothole/' +_data_root = data_root + "pothole/" caption_prompt = None # caption_prompt = { @@ -220,119 +232,132 @@ dataset_pothole = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pothole = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pothole = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------10 Raccoon---------------------# -class_name = ('raccoon', ) +class_name = ("raccoon",) metainfo = dict(classes=class_name) -_data_root = data_root + 'Raccoon/Raccoon.v2-raw.coco/' +_data_root = data_root + "Raccoon/Raccoon.v2-raw.coco/" dataset_Raccoon = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_Raccoon = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Raccoon = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------11 ShellfishOpenImages---------------------# -class_name = ('Crab', 'Lobster', 'Shrimp') +class_name = ("Crab", "Lobster", "Shrimp") metainfo = dict(classes=class_name) -_data_root = data_root + 'ShellfishOpenImages/raw/' +_data_root = data_root + "ShellfishOpenImages/raw/" dataset_ShellfishOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_ShellfishOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------12 thermalDogsAndPeople---------------------# 
-class_name = ('dog', 'person') +class_name = ("dog", "person") metainfo = dict(classes=class_name) -_data_root = data_root + 'thermalDogsAndPeople/' +_data_root = data_root + "thermalDogsAndPeople/" dataset_thermalDogsAndPeople = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_thermalDogsAndPeople = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------13 VehiclesOpenImages---------------------# -class_name = ('Ambulance', 'Bus', 'Car', 'Motorcycle', 'Truck') +class_name = ("Ambulance", "Bus", "Car", "Motorcycle", "Truck") metainfo = dict(classes=class_name) -_data_root = data_root + 'VehiclesOpenImages/416x416/' +_data_root = data_root + "VehiclesOpenImages/416x416/" dataset_VehiclesOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_VehiclesOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_VehiclesOpenImages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # --------------------- Config---------------------# dataset_prefixes = [ - 'AerialMaritimeDrone', 'Aquarium', 'CottontailRabbits', 'EgoHands', - 'NorthAmericaMushrooms', 'Packages', 'PascalVOC', 'pistols', 'pothole', - 'Raccoon', 'ShellfishOpenImages', 'thermalDogsAndPeople', - 'VehiclesOpenImages' + "AerialMaritimeDrone", + "Aquarium", + "CottontailRabbits", + "EgoHands", + "NorthAmericaMushrooms", + "Packages", + "PascalVOC", + "pistols", + "pothole", + "Raccoon", + "ShellfishOpenImages", + "thermalDogsAndPeople", + "VehiclesOpenImages", ] datasets = [ - dataset_AerialMaritimeDrone, dataset_Aquarium, dataset_CottontailRabbits, - dataset_EgoHands, dataset_NorthAmericaMushrooms, dataset_Packages, - dataset_PascalVOC, dataset_pistols, dataset_pothole, dataset_Raccoon, - dataset_ShellfishOpenImages, dataset_thermalDogsAndPeople, - dataset_VehiclesOpenImages + dataset_AerialMaritimeDrone, + dataset_Aquarium, + dataset_CottontailRabbits, + dataset_EgoHands, + dataset_NorthAmericaMushrooms, + dataset_Packages, + dataset_PascalVOC, + dataset_pistols, + dataset_pothole, + dataset_Raccoon, + dataset_ShellfishOpenImages, + dataset_thermalDogsAndPeople, + dataset_VehiclesOpenImages, ] metrics = [ - val_evaluator_AerialMaritimeDrone, val_evaluator_Aquarium, - val_evaluator_CottontailRabbits, val_evaluator_EgoHands, - val_evaluator_NorthAmericaMushrooms, val_evaluator_Packages, - val_evaluator_PascalVOC, val_evaluator_pistols, val_evaluator_pothole, - val_evaluator_Raccoon, val_evaluator_ShellfishOpenImages, - val_evaluator_thermalDogsAndPeople, val_evaluator_VehiclesOpenImages + val_evaluator_AerialMaritimeDrone, + val_evaluator_Aquarium, + val_evaluator_CottontailRabbits, + val_evaluator_EgoHands, + 
val_evaluator_NorthAmericaMushrooms, + val_evaluator_Packages, + val_evaluator_PascalVOC, + val_evaluator_pistols, + val_evaluator_pothole, + val_evaluator_Raccoon, + val_evaluator_ShellfishOpenImages, + val_evaluator_thermalDogsAndPeople, + val_evaluator_VehiclesOpenImages, ] # -------------------------------------------------# -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw35.py b/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw35.py index 2eaf09ed771978397b9d67048b371724418e50aa..74cecfa8a378b9f381508a739c583e0bca1ab0a6 100644 --- a/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw35.py +++ b/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw35.py @@ -1,794 +1,949 @@ -_base_ = '../glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py' +_base_ = "../glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py" -dataset_type = 'CocoDataset' -data_root = 'data/odinw/' +dataset_type = "CocoDataset" +data_root = "data/odinw/" base_test_pipeline = _base_.test_pipeline -base_test_pipeline[-1]['meta_keys'] = ('img_id', 'img_path', 'ori_shape', - 'img_shape', 'scale_factor', 'text', - 'custom_entities', 'caption_prompt') +base_test_pipeline[-1]["meta_keys"] = ( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "text", + "custom_entities", + "caption_prompt", +) # ---------------------1 AerialMaritimeDrone_large---------------------# -class_name = ('boat', 'car', 'dock', 'jetski', 'lift') +class_name = ("boat", "car", "dock", "jetski", "lift") metainfo = dict(classes=class_name) -_data_root = data_root + 'AerialMaritimeDrone/large/' +_data_root = data_root + "AerialMaritimeDrone/large/" dataset_AerialMaritimeDrone_large = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_AerialMaritimeDrone_large = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------2 AerialMaritimeDrone_tiled---------------------# -class_name = ('boat', 'car', 'dock', 'jetski', 'lift') +class_name = ("boat", "car", "dock", "jetski", "lift") metainfo = dict(classes=class_name) -_data_root = data_root + 'AerialMaritimeDrone/tiled/' +_data_root = data_root + "AerialMaritimeDrone/tiled/" dataset_AerialMaritimeDrone_tiled = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - 
return_classes=True) + return_classes=True, +) val_evaluator_AerialMaritimeDrone_tiled = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------3 AmericanSignLanguageLetters---------------------# -class_name = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', - 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z') -metainfo = dict(classes=class_name) -_data_root = data_root + 'AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/' # noqa +class_name = ( + "A", + "B", + "C", + "D", + "E", + "F", + "G", + "H", + "I", + "J", + "K", + "L", + "M", + "N", + "O", + "P", + "Q", + "R", + "S", + "T", + "U", + "V", + "W", + "X", + "Y", + "Z", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/" # noqa dataset_AmericanSignLanguageLetters = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_AmericanSignLanguageLetters = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------4 Aquarium---------------------# -class_name = ('fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', - 'stingray') +class_name = ("fish", "jellyfish", "penguin", "puffin", "shark", "starfish", "stingray") metainfo = dict(classes=class_name) -_data_root = data_root + 'Aquarium/Aquarium Combined.v2-raw-1024.coco/' +_data_root = data_root + "Aquarium/Aquarium Combined.v2-raw-1024.coco/" dataset_Aquarium = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_Aquarium = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Aquarium = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------5 BCCD---------------------# -class_name = ('Platelets', 'RBC', 'WBC') +class_name = ("Platelets", "RBC", "WBC") metainfo = dict(classes=class_name) -_data_root = data_root + 'BCCD/BCCD.v3-raw.coco/' +_data_root = data_root + "BCCD/BCCD.v3-raw.coco/" dataset_BCCD = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_BCCD = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_BCCD = dict(type="CocoMetric", ann_file=_data_root + 
"valid/annotations_without_background.json", metric="bbox") # ---------------------6 boggleBoards---------------------# -class_name = ('Q', 'a', 'an', 'b', 'c', 'd', 'e', 'er', 'f', 'g', 'h', 'he', - 'i', 'in', 'j', 'k', 'l', 'm', 'n', 'o', 'o ', 'p', 'q', 'qu', - 'r', 's', 't', 't\\', 'th', 'u', 'v', 'w', 'wild', 'x', 'y', 'z') -metainfo = dict(classes=class_name) -_data_root = data_root + 'boggleBoards/416x416AutoOrient/export/' +class_name = ( + "Q", + "a", + "an", + "b", + "c", + "d", + "e", + "er", + "f", + "g", + "h", + "he", + "i", + "in", + "j", + "k", + "l", + "m", + "n", + "o", + "o ", + "p", + "q", + "qu", + "r", + "s", + "t", + "t\\", + "th", + "u", + "v", + "w", + "wild", + "x", + "y", + "z", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "boggleBoards/416x416AutoOrient/export/" dataset_boggleBoards = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_boggleBoards = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_boggleBoards = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------7 brackishUnderwater---------------------# -class_name = ('crab', 'fish', 'jellyfish', 'shrimp', 'small_fish', 'starfish') +class_name = ("crab", "fish", "jellyfish", "shrimp", "small_fish", "starfish") metainfo = dict(classes=class_name) -_data_root = data_root + 'brackishUnderwater/960x540/' +_data_root = data_root + "brackishUnderwater/960x540/" dataset_brackishUnderwater = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_brackishUnderwater = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_brackishUnderwater = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------8 ChessPieces---------------------# -class_name = (' ', 'black bishop', 'black king', 'black knight', 'black pawn', - 'black queen', 'black rook', 'white bishop', 'white king', - 'white knight', 'white pawn', 'white queen', 'white rook') -metainfo = dict(classes=class_name) -_data_root = data_root + 'ChessPieces/Chess Pieces.v23-raw.coco/' +class_name = ( + " ", + "black bishop", + "black king", + "black knight", + "black pawn", + "black queen", + "black rook", + "white bishop", + "white king", + "white knight", + "white pawn", + "white queen", + "white rook", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "ChessPieces/Chess Pieces.v23-raw.coco/" dataset_ChessPieces = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/new_annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/new_annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_ChessPieces = 
dict( - type='CocoMetric', - ann_file=_data_root + 'valid/new_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_ChessPieces = dict(type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox") # ---------------------9 CottontailRabbits---------------------# -class_name = ('rabbit', ) +class_name = ("rabbit",) metainfo = dict(classes=class_name) -_data_root = data_root + 'CottontailRabbits/' +_data_root = data_root + "CottontailRabbits/" dataset_CottontailRabbits = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/new_annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/new_annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_CottontailRabbits = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/new_annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox" +) # ---------------------10 dice---------------------# -class_name = ('1', '2', '3', '4', '5', '6') +class_name = ("1", "2", "3", "4", "5", "6") metainfo = dict(classes=class_name) -_data_root = data_root + 'dice/mediumColor/export/' +_data_root = data_root + "dice/mediumColor/export/" dataset_dice = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_dice = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_dice = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------11 DroneControl---------------------# -class_name = ('follow', 'follow_hand', 'land', 'land_hand', 'null', 'object', - 'takeoff', 'takeoff-hand') +class_name = ("follow", "follow_hand", "land", "land_hand", "null", "object", "takeoff", "takeoff-hand") metainfo = dict(classes=class_name) -_data_root = data_root + 'DroneControl/Drone Control.v3-raw.coco/' +_data_root = data_root + "DroneControl/Drone Control.v3-raw.coco/" dataset_DroneControl = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_DroneControl = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_DroneControl = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------12 EgoHands_generic---------------------# -class_name = ('hand', ) +class_name = ("hand",) metainfo = dict(classes=class_name) -_data_root = data_root + 'EgoHands/generic/' -caption_prompt = {'hand': {'suffix': ' of a person'}} +_data_root = data_root + "EgoHands/generic/" +caption_prompt = {"hand": {"suffix": " of a person"}} dataset_EgoHands_generic = dict( type=dataset_type, 
metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_EgoHands_generic = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_EgoHands_generic = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------13 EgoHands_specific---------------------# -class_name = ('myleft', 'myright', 'yourleft', 'yourright') +class_name = ("myleft", "myright", "yourleft", "yourright") metainfo = dict(classes=class_name) -_data_root = data_root + 'EgoHands/specific/' +_data_root = data_root + "EgoHands/specific/" dataset_EgoHands_specific = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_EgoHands_specific = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_EgoHands_specific = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------14 HardHatWorkers---------------------# -class_name = ('head', 'helmet', 'person') +class_name = ("head", "helmet", "person") metainfo = dict(classes=class_name) -_data_root = data_root + 'HardHatWorkers/raw/' +_data_root = data_root + "HardHatWorkers/raw/" dataset_HardHatWorkers = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_HardHatWorkers = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_HardHatWorkers = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------15 MaskWearing---------------------# -class_name = ('mask', 'no-mask') +class_name = ("mask", "no-mask") metainfo = dict(classes=class_name) -_data_root = data_root + 'MaskWearing/raw/' +_data_root = data_root + "MaskWearing/raw/" dataset_MaskWearing = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_MaskWearing = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_MaskWearing = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------16 MountainDewCommercial---------------------# -class_name = ('bottle', ) +class_name = 
("bottle",) metainfo = dict(classes=class_name) -_data_root = data_root + 'MountainDewCommercial/' +_data_root = data_root + "MountainDewCommercial/" dataset_MountainDewCommercial = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_MountainDewCommercial = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------17 NorthAmericaMushrooms---------------------# -class_name = ('flat mushroom', 'yellow mushroom') +class_name = ("flat mushroom", "yellow mushroom") metainfo = dict(classes=class_name) -_data_root = data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/' # noqa +_data_root = data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/" # noqa dataset_NorthAmericaMushrooms = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/new_annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/new_annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_NorthAmericaMushrooms = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/new_annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox" +) # ---------------------18 openPoetryVision---------------------# -class_name = ('American Typewriter', 'Andale Mono', 'Apple Chancery', 'Arial', - 'Avenir', 'Baskerville', 'Big Caslon', 'Bradley Hand', - 'Brush Script MT', 'Chalkboard', 'Comic Sans MS', 'Copperplate', - 'Courier', 'Didot', 'Futura', 'Geneva', 'Georgia', 'Gill Sans', - 'Helvetica', 'Herculanum', 'Impact', 'Kefa', 'Lucida Grande', - 'Luminari', 'Marker Felt', 'Menlo', 'Monaco', 'Noteworthy', - 'Optima', 'PT Sans', 'PT Serif', 'Palatino', 'Papyrus', - 'Phosphate', 'Rockwell', 'SF Pro', 'SignPainter', 'Skia', - 'Snell Roundhand', 'Tahoma', 'Times New Roman', 'Trebuchet MS', - 'Verdana') -metainfo = dict(classes=class_name) -_data_root = data_root + 'openPoetryVision/512x512/' +class_name = ( + "American Typewriter", + "Andale Mono", + "Apple Chancery", + "Arial", + "Avenir", + "Baskerville", + "Big Caslon", + "Bradley Hand", + "Brush Script MT", + "Chalkboard", + "Comic Sans MS", + "Copperplate", + "Courier", + "Didot", + "Futura", + "Geneva", + "Georgia", + "Gill Sans", + "Helvetica", + "Herculanum", + "Impact", + "Kefa", + "Lucida Grande", + "Luminari", + "Marker Felt", + "Menlo", + "Monaco", + "Noteworthy", + "Optima", + "PT Sans", + "PT Serif", + "Palatino", + "Papyrus", + "Phosphate", + "Rockwell", + "SF Pro", + "SignPainter", + "Skia", + "Snell Roundhand", + "Tahoma", + "Times New Roman", + "Trebuchet MS", + "Verdana", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "openPoetryVision/512x512/" dataset_openPoetryVision = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + 
ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_openPoetryVision = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_openPoetryVision = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------19 OxfordPets_by_breed---------------------# -class_name = ('cat-Abyssinian', 'cat-Bengal', 'cat-Birman', 'cat-Bombay', - 'cat-British_Shorthair', 'cat-Egyptian_Mau', 'cat-Maine_Coon', - 'cat-Persian', 'cat-Ragdoll', 'cat-Russian_Blue', 'cat-Siamese', - 'cat-Sphynx', 'dog-american_bulldog', - 'dog-american_pit_bull_terrier', 'dog-basset_hound', - 'dog-beagle', 'dog-boxer', 'dog-chihuahua', - 'dog-english_cocker_spaniel', 'dog-english_setter', - 'dog-german_shorthaired', 'dog-great_pyrenees', 'dog-havanese', - 'dog-japanese_chin', 'dog-keeshond', 'dog-leonberger', - 'dog-miniature_pinscher', 'dog-newfoundland', 'dog-pomeranian', - 'dog-pug', 'dog-saint_bernard', 'dog-samoyed', - 'dog-scottish_terrier', 'dog-shiba_inu', - 'dog-staffordshire_bull_terrier', 'dog-wheaten_terrier', - 'dog-yorkshire_terrier') -metainfo = dict(classes=class_name) -_data_root = data_root + 'OxfordPets/by-breed/' # noqa +class_name = ( + "cat-Abyssinian", + "cat-Bengal", + "cat-Birman", + "cat-Bombay", + "cat-British_Shorthair", + "cat-Egyptian_Mau", + "cat-Maine_Coon", + "cat-Persian", + "cat-Ragdoll", + "cat-Russian_Blue", + "cat-Siamese", + "cat-Sphynx", + "dog-american_bulldog", + "dog-american_pit_bull_terrier", + "dog-basset_hound", + "dog-beagle", + "dog-boxer", + "dog-chihuahua", + "dog-english_cocker_spaniel", + "dog-english_setter", + "dog-german_shorthaired", + "dog-great_pyrenees", + "dog-havanese", + "dog-japanese_chin", + "dog-keeshond", + "dog-leonberger", + "dog-miniature_pinscher", + "dog-newfoundland", + "dog-pomeranian", + "dog-pug", + "dog-saint_bernard", + "dog-samoyed", + "dog-scottish_terrier", + "dog-shiba_inu", + "dog-staffordshire_bull_terrier", + "dog-wheaten_terrier", + "dog-yorkshire_terrier", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "OxfordPets/by-breed/" # noqa dataset_OxfordPets_by_breed = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_OxfordPets_by_breed = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------20 OxfordPets_by_species---------------------# -class_name = ('cat', 'dog') +class_name = ("cat", "dog") metainfo = dict(classes=class_name) -_data_root = data_root + 'OxfordPets/by-species/' # noqa +_data_root = data_root + "OxfordPets/by-species/" # noqa dataset_OxfordPets_by_species = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - 
return_classes=True) + return_classes=True, +) val_evaluator_OxfordPets_by_species = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------21 PKLot---------------------# -class_name = ('space-empty', 'space-occupied') +class_name = ("space-empty", "space-occupied") metainfo = dict(classes=class_name) -_data_root = data_root + 'PKLot/640/' # noqa +_data_root = data_root + "PKLot/640/" # noqa dataset_PKLot = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_PKLot = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_PKLot = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------22 Packages---------------------# -class_name = ('package', ) -metainfo = dict(classes=class_name) -_data_root = data_root + 'Packages/Raw/' -caption_prompt = { - 'package': { - 'prefix': 'there is a ', - 'suffix': ' on the porch' - } -} +class_name = ("package",) +metainfo = dict(classes=class_name) +_data_root = data_root + "Packages/Raw/" +caption_prompt = {"package": {"prefix": "there is a ", "suffix": " on the porch"}} dataset_Packages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_Packages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Packages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------23 PascalVOC---------------------# -class_name = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', - 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', - 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', - 'tvmonitor') -metainfo = dict(classes=class_name) -_data_root = data_root + 'PascalVOC/' +class_name = ( + "aeroplane", + "bicycle", + "bird", + "boat", + "bottle", + "bus", + "car", + "cat", + "chair", + "cow", + "diningtable", + "dog", + "horse", + "motorbike", + "person", + "pottedplant", + "sheep", + "sofa", + "train", + "tvmonitor", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "PascalVOC/" dataset_PascalVOC = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_PascalVOC = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_PascalVOC = dict(type="CocoMetric", 
ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------24 pistols---------------------# -class_name = ('pistol', ) +class_name = ("pistol",) metainfo = dict(classes=class_name) -_data_root = data_root + 'pistols/export/' +_data_root = data_root + "pistols/export/" dataset_pistols = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pistols = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pistols = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------25 plantdoc---------------------# -class_name = ('Apple Scab Leaf', 'Apple leaf', 'Apple rust leaf', - 'Bell_pepper leaf', 'Bell_pepper leaf spot', 'Blueberry leaf', - 'Cherry leaf', 'Corn Gray leaf spot', 'Corn leaf blight', - 'Corn rust leaf', 'Peach leaf', 'Potato leaf', - 'Potato leaf early blight', 'Potato leaf late blight', - 'Raspberry leaf', 'Soyabean leaf', 'Soybean leaf', - 'Squash Powdery mildew leaf', 'Strawberry leaf', - 'Tomato Early blight leaf', 'Tomato Septoria leaf spot', - 'Tomato leaf', 'Tomato leaf bacterial spot', - 'Tomato leaf late blight', 'Tomato leaf mosaic virus', - 'Tomato leaf yellow virus', 'Tomato mold leaf', - 'Tomato two spotted spider mites leaf', 'grape leaf', - 'grape leaf black rot') -metainfo = dict(classes=class_name) -_data_root = data_root + 'plantdoc/416x416/' +class_name = ( + "Apple Scab Leaf", + "Apple leaf", + "Apple rust leaf", + "Bell_pepper leaf", + "Bell_pepper leaf spot", + "Blueberry leaf", + "Cherry leaf", + "Corn Gray leaf spot", + "Corn leaf blight", + "Corn rust leaf", + "Peach leaf", + "Potato leaf", + "Potato leaf early blight", + "Potato leaf late blight", + "Raspberry leaf", + "Soyabean leaf", + "Soybean leaf", + "Squash Powdery mildew leaf", + "Strawberry leaf", + "Tomato Early blight leaf", + "Tomato Septoria leaf spot", + "Tomato leaf", + "Tomato leaf bacterial spot", + "Tomato leaf late blight", + "Tomato leaf mosaic virus", + "Tomato leaf yellow virus", + "Tomato mold leaf", + "Tomato two spotted spider mites leaf", + "grape leaf", + "grape leaf black rot", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "plantdoc/416x416/" dataset_plantdoc = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_plantdoc = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_plantdoc = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------26 pothole---------------------# -class_name = ('pothole', ) -metainfo = dict(classes=class_name) -_data_root = data_root + 'pothole/' -caption_prompt = { - 'pothole': { - 'name': 'holes', - 'prefix': 'there are some ', - 'suffix': ' on the road' - } -} +class_name = ("pothole",) +metainfo = dict(classes=class_name) 
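+# editor's note: the caption_prompt below rewrites this class's query from
+# "pothole" to "there are some holes on the road" (prefix + name + suffix).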
+_data_root = data_root + "pothole/" +caption_prompt = {"pothole": {"name": "holes", "prefix": "there are some ", "suffix": " on the road"}} dataset_pothole = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), caption_prompt=caption_prompt, pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pothole = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pothole = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------27 Raccoon---------------------# -class_name = ('raccoon', ) +class_name = ("raccoon",) metainfo = dict(classes=class_name) -_data_root = data_root + 'Raccoon/Raccoon.v2-raw.coco/' +_data_root = data_root + "Raccoon/Raccoon.v2-raw.coco/" dataset_Raccoon = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_Raccoon = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Raccoon = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------28 selfdrivingCar---------------------# -class_name = ('biker', 'car', 'pedestrian', 'trafficLight', - 'trafficLight-Green', 'trafficLight-GreenLeft', - 'trafficLight-Red', 'trafficLight-RedLeft', - 'trafficLight-Yellow', 'trafficLight-YellowLeft', 'truck') -metainfo = dict(classes=class_name) -_data_root = data_root + 'selfdrivingCar/fixedLarge/export/' +class_name = ( + "biker", + "car", + "pedestrian", + "trafficLight", + "trafficLight-Green", + "trafficLight-GreenLeft", + "trafficLight-Red", + "trafficLight-RedLeft", + "trafficLight-Yellow", + "trafficLight-YellowLeft", + "truck", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "selfdrivingCar/fixedLarge/export/" dataset_selfdrivingCar = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_selfdrivingCar = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_selfdrivingCar = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------29 ShellfishOpenImages---------------------# -class_name = ('Crab', 'Lobster', 'Shrimp') +class_name = ("Crab", "Lobster", "Shrimp") metainfo = dict(classes=class_name) -_data_root = data_root + 'ShellfishOpenImages/raw/' +_data_root = data_root + "ShellfishOpenImages/raw/" dataset_ShellfishOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + 
ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_ShellfishOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------30 ThermalCheetah---------------------# -class_name = ('cheetah', 'human') +class_name = ("cheetah", "human") metainfo = dict(classes=class_name) -_data_root = data_root + 'ThermalCheetah/' +_data_root = data_root + "ThermalCheetah/" dataset_ThermalCheetah = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_ThermalCheetah = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_ThermalCheetah = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------31 thermalDogsAndPeople---------------------# -class_name = ('dog', 'person') +class_name = ("dog", "person") metainfo = dict(classes=class_name) -_data_root = data_root + 'thermalDogsAndPeople/' +_data_root = data_root + "thermalDogsAndPeople/" dataset_thermalDogsAndPeople = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_thermalDogsAndPeople = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------32 UnoCards---------------------# -class_name = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', - '12', '13', '14') +class_name = ("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14") metainfo = dict(classes=class_name) -_data_root = data_root + 'UnoCards/raw/' +_data_root = data_root + "UnoCards/raw/" dataset_UnoCards = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_UnoCards = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_UnoCards = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------33 VehiclesOpenImages---------------------# -class_name = ('Ambulance', 'Bus', 'Car', 'Motorcycle', 'Truck') +class_name = ("Ambulance", "Bus", "Car", "Motorcycle", "Truck") metainfo = dict(classes=class_name) -_data_root = data_root + 'VehiclesOpenImages/416x416/' +_data_root = 
data_root + "VehiclesOpenImages/416x416/" dataset_VehiclesOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_VehiclesOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_VehiclesOpenImages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------34 WildfireSmoke---------------------# -class_name = ('smoke', ) +class_name = ("smoke",) metainfo = dict(classes=class_name) -_data_root = data_root + 'WildfireSmoke/' +_data_root = data_root + "WildfireSmoke/" dataset_WildfireSmoke = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_WildfireSmoke = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_WildfireSmoke = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------35 websiteScreenshots---------------------# -class_name = ('button', 'field', 'heading', 'iframe', 'image', 'label', 'link', - 'text') +class_name = ("button", "field", "heading", "iframe", "image", "label", "link", "text") metainfo = dict(classes=class_name) -_data_root = data_root + 'websiteScreenshots/' +_data_root = data_root + "websiteScreenshots/" dataset_websiteScreenshots = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_websiteScreenshots = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_websiteScreenshots = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # --------------------- Config---------------------# dataset_prefixes = [ - 'AerialMaritimeDrone_large', - 'AerialMaritimeDrone_tiled', - 'AmericanSignLanguageLetters', - 'Aquarium', - 'BCCD', - 'boggleBoards', - 'brackishUnderwater', - 'ChessPieces', - 'CottontailRabbits', - 'dice', - 'DroneControl', - 'EgoHands_generic', - 'EgoHands_specific', - 'HardHatWorkers', - 'MaskWearing', - 'MountainDewCommercial', - 'NorthAmericaMushrooms', - 'openPoetryVision', - 'OxfordPets_by_breed', - 'OxfordPets_by_species', - 'PKLot', - 'Packages', - 'PascalVOC', - 'pistols', - 'plantdoc', - 'pothole', - 'Raccoons', - 'selfdrivingCar', - 'ShellfishOpenImages', - 'ThermalCheetah', - 'thermalDogsAndPeople', - 'UnoCards', - 'VehiclesOpenImages', - 'WildfireSmoke', - 'websiteScreenshots', + "AerialMaritimeDrone_large", + "AerialMaritimeDrone_tiled", + "AmericanSignLanguageLetters", + "Aquarium", + "BCCD", + "boggleBoards", + 
"brackishUnderwater", + "ChessPieces", + "CottontailRabbits", + "dice", + "DroneControl", + "EgoHands_generic", + "EgoHands_specific", + "HardHatWorkers", + "MaskWearing", + "MountainDewCommercial", + "NorthAmericaMushrooms", + "openPoetryVision", + "OxfordPets_by_breed", + "OxfordPets_by_species", + "PKLot", + "Packages", + "PascalVOC", + "pistols", + "plantdoc", + "pothole", + "Raccoons", + "selfdrivingCar", + "ShellfishOpenImages", + "ThermalCheetah", + "thermalDogsAndPeople", + "UnoCards", + "VehiclesOpenImages", + "WildfireSmoke", + "websiteScreenshots", ] datasets = [ - dataset_AerialMaritimeDrone_large, dataset_AerialMaritimeDrone_tiled, - dataset_AmericanSignLanguageLetters, dataset_Aquarium, dataset_BCCD, - dataset_boggleBoards, dataset_brackishUnderwater, dataset_ChessPieces, - dataset_CottontailRabbits, dataset_dice, dataset_DroneControl, - dataset_EgoHands_generic, dataset_EgoHands_specific, - dataset_HardHatWorkers, dataset_MaskWearing, dataset_MountainDewCommercial, - dataset_NorthAmericaMushrooms, dataset_openPoetryVision, - dataset_OxfordPets_by_breed, dataset_OxfordPets_by_species, dataset_PKLot, - dataset_Packages, dataset_PascalVOC, dataset_pistols, dataset_plantdoc, - dataset_pothole, dataset_Raccoon, dataset_selfdrivingCar, - dataset_ShellfishOpenImages, dataset_ThermalCheetah, - dataset_thermalDogsAndPeople, dataset_UnoCards, dataset_VehiclesOpenImages, - dataset_WildfireSmoke, dataset_websiteScreenshots + dataset_AerialMaritimeDrone_large, + dataset_AerialMaritimeDrone_tiled, + dataset_AmericanSignLanguageLetters, + dataset_Aquarium, + dataset_BCCD, + dataset_boggleBoards, + dataset_brackishUnderwater, + dataset_ChessPieces, + dataset_CottontailRabbits, + dataset_dice, + dataset_DroneControl, + dataset_EgoHands_generic, + dataset_EgoHands_specific, + dataset_HardHatWorkers, + dataset_MaskWearing, + dataset_MountainDewCommercial, + dataset_NorthAmericaMushrooms, + dataset_openPoetryVision, + dataset_OxfordPets_by_breed, + dataset_OxfordPets_by_species, + dataset_PKLot, + dataset_Packages, + dataset_PascalVOC, + dataset_pistols, + dataset_plantdoc, + dataset_pothole, + dataset_Raccoon, + dataset_selfdrivingCar, + dataset_ShellfishOpenImages, + dataset_ThermalCheetah, + dataset_thermalDogsAndPeople, + dataset_UnoCards, + dataset_VehiclesOpenImages, + dataset_WildfireSmoke, + dataset_websiteScreenshots, ] metrics = [ val_evaluator_AerialMaritimeDrone_large, val_evaluator_AerialMaritimeDrone_tiled, - val_evaluator_AmericanSignLanguageLetters, val_evaluator_Aquarium, - val_evaluator_BCCD, val_evaluator_boggleBoards, - val_evaluator_brackishUnderwater, val_evaluator_ChessPieces, - val_evaluator_CottontailRabbits, val_evaluator_dice, - val_evaluator_DroneControl, val_evaluator_EgoHands_generic, - val_evaluator_EgoHands_specific, val_evaluator_HardHatWorkers, - val_evaluator_MaskWearing, val_evaluator_MountainDewCommercial, - val_evaluator_NorthAmericaMushrooms, val_evaluator_openPoetryVision, - val_evaluator_OxfordPets_by_breed, val_evaluator_OxfordPets_by_species, - val_evaluator_PKLot, val_evaluator_Packages, val_evaluator_PascalVOC, - val_evaluator_pistols, val_evaluator_plantdoc, val_evaluator_pothole, - val_evaluator_Raccoon, val_evaluator_selfdrivingCar, - val_evaluator_ShellfishOpenImages, val_evaluator_ThermalCheetah, - val_evaluator_thermalDogsAndPeople, val_evaluator_UnoCards, - val_evaluator_VehiclesOpenImages, val_evaluator_WildfireSmoke, - val_evaluator_websiteScreenshots + val_evaluator_AmericanSignLanguageLetters, + val_evaluator_Aquarium, + 
+    val_evaluator_BCCD,
+    val_evaluator_boggleBoards,
+    val_evaluator_brackishUnderwater,
+    val_evaluator_ChessPieces,
+    val_evaluator_CottontailRabbits,
+    val_evaluator_dice,
+    val_evaluator_DroneControl,
+    val_evaluator_EgoHands_generic,
+    val_evaluator_EgoHands_specific,
+    val_evaluator_HardHatWorkers,
+    val_evaluator_MaskWearing,
+    val_evaluator_MountainDewCommercial,
+    val_evaluator_NorthAmericaMushrooms,
+    val_evaluator_openPoetryVision,
+    val_evaluator_OxfordPets_by_breed,
+    val_evaluator_OxfordPets_by_species,
+    val_evaluator_PKLot,
+    val_evaluator_Packages,
+    val_evaluator_PascalVOC,
+    val_evaluator_pistols,
+    val_evaluator_plantdoc,
+    val_evaluator_pothole,
+    val_evaluator_Raccoon,
+    val_evaluator_selfdrivingCar,
+    val_evaluator_ShellfishOpenImages,
+    val_evaluator_ThermalCheetah,
+    val_evaluator_thermalDogsAndPeople,
+    val_evaluator_UnoCards,
+    val_evaluator_VehiclesOpenImages,
+    val_evaluator_WildfireSmoke,
+    val_evaluator_websiteScreenshots,
 ]

 # -------------------------------------------------#
-val_dataloader = dict(
    dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets))
+val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets))
 test_dataloader = val_dataloader

-val_evaluator = dict(
-    _delete_=True,
-    type='MultiDatasetsEvaluator',
-    metrics=metrics,
-    dataset_prefixes=dataset_prefixes)
+val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes)
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_bc_fpn_dyhead_pretrain_odinw13.py b/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_bc_fpn_dyhead_pretrain_odinw13.py
index c3479b62b781fa38282b26ab69763d1766301dc7..247e73e7ca6561c7474235961e1bbb2e426ed70e 100644
--- a/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_bc_fpn_dyhead_pretrain_odinw13.py
+++ b/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_bc_fpn_dyhead_pretrain_odinw13.py
@@ -1,3 +1,3 @@
-_base_ = './glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw13.py'
+_base_ = "./glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw13.py"

 model = dict(bbox_head=dict(early_fuse=True))
diff --git a/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_bc_fpn_dyhead_pretrain_odinw35.py b/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_bc_fpn_dyhead_pretrain_odinw35.py
index 182afc66c93441da85d7e0116970e45a58c492d0..f002e1d0c88a43854b3c11e9e5c3cf3dbba8fa3f 100644
--- a/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_bc_fpn_dyhead_pretrain_odinw35.py
+++ b/mmpose/configs/mmdet/glip/odinw/glip_atss_swin-t_bc_fpn_dyhead_pretrain_odinw35.py
@@ -1,3 +1,3 @@
-_base_ = './glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw35.py'
+_base_ = "./glip_atss_swin-t_a_fpn_dyhead_pretrain_odinw35.py"

 model = dict(bbox_head=dict(early_fuse=True))
diff --git a/mmpose/configs/mmdet/glip/odinw/override_category.py b/mmpose/configs/mmdet/glip/odinw/override_category.py
index 9ff05fc6e5e4d0989cf7fcf7af4dc902ee99f3a3..aeadada4e6f6c8f154ca6c413e573793b2189e48 100644
--- a/mmpose/configs/mmdet/glip/odinw/override_category.py
+++ b/mmpose/configs/mmdet/glip/odinw/override_category.py
@@ -5,105 +5,52 @@ import mmengine


 def parse_args():
-    parser = argparse.ArgumentParser(description='Override Category')
-    parser.add_argument('data_root')
+    parser = argparse.ArgumentParser(description="Override Category")
+    parser.add_argument("data_root")
     return parser.parse_args()


 def main():
     args = parse_args()

-    ChessPieces = [{
-        'id': 1,
-        'name': ' ',
-        'supercategory': 'pieces'
-    }, {
-        'id': 2,
-        'name': 'black bishop',
-        'supercategory': 'pieces'
-    }, {
-        'id': 3,
-        'name': 'black king',
-        'supercategory': 'pieces'
-    }, {
-        'id': 4,
-        'name': 'black knight',
-        'supercategory': 'pieces'
-    }, {
-        'id': 5,
-        'name': 'black pawn',
-        'supercategory': 'pieces'
-    }, {
-        'id': 6,
-        'name': 'black queen',
-        'supercategory': 'pieces'
-    }, {
-        'id': 7,
-        'name': 'black rook',
-        'supercategory': 'pieces'
-    }, {
-        'id': 8,
-        'name': 'white bishop',
-        'supercategory': 'pieces'
-    }, {
-        'id': 9,
-        'name': 'white king',
-        'supercategory': 'pieces'
-    }, {
-        'id': 10,
-        'name': 'white knight',
-        'supercategory': 'pieces'
-    }, {
-        'id': 11,
-        'name': 'white pawn',
-        'supercategory': 'pieces'
-    }, {
-        'id': 12,
-        'name': 'white queen',
-        'supercategory': 'pieces'
-    }, {
-        'id': 13,
-        'name': 'white rook',
-        'supercategory': 'pieces'
-    }]
-
-    _data_root = args.data_root + 'ChessPieces/Chess Pieces.v23-raw.coco/'
-    json_data = mmengine.load(_data_root +
-                              'valid/annotations_without_background.json')
-    json_data['categories'] = ChessPieces
-    mmengine.dump(json_data,
-                  _data_root + 'valid/new_annotations_without_background.json')
-
-    CottontailRabbits = [{
-        'id': 1,
-        'name': 'rabbit',
-        'supercategory': 'Cottontail-Rabbit'
-    }]
-
-    _data_root = args.data_root + 'CottontailRabbits/'
-    json_data = mmengine.load(_data_root +
-                              'valid/annotations_without_background.json')
-    json_data['categories'] = CottontailRabbits
-    mmengine.dump(json_data,
-                  _data_root + 'valid/new_annotations_without_background.json')
-
-    NorthAmericaMushrooms = [{
-        'id': 1,
-        'name': 'flat mushroom',
-        'supercategory': 'mushroom'
-    }, {
-        'id': 2,
-        'name': 'yellow mushroom',
-        'supercategory': 'mushroom'
-    }]
-
-    _data_root = args.data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/'  # noqa
-    json_data = mmengine.load(_data_root +
-                              'valid/annotations_without_background.json')
-    json_data['categories'] = NorthAmericaMushrooms
-    mmengine.dump(json_data,
-                  _data_root + 'valid/new_annotations_without_background.json')
-
-
-if __name__ == '__main__':
+    ChessPieces = [
+        {"id": 1, "name": " ", "supercategory": "pieces"},
+        {"id": 2, "name": "black bishop", "supercategory": "pieces"},
+        {"id": 3, "name": "black king", "supercategory": "pieces"},
+        {"id": 4, "name": "black knight", "supercategory": "pieces"},
+        {"id": 5, "name": "black pawn", "supercategory": "pieces"},
+        {"id": 6, "name": "black queen", "supercategory": "pieces"},
+        {"id": 7, "name": "black rook", "supercategory": "pieces"},
+        {"id": 8, "name": "white bishop", "supercategory": "pieces"},
+        {"id": 9, "name": "white king", "supercategory": "pieces"},
+        {"id": 10, "name": "white knight", "supercategory": "pieces"},
+        {"id": 11, "name": "white pawn", "supercategory": "pieces"},
+        {"id": 12, "name": "white queen", "supercategory": "pieces"},
+        {"id": 13, "name": "white rook", "supercategory": "pieces"},
+    ]
+
+    _data_root = args.data_root + "ChessPieces/Chess Pieces.v23-raw.coco/"
+    json_data = mmengine.load(_data_root + "valid/annotations_without_background.json")
+    json_data["categories"] = ChessPieces
+    mmengine.dump(json_data, _data_root + "valid/new_annotations_without_background.json")
+
+    CottontailRabbits = [{"id": 1, "name": "rabbit", "supercategory": "Cottontail-Rabbit"}]
+
+    _data_root = args.data_root + "CottontailRabbits/"
+    json_data = mmengine.load(_data_root + "valid/annotations_without_background.json")
+    json_data["categories"] = CottontailRabbits
"valid/new_annotations_without_background.json") + + NorthAmericaMushrooms = [ + {"id": 1, "name": "flat mushroom", "supercategory": "mushroom"}, + {"id": 2, "name": "yellow mushroom", "supercategory": "mushroom"}, + ] + + _data_root = args.data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/" # noqa + json_data = mmengine.load(_data_root + "valid/annotations_without_background.json") + json_data["categories"] = NorthAmericaMushrooms + mmengine.dump(json_data, _data_root + "valid/new_annotations_without_background.json") + + +if __name__ == "__main__": main() diff --git a/mmpose/configs/mmdet/gn+ws/faster-rcnn_r101_fpn_gn-ws-all_1x_coco.py b/mmpose/configs/mmdet/gn+ws/faster-rcnn_r101_fpn_gn-ws-all_1x_coco.py index a4cb8281ac6d4b43a615ba1a05903770d8ee2f69..09d3e4aa8309a8451cff51f970b77a96eacb15a3 100644 --- a/mmpose/configs/mmdet/gn+ws/faster-rcnn_r101_fpn_gn-ws-all_1x_coco.py +++ b/mmpose/configs/mmdet/gn+ws/faster-rcnn_r101_fpn_gn-ws-all_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './faster-rcnn_r50_fpn_gn-ws-all_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://jhu/resnet101_gn_ws'))) +_base_ = "./faster-rcnn_r50_fpn_gn-ws-all_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://jhu/resnet101_gn_ws"))) diff --git a/mmpose/configs/mmdet/gn+ws/faster-rcnn_r50_fpn_gn-ws-all_1x_coco.py b/mmpose/configs/mmdet/gn+ws/faster-rcnn_r50_fpn_gn-ws-all_1x_coco.py index 1a044c99a2e84de71822edb62543570891141b25..a82e23bbdcedd386a3000c559f0e493fda1462aa 100644 --- a/mmpose/configs/mmdet/gn+ws/faster-rcnn_r50_fpn_gn-ws-all_1x_coco.py +++ b/mmpose/configs/mmdet/gn+ws/faster-rcnn_r50_fpn_gn-ws-all_1x_coco.py @@ -1,16 +1,8 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' -conv_cfg = dict(type='ConvWS') -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" +conv_cfg = dict(type="ConvWS") +norm_cfg = dict(type="GN", num_groups=32, requires_grad=True) model = dict( - backbone=dict( - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://jhu/resnet50_gn_ws')), + backbone=dict(conv_cfg=conv_cfg, norm_cfg=norm_cfg, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://jhu/resnet50_gn_ws")), neck=dict(conv_cfg=conv_cfg, norm_cfg=norm_cfg), - roi_head=dict( - bbox_head=dict( - type='Shared4Conv1FCBBoxHead', - conv_out_channels=256, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg))) + roi_head=dict(bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, conv_cfg=conv_cfg, norm_cfg=norm_cfg)), +) diff --git a/mmpose/configs/mmdet/gn+ws/faster-rcnn_x101-32x4d_fpn_gn-ws-all_1x_coco.py b/mmpose/configs/mmdet/gn+ws/faster-rcnn_x101-32x4d_fpn_gn-ws-all_1x_coco.py index b2a317d2ac830d95788084eaa8d374838b34a365..acf7cedef4e008e801c87a4bde2b66a637b1b7bf 100644 --- a/mmpose/configs/mmdet/gn+ws/faster-rcnn_x101-32x4d_fpn_gn-ws-all_1x_coco.py +++ b/mmpose/configs/mmdet/gn+ws/faster-rcnn_x101-32x4d_fpn_gn-ws-all_1x_coco.py @@ -1,18 +1,18 @@ -_base_ = './faster-rcnn_r50_fpn_gn-ws-all_1x_coco.py' -conv_cfg = dict(type='ConvWS') -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) +_base_ = "./faster-rcnn_r50_fpn_gn-ws-all_1x_coco.py" +conv_cfg = dict(type="ConvWS") +norm_cfg = dict(type="GN", num_groups=32, requires_grad=True) model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, 
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        style='pytorch',
+        style="pytorch",
         conv_cfg=conv_cfg,
         norm_cfg=norm_cfg,
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://jhu/resnext101_32x4d_gn_ws')))
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://jhu/resnext101_32x4d_gn_ws"),
+    )
+)
diff --git a/mmpose/configs/mmdet/gn+ws/faster-rcnn_x50-32x4d_fpn_gn-ws-all_1x_coco.py b/mmpose/configs/mmdet/gn+ws/faster-rcnn_x50-32x4d_fpn_gn-ws-all_1x_coco.py
index dd75a2c004b8cc04411d47d8b9db6ba0ec4ffcb0..35e1a92cbb7feac302c4d46b8a9aabf1b049be77 100644
--- a/mmpose/configs/mmdet/gn+ws/faster-rcnn_x50-32x4d_fpn_gn-ws-all_1x_coco.py
+++ b/mmpose/configs/mmdet/gn+ws/faster-rcnn_x50-32x4d_fpn_gn-ws-all_1x_coco.py
@@ -1,18 +1,18 @@
-_base_ = './faster-rcnn_r50_fpn_gn-ws-all_1x_coco.py'
-conv_cfg = dict(type='ConvWS')
-norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+_base_ = "./faster-rcnn_r50_fpn_gn-ws-all_1x_coco.py"
+conv_cfg = dict(type="ConvWS")
+norm_cfg = dict(type="GN", num_groups=32, requires_grad=True)
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=50,
         groups=32,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        style='pytorch',
+        style="pytorch",
         conv_cfg=conv_cfg,
         norm_cfg=norm_cfg,
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://jhu/resnext50_32x4d_gn_ws')))
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://jhu/resnext50_32x4d_gn_ws"),
+    )
+)
diff --git a/mmpose/configs/mmdet/gn+ws/mask-rcnn_r101_fpn_gn-ws-all_20-23-24e_coco.py b/mmpose/configs/mmdet/gn+ws/mask-rcnn_r101_fpn_gn-ws-all_20-23-24e_coco.py
index 1815e3f85b9fd5d7204b08cd60a13980a382fd51..5d4e95f58802b8d568bf3f9024dd7e640b9e4094 100644
--- a/mmpose/configs/mmdet/gn+ws/mask-rcnn_r101_fpn_gn-ws-all_20-23-24e_coco.py
+++ b/mmpose/configs/mmdet/gn+ws/mask-rcnn_r101_fpn_gn-ws-all_20-23-24e_coco.py
@@ -1,17 +1,10 @@
-_base_ = './mask-rcnn_r101_fpn_gn-ws-all_2x_coco.py'
+_base_ = "./mask-rcnn_r101_fpn_gn-ws-all_2x_coco.py"

 # learning policy
 max_epochs = 24
 train_cfg = dict(max_epochs=max_epochs)

 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[20, 23],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[20, 23], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/gn+ws/mask-rcnn_r101_fpn_gn-ws-all_2x_coco.py b/mmpose/configs/mmdet/gn+ws/mask-rcnn_r101_fpn_gn-ws-all_2x_coco.py
index 5de37dee5e86e202c211464eaa08dd295dba44b2..dfa0c13afcab78a26c85bf0ca64a1909841655a2 100644
--- a/mmpose/configs/mmdet/gn+ws/mask-rcnn_r101_fpn_gn-ws-all_2x_coco.py
+++ b/mmpose/configs/mmdet/gn+ws/mask-rcnn_r101_fpn_gn-ws-all_2x_coco.py
@@ -1,6 +1,2 @@
-_base_ = './mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py'
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://jhu/resnet101_gn_ws')))
+_base_ = "./mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py"
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://jhu/resnet101_gn_ws")))
diff --git a/mmpose/configs/mmdet/gn+ws/mask-rcnn_r50_fpn_gn-ws-all_20-23-24e_coco.py b/mmpose/configs/mmdet/gn+ws/mask-rcnn_r50_fpn_gn-ws-all_20-23-24e_coco.py
index 287c652045d6230411043f2abab34be4f6106687..3acaafcb44abd71c8b9b48122fbc340841990461 100644
--- a/mmpose/configs/mmdet/gn+ws/mask-rcnn_r50_fpn_gn-ws-all_20-23-24e_coco.py
+++ b/mmpose/configs/mmdet/gn+ws/mask-rcnn_r50_fpn_gn-ws-all_20-23-24e_coco.py
@@ -1,17 +1,10 @@
-_base_ = './mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py'
+_base_ = "./mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py"

 # learning policy
 max_epochs = 24
 train_cfg = dict(max_epochs=max_epochs)

 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[20, 23],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[20, 23], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/gn+ws/mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py b/mmpose/configs/mmdet/gn+ws/mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py
index ed8b1b73fe8695fc6bbb4054405192fca995cf81..06f599234b03251eb7419b74f653da3c435f198f 100644
--- a/mmpose/configs/mmdet/gn+ws/mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py
+++ b/mmpose/configs/mmdet/gn+ws/mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py
@@ -1,33 +1,20 @@
-_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py'
-conv_cfg = dict(type='ConvWS')
-norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py"
+conv_cfg = dict(type="ConvWS")
+norm_cfg = dict(type="GN", num_groups=32, requires_grad=True)
 model = dict(
-    backbone=dict(
-        conv_cfg=conv_cfg,
-        norm_cfg=norm_cfg,
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://jhu/resnet50_gn_ws')),
+    backbone=dict(conv_cfg=conv_cfg, norm_cfg=norm_cfg, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://jhu/resnet50_gn_ws")),
     neck=dict(conv_cfg=conv_cfg, norm_cfg=norm_cfg),
     roi_head=dict(
-        bbox_head=dict(
-            type='Shared4Conv1FCBBoxHead',
-            conv_out_channels=256,
-            conv_cfg=conv_cfg,
-            norm_cfg=norm_cfg),
-        mask_head=dict(conv_cfg=conv_cfg, norm_cfg=norm_cfg)))
+        bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, conv_cfg=conv_cfg, norm_cfg=norm_cfg),
+        mask_head=dict(conv_cfg=conv_cfg, norm_cfg=norm_cfg),
+    ),
+)

 # learning policy
 max_epochs = 24
 train_cfg = dict(max_epochs=max_epochs)

 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/gn+ws/mask-rcnn_x101-32x4d_fpn_gn-ws-all_20-23-24e_coco.py b/mmpose/configs/mmdet/gn+ws/mask-rcnn_x101-32x4d_fpn_gn-ws-all_20-23-24e_coco.py
index 8ce9193579b914f8dc0804cb73c3d8e41b153655..c8f30644a640e12d95327a783a56129b53adb8b1 100644
--- a/mmpose/configs/mmdet/gn+ws/mask-rcnn_x101-32x4d_fpn_gn-ws-all_20-23-24e_coco.py
+++ b/mmpose/configs/mmdet/gn+ws/mask-rcnn_x101-32x4d_fpn_gn-ws-all_20-23-24e_coco.py
@@ -1,17 +1,10 @@
-_base_ = './mask-rcnn_x101-32x4d_fpn_gn-ws-all_2x_coco.py'
+_base_ = "./mask-rcnn_x101-32x4d_fpn_gn-ws-all_2x_coco.py"

 # learning policy
 max_epochs = 24
 train_cfg = dict(max_epochs=max_epochs)

 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[20, 23],
-        gamma=0.1)
dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[20, 23], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/gn+ws/mask-rcnn_x101-32x4d_fpn_gn-ws-all_2x_coco.py b/mmpose/configs/mmdet/gn+ws/mask-rcnn_x101-32x4d_fpn_gn-ws-all_2x_coco.py index bcfc371e774470ede7d171b4268db919385775ab..1e11be0dbe32bba70bee038832eb6979ea9e72a2 100644 --- a/mmpose/configs/mmdet/gn+ws/mask-rcnn_x101-32x4d_fpn_gn-ws-all_2x_coco.py +++ b/mmpose/configs/mmdet/gn+ws/mask-rcnn_x101-32x4d_fpn_gn-ws-all_2x_coco.py @@ -1,19 +1,19 @@ -_base_ = './mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py' +_base_ = "./mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py" # model settings -conv_cfg = dict(type='ConvWS') -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) +conv_cfg = dict(type="ConvWS") +norm_cfg = dict(type="GN", num_groups=32, requires_grad=True) model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - style='pytorch', + style="pytorch", conv_cfg=conv_cfg, norm_cfg=norm_cfg, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://jhu/resnext101_32x4d_gn_ws'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://jhu/resnext101_32x4d_gn_ws"), + ) +) diff --git a/mmpose/configs/mmdet/gn+ws/mask-rcnn_x50-32x4d_fpn_gn-ws-all_20-23-24e_coco.py b/mmpose/configs/mmdet/gn+ws/mask-rcnn_x50-32x4d_fpn_gn-ws-all_20-23-24e_coco.py index af9ea5ab476b8ea3247062261726bef6b6bc1b0c..49aa7b3ff48512c2c6435322a098d30aa6a6429f 100644 --- a/mmpose/configs/mmdet/gn+ws/mask-rcnn_x50-32x4d_fpn_gn-ws-all_20-23-24e_coco.py +++ b/mmpose/configs/mmdet/gn+ws/mask-rcnn_x50-32x4d_fpn_gn-ws-all_20-23-24e_coco.py @@ -1,17 +1,10 @@ -_base_ = './mask-rcnn_x50-32x4d_fpn_gn-ws-all_2x_coco.py' +_base_ = "./mask-rcnn_x50-32x4d_fpn_gn-ws-all_2x_coco.py" # learning policy max_epochs = 24 train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[20, 23], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[20, 23], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/gn+ws/mask-rcnn_x50-32x4d_fpn_gn-ws-all_2x_coco.py b/mmpose/configs/mmdet/gn+ws/mask-rcnn_x50-32x4d_fpn_gn-ws-all_2x_coco.py index ab2b14042e9510ab14698e7a64c68d6ff60835e1..23f6e37a855c70215ff5596c8ca3bd4d7b78cd13 100644 --- a/mmpose/configs/mmdet/gn+ws/mask-rcnn_x50-32x4d_fpn_gn-ws-all_2x_coco.py +++ b/mmpose/configs/mmdet/gn+ws/mask-rcnn_x50-32x4d_fpn_gn-ws-all_2x_coco.py @@ -1,19 +1,19 @@ -_base_ = './mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py' +_base_ = "./mask-rcnn_r50_fpn_gn-ws-all_2x_coco.py" # model settings -conv_cfg = dict(type='ConvWS') -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) +conv_cfg = dict(type="ConvWS") +norm_cfg = dict(type="GN", num_groups=32, requires_grad=True) model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=50, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - style='pytorch', + style="pytorch", conv_cfg=conv_cfg, norm_cfg=norm_cfg, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://jhu/resnext50_32x4d_gn_ws'))) + init_cfg=dict(type="Pretrained", 
checkpoint="open-mmlab://jhu/resnext50_32x4d_gn_ws"), + ) +) diff --git a/mmpose/configs/mmdet/gn/mask-rcnn_r101_fpn_gn-all_2x_coco.py b/mmpose/configs/mmdet/gn/mask-rcnn_r101_fpn_gn-all_2x_coco.py index 54f57d8d0855d07c696907d8c7c0758e4c13a573..3bf9fd5337bc0859f8580e1b0547da857e12e32f 100644 --- a/mmpose/configs/mmdet/gn/mask-rcnn_r101_fpn_gn-all_2x_coco.py +++ b/mmpose/configs/mmdet/gn/mask-rcnn_r101_fpn_gn-all_2x_coco.py @@ -1,7 +1,2 @@ -_base_ = './mask-rcnn_r50_fpn_gn-all_2x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron/resnet101_gn'))) +_base_ = "./mask-rcnn_r50_fpn_gn-all_2x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron/resnet101_gn"))) diff --git a/mmpose/configs/mmdet/gn/mask-rcnn_r101_fpn_gn-all_3x_coco.py b/mmpose/configs/mmdet/gn/mask-rcnn_r101_fpn_gn-all_3x_coco.py index a94e063ecd2a5e2fd83eb78aa4d7ddd8f51e2b9e..aad293b7fc5c992d4debc1a6f3eb3415385585c3 100644 --- a/mmpose/configs/mmdet/gn/mask-rcnn_r101_fpn_gn-all_3x_coco.py +++ b/mmpose/configs/mmdet/gn/mask-rcnn_r101_fpn_gn-all_3x_coco.py @@ -1,4 +1,4 @@ -_base_ = './mask-rcnn_r101_fpn_gn-all_2x_coco.py' +_base_ = "./mask-rcnn_r101_fpn_gn-all_2x_coco.py" # learning policy max_epochs = 36 @@ -6,13 +6,6 @@ train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[28, 34], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[28, 34], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/gn/mask-rcnn_r50-contrib_fpn_gn-all_2x_coco.py b/mmpose/configs/mmdet/gn/mask-rcnn_r50-contrib_fpn_gn-all_2x_coco.py index 5515ec14a47a0dfa58acf6c46bc40d77ce39ac3d..d3f51e165d47946f45fc0a9f26e380f1c02d78ca 100644 --- a/mmpose/configs/mmdet/gn/mask-rcnn_r50-contrib_fpn_gn-all_2x_coco.py +++ b/mmpose/configs/mmdet/gn/mask-rcnn_r50-contrib_fpn_gn-all_2x_coco.py @@ -1,17 +1,12 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" +norm_cfg = dict(type="GN", num_groups=32, requires_grad=True) model = dict( - backbone=dict( - norm_cfg=norm_cfg, - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://contrib/resnet50_gn')), + backbone=dict(norm_cfg=norm_cfg, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://contrib/resnet50_gn")), neck=dict(norm_cfg=norm_cfg), roi_head=dict( - bbox_head=dict( - type='Shared4Conv1FCBBoxHead', - conv_out_channels=256, - norm_cfg=norm_cfg), - mask_head=dict(norm_cfg=norm_cfg))) + bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, norm_cfg=norm_cfg), mask_head=dict(norm_cfg=norm_cfg) + ), +) # learning policy max_epochs = 24 @@ -19,13 +14,6 @@ train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git 
diff --git a/mmpose/configs/mmdet/gn/mask-rcnn_r50-contrib_fpn_gn-all_3x_coco.py b/mmpose/configs/mmdet/gn/mask-rcnn_r50-contrib_fpn_gn-all_3x_coco.py
index e6f7a97e8e0482836b225e832be2e3de4ae99947..91548b91d8b9b07adb8be05644e0b95e69f2b2ab 100644
--- a/mmpose/configs/mmdet/gn/mask-rcnn_r50-contrib_fpn_gn-all_3x_coco.py
+++ b/mmpose/configs/mmdet/gn/mask-rcnn_r50-contrib_fpn_gn-all_3x_coco.py
@@ -1,4 +1,4 @@
-_base_ = './mask-rcnn_r50-contrib_fpn_gn-all_2x_coco.py'
+_base_ = "./mask-rcnn_r50-contrib_fpn_gn-all_2x_coco.py"

 # learning policy
 max_epochs = 36
@@ -6,13 +6,6 @@ train_cfg = dict(max_epochs=max_epochs)

 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[28, 34],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[28, 34], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/gn/mask-rcnn_r50_fpn_gn-all_2x_coco.py b/mmpose/configs/mmdet/gn/mask-rcnn_r50_fpn_gn-all_2x_coco.py
index 1313b22e4795239d5148fb8d665cdadb5fac8e4f..a0d37f0f504b9ab82f266218887fd317c7d9c172 100644
--- a/mmpose/configs/mmdet/gn/mask-rcnn_r50_fpn_gn-all_2x_coco.py
+++ b/mmpose/configs/mmdet/gn/mask-rcnn_r50_fpn_gn-all_2x_coco.py
@@ -1,22 +1,13 @@
-_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py'
-norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py"
+norm_cfg = dict(type="GN", num_groups=32, requires_grad=True)
 model = dict(
-    data_preprocessor=dict(
-        mean=[103.530, 116.280, 123.675],
-        std=[1.0, 1.0, 1.0],
-        bgr_to_rgb=False),
-    backbone=dict(
-        norm_cfg=norm_cfg,
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://detectron/resnet50_gn')),
+    data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False),
+    backbone=dict(norm_cfg=norm_cfg, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron/resnet50_gn")),
     neck=dict(norm_cfg=norm_cfg),
     roi_head=dict(
-        bbox_head=dict(
-            type='Shared4Conv1FCBBoxHead',
-            conv_out_channels=256,
-            norm_cfg=norm_cfg),
-        mask_head=dict(norm_cfg=norm_cfg)))
+        bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, norm_cfg=norm_cfg), mask_head=dict(norm_cfg=norm_cfg)
+    ),
+)

 # learning policy
 max_epochs = 24
@@ -24,13 +15,6 @@ train_cfg = dict(max_epochs=max_epochs)

 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/gn/mask-rcnn_r50_fpn_gn-all_3x_coco.py b/mmpose/configs/mmdet/gn/mask-rcnn_r50_fpn_gn-all_3x_coco.py
index e425de951bb0419d1d1596e45637be1d914a8034..1553e08edf12a555da3dadff76c0569b26ab3d92 100644
--- a/mmpose/configs/mmdet/gn/mask-rcnn_r50_fpn_gn-all_3x_coco.py
+++ b/mmpose/configs/mmdet/gn/mask-rcnn_r50_fpn_gn-all_3x_coco.py
@@ -1,4 +1,4 @@
-_base_ = './mask-rcnn_r50_fpn_gn-all_2x_coco.py'
+_base_ = "./mask-rcnn_r50_fpn_gn-all_2x_coco.py"

 # learning policy
 max_epochs = 36
@@ -6,13 +6,6 @@ train_cfg = dict(max_epochs=max_epochs)

 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[28, 34],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[28, 34], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r101_fpn_gn-head_2x_coco.py b/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r101_fpn_gn-head_2x_coco.py
index 46d41ed4ed5d1d6345e98434221cc5b07c60767d..98f965759cc4acb78b78f10cb05f907504bd39c2 100644
--- a/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r101_fpn_gn-head_2x_coco.py
+++ b/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r101_fpn_gn-head_2x_coco.py
@@ -1,7 +1,3 @@
-_base_ = './grid-rcnn_r50_fpn_gn-head_2x_coco.py'
+_base_ = "./grid-rcnn_r50_fpn_gn-head_2x_coco.py"

-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')))
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")))
diff --git a/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r50_fpn_gn-head_1x_coco.py b/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r50_fpn_gn-head_1x_coco.py
index 358280630fa96e40ac7834cbda6b1ad3dc689c55..02f7dc10245174a1d296445b79717720fd8397b4 100644
--- a/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r50_fpn_gn-head_1x_coco.py
+++ b/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r50_fpn_gn-head_1x_coco.py
@@ -1,4 +1,4 @@
-_base_ = './grid-rcnn_r50_fpn_gn-head_2x_coco.py'
+_base_ = "./grid-rcnn_r50_fpn_gn-head_2x_coco.py"

 # training schedule
 max_epochs = 12
@@ -6,14 +6,6 @@ train_cfg = dict(max_epochs=max_epochs)

 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.0001, by_epoch=False, begin=0,
-        end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[8, 11],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.0001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[8, 11], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r50_fpn_gn-head_2x_coco.py b/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r50_fpn_gn-head_2x_coco.py
index 228fca2323ceec2052a3835089d987a2643c53c1..b40c5376f3aa48bd1a116561e66df8109c1a0ff5 100644
--- a/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r50_fpn_gn-head_2x_coco.py
+++ b/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_r50_fpn_gn-head_2x_coco.py
@@ -1,156 +1,102 @@
-_base_ = [
-    '../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py'
-]
+_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/default_runtime.py"]

 # model settings
 model = dict(
-    type='GridRCNN',
+    type="GridRCNN",
     data_preprocessor=dict(
-        type='DetDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True,
-        pad_size_divisor=32),
+        type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32
+    ),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
        depth=50,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch',
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
-    neck=dict(
-        type='FPN',
-        in_channels=[256, 512, 1024, 2048],
-        out_channels=256,
-        num_outs=5),
+        style="pytorch",
checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0), + ), roi_head=dict( - type='GridRoIHead', + type="GridRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", with_reg=False, in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), - reg_class_agnostic=False), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + reg_class_agnostic=False, + ), grid_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), grid_head=dict( - type='GridHead', + type="GridHead", grid_points=9, num_convs=8, in_channels=256, point_feat_channels=64, - norm_cfg=dict(type='GN', num_groups=36), - loss_grid=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=15))), + norm_cfg=dict(type="GN", num_groups=36), + loss_grid=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=15), + ), + ), # model training and testing settings train_cfg=dict( rpn=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, ignore_iof_thr=-1), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - 
-                add_gt_as_proposals=True),
+            assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, ignore_iof_thr=-1),
+            sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True),
             pos_radius=1,
             pos_weight=-1,
             max_num_grid=192,
-            debug=False)),
+            debug=False,
+        ),
+    ),
     test_cfg=dict(
-        rpn=dict(
-            nms_pre=1000,
-            max_per_img=1000,
-            nms=dict(type='nms', iou_threshold=0.7),
-            min_bbox_size=0),
-        rcnn=dict(
-            score_thr=0.03,
-            nms=dict(type='nms', iou_threshold=0.3),
-            max_per_img=100)))
+        rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0),
+        rcnn=dict(score_thr=0.03, nms=dict(type="nms", iou_threshold=0.3), max_per_img=100),
+    ),
+)

 # optimizer
-optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001))
+optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001))

 # training schedule
 max_epochs = 25
-train_cfg = dict(
-    type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1)
-val_cfg = dict(type='ValLoop')
-test_cfg = dict(type='TestLoop')
+train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1)
+val_cfg = dict(type="ValLoop")
+test_cfg = dict(type="TestLoop")

 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0 / 80,
-        by_epoch=False,
-        begin=0,
-        end=3665),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[17, 23],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=1.0 / 80, by_epoch=False, begin=0, end=3665),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[17, 23], gamma=0.1),
 ]

 # Default setting for scaling LR automatically
diff --git a/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_x101-32x4d_fpn_gn-head_2x_coco.py b/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_x101-32x4d_fpn_gn-head_2x_coco.py
index dddf157beb6667887d0cd920cb2803e340d43183..61c10456e905440580f099119af396f2a1a3bdc0 100644
--- a/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_x101-32x4d_fpn_gn-head_2x_coco.py
+++ b/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_x101-32x4d_fpn_gn-head_2x_coco.py
@@ -1,13 +1,14 @@
-_base_ = './grid-rcnn_r50_fpn_gn-head_2x_coco.py'
+_base_ = "./grid-rcnn_r50_fpn_gn-head_2x_coco.py"
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=32,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d')))
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"),
+    )
+)
diff --git a/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_x101-64x4d_fpn_gn-head_2x_coco.py b/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_x101-64x4d_fpn_gn-head_2x_coco.py
index e4ff50f546ae660cf398c2cb1c6f67ca20848c0f..187e6ff3e503c751cf9aec402083d3109bde2487 100644
--- a/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_x101-64x4d_fpn_gn-head_2x_coco.py
+++ b/mmpose/configs/mmdet/grid_rcnn/grid-rcnn_x101-64x4d_fpn_gn-head_2x_coco.py
@@ -1,13 +1,14 @@
-_base_ = './grid-rcnn_x101-32x4d_fpn_gn-head_2x_coco.py'
+_base_ = "./grid-rcnn_x101-32x4d_fpn_gn-head_2x_coco.py"
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=64,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))
style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/groie/faste-rcnn_r50_fpn_groie_1x_coco.py b/mmpose/configs/mmdet/groie/faste-rcnn_r50_fpn_groie_1x_coco.py index 0fbe8a32c3a81e9b312a02f79f3495171387d9f0..0bf679a13b3d598c68b150c192d635b8de4f8a5b 100644 --- a/mmpose/configs/mmdet/groie/faste-rcnn_r50_fpn_groie_1x_coco.py +++ b/mmpose/configs/mmdet/groie/faste-rcnn_r50_fpn_groie_1x_coco.py @@ -1,25 +1,22 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" # model settings model = dict( roi_head=dict( bbox_roi_extractor=dict( - type='GenericRoIExtractor', - aggregation='sum', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2), + type="GenericRoIExtractor", + aggregation="sum", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2), out_channels=256, featmap_strides=[4, 8, 16, 32], pre_cfg=dict( - type='ConvModule', + type="ConvModule", in_channels=256, out_channels=256, kernel_size=5, padding=2, inplace=False, ), - post_cfg=dict( - type='GeneralizedAttention', - in_channels=256, - spatial_range=-1, - num_heads=6, - attention_type='0100', - kv_stride=2)))) + post_cfg=dict(type="GeneralizedAttention", in_channels=256, spatial_range=-1, num_heads=6, attention_type="0100", kv_stride=2), + ) + ) +) diff --git a/mmpose/configs/mmdet/groie/grid-rcnn_r50_fpn_gn-head-groie_1x_coco.py b/mmpose/configs/mmdet/groie/grid-rcnn_r50_fpn_gn-head-groie_1x_coco.py index dadccb79c2288f16eb4a1fa33269e4a8f5a55c9b..06d089862892072e41f68c66177f62f741e81b3b 100644 --- a/mmpose/configs/mmdet/groie/grid-rcnn_r50_fpn_gn-head-groie_1x_coco.py +++ b/mmpose/configs/mmdet/groie/grid-rcnn_r50_fpn_gn-head-groie_1x_coco.py @@ -1,45 +1,37 @@ -_base_ = '../grid_rcnn/grid-rcnn_r50_fpn_gn-head_1x_coco.py' +_base_ = "../grid_rcnn/grid-rcnn_r50_fpn_gn-head_1x_coco.py" # model settings model = dict( roi_head=dict( bbox_roi_extractor=dict( - type='GenericRoIExtractor', - aggregation='sum', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2), + type="GenericRoIExtractor", + aggregation="sum", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2), out_channels=256, featmap_strides=[4, 8, 16, 32], pre_cfg=dict( - type='ConvModule', + type="ConvModule", in_channels=256, out_channels=256, kernel_size=5, padding=2, inplace=False, ), - post_cfg=dict( - type='GeneralizedAttention', - in_channels=256, - spatial_range=-1, - num_heads=6, - attention_type='0100', - kv_stride=2)), + post_cfg=dict(type="GeneralizedAttention", in_channels=256, spatial_range=-1, num_heads=6, attention_type="0100", kv_stride=2), + ), grid_roi_extractor=dict( - type='GenericRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=2), + type="GenericRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=2), out_channels=256, featmap_strides=[4, 8, 16, 32], pre_cfg=dict( - type='ConvModule', + type="ConvModule", in_channels=256, out_channels=256, kernel_size=5, padding=2, inplace=False, ), - post_cfg=dict( - type='GeneralizedAttention', - in_channels=256, - spatial_range=-1, - num_heads=6, - attention_type='0100', - kv_stride=2)))) + post_cfg=dict(type="GeneralizedAttention", in_channels=256, spatial_range=-1, num_heads=6, attention_type="0100", kv_stride=2), + ), + ) +) diff --git a/mmpose/configs/mmdet/groie/mask-rcnn_r101_fpn_syncbn-r4-gcb_c3-c5-groie_1x_coco.py 
diff --git a/mmpose/configs/mmdet/groie/mask-rcnn_r101_fpn_syncbn-r4-gcb_c3-c5-groie_1x_coco.py b/mmpose/configs/mmdet/groie/mask-rcnn_r101_fpn_syncbn-r4-gcb_c3-c5-groie_1x_coco.py
index 5699b4284a76fe633afd81acb0b047a81df6afd2..2916182929be7dc289d6511277c8622db6967514 100644
--- a/mmpose/configs/mmdet/groie/mask-rcnn_r101_fpn_syncbn-r4-gcb_c3-c5-groie_1x_coco.py
+++ b/mmpose/configs/mmdet/groie/mask-rcnn_r101_fpn_syncbn-r4-gcb_c3-c5-groie_1x_coco.py
@@ -1,45 +1,37 @@
-_base_ = '../gcnet/mask-rcnn_r101-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py'
+_base_ = "../gcnet/mask-rcnn_r101-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py"
 # model settings
 model = dict(
     roi_head=dict(
         bbox_roi_extractor=dict(
-            type='GenericRoIExtractor',
-            aggregation='sum',
-            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2),
+            type="GenericRoIExtractor",
+            aggregation="sum",
+            roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2),
             out_channels=256,
             featmap_strides=[4, 8, 16, 32],
             pre_cfg=dict(
-                type='ConvModule',
+                type="ConvModule",
                 in_channels=256,
                 out_channels=256,
                 kernel_size=5,
                 padding=2,
                 inplace=False,
             ),
-            post_cfg=dict(
-                type='GeneralizedAttention',
-                in_channels=256,
-                spatial_range=-1,
-                num_heads=6,
-                attention_type='0100',
-                kv_stride=2)),
+            post_cfg=dict(type="GeneralizedAttention", in_channels=256, spatial_range=-1, num_heads=6, attention_type="0100", kv_stride=2),
+        ),
         mask_roi_extractor=dict(
-            type='GenericRoIExtractor',
-            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=2),
+            type="GenericRoIExtractor",
+            roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=2),
             out_channels=256,
             featmap_strides=[4, 8, 16, 32],
             pre_cfg=dict(
-                type='ConvModule',
+                type="ConvModule",
                 in_channels=256,
                 out_channels=256,
                 kernel_size=5,
                 padding=2,
                 inplace=False,
             ),
-            post_cfg=dict(
-                type='GeneralizedAttention',
-                in_channels=256,
-                spatial_range=-1,
-                num_heads=6,
-                attention_type='0100',
-                kv_stride=2))))
+            post_cfg=dict(type="GeneralizedAttention", in_channels=256, spatial_range=-1, num_heads=6, attention_type="0100", kv_stride=2),
+        ),
+    )
+)
diff --git a/mmpose/configs/mmdet/groie/mask-rcnn_r50_fpn_groie_1x_coco.py b/mmpose/configs/mmdet/groie/mask-rcnn_r50_fpn_groie_1x_coco.py
index 4c9521e2f5730b74efc51f2051f861bfe5f8192d..df8aac7808455aa38b73bf98679a6f8966435e40 100644
--- a/mmpose/configs/mmdet/groie/mask-rcnn_r50_fpn_groie_1x_coco.py
+++ b/mmpose/configs/mmdet/groie/mask-rcnn_r50_fpn_groie_1x_coco.py
@@ -1,45 +1,37 @@
-_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py'
+_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py"
 # model settings
 model = dict(
     roi_head=dict(
         bbox_roi_extractor=dict(
-            type='GenericRoIExtractor',
-            aggregation='sum',
-            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2),
+            type="GenericRoIExtractor",
+            aggregation="sum",
+            roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2),
             out_channels=256,
             featmap_strides=[4, 8, 16, 32],
             pre_cfg=dict(
-                type='ConvModule',
+                type="ConvModule",
                 in_channels=256,
                 out_channels=256,
                 kernel_size=5,
                 padding=2,
                 inplace=False,
             ),
-            post_cfg=dict(
-                type='GeneralizedAttention',
-                in_channels=256,
-                spatial_range=-1,
-                num_heads=6,
-                attention_type='0100',
-                kv_stride=2)),
+            post_cfg=dict(type="GeneralizedAttention", in_channels=256, spatial_range=-1, num_heads=6, attention_type="0100", kv_stride=2),
+        ),
         mask_roi_extractor=dict(
-            type='GenericRoIExtractor',
-            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=2),
+            type="GenericRoIExtractor",
+            roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=2),
             out_channels=256,
             featmap_strides=[4, 8, 16, 32],
             pre_cfg=dict(
-                type='ConvModule',
+                type="ConvModule",
                 in_channels=256,
                 out_channels=256,
                 kernel_size=5,
                 padding=2,
                 inplace=False,
             ),
-            post_cfg=dict(
-                type='GeneralizedAttention',
-                in_channels=256,
-                spatial_range=-1,
-                num_heads=6,
-                attention_type='0100',
-                kv_stride=2))))
+            post_cfg=dict(type="GeneralizedAttention", in_channels=256, spatial_range=-1, num_heads=6, attention_type="0100", kv_stride=2),
+        ),
+    )
+)
diff --git a/mmpose/configs/mmdet/groie/mask-rcnn_r50_fpn_syncbn-r4-gcb-c3-c5-groie_1x_coco.py b/mmpose/configs/mmdet/groie/mask-rcnn_r50_fpn_syncbn-r4-gcb-c3-c5-groie_1x_coco.py
index 22e97b6959a0bd13ae4432c806c61ca3d899f9ea..54ff87308d1625154075e716c14a32f9bd0f8127 100644
--- a/mmpose/configs/mmdet/groie/mask-rcnn_r50_fpn_syncbn-r4-gcb-c3-c5-groie_1x_coco.py
+++ b/mmpose/configs/mmdet/groie/mask-rcnn_r50_fpn_syncbn-r4-gcb-c3-c5-groie_1x_coco.py
@@ -1,45 +1,37 @@
-_base_ = '../gcnet/mask-rcnn_r50-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py'
+_base_ = "../gcnet/mask-rcnn_r50-syncbn-gcb-r4-c3-c5_fpn_1x_coco.py"
 # model settings
 model = dict(
     roi_head=dict(
         bbox_roi_extractor=dict(
-            type='GenericRoIExtractor',
-            aggregation='sum',
-            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2),
+            type="GenericRoIExtractor",
+            aggregation="sum",
+            roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2),
             out_channels=256,
             featmap_strides=[4, 8, 16, 32],
             pre_cfg=dict(
-                type='ConvModule',
+                type="ConvModule",
                 in_channels=256,
                 out_channels=256,
                 kernel_size=5,
                 padding=2,
                 inplace=False,
             ),
-            post_cfg=dict(
-                type='GeneralizedAttention',
-                in_channels=256,
-                spatial_range=-1,
-                num_heads=6,
-                attention_type='0100',
-                kv_stride=2)),
+            post_cfg=dict(type="GeneralizedAttention", in_channels=256, spatial_range=-1, num_heads=6, attention_type="0100", kv_stride=2),
+        ),
         mask_roi_extractor=dict(
-            type='GenericRoIExtractor',
-            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=2),
+            type="GenericRoIExtractor",
+            roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=2),
             out_channels=256,
             featmap_strides=[4, 8, 16, 32],
             pre_cfg=dict(
-                type='ConvModule',
+                type="ConvModule",
                 in_channels=256,
                 out_channels=256,
                 kernel_size=5,
                 padding=2,
                 inplace=False,
             ),
-            post_cfg=dict(
-                type='GeneralizedAttention',
-                in_channels=256,
-                spatial_range=-1,
-                num_heads=6,
-                attention_type='0100',
-                kv_stride=2))))
+            post_cfg=dict(type="GeneralizedAttention", in_channels=256, spatial_range=-1, num_heads=6, attention_type="0100", kv_stride=2),
+        ),
+    )
+)
diff --git a/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-b_pretrain_zeroshot_concat_dod.py b/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-b_pretrain_zeroshot_concat_dod.py
index ac655b74aa664ef912b6b1f509e4eb9341ccd62a..8e2863098156fa472087494d2f445e0328e35639 100644
--- a/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-b_pretrain_zeroshot_concat_dod.py
+++ b/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-b_pretrain_zeroshot_concat_dod.py
@@ -1,7 +1,7 @@
-_base_ = 'grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py'
+_base_ = "grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py"

 model = dict(
-    type='GroundingDINO',
+    type="GroundingDINO",
     backbone=dict(
         pretrain_img_size=384,
         embed_dims=128,
@@ -9,6 +9,7 @@ model = dict(
         num_heads=[4, 8, 16, 32],
         window_size=12,
         drop_path_rate=0.3,
-        patch_norm=True),
+        patch_norm=True,
+    ),
     neck=dict(in_channels=[256, 512, 1024]),
 )
diff --git a/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-b_pretrain_zeroshot_parallel_dod.py b/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-b_pretrain_zeroshot_parallel_dod.py
index 9a1c8f2ac740c6c64a01a1a6a8f7dd57622bedf6..8729895e56603afe6575ec6d4d2ac033d4069006 100644
--- a/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-b_pretrain_zeroshot_parallel_dod.py
+++ b/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-b_pretrain_zeroshot_parallel_dod.py
@@ -1,3 +1,3 @@
-_base_ = 'grounding_dino_swin-b_pretrain_zeroshot_concat_dod.py'
+_base_ = "grounding_dino_swin-b_pretrain_zeroshot_concat_dod.py"

 model = dict(test_cfg=dict(chunked_size=1))
diff --git a/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py b/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py
index bb418011bf489c259f3696589aa56c5b8296256c..63f31f7f20fd6a4c3a534e7bd63f57b4fc408b25 100644
--- a/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py
+++ b/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py
@@ -1,78 +1,64 @@
-_base_ = '../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py'
+_base_ = "../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py"

-data_root = 'data/d3/'
+data_root = "data/d3/"

 test_pipeline = [
+    dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"),
+    dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"),
+    dict(type="LoadAnnotations", with_bbox=True),
     dict(
-        type='LoadImageFromFile', backend_args=None,
-        imdecode_backend='pillow'),
-    dict(
-        type='FixScaleResize',
-        scale=(800, 1333),
-        keep_ratio=True,
-        backend='pillow'),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='PackDetInputs',
-        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
-                   'scale_factor', 'text', 'custom_entities', 'sent_ids'))
+        type="PackDetInputs",
+        meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities", "sent_ids"),
+    ),
 ]

 # -------------------------------------------------#
 val_dataset_full = dict(
-    type='DODDataset',
+    type="DODDataset",
     data_root=data_root,
-    ann_file='d3_json/d3_full_annotations.json',
-    data_prefix=dict(img='d3_images/', anno='d3_pkl'),
+    ann_file="d3_json/d3_full_annotations.json",
+    data_prefix=dict(img="d3_images/", anno="d3_pkl"),
     pipeline=test_pipeline,
     test_mode=True,
     backend_args=None,
-    return_classes=True)
+    return_classes=True,
+)

-val_evaluator_full = dict(
-    type='DODCocoMetric',
-    ann_file=data_root + 'd3_json/d3_full_annotations.json')
+val_evaluator_full = dict(type="DODCocoMetric", ann_file=data_root + "d3_json/d3_full_annotations.json")

 # -------------------------------------------------#
 val_dataset_pres = dict(
-    type='DODDataset',
+    type="DODDataset",
     data_root=data_root,
-    ann_file='d3_json/d3_pres_annotations.json',
-    data_prefix=dict(img='d3_images/', anno='d3_pkl'),
+    ann_file="d3_json/d3_pres_annotations.json",
+    data_prefix=dict(img="d3_images/", anno="d3_pkl"),
     pipeline=test_pipeline,
     test_mode=True,
     backend_args=None,
-    return_classes=True)
-val_evaluator_pres = dict(
-    type='DODCocoMetric',
-    ann_file=data_root + 'd3_json/d3_pres_annotations.json')
+    return_classes=True,
+)
+val_evaluator_pres = dict(type="DODCocoMetric", ann_file=data_root + "d3_json/d3_pres_annotations.json")

 # -------------------------------------------------#
 val_dataset_abs = dict(
-    type='DODDataset',
+    type="DODDataset",
     data_root=data_root,
ann_file='d3_json/d3_abs_annotations.json', - data_prefix=dict(img='d3_images/', anno='d3_pkl'), + ann_file="d3_json/d3_abs_annotations.json", + data_prefix=dict(img="d3_images/", anno="d3_pkl"), pipeline=test_pipeline, test_mode=True, backend_args=None, - return_classes=True) -val_evaluator_abs = dict( - type='DODCocoMetric', - ann_file=data_root + 'd3_json/d3_abs_annotations.json') + return_classes=True, +) +val_evaluator_abs = dict(type="DODCocoMetric", ann_file=data_root + "d3_json/d3_abs_annotations.json") # -------------------------------------------------# datasets = [val_dataset_full, val_dataset_pres, val_dataset_abs] -dataset_prefixes = ['FULL', 'PRES', 'ABS'] +dataset_prefixes = ["FULL", "PRES", "ABS"] metrics = [val_evaluator_full, val_evaluator_pres, val_evaluator_abs] -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_parallel_dod.py b/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_parallel_dod.py index 3d680091162e5ac96c15c76b58a18764e85d3233..cd9786f64eaa1987884d06dc67c06995b6b77b74 100644 --- a/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_parallel_dod.py +++ b/mmpose/configs/mmdet/grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_parallel_dod.py @@ -1,3 +1,3 @@ -_base_ = 'grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py' +_base_ = "grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py" model = dict(test_cfg=dict(chunked_size=1)) diff --git a/mmpose/configs/mmdet/grounding_dino/flickr30k/grounding_dino_swin-t-pretrain_zeroshot_flickr30k.py b/mmpose/configs/mmdet/grounding_dino/flickr30k/grounding_dino_swin-t-pretrain_zeroshot_flickr30k.py index c1996567588842f82c0af83e3a9ab84c81e7c25d..419d134a19a82b93bfffe13877f79f5bd5361541 100644 --- a/mmpose/configs/mmdet/grounding_dino/flickr30k/grounding_dino_swin-t-pretrain_zeroshot_flickr30k.py +++ b/mmpose/configs/mmdet/grounding_dino/flickr30k/grounding_dino_swin-t-pretrain_zeroshot_flickr30k.py @@ -1,57 +1,56 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py" -dataset_type = 'Flickr30kDataset' -data_root = 'data/flickr30k_entities/' +dataset_type = "Flickr30kDataset" +data_root = "data/flickr30k_entities/" test_pipeline = [ + dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='LoadImageFromFile', backend_args=None, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities', - 'tokens_positive', 'phrase_ids', 'phrases')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + 
"scale_factor", + "text", + "custom_entities", + "tokens_positive", + "phrase_ids", + "phrases", + ), + ), ] dataset_Flickr30k_val = dict( type=dataset_type, data_root=data_root, - ann_file='final_flickr_separateGT_val.json', - data_prefix=dict(img='flickr30k_images/'), + ann_file="final_flickr_separateGT_val.json", + data_prefix=dict(img="flickr30k_images/"), pipeline=test_pipeline, ) dataset_Flickr30k_test = dict( type=dataset_type, data_root=data_root, - ann_file='final_flickr_separateGT_test.json', - data_prefix=dict(img='flickr30k_images/'), + ann_file="final_flickr_separateGT_test.json", + data_prefix=dict(img="flickr30k_images/"), pipeline=test_pipeline, ) -val_evaluator_Flickr30k = dict(type='Flickr30kMetric') +val_evaluator_Flickr30k = dict(type="Flickr30kMetric") -test_evaluator_Flickr30k = dict(type='Flickr30kMetric') +test_evaluator_Flickr30k = dict(type="Flickr30kMetric") # ----------Config---------- # -dataset_prefixes = ['Flickr30kVal', 'Flickr30kTest'] +dataset_prefixes = ["Flickr30kVal", "Flickr30kTest"] datasets = [dataset_Flickr30k_val, dataset_Flickr30k_test] metrics = [val_evaluator_Flickr30k, test_evaluator_Flickr30k] -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/grounding_dino/grounding_dino_r50_scratch_8xb2_1x_coco.py b/mmpose/configs/mmdet/grounding_dino/grounding_dino_r50_scratch_8xb2_1x_coco.py index 623a29b87adfd6734e980e814766e873b2b89d05..08dc9bb5f166dc4bddd689406a3bbe318634e108 100644 --- a/mmpose/configs/mmdet/grounding_dino/grounding_dino_r50_scratch_8xb2_1x_coco.py +++ b/mmpose/configs/mmdet/grounding_dino/grounding_dino_r50_scratch_8xb2_1x_coco.py @@ -1,68 +1,62 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] -lang_model_name = 'bert-base-uncased' +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] +lang_model_name = "bert-base-uncased" model = dict( - type='GroundingDINO', + type="GroundingDINO", num_queries=900, with_box_refine=True, as_two_stage=True, data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=False, ), language_model=dict( - type='BertModel', + type="BertModel", name=lang_model_name, pad_to_max=False, use_sub_sentence_represent=True, - special_tokens_list=['[CLS]', '[SEP]', '.', '?'], + special_tokens_list=["[CLS]", "[SEP]", ".", "?"], add_pooling_layer=False, ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[512, 1024, 2048], kernel_size=1, out_channels=256, 
act_cfg=None, bias=True, - norm_cfg=dict(type='GN', num_groups=32), - num_outs=4), + norm_cfg=dict(type="GN", num_groups=32), + num_outs=4, + ), encoder=dict( num_layers=6, num_cp=6, # visual layer config layer_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_levels=4, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), + ), # text layer config text_layer_cfg=dict( self_attn_cfg=dict(num_heads=4, embed_dims=256, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=1024, ffn_drop=0.0)), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0), + ), # fusion layer config - fusion_layer_cfg=dict( - v_dim=256, - l_dim=256, - embed_dim=1024, - num_heads=4, - init_values=1e-4), + fusion_layer_cfg=dict(v_dim=256, l_dim=256, embed_dim=1024, num_heads=4, init_values=1e-4), ), decoder=dict( num_layers=6, @@ -74,133 +68,127 @@ model = dict( cross_attn_text_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), # cross attention layer query to image cross_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)), - post_norm_cfg=None), - positional_encoding=dict( - num_feats=128, normalize=True, offset=0.0, temperature=20), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), + ), + post_norm_cfg=None, + ), + positional_encoding=dict(num_feats=128, normalize=True, offset=0.0, temperature=20), bbox_head=dict( - type='GroundingDINOHead', + type="GroundingDINOHead", num_classes=80, sync_cls_avg_factor=True, - contrastive_cfg=dict(max_text_len=256, log_scale='auto', bias=True), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), # 2.0 in DeformDETR - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0)), + contrastive_cfg=dict(max_text_len=256, log_scale="auto", bias=True), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), # 2.0 in DeformDETR + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + ), dn_cfg=dict( # TODO: Move to model.train_cfg ? 
- label_noise_scale=0.5, - box_noise_scale=1.0, # 0.4 for DN-DETR - group_cfg=dict(dynamic=True, num_groups=None, - num_dn_queries=100)), # TODO: half num_dn_queries + label_noise_scale=0.5, box_noise_scale=1.0, group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100) # 0.4 for DN-DETR + ), # TODO: half num_dn_queries # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='BinaryFocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), - test_cfg=dict(max_per_img=300)) + dict(type="BinaryFocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), + test_cfg=dict(max_per_img=300), +) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='FixScaleResize', scale=(800, 1333), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities')) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", 
"img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities")), ] -train_dataloader = dict( - dataset=dict( - filter_cfg=dict(filter_empty_gt=False), - pipeline=train_pipeline, - return_classes=True)) -val_dataloader = dict( - dataset=dict(pipeline=test_pipeline, return_classes=True)) +train_dataloader = dict(dataset=dict(filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline, return_classes=True)) +val_dataloader = dict(dataset=dict(pipeline=test_pipeline, return_classes=True)) test_dataloader = val_dataloader # We did not adopt the official 24e optimizer strategy # because the results indicate that the current strategy is superior. optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict( - type='AdamW', - lr=0.0001, # 0.0002 for DeformDETR - weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001), # 0.0002 for DeformDETR clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1) - })) + paramwise_cfg=dict(custom_keys={"absolute_pos_embed": dict(decay_mult=0.0), "backbone": dict(lr_mult=0.1)}), +) # learning policy max_epochs = 12 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[11], gamma=0.1)] # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
diff --git a/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-b_finetune_16xb2_1x_coco.py b/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-b_finetune_16xb2_1x_coco.py index 3554ee245ffe4312fc7f2cdd83755b1a0731aab9..08a14b187742b6672f8de64bf05602f9ab0f4f78 100644 --- a/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-b_finetune_16xb2_1x_coco.py +++ b/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-b_finetune_16xb2_1x_coco.py @@ -1,10 +1,10 @@ _base_ = [ - './grounding_dino_swin-t_finetune_16xb2_1x_coco.py', + "./grounding_dino_swin-t_finetune_16xb2_1x_coco.py", ] -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swinb_cogcoor_mmdet-55949c9c.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swinb_cogcoor_mmdet-55949c9c.pth" # noqa model = dict( - type='GroundingDINO', + type="GroundingDINO", backbone=dict( pretrain_img_size=384, embed_dims=128, @@ -12,6 +12,7 @@ model = dict( num_heads=[4, 8, 16, 32], window_size=12, drop_path_rate=0.3, - patch_norm=True), + patch_norm=True, + ), neck=dict(in_channels=[256, 512, 1024]), ) diff --git a/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-b_pretrain_mixeddata.py b/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-b_pretrain_mixeddata.py index 92f327fef8311f0f72d7f75149bfc163863e913c..4b37c484d25415f775a84c0c96b422fc89f404db 100644 --- a/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-b_pretrain_mixeddata.py +++ b/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-b_pretrain_mixeddata.py @@ -1,9 +1,9 @@ _base_ = [ - './grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py', + "./grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py", ] model = dict( - type='GroundingDINO', + type="GroundingDINO", backbone=dict( pretrain_img_size=384, embed_dims=128, @@ -11,6 +11,7 @@ model = dict( num_heads=[4, 8, 16, 32], window_size=12, drop_path_rate=0.3, - patch_norm=True), + patch_norm=True, + ), neck=dict(in_channels=[256, 512, 1024]), ) diff --git a/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_finetune_16xb2_1x_coco.py b/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_finetune_16xb2_1x_coco.py index 0c6403ee66d9e5782723117191176efbadec2a90..fedcbca06e46a6043a9afe6475aa214e46bda1e2 100644 --- a/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_finetune_16xb2_1x_coco.py +++ b/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_finetune_16xb2_1x_coco.py @@ -1,32 +1,29 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth' # noqa -lang_model_name = 'bert-base-uncased' +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] +load_from = "https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth" # noqa +lang_model_name = "bert-base-uncased" model = dict( - type='GroundingDINO', + type="GroundingDINO", num_queries=900, with_box_refine=True, as_two_stage=True, data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=False, ), language_model=dict( - type='BertModel', + type="BertModel", name=lang_model_name, pad_to_max=False, use_sub_sentence_represent=True, - 
special_tokens_list=['[CLS]', '[SEP]', '.', '?'], + special_tokens_list=["[CLS]", "[SEP]", ".", "?"], add_pooling_layer=False, ), backbone=dict( - type='SwinTransformer', + type="SwinTransformer", embed_dims=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], @@ -34,42 +31,39 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(1, 2, 3), with_cp=True, - convert_weights=False), + convert_weights=False, + ), neck=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[192, 384, 768], kernel_size=1, out_channels=256, act_cfg=None, bias=True, - norm_cfg=dict(type='GN', num_groups=32), - num_outs=4), + norm_cfg=dict(type="GN", num_groups=32), + num_outs=4, + ), encoder=dict( num_layers=6, num_cp=6, # visual layer config layer_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_levels=4, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), + ), # text layer config text_layer_cfg=dict( self_attn_cfg=dict(num_heads=4, embed_dims=256, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=1024, ffn_drop=0.0)), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0), + ), # fusion layer config - fusion_layer_cfg=dict( - v_dim=256, - l_dim=256, - embed_dim=1024, - num_heads=4, - init_values=1e-4), + fusion_layer_cfg=dict(v_dim=256, l_dim=256, embed_dim=1024, num_heads=4, init_values=1e-4), ), decoder=dict( num_layers=6, @@ -81,122 +75,120 @@ model = dict( cross_attn_text_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), # cross attention layer query to image cross_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)), - post_norm_cfg=None), - positional_encoding=dict( - num_feats=128, normalize=True, offset=0.0, temperature=20), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), + ), + post_norm_cfg=None, + ), + positional_encoding=dict(num_feats=128, normalize=True, offset=0.0, temperature=20), bbox_head=dict( - type='GroundingDINOHead', + type="GroundingDINOHead", num_classes=80, sync_cls_avg_factor=True, contrastive_cfg=dict(max_text_len=256, log_scale=0.0, bias=False), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), # 2.0 in DeformDETR - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0)), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), # 2.0 in DeformDETR + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + ), dn_cfg=dict( # TODO: Move to model.train_cfg ? 
- label_noise_scale=0.5, - box_noise_scale=1.0, # 0.4 for DN-DETR - group_cfg=dict(dynamic=True, num_groups=None, - num_dn_queries=100)), # TODO: half num_dn_queries + label_noise_scale=0.5, box_noise_scale=1.0, group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100) # 0.4 for DN-DETR + ), # TODO: half num_dn_queries # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='BinaryFocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), - test_cfg=dict(max_per_img=300)) + dict(type="BinaryFocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), + test_cfg=dict(max_per_img=300), +) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='FixScaleResize', scale=(800, 1333), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities')) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", 
"img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities")), ] -train_dataloader = dict( - dataset=dict( - filter_cfg=dict(filter_empty_gt=False), - pipeline=train_pipeline, - return_classes=True)) -val_dataloader = dict( - dataset=dict(pipeline=test_pipeline, return_classes=True)) +train_dataloader = dict(dataset=dict(filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline, return_classes=True)) +val_dataloader = dict(dataset=dict(pipeline=test_pipeline, return_classes=True)) test_dataloader = val_dataloader optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1) - })) + paramwise_cfg=dict(custom_keys={"absolute_pos_embed": dict(decay_mult=0.0), "backbone": dict(lr_mult=0.1)}), +) # learning policy max_epochs = 12 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[11], gamma=0.1)] # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_finetune_8xb2_20e_cat.py b/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_finetune_8xb2_20e_cat.py index c2265e86730f68ed69af246a5e0e87fa2cb5e570..0c0e7c18a59808987ae26627866ecb1a60c91920 100644 --- a/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_finetune_8xb2_20e_cat.py +++ b/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_finetune_8xb2_20e_cat.py @@ -1,56 +1,44 @@ -_base_ = 'grounding_dino_swin-t_finetune_16xb2_1x_coco.py' +_base_ = "grounding_dino_swin-t_finetune_16xb2_1x_coco.py" -data_root = 'data/cat/' -class_name = ('cat', ) +data_root = "data/cat/" +class_name = ("cat",) num_classes = len(class_name) metainfo = dict(classes=class_name, palette=[(220, 20, 60)]) model = dict(bbox_head=dict(num_classes=num_classes)) train_dataloader = dict( - dataset=dict( - data_root=data_root, - metainfo=metainfo, - ann_file='annotations/trainval.json', - data_prefix=dict(img='images/'))) + dataset=dict(data_root=data_root, metainfo=metainfo, ann_file="annotations/trainval.json", data_prefix=dict(img="images/")) +) val_dataloader = dict( - dataset=dict( - metainfo=metainfo, - data_root=data_root, - ann_file='annotations/test.json', - data_prefix=dict(img='images/'))) + dataset=dict(metainfo=metainfo, data_root=data_root, ann_file="annotations/test.json", data_prefix=dict(img="images/")) +) test_dataloader = val_dataloader -val_evaluator = dict(ann_file=data_root + 'annotations/test.json') +val_evaluator = dict(ann_file=data_root + "annotations/test.json") test_evaluator = val_evaluator max_epoch = 20 -default_hooks = dict( - checkpoint=dict(interval=1, max_keep_ckpts=1, save_best='auto'), - logger=dict(type='LoggerHook', interval=5)) +default_hooks = dict(checkpoint=dict(interval=1, max_keep_ckpts=1, save_best="auto"), logger=dict(type="LoggerHook", interval=5)) train_cfg = dict(max_epochs=max_epoch, val_interval=1) param_scheduler = [ - dict(type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=30), - dict( - type='MultiStepLR', - begin=0, - end=max_epoch, - by_epoch=True, - milestones=[15], - gamma=0.1) + 
dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=30), + dict(type="MultiStepLR", begin=0, end=max_epoch, by_epoch=True, milestones=[15], gamma=0.1), ] optim_wrapper = dict( optimizer=dict(lr=0.00005), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), - 'language_model': dict(lr_mult=0), - })) + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), + "language_model": dict(lr_mult=0), + } + ), +) auto_scale_lr = dict(base_batch_size=16) diff --git a/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py b/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py index 7448764ef7ed4fb91bbca981e8006b412e74c414..4b65c342abaeb4c96fd8f919622f04289ccea2af 100644 --- a/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py +++ b/mmpose/configs/mmdet/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py @@ -1,32 +1,29 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] -lang_model_name = 'bert-base-uncased' +lang_model_name = "bert-base-uncased" model = dict( - type='GroundingDINO', + type="GroundingDINO", num_queries=900, with_box_refine=True, as_two_stage=True, data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=False, ), language_model=dict( - type='BertModel', + type="BertModel", name=lang_model_name, pad_to_max=False, use_sub_sentence_represent=True, - special_tokens_list=['[CLS]', '[SEP]', '.', '?'], + special_tokens_list=["[CLS]", "[SEP]", ".", "?"], add_pooling_layer=True, ), backbone=dict( - type='SwinTransformer', + type="SwinTransformer", embed_dims=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], @@ -34,41 +31,38 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(1, 2, 3), with_cp=False, - convert_weights=False), + convert_weights=False, + ), neck=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[192, 384, 768], kernel_size=1, out_channels=256, act_cfg=None, bias=True, - norm_cfg=dict(type='GN', num_groups=32), - num_outs=4), + norm_cfg=dict(type="GN", num_groups=32), + num_outs=4, + ), encoder=dict( num_layers=6, # visual layer config layer_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_levels=4, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), + ), # text layer config text_layer_cfg=dict( self_attn_cfg=dict(num_heads=4, embed_dims=256, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=1024, ffn_drop=0.0)), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0), + ), # fusion layer config - fusion_layer_cfg=dict( - v_dim=256, - l_dim=256, - embed_dim=1024, - num_heads=4, - init_values=1e-4), + fusion_layer_cfg=dict(v_dim=256, l_dim=256, embed_dim=1024, num_heads=4, init_values=1e-4), ), decoder=dict( num_layers=6, @@ -80,49 +74,36 @@ model = dict( cross_attn_text_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), # cross attention layer query to image 
cross_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)), - post_norm_cfg=None), - positional_encoding=dict( - num_feats=128, normalize=True, offset=0.0, temperature=20), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), + ), + post_norm_cfg=None, + ), + positional_encoding=dict(num_feats=128, normalize=True, offset=0.0, temperature=20), bbox_head=dict( - type='GroundingDINOHead', + type="GroundingDINOHead", num_classes=80, sync_cls_avg_factor=True, contrastive_cfg=dict(max_text_len=256), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), # 2.0 in DeformDETR - loss_bbox=dict(type='L1Loss', loss_weight=5.0)), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), # 2.0 in DeformDETR + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + ), dn_cfg=dict( # TODO: Move to model.train_cfg ? - label_noise_scale=0.5, - box_noise_scale=1.0, # 0.4 for DN-DETR - group_cfg=dict(dynamic=True, num_groups=None, - num_dn_queries=100)), # TODO: half num_dn_queries + label_noise_scale=0.5, box_noise_scale=1.0, group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100) # 0.4 for DN-DETR + ), # TODO: half num_dn_queries # training and testing settings train_cfg=None, - test_cfg=dict(max_per_img=300)) + test_cfg=dict(max_per_img=300), +) test_pipeline = [ + dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='LoadImageFromFile', backend_args=None, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities', - 'tokens_positive')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities", "tokens_positive"), + ), ] -val_dataloader = dict( - dataset=dict(pipeline=test_pipeline, return_classes=True)) +val_dataloader = dict(dataset=dict(pipeline=test_pipeline, return_classes=True)) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-b_pretrain_zeroshot_lvis.py b/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-b_pretrain_zeroshot_lvis.py index 6084159044e8c0e8642a1226c6a9efd85c7d27d2..7de7744e77526589a026834c86d2f0611266c27b 100644 --- a/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-b_pretrain_zeroshot_lvis.py +++ b/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-b_pretrain_zeroshot_lvis.py @@ -1,7 +1,7 @@ -_base_ = './grounding_dino_swin-t_pretrain_zeroshot_lvis.py' +_base_ = "./grounding_dino_swin-t_pretrain_zeroshot_lvis.py" model = dict( - type='GroundingDINO', + type="GroundingDINO", backbone=dict( pretrain_img_size=384, embed_dims=128, @@ -9,6 +9,7 @@ model = dict( num_heads=[4, 8, 16, 32], window_size=12, drop_path_rate=0.3, - patch_norm=True), + patch_norm=True, + ), neck=dict(in_channels=[256, 512, 1024]), ) diff --git a/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-b_pretrain_zeroshot_mini-lvis.py b/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-b_pretrain_zeroshot_mini-lvis.py index 
68467a7237ca893aa79eb5b0acc9d159f7082968..3859d8149c4f57298e1f32819b197d29c0b80449 100644 --- a/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-b_pretrain_zeroshot_mini-lvis.py +++ b/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-b_pretrain_zeroshot_mini-lvis.py @@ -1,7 +1,7 @@ -_base_ = './grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py' +_base_ = "./grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py" model = dict( - type='GroundingDINO', + type="GroundingDINO", backbone=dict( pretrain_img_size=384, embed_dims=128, @@ -9,6 +9,7 @@ model = dict( num_heads=[4, 8, 16, 32], window_size=12, drop_path_rate=0.3, - patch_norm=True), + patch_norm=True, + ), neck=dict(in_channels=[256, 512, 1024]), ) diff --git a/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_lvis.py b/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_lvis.py index 3d05f0ce1c0cb095c0c9f9a65bd7666cba57afe7..9d3c2e647282fdf5b40b7f414db72178df839828 100644 --- a/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_lvis.py +++ b/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_lvis.py @@ -1,24 +1,20 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py" -model = dict(test_cfg=dict( - max_per_img=300, - chunked_size=40, -)) +model = dict( + test_cfg=dict( + max_per_img=300, + chunked_size=40, + ) +) -dataset_type = 'LVISV1Dataset' -data_root = 'data/coco/' +dataset_type = "LVISV1Dataset" +data_root = "data/coco/" val_dataloader = dict( - dataset=dict( - data_root=data_root, - type=dataset_type, - ann_file='annotations/lvis_od_val.json', - data_prefix=dict(img=''))) + dataset=dict(data_root=data_root, type=dataset_type, ann_file="annotations/lvis_od_val.json", data_prefix=dict(img="")) +) test_dataloader = val_dataloader # numpy < 1.24.0 -val_evaluator = dict( - _delete_=True, - type='LVISFixedAPMetric', - ann_file=data_root + 'annotations/lvis_od_val.json') +val_evaluator = dict(_delete_=True, type="LVISFixedAPMetric", ann_file=data_root + "annotations/lvis_od_val.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py b/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py index 0aac6cf33a92827c9c350175977bb1a595d2c0c8..e17a7b79de26093884a462be8aa2bcbb60f89cdd 100644 --- a/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py +++ b/mmpose/configs/mmdet/grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py @@ -1,25 +1,22 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py" -model = dict(test_cfg=dict( - max_per_img=300, - chunked_size=40, -)) +model = dict( + test_cfg=dict( + max_per_img=300, + chunked_size=40, + ) +) -dataset_type = 'LVISV1Dataset' -data_root = 'data/coco/' +dataset_type = "LVISV1Dataset" +data_root = "data/coco/" val_dataloader = dict( dataset=dict( - data_root=data_root, - type=dataset_type, - ann_file='annotations/lvis_v1_minival_inserted_image_name.json', - data_prefix=dict(img=''))) + data_root=data_root, type=dataset_type, ann_file="annotations/lvis_v1_minival_inserted_image_name.json", data_prefix=dict(img="") + ) +) test_dataloader = val_dataloader # numpy < 1.24.0 -val_evaluator = dict( - _delete_=True, 
- type='LVISFixedAPMetric', - ann_file=data_root + - 'annotations/lvis_v1_minival_inserted_image_name.json') +val_evaluator = dict(_delete_=True, type="LVISFixedAPMetric", ann_file=data_root + "annotations/lvis_v1_minival_inserted_image_name.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-b_pretrain_odinw13.py b/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-b_pretrain_odinw13.py index 65a6bc2a078a9ea5123c745aa72ba22466ea6e58..c6f0e8bf830d29917f882f8fffd3136d97e72974 100644 --- a/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-b_pretrain_odinw13.py +++ b/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-b_pretrain_odinw13.py @@ -1,36 +1,42 @@ -_base_ = '../grounding_dino_swin-b_pretrain_mixeddata.py' +_base_ = "../grounding_dino_swin-b_pretrain_mixeddata.py" -dataset_type = 'CocoDataset' -data_root = 'data/odinw/' +dataset_type = "CocoDataset" +data_root = "data/odinw/" base_test_pipeline = _base_.test_pipeline -base_test_pipeline[-1]['meta_keys'] = ('img_id', 'img_path', 'ori_shape', - 'img_shape', 'scale_factor', 'text', - 'custom_entities', 'caption_prompt') +base_test_pipeline[-1]["meta_keys"] = ( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "text", + "custom_entities", + "caption_prompt", +) # ---------------------1 AerialMaritimeDrone---------------------# -class_name = ('boat', 'car', 'dock', 'jetski', 'lift') +class_name = ("boat", "car", "dock", "jetski", "lift") metainfo = dict(classes=class_name) -_data_root = data_root + 'AerialMaritimeDrone/large/' +_data_root = data_root + "AerialMaritimeDrone/large/" dataset_AerialMaritimeDrone = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), test_mode=True, pipeline=base_test_pipeline, - return_classes=True) + return_classes=True, +) val_evaluator_AerialMaritimeDrone = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------2 Aquarium---------------------# -class_name = ('fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', - 'stingray') +class_name = ("fish", "jellyfish", "penguin", "puffin", "shark", "starfish", "stingray") metainfo = dict(classes=class_name) -_data_root = data_root + 'Aquarium/Aquarium Combined.v2-raw-1024.coco/' +_data_root = data_root + "Aquarium/Aquarium Combined.v2-raw-1024.coco/" caption_prompt = None # caption_prompt = { @@ -48,21 +54,19 @@ dataset_Aquarium = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_Aquarium = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Aquarium = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------3 CottontailRabbits---------------------# -class_name = ('Cottontail-Rabbit', ) 
+class_name = ("Cottontail-Rabbit",) metainfo = dict(classes=class_name) -_data_root = data_root + 'CottontailRabbits/' +_data_root = data_root + "CottontailRabbits/" caption_prompt = None # caption_prompt = {'Cottontail-Rabbit': {'name': 'rabbit'}} @@ -71,21 +75,19 @@ dataset_CottontailRabbits = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_CottontailRabbits = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_CottontailRabbits = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------4 EgoHands---------------------# -class_name = ('hand', ) +class_name = ("hand",) metainfo = dict(classes=class_name) -_data_root = data_root + 'EgoHands/generic/' +_data_root = data_root + "EgoHands/generic/" caption_prompt = None # caption_prompt = {'hand': {'suffix': ' of a person'}} @@ -94,21 +96,19 @@ dataset_EgoHands = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_EgoHands = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_EgoHands = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------5 NorthAmericaMushrooms---------------------# -class_name = ('CoW', 'chanterelle') +class_name = ("CoW", "chanterelle") metainfo = dict(classes=class_name) -_data_root = data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/' # noqa +_data_root = data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/" # noqa caption_prompt = None # caption_prompt = { @@ -124,21 +124,21 @@ dataset_NorthAmericaMushrooms = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_NorthAmericaMushrooms = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------6 Packages---------------------# -class_name = ('package', ) +class_name = ("package",) metainfo = dict(classes=class_name) -_data_root = data_root + 'Packages/Raw/' +_data_root = data_root + "Packages/Raw/" caption_prompt = None # caption_prompt = { @@ -152,60 +152,72 @@ dataset_Packages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + 
ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_Packages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Packages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------7 PascalVOC---------------------# -class_name = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', - 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', - 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', - 'tvmonitor') +class_name = ( + "aeroplane", + "bicycle", + "bird", + "boat", + "bottle", + "bus", + "car", + "cat", + "chair", + "cow", + "diningtable", + "dog", + "horse", + "motorbike", + "person", + "pottedplant", + "sheep", + "sofa", + "train", + "tvmonitor", +) metainfo = dict(classes=class_name) -_data_root = data_root + 'PascalVOC/' +_data_root = data_root + "PascalVOC/" dataset_PascalVOC = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_PascalVOC = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_PascalVOC = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------8 pistols---------------------# -class_name = ('pistol', ) +class_name = ("pistol",) metainfo = dict(classes=class_name) -_data_root = data_root + 'pistols/export/' +_data_root = data_root + "pistols/export/" dataset_pistols = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pistols = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pistols = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------9 pothole---------------------# -class_name = ('pothole', ) +class_name = ("pothole",) metainfo = dict(classes=class_name) -_data_root = data_root + 'pothole/' +_data_root = data_root + "pothole/" caption_prompt = None # caption_prompt = { @@ -220,119 +232,132 @@ dataset_pothole = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pothole = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pothole = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------10 
Raccoon---------------------# -class_name = ('raccoon', ) +class_name = ("raccoon",) metainfo = dict(classes=class_name) -_data_root = data_root + 'Raccoon/Raccoon.v2-raw.coco/' +_data_root = data_root + "Raccoon/Raccoon.v2-raw.coco/" dataset_Raccoon = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_Raccoon = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Raccoon = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------11 ShellfishOpenImages---------------------# -class_name = ('Crab', 'Lobster', 'Shrimp') +class_name = ("Crab", "Lobster", "Shrimp") metainfo = dict(classes=class_name) -_data_root = data_root + 'ShellfishOpenImages/raw/' +_data_root = data_root + "ShellfishOpenImages/raw/" dataset_ShellfishOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_ShellfishOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------12 thermalDogsAndPeople---------------------# -class_name = ('dog', 'person') +class_name = ("dog", "person") metainfo = dict(classes=class_name) -_data_root = data_root + 'thermalDogsAndPeople/' +_data_root = data_root + "thermalDogsAndPeople/" dataset_thermalDogsAndPeople = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_thermalDogsAndPeople = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------13 VehiclesOpenImages---------------------# -class_name = ('Ambulance', 'Bus', 'Car', 'Motorcycle', 'Truck') +class_name = ("Ambulance", "Bus", "Car", "Motorcycle", "Truck") metainfo = dict(classes=class_name) -_data_root = data_root + 'VehiclesOpenImages/416x416/' +_data_root = data_root + "VehiclesOpenImages/416x416/" dataset_VehiclesOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_VehiclesOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) 
+val_evaluator_VehiclesOpenImages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # --------------------- Config---------------------#
 dataset_prefixes = [
-    'AerialMaritimeDrone', 'Aquarium', 'CottontailRabbits', 'EgoHands',
-    'NorthAmericaMushrooms', 'Packages', 'PascalVOC', 'pistols', 'pothole',
-    'Raccoon', 'ShellfishOpenImages', 'thermalDogsAndPeople',
-    'VehiclesOpenImages'
+    "AerialMaritimeDrone",
+    "Aquarium",
+    "CottontailRabbits",
+    "EgoHands",
+    "NorthAmericaMushrooms",
+    "Packages",
+    "PascalVOC",
+    "pistols",
+    "pothole",
+    "Raccoon",
+    "ShellfishOpenImages",
+    "thermalDogsAndPeople",
+    "VehiclesOpenImages",
 ]
 datasets = [
-    dataset_AerialMaritimeDrone, dataset_Aquarium, dataset_CottontailRabbits,
-    dataset_EgoHands, dataset_NorthAmericaMushrooms, dataset_Packages,
-    dataset_PascalVOC, dataset_pistols, dataset_pothole, dataset_Raccoon,
-    dataset_ShellfishOpenImages, dataset_thermalDogsAndPeople,
-    dataset_VehiclesOpenImages
+    dataset_AerialMaritimeDrone,
+    dataset_Aquarium,
+    dataset_CottontailRabbits,
+    dataset_EgoHands,
+    dataset_NorthAmericaMushrooms,
+    dataset_Packages,
+    dataset_PascalVOC,
+    dataset_pistols,
+    dataset_pothole,
+    dataset_Raccoon,
+    dataset_ShellfishOpenImages,
+    dataset_thermalDogsAndPeople,
+    dataset_VehiclesOpenImages,
 ]
 metrics = [
-    val_evaluator_AerialMaritimeDrone, val_evaluator_Aquarium,
-    val_evaluator_CottontailRabbits, val_evaluator_EgoHands,
-    val_evaluator_NorthAmericaMushrooms, val_evaluator_Packages,
-    val_evaluator_PascalVOC, val_evaluator_pistols, val_evaluator_pothole,
-    val_evaluator_Raccoon, val_evaluator_ShellfishOpenImages,
-    val_evaluator_thermalDogsAndPeople, val_evaluator_VehiclesOpenImages
+    val_evaluator_AerialMaritimeDrone,
+    val_evaluator_Aquarium,
+    val_evaluator_CottontailRabbits,
+    val_evaluator_EgoHands,
+    val_evaluator_NorthAmericaMushrooms,
+    val_evaluator_Packages,
+    val_evaluator_PascalVOC,
+    val_evaluator_pistols,
+    val_evaluator_pothole,
+    val_evaluator_Raccoon,
+    val_evaluator_ShellfishOpenImages,
+    val_evaluator_thermalDogsAndPeople,
+    val_evaluator_VehiclesOpenImages,
 ]
 
 # -------------------------------------------------#
-val_dataloader = dict(
-    dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets))
+val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets))
 test_dataloader = val_dataloader
 
-val_evaluator = dict(
-    _delete_=True,
-    type='MultiDatasetsEvaluator',
-    metrics=metrics,
-    dataset_prefixes=dataset_prefixes)
+val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes)
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-b_pretrain_odinw35.py b/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-b_pretrain_odinw35.py
index e73cd8e61ba20f4baff6f7c85477a8fae3735e44..265134ac3047330c1ffbe47bc7493e3b8bfcdac6 100644
--- a/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-b_pretrain_odinw35.py
+++ b/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-b_pretrain_odinw35.py
@@ -1,796 +1,951 @@
-_base_ = '../grounding_dino_swin-b_pretrain_mixeddata.py'
+_base_ = "../grounding_dino_swin-b_pretrain_mixeddata.py"
 
-dataset_type = 'CocoDataset'
-data_root = 'data/odinw/'
+dataset_type = "CocoDataset"
+data_root = "data/odinw/"
 
 base_test_pipeline = _base_.test_pipeline
-base_test_pipeline[-1]['meta_keys'] = ('img_id', 'img_path', 'ori_shape',
-                                       'img_shape', 'scale_factor', 'text',
-                                       'custom_entities', 'caption_prompt')
+base_test_pipeline[-1]["meta_keys"] = (
+    "img_id",
+    "img_path",
+    "ori_shape",
+    "img_shape",
+    "scale_factor",
+    "text",
+    "custom_entities",
+    "caption_prompt",
+)
 
 # ---------------------1 AerialMaritimeDrone_large---------------------#
-class_name = ('boat', 'car', 'dock', 'jetski', 'lift')
+class_name = ("boat", "car", "dock", "jetski", "lift")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'AerialMaritimeDrone/large/'
+_data_root = data_root + "AerialMaritimeDrone/large/"
 dataset_AerialMaritimeDrone_large = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_AerialMaritimeDrone_large = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------2 AerialMaritimeDrone_tiled---------------------#
-class_name = ('boat', 'car', 'dock', 'jetski', 'lift')
+class_name = ("boat", "car", "dock", "jetski", "lift")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'AerialMaritimeDrone/tiled/'
+_data_root = data_root + "AerialMaritimeDrone/tiled/"
 dataset_AerialMaritimeDrone_tiled = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
    pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_AerialMaritimeDrone_tiled = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------3 AmericanSignLanguageLetters---------------------#
-class_name = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
-              'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z')
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/'  # noqa
+class_name = (
+    "A",
+    "B",
+    "C",
+    "D",
+    "E",
+    "F",
+    "G",
+    "H",
+    "I",
+    "J",
+    "K",
+    "L",
+    "M",
+    "N",
+    "O",
+    "P",
+    "Q",
+    "R",
+    "S",
+    "T",
+    "U",
+    "V",
+    "W",
+    "X",
+    "Y",
+    "Z",
+)
+metainfo = dict(classes=class_name)
+_data_root = data_root + "AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/"  # noqa
 dataset_AmericanSignLanguageLetters = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_AmericanSignLanguageLetters = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------4 Aquarium---------------------#
-class_name = ('fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish',
-              'stingray')
+class_name = ("fish", "jellyfish", "penguin", "puffin", "shark", "starfish", "stingray")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'Aquarium/Aquarium Combined.v2-raw-1024.coco/'
+_data_root = data_root + "Aquarium/Aquarium Combined.v2-raw-1024.coco/"
 dataset_Aquarium = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_Aquarium = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_Aquarium = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------5 BCCD---------------------#
-class_name = ('Platelets', 'RBC', 'WBC')
+class_name = ("Platelets", "RBC", "WBC")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'BCCD/BCCD.v3-raw.coco/'
+_data_root = data_root + "BCCD/BCCD.v3-raw.coco/"
 dataset_BCCD = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_BCCD = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_BCCD = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------6 boggleBoards---------------------#
-class_name = ('Q', 'a', 'an', 'b', 'c', 'd', 'e', 'er', 'f', 'g', 'h', 'he',
-              'i', 'in', 'j', 'k', 'l', 'm', 'n', 'o', 'o ', 'p', 'q', 'qu',
-              'r', 's', 't', 't\\', 'th', 'u', 'v', 'w', 'wild', 'x', 'y', 'z')
+class_name = (
+    "Q",
+    "a",
+    "an",
+    "b",
+    "c",
+    "d",
+    "e",
+    "er",
+    "f",
+    "g",
+    "h",
+    "he",
+    "i",
+    "in",
+    "j",
+    "k",
+    "l",
+    "m",
+    "n",
+    "o",
+    "o ",
+    "p",
+    "q",
+    "qu",
+    "r",
+    "s",
+    "t",
+    "t\\",
+    "th",
+    "u",
+    "v",
+    "w",
+    "wild",
+    "x",
+    "y",
+    "z",
+)
+metainfo = dict(classes=class_name)
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'boggleBoards/416x416AutoOrient/export/'
+_data_root = data_root + "boggleBoards/416x416AutoOrient/export/"
 dataset_boggleBoards = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='val_annotations_without_background.json',
-    data_prefix=dict(img=''),
+    ann_file="val_annotations_without_background.json",
+    data_prefix=dict(img=""),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_boggleBoards = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'val_annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_boggleBoards = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox")
 
 # ---------------------7 brackishUnderwater---------------------#
-class_name = ('crab', 'fish', 'jellyfish', 'shrimp', 'small_fish', 'starfish')
+class_name = ("crab", "fish", "jellyfish", "shrimp", "small_fish", "starfish")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'brackishUnderwater/960x540/'
+_data_root = data_root + "brackishUnderwater/960x540/"
 dataset_brackishUnderwater = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_brackishUnderwater = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_brackishUnderwater = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------8 ChessPieces---------------------#
-class_name = (' ', 'black bishop', 'black king', 'black knight', 'black pawn',
-              'black queen', 'black rook', 'white bishop', 'white king',
-              'white knight', 'white pawn', 'white queen', 'white rook')
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'ChessPieces/Chess Pieces.v23-raw.coco/'
+class_name = (
+    " ",
+    "black bishop",
+    "black king",
+    "black knight",
+    "black pawn",
+    "black queen",
+    "black rook",
+    "white bishop",
+    "white king",
+    "white knight",
+    "white pawn",
+    "white queen",
+    "white rook",
+)
+metainfo = dict(classes=class_name)
+_data_root = data_root + "ChessPieces/Chess Pieces.v23-raw.coco/"
 dataset_ChessPieces = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/new_annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/new_annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_ChessPieces = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/new_annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_ChessPieces = dict(type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox")
 
 # ---------------------9 CottontailRabbits---------------------#
-class_name = ('rabbit', )
+class_name = ("rabbit",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'CottontailRabbits/'
+_data_root = data_root + "CottontailRabbits/"
 dataset_CottontailRabbits = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/new_annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/new_annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_CottontailRabbits = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/new_annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------10 dice---------------------#
-class_name = ('1', '2', '3', '4', '5', '6')
+class_name = ("1", "2", "3", "4", "5", "6")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'dice/mediumColor/export/'
+_data_root = data_root + "dice/mediumColor/export/"
 dataset_dice = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='val_annotations_without_background.json',
-    data_prefix=dict(img=''),
+    ann_file="val_annotations_without_background.json",
+    data_prefix=dict(img=""),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_dice = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'val_annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_dice = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox")
 
 # ---------------------11 DroneControl---------------------#
-class_name = ('follow', 'follow_hand', 'land', 'land_hand', 'null', 'object',
-              'takeoff', 'takeoff-hand')
+class_name = ("follow", "follow_hand", "land", "land_hand", "null", "object", "takeoff", "takeoff-hand")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'DroneControl/Drone Control.v3-raw.coco/'
+_data_root = data_root + "DroneControl/Drone Control.v3-raw.coco/"
 dataset_DroneControl = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_DroneControl = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_DroneControl = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------12 EgoHands_generic---------------------#
-class_name = ('hand', )
+class_name = ("hand",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'EgoHands/generic/'
-caption_prompt = {'hand': {'suffix': ' of a person'}}
+_data_root = data_root + "EgoHands/generic/"
+caption_prompt = {"hand": {"suffix": " of a person"}}
 dataset_EgoHands_generic = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     # NOTE w. prompt 0.548; wo. prompt 0.764
     # caption_prompt=caption_prompt,
     test_mode=True,
-    return_classes=True)
-val_evaluator_EgoHands_generic = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_EgoHands_generic = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------13 EgoHands_specific---------------------#
-class_name = ('myleft', 'myright', 'yourleft', 'yourright')
+class_name = ("myleft", "myright", "yourleft", "yourright")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'EgoHands/specific/'
+_data_root = data_root + "EgoHands/specific/"
 dataset_EgoHands_specific = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_EgoHands_specific = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_EgoHands_specific = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------14 HardHatWorkers---------------------#
-class_name = ('head', 'helmet', 'person')
+class_name = ("head", "helmet", "person")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'HardHatWorkers/raw/'
+_data_root = data_root + "HardHatWorkers/raw/"
 dataset_HardHatWorkers = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_HardHatWorkers = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_HardHatWorkers = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------15 MaskWearing---------------------#
-class_name = ('mask', 'no-mask')
+class_name = ("mask", "no-mask")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'MaskWearing/raw/'
+_data_root = data_root + "MaskWearing/raw/"
 dataset_MaskWearing = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_MaskWearing = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_MaskWearing = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------16 MountainDewCommercial---------------------#
-class_name = ('bottle', )
+class_name = ("bottle",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'MountainDewCommercial/'
+_data_root = data_root + "MountainDewCommercial/"
 dataset_MountainDewCommercial = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_MountainDewCommercial = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------17 NorthAmericaMushrooms---------------------#
-class_name = ('flat mushroom', 'yellow mushroom')
+class_name = ("flat mushroom", "yellow mushroom")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/'  # noqa
+_data_root = data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/"  # noqa
 dataset_NorthAmericaMushrooms = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/new_annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/new_annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_NorthAmericaMushrooms = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/new_annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------18 openPoetryVision---------------------#
-class_name = ('American Typewriter', 'Andale Mono', 'Apple Chancery', 'Arial',
-              'Avenir', 'Baskerville', 'Big Caslon', 'Bradley Hand',
-              'Brush Script MT', 'Chalkboard', 'Comic Sans MS', 'Copperplate',
-              'Courier', 'Didot', 'Futura', 'Geneva', 'Georgia', 'Gill Sans',
-              'Helvetica', 'Herculanum', 'Impact', 'Kefa', 'Lucida Grande',
-              'Luminari', 'Marker Felt', 'Menlo', 'Monaco', 'Noteworthy',
-              'Optima', 'PT Sans', 'PT Serif', 'Palatino', 'Papyrus',
-              'Phosphate', 'Rockwell', 'SF Pro', 'SignPainter', 'Skia',
-              'Snell Roundhand', 'Tahoma', 'Times New Roman', 'Trebuchet MS',
-              'Verdana')
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'openPoetryVision/512x512/'
+class_name = (
+    "American Typewriter",
+    "Andale Mono",
+    "Apple Chancery",
+    "Arial",
+    "Avenir",
+    "Baskerville",
+    "Big Caslon",
+    "Bradley Hand",
+    "Brush Script MT",
+    "Chalkboard",
+    "Comic Sans MS",
+    "Copperplate",
+    "Courier",
+    "Didot",
+    "Futura",
+    "Geneva",
+    "Georgia",
+    "Gill Sans",
+    "Helvetica",
+    "Herculanum",
+    "Impact",
+    "Kefa",
+    "Lucida Grande",
+    "Luminari",
+    "Marker Felt",
+    "Menlo",
+    "Monaco",
+    "Noteworthy",
+    "Optima",
+    "PT Sans",
+    "PT Serif",
+    "Palatino",
+    "Papyrus",
+    "Phosphate",
+    "Rockwell",
+    "SF Pro",
+    "SignPainter",
+    "Skia",
+    "Snell Roundhand",
+    "Tahoma",
+    "Times New Roman",
+    "Trebuchet MS",
+    "Verdana",
+)
+metainfo = dict(classes=class_name)
+_data_root = data_root + "openPoetryVision/512x512/"
 dataset_openPoetryVision = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_openPoetryVision = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_openPoetryVision = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------19 OxfordPets_by_breed---------------------#
-class_name = ('cat-Abyssinian', 'cat-Bengal', 'cat-Birman', 'cat-Bombay',
-              'cat-British_Shorthair', 'cat-Egyptian_Mau', 'cat-Maine_Coon',
-              'cat-Persian', 'cat-Ragdoll', 'cat-Russian_Blue', 'cat-Siamese',
-              'cat-Sphynx', 'dog-american_bulldog',
-              'dog-american_pit_bull_terrier', 'dog-basset_hound',
-              'dog-beagle', 'dog-boxer', 'dog-chihuahua',
-              'dog-english_cocker_spaniel', 'dog-english_setter',
-              'dog-german_shorthaired', 'dog-great_pyrenees', 'dog-havanese',
-              'dog-japanese_chin', 'dog-keeshond', 'dog-leonberger',
-              'dog-miniature_pinscher', 'dog-newfoundland', 'dog-pomeranian',
-              'dog-pug', 'dog-saint_bernard', 'dog-samoyed',
-              'dog-scottish_terrier', 'dog-shiba_inu',
-              'dog-staffordshire_bull_terrier', 'dog-wheaten_terrier',
-              'dog-yorkshire_terrier')
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'OxfordPets/by-breed/'  # noqa
+class_name = (
+    "cat-Abyssinian",
+    "cat-Bengal",
+    "cat-Birman",
+    "cat-Bombay",
+    "cat-British_Shorthair",
+    "cat-Egyptian_Mau",
+    "cat-Maine_Coon",
+    "cat-Persian",
+    "cat-Ragdoll",
+    "cat-Russian_Blue",
+    "cat-Siamese",
+    "cat-Sphynx",
+    "dog-american_bulldog",
+    "dog-american_pit_bull_terrier",
+    "dog-basset_hound",
+    "dog-beagle",
+    "dog-boxer",
+    "dog-chihuahua",
+    "dog-english_cocker_spaniel",
+    "dog-english_setter",
+    "dog-german_shorthaired",
+    "dog-great_pyrenees",
+    "dog-havanese",
+    "dog-japanese_chin",
+    "dog-keeshond",
+    "dog-leonberger",
+    "dog-miniature_pinscher",
+    "dog-newfoundland",
+    "dog-pomeranian",
+    "dog-pug",
+    "dog-saint_bernard",
+    "dog-samoyed",
+    "dog-scottish_terrier",
+    "dog-shiba_inu",
+    "dog-staffordshire_bull_terrier",
+    "dog-wheaten_terrier",
+    "dog-yorkshire_terrier",
+)
+metainfo = dict(classes=class_name)
+_data_root = data_root + "OxfordPets/by-breed/"  # noqa
 dataset_OxfordPets_by_breed = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_OxfordPets_by_breed = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------20 OxfordPets_by_species---------------------#
-class_name = ('cat', 'dog')
+class_name = ("cat", "dog")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'OxfordPets/by-species/'  # noqa
+_data_root = data_root + "OxfordPets/by-species/"  # noqa
 dataset_OxfordPets_by_species = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_OxfordPets_by_species = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------21 PKLot---------------------#
-class_name = ('space-empty', 'space-occupied')
+class_name = ("space-empty", "space-occupied")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'PKLot/640/'  # noqa
+_data_root = data_root + "PKLot/640/"  # noqa
 dataset_PKLot = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_PKLot = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_PKLot = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------22 Packages---------------------#
-class_name = ('package', )
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'Packages/Raw/'
-caption_prompt = {
-    'package': {
-        'prefix': 'there is a ',
-        'suffix': ' on the porch'
-    }
-}
+class_name = ("package",)
+metainfo = dict(classes=class_name)
+_data_root = data_root + "Packages/Raw/"
+caption_prompt = {"package": {"prefix": "there is a ", "suffix": " on the porch"}}
 dataset_Packages = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     caption_prompt=caption_prompt,  # NOTE w. prompt 0.728; wo. prompt 0.670
     test_mode=True,
-    return_classes=True)
-val_evaluator_Packages = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_Packages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------23 PascalVOC---------------------#
-class_name = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
-              'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
-              'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train',
-              'tvmonitor')
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'PascalVOC/'
+class_name = (
+    "aeroplane",
+    "bicycle",
+    "bird",
+    "boat",
+    "bottle",
+    "bus",
+    "car",
+    "cat",
+    "chair",
+    "cow",
+    "diningtable",
+    "dog",
+    "horse",
+    "motorbike",
+    "person",
+    "pottedplant",
+    "sheep",
+    "sofa",
+    "train",
+    "tvmonitor",
+)
+metainfo = dict(classes=class_name)
+_data_root = data_root + "PascalVOC/"
 dataset_PascalVOC = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_PascalVOC = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_PascalVOC = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------24 pistols---------------------#
-class_name = ('pistol', )
+class_name = ("pistol",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'pistols/export/'
+_data_root = data_root + "pistols/export/"
 dataset_pistols = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='val_annotations_without_background.json',
-    data_prefix=dict(img=''),
+    ann_file="val_annotations_without_background.json",
+    data_prefix=dict(img=""),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_pistols = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'val_annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_pistols = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox")
 
 # ---------------------25 plantdoc---------------------#
-class_name = ('Apple Scab Leaf', 'Apple leaf', 'Apple rust leaf',
-              'Bell_pepper leaf', 'Bell_pepper leaf spot', 'Blueberry leaf',
-              'Cherry leaf', 'Corn Gray leaf spot', 'Corn leaf blight',
-              'Corn rust leaf', 'Peach leaf', 'Potato leaf',
-              'Potato leaf early blight', 'Potato leaf late blight',
-              'Raspberry leaf', 'Soyabean leaf', 'Soybean leaf',
-              'Squash Powdery mildew leaf', 'Strawberry leaf',
-              'Tomato Early blight leaf', 'Tomato Septoria leaf spot',
-              'Tomato leaf', 'Tomato leaf bacterial spot',
-              'Tomato leaf late blight', 'Tomato leaf mosaic virus',
-              'Tomato leaf yellow virus', 'Tomato mold leaf',
-              'Tomato two spotted spider mites leaf', 'grape leaf',
-              'grape leaf black rot')
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'plantdoc/416x416/'
+class_name = (
+    "Apple Scab Leaf",
+    "Apple leaf",
+    "Apple rust leaf",
+    "Bell_pepper leaf",
+    "Bell_pepper leaf spot",
+    "Blueberry leaf",
+    "Cherry leaf",
+    "Corn Gray leaf spot",
+    "Corn leaf blight",
+    "Corn rust leaf",
+    "Peach leaf",
+    "Potato leaf",
+    "Potato leaf early blight",
+    "Potato leaf late blight",
+    "Raspberry leaf",
+    "Soyabean leaf",
+    "Soybean leaf",
+    "Squash Powdery mildew leaf",
+    "Strawberry leaf",
+    "Tomato Early blight leaf",
+    "Tomato Septoria leaf spot",
+    "Tomato leaf",
+    "Tomato leaf bacterial spot",
+    "Tomato leaf late blight",
+    "Tomato leaf mosaic virus",
+    "Tomato leaf yellow virus",
+    "Tomato mold leaf",
+    "Tomato two spotted spider mites leaf",
+    "grape leaf",
+    "grape leaf black rot",
+)
+metainfo = dict(classes=class_name)
+_data_root = data_root + "plantdoc/416x416/"
 dataset_plantdoc = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_plantdoc = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_plantdoc = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------26 pothole---------------------#
-class_name = ('pothole', )
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'pothole/'
-caption_prompt = {
-    'pothole': {
-        'name': 'holes',
-        'prefix': 'there are some ',
-        'suffix': ' on the road'
-    }
-}
+class_name = ("pothole",)
+metainfo = dict(classes=class_name)
+_data_root = data_root + "pothole/"
+caption_prompt = {"pothole": {"name": "holes", "prefix": "there are some ", "suffix": " on the road"}}
 dataset_pothole = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     # NOTE w. prompt 0.221; wo. prompt 0.478
     # caption_prompt=caption_prompt,
     pipeline=base_test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_pothole = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_pothole = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------27 Raccoon---------------------#
-class_name = ('raccoon', )
+class_name = ("raccoon",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'Raccoon/Raccoon.v2-raw.coco/'
+_data_root = data_root + "Raccoon/Raccoon.v2-raw.coco/"
 dataset_Raccoon = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_Raccoon = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_Raccoon = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------28 selfdrivingCar---------------------#
-class_name = ('biker', 'car', 'pedestrian', 'trafficLight',
-              'trafficLight-Green', 'trafficLight-GreenLeft',
-              'trafficLight-Red', 'trafficLight-RedLeft',
-              'trafficLight-Yellow', 'trafficLight-YellowLeft', 'truck')
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'selfdrivingCar/fixedLarge/export/'
+class_name = (
+    "biker",
+    "car",
+    "pedestrian",
+    "trafficLight",
+    "trafficLight-Green",
+    "trafficLight-GreenLeft",
+    "trafficLight-Red",
+    "trafficLight-RedLeft",
+    "trafficLight-Yellow",
+    "trafficLight-YellowLeft",
+    "truck",
+)
+metainfo = dict(classes=class_name)
+_data_root = data_root + "selfdrivingCar/fixedLarge/export/"
 dataset_selfdrivingCar = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='val_annotations_without_background.json',
-    data_prefix=dict(img=''),
+    ann_file="val_annotations_without_background.json",
+    data_prefix=dict(img=""),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_selfdrivingCar = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'val_annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_selfdrivingCar = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox")
 
 # ---------------------29 ShellfishOpenImages---------------------#
-class_name = ('Crab', 'Lobster', 'Shrimp')
+class_name = ("Crab", "Lobster", "Shrimp")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'ShellfishOpenImages/raw/'
+_data_root = data_root + "ShellfishOpenImages/raw/"
 dataset_ShellfishOpenImages = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_ShellfishOpenImages = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------30 ThermalCheetah---------------------#
-class_name = ('cheetah', 'human')
+class_name = ("cheetah", "human")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'ThermalCheetah/'
+_data_root = data_root + "ThermalCheetah/"
 dataset_ThermalCheetah = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_ThermalCheetah = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_ThermalCheetah = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------31 thermalDogsAndPeople---------------------#
-class_name = ('dog', 'person')
+class_name = ("dog", "person")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'thermalDogsAndPeople/'
+_data_root = data_root + "thermalDogsAndPeople/"
 dataset_thermalDogsAndPeople = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_thermalDogsAndPeople = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------32 UnoCards---------------------#
-class_name = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11',
-              '12', '13', '14')
+class_name = ("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'UnoCards/raw/'
+_data_root = data_root + "UnoCards/raw/"
 dataset_UnoCards = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_UnoCards = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_UnoCards = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------33 VehiclesOpenImages---------------------#
-class_name = ('Ambulance', 'Bus', 'Car', 'Motorcycle', 'Truck')
+class_name = ("Ambulance", "Bus", "Car", "Motorcycle", "Truck")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'VehiclesOpenImages/416x416/'
+_data_root = data_root + "VehiclesOpenImages/416x416/"
 dataset_VehiclesOpenImages = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_VehiclesOpenImages = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_VehiclesOpenImages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------34 WildfireSmoke---------------------#
-class_name = ('smoke', )
+class_name = ("smoke",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'WildfireSmoke/'
+_data_root = data_root + "WildfireSmoke/"
 dataset_WildfireSmoke = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_WildfireSmoke = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_WildfireSmoke = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------35 websiteScreenshots---------------------#
-class_name = ('button', 'field', 'heading', 'iframe', 'image', 'label', 'link',
-              'text')
+class_name = ("button", "field", "heading", "iframe", "image", "label", "link", "text")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'websiteScreenshots/'
+_data_root = data_root + "websiteScreenshots/"
 dataset_websiteScreenshots = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_websiteScreenshots = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_websiteScreenshots = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # --------------------- Config---------------------#
 dataset_prefixes = [
-    'AerialMaritimeDrone_large',
-    'AerialMaritimeDrone_tiled',
-    'AmericanSignLanguageLetters',
-    'Aquarium',
-    'BCCD',
-    'boggleBoards',
-    'brackishUnderwater',
-    'ChessPieces',
-    'CottontailRabbits',
-    'dice',
-    'DroneControl',
-    'EgoHands_generic',
-    'EgoHands_specific',
-    'HardHatWorkers',
-    'MaskWearing',
-    'MountainDewCommercial',
-    'NorthAmericaMushrooms',
-    'openPoetryVision',
-    'OxfordPets_by_breed',
-    'OxfordPets_by_species',
-    'PKLot',
-    'Packages',
-    'PascalVOC',
-    'pistols',
-    'plantdoc',
-    'pothole',
-    'Raccoons',
-    'selfdrivingCar',
-    'ShellfishOpenImages',
-    'ThermalCheetah',
-    'thermalDogsAndPeople',
-    'UnoCards',
-    'VehiclesOpenImages',
-    'WildfireSmoke',
-    'websiteScreenshots',
+    "AerialMaritimeDrone_large",
+    "AerialMaritimeDrone_tiled",
+    "AmericanSignLanguageLetters",
+    "Aquarium",
+    "BCCD",
+    "boggleBoards",
+    "brackishUnderwater",
+    "ChessPieces",
+    "CottontailRabbits",
+    "dice",
+    "DroneControl",
+    "EgoHands_generic",
+    "EgoHands_specific",
+    "HardHatWorkers",
+    "MaskWearing",
+    "MountainDewCommercial",
+    "NorthAmericaMushrooms",
+    "openPoetryVision",
+    "OxfordPets_by_breed",
+    "OxfordPets_by_species",
+    "PKLot",
+    "Packages",
+    "PascalVOC",
+    "pistols",
+    "plantdoc",
+    "pothole",
+    "Raccoons",
+    "selfdrivingCar",
+    "ShellfishOpenImages",
+    "ThermalCheetah",
+    "thermalDogsAndPeople",
+    "UnoCards",
+    "VehiclesOpenImages",
+    "WildfireSmoke",
+    "websiteScreenshots",
 ]
 datasets = [
-    dataset_AerialMaritimeDrone_large, dataset_AerialMaritimeDrone_tiled,
-    dataset_AmericanSignLanguageLetters, dataset_Aquarium, dataset_BCCD,
-    dataset_boggleBoards, dataset_brackishUnderwater, dataset_ChessPieces,
-    dataset_CottontailRabbits, dataset_dice, dataset_DroneControl,
-    dataset_EgoHands_generic, dataset_EgoHands_specific,
-    dataset_HardHatWorkers, dataset_MaskWearing, dataset_MountainDewCommercial,
-    dataset_NorthAmericaMushrooms, dataset_openPoetryVision,
-    dataset_OxfordPets_by_breed, dataset_OxfordPets_by_species, dataset_PKLot,
-    dataset_Packages, dataset_PascalVOC, dataset_pistols, dataset_plantdoc,
-    dataset_pothole, dataset_Raccoon, dataset_selfdrivingCar,
-    dataset_ShellfishOpenImages, dataset_ThermalCheetah,
-    dataset_thermalDogsAndPeople, dataset_UnoCards, dataset_VehiclesOpenImages,
-    dataset_WildfireSmoke, dataset_websiteScreenshots
+    dataset_AerialMaritimeDrone_large,
+    dataset_AerialMaritimeDrone_tiled,
+    dataset_AmericanSignLanguageLetters,
+    dataset_Aquarium,
+    dataset_BCCD,
+    dataset_boggleBoards,
+    dataset_brackishUnderwater,
+    dataset_ChessPieces,
+    dataset_CottontailRabbits,
+    dataset_dice,
+    dataset_DroneControl,
+    dataset_EgoHands_generic,
+    dataset_EgoHands_specific,
+    dataset_HardHatWorkers,
+    dataset_MaskWearing,
+    dataset_MountainDewCommercial,
+    dataset_NorthAmericaMushrooms,
+    dataset_openPoetryVision,
+    dataset_OxfordPets_by_breed,
+    dataset_OxfordPets_by_species,
+    dataset_PKLot,
+    dataset_Packages,
+    dataset_PascalVOC,
+    dataset_pistols,
+    dataset_plantdoc,
+    dataset_pothole,
+    dataset_Raccoon,
+    dataset_selfdrivingCar,
+    dataset_ShellfishOpenImages,
+    dataset_ThermalCheetah,
+    dataset_thermalDogsAndPeople,
+    dataset_UnoCards,
+    dataset_VehiclesOpenImages,
+    dataset_WildfireSmoke,
+    dataset_websiteScreenshots,
 ]
 metrics = [
     val_evaluator_AerialMaritimeDrone_large,
     val_evaluator_AerialMaritimeDrone_tiled,
-    val_evaluator_AmericanSignLanguageLetters, val_evaluator_Aquarium,
-    val_evaluator_BCCD, val_evaluator_boggleBoards,
-    val_evaluator_brackishUnderwater, val_evaluator_ChessPieces,
-    val_evaluator_CottontailRabbits, val_evaluator_dice,
-    val_evaluator_DroneControl, val_evaluator_EgoHands_generic,
-    val_evaluator_EgoHands_specific, val_evaluator_HardHatWorkers,
-    val_evaluator_MaskWearing, val_evaluator_MountainDewCommercial,
-    val_evaluator_NorthAmericaMushrooms, val_evaluator_openPoetryVision,
-    val_evaluator_OxfordPets_by_breed, val_evaluator_OxfordPets_by_species,
-    val_evaluator_PKLot, val_evaluator_Packages, val_evaluator_PascalVOC,
-    val_evaluator_pistols, val_evaluator_plantdoc, val_evaluator_pothole,
-    val_evaluator_Raccoon, val_evaluator_selfdrivingCar,
-    val_evaluator_ShellfishOpenImages, val_evaluator_ThermalCheetah,
-    val_evaluator_thermalDogsAndPeople, val_evaluator_UnoCards,
-    val_evaluator_VehiclesOpenImages, val_evaluator_WildfireSmoke,
-    val_evaluator_websiteScreenshots
+    val_evaluator_AmericanSignLanguageLetters,
+    val_evaluator_Aquarium,
+    val_evaluator_BCCD,
+    val_evaluator_boggleBoards,
+    val_evaluator_brackishUnderwater,
+    val_evaluator_ChessPieces,
+    val_evaluator_CottontailRabbits,
+    val_evaluator_dice,
+    val_evaluator_DroneControl,
+    val_evaluator_EgoHands_generic,
+    val_evaluator_EgoHands_specific,
+    val_evaluator_HardHatWorkers,
+    val_evaluator_MaskWearing,
+    val_evaluator_MountainDewCommercial,
+    val_evaluator_NorthAmericaMushrooms,
+    val_evaluator_openPoetryVision,
+    val_evaluator_OxfordPets_by_breed,
+    val_evaluator_OxfordPets_by_species,
+    val_evaluator_PKLot,
+    val_evaluator_Packages,
+    val_evaluator_PascalVOC,
+    val_evaluator_pistols,
+    val_evaluator_plantdoc,
+    val_evaluator_pothole,
+    val_evaluator_Raccoon,
+    val_evaluator_selfdrivingCar,
+    val_evaluator_ShellfishOpenImages,
+    val_evaluator_ThermalCheetah,
+    val_evaluator_thermalDogsAndPeople,
+    val_evaluator_UnoCards,
+    val_evaluator_VehiclesOpenImages,
+    val_evaluator_WildfireSmoke,
+    val_evaluator_websiteScreenshots,
 ]
 
 # -------------------------------------------------#
-val_dataloader = dict(
-    dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets))
+val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets))
 test_dataloader = val_dataloader
 
-val_evaluator = dict(
-    _delete_=True,
-    type='MultiDatasetsEvaluator',
-    metrics=metrics,
-    dataset_prefixes=dataset_prefixes)
+val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes)
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw13.py b/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw13.py
index 216b8059726b8fbe9dff3b2a43718bc563502aab..ce0f7f8c7bcc9e5acbdaefb06f538af0d0a5659c 100644
--- a/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw13.py
+++ b/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw13.py
@@ -1,36 +1,42 @@
-_base_ = '../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py'  # noqa
+_base_ = "../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py"  # noqa
 
-dataset_type = 'CocoDataset'
-data_root = 'data/odinw/'
+dataset_type = "CocoDataset"
+data_root = "data/odinw/"
 
 base_test_pipeline = _base_.test_pipeline
-base_test_pipeline[-1]['meta_keys'] = ('img_id', 'img_path', 'ori_shape',
-                                       'img_shape', 'scale_factor', 'text',
-                                       'custom_entities', 'caption_prompt')
+base_test_pipeline[-1]["meta_keys"] = (
+    "img_id",
+    "img_path",
+    "ori_shape",
+    "img_shape",
+    "scale_factor",
+    "text",
+    "custom_entities",
+    "caption_prompt",
+)
 
 # ---------------------1 AerialMaritimeDrone---------------------#
-class_name = ('boat', 'car', 'dock', 'jetski', 'lift')
+class_name = ("boat", "car", "dock", "jetski", "lift")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'AerialMaritimeDrone/large/'
+_data_root = data_root + "AerialMaritimeDrone/large/"
 dataset_AerialMaritimeDrone = dict(
     type=dataset_type,
     metainfo=metainfo,
    data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     test_mode=True,
     pipeline=base_test_pipeline,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_AerialMaritimeDrone = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------2 Aquarium---------------------#
-class_name = ('fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish',
-              'stingray')
+class_name = ("fish", "jellyfish", "penguin", "puffin", "shark", "starfish", "stingray")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'Aquarium/Aquarium Combined.v2-raw-1024.coco/'
+_data_root = data_root + "Aquarium/Aquarium Combined.v2-raw-1024.coco/"
 caption_prompt = None
 # caption_prompt = {
@@ -48,21 +54,19 @@ dataset_Aquarium = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     caption_prompt=caption_prompt,
     test_mode=True,
-    return_classes=True)
-val_evaluator_Aquarium = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_Aquarium = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------3 CottontailRabbits---------------------#
-class_name = ('Cottontail-Rabbit', )
+class_name = ("Cottontail-Rabbit",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'CottontailRabbits/'
+_data_root = data_root + "CottontailRabbits/"
 caption_prompt = None
 # caption_prompt = {'Cottontail-Rabbit': {'name': 'rabbit'}}
@@ -71,21 +75,19 @@ dataset_CottontailRabbits = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     caption_prompt=caption_prompt,
     test_mode=True,
-    return_classes=True)
-val_evaluator_CottontailRabbits = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_CottontailRabbits = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------4 EgoHands---------------------#
-class_name = ('hand', )
+class_name = ("hand",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'EgoHands/generic/'
+_data_root = data_root + "EgoHands/generic/"
 caption_prompt = None
 # caption_prompt = {'hand': {'suffix': ' of a person'}}
@@ -94,21 +96,19 @@ dataset_EgoHands = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     caption_prompt=caption_prompt,
     test_mode=True,
-    return_classes=True)
-val_evaluator_EgoHands = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_EgoHands = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------5 NorthAmericaMushrooms---------------------#
-class_name = ('CoW', 'chanterelle')
+class_name = ("CoW", "chanterelle")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/'  # noqa
+_data_root = data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/"  # noqa
 caption_prompt = None
 # caption_prompt = {
@@ -124,21 +124,21 @@ dataset_NorthAmericaMushrooms = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     caption_prompt=caption_prompt,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_NorthAmericaMushrooms = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------6 Packages---------------------#
-class_name = ('package', )
+class_name = ("package",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'Packages/Raw/'
+_data_root = data_root + "Packages/Raw/"
 caption_prompt = None
 # caption_prompt = {
@@ -152,60 +152,72 @@ dataset_Packages = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     caption_prompt=caption_prompt,
     test_mode=True,
-    return_classes=True)
-val_evaluator_Packages = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_Packages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------7 PascalVOC---------------------#
-class_name = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
-              'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
-              'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train',
-              'tvmonitor')
+class_name = (
+    "aeroplane",
+    "bicycle",
+    "bird",
+    "boat",
+    "bottle",
+    "bus",
+    "car",
+    "cat",
+    "chair",
+    "cow",
+    "diningtable",
+    "dog",
+    "horse",
+    "motorbike",
+    "person",
+    "pottedplant",
+    "sheep",
+    "sofa",
+    "train",
+    "tvmonitor",
+)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'PascalVOC/'
+_data_root = data_root + "PascalVOC/"
 dataset_PascalVOC = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_PascalVOC = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_PascalVOC = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------8 pistols---------------------#
-class_name = ('pistol', )
+class_name = ("pistol",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'pistols/export/'
+_data_root = data_root + "pistols/export/"
 dataset_pistols = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='val_annotations_without_background.json',
-    data_prefix=dict(img=''),
+    ann_file="val_annotations_without_background.json",
+    data_prefix=dict(img=""),
     pipeline=base_test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_pistols = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'val_annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_pistols = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox")
 
 # ---------------------9 pothole---------------------#
-class_name = ('pothole', )
+class_name = ("pothole",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'pothole/'
+_data_root = data_root + "pothole/"
 caption_prompt = None
 # caption_prompt = {
@@ -220,119 +232,132 @@ dataset_pothole = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_pothole = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_pothole = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------10 Raccoon---------------------#
-class_name = ('raccoon', )
+class_name = ("raccoon",)
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'Raccoon/Raccoon.v2-raw.coco/'
+_data_root = data_root + "Raccoon/Raccoon.v2-raw.coco/"
 dataset_Raccoon = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_Raccoon = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_Raccoon = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # ---------------------11 ShellfishOpenImages---------------------#
-class_name = ('Crab', 'Lobster', 'Shrimp')
+class_name = ("Crab", "Lobster", "Shrimp")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'ShellfishOpenImages/raw/'
+_data_root = data_root + "ShellfishOpenImages/raw/"
 dataset_ShellfishOpenImages = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_ShellfishOpenImages = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------12 thermalDogsAndPeople---------------------#
-class_name = ('dog', 'person')
+class_name = ("dog", "person")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'thermalDogsAndPeople/'
+_data_root = data_root + "thermalDogsAndPeople/"
 dataset_thermalDogsAndPeople = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_thermalDogsAndPeople = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------13 VehiclesOpenImages---------------------#
-class_name = ('Ambulance', 'Bus', 'Car', 'Motorcycle', 'Truck')
+class_name = ("Ambulance", "Bus", "Car", "Motorcycle", "Truck")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'VehiclesOpenImages/416x416/'
+_data_root = data_root + "VehiclesOpenImages/416x416/"
 dataset_VehiclesOpenImages = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=base_test_pipeline,
     test_mode=True,
-    return_classes=True)
-val_evaluator_VehiclesOpenImages = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    return_classes=True,
+)
+val_evaluator_VehiclesOpenImages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox")
 
 # --------------------- Config---------------------#
 dataset_prefixes = [
-    'AerialMaritimeDrone', 'Aquarium', 'CottontailRabbits', 'EgoHands',
-    'NorthAmericaMushrooms', 'Packages', 'PascalVOC', 'pistols', 'pothole',
-    'Raccoon', 'ShellfishOpenImages', 'thermalDogsAndPeople',
-    'VehiclesOpenImages'
+    "AerialMaritimeDrone",
+    "Aquarium",
+    "CottontailRabbits",
+    "EgoHands",
+    "NorthAmericaMushrooms",
+    "Packages",
+    "PascalVOC",
+    "pistols",
+    "pothole",
+    "Raccoon",
+    "ShellfishOpenImages",
+    "thermalDogsAndPeople",
+    "VehiclesOpenImages",
 ]
 datasets = [
-    dataset_AerialMaritimeDrone, dataset_Aquarium, dataset_CottontailRabbits,
-    dataset_EgoHands, dataset_NorthAmericaMushrooms, dataset_Packages,
-    dataset_PascalVOC, dataset_pistols, dataset_pothole, dataset_Raccoon,
-    dataset_ShellfishOpenImages, dataset_thermalDogsAndPeople,
-    dataset_VehiclesOpenImages
+    dataset_AerialMaritimeDrone,
+    dataset_Aquarium,
+    dataset_CottontailRabbits,
+    dataset_EgoHands,
+    dataset_NorthAmericaMushrooms,
+    dataset_Packages,
+    dataset_PascalVOC,
+    dataset_pistols,
+    dataset_pothole,
+    dataset_Raccoon,
+    dataset_ShellfishOpenImages,
+    dataset_thermalDogsAndPeople,
+    dataset_VehiclesOpenImages,
 ]
 metrics = [
-    val_evaluator_AerialMaritimeDrone, val_evaluator_Aquarium,
-    val_evaluator_CottontailRabbits, val_evaluator_EgoHands,
-    val_evaluator_NorthAmericaMushrooms, val_evaluator_Packages,
-    val_evaluator_PascalVOC, val_evaluator_pistols, val_evaluator_pothole,
-    val_evaluator_Raccoon, val_evaluator_ShellfishOpenImages,
-    val_evaluator_thermalDogsAndPeople, val_evaluator_VehiclesOpenImages
+    val_evaluator_AerialMaritimeDrone,
+    val_evaluator_Aquarium,
+    val_evaluator_CottontailRabbits,
+    val_evaluator_EgoHands,
+    val_evaluator_NorthAmericaMushrooms,
+    val_evaluator_Packages,
+    val_evaluator_PascalVOC,
+    val_evaluator_pistols,
+    val_evaluator_pothole,
+    val_evaluator_Raccoon,
+    val_evaluator_ShellfishOpenImages,
+    val_evaluator_thermalDogsAndPeople,
+    val_evaluator_VehiclesOpenImages,
 ]
 
 # -------------------------------------------------#
-val_dataloader = dict(
-    dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets))
+val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets))
 test_dataloader = val_dataloader
 
-val_evaluator = dict(
-    _delete_=True,
-    type='MultiDatasetsEvaluator',
-    metrics=metrics,
-    dataset_prefixes=dataset_prefixes)
+val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes)
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw35.py b/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw35.py
index 3df0394a204061684cbb9bb66adb08d92a784efb..bd50bb58d1ee7cd400a4de4cfd01fe34a9966ada 100644
--- a/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw35.py
+++ b/mmpose/configs/mmdet/grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw35.py
@@ -1,796 +1,951 @@
-_base_ = '../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py'  # noqa
+_base_ = "../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py"  # noqa
 
-dataset_type = 'CocoDataset'
-data_root = 'data/odinw/'
+dataset_type = "CocoDataset"
+data_root = "data/odinw/"
 
 base_test_pipeline = _base_.test_pipeline
-base_test_pipeline[-1]['meta_keys'] = ('img_id', 'img_path', 'ori_shape',
-                                       'img_shape', 'scale_factor', 'text',
-                                       'custom_entities', 'caption_prompt')
+base_test_pipeline[-1]["meta_keys"] = (
+    "img_id",
+    "img_path",
+    "ori_shape",
+    "img_shape",
+    "scale_factor",
+    "text",
+    "custom_entities",
+    "caption_prompt",
+)
 
 # ---------------------1 AerialMaritimeDrone_large---------------------#
-class_name = ('boat', 'car', 'dock', 'jetski', 'lift')
+class_name = ("boat", "car", "dock", "jetski", "lift")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'AerialMaritimeDrone/large/'
+_data_root = data_root + "AerialMaritimeDrone/large/"
 dataset_AerialMaritimeDrone_large = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_AerialMaritimeDrone_large = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------2 AerialMaritimeDrone_tiled---------------------#
-class_name = ('boat', 'car', 'dock', 'jetski', 'lift')
+class_name = ("boat", "car", "dock", "jetski", "lift")
 metainfo = dict(classes=class_name)
-_data_root = data_root + 'AerialMaritimeDrone/tiled/'
+_data_root = data_root + "AerialMaritimeDrone/tiled/"
 dataset_AerialMaritimeDrone_tiled = dict(
     type=dataset_type,
     metainfo=metainfo,
     data_root=_data_root,
-    ann_file='valid/annotations_without_background.json',
-    data_prefix=dict(img='valid/'),
+    ann_file="valid/annotations_without_background.json",
+    data_prefix=dict(img="valid/"),
     pipeline=_base_.test_pipeline,
     test_mode=True,
-    return_classes=True)
+    return_classes=True,
+)
 val_evaluator_AerialMaritimeDrone_tiled = dict(
-    type='CocoMetric',
-    ann_file=_data_root + 'valid/annotations_without_background.json',
-    metric='bbox')
+    type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox"
+)
 
 # ---------------------3 AmericanSignLanguageLetters---------------------#
-class_name = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
-              'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z')
-metainfo = dict(classes=class_name)
-_data_root = data_root + 'AmericanSignLanguageLetters/American
Sign Language Letters.v1-v1.coco/' # noqa +class_name = ( + "A", + "B", + "C", + "D", + "E", + "F", + "G", + "H", + "I", + "J", + "K", + "L", + "M", + "N", + "O", + "P", + "Q", + "R", + "S", + "T", + "U", + "V", + "W", + "X", + "Y", + "Z", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/" # noqa dataset_AmericanSignLanguageLetters = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_AmericanSignLanguageLetters = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------4 Aquarium---------------------# -class_name = ('fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', - 'stingray') +class_name = ("fish", "jellyfish", "penguin", "puffin", "shark", "starfish", "stingray") metainfo = dict(classes=class_name) -_data_root = data_root + 'Aquarium/Aquarium Combined.v2-raw-1024.coco/' +_data_root = data_root + "Aquarium/Aquarium Combined.v2-raw-1024.coco/" dataset_Aquarium = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_Aquarium = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Aquarium = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------5 BCCD---------------------# -class_name = ('Platelets', 'RBC', 'WBC') +class_name = ("Platelets", "RBC", "WBC") metainfo = dict(classes=class_name) -_data_root = data_root + 'BCCD/BCCD.v3-raw.coco/' +_data_root = data_root + "BCCD/BCCD.v3-raw.coco/" dataset_BCCD = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_BCCD = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_BCCD = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------6 boggleBoards---------------------# -class_name = ('Q', 'a', 'an', 'b', 'c', 'd', 'e', 'er', 'f', 'g', 'h', 'he', - 'i', 'in', 'j', 'k', 'l', 'm', 'n', 'o', 'o ', 'p', 'q', 'qu', - 'r', 's', 't', 't\\', 'th', 'u', 'v', 'w', 'wild', 'x', 'y', 'z') -metainfo = dict(classes=class_name) -_data_root = data_root + 'boggleBoards/416x416AutoOrient/export/' +class_name = ( + "Q", + "a", + "an", + "b", + "c", + "d", + "e", + "er", + "f", + "g", + "h", + "he", + "i", + "in", + "j", + "k", + "l", + "m", + "n", + "o", + "o ", + "p", + "q", + "qu", + "r", + "s", + 
"t", + "t\\", + "th", + "u", + "v", + "w", + "wild", + "x", + "y", + "z", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "boggleBoards/416x416AutoOrient/export/" dataset_boggleBoards = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_boggleBoards = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_boggleBoards = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------7 brackishUnderwater---------------------# -class_name = ('crab', 'fish', 'jellyfish', 'shrimp', 'small_fish', 'starfish') +class_name = ("crab", "fish", "jellyfish", "shrimp", "small_fish", "starfish") metainfo = dict(classes=class_name) -_data_root = data_root + 'brackishUnderwater/960x540/' +_data_root = data_root + "brackishUnderwater/960x540/" dataset_brackishUnderwater = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_brackishUnderwater = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_brackishUnderwater = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------8 ChessPieces---------------------# -class_name = (' ', 'black bishop', 'black king', 'black knight', 'black pawn', - 'black queen', 'black rook', 'white bishop', 'white king', - 'white knight', 'white pawn', 'white queen', 'white rook') -metainfo = dict(classes=class_name) -_data_root = data_root + 'ChessPieces/Chess Pieces.v23-raw.coco/' +class_name = ( + " ", + "black bishop", + "black king", + "black knight", + "black pawn", + "black queen", + "black rook", + "white bishop", + "white king", + "white knight", + "white pawn", + "white queen", + "white rook", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "ChessPieces/Chess Pieces.v23-raw.coco/" dataset_ChessPieces = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/new_annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/new_annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_ChessPieces = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/new_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_ChessPieces = dict(type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox") # ---------------------9 CottontailRabbits---------------------# -class_name = ('rabbit', ) +class_name = ("rabbit",) metainfo = dict(classes=class_name) -_data_root = data_root + 'CottontailRabbits/' +_data_root = data_root + "CottontailRabbits/" dataset_CottontailRabbits = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - 
ann_file='valid/new_annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/new_annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_CottontailRabbits = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/new_annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox" +) # ---------------------10 dice---------------------# -class_name = ('1', '2', '3', '4', '5', '6') +class_name = ("1", "2", "3", "4", "5", "6") metainfo = dict(classes=class_name) -_data_root = data_root + 'dice/mediumColor/export/' +_data_root = data_root + "dice/mediumColor/export/" dataset_dice = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_dice = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_dice = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------11 DroneControl---------------------# -class_name = ('follow', 'follow_hand', 'land', 'land_hand', 'null', 'object', - 'takeoff', 'takeoff-hand') +class_name = ("follow", "follow_hand", "land", "land_hand", "null", "object", "takeoff", "takeoff-hand") metainfo = dict(classes=class_name) -_data_root = data_root + 'DroneControl/Drone Control.v3-raw.coco/' +_data_root = data_root + "DroneControl/Drone Control.v3-raw.coco/" dataset_DroneControl = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_DroneControl = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_DroneControl = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------12 EgoHands_generic---------------------# -class_name = ('hand', ) +class_name = ("hand",) metainfo = dict(classes=class_name) -_data_root = data_root + 'EgoHands/generic/' -caption_prompt = {'hand': {'suffix': ' of a person'}} +_data_root = data_root + "EgoHands/generic/" +caption_prompt = {"hand": {"suffix": " of a person"}} dataset_EgoHands_generic = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, # NOTE w. prompt 0.526, wo. 
prompt 0.608 # caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_EgoHands_generic = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_EgoHands_generic = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------13 EgoHands_specific---------------------# -class_name = ('myleft', 'myright', 'yourleft', 'yourright') +class_name = ("myleft", "myright", "yourleft", "yourright") metainfo = dict(classes=class_name) -_data_root = data_root + 'EgoHands/specific/' +_data_root = data_root + "EgoHands/specific/" dataset_EgoHands_specific = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_EgoHands_specific = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_EgoHands_specific = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------14 HardHatWorkers---------------------# -class_name = ('head', 'helmet', 'person') +class_name = ("head", "helmet", "person") metainfo = dict(classes=class_name) -_data_root = data_root + 'HardHatWorkers/raw/' +_data_root = data_root + "HardHatWorkers/raw/" dataset_HardHatWorkers = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_HardHatWorkers = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_HardHatWorkers = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------15 MaskWearing---------------------# -class_name = ('mask', 'no-mask') +class_name = ("mask", "no-mask") metainfo = dict(classes=class_name) -_data_root = data_root + 'MaskWearing/raw/' +_data_root = data_root + "MaskWearing/raw/" dataset_MaskWearing = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_MaskWearing = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_MaskWearing = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------16 MountainDewCommercial---------------------# -class_name = ('bottle', ) +class_name = ("bottle",) metainfo = dict(classes=class_name) -_data_root = data_root + 'MountainDewCommercial/' +_data_root = data_root + "MountainDewCommercial/" dataset_MountainDewCommercial = dict( type=dataset_type, metainfo=metainfo, 
data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_MountainDewCommercial = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------17 NorthAmericaMushrooms---------------------# -class_name = ('flat mushroom', 'yellow mushroom') +class_name = ("flat mushroom", "yellow mushroom") metainfo = dict(classes=class_name) -_data_root = data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/' # noqa +_data_root = data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/" # noqa dataset_NorthAmericaMushrooms = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/new_annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/new_annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_NorthAmericaMushrooms = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/new_annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox" +) # ---------------------18 openPoetryVision---------------------# -class_name = ('American Typewriter', 'Andale Mono', 'Apple Chancery', 'Arial', - 'Avenir', 'Baskerville', 'Big Caslon', 'Bradley Hand', - 'Brush Script MT', 'Chalkboard', 'Comic Sans MS', 'Copperplate', - 'Courier', 'Didot', 'Futura', 'Geneva', 'Georgia', 'Gill Sans', - 'Helvetica', 'Herculanum', 'Impact', 'Kefa', 'Lucida Grande', - 'Luminari', 'Marker Felt', 'Menlo', 'Monaco', 'Noteworthy', - 'Optima', 'PT Sans', 'PT Serif', 'Palatino', 'Papyrus', - 'Phosphate', 'Rockwell', 'SF Pro', 'SignPainter', 'Skia', - 'Snell Roundhand', 'Tahoma', 'Times New Roman', 'Trebuchet MS', - 'Verdana') -metainfo = dict(classes=class_name) -_data_root = data_root + 'openPoetryVision/512x512/' +class_name = ( + "American Typewriter", + "Andale Mono", + "Apple Chancery", + "Arial", + "Avenir", + "Baskerville", + "Big Caslon", + "Bradley Hand", + "Brush Script MT", + "Chalkboard", + "Comic Sans MS", + "Copperplate", + "Courier", + "Didot", + "Futura", + "Geneva", + "Georgia", + "Gill Sans", + "Helvetica", + "Herculanum", + "Impact", + "Kefa", + "Lucida Grande", + "Luminari", + "Marker Felt", + "Menlo", + "Monaco", + "Noteworthy", + "Optima", + "PT Sans", + "PT Serif", + "Palatino", + "Papyrus", + "Phosphate", + "Rockwell", + "SF Pro", + "SignPainter", + "Skia", + "Snell Roundhand", + "Tahoma", + "Times New Roman", + "Trebuchet MS", + "Verdana", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "openPoetryVision/512x512/" dataset_openPoetryVision = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_openPoetryVision = dict( - type='CocoMetric', - ann_file=_data_root + 
'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_openPoetryVision = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------19 OxfordPets_by_breed---------------------# -class_name = ('cat-Abyssinian', 'cat-Bengal', 'cat-Birman', 'cat-Bombay', - 'cat-British_Shorthair', 'cat-Egyptian_Mau', 'cat-Maine_Coon', - 'cat-Persian', 'cat-Ragdoll', 'cat-Russian_Blue', 'cat-Siamese', - 'cat-Sphynx', 'dog-american_bulldog', - 'dog-american_pit_bull_terrier', 'dog-basset_hound', - 'dog-beagle', 'dog-boxer', 'dog-chihuahua', - 'dog-english_cocker_spaniel', 'dog-english_setter', - 'dog-german_shorthaired', 'dog-great_pyrenees', 'dog-havanese', - 'dog-japanese_chin', 'dog-keeshond', 'dog-leonberger', - 'dog-miniature_pinscher', 'dog-newfoundland', 'dog-pomeranian', - 'dog-pug', 'dog-saint_bernard', 'dog-samoyed', - 'dog-scottish_terrier', 'dog-shiba_inu', - 'dog-staffordshire_bull_terrier', 'dog-wheaten_terrier', - 'dog-yorkshire_terrier') -metainfo = dict(classes=class_name) -_data_root = data_root + 'OxfordPets/by-breed/' # noqa +class_name = ( + "cat-Abyssinian", + "cat-Bengal", + "cat-Birman", + "cat-Bombay", + "cat-British_Shorthair", + "cat-Egyptian_Mau", + "cat-Maine_Coon", + "cat-Persian", + "cat-Ragdoll", + "cat-Russian_Blue", + "cat-Siamese", + "cat-Sphynx", + "dog-american_bulldog", + "dog-american_pit_bull_terrier", + "dog-basset_hound", + "dog-beagle", + "dog-boxer", + "dog-chihuahua", + "dog-english_cocker_spaniel", + "dog-english_setter", + "dog-german_shorthaired", + "dog-great_pyrenees", + "dog-havanese", + "dog-japanese_chin", + "dog-keeshond", + "dog-leonberger", + "dog-miniature_pinscher", + "dog-newfoundland", + "dog-pomeranian", + "dog-pug", + "dog-saint_bernard", + "dog-samoyed", + "dog-scottish_terrier", + "dog-shiba_inu", + "dog-staffordshire_bull_terrier", + "dog-wheaten_terrier", + "dog-yorkshire_terrier", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "OxfordPets/by-breed/" # noqa dataset_OxfordPets_by_breed = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_OxfordPets_by_breed = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------20 OxfordPets_by_species---------------------# -class_name = ('cat', 'dog') +class_name = ("cat", "dog") metainfo = dict(classes=class_name) -_data_root = data_root + 'OxfordPets/by-species/' # noqa +_data_root = data_root + "OxfordPets/by-species/" # noqa dataset_OxfordPets_by_species = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_OxfordPets_by_species = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + 
"valid/annotations_without_background.json", metric="bbox" +) # ---------------------21 PKLot---------------------# -class_name = ('space-empty', 'space-occupied') +class_name = ("space-empty", "space-occupied") metainfo = dict(classes=class_name) -_data_root = data_root + 'PKLot/640/' # noqa +_data_root = data_root + "PKLot/640/" # noqa dataset_PKLot = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_PKLot = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_PKLot = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------22 Packages---------------------# -class_name = ('package', ) -metainfo = dict(classes=class_name) -_data_root = data_root + 'Packages/Raw/' -caption_prompt = { - 'package': { - 'prefix': 'there is a ', - 'suffix': ' on the porch' - } -} +class_name = ("package",) +metainfo = dict(classes=class_name) +_data_root = data_root + "Packages/Raw/" +caption_prompt = {"package": {"prefix": "there is a ", "suffix": " on the porch"}} dataset_Packages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, # NOTE w. prompt 0.695; wo. prompt 0.687 test_mode=True, - return_classes=True) -val_evaluator_Packages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Packages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------23 PascalVOC---------------------# -class_name = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', - 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', - 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', - 'tvmonitor') -metainfo = dict(classes=class_name) -_data_root = data_root + 'PascalVOC/' +class_name = ( + "aeroplane", + "bicycle", + "bird", + "boat", + "bottle", + "bus", + "car", + "cat", + "chair", + "cow", + "diningtable", + "dog", + "horse", + "motorbike", + "person", + "pottedplant", + "sheep", + "sofa", + "train", + "tvmonitor", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "PascalVOC/" dataset_PascalVOC = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_PascalVOC = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_PascalVOC = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------24 pistols---------------------# -class_name = ('pistol', ) +class_name = ("pistol",) metainfo = 
dict(classes=class_name) -_data_root = data_root + 'pistols/export/' +_data_root = data_root + "pistols/export/" dataset_pistols = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pistols = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pistols = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------25 plantdoc---------------------# -class_name = ('Apple Scab Leaf', 'Apple leaf', 'Apple rust leaf', - 'Bell_pepper leaf', 'Bell_pepper leaf spot', 'Blueberry leaf', - 'Cherry leaf', 'Corn Gray leaf spot', 'Corn leaf blight', - 'Corn rust leaf', 'Peach leaf', 'Potato leaf', - 'Potato leaf early blight', 'Potato leaf late blight', - 'Raspberry leaf', 'Soyabean leaf', 'Soybean leaf', - 'Squash Powdery mildew leaf', 'Strawberry leaf', - 'Tomato Early blight leaf', 'Tomato Septoria leaf spot', - 'Tomato leaf', 'Tomato leaf bacterial spot', - 'Tomato leaf late blight', 'Tomato leaf mosaic virus', - 'Tomato leaf yellow virus', 'Tomato mold leaf', - 'Tomato two spotted spider mites leaf', 'grape leaf', - 'grape leaf black rot') -metainfo = dict(classes=class_name) -_data_root = data_root + 'plantdoc/416x416/' +class_name = ( + "Apple Scab Leaf", + "Apple leaf", + "Apple rust leaf", + "Bell_pepper leaf", + "Bell_pepper leaf spot", + "Blueberry leaf", + "Cherry leaf", + "Corn Gray leaf spot", + "Corn leaf blight", + "Corn rust leaf", + "Peach leaf", + "Potato leaf", + "Potato leaf early blight", + "Potato leaf late blight", + "Raspberry leaf", + "Soyabean leaf", + "Soybean leaf", + "Squash Powdery mildew leaf", + "Strawberry leaf", + "Tomato Early blight leaf", + "Tomato Septoria leaf spot", + "Tomato leaf", + "Tomato leaf bacterial spot", + "Tomato leaf late blight", + "Tomato leaf mosaic virus", + "Tomato leaf yellow virus", + "Tomato mold leaf", + "Tomato two spotted spider mites leaf", + "grape leaf", + "grape leaf black rot", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "plantdoc/416x416/" dataset_plantdoc = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_plantdoc = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_plantdoc = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------26 pothole---------------------# -class_name = ('pothole', ) -metainfo = dict(classes=class_name) -_data_root = data_root + 'pothole/' -caption_prompt = { - 'pothole': { - 'name': 'holes', - 'prefix': 'there are some ', - 'suffix': ' on the road' - } -} +class_name = ("pothole",) +metainfo = dict(classes=class_name) +_data_root = data_root + "pothole/" +caption_prompt = {"pothole": {"name": "holes", "prefix": "there are some ", "suffix": " on the road"}} dataset_pothole = dict( type=dataset_type, metainfo=metainfo, 
data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), # NOTE w. prompt 0.137; wo. prompt 0.215 # caption_prompt=caption_prompt, pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pothole = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pothole = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------27 Raccoon---------------------# -class_name = ('raccoon', ) +class_name = ("raccoon",) metainfo = dict(classes=class_name) -_data_root = data_root + 'Raccoon/Raccoon.v2-raw.coco/' +_data_root = data_root + "Raccoon/Raccoon.v2-raw.coco/" dataset_Raccoon = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_Raccoon = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Raccoon = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------28 selfdrivingCar---------------------# -class_name = ('biker', 'car', 'pedestrian', 'trafficLight', - 'trafficLight-Green', 'trafficLight-GreenLeft', - 'trafficLight-Red', 'trafficLight-RedLeft', - 'trafficLight-Yellow', 'trafficLight-YellowLeft', 'truck') -metainfo = dict(classes=class_name) -_data_root = data_root + 'selfdrivingCar/fixedLarge/export/' +class_name = ( + "biker", + "car", + "pedestrian", + "trafficLight", + "trafficLight-Green", + "trafficLight-GreenLeft", + "trafficLight-Red", + "trafficLight-RedLeft", + "trafficLight-Yellow", + "trafficLight-YellowLeft", + "truck", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "selfdrivingCar/fixedLarge/export/" dataset_selfdrivingCar = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_selfdrivingCar = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_selfdrivingCar = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------29 ShellfishOpenImages---------------------# -class_name = ('Crab', 'Lobster', 'Shrimp') +class_name = ("Crab", "Lobster", "Shrimp") metainfo = dict(classes=class_name) -_data_root = data_root + 'ShellfishOpenImages/raw/' +_data_root = data_root + "ShellfishOpenImages/raw/" dataset_ShellfishOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) 
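# --- Editor's aside (illustrative only; not part of the upstream change) ---
# Every ODinW block in this config repeats the same boilerplate: one CocoDataset
# dict plus one CocoMetric evaluator pointing at the same annotation file. Below
# is a minimal sketch of a helper capturing that pattern; `make_odinw_entry` is a
# hypothetical name introduced here, while `data_root`, `dataset_type`, and
# `base_test_pipeline` are names defined earlier in this config file.
def make_odinw_entry(classes, sub_root, ann="valid/annotations_without_background.json", img_prefix="valid/"):
    """Return a (dataset, evaluator) config pair for one ODinW dataset."""
    _root = data_root + sub_root
    dataset = dict(
        type=dataset_type,
        metainfo=dict(classes=classes),
        data_root=_root,
        ann_file=ann,
        data_prefix=dict(img=img_prefix),
        pipeline=base_test_pipeline,
        test_mode=True,
        return_classes=True,  # pass the class names to the grounding model as its text prompt
    )
    evaluator = dict(type="CocoMetric", ann_file=_root + ann, metric="bbox")
    return dataset, evaluator
# Usage, equivalent to the explicit dicts in this file:
#   dataset_pothole, val_evaluator_pothole = make_odinw_entry(("pothole",), "pothole/")
# The upstream config keeps the explicit per-dataset dicts instead, so that each
# entry can carry its own tweaks (caption prompts, alternate annotation files,
# per-dataset pipelines) and remains transparent to MMEngine's config dumping
# and _base_ inheritance.
# --- end aside ---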
val_evaluator_ShellfishOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------30 ThermalCheetah---------------------# -class_name = ('cheetah', 'human') +class_name = ("cheetah", "human") metainfo = dict(classes=class_name) -_data_root = data_root + 'ThermalCheetah/' +_data_root = data_root + "ThermalCheetah/" dataset_ThermalCheetah = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_ThermalCheetah = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_ThermalCheetah = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------31 thermalDogsAndPeople---------------------# -class_name = ('dog', 'person') +class_name = ("dog", "person") metainfo = dict(classes=class_name) -_data_root = data_root + 'thermalDogsAndPeople/' +_data_root = data_root + "thermalDogsAndPeople/" dataset_thermalDogsAndPeople = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_thermalDogsAndPeople = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------32 UnoCards---------------------# -class_name = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', - '12', '13', '14') +class_name = ("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14") metainfo = dict(classes=class_name) -_data_root = data_root + 'UnoCards/raw/' +_data_root = data_root + "UnoCards/raw/" dataset_UnoCards = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_UnoCards = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_UnoCards = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------33 VehiclesOpenImages---------------------# -class_name = ('Ambulance', 'Bus', 'Car', 'Motorcycle', 'Truck') +class_name = ("Ambulance", "Bus", "Car", "Motorcycle", "Truck") metainfo = dict(classes=class_name) -_data_root = data_root + 'VehiclesOpenImages/416x416/' +_data_root = data_root + "VehiclesOpenImages/416x416/" dataset_VehiclesOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', 
- data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_VehiclesOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_VehiclesOpenImages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------34 WildfireSmoke---------------------# -class_name = ('smoke', ) +class_name = ("smoke",) metainfo = dict(classes=class_name) -_data_root = data_root + 'WildfireSmoke/' +_data_root = data_root + "WildfireSmoke/" dataset_WildfireSmoke = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_WildfireSmoke = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_WildfireSmoke = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------35 websiteScreenshots---------------------# -class_name = ('button', 'field', 'heading', 'iframe', 'image', 'label', 'link', - 'text') +class_name = ("button", "field", "heading", "iframe", "image", "label", "link", "text") metainfo = dict(classes=class_name) -_data_root = data_root + 'websiteScreenshots/' +_data_root = data_root + "websiteScreenshots/" dataset_websiteScreenshots = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_websiteScreenshots = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_websiteScreenshots = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # --------------------- Config---------------------# dataset_prefixes = [ - 'AerialMaritimeDrone_large', - 'AerialMaritimeDrone_tiled', - 'AmericanSignLanguageLetters', - 'Aquarium', - 'BCCD', - 'boggleBoards', - 'brackishUnderwater', - 'ChessPieces', - 'CottontailRabbits', - 'dice', - 'DroneControl', - 'EgoHands_generic', - 'EgoHands_specific', - 'HardHatWorkers', - 'MaskWearing', - 'MountainDewCommercial', - 'NorthAmericaMushrooms', - 'openPoetryVision', - 'OxfordPets_by_breed', - 'OxfordPets_by_species', - 'PKLot', - 'Packages', - 'PascalVOC', - 'pistols', - 'plantdoc', - 'pothole', - 'Raccoons', - 'selfdrivingCar', - 'ShellfishOpenImages', - 'ThermalCheetah', - 'thermalDogsAndPeople', - 'UnoCards', - 'VehiclesOpenImages', - 'WildfireSmoke', - 'websiteScreenshots', + "AerialMaritimeDrone_large", + "AerialMaritimeDrone_tiled", + "AmericanSignLanguageLetters", + "Aquarium", + "BCCD", + "boggleBoards", + "brackishUnderwater", + "ChessPieces", + "CottontailRabbits", + "dice", + "DroneControl", + "EgoHands_generic", + "EgoHands_specific", + "HardHatWorkers", + "MaskWearing", + "MountainDewCommercial", + 
"NorthAmericaMushrooms", + "openPoetryVision", + "OxfordPets_by_breed", + "OxfordPets_by_species", + "PKLot", + "Packages", + "PascalVOC", + "pistols", + "plantdoc", + "pothole", + "Raccoons", + "selfdrivingCar", + "ShellfishOpenImages", + "ThermalCheetah", + "thermalDogsAndPeople", + "UnoCards", + "VehiclesOpenImages", + "WildfireSmoke", + "websiteScreenshots", ] datasets = [ - dataset_AerialMaritimeDrone_large, dataset_AerialMaritimeDrone_tiled, - dataset_AmericanSignLanguageLetters, dataset_Aquarium, dataset_BCCD, - dataset_boggleBoards, dataset_brackishUnderwater, dataset_ChessPieces, - dataset_CottontailRabbits, dataset_dice, dataset_DroneControl, - dataset_EgoHands_generic, dataset_EgoHands_specific, - dataset_HardHatWorkers, dataset_MaskWearing, dataset_MountainDewCommercial, - dataset_NorthAmericaMushrooms, dataset_openPoetryVision, - dataset_OxfordPets_by_breed, dataset_OxfordPets_by_species, dataset_PKLot, - dataset_Packages, dataset_PascalVOC, dataset_pistols, dataset_plantdoc, - dataset_pothole, dataset_Raccoon, dataset_selfdrivingCar, - dataset_ShellfishOpenImages, dataset_ThermalCheetah, - dataset_thermalDogsAndPeople, dataset_UnoCards, dataset_VehiclesOpenImages, - dataset_WildfireSmoke, dataset_websiteScreenshots + dataset_AerialMaritimeDrone_large, + dataset_AerialMaritimeDrone_tiled, + dataset_AmericanSignLanguageLetters, + dataset_Aquarium, + dataset_BCCD, + dataset_boggleBoards, + dataset_brackishUnderwater, + dataset_ChessPieces, + dataset_CottontailRabbits, + dataset_dice, + dataset_DroneControl, + dataset_EgoHands_generic, + dataset_EgoHands_specific, + dataset_HardHatWorkers, + dataset_MaskWearing, + dataset_MountainDewCommercial, + dataset_NorthAmericaMushrooms, + dataset_openPoetryVision, + dataset_OxfordPets_by_breed, + dataset_OxfordPets_by_species, + dataset_PKLot, + dataset_Packages, + dataset_PascalVOC, + dataset_pistols, + dataset_plantdoc, + dataset_pothole, + dataset_Raccoon, + dataset_selfdrivingCar, + dataset_ShellfishOpenImages, + dataset_ThermalCheetah, + dataset_thermalDogsAndPeople, + dataset_UnoCards, + dataset_VehiclesOpenImages, + dataset_WildfireSmoke, + dataset_websiteScreenshots, ] metrics = [ val_evaluator_AerialMaritimeDrone_large, val_evaluator_AerialMaritimeDrone_tiled, - val_evaluator_AmericanSignLanguageLetters, val_evaluator_Aquarium, - val_evaluator_BCCD, val_evaluator_boggleBoards, - val_evaluator_brackishUnderwater, val_evaluator_ChessPieces, - val_evaluator_CottontailRabbits, val_evaluator_dice, - val_evaluator_DroneControl, val_evaluator_EgoHands_generic, - val_evaluator_EgoHands_specific, val_evaluator_HardHatWorkers, - val_evaluator_MaskWearing, val_evaluator_MountainDewCommercial, - val_evaluator_NorthAmericaMushrooms, val_evaluator_openPoetryVision, - val_evaluator_OxfordPets_by_breed, val_evaluator_OxfordPets_by_species, - val_evaluator_PKLot, val_evaluator_Packages, val_evaluator_PascalVOC, - val_evaluator_pistols, val_evaluator_plantdoc, val_evaluator_pothole, - val_evaluator_Raccoon, val_evaluator_selfdrivingCar, - val_evaluator_ShellfishOpenImages, val_evaluator_ThermalCheetah, - val_evaluator_thermalDogsAndPeople, val_evaluator_UnoCards, - val_evaluator_VehiclesOpenImages, val_evaluator_WildfireSmoke, - val_evaluator_websiteScreenshots + val_evaluator_AmericanSignLanguageLetters, + val_evaluator_Aquarium, + val_evaluator_BCCD, + val_evaluator_boggleBoards, + val_evaluator_brackishUnderwater, + val_evaluator_ChessPieces, + val_evaluator_CottontailRabbits, + val_evaluator_dice, + val_evaluator_DroneControl, + 
val_evaluator_EgoHands_generic, + val_evaluator_EgoHands_specific, + val_evaluator_HardHatWorkers, + val_evaluator_MaskWearing, + val_evaluator_MountainDewCommercial, + val_evaluator_NorthAmericaMushrooms, + val_evaluator_openPoetryVision, + val_evaluator_OxfordPets_by_breed, + val_evaluator_OxfordPets_by_species, + val_evaluator_PKLot, + val_evaluator_Packages, + val_evaluator_PascalVOC, + val_evaluator_pistols, + val_evaluator_plantdoc, + val_evaluator_pothole, + val_evaluator_Raccoon, + val_evaluator_selfdrivingCar, + val_evaluator_ShellfishOpenImages, + val_evaluator_ThermalCheetah, + val_evaluator_thermalDogsAndPeople, + val_evaluator_UnoCards, + val_evaluator_VehiclesOpenImages, + val_evaluator_WildfireSmoke, + val_evaluator_websiteScreenshots, ] # -------------------------------------------------# -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/grounding_dino/odinw/override_category.py b/mmpose/configs/mmdet/grounding_dino/odinw/override_category.py index 9ff05fc6e5e4d0989cf7fcf7af4dc902ee99f3a3..aeadada4e6f6c8f154ca6c413e573793b2189e48 100644 --- a/mmpose/configs/mmdet/grounding_dino/odinw/override_category.py +++ b/mmpose/configs/mmdet/grounding_dino/odinw/override_category.py @@ -5,105 +5,52 @@ import mmengine def parse_args(): - parser = argparse.ArgumentParser(description='Override Category') - parser.add_argument('data_root') + parser = argparse.ArgumentParser(description="Override Category") + parser.add_argument("data_root") return parser.parse_args() def main(): args = parse_args() - ChessPieces = [{ - 'id': 1, - 'name': ' ', - 'supercategory': 'pieces' - }, { - 'id': 2, - 'name': 'black bishop', - 'supercategory': 'pieces' - }, { - 'id': 3, - 'name': 'black king', - 'supercategory': 'pieces' - }, { - 'id': 4, - 'name': 'black knight', - 'supercategory': 'pieces' - }, { - 'id': 5, - 'name': 'black pawn', - 'supercategory': 'pieces' - }, { - 'id': 6, - 'name': 'black queen', - 'supercategory': 'pieces' - }, { - 'id': 7, - 'name': 'black rook', - 'supercategory': 'pieces' - }, { - 'id': 8, - 'name': 'white bishop', - 'supercategory': 'pieces' - }, { - 'id': 9, - 'name': 'white king', - 'supercategory': 'pieces' - }, { - 'id': 10, - 'name': 'white knight', - 'supercategory': 'pieces' - }, { - 'id': 11, - 'name': 'white pawn', - 'supercategory': 'pieces' - }, { - 'id': 12, - 'name': 'white queen', - 'supercategory': 'pieces' - }, { - 'id': 13, - 'name': 'white rook', - 'supercategory': 'pieces' - }] - - _data_root = args.data_root + 'ChessPieces/Chess Pieces.v23-raw.coco/' - json_data = mmengine.load(_data_root + - 'valid/annotations_without_background.json') - json_data['categories'] = ChessPieces - mmengine.dump(json_data, - _data_root + 'valid/new_annotations_without_background.json') - - CottontailRabbits = [{ - 'id': 1, - 'name': 'rabbit', - 'supercategory': 'Cottontail-Rabbit' - }] - - _data_root = args.data_root + 'CottontailRabbits/' - json_data = mmengine.load(_data_root + - 'valid/annotations_without_background.json') - json_data['categories'] = CottontailRabbits - mmengine.dump(json_data, 
- _data_root + 'valid/new_annotations_without_background.json') - - NorthAmericaMushrooms = [{ - 'id': 1, - 'name': 'flat mushroom', - 'supercategory': 'mushroom' - }, { - 'id': 2, - 'name': 'yellow mushroom', - 'supercategory': 'mushroom' - }] - - _data_root = args.data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/' # noqa - json_data = mmengine.load(_data_root + - 'valid/annotations_without_background.json') - json_data['categories'] = NorthAmericaMushrooms - mmengine.dump(json_data, - _data_root + 'valid/new_annotations_without_background.json') - - -if __name__ == '__main__': + ChessPieces = [ + {"id": 1, "name": " ", "supercategory": "pieces"}, + {"id": 2, "name": "black bishop", "supercategory": "pieces"}, + {"id": 3, "name": "black king", "supercategory": "pieces"}, + {"id": 4, "name": "black knight", "supercategory": "pieces"}, + {"id": 5, "name": "black pawn", "supercategory": "pieces"}, + {"id": 6, "name": "black queen", "supercategory": "pieces"}, + {"id": 7, "name": "black rook", "supercategory": "pieces"}, + {"id": 8, "name": "white bishop", "supercategory": "pieces"}, + {"id": 9, "name": "white king", "supercategory": "pieces"}, + {"id": 10, "name": "white knight", "supercategory": "pieces"}, + {"id": 11, "name": "white pawn", "supercategory": "pieces"}, + {"id": 12, "name": "white queen", "supercategory": "pieces"}, + {"id": 13, "name": "white rook", "supercategory": "pieces"}, + ] + + _data_root = args.data_root + "ChessPieces/Chess Pieces.v23-raw.coco/" + json_data = mmengine.load(_data_root + "valid/annotations_without_background.json") + json_data["categories"] = ChessPieces + mmengine.dump(json_data, _data_root + "valid/new_annotations_without_background.json") + + CottontailRabbits = [{"id": 1, "name": "rabbit", "supercategory": "Cottontail-Rabbit"}] + + _data_root = args.data_root + "CottontailRabbits/" + json_data = mmengine.load(_data_root + "valid/annotations_without_background.json") + json_data["categories"] = CottontailRabbits + mmengine.dump(json_data, _data_root + "valid/new_annotations_without_background.json") + + NorthAmericaMushrooms = [ + {"id": 1, "name": "flat mushroom", "supercategory": "mushroom"}, + {"id": 2, "name": "yellow mushroom", "supercategory": "mushroom"}, + ] + + _data_root = args.data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/" # noqa + json_data = mmengine.load(_data_root + "valid/annotations_without_background.json") + json_data["categories"] = NorthAmericaMushrooms + mmengine.dump(json_data, _data_root + "valid/new_annotations_without_background.json") + + +if __name__ == "__main__": main() diff --git a/mmpose/configs/mmdet/grounding_dino/refcoco/grounding_dino_swin-b_pretrain_zeroshot_refexp.py b/mmpose/configs/mmdet/grounding_dino/refcoco/grounding_dino_swin-b_pretrain_zeroshot_refexp.py index dea0bad08c0ebf6455211fadb268b07868ab4ded..777881829709c97e157828d7b2f840c6b96350cf 100644 --- a/mmpose/configs/mmdet/grounding_dino/refcoco/grounding_dino_swin-b_pretrain_zeroshot_refexp.py +++ b/mmpose/configs/mmdet/grounding_dino/refcoco/grounding_dino_swin-b_pretrain_zeroshot_refexp.py @@ -1,7 +1,7 @@ -_base_ = './grounding_dino_swin-t_pretrain_zeroshot_refexp.py' +_base_ = "./grounding_dino_swin-t_pretrain_zeroshot_refexp.py" model = dict( - type='GroundingDINO', + type="GroundingDINO", backbone=dict( pretrain_img_size=384, embed_dims=128, @@ -9,6 +9,7 @@ model = dict( num_heads=[4, 8, 16, 32], window_size=12, drop_path_rate=0.3, - patch_norm=True), + patch_norm=True, + ), 
neck=dict(in_channels=[256, 512, 1024]), ) diff --git a/mmpose/configs/mmdet/grounding_dino/refcoco/grounding_dino_swin-t_pretrain_zeroshot_refexp.py b/mmpose/configs/mmdet/grounding_dino/refcoco/grounding_dino_swin-t_pretrain_zeroshot_refexp.py index 4b5c46574a30bbb2253fc69f79edbcf0cb016505..5f601c2ea655a8a2d8425c45230ef038206aa720 100644 --- a/mmpose/configs/mmdet/grounding_dino/refcoco/grounding_dino_swin-t_pretrain_zeroshot_refexp.py +++ b/mmpose/configs/mmdet/grounding_dino/refcoco/grounding_dino_swin-t_pretrain_zeroshot_refexp.py @@ -1,228 +1,198 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py" # 30 is an empirical value, just set it to the maximum value # without affecting the evaluation result model = dict(test_cfg=dict(max_per_img=30)) -data_root = 'data/coco/' +data_root = "data/coco/" test_pipeline = [ + dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='LoadImageFromFile', backend_args=None, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities', - 'tokens_positive')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities", "tokens_positive"), + ), ] # -------------------------------------------------# -ann_file = 'mdetr_annotations/final_refexp_val.json' +ann_file = "mdetr_annotations/final_refexp_val.json" val_dataset_all_val = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) -val_evaluator_all_val = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) + backend_args=None, +) +val_evaluator_all_val = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco_testA.json' +ann_file = "mdetr_annotations/finetune_refcoco_testA.json" val_dataset_refcoco_testA = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_testA = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_testA = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco_testB.json' +ann_file = "mdetr_annotations/finetune_refcoco_testB.json" val_dataset_refcoco_testB = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + 
data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_testB = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_testB = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco+_testA.json' +ann_file = "mdetr_annotations/finetune_refcoco+_testA.json" val_dataset_refcoco_plus_testA = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_plus_testA = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_plus_testA = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco+_testB.json' +ann_file = "mdetr_annotations/finetune_refcoco+_testB.json" val_dataset_refcoco_plus_testB = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_plus_testB = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_plus_testB = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcocog_test.json' +ann_file = "mdetr_annotations/finetune_refcocog_test.json" val_dataset_refcocog_test = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcocog_test = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcocog_test = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_grefcoco_val.json' +ann_file = "mdetr_annotations/finetune_grefcoco_val.json" val_dataset_grefcoco_val = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) val_evaluator_grefcoco_val = dict( - type='gRefCOCOMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - thresh_score=0.7, - thresh_f1=1.0) + type="gRefCOCOMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, thresh_score=0.7, 
thresh_f1=1.0 +) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_grefcoco_testA.json' +ann_file = "mdetr_annotations/finetune_grefcoco_testA.json" val_dataset_grefcoco_testA = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) val_evaluator_grefcoco_testA = dict( - type='gRefCOCOMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - thresh_score=0.7, - thresh_f1=1.0) + type="gRefCOCOMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, thresh_score=0.7, thresh_f1=1.0 +) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_grefcoco_testB.json' +ann_file = "mdetr_annotations/finetune_grefcoco_testB.json" val_dataset_grefcoco_testB = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) val_evaluator_grefcoco_testB = dict( - type='gRefCOCOMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - thresh_score=0.7, - thresh_f1=1.0) + type="gRefCOCOMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, thresh_score=0.7, thresh_f1=1.0 +) # -------------------------------------------------# datasets = [ - val_dataset_all_val, val_dataset_refcoco_testA, val_dataset_refcoco_testB, - val_dataset_refcoco_plus_testA, val_dataset_refcoco_plus_testB, - val_dataset_refcocog_test, val_dataset_grefcoco_val, - val_dataset_grefcoco_testA, val_dataset_grefcoco_testB + val_dataset_all_val, + val_dataset_refcoco_testA, + val_dataset_refcoco_testB, + val_dataset_refcoco_plus_testA, + val_dataset_refcoco_plus_testB, + val_dataset_refcocog_test, + val_dataset_grefcoco_val, + val_dataset_grefcoco_testA, + val_dataset_grefcoco_testB, ] dataset_prefixes = [ - 'val', 'refcoco_testA', 'refcoco_testB', 'refcoco+_testA', - 'refcoco+_testB', 'refcocog_test', 'grefcoco_val', 'grefcoco_testA', - 'grefcoco_testB' + "val", + "refcoco_testA", + "refcoco_testB", + "refcoco+_testA", + "refcoco+_testB", + "refcocog_test", + "grefcoco_val", + "grefcoco_testA", + "grefcoco_testB", ] metrics = [ - val_evaluator_all_val, val_evaluator_refcoco_testA, - val_evaluator_refcoco_testB, val_evaluator_refcoco_plus_testA, - val_evaluator_refcoco_plus_testB, val_evaluator_refcocog_test, - val_evaluator_grefcoco_val, val_evaluator_grefcoco_testA, - val_evaluator_grefcoco_testB + val_evaluator_all_val, + val_evaluator_refcoco_testA, + val_evaluator_refcoco_testB, + val_evaluator_refcoco_plus_testA, + val_evaluator_refcoco_plus_testB, + val_evaluator_refcocog_test, + val_evaluator_grefcoco_val, + val_evaluator_grefcoco_testA, + val_evaluator_grefcoco_testB, ] -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, 
dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-fast-rcnn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-fast-rcnn_r50-caffe_fpn_1x_coco.py index 2d0579c53cb23d71d0bec57387f413cc39449e93..60f75978c67c47b8aa356d97383b7ae516f2341d 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-fast-rcnn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-fast-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,66 +1,56 @@ -_base_ = '../fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py' +_base_ = "../fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), - roi_head=dict( - bbox_head=dict(bbox_coder=dict(target_stds=[0.05, 0.05, 0.1, 0.1]))), + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), + roi_head=dict(bbox_head=dict(bbox_coder=dict(target_stds=[0.05, 0.05, 0.1, 0.1]))), # model training and testing settings - train_cfg=dict( - rcnn=dict( - assigner=dict(pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6), - sampler=dict(num=256))), - test_cfg=dict(rcnn=dict(score_thr=1e-3))) -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) + train_cfg=dict(rcnn=dict(assigner=dict(pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6), sampler=dict(num=256))), + test_cfg=dict(rcnn=dict(score_thr=1e-3)), +) +dataset_type = "CocoDataset" +data_root = "data/coco/" +img_norm_cfg = dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadProposals', num_max_proposals=300), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']), + dict(type="LoadImageFromFile"), + dict(type="LoadProposals", num_max_proposals=300), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", img_scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", flip_ratio=0.5), + dict(type="Normalize", **img_norm_cfg), + dict(type="Pad", size_divisor=32), + dict(type="DefaultFormatBundle"), + dict(type="Collect", keys=["img", "proposals", "gt_bboxes", "gt_labels"]), ] test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadProposals', num_max_proposals=None), + dict(type="LoadImageFromFile"), + dict(type="LoadProposals", num_max_proposals=None), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1333, 800), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img', 'proposals']), - ]) + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", **img_norm_cfg), + dict(type="Pad", size_divisor=32), + dict(type="ImageToTensor", 
keys=["img"]), + dict(type="Collect", keys=["img", "proposals"]), + ], + ), ] # TODO: support loading proposals data = dict( - train=dict( - proposal_file=data_root + 'proposals/ga_rpn_r50_fpn_1x_train2017.pkl', - pipeline=train_pipeline), - val=dict( - proposal_file=data_root + 'proposals/ga_rpn_r50_fpn_1x_val2017.pkl', - pipeline=test_pipeline), - test=dict( - proposal_file=data_root + 'proposals/ga_rpn_r50_fpn_1x_val2017.pkl', - pipeline=test_pipeline)) -optimizer_config = dict( - _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) + train=dict(proposal_file=data_root + "proposals/ga_rpn_r50_fpn_1x_train2017.pkl", pipeline=train_pipeline), + val=dict(proposal_file=data_root + "proposals/ga_rpn_r50_fpn_1x_val2017.pkl", pipeline=test_pipeline), + test=dict(proposal_file=data_root + "proposals/ga_rpn_r50_fpn_1x_val2017.pkl", pipeline=test_pipeline), +) +optimizer_config = dict(_delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r101-caffe_fpn_1x_coco.py index f585dc355ac7dc10e75875f6b9f739fe669912bb..4fa3eac547c1c04c62e3bc0cf6dd6e10d99965d2 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r101-caffe_fpn_1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './ga-faster-rcnn_r50-caffe_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./ga-faster-rcnn_r50-caffe_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r50-caffe_fpn_1x_coco.py index 6cd44de557bfb20b4298099bd0972e3327b410cb..be6c5111cae475b28b5e86b169491343c82ccb04 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,64 +1,35 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50-caffe_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50-caffe_fpn_1x_coco.py" model = dict( rpn_head=dict( _delete_=True, - type='GARPNHead', + type="GARPNHead", in_channels=256, feat_channels=256, approx_anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=8, - scales_per_octave=3, - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - square_anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - scales=[8], - strides=[4, 8, 16, 32, 64]), - anchor_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.07, 0.07, 0.14, 0.14]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.07, 0.07, 0.11, 0.11]), + type="AnchorGenerator", octave_base_scale=8, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64] + ), + square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[8], strides=[4, 8, 16, 32, 64]), + anchor_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.07, 0.07, 0.14, 0.14]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.07, 0.07, 0.11, 0.11]), loc_filter_thr=0.01, - loss_loc=dict( - type='FocalLoss', - 
use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_shape=dict(type='BoundedIoULoss', beta=0.2, loss_weight=1.0), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)), - roi_head=dict( - bbox_head=dict(bbox_coder=dict(target_stds=[0.05, 0.05, 0.1, 0.1]))), + loss_loc=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_shape=dict(type="BoundedIoULoss", beta=0.2, loss_weight=1.0), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), + roi_head=dict(bbox_head=dict(bbox_coder=dict(target_stds=[0.05, 0.05, 0.1, 0.1]))), # model training and testing settings train_cfg=dict( rpn=dict( - ga_assigner=dict( - type='ApproxMaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - ga_sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + ga_assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, ignore_iof_thr=-1), + ga_sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, center_ratio=0.2, - ignore_ratio=0.5), + ignore_ratio=0.5, + ), rpn_proposal=dict(nms_post=1000, max_per_img=300), - rcnn=dict( - assigner=dict(pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6), - sampler=dict(type='RandomSampler', num=256))), - test_cfg=dict( - rpn=dict(nms_post=1000, max_per_img=300), rcnn=dict(score_thr=1e-3))) + rcnn=dict(assigner=dict(pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6), sampler=dict(type="RandomSampler", num=256)), + ), + test_cfg=dict(rpn=dict(nms_post=1000, max_per_img=300), rcnn=dict(score_thr=1e-3)), +) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r50_fpn_1x_coco.py index 3007fbec42016fa8c6b90ba5b0b4e772d0e865f7..b51c41e1350dbdc5c889a11773c395fb117a667a 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_r50_fpn_1x_coco.py @@ -1,64 +1,35 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( rpn_head=dict( _delete_=True, - type='GARPNHead', + type="GARPNHead", in_channels=256, feat_channels=256, approx_anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=8, - scales_per_octave=3, - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - square_anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - scales=[8], - strides=[4, 8, 16, 32, 64]), - anchor_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.07, 0.07, 0.14, 0.14]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.07, 0.07, 0.11, 0.11]), + type="AnchorGenerator", octave_base_scale=8, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64] + ), + square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[8], strides=[4, 8, 16, 32, 64]), + anchor_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.07, 0.07, 0.14, 0.14]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 
0.0, 0.0, 0.0], target_stds=[0.07, 0.07, 0.11, 0.11]), loc_filter_thr=0.01, - loss_loc=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_shape=dict(type='BoundedIoULoss', beta=0.2, loss_weight=1.0), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)), - roi_head=dict( - bbox_head=dict(bbox_coder=dict(target_stds=[0.05, 0.05, 0.1, 0.1]))), + loss_loc=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_shape=dict(type="BoundedIoULoss", beta=0.2, loss_weight=1.0), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), + roi_head=dict(bbox_head=dict(bbox_coder=dict(target_stds=[0.05, 0.05, 0.1, 0.1]))), # model training and testing settings train_cfg=dict( rpn=dict( - ga_assigner=dict( - type='ApproxMaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - ga_sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + ga_assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, ignore_iof_thr=-1), + ga_sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, center_ratio=0.2, - ignore_ratio=0.5), + ignore_ratio=0.5, + ), rpn_proposal=dict(nms_post=1000, max_per_img=300), - rcnn=dict( - assigner=dict(pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6), - sampler=dict(type='RandomSampler', num=256))), - test_cfg=dict( - rpn=dict(nms_post=1000, max_per_img=300), rcnn=dict(score_thr=1e-3))) + rcnn=dict(assigner=dict(pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6), sampler=dict(type="RandomSampler", num=256)), + ), + test_cfg=dict(rpn=dict(nms_post=1000, max_per_img=300), rcnn=dict(score_thr=1e-3)), +) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_x101-32x4d_fpn_1x_coco.py index 8a22a1ec01e66854c68968f65802dc117aa59953..f07276defbb2f0b9375351b415f016ccbabb7d07 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_x101-32x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_x101-32x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './ga-faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "./ga-faster-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_x101-64x4d_fpn_1x_coco.py index 3d6aaeaa7187deaa2c0da73a89bf14980a3405db..f310f067434191bc07b899b24d6a3a5d130fe4da 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-faster-rcnn_x101-64x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = 
'./ga-faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "./ga-faster-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r101-caffe_fpn_1x_coco.py index 9adbae55eea2311800ccbc8e01e3f41521c7040b..c94d34509d27f5673b6561dec6080189dd1f29df 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r101-caffe_fpn_1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './ga-retinanet_r50-caffe_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./ga-retinanet_r50-caffe_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r101-caffe_fpn_ms-2x.py b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r101-caffe_fpn_ms-2x.py index 012e89b8338c69c4ffdf4182827a185233945288..e5141f9e0fe7d6d06db09e143faf582708a28bb0 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r101-caffe_fpn_ms-2x.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r101-caffe_fpn_ms-2x.py @@ -1,34 +1,20 @@ -_base_ = './ga-retinanet_r101-caffe_fpn_1x_coco.py' +_base_ = "./ga-retinanet_r101-caffe_fpn_1x_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', scale=[(1333, 480), (1333, 960)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=[(1333, 480), (1333, 960)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) # learning policy max_epochs = 24 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 3.0, - by_epoch=False, - begin=0, - end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 3.0, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r50-caffe_fpn_1x_coco.py index b62aba62c64870977c7c8fe4021a361c8871b633..010825319432200be88390b247e05c0768e54034 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r50-caffe_fpn_1x_coco.py +++ 
b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r50-caffe_fpn_1x_coco.py @@ -1,61 +1,31 @@ -_base_ = '../retinanet/retinanet_r50-caffe_fpn_1x_coco.py' +_base_ = "../retinanet/retinanet_r50-caffe_fpn_1x_coco.py" model = dict( bbox_head=dict( _delete_=True, - type='GARetinaHead', + type="GARetinaHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, approx_anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=4, - scales_per_octave=3, - ratios=[0.5, 1.0, 2.0], - strides=[8, 16, 32, 64, 128]), - square_anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - scales=[4], - strides=[8, 16, 32, 64, 128]), - anchor_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), + type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128] + ), + square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[4], strides=[8, 16, 32, 64, 128]), + anchor_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loc_filter_thr=0.01, - loss_loc=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_shape=dict(type='BoundedIoULoss', beta=0.2, loss_weight=1.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=0.04, loss_weight=1.0)), + loss_loc=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_shape=dict(type="BoundedIoULoss", beta=0.2, loss_weight=1.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=0.04, loss_weight=1.0), + ), # training and testing settings train_cfg=dict( - ga_assigner=dict( - type='ApproxMaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - ga_sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + ga_assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0.4, ignore_iof_thr=-1), + ga_sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), assigner=dict(neg_iou_thr=0.5, min_pos_iou=0.0), center_ratio=0.2, - ignore_ratio=0.5)) + ignore_ratio=0.5, + ), +) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r50_fpn_1x_coco.py index da39c7005b26d65cca0ae122bf078db2d8ad2786..1c9f2720b3669fd3baffa7079466d11468c444a3 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_r50_fpn_1x_coco.py @@ -1,61 +1,31 @@ -_base_ = '../retinanet/retinanet_r50_fpn_1x_coco.py' +_base_ = "../retinanet/retinanet_r50_fpn_1x_coco.py" model = dict( bbox_head=dict( _delete_=True, - type='GARetinaHead', + type="GARetinaHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, approx_anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=4, - 
scales_per_octave=3, - ratios=[0.5, 1.0, 2.0], - strides=[8, 16, 32, 64, 128]), - square_anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - scales=[4], - strides=[8, 16, 32, 64, 128]), - anchor_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), + type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128] + ), + square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[4], strides=[8, 16, 32, 64, 128]), + anchor_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loc_filter_thr=0.01, - loss_loc=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_shape=dict(type='BoundedIoULoss', beta=0.2, loss_weight=1.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=0.04, loss_weight=1.0)), + loss_loc=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_shape=dict(type="BoundedIoULoss", beta=0.2, loss_weight=1.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=0.04, loss_weight=1.0), + ), # training and testing settings train_cfg=dict( - ga_assigner=dict( - type='ApproxMaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0.4, - ignore_iof_thr=-1), - ga_sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + ga_assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0.4, ignore_iof_thr=-1), + ga_sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), assigner=dict(neg_iou_thr=0.5, min_pos_iou=0.0), center_ratio=0.2, - ignore_ratio=0.5)) + ignore_ratio=0.5, + ), +) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_x101-32x4d_fpn_1x_coco.py index 478a8e5e4a2192e23329564ac688ac40c93110dd..33175972c3565ca3819ddfa3b27dc1ee88ee276b 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_x101-32x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_x101-32x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './ga-retinanet_r50_fpn_1x_coco.py' +_base_ = "./ga-retinanet_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_x101-64x4d_fpn_1x_coco.py index 
cb7721d3a604277977b102d431076d6d58a7d457..d9364cc0f1b0db7a4d3df78f0db6c5fa8fd9353a 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-retinanet_x101-64x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './ga-retinanet_r50_fpn_1x_coco.py' +_base_ = "./ga-retinanet_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r101-caffe_fpn_1x_coco.py index b375c874ac8cabf5ad29aacc51e1065d14d83ee1..3ac452411d5b4d4132f25c8980b96faf3f3199f6 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r101-caffe_fpn_1x_coco.py @@ -1,8 +1,3 @@ -_base_ = './ga-rpn_r50-caffe_fpn_1x_coco.py' +_base_ = "./ga-rpn_r50-caffe_fpn_1x_coco.py" # model settings -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r50-caffe_fpn_1x_coco.py index aa58426effe8bedbe9ffb907153b98d51bef5ef2..858a0f86569bf170db194a19b5a497407348e80e 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r50-caffe_fpn_1x_coco.py @@ -1,57 +1,32 @@ -_base_ = '../rpn/rpn_r50-caffe_fpn_1x_coco.py' +_base_ = "../rpn/rpn_r50-caffe_fpn_1x_coco.py" model = dict( rpn_head=dict( _delete_=True, - type='GARPNHead', + type="GARPNHead", in_channels=256, feat_channels=256, approx_anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=8, - scales_per_octave=3, - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - square_anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - scales=[8], - strides=[4, 8, 16, 32, 64]), - anchor_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.07, 0.07, 0.14, 0.14]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.07, 0.07, 0.11, 0.11]), + type="AnchorGenerator", octave_base_scale=8, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64] + ), + square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[8], strides=[4, 8, 16, 32, 64]), + anchor_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.07, 0.07, 0.14, 0.14]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.07, 0.07, 0.11, 0.11]), loc_filter_thr=0.01, - loss_loc=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_shape=dict(type='BoundedIoULoss', beta=0.2, loss_weight=1.0), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - 
loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)), + loss_loc=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_shape=dict(type="BoundedIoULoss", beta=0.2, loss_weight=1.0), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), # model training and testing settings train_cfg=dict( rpn=dict( - ga_assigner=dict( - type='ApproxMaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - ga_sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + ga_assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, ignore_iof_thr=-1), + ga_sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, center_ratio=0.2, - ignore_ratio=0.5)), - test_cfg=dict(rpn=dict(nms_post=1000))) + ignore_ratio=0.5, + ) + ), + test_cfg=dict(rpn=dict(nms_post=1000)), +) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r50_fpn_1x_coco.py index 2973f272b740c8deec74f6c24798a2d80d917946..585ab8506d1301c3ebf49b92ecf65ca6c91884ca 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-rpn_r50_fpn_1x_coco.py @@ -1,57 +1,32 @@ -_base_ = '../rpn/rpn_r50_fpn_1x_coco.py' +_base_ = "../rpn/rpn_r50_fpn_1x_coco.py" model = dict( rpn_head=dict( _delete_=True, - type='GARPNHead', + type="GARPNHead", in_channels=256, feat_channels=256, approx_anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=8, - scales_per_octave=3, - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - square_anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - scales=[8], - strides=[4, 8, 16, 32, 64]), - anchor_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.07, 0.07, 0.14, 0.14]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.07, 0.07, 0.11, 0.11]), + type="AnchorGenerator", octave_base_scale=8, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64] + ), + square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[8], strides=[4, 8, 16, 32, 64]), + anchor_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.07, 0.07, 0.14, 0.14]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.07, 0.07, 0.11, 0.11]), loc_filter_thr=0.01, - loss_loc=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_shape=dict(type='BoundedIoULoss', beta=0.2, loss_weight=1.0), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)), + loss_loc=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_shape=dict(type="BoundedIoULoss", beta=0.2, loss_weight=1.0), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), # model training and testing settings train_cfg=dict( rpn=dict( - ga_assigner=dict( - type='ApproxMaxIoUAssigner', - pos_iou_thr=0.7, - 
neg_iou_thr=0.3, - min_pos_iou=0.3, - ignore_iof_thr=-1), - ga_sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + ga_assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, ignore_iof_thr=-1), + ga_sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, center_ratio=0.2, - ignore_ratio=0.5)), - test_cfg=dict(rpn=dict(nms_post=1000))) + ignore_ratio=0.5, + ) + ), + test_cfg=dict(rpn=dict(nms_post=1000)), +) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-rpn_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-rpn_x101-32x4d_fpn_1x_coco.py index 276d45d8c21fa1eba130e834671bdddd794fa1f5..52c5b10bce11521297ea9234c55369208c039cf7 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-rpn_x101-32x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-rpn_x101-32x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './ga-rpn_r50_fpn_1x_coco.py' +_base_ = "./ga-rpn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/guided_anchoring/ga-rpn_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/guided_anchoring/ga-rpn_x101-64x4d_fpn_1x_coco.py index f29fe9aa20054f3152e290df5ca75363dff6a4ce..c03d99a874f70d7fe691ab6088d46ef0da9da82a 100644 --- a/mmpose/configs/mmdet/guided_anchoring/ga-rpn_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/guided_anchoring/ga-rpn_x101-64x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './ga-rpn_r50_fpn_1x_coco.py' +_base_ = "./ga-rpn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w18_20e_coco.py b/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w18_20e_coco.py index 5ca0ebfe43b00886b22ffc426c5ac89a50f4fda6..7fa314725ad323f9f05191bfa9fcda36abc35f73 100644 --- a/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w18_20e_coco.py +++ b/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w18_20e_coco.py @@ -1,11 +1,9 @@ -_base_ = './cascade-mask-rcnn_hrnetv2p-w32_20e_coco.py' +_base_ = "./cascade-mask-rcnn_hrnetv2p-w32_20e_coco.py" # model settings model = dict( backbone=dict( - extra=dict( - stage2=dict(num_channels=(18, 36)), - stage3=dict(num_channels=(18, 36, 72)), - stage4=dict(num_channels=(18, 36, 72, 144))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')), - neck=dict(type='HRFPN', in_channels=[18, 36, 72, 144], out_channels=256)) + extra=dict(stage2=dict(num_channels=(18, 36)), stage3=dict(num_channels=(18, 36, 72)), 
stage4=dict(num_channels=(18, 36, 72, 144))), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), + ), + neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256), +) diff --git a/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w32_20e_coco.py b/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w32_20e_coco.py index 1ffedc3916748c3c6b333023110e56895de7e4bd..a72bbcd666fde023f2223f298419c0d66001be85 100644 --- a/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w32_20e_coco.py +++ b/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w32_20e_coco.py @@ -1,51 +1,22 @@ -_base_ = '../cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "../cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( _delete_=True, - type='HRNet', + type="HRNet", extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w32')), - neck=dict( - _delete_=True, - type='HRFPN', - in_channels=[32, 64, 128, 256], - out_channels=256)) + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w32"), + ), + neck=dict(_delete_=True, type="HRFPN", in_channels=[32, 64, 128, 256], out_channels=256), +) # learning policy max_epochs = 20 train_cfg = dict(max_epochs=max_epochs) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 19], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 19], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w40-20e_coco.py b/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w40-20e_coco.py index 4a51a02412871905d947bcbb648b1a24e5033f56..e3f913fbbe1820b117061bd91e26ca14bf3e2311 100644 --- a/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w40-20e_coco.py +++ b/mmpose/configs/mmdet/hrnet/cascade-mask-rcnn_hrnetv2p-w40-20e_coco.py @@ -1,12 +1,12 @@ -_base_ = './cascade-mask-rcnn_hrnetv2p-w32_20e_coco.py' +_base_ = "./cascade-mask-rcnn_hrnetv2p-w32_20e_coco.py" # model settings model = dict( backbone=dict( - type='HRNet', + type="HRNet", extra=dict( - stage2=dict(num_channels=(40, 80)), - stage3=dict(num_channels=(40, 80, 160)), - stage4=dict(num_channels=(40, 80, 160, 320))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w40')), - neck=dict(type='HRFPN', in_channels=[40, 80, 160, 320], out_channels=256)) + stage2=dict(num_channels=(40, 
80)), stage3=dict(num_channels=(40, 80, 160)), stage4=dict(num_channels=(40, 80, 160, 320)) + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w40"), + ), + neck=dict(type="HRFPN", in_channels=[40, 80, 160, 320], out_channels=256), +) diff --git a/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w18-20e_coco.py b/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w18-20e_coco.py index 8834c1d4ac7973a0e5ceb9f794786c0d706f343a..5fa9aedbcdfb7ba8e3327f2082342a699d6c97e4 100644 --- a/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w18-20e_coco.py +++ b/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w18-20e_coco.py @@ -1,11 +1,9 @@ -_base_ = './cascade-rcnn_hrnetv2p-w32-20e_coco.py' +_base_ = "./cascade-rcnn_hrnetv2p-w32-20e_coco.py" # model settings model = dict( backbone=dict( - extra=dict( - stage2=dict(num_channels=(18, 36)), - stage3=dict(num_channels=(18, 36, 72)), - stage4=dict(num_channels=(18, 36, 72, 144))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')), - neck=dict(type='HRFPN', in_channels=[18, 36, 72, 144], out_channels=256)) + extra=dict(stage2=dict(num_channels=(18, 36)), stage3=dict(num_channels=(18, 36, 72)), stage4=dict(num_channels=(18, 36, 72, 144))), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), + ), + neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256), +) diff --git a/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w32-20e_coco.py b/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w32-20e_coco.py index afeb75dbe13c5a8425924e280b250208aaec872f..7bf26dcdf2d8197e39d4922b37d04a714341e49d 100644 --- a/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w32-20e_coco.py +++ b/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w32-20e_coco.py @@ -1,51 +1,22 @@ -_base_ = '../cascade_rcnn/cascade-rcnn_r50_fpn_1x_coco.py' +_base_ = "../cascade_rcnn/cascade-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( _delete_=True, - type='HRNet', + type="HRNet", extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w32')), - neck=dict( - _delete_=True, - type='HRFPN', - in_channels=[32, 64, 128, 256], - out_channels=256)) + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w32"), + ), + neck=dict(_delete_=True, type="HRFPN", in_channels=[32, 64, 128, 256], out_channels=256), +) # learning policy max_epochs = 20 train_cfg = dict(max_epochs=max_epochs) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - 
by_epoch=True, - milestones=[16, 19], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 19], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w40-20e_coco.py b/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w40-20e_coco.py index 66f8882a0030ae82f7a74f67963bbd1da3422a48..e9ce966ecd698d4fc5ac3f6e052179b4e0966570 100644 --- a/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w40-20e_coco.py +++ b/mmpose/configs/mmdet/hrnet/cascade-rcnn_hrnetv2p-w40-20e_coco.py @@ -1,12 +1,12 @@ -_base_ = './cascade-rcnn_hrnetv2p-w32-20e_coco.py' +_base_ = "./cascade-rcnn_hrnetv2p-w32-20e_coco.py" # model settings model = dict( backbone=dict( - type='HRNet', + type="HRNet", extra=dict( - stage2=dict(num_channels=(40, 80)), - stage3=dict(num_channels=(40, 80, 160)), - stage4=dict(num_channels=(40, 80, 160, 320))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w40')), - neck=dict(type='HRFPN', in_channels=[40, 80, 160, 320], out_channels=256)) + stage2=dict(num_channels=(40, 80)), stage3=dict(num_channels=(40, 80, 160)), stage4=dict(num_channels=(40, 80, 160, 320)) + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w40"), + ), + neck=dict(type="HRFPN", in_channels=[40, 80, 160, 320], out_channels=256), +) diff --git a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w18-1x_coco.py b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w18-1x_coco.py index ee9a698699a6674c90011b4037843560459462db..73396d683c5e38a1d4a631a68365d00ef5d1243f 100644 --- a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w18-1x_coco.py +++ b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w18-1x_coco.py @@ -1,11 +1,9 @@ -_base_ = './faster-rcnn_hrnetv2p-w32-1x_coco.py' +_base_ = "./faster-rcnn_hrnetv2p-w32-1x_coco.py" # model settings model = dict( backbone=dict( - extra=dict( - stage2=dict(num_channels=(18, 36)), - stage3=dict(num_channels=(18, 36, 72)), - stage4=dict(num_channels=(18, 36, 72, 144))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')), - neck=dict(type='HRFPN', in_channels=[18, 36, 72, 144], out_channels=256)) + extra=dict(stage2=dict(num_channels=(18, 36)), stage3=dict(num_channels=(18, 36, 72)), stage4=dict(num_channels=(18, 36, 72, 144))), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), + ), + neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256), +) diff --git a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w18-2x_coco.py b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w18-2x_coco.py index 0b72c68f8cbbc83d16313c6d3ab3faf0ac86926f..68136e5bb568e853f72d60daeb6598ca5b94bcd5 100644 --- a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w18-2x_coco.py +++ b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w18-2x_coco.py @@ -1,16 +1,9 @@ -_base_ = './faster-rcnn_hrnetv2p-w18-1x_coco.py' +_base_ = "./faster-rcnn_hrnetv2p-w18-1x_coco.py" # learning policy max_epochs = 24 train_cfg = dict(max_epochs=max_epochs) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git 
a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w32-1x_coco.py b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w32-1x_coco.py index a27ad06c5c169c84c6368f767b79b0a817d99fa1..e20b3fcf3f82abdec28d84b0bc99a6a89e1ae849 100644 --- a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w32-1x_coco.py +++ b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w32-1x_coco.py @@ -1,37 +1,15 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( _delete_=True, - type='HRNet', + type="HRNet", extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w32')), - neck=dict( - _delete_=True, - type='HRFPN', - in_channels=[32, 64, 128, 256], - out_channels=256)) + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w32"), + ), + neck=dict(_delete_=True, type="HRFPN", in_channels=[32, 64, 128, 256], out_channels=256), +) diff --git a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w32_2x_coco.py b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w32_2x_coco.py index c9568ce65c142f86ec6181236464454106d7de99..a1afc36c452f4df16791f0bd7329b028b3df9939 100644 --- a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w32_2x_coco.py +++ b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w32_2x_coco.py @@ -1,16 +1,9 @@ -_base_ = './faster-rcnn_hrnetv2p-w32-1x_coco.py' +_base_ = "./faster-rcnn_hrnetv2p-w32-1x_coco.py" # learning policy max_epochs = 24 train_cfg = dict(max_epochs=max_epochs) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w40-1x_coco.py b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w40-1x_coco.py index b36200230b76269a9644cc7852cec6ce62eac5c3..5a5fbd002c114c90b034ba477ee864df66ec53d4 100644 --- a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w40-1x_coco.py +++ b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w40-1x_coco.py @@ -1,11 +1,11 @@ -_base_ = './faster-rcnn_hrnetv2p-w32-1x_coco.py' +_base_ = "./faster-rcnn_hrnetv2p-w32-1x_coco.py" model = dict( backbone=dict( - type='HRNet', + type="HRNet", extra=dict( - stage2=dict(num_channels=(40, 80)), - stage3=dict(num_channels=(40, 80, 160)), - stage4=dict(num_channels=(40, 80, 160, 320))), - init_cfg=dict( - 
type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w40')), - neck=dict(type='HRFPN', in_channels=[40, 80, 160, 320], out_channels=256)) + stage2=dict(num_channels=(40, 80)), stage3=dict(num_channels=(40, 80, 160)), stage4=dict(num_channels=(40, 80, 160, 320)) + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w40"), + ), + neck=dict(type="HRFPN", in_channels=[40, 80, 160, 320], out_channels=256), +) diff --git a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w40_2x_coco.py b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w40_2x_coco.py index d1b45355db1de7c649136438b91fec5199e08141..61455bf08434cc185512e1b0c596e46358c4ff99 100644 --- a/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w40_2x_coco.py +++ b/mmpose/configs/mmdet/hrnet/faster-rcnn_hrnetv2p-w40_2x_coco.py @@ -1,16 +1,9 @@ -_base_ = './faster-rcnn_hrnetv2p-w40-1x_coco.py' +_base_ = "./faster-rcnn_hrnetv2p-w40-1x_coco.py" # learning policy max_epochs = 24 train_cfg = dict(max_epochs=max_epochs) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_4xb4-1x_coco.py b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_4xb4-1x_coco.py index c20ca7767364e14e552b5b8af68a8124f6a1253e..8ed9c0549d9f2901b6640dcde75288bc807179de 100644 --- a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_4xb4-1x_coco.py +++ b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_4xb4-1x_coco.py @@ -1,10 +1,8 @@ -_base_ = './fcos_hrnetv2p-w32-gn-head_4xb4-1x_coco.py' +_base_ = "./fcos_hrnetv2p-w32-gn-head_4xb4-1x_coco.py" model = dict( backbone=dict( - extra=dict( - stage2=dict(num_channels=(18, 36)), - stage3=dict(num_channels=(18, 36, 72)), - stage4=dict(num_channels=(18, 36, 72, 144))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')), - neck=dict(type='HRFPN', in_channels=[18, 36, 72, 144], out_channels=256)) + extra=dict(stage2=dict(num_channels=(18, 36)), stage3=dict(num_channels=(18, 36, 72)), stage4=dict(num_channels=(18, 36, 72, 144))), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), + ), + neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256), +) diff --git a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_4xb4-2x_coco.py b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_4xb4-2x_coco.py index f5b67f6a12e294455829dddb89d05e281f2d7dc0..960b2917502b54d685b551f45c4677dc9415d3cc 100644 --- a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_4xb4-2x_coco.py +++ b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_4xb4-2x_coco.py @@ -1,16 +1,9 @@ -_base_ = './fcos_hrnetv2p-w18-gn-head_4xb4-1x_coco.py' +_base_ = "./fcos_hrnetv2p-w18-gn-head_4xb4-1x_coco.py" # learning policy max_epochs = 24 train_cfg = dict(max_epochs=max_epochs) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] 
diff --git a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_ms-640-800-4xb4-2x_coco.py b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_ms-640-800-4xb4-2x_coco.py
index c5332d65d129255117f459f45369d5e13ed6653c..568165305e7d65b4b7caf26ad1b4b2071b12e402 100644
--- a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_ms-640-800-4xb4-2x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w18-gn-head_ms-640-800-4xb4-2x_coco.py
@@ -1,10 +1,8 @@
-_base_ = './fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py'
+_base_ = "./fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py"
 model = dict(
     backbone=dict(
-        extra=dict(
-            stage2=dict(num_channels=(18, 36)),
-            stage3=dict(num_channels=(18, 36, 72)),
-            stage4=dict(num_channels=(18, 36, 72, 144))),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')),
-    neck=dict(type='HRFPN', in_channels=[18, 36, 72, 144], out_channels=256))
+        extra=dict(stage2=dict(num_channels=(18, 36)), stage3=dict(num_channels=(18, 36, 72)), stage4=dict(num_channels=(18, 36, 72, 144))),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"),
+    ),
+    neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256),
+)
diff --git a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_4xb4-1x_coco.py b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_4xb4-1x_coco.py
index 159d96d712ae047efd7988bc53ae65006291478f..836fd00ec70622ae59232f43fc856eafaa8e0a56 100644
--- a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_4xb4-1x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_4xb4-1x_coco.py
@@ -1,43 +1,16 @@
-_base_ = '../fcos/fcos_r50-caffe_fpn_gn-head_4xb4-1x_coco.py'
+_base_ = "../fcos/fcos_r50-caffe_fpn_gn-head_4xb4-1x_coco.py"
 model = dict(
-    data_preprocessor=dict(
-        mean=[103.53, 116.28, 123.675],
-        std=[57.375, 57.12, 58.395],
-        bgr_to_rgb=False),
+    data_preprocessor=dict(mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], bgr_to_rgb=False),
     backbone=dict(
         _delete_=True,
-        type='HRNet',
+        type="HRNet",
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(32, 64)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(32, 64, 128)),
-            stage4=dict(
-                num_modules=3,
-                num_branches=4,
-                block='BASIC',
-                num_blocks=(4, 4, 4, 4),
-                num_channels=(32, 64, 128, 256))),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w32')),
-    neck=dict(
-        _delete_=True,
-        type='HRFPN',
-        in_channels=[32, 64, 128, 256],
-        out_channels=256,
-        stride=2,
-        num_outs=5))
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)),
+            stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)),
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w32"),
+    ),
+    neck=dict(_delete_=True, type="HRFPN", in_channels=[32, 64, 128, 256], out_channels=256, stride=2, num_outs=5),
+)
diff --git a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_4xb4-2x_coco.py b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_4xb4-2x_coco.py
index 73fd80e979d88840a57c68ca2fad6cb2e82a26bd..87c95708aaa26a4e404257744ef02e150256da84 100644
--- a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_4xb4-2x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_4xb4-2x_coco.py
@@ -1,16 +1,9 @@
-_base_ = './fcos_hrnetv2p-w32-gn-head_4xb4-1x_coco.py'
+_base_ = "./fcos_hrnetv2p-w32-gn-head_4xb4-1x_coco.py"
 
 # learning policy
 max_epochs = 24
 train_cfg = dict(max_epochs=max_epochs)
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py
index 4c977bf31ed2fb0ef062108cea97c1cd235b89d3..798018e6bab0660aa05530ee41cc0cf84e9590b1 100644
--- a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py
@@ -1,20 +1,13 @@
-_base_ = './fcos_hrnetv2p-w32-gn-head_4xb4-1x_coco.py'
+_base_ = "./fcos_hrnetv2p-w32-gn-head_4xb4-1x_coco.py"
 
-model = dict(
-    data_preprocessor=dict(
-        mean=[103.53, 116.28, 123.675],
-        std=[57.375, 57.12, 58.395],
-        bgr_to_rgb=False))
+model = dict(data_preprocessor=dict(mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], bgr_to_rgb=False))
 
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='RandomChoiceResize',
-        scales=[(1333, 640), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
@@ -23,13 +16,6 @@ train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
 max_epochs = 24
 train_cfg = dict(max_epochs=max_epochs)
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w40-gn-head_ms-640-800-4xb4-2x_coco.py b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w40-gn-head_ms-640-800-4xb4-2x_coco.py
index bb0ff6d6ce80e702f6e88b556a770345a23afca4..5e544b426b02254f3e71553af04f63cd97bd3c41 100644
--- a/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w40-gn-head_ms-640-800-4xb4-2x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/fcos_hrnetv2p-w40-gn-head_ms-640-800-4xb4-2x_coco.py
@@ -1,11 +1,11 @@
-_base_ = './fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py'
+_base_ = "./fcos_hrnetv2p-w32-gn-head_ms-640-800-4xb4-2x_coco.py"
 model = dict(
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
type="HRNet", extra=dict( - stage2=dict(num_channels=(40, 80)), - stage3=dict(num_channels=(40, 80, 160)), - stage4=dict(num_channels=(40, 80, 160, 320))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w40')), - neck=dict(type='HRFPN', in_channels=[40, 80, 160, 320], out_channels=256)) + stage2=dict(num_channels=(40, 80)), stage3=dict(num_channels=(40, 80, 160)), stage4=dict(num_channels=(40, 80, 160, 320)) + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w40"), + ), + neck=dict(type="HRFPN", in_channels=[40, 80, 160, 320], out_channels=256), +) diff --git a/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w18_20e_coco.py b/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w18_20e_coco.py index 55255d52a3541c99660dcddfba96da27c99f841d..d4d5660e841dbcdb3d913c05c8ed40e9d63f1b34 100644 --- a/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w18_20e_coco.py +++ b/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w18_20e_coco.py @@ -1,10 +1,8 @@ -_base_ = './htc_hrnetv2p-w32_20e_coco.py' +_base_ = "./htc_hrnetv2p-w32_20e_coco.py" model = dict( backbone=dict( - extra=dict( - stage2=dict(num_channels=(18, 36)), - stage3=dict(num_channels=(18, 36, 72)), - stage4=dict(num_channels=(18, 36, 72, 144))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')), - neck=dict(type='HRFPN', in_channels=[18, 36, 72, 144], out_channels=256)) + extra=dict(stage2=dict(num_channels=(18, 36)), stage3=dict(num_channels=(18, 36, 72)), stage4=dict(num_channels=(18, 36, 72, 144))), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"), + ), + neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256), +) diff --git a/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w32_20e_coco.py b/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w32_20e_coco.py index 545cb83eaca50f9d5de1fa6b3f3e569faab7d5f2..6a696f90de0c2fc9edf1b9a1b745b3f407343c55 100644 --- a/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w32_20e_coco.py +++ b/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w32_20e_coco.py @@ -1,37 +1,15 @@ -_base_ = '../htc/htc_r50_fpn_20e_coco.py' +_base_ = "../htc/htc_r50_fpn_20e_coco.py" model = dict( backbone=dict( _delete_=True, - type='HRNet', + type="HRNet", extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w32')), - neck=dict( - _delete_=True, - type='HRFPN', - in_channels=[32, 64, 128, 256], - out_channels=256)) + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w32"), + ), + neck=dict(_delete_=True, type="HRFPN", in_channels=[32, 64, 128, 256], out_channels=256), +) diff --git 
index b09256a08ee16893bcc0dd6518714daece294e0d..05aece0afed300b8f90a453581a53ba13049beb7 100644
--- a/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w40_20e_coco.py
+++ b/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w40_20e_coco.py
@@ -1,11 +1,11 @@
-_base_ = './htc_hrnetv2p-w32_20e_coco.py'
+_base_ = "./htc_hrnetv2p-w32_20e_coco.py"
 model = dict(
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         extra=dict(
-            stage2=dict(num_channels=(40, 80)),
-            stage3=dict(num_channels=(40, 80, 160)),
-            stage4=dict(num_channels=(40, 80, 160, 320))),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w40')),
-    neck=dict(type='HRFPN', in_channels=[40, 80, 160, 320], out_channels=256))
+            stage2=dict(num_channels=(40, 80)), stage3=dict(num_channels=(40, 80, 160)), stage4=dict(num_channels=(40, 80, 160, 320))
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w40"),
+    ),
+    neck=dict(type="HRFPN", in_channels=[40, 80, 160, 320], out_channels=256),
+)
diff --git a/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w40_28e_coco.py b/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w40_28e_coco.py
index 1c13b58a1a0690d19239fef40915489ddaff408e..cb0bafcbd5b046685198930b1edbc08c2991d13c 100644
--- a/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w40_28e_coco.py
+++ b/mmpose/configs/mmdet/hrnet/htc_hrnetv2p-w40_28e_coco.py
@@ -1,16 +1,9 @@
-_base_ = './htc_hrnetv2p-w40_20e_coco.py'
+_base_ = "./htc_hrnetv2p-w40_20e_coco.py"
 
 # learning policy
 max_epochs = 28
 train_cfg = dict(max_epochs=max_epochs)
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[24, 27],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[24, 27], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/hrnet/htc_x101-64x4d_fpn_16xb1-28e_coco.py b/mmpose/configs/mmdet/hrnet/htc_x101-64x4d_fpn_16xb1-28e_coco.py
index 1f1304e5f963351667c28cb264ca5434bc81f744..3e6caec59a0c47ac655c168ca6b4124b29e09fec 100644
--- a/mmpose/configs/mmdet/hrnet/htc_x101-64x4d_fpn_16xb1-28e_coco.py
+++ b/mmpose/configs/mmdet/hrnet/htc_x101-64x4d_fpn_16xb1-28e_coco.py
@@ -1,16 +1,9 @@
-_base_ = '../htc/htc_x101-64x4d_fpn_16xb1-20e_coco.py'
+_base_ = "../htc/htc_x101-64x4d_fpn_16xb1-20e_coco.py"
 
 # learning policy
 max_epochs = 28
 train_cfg = dict(max_epochs=max_epochs)
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[24, 27],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[24, 27], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w18-1x_coco.py b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w18-1x_coco.py
index 5d5a463a66bed51d73a42eafffea654a18c111ce..8f903250d027c6361eac5c45985b04d06353ce46 100644
--- a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w18-1x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w18-1x_coco.py
@@ -1,10 +1,8 @@
-_base_ = './mask-rcnn_hrnetv2p-w32-1x_coco.py'
+_base_ = "./mask-rcnn_hrnetv2p-w32-1x_coco.py"
 model = dict(
     backbone=dict(
-        extra=dict(
-            stage2=dict(num_channels=(18, 36)),
-            stage3=dict(num_channels=(18, 36, 72)),
-            stage4=dict(num_channels=(18, 36, 72, 144))),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w18')),
-    neck=dict(type='HRFPN', in_channels=[18, 36, 72, 144], out_channels=256))
+        extra=dict(stage2=dict(num_channels=(18, 36)), stage3=dict(num_channels=(18, 36, 72)), stage4=dict(num_channels=(18, 36, 72, 144))),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w18"),
+    ),
+    neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256),
+)
diff --git a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w18-2x_coco.py b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w18-2x_coco.py
index 8abc55924a3eb8e06f9e1e5eeed503890542f6f6..a8636e9720adbfe4d98b3cd445896260a04c16c8 100644
--- a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w18-2x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w18-2x_coco.py
@@ -1,16 +1,9 @@
-_base_ = './mask-rcnn_hrnetv2p-w18-1x_coco.py'
+_base_ = "./mask-rcnn_hrnetv2p-w18-1x_coco.py"
 
 # learning policy
 max_epochs = 24
 train_cfg = dict(max_epochs=max_epochs)
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w32-1x_coco.py b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w32-1x_coco.py
index 208b037807dfa9cab1d33ac58ac785ff72e400c1..fe88dd2d2e5a2b9e3482ed845ee4ae388e582ef1 100644
--- a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w32-1x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w32-1x_coco.py
@@ -1,37 +1,15 @@
-_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py'
+_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py"
 model = dict(
     backbone=dict(
         _delete_=True,
-        type='HRNet',
+        type="HRNet",
         extra=dict(
-            stage1=dict(
-                num_modules=1,
-                num_branches=1,
-                block='BOTTLENECK',
-                num_blocks=(4, ),
-                num_channels=(64, )),
-            stage2=dict(
-                num_modules=1,
-                num_branches=2,
-                block='BASIC',
-                num_blocks=(4, 4),
-                num_channels=(32, 64)),
-            stage3=dict(
-                num_modules=4,
-                num_branches=3,
-                block='BASIC',
-                num_blocks=(4, 4, 4),
-                num_channels=(32, 64, 128)),
-            stage4=dict(
-                num_modules=3,
-                num_branches=4,
-                block='BASIC',
-                num_blocks=(4, 4, 4, 4),
-                num_channels=(32, 64, 128, 256))),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w32')),
-    neck=dict(
-        _delete_=True,
-        type='HRFPN',
-        in_channels=[32, 64, 128, 256],
-        out_channels=256))
+            stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)),
+            stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)),
+            stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)),
+            stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)),
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w32"),
+    ),
+    neck=dict(_delete_=True, type="HRFPN", in_channels=[32, 64, 128, 256], out_channels=256),
+)
diff --git a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w32-2x_coco.py b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w32-2x_coco.py
index d3741c820a6a0ca622ce6bbf80cb3e922107efb6..8996cd48e7e948071a789823f788ec437c81ae57 100644
--- a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w32-2x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w32-2x_coco.py
@@ -1,16 +1,9 @@
-_base_ = './mask-rcnn_hrnetv2p-w32-1x_coco.py'
+_base_ = "./mask-rcnn_hrnetv2p-w32-1x_coco.py"
 
 # learning policy
 max_epochs = 24
 train_cfg = dict(max_epochs=max_epochs)
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w40-2x_coco.py b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w40-2x_coco.py
index 360420c56d42814ed6f4d84775f1a19dfa96574a..256931a921e534e299f59aae2c7c583b3ba441df 100644
--- a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w40-2x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w40-2x_coco.py
@@ -1,16 +1,9 @@
-_base_ = './mask-rcnn_hrnetv2p-w40_1x_coco.py'
+_base_ = "./mask-rcnn_hrnetv2p-w40_1x_coco.py"
 
 # learning policy
 max_epochs = 24
 train_cfg = dict(max_epochs=max_epochs)
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w40_1x_coco.py b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w40_1x_coco.py
index 36e2305a520fd8305f9fd1358f5cbcb01027e40d..c5a3d24676ed0d48cbccc943af1266b030fb0de1 100644
--- a/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w40_1x_coco.py
+++ b/mmpose/configs/mmdet/hrnet/mask-rcnn_hrnetv2p-w40_1x_coco.py
@@ -1,11 +1,11 @@
-_base_ = './mask-rcnn_hrnetv2p-w18-1x_coco.py'
+_base_ = "./mask-rcnn_hrnetv2p-w18-1x_coco.py"
 model = dict(
     backbone=dict(
-        type='HRNet',
+        type="HRNet",
         extra=dict(
-            stage2=dict(num_channels=(40, 80)),
-            stage3=dict(num_channels=(40, 80, 160)),
-            stage4=dict(num_channels=(40, 80, 160, 320))),
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://msra/hrnetv2_w40')),
-    neck=dict(type='HRFPN', in_channels=[40, 80, 160, 320], out_channels=256))
+            stage2=dict(num_channels=(40, 80)), stage3=dict(num_channels=(40, 80, 160)), stage4=dict(num_channels=(40, 80, 160, 320))
+        ),
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://msra/hrnetv2_w40"),
+    ),
+    neck=dict(type="HRFPN", in_channels=[40, 80, 160, 320], out_channels=256),
+)
diff --git a/mmpose/configs/mmdet/htc/htc-without-semantic_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/htc/htc-without-semantic_r50_fpn_1x_coco.py
index 791f4eb25b53e122cd4876a71e84a4a9d2f67e26..f2d414693ea91c2acd81fb48124a56dabb9423ea 100644
--- a/mmpose/configs/mmdet/htc/htc-without-semantic_r50_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/htc/htc-without-semantic_r50_fpn_1x_coco.py
@@ -1,223 +1,148 @@
-_base_ = [
-    '../_base_/datasets/coco_instance.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
-]
+_base_ = ["../_base_/datasets/coco_instance.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"]
 # model settings
 model = dict(
-    type='HybridTaskCascade',
+    type="HybridTaskCascade",
     data_preprocessor=dict(
-        type='DetDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True,
-        pad_size_divisor=32),
+        type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32
+    ),
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=50,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch',
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
-    neck=dict(
-        type='FPN',
-        in_channels=[256, 512, 1024, 2048],
-        out_channels=256,
-        num_outs=5),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"),
+    ),
+    neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5),
     rpn_head=dict(
-        type='RPNHead',
+        type="RPNHead",
         in_channels=256,
         feat_channels=256,
-        anchor_generator=dict(
-            type='AnchorGenerator',
-            scales=[8],
-            ratios=[0.5, 1.0, 2.0],
-            strides=[4, 8, 16, 32, 64]),
-        bbox_coder=dict(
-            type='DeltaXYWHBBoxCoder',
-            target_means=[.0, .0, .0, .0],
-            target_stds=[1.0, 1.0, 1.0, 1.0]),
-        loss_cls=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
-        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
+        anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]),
+        bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]),
+        loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0),
+        loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0),
+    ),
     roi_head=dict(
-        type='HybridTaskCascadeRoIHead',
+        type="HybridTaskCascadeRoIHead",
         interleaved=True,
         mask_info_flow=True,
         num_stages=3,
         stage_loss_weights=[1, 0.5, 0.25],
         bbox_roi_extractor=dict(
-            type='SingleRoIExtractor',
-            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
+            type="SingleRoIExtractor",
+            roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0),
             out_channels=256,
-            featmap_strides=[4, 8, 16, 32]),
+            featmap_strides=[4, 8, 16, 32],
+        ),
         bbox_head=[
             dict(
-                type='Shared2FCBBoxHead',
+                type="Shared2FCBBoxHead",
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=80,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.1, 0.1, 0.2, 0.2]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]),
                 reg_class_agnostic=True,
-                loss_cls=dict(
-                    type='CrossEntropyLoss',
-                    use_sigmoid=False,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
-                               loss_weight=1.0)),
+                loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
             dict(
-                type='Shared2FCBBoxHead',
+                type="Shared2FCBBoxHead",
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=80,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.05, 0.05, 0.1, 0.1]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]),
                 reg_class_agnostic=True,
-                loss_cls=dict(
-                    type='CrossEntropyLoss',
-                    use_sigmoid=False,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
-                               loss_weight=1.0)),
+                loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
             dict(
-                type='Shared2FCBBoxHead',
+                type="Shared2FCBBoxHead",
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=80,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.033, 0.033, 0.067, 0.067]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]),
                 reg_class_agnostic=True,
-                loss_cls=dict(
-                    type='CrossEntropyLoss',
-                    use_sigmoid=False,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
+                loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
         ],
         mask_roi_extractor=dict(
-            type='SingleRoIExtractor',
-            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
+            type="SingleRoIExtractor",
+            roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0),
             out_channels=256,
-            featmap_strides=[4, 8, 16, 32]),
+            featmap_strides=[4, 8, 16, 32],
+        ),
         mask_head=[
             dict(
-                type='HTCMaskHead',
+                type="HTCMaskHead",
                 with_conv_res=False,
                 num_convs=4,
                 in_channels=256,
                 conv_out_channels=256,
                 num_classes=80,
-                loss_mask=dict(
-                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)),
+                loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0),
+            ),
             dict(
-                type='HTCMaskHead',
+                type="HTCMaskHead",
                 num_convs=4,
                 in_channels=256,
                 conv_out_channels=256,
                 num_classes=80,
-                loss_mask=dict(
-                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)),
+                loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0),
+            ),
             dict(
-                type='HTCMaskHead',
+                type="HTCMaskHead",
                 num_convs=4,
                 in_channels=256,
                 conv_out_channels=256,
                 num_classes=80,
-                loss_mask=dict(
-                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))
-        ]),
+                loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0),
+            ),
+        ],
+    ),
     # model training and testing settings
     train_cfg=dict(
         rpn=dict(
-            assigner=dict(
-                type='MaxIoUAssigner',
-                pos_iou_thr=0.7,
-                neg_iou_thr=0.3,
-                min_pos_iou=0.3,
-                ignore_iof_thr=-1),
-            sampler=dict(
-                type='RandomSampler',
-                num=256,
-                pos_fraction=0.5,
-                neg_pos_ub=-1,
-                add_gt_as_proposals=False),
+            assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, ignore_iof_thr=-1),
+            sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False),
             allowed_border=0,
             pos_weight=-1,
-            debug=False),
-        rpn_proposal=dict(
-            nms_pre=2000,
-            max_per_img=2000,
-            nms=dict(type='nms', iou_threshold=0.7),
-            min_bbox_size=0),
+            debug=False,
+        ),
+        rpn_proposal=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0),
         rcnn=[
             dict(
-                assigner=dict(
-                    type='MaxIoUAssigner',
-                    pos_iou_thr=0.5,
-                    neg_iou_thr=0.5,
-                    min_pos_iou=0.5,
-                    ignore_iof_thr=-1),
-                sampler=dict(
-                    type='RandomSampler',
-                    num=512,
-                    pos_fraction=0.25,
-                    neg_pos_ub=-1,
-                    add_gt_as_proposals=True),
+                assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, ignore_iof_thr=-1),
+                sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True),
                 mask_size=28,
                 pos_weight=-1,
-                debug=False),
+                debug=False,
+            ),
             dict(
-                assigner=dict(
-                    type='MaxIoUAssigner',
-                    pos_iou_thr=0.6,
-                    neg_iou_thr=0.6,
-                    min_pos_iou=0.6,
-                    ignore_iof_thr=-1),
-                sampler=dict(
-                    type='RandomSampler',
-                    num=512,
-                    pos_fraction=0.25,
-                    neg_pos_ub=-1,
-                    add_gt_as_proposals=True),
+                assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6, ignore_iof_thr=-1),
+                sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True),
                 mask_size=28,
                 pos_weight=-1,
-                debug=False),
+                debug=False,
+            ),
             dict(
-                assigner=dict(
-                    type='MaxIoUAssigner',
-                    pos_iou_thr=0.7,
-                    neg_iou_thr=0.7,
-                    min_pos_iou=0.7,
-                    ignore_iof_thr=-1),
-                sampler=dict(
-                    type='RandomSampler',
-                    num=512,
-                    pos_fraction=0.25,
-                    neg_pos_ub=-1,
-                    add_gt_as_proposals=True),
+                assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.7, ignore_iof_thr=-1),
+                sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True),
                 mask_size=28,
                 pos_weight=-1,
-                debug=False)
-        ]),
+                debug=False,
+            ),
+        ],
+    ),
     test_cfg=dict(
-        rpn=dict(
-            nms_pre=1000,
-            max_per_img=1000,
-            nms=dict(type='nms', iou_threshold=0.7),
-            min_bbox_size=0),
-        rcnn=dict(
-            score_thr=0.001,
-            nms=dict(type='nms', iou_threshold=0.5),
-            max_per_img=100,
-            mask_thr_binary=0.5)))
+        rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0),
+        rcnn=dict(score_thr=0.001, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5),
+    ),
+)
diff --git a/mmpose/configs/mmdet/htc/htc_r101_fpn_20e_coco.py b/mmpose/configs/mmdet/htc/htc_r101_fpn_20e_coco.py
index 28091aad31029109c29941404f2c3cc47f9c1092..771772d89174da6c7e35b915fa5d0daebe4f07ae 100644
--- a/mmpose/configs/mmdet/htc/htc_r101_fpn_20e_coco.py
+++ b/mmpose/configs/mmdet/htc/htc_r101_fpn_20e_coco.py
@@ -1,6 +1,2 @@
-_base_ = './htc_r50_fpn_20e_coco.py'
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')))
+_base_ = "./htc_r50_fpn_20e_coco.py"
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")))
diff --git a/mmpose/configs/mmdet/htc/htc_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/htc/htc_r50_fpn_1x_coco.py
index 3573f1f698095585f4a1de692d0e45a21429822e..a2db6d8206a61d8636a2c42fb86459f548859097 100644
--- a/mmpose/configs/mmdet/htc/htc_r50_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/htc/htc_r50_fpn_1x_coco.py
@@ -1,14 +1,15 @@
-_base_ = './htc-without-semantic_r50_fpn_1x_coco.py'
+_base_ = "./htc-without-semantic_r50_fpn_1x_coco.py"
 model = dict(
     data_preprocessor=dict(pad_seg=True),
     roi_head=dict(
         semantic_roi_extractor=dict(
-            type='SingleRoIExtractor',
-            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
+            type="SingleRoIExtractor",
+            roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0),
             out_channels=256,
-            featmap_strides=[8]),
+            featmap_strides=[8],
+        ),
         semantic_head=dict(
-            type='FusedSemanticHead',
+            type="FusedSemanticHead",
             num_ins=5,
             fusion_level=1,
             seg_scale_factor=1 / 8,
@@ -16,18 +17,16 @@ model = dict(
             in_channels=256,
             conv_out_channels=256,
             num_classes=183,
-            loss_seg=dict(
-                type='CrossEntropyLoss', ignore_index=255, loss_weight=0.2))))
+            loss_seg=dict(type="CrossEntropyLoss", ignore_index=255, loss_weight=0.2),
+        ),
+    ),
+)
 
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(
-        type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True),
-    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True, with_mask=True, with_seg=True),
+    dict(type="Resize", scale=(1333, 800), keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
-train_dataloader = dict(
-    dataset=dict(
-        data_prefix=dict(img='train2017/', seg='stuffthingmaps/train2017/'),
-        pipeline=train_pipeline))
+train_dataloader = dict(dataset=dict(data_prefix=dict(img="train2017/", seg="stuffthingmaps/train2017/"), pipeline=train_pipeline))
diff --git a/mmpose/configs/mmdet/htc/htc_r50_fpn_20e_coco.py b/mmpose/configs/mmdet/htc/htc_r50_fpn_20e_coco.py
index 9f510fa6eec210381707f4d1b01264e72e0d0f76..99122c074db23ce9a2aba4b2f964c0367e160e4c 100644
--- a/mmpose/configs/mmdet/htc/htc_r50_fpn_20e_coco.py
+++ b/mmpose/configs/mmdet/htc/htc_r50_fpn_20e_coco.py
@@ -1,16 +1,9 @@
-_base_ = './htc_r50_fpn_1x_coco.py'
+_base_ = "./htc_r50_fpn_1x_coco.py"
 
 # learning policy
 max_epochs = 20
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 19],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 19], gamma=0.1),
 ]
 train_cfg = dict(max_epochs=max_epochs)
diff --git a/mmpose/configs/mmdet/htc/htc_x101-32x4d_fpn_16xb1-20e_coco.py b/mmpose/configs/mmdet/htc/htc_x101-32x4d_fpn_16xb1-20e_coco.py
index 396d3a0e2b72acc1d9601706ec4629720a46a738..b550b04668d4dd5df21b92344b99800972d52106 100644
--- a/mmpose/configs/mmdet/htc/htc_x101-32x4d_fpn_16xb1-20e_coco.py
+++ b/mmpose/configs/mmdet/htc/htc_x101-32x4d_fpn_16xb1-20e_coco.py
@@ -1,32 +1,26 @@
-_base_ = './htc_r50_fpn_1x_coco.py'
+_base_ = "./htc_r50_fpn_1x_coco.py"
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=32,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d')))
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"),
+    )
+)
 
 train_dataloader = dict(batch_size=1, num_workers=1)
 
 # learning policy
 max_epochs = 20
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 19],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 19], gamma=0.1),
 ]
 train_cfg = dict(max_epochs=max_epochs)
diff --git a/mmpose/configs/mmdet/htc/htc_x101-64x4d-dconv-c3-c5_fpn_ms-400-1400-16xb1-20e_coco.py b/mmpose/configs/mmdet/htc/htc_x101-64x4d-dconv-c3-c5_fpn_ms-400-1400-16xb1-20e_coco.py
index 26d68e7e2cda2a711e4d16899ae85b100afc60a0..5a03af8f72c53cb50076e9d8ca9949a6de1460a1 100644
--- a/mmpose/configs/mmdet/htc/htc_x101-64x4d-dconv-c3-c5_fpn_ms-400-1400-16xb1-20e_coco.py
+++ b/mmpose/configs/mmdet/htc/htc_x101-64x4d-dconv-c3-c5_fpn_ms-400-1400-16xb1-20e_coco.py
@@ -1,20 +1,13 @@
-_base_ = './htc_x101-64x4d_fpn_16xb1-20e_coco.py'
+_base_ = "./htc_x101-64x4d_fpn_16xb1-20e_coco.py"
 
-model = dict(
-    backbone=dict(
-        dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False),
-        stage_with_dcn=(False, True, True, True)))
+model = dict(backbone=dict(dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True)))
 
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile'),
-    dict(
-        type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True),
-    dict(
-        type='RandomResize',
-        scale=[(1600, 400), (1600, 1400)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile"),
+    dict(type="LoadAnnotations", with_bbox=True, with_mask=True, with_seg=True),
+    dict(type="RandomResize", scale=[(1600, 400), (1600, 1400)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
diff --git a/mmpose/configs/mmdet/htc/htc_x101-64x4d_fpn_16xb1-20e_coco.py b/mmpose/configs/mmdet/htc/htc_x101-64x4d_fpn_16xb1-20e_coco.py
index a600ddb0ebd2287cdaa0d00a6008db636d79be76..b848eaa37ef0e589e11473d21c8d430151103066 100644
--- a/mmpose/configs/mmdet/htc/htc_x101-64x4d_fpn_16xb1-20e_coco.py
+++ b/mmpose/configs/mmdet/htc/htc_x101-64x4d_fpn_16xb1-20e_coco.py
@@ -1,7 +1,2 @@
-_base_ = './htc_x101-32x4d_fpn_16xb1-20e_coco.py'
-model = dict(
-    backbone=dict(
-        type='ResNeXt',
-        groups=64,
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))
+_base_ = "./htc_x101-32x4d_fpn_16xb1-20e_coco.py"
+model = dict(backbone=dict(type="ResNeXt", groups=64, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d")))
diff --git a/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_r101_fpn_instaboost-4x_coco.py b/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_r101_fpn_instaboost-4x_coco.py
index 53e33b890cad86fcc64e6ea6eefe39138241c8e7..8b7ca2d8e53078a5cd9e9f7e1b35e4391dab1bff 100644
--- a/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_r101_fpn_instaboost-4x_coco.py
+++ b/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_r101_fpn_instaboost-4x_coco.py
@@ -1,7 +1,3 @@
-_base_ = './cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py'
+_base_ = "./cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py"
 
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')))
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")))
diff --git a/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py b/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py
index f7736cf5756676944c543b7e8412997ac81c2745..64195635278d943769b42361a78d5179574c9d79 100644
--- a/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py
+++ b/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py
@@ -1,10 +1,10 @@
-_base_ = '../cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py'
+_base_ = "../cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py"
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
     dict(
-        type='InstaBoost',
-        action_candidate=('normal', 'horizontal', 'skip'),
+        type="InstaBoost",
+        action_candidate=("normal", "horizontal", "skip"),
         action_prob=(1, 0, 0),
         scale=(0.8, 1.2),
         dx=15,
@@ -12,11 +12,12 @@ train_pipeline = [
         theta=(-1, 1),
         color_prob=0.5,
         hflag=False,
-        aug_ratio=0.5),
-    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
-    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+        aug_ratio=0.5,
+    ),
+    dict(type="LoadAnnotations", with_bbox=True, with_mask=True),
+    dict(type="Resize", scale=(1333, 800), keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
@@ -24,15 +25,8 @@ train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
 max_epochs = 48
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[32, 44],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[32, 44], gamma=0.1),
 ]
 train_cfg = dict(max_epochs=max_epochs)
diff --git a/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_x101-64x4d_fpn_instaboost-4x_coco.py b/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_x101-64x4d_fpn_instaboost-4x_coco.py
index c7938d9e00e3a9c030b788ca83b1a6ddee208aed..6b5edd118ef9ab520fe6c6d4ac7768a07822b178 100644
--- a/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_x101-64x4d_fpn_instaboost-4x_coco.py
+++ b/mmpose/configs/mmdet/instaboost/cascade-mask-rcnn_x101-64x4d_fpn_instaboost-4x_coco.py
@@ -1,14 +1,15 @@
-_base_ = './cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py'
+_base_ = "./cascade-mask-rcnn_r50_fpn_instaboost-4x_coco.py"
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=64,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))
+        norm_cfg=dict(type="BN", requires_grad=True),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"),
+    )
+)
diff --git a/mmpose/configs/mmdet/instaboost/mask-rcnn_r101_fpn_instaboost-4x_coco.py b/mmpose/configs/mmdet/instaboost/mask-rcnn_r101_fpn_instaboost-4x_coco.py
index 55bfa9fefa4db9d6d69fb3c4a285d04592168398..189aed89e5e3d3b9e99f4e6e7c1b0f323c4653bd 100644
--- a/mmpose/configs/mmdet/instaboost/mask-rcnn_r101_fpn_instaboost-4x_coco.py
+++ b/mmpose/configs/mmdet/instaboost/mask-rcnn_r101_fpn_instaboost-4x_coco.py
@@ -1,6 +1,2 @@
-_base_ = './mask-rcnn_r50_fpn_instaboost-4x_coco.py'
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')))
+_base_ = "./mask-rcnn_r50_fpn_instaboost-4x_coco.py"
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")))
diff --git a/mmpose/configs/mmdet/instaboost/mask-rcnn_r50_fpn_instaboost-4x_coco.py b/mmpose/configs/mmdet/instaboost/mask-rcnn_r50_fpn_instaboost-4x_coco.py
index 0a8c9be81f03f98f97975aca47922575555e3844..a2d37bba79c314e122262341b6bec40c537c1382 100644
--- a/mmpose/configs/mmdet/instaboost/mask-rcnn_r50_fpn_instaboost-4x_coco.py
+++ b/mmpose/configs/mmdet/instaboost/mask-rcnn_r50_fpn_instaboost-4x_coco.py
@@ -1,10 +1,10 @@
-_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py'
+_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py"
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
     dict(
-        type='InstaBoost',
-        action_candidate=('normal', 'horizontal', 'skip'),
+        type="InstaBoost",
+        action_candidate=("normal", "horizontal", "skip"),
         action_prob=(1, 0, 0),
         scale=(0.8, 1.2),
         dx=15,
@@ -12,11 +12,12 @@ train_pipeline = [
         theta=(-1, 1),
         color_prob=0.5,
         hflag=False,
-        aug_ratio=0.5),
-    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
-    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+        aug_ratio=0.5,
+    ),
+    dict(type="LoadAnnotations", with_bbox=True, with_mask=True),
+    dict(type="Resize", scale=(1333, 800), keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
@@ -24,15 +25,8 @@ train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
 max_epochs = 48
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[32, 44],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[32, 44], gamma=0.1),
 ]
 train_cfg = dict(max_epochs=max_epochs)
diff --git a/mmpose/configs/mmdet/instaboost/mask-rcnn_x101-64x4d_fpn_instaboost-4x_coco.py b/mmpose/configs/mmdet/instaboost/mask-rcnn_x101-64x4d_fpn_instaboost-4x_coco.py
index 9ba2ada6011dd77ea2dcac2133bef8d92e522381..68754d51474735e222e8abbef862284e2d0b1cd7 100644
--- a/mmpose/configs/mmdet/instaboost/mask-rcnn_x101-64x4d_fpn_instaboost-4x_coco.py
+++ b/mmpose/configs/mmdet/instaboost/mask-rcnn_x101-64x4d_fpn_instaboost-4x_coco.py
@@ -1,14 +1,15 @@
-_base_ = './mask-rcnn_r50_fpn_instaboost-4x_coco.py'
+_base_ = "./mask-rcnn_r50_fpn_instaboost-4x_coco.py"
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=64,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))
+        norm_cfg=dict(type="BN", requires_grad=True),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"),
+    )
+)
diff --git a/mmpose/configs/mmdet/lad/lad_r101-paa-r50_fpn_2xb8_coco_1x.py b/mmpose/configs/mmdet/lad/lad_r101-paa-r50_fpn_2xb8_coco_1x.py
index d61d08638a073f3dad71d7499221e3ef62ff90f3..18cced0df53718d5468448106303237c080d1b5f 100644
--- a/mmpose/configs/mmdet/lad/lad_r101-paa-r50_fpn_2xb8_coco_1x.py
+++ b/mmpose/configs/mmdet/lad/lad_r101-paa-r50_fpn_2xb8_coco_1x.py
@@ -1,38 +1,26 @@
-_base_ = [
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
-]
-teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/paa/paa_r50_fpn_1x_coco/paa_r50_fpn_1x_coco_20200821-936edec3.pth'  # noqa
+_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"]
+teacher_ckpt = "https://download.openmmlab.com/mmdetection/v2.0/paa/paa_r50_fpn_1x_coco/paa_r50_fpn_1x_coco_20200821-936edec3.pth"  # noqa
 model = dict(
-    type='LAD',
+    type="LAD",
     data_preprocessor=dict(
-        type='DetDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True,
-        pad_size_divisor=32),
+        type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32
+    ),
     # student
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
        depth=101,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch',
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')),
-    neck=dict(
-        type='FPN',
-        in_channels=[256, 512, 1024, 2048],
-        out_channels=256,
-        start_level=1,
-        add_extra_convs='on_output',
-        num_outs=5),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"),
+    ),
+    neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5),
     bbox_head=dict(
-        type='LADHead',
+        type="LADHead",
         reg_decoded_bbox=True,
         score_voting=True,
         topk=9,
@@ -40,45 +28,29 @@ model = dict(
         in_channels=256,
         stacked_convs=4,
         feat_channels=256,
-        anchor_generator=dict(
-            type='AnchorGenerator',
-            ratios=[1.0],
-            octave_base_scale=8,
-            scales_per_octave=1,
-            strides=[8, 16, 32, 64, 128]),
-        bbox_coder=dict(
-            type='DeltaXYWHBBoxCoder',
-            target_means=[.0, .0, .0, .0],
-            target_stds=[0.1, 0.1, 0.2, 0.2]),
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox=dict(type='GIoULoss', loss_weight=1.3),
-        loss_centerness=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.5)),
+        anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]),
+        bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox=dict(type="GIoULoss", loss_weight=1.3),
+        loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=0.5),
+    ),
     # teacher
     teacher_ckpt=teacher_ckpt,
     teacher_backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=50,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch'),
+        style="pytorch",
+    ),
     teacher_neck=dict(
-        type='FPN',
-        in_channels=[256, 512, 1024, 2048],
-        out_channels=256,
-        start_level=1,
-        add_extra_convs='on_output',
-        num_outs=5),
+        type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5
+    ),
     teacher_bbox_head=dict(
-        type='LADHead',
+        type="LADHead",
         reg_decoded_bbox=True,
         score_voting=True,
         topk=9,
@@ -86,42 +58,22 @@ model = dict(
         in_channels=256,
         stacked_convs=4,
         feat_channels=256,
-        anchor_generator=dict(
-            type='AnchorGenerator',
-            ratios=[1.0],
-            octave_base_scale=8,
-            scales_per_octave=1,
-            strides=[8, 16, 32, 64, 128]),
-        bbox_coder=dict(
-            type='DeltaXYWHBBoxCoder',
-            target_means=[.0, .0, .0, .0],
-            target_stds=[0.1, 0.1, 0.2, 0.2]),
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox=dict(type='GIoULoss', loss_weight=1.3),
-        loss_centerness=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.5)),
+        anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]),
+        bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox=dict(type="GIoULoss", loss_weight=1.3),
+        loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=0.5),
+    ),
     # training and testing settings
     train_cfg=dict(
-        assigner=dict(
-            type='MaxIoUAssigner',
-            pos_iou_thr=0.1,
-            neg_iou_thr=0.1,
-            min_pos_iou=0,
-            ignore_iof_thr=-1),
+        assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.1, neg_iou_thr=0.1, min_pos_iou=0, ignore_iof_thr=-1),
         allowed_border=-1,
         pos_weight=-1,
-        debug=False),
+        debug=False,
+    ),
     test_cfg=dict(
-        nms_pre=1000,
-        min_bbox_size=0,
-        score_thr=0.05,
-        score_voting=True,
-        nms=dict(type='nms', iou_threshold=0.6),
-        max_per_img=100))
+        nms_pre=1000, min_bbox_size=0, score_thr=0.05, score_voting=True, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100
+    ),
+)
 train_dataloader = dict(batch_size=8, num_workers=4)
-optim_wrapper = dict(type='AmpOptimWrapper', optimizer=dict(lr=0.01))
+optim_wrapper = dict(type="AmpOptimWrapper", optimizer=dict(lr=0.01))
diff --git a/mmpose/configs/mmdet/lad/lad_r50-paa-r101_fpn_2xb8_coco_1x.py b/mmpose/configs/mmdet/lad/lad_r50-paa-r101_fpn_2xb8_coco_1x.py
index f7eaf2bfba1c41b42836e94ffe2714978dffd20a..68bd4a55947472137a337a3388add9bd383c6217 100644
--- a/mmpose/configs/mmdet/lad/lad_r50-paa-r101_fpn_2xb8_coco_1x.py
+++ b/mmpose/configs/mmdet/lad/lad_r50-paa-r101_fpn_2xb8_coco_1x.py
@@ -1,37 +1,26 @@
-_base_ = [
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
-]
-teacher_ckpt = 'http://download.openmmlab.com/mmdetection/v2.0/paa/paa_r101_fpn_1x_coco/paa_r101_fpn_1x_coco_20200821-0a1825a4.pth'  # noqa
+_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"]
+teacher_ckpt = "http://download.openmmlab.com/mmdetection/v2.0/paa/paa_r101_fpn_1x_coco/paa_r101_fpn_1x_coco_20200821-0a1825a4.pth"  # noqa
 model = dict(
-    type='LAD',
+    type="LAD",
     data_preprocessor=dict(
-        type='DetDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True,
-        pad_size_divisor=32),
+        type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32
+    ),
     # student
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=50,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch',
-        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
-    neck=dict(
-        type='FPN',
-        in_channels=[256, 512, 1024, 2048],
-        out_channels=256,
-        start_level=1,
-        add_extra_convs='on_output',
-        num_outs=5),
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"),
+    ),
+    neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5),
     bbox_head=dict(
-        type='LADHead',
+        type="LADHead",
         reg_decoded_bbox=True,
         score_voting=True,
         topk=9,
@@ -39,45 +28,29 @@ model = dict(
         in_channels=256,
         stacked_convs=4,
         feat_channels=256,
-        anchor_generator=dict(
-            type='AnchorGenerator',
-            ratios=[1.0],
-            octave_base_scale=8,
-            scales_per_octave=1,
-            strides=[8, 16, 32, 64, 128]),
-        bbox_coder=dict(
-            type='DeltaXYWHBBoxCoder',
-            target_means=[.0, .0, .0, .0],
-            target_stds=[0.1, 0.1, 0.2, 0.2]),
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox=dict(type='GIoULoss', loss_weight=1.3),
-        loss_centerness=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.5)),
+        anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]),
+        bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox=dict(type="GIoULoss", loss_weight=1.3),
+        loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=0.5),
+    ),
     # teacher
     teacher_ckpt=teacher_ckpt,
     teacher_backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=101,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch'),
+        style="pytorch",
+    ),
     teacher_neck=dict(
-        type='FPN',
-        in_channels=[256, 512, 1024, 2048],
-        out_channels=256,
-        start_level=1,
-        add_extra_convs='on_output',
-        num_outs=5),
+        type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5
+    ),
     teacher_bbox_head=dict(
-        type='LADHead',
+        type="LADHead",
         reg_decoded_bbox=True,
         score_voting=True,
         topk=9,
@@ -85,42 +58,22 @@ model = dict(
         in_channels=256,
         stacked_convs=4,
         feat_channels=256,
-        anchor_generator=dict(
-            type='AnchorGenerator',
-            ratios=[1.0],
-            octave_base_scale=8,
-            scales_per_octave=1,
-            strides=[8, 16, 32, 64, 128]),
-        bbox_coder=dict(
-            type='DeltaXYWHBBoxCoder',
-            target_means=[.0, .0, .0, .0],
-            target_stds=[0.1, 0.1, 0.2, 0.2]),
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox=dict(type='GIoULoss', loss_weight=1.3),
-        loss_centerness=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.5)),
+        anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]),
+        bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox=dict(type="GIoULoss", loss_weight=1.3),
+        loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=0.5),
+    ),
     # training and testing settings
     train_cfg=dict(
-        assigner=dict(
-            type='MaxIoUAssigner',
-            pos_iou_thr=0.1,
-            neg_iou_thr=0.1,
-            min_pos_iou=0,
-            ignore_iof_thr=-1),
+        assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.1, neg_iou_thr=0.1, min_pos_iou=0, ignore_iof_thr=-1),
         allowed_border=-1,
         pos_weight=-1,
-        debug=False),
+        debug=False,
+    ),
     test_cfg=dict(
-        nms_pre=1000,
-        min_bbox_size=0,
-        score_thr=0.05,
-        score_voting=True,
-        nms=dict(type='nms', iou_threshold=0.6),
-        max_per_img=100))
+        nms_pre=1000, min_bbox_size=0, score_thr=0.05, score_voting=True, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100
+    ),
+)
 train_dataloader = dict(batch_size=8, num_workers=4)
-optim_wrapper = dict(type='AmpOptimWrapper', optimizer=dict(lr=0.01))
+optim_wrapper = dict(type="AmpOptimWrapper", optimizer=dict(lr=0.01))
diff --git a/mmpose/configs/mmdet/ld/ld_r101-gflv1-r101-dcn_fpn_2x_coco.py b/mmpose/configs/mmdet/ld/ld_r101-gflv1-r101-dcn_fpn_2x_coco.py
index a7e928bdc2325825d836bd939f163d71e972c238..e943c0d1504466cd0faaae80ca374c677f31da35 100644
--- a/mmpose/configs/mmdet/ld/ld_r101-gflv1-r101-dcn_fpn_2x_coco.py
+++ b/mmpose/configs/mmdet/ld/ld_r101-gflv1-r101-dcn_fpn_2x_coco.py
@@ -1,49 +1,35 @@
-_base_ = ['./ld_r18-gflv1-r101_fpn_1x_coco.py']
-teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_dconv_c3-c5_mstrain_2x_coco/gfl_r101_fpn_dconv_c3-c5_mstrain_2x_coco_20200630_102002-134b07df.pth'  # noqa
+_base_ = ["./ld_r18-gflv1-r101_fpn_1x_coco.py"]
+teacher_ckpt = "https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_dconv_c3-c5_mstrain_2x_coco/gfl_r101_fpn_dconv_c3-c5_mstrain_2x_coco_20200630_102002-134b07df.pth"  # noqa
 model = dict(
-    teacher_config='configs/gfl/gfl_r101-dconv-c3-c5_fpn_ms-2x_coco.py',
+    teacher_config="configs/gfl/gfl_r101-dconv-c3-c5_fpn_ms-2x_coco.py",
     teacher_ckpt=teacher_ckpt,
     backbone=dict(
-        type='ResNet',
+        type="ResNet",
         depth=101,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch',
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')),
-    neck=dict(
-        type='FPN',
-        in_channels=[256, 512, 1024, 2048],
-        out_channels=256,
-        start_level=1,
-        add_extra_convs='on_output',
-        num_outs=5))
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"),
+    ),
+    neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5),
+)
 
 max_epochs = 24
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 22],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1),
 ]
 train_cfg = dict(max_epochs=max_epochs)
 
 # multi-scale training
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='RandomResize', scale=[(1333, 480), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="RandomResize", scale=[(1333, 480), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
diff --git a/mmpose/configs/mmdet/ld/ld_r18-gflv1-r101_fpn_1x_coco.py b/mmpose/configs/mmdet/ld/ld_r18-gflv1-r101_fpn_1x_coco.py
index f18bb1d3620f3caecdc870ea8a3346424729225c..16df189827ab88c219dea8f54d3ad8e8410655d7 100644
--- a/mmpose/configs/mmdet/ld/ld_r18-gflv1-r101_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/ld/ld_r18-gflv1-r101_fpn_1x_coco.py
@@ -1,70 +1,40 @@
-_base_ = [
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
-]
-teacher_ckpt = 'https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_mstrain_2x_coco/gfl_r101_fpn_mstrain_2x_coco_20200629_200126-dd12f847.pth'  # noqa
+_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"]
+teacher_ckpt = "https://download.openmmlab.com/mmdetection/v2.0/gfl/gfl_r101_fpn_mstrain_2x_coco/gfl_r101_fpn_mstrain_2x_coco_20200629_200126-dd12f847.pth"  # noqa
 model = dict(
-    type='KnowledgeDistillationSingleStageDetector',
+ type="KnowledgeDistillationSingleStageDetector", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), - teacher_config='configs/gfl/gfl_r101_fpn_ms-2x_coco.py', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), + teacher_config="configs/gfl/gfl_r101_fpn_ms-2x_coco.py", teacher_ckpt=teacher_ckpt, backbone=dict( - type='ResNet', + type="ResNet", depth=18, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict( - type='FPN', - in_channels=[64, 128, 256, 512], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18"), + ), + neck=dict(type="FPN", in_channels=[64, 128, 256, 512], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), bbox_head=dict( - type='LDHead', + type="LDHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128]), - loss_cls=dict( - type='QualityFocalLoss', - use_sigmoid=True, - beta=2.0, - loss_weight=1.0), - loss_dfl=dict(type='DistributionFocalLoss', loss_weight=0.25), - loss_ld=dict( - type='KnowledgeDistillationKLDivLoss', loss_weight=0.25, T=10), + anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]), + loss_cls=dict(type="QualityFocalLoss", use_sigmoid=True, beta=2.0, loss_weight=1.0), + loss_dfl=dict(type="DistributionFocalLoss", loss_weight=0.25), + loss_ld=dict(type="KnowledgeDistillationKLDivLoss", loss_weight=0.25, T=10), reg_max=16, - loss_bbox=dict(type='GIoULoss', loss_weight=2.0)), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + ), # training and testing settings - train_cfg=dict( - assigner=dict(type='ATSSAssigner', topk=9), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + train_cfg=dict(assigner=dict(type="ATSSAssigner", topk=9), allowed_border=-1, pos_weight=-1, debug=False), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/ld/ld_r34-gflv1-r101_fpn_1x_coco.py b/mmpose/configs/mmdet/ld/ld_r34-gflv1-r101_fpn_1x_coco.py index 2198adc82cfc98fca139e120ea0487989ac8bae7..959c7faea0403639f01c527216de314eb82e8d47 100644 --- a/mmpose/configs/mmdet/ld/ld_r34-gflv1-r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/ld/ld_r34-gflv1-r101_fpn_1x_coco.py @@ -1,19 +1,15 @@ -_base_ = ['./ld_r18-gflv1-r101_fpn_1x_coco.py'] +_base_ = ["./ld_r18-gflv1-r101_fpn_1x_coco.py"] model = dict( backbone=dict( - type='ResNet', + type="ResNet", depth=34, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - 
norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet34')), - neck=dict( - type='FPN', - in_channels=[64, 128, 256, 512], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet34"), + ), + neck=dict(type="FPN", in_channels=[64, 128, 256, 512], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), +) diff --git a/mmpose/configs/mmdet/ld/ld_r50-gflv1-r101_fpn_1x_coco.py b/mmpose/configs/mmdet/ld/ld_r50-gflv1-r101_fpn_1x_coco.py index 89ab5796969b88080f96f3afcc24183b0c11c730..6b5e37b747ac76dd5485667a0d03dd720d862774 100644 --- a/mmpose/configs/mmdet/ld/ld_r50-gflv1-r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/ld/ld_r50-gflv1-r101_fpn_1x_coco.py @@ -1,19 +1,15 @@ -_base_ = ['./ld_r18-gflv1-r101_fpn_1x_coco.py'] +_base_ = ["./ld_r18-gflv1-r101_fpn_1x_coco.py"] model = dict( backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), +) diff --git a/mmpose/configs/mmdet/legacy_1.x/cascade-mask-rcnn_r50_fpn_1x_coco_v1.py b/mmpose/configs/mmdet/legacy_1.x/cascade-mask-rcnn_r50_fpn_1x_coco_v1.py index f948a7a9c10f618438e8ff54bdf3333335577e90..5ce7425dca4994391bff36d9dd568ed16c907686 100644 --- a/mmpose/configs/mmdet/legacy_1.x/cascade-mask-rcnn_r50_fpn_1x_coco_v1.py +++ b/mmpose/configs/mmdet/legacy_1.x/cascade-mask-rcnn_r50_fpn_1x_coco_v1.py @@ -1,78 +1,62 @@ _base_ = [ - '../_base_/models/cascade-mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( - type='CascadeRCNN', + type="CascadeRCNN", backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - anchor_generator=dict(type='LegacyAnchorGenerator', center_offset=0.5), - bbox_coder=dict( - type='LegacyDeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0])), + anchor_generator=dict(type="LegacyAnchorGenerator", center_offset=0.5), + bbox_coder=dict(type="LegacyDeltaXYWHBBoxCoder", 
target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + ), roi_head=dict( - bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict( - type='RoIAlign', - output_size=7, - sampling_ratio=2, - aligned=False)), + bbox_roi_extractor=dict(type="SingleRoIExtractor", roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2, aligned=False)), bbox_head=[ dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", reg_class_agnostic=True, in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='LegacyDeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2])), + bbox_coder=dict(type="LegacyDeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + ), dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", reg_class_agnostic=True, in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='LegacyDeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.05, 0.05, 0.1, 0.1])), + bbox_coder=dict(type="LegacyDeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), + ), dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", reg_class_agnostic=True, in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, bbox_coder=dict( - type='LegacyDeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.033, 0.033, 0.067, 0.067])), + type="LegacyDeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067] + ), + ), ], mask_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict( - type='RoIAlign', - output_size=14, - sampling_ratio=2, - aligned=False)))) + type="SingleRoIExtractor", roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=2, aligned=False) + ), + ), +) diff --git a/mmpose/configs/mmdet/legacy_1.x/faster-rcnn_r50_fpn_1x_coco_v1.py b/mmpose/configs/mmdet/legacy_1.x/faster-rcnn_r50_fpn_1x_coco_v1.py index 66bf9713793c4a0a951273d037253f930fbb31a6..66067434636fb7049861e24adfb5f5720c00cf91 100644 --- a/mmpose/configs/mmdet/legacy_1.x/faster-rcnn_r50_fpn_1x_coco_v1.py +++ b/mmpose/configs/mmdet/legacy_1.x/faster-rcnn_r50_fpn_1x_coco_v1.py @@ -1,38 +1,31 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( - type='FasterRCNN', - backbone=dict( - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + type="FasterRCNN", + backbone=dict(init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50")), rpn_head=dict( - type='RPNHead', + type="RPNHead", anchor_generator=dict( - type='LegacyAnchorGenerator', - center_offset=0.5, - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict(type='LegacyDeltaXYWHBBoxCoder'), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), + type="LegacyAnchorGenerator", center_offset=0.5, scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64] + ), + bbox_coder=dict(type="LegacyDeltaXYWHBBoxCoder"), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", bbox_roi_extractor=dict( - 
type='SingleRoIExtractor', - roi_layer=dict( - type='RoIAlign', - output_size=7, - sampling_ratio=2, - aligned=False), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2, aligned=False), out_channels=256, - featmap_strides=[4, 8, 16, 32]), - bbox_head=dict( - bbox_coder=dict(type='LegacyDeltaXYWHBBoxCoder'), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))), + featmap_strides=[4, 8, 16, 32], + ), + bbox_head=dict(bbox_coder=dict(type="LegacyDeltaXYWHBBoxCoder"), loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0)), + ), # model training and testing settings - train_cfg=dict( - rpn_proposal=dict(max_per_img=2000), - rcnn=dict(assigner=dict(match_low_quality=True)))) + train_cfg=dict(rpn_proposal=dict(max_per_img=2000), rcnn=dict(assigner=dict(match_low_quality=True))), +) diff --git a/mmpose/configs/mmdet/legacy_1.x/mask-rcnn_r50_fpn_1x_coco_v1.py b/mmpose/configs/mmdet/legacy_1.x/mask-rcnn_r50_fpn_1x_coco_v1.py index 690802598493e64821aaf98111161e36b169e475..5a6f8e2c0c09e17ef0114d8424a9c13e54194009 100644 --- a/mmpose/configs/mmdet/legacy_1.x/mask-rcnn_r50_fpn_1x_coco_v1.py +++ b/mmpose/configs/mmdet/legacy_1.x/mask-rcnn_r50_fpn_1x_coco_v1.py @@ -1,34 +1,23 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( rpn_head=dict( - anchor_generator=dict(type='LegacyAnchorGenerator', center_offset=0.5), - bbox_coder=dict(type='LegacyDeltaXYWHBBoxCoder'), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), + anchor_generator=dict(type="LegacyAnchorGenerator", center_offset=0.5), + bbox_coder=dict(type="LegacyDeltaXYWHBBoxCoder"), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0), + ), roi_head=dict( - bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict( - type='RoIAlign', - output_size=7, - sampling_ratio=2, - aligned=False)), + bbox_roi_extractor=dict(type="SingleRoIExtractor", roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2, aligned=False)), mask_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict( - type='RoIAlign', - output_size=14, - sampling_ratio=2, - aligned=False)), - bbox_head=dict( - bbox_coder=dict(type='LegacyDeltaXYWHBBoxCoder'), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))), - + type="SingleRoIExtractor", roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=2, aligned=False) + ), + bbox_head=dict(bbox_coder=dict(type="LegacyDeltaXYWHBBoxCoder"), loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0)), + ), # model training and testing settings - train_cfg=dict( - rpn_proposal=dict(max_per_img=2000), - rcnn=dict(assigner=dict(match_low_quality=True)))) + train_cfg=dict(rpn_proposal=dict(max_per_img=2000), rcnn=dict(assigner=dict(match_low_quality=True))), +) diff --git a/mmpose/configs/mmdet/legacy_1.x/retinanet_r50-caffe_fpn_1x_coco_v1.py b/mmpose/configs/mmdet/legacy_1.x/retinanet_r50-caffe_fpn_1x_coco_v1.py index 49abc31a002f56147cacf1b7707140a14b784a99..22084daf44cffceb3c756b1b463dc1670494f467 100644 --- a/mmpose/configs/mmdet/legacy_1.x/retinanet_r50-caffe_fpn_1x_coco_v1.py +++ b/mmpose/configs/mmdet/legacy_1.x/retinanet_r50-caffe_fpn_1x_coco_v1.py @@ -1,16 +1,17 @@ -_base_ = 
'./retinanet_r50_fpn_1x_coco_v1.py' +_base_ = "./retinanet_r50_fpn_1x_coco_v1.py" model = dict( data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", # use caffe img_norm mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( norm_cfg=dict(requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron/resnet50_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/legacy_1.x/retinanet_r50_fpn_1x_coco_v1.py b/mmpose/configs/mmdet/legacy_1.x/retinanet_r50_fpn_1x_coco_v1.py index 6198b9717957374ce734ca74de5f54dda44123b9..7bd3148569ceed67b0a05a59537a4bf19bfb5e60 100644 --- a/mmpose/configs/mmdet/legacy_1.x/retinanet_r50_fpn_1x_coco_v1.py +++ b/mmpose/configs/mmdet/legacy_1.x/retinanet_r50_fpn_1x_coco_v1.py @@ -1,17 +1,21 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( bbox_head=dict( - type='RetinaHead', + type="RetinaHead", anchor_generator=dict( - type='LegacyAnchorGenerator', + type="LegacyAnchorGenerator", center_offset=0.5, octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], - strides=[8, 16, 32, 64, 128]), - bbox_coder=dict(type='LegacyDeltaXYWHBBoxCoder'), - loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0))) + strides=[8, 16, 32, 64, 128], + ), + bbox_coder=dict(type="LegacyDeltaXYWHBBoxCoder"), + loss_bbox=dict(type="SmoothL1Loss", beta=0.11, loss_weight=1.0), + ) +) diff --git a/mmpose/configs/mmdet/legacy_1.x/ssd300_coco_v1.py b/mmpose/configs/mmdet/legacy_1.x/ssd300_coco_v1.py index e5ffc633a9b4773d7116bed7cbf8bcab7fb3110d..d0a1f5926da9f6906be8499df36a9fd815dd08e2 100644 --- a/mmpose/configs/mmdet/legacy_1.x/ssd300_coco_v1.py +++ b/mmpose/configs/mmdet/legacy_1.x/ssd300_coco_v1.py @@ -1,20 +1,22 @@ _base_ = [ - '../_base_/models/ssd300.py', '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py' + "../_base_/models/ssd300.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_2x.py", + "../_base_/default_runtime.py", ] # model settings input_size = 300 model = dict( bbox_head=dict( - type='SSDHead', + type="SSDHead", anchor_generator=dict( - type='LegacySSDAnchorGenerator', + type="LegacySSDAnchorGenerator", scale_major=False, input_size=input_size, basesize_ratio_range=(0.15, 0.9), strides=[8, 16, 32, 64, 100, 300], - ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]]), - bbox_coder=dict( - type='LegacyDeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]))) + ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]], + ), + bbox_coder=dict(type="LegacyDeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + ) +) diff --git a/mmpose/configs/mmdet/libra_rcnn/libra-fast-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/libra_rcnn/libra-fast-rcnn_r50_fpn_1x_coco.py index 2efe440ce361d5bc5855c76001a5ff6b661a568a..8762d4071315c795513bcdf308164f7cd18efd9d 100644 --- a/mmpose/configs/mmdet/libra_rcnn/libra-fast-rcnn_r50_fpn_1x_coco.py +++ 
b/mmpose/configs/mmdet/libra_rcnn/libra-fast-rcnn_r50_fpn_1x_coco.py @@ -1,52 +1,33 @@ -_base_ = '../fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py' +_base_ = "../fast_rcnn/fast-rcnn_r50_fpn_1x_coco.py" # model settings model = dict( neck=[ - dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), - dict( - type='BFP', - in_channels=256, - num_levels=5, - refine_level=2, - refine_type='non_local') + dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), + dict(type="BFP", in_channels=256, num_levels=5, refine_level=2, refine_type="non_local"), ], - roi_head=dict( - bbox_head=dict( - loss_bbox=dict( - _delete_=True, - type='BalancedL1Loss', - alpha=0.5, - gamma=1.5, - beta=1.0, - loss_weight=1.0))), + roi_head=dict(bbox_head=dict(loss_bbox=dict(_delete_=True, type="BalancedL1Loss", alpha=0.5, gamma=1.5, beta=1.0, loss_weight=1.0))), # model training and testing settings train_cfg=dict( rcnn=dict( sampler=dict( _delete_=True, - type='CombinedSampler', + type="CombinedSampler", num=512, pos_fraction=0.25, add_gt_as_proposals=True, - pos_sampler=dict(type='InstanceBalancedPosSampler'), - neg_sampler=dict( - type='IoUBalancedNegSampler', - floor_thr=-1, - floor_fraction=0, - num_bins=3))))) + pos_sampler=dict(type="InstanceBalancedPosSampler"), + neg_sampler=dict(type="IoUBalancedNegSampler", floor_thr=-1, floor_fraction=0, num_bins=3), + ) + ) + ), +) # MMEngine support the following two ways, users can choose # according to convenience # _base_.train_dataloader.dataset.proposal_file = 'libra_proposals/rpn_r50_fpn_1x_train2017.pkl' # noqa -train_dataloader = dict( - dataset=dict(proposal_file='libra_proposals/rpn_r50_fpn_1x_train2017.pkl')) +train_dataloader = dict(dataset=dict(proposal_file="libra_proposals/rpn_r50_fpn_1x_train2017.pkl")) # _base_.val_dataloader.dataset.proposal_file = 'libra_proposals/rpn_r50_fpn_1x_val2017.pkl' # noqa # test_dataloader = _base_.val_dataloader -val_dataloader = dict( - dataset=dict(proposal_file='libra_proposals/rpn_r50_fpn_1x_val2017.pkl')) +val_dataloader = dict(dataset=dict(proposal_file="libra_proposals/rpn_r50_fpn_1x_val2017.pkl")) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_r101_fpn_1x_coco.py index 985df64cb437e233f76235ee9be4b788ec8f701c..85a68e79827dec2db7407f2d109735d002d96704 100644 --- a/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './libra-faster-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./libra-faster-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_r50_fpn_1x_coco.py index f9ee507d26338b49eca004ee195fd2b1954c32d9..5f1f5bbd23e9b950d02df92686f48a65d0f84ddf 100644 --- a/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_r50_fpn_1x_coco.py @@ -1,41 +1,24 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" # model settings model = dict( neck=[ - dict( - type='FPN', - 
in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), - dict( - type='BFP', - in_channels=256, - num_levels=5, - refine_level=2, - refine_type='non_local') + dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), + dict(type="BFP", in_channels=256, num_levels=5, refine_level=2, refine_type="non_local"), ], - roi_head=dict( - bbox_head=dict( - loss_bbox=dict( - _delete_=True, - type='BalancedL1Loss', - alpha=0.5, - gamma=1.5, - beta=1.0, - loss_weight=1.0))), + roi_head=dict(bbox_head=dict(loss_bbox=dict(_delete_=True, type="BalancedL1Loss", alpha=0.5, gamma=1.5, beta=1.0, loss_weight=1.0))), # model training and testing settings train_cfg=dict( rpn=dict(sampler=dict(neg_pos_ub=5), allowed_border=-1), rcnn=dict( sampler=dict( _delete_=True, - type='CombinedSampler', + type="CombinedSampler", num=512, pos_fraction=0.25, add_gt_as_proposals=True, - pos_sampler=dict(type='InstanceBalancedPosSampler'), - neg_sampler=dict( - type='IoUBalancedNegSampler', - floor_thr=-1, - floor_fraction=0, - num_bins=3))))) + pos_sampler=dict(type="InstanceBalancedPosSampler"), + neg_sampler=dict(type="IoUBalancedNegSampler", floor_thr=-1, floor_fraction=0, num_bins=3), + ) + ), + ), +) diff --git a/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_x101-64x4d_fpn_1x_coco.py index 158e238ed14d9c56b7d02d17f0061b08d4116282..f6286a305c033e573955718c73684979fa3afb2a 100644 --- a/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/libra_rcnn/libra-faster-rcnn_x101-64x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './libra-faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "./libra-faster-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/libra_rcnn/libra-retinanet_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/libra_rcnn/libra-retinanet_r50_fpn_1x_coco.py index be2742098fb8f1e46bbb16c9d3e2e20c2e3083aa..b7da2f302ee3da8cd1ca479305cdbf8b4b710721 100644 --- a/mmpose/configs/mmdet/libra_rcnn/libra-retinanet_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/libra_rcnn/libra-retinanet_r50_fpn_1x_coco.py @@ -1,26 +1,9 @@ -_base_ = '../retinanet/retinanet_r50_fpn_1x_coco.py' +_base_ = "../retinanet/retinanet_r50_fpn_1x_coco.py" # model settings model = dict( neck=[ - dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_input', - num_outs=5), - dict( - type='BFP', - in_channels=256, - num_levels=5, - refine_level=1, - refine_type='non_local') + dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_input", num_outs=5), + dict(type="BFP", in_channels=256, num_levels=5, refine_level=1, refine_type="non_local"), ], - bbox_head=dict( - loss_bbox=dict( - _delete_=True, - type='BalancedL1Loss', - alpha=0.5, - gamma=1.5, - beta=0.11, - loss_weight=1.0))) + bbox_head=dict(loss_bbox=dict(_delete_=True, type="BalancedL1Loss", alpha=0.5, gamma=1.5, beta=0.11, loss_weight=1.0)), +) diff --git 
a/mmpose/configs/mmdet/lvis/mask-rcnn_r101_fpn_sample1e-3_ms-1x_lvis-v1.py b/mmpose/configs/mmdet/lvis/mask-rcnn_r101_fpn_sample1e-3_ms-1x_lvis-v1.py index 3994d75a81aaa5368bd42c591fa770b05b665e25..7ac1b9c186785dea584b0b5a52986e98f468140b 100644 --- a/mmpose/configs/mmdet/lvis/mask-rcnn_r101_fpn_sample1e-3_ms-1x_lvis-v1.py +++ b/mmpose/configs/mmdet/lvis/mask-rcnn_r101_fpn_sample1e-3_ms-1x_lvis-v1.py @@ -1,6 +1,2 @@ -_base_ = './mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/lvis/mask-rcnn_r101_fpn_sample1e-3_ms-2x_lvis-v0.5.py b/mmpose/configs/mmdet/lvis/mask-rcnn_r101_fpn_sample1e-3_ms-2x_lvis-v0.5.py index ed8b3639a0046e14d5c11a98f9d7dc38eb4badec..de6603a34ba44d65336303b9a986a8673498f78d 100644 --- a/mmpose/configs/mmdet/lvis/mask-rcnn_r101_fpn_sample1e-3_ms-2x_lvis-v0.5.py +++ b/mmpose/configs/mmdet/lvis/mask-rcnn_r101_fpn_sample1e-3_ms-2x_lvis-v0.5.py @@ -1,6 +1,2 @@ -_base_ = './mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/lvis/mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py b/mmpose/configs/mmdet/lvis/mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py index cdd3683e3005dd09ada78827825da516bfd4c66e..c0aa3024fb939cb5d7410212aef5e101123ebb41 100644 --- a/mmpose/configs/mmdet/lvis/mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py +++ b/mmpose/configs/mmdet/lvis/mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py @@ -1,13 +1,16 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/lvis_v1_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/lvis_v1_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( - roi_head=dict( - bbox_head=dict(num_classes=1203), mask_head=dict(num_classes=1203)), + roi_head=dict(bbox_head=dict(num_classes=1203), mask_head=dict(num_classes=1203)), test_cfg=dict( rcnn=dict( score_thr=0.0001, # LVIS allows up to 300 - max_per_img=300))) + max_per_img=300, + ) + ), +) diff --git a/mmpose/configs/mmdet/lvis/mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py b/mmpose/configs/mmdet/lvis/mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py index b36b6c17fef7da3646654e494fa715302b1b050e..e56e671f9fa1cb7827999849dac710365f2e168d 100644 --- a/mmpose/configs/mmdet/lvis/mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py +++ b/mmpose/configs/mmdet/lvis/mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py @@ -1,13 +1,16 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/lvis_v0.5_instance.py', - '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/lvis_v0.5_instance.py", + "../_base_/schedules/schedule_2x.py", + "../_base_/default_runtime.py", ] model = dict( - roi_head=dict( - bbox_head=dict(num_classes=1230), mask_head=dict(num_classes=1230)), + 
roi_head=dict(bbox_head=dict(num_classes=1230), mask_head=dict(num_classes=1230)), test_cfg=dict( rcnn=dict( score_thr=0.0001, # LVIS allows up to 300 - max_per_img=300))) + max_per_img=300, + ) + ), +) diff --git a/mmpose/configs/mmdet/lvis/mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-1x_lvis-v1.py b/mmpose/configs/mmdet/lvis/mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-1x_lvis-v1.py index 9da3ab6db04ec6ee772202270a47179171a9d13c..7faa9331c5d60f569a3fe67953c833340e02ffed 100644 --- a/mmpose/configs/mmdet/lvis/mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-1x_lvis-v1.py +++ b/mmpose/configs/mmdet/lvis/mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-1x_lvis-v1.py @@ -1,14 +1,15 @@ -_base_ = './mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py' +_base_ = "./mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/lvis/mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-2x_lvis-v0.5.py b/mmpose/configs/mmdet/lvis/mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-2x_lvis-v0.5.py index 9a097c94c7e2d7c7b583027ce6000aba8205d490..16afa369dfb5ee0db835f1b4c3da1c004dc77f27 100644 --- a/mmpose/configs/mmdet/lvis/mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-2x_lvis-v0.5.py +++ b/mmpose/configs/mmdet/lvis/mask-rcnn_x101-32x4d_fpn_sample1e-3_ms-2x_lvis-v0.5.py @@ -1,14 +1,15 @@ -_base_ = './mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py' +_base_ = "./mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/lvis/mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-1x_lvis-v1.py b/mmpose/configs/mmdet/lvis/mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-1x_lvis-v1.py index b0819b3ec60d710205a643305edd2a27db977d9b..94d5db87d4fa5c5c0cdb156bb837086a51467016 100644 --- a/mmpose/configs/mmdet/lvis/mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-1x_lvis-v1.py +++ b/mmpose/configs/mmdet/lvis/mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-1x_lvis-v1.py @@ -1,14 +1,15 @@ -_base_ = './mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py' +_base_ = "./mask-rcnn_r50_fpn_sample1e-3_ms-1x_lvis-v1.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/lvis/mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-2x_lvis-v0.5.py b/mmpose/configs/mmdet/lvis/mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-2x_lvis-v0.5.py index 
9d2720089181f066bcaa04b73903836b64b97bb9..2277dc4ebe63d5e5cb718fe2bace8b82074d373d 100644 --- a/mmpose/configs/mmdet/lvis/mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-2x_lvis-v0.5.py +++ b/mmpose/configs/mmdet/lvis/mask-rcnn_x101-64x4d_fpn_sample1e-3_ms-2x_lvis-v0.5.py @@ -1,14 +1,15 @@ -_base_ = './mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py' +_base_ = "./mask-rcnn_r50_fpn_sample1e-3_ms-2x_lvis-v0.5.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/mask2former/mask2former_r101_8xb2-lsj-50e_coco-panoptic.py b/mmpose/configs/mmdet/mask2former/mask2former_r101_8xb2-lsj-50e_coco-panoptic.py index 66685a2fca9c0e165ba0024e242d5eabf5d565c9..b094449b8fc323354b7bbe9d30e78c38817024dc 100644 --- a/mmpose/configs/mmdet/mask2former/mask2former_r101_8xb2-lsj-50e_coco-panoptic.py +++ b/mmpose/configs/mmdet/mask2former/mask2former_r101_8xb2-lsj-50e_coco-panoptic.py @@ -1,7 +1,3 @@ -_base_ = './mask2former_r50_8xb2-lsj-50e_coco-panoptic.py' +_base_ = "./mask2former_r50_8xb2-lsj-50e_coco-panoptic.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/mask2former/mask2former_r101_8xb2-lsj-50e_coco.py b/mmpose/configs/mmdet/mask2former/mask2former_r101_8xb2-lsj-50e_coco.py index f4c29906d9fc6ce47ce928fb73dcb1bb6c6f7ba9..9e40ffec8209cac55bdff523e43d693ef5173c02 100644 --- a/mmpose/configs/mmdet/mask2former/mask2former_r101_8xb2-lsj-50e_coco.py +++ b/mmpose/configs/mmdet/mask2former/mask2former_r101_8xb2-lsj-50e_coco.py @@ -1,7 +1,3 @@ -_base_ = ['./mask2former_r50_8xb2-lsj-50e_coco.py'] +_base_ = ["./mask2former_r50_8xb2-lsj-50e_coco.py"] -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/mask2former/mask2former_r50_8xb2-lsj-50e_coco-panoptic.py b/mmpose/configs/mmdet/mask2former/mask2former_r50_8xb2-lsj-50e_coco-panoptic.py index c53e981bf0d5081c3735676be922f64298a8fc80..a11d2d1fcebacb602867431eadb00e0d3512ed0b 100644 --- a/mmpose/configs/mmdet/mask2former/mask2former_r50_8xb2-lsj-50e_coco-panoptic.py +++ b/mmpose/configs/mmdet/mask2former/mask2former_r50_8xb2-lsj-50e_coco-panoptic.py @@ -1,19 +1,10 @@ -_base_ = [ - '../_base_/datasets/coco_panoptic.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_panoptic.py", "../_base_/default_runtime.py"] image_size = (1024, 1024) batch_augments = [ - dict( - type='BatchFixedSizePad', - size=image_size, - img_pad_value=0, - pad_mask=True, - mask_pad_value=0, - pad_seg=True, - seg_pad_value=255) + dict(type="BatchFixedSizePad", size=image_size, img_pad_value=0, pad_mask=True, mask_pad_value=0, pad_seg=True, seg_pad_value=255) ] data_preprocessor = dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], 
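# --- Editor's aside, not part of the patch: the mean/std just above are the
# standard ImageNet RGB statistics on a 0-255 scale; DetDataPreprocessor first
# flips BGR to RGB (the bgr_to_rgb flag below), then normalizes each channel as
# (pixel - mean) / std. A quick sanity check with the config's own values:
import numpy as np

mean = np.array([123.675, 116.28, 103.53])
std = np.array([58.395, 57.12, 57.375])
white = np.array([255.0, 255.0, 255.0])  # a pure-white RGB pixel
print((white - mean) / std)              # ~[2.25, 2.43, 2.64], the upper end of the normalized range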
bgr_to_rgb=True, @@ -22,26 +13,28 @@ data_preprocessor = dict( mask_pad_value=0, pad_seg=True, seg_pad_value=255, - batch_augments=batch_augments) + batch_augments=batch_augments, +) num_things_classes = 80 num_stuff_classes = 53 num_classes = num_things_classes + num_stuff_classes model = dict( - type='Mask2Former', + type="Mask2Former", data_preprocessor=data_preprocessor, backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=-1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), panoptic_head=dict( - type='Mask2FormerHead', + type="Mask2FormerHead", in_channels=[256, 512, 1024, 2048], # pass to pixel_decoder inside strides=[4, 8, 16, 32], feat_channels=256, @@ -51,88 +44,62 @@ model = dict( num_queries=100, num_transformer_feat_level=3, pixel_decoder=dict( - type='MSDeformAttnPixelDecoder', + type="MSDeformAttnPixelDecoder", num_outs=3, - norm_cfg=dict(type='GN', num_groups=32), - act_cfg=dict(type='ReLU'), + norm_cfg=dict(type="GN", num_groups=32), + act_cfg=dict(type="ReLU"), encoder=dict( # DeformableDetrTransformerEncoder num_layers=6, layer_cfg=dict( # DeformableDetrTransformerEncoderLayer self_attn_cfg=dict( # MultiScaleDeformableAttention - embed_dims=256, - num_heads=8, - num_levels=3, - num_points=4, - dropout=0.0, - batch_first=True), + embed_dims=256, num_heads=8, num_levels=3, num_points=4, dropout=0.0, batch_first=True + ), ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - num_fcs=2, - ffn_drop=0.0, - act_cfg=dict(type='ReLU', inplace=True)))), - positional_encoding=dict(num_feats=128, normalize=True)), + embed_dims=256, feedforward_channels=1024, num_fcs=2, ffn_drop=0.0, act_cfg=dict(type="ReLU", inplace=True) + ), + ), + ), + positional_encoding=dict(num_feats=128, normalize=True), + ), enforce_decoder_input_project=False, positional_encoding=dict(num_feats=128, normalize=True), transformer_decoder=dict( # Mask2FormerTransformerDecoder return_intermediate=True, num_layers=9, layer_cfg=dict( # Mask2FormerTransformerDecoderLayer - self_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.0, - batch_first=True), - cross_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.0, - batch_first=True), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, - num_fcs=2, - ffn_drop=0.0, - act_cfg=dict(type='ReLU', inplace=True))), - init_cfg=None), + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0, batch_first=True), # MultiheadAttention + cross_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0, batch_first=True), # MultiheadAttention + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, num_fcs=2, ffn_drop=0.0, act_cfg=dict(type="ReLU", inplace=True)), + ), + init_cfg=None, + ), loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=2.0, - reduction='mean', - class_weight=[1.0] * num_classes + [0.1]), - loss_mask=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - reduction='mean', - loss_weight=5.0), - loss_dice=dict( - type='DiceLoss', - use_sigmoid=True, - activate=True, - reduction='mean', - naive_dice=True, - eps=1.0, - loss_weight=5.0)), + type="CrossEntropyLoss", use_sigmoid=False, loss_weight=2.0, reduction="mean", class_weight=[1.0] * num_classes 
+ [0.1] + ), + loss_mask=dict(type="CrossEntropyLoss", use_sigmoid=True, reduction="mean", loss_weight=5.0), + loss_dice=dict(type="DiceLoss", use_sigmoid=True, activate=True, reduction="mean", naive_dice=True, eps=1.0, loss_weight=5.0), + ), panoptic_fusion_head=dict( - type='MaskFormerFusionHead', + type="MaskFormerFusionHead", num_things_classes=num_things_classes, num_stuff_classes=num_stuff_classes, loss_panoptic=None, - init_cfg=None), + init_cfg=None, + ), train_cfg=dict( num_points=12544, oversample_ratio=3.0, importance_sample_ratio=0.75, assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='ClassificationCost', weight=2.0), - dict( - type='CrossEntropyLossCost', weight=5.0, use_sigmoid=True), - dict(type='DiceCost', weight=5.0, pred_act=True, eps=1.0) - ]), - sampler=dict(type='MaskPseudoSampler')), + dict(type="ClassificationCost", weight=2.0), + dict(type="CrossEntropyLossCost", weight=5.0, use_sigmoid=True), + dict(type="DiceCost", weight=5.0, pred_act=True, eps=1.0), + ], + ), + sampler=dict(type="MaskPseudoSampler"), + ), test_cfg=dict( panoptic_on=True, # For now, the dataset does not support @@ -144,105 +111,73 @@ model = dict( iou_thr=0.8, # In Mask2Former's panoptic postprocessing, # it will filter mask area where score is less than 0.5 . - filter_low_score=True), - init_cfg=None) + filter_low_score=True, + ), + init_cfg=None, +) # dataset settings -data_root = 'data/coco/' +data_root = "data/coco/" train_pipeline = [ - dict( - type='LoadImageFromFile', - to_float32=True, - backend_args={{_base_.backend_args}}), - dict( - type='LoadPanopticAnnotations', - with_bbox=True, - with_mask=True, - with_seg=True, - backend_args={{_base_.backend_args}}), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", to_float32=True, backend_args={{_base_.backend_args}}), + dict(type="LoadPanopticAnnotations", with_bbox=True, with_mask=True, with_seg=True, backend_args={{_base_.backend_args}}), + dict(type="RandomFlip", prob=0.5), # large scale jittering - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_size=image_size, - crop_type='absolute', - recompute_bbox=True, - allow_negative_crop=True), - dict(type='PackDetInputs') + dict(type="RandomResize", scale=image_size, ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=image_size, crop_type="absolute", recompute_bbox=True, allow_negative_crop=True), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) val_evaluator = [ dict( - type='CocoPanopticMetric', - ann_file=data_root + 'annotations/panoptic_val2017.json', - seg_prefix=data_root + 'annotations/panoptic_val2017/', - backend_args={{_base_.backend_args}}), + type="CocoPanopticMetric", + ann_file=data_root + "annotations/panoptic_val2017.json", + seg_prefix=data_root + "annotations/panoptic_val2017/", + backend_args={{_base_.backend_args}}, + ), dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric=['bbox', 'segm'], - backend_args={{_base_.backend_args}}) + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric=["bbox", "segm"], + backend_args={{_base_.backend_args}}, + ), ] test_evaluator = val_evaluator # optimizer embed_multi = dict(lr_mult=1.0, decay_mult=0.0) optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict( - type='AdamW', - lr=0.0001, - weight_decay=0.05, - eps=1e-8, - betas=(0.9, 
0.999)), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.05, eps=1e-8, betas=(0.9, 0.999)), paramwise_cfg=dict( custom_keys={ - 'backbone': dict(lr_mult=0.1, decay_mult=1.0), - 'query_embed': embed_multi, - 'query_feat': embed_multi, - 'level_embed': embed_multi, + "backbone": dict(lr_mult=0.1, decay_mult=1.0), + "query_embed": embed_multi, + "query_feat": embed_multi, + "level_embed": embed_multi, }, - norm_decay_mult=0.0), - clip_grad=dict(max_norm=0.01, norm_type=2)) + norm_decay_mult=0.0, + ), + clip_grad=dict(max_norm=0.01, norm_type=2), +) # learning policy max_iters = 368750 -param_scheduler = dict( - type='MultiStepLR', - begin=0, - end=max_iters, - by_epoch=False, - milestones=[327778, 355092], - gamma=0.1) +param_scheduler = dict(type="MultiStepLR", begin=0, end=max_iters, by_epoch=False, milestones=[327778, 355092], gamma=0.1) # Before 365001th iteration, we do evaluation every 5000 iterations. # After 365000th iteration, we do evaluation every 368750 iterations, # which means that we do evaluation at the end of training. interval = 5000 dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)] -train_cfg = dict( - type='IterBasedTrainLoop', - max_iters=max_iters, - val_interval=interval, - dynamic_intervals=dynamic_intervals) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="IterBasedTrainLoop", max_iters=max_iters, val_interval=interval, dynamic_intervals=dynamic_intervals) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") -default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', - by_epoch=False, - save_last=True, - max_keep_ckpts=3, - interval=interval)) -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=False) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", by_epoch=False, save_last=True, max_keep_ckpts=3, interval=interval)) +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=False) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/mask2former/mask2former_r50_8xb2-lsj-50e_coco.py b/mmpose/configs/mmdet/mask2former/mask2former_r50_8xb2-lsj-50e_coco.py index 24a17f58c54a2e8694a8bf960d10ebc918acdddc..b9b94f1b38f05c121b2fdaad2eca7cfbb408b005 100644 --- a/mmpose/configs/mmdet/mask2former/mask2former_r50_8xb2-lsj-50e_coco.py +++ b/mmpose/configs/mmdet/mask2former/mask2former_r50_8xb2-lsj-50e_coco.py @@ -1,20 +1,12 @@ -_base_ = ['./mask2former_r50_8xb2-lsj-50e_coco-panoptic.py'] +_base_ = ["./mask2former_r50_8xb2-lsj-50e_coco-panoptic.py"] num_things_classes = 80 num_stuff_classes = 0 num_classes = num_things_classes + num_stuff_classes image_size = (1024, 1024) -batch_augments = [ - dict( - type='BatchFixedSizePad', - size=image_size, - img_pad_value=0, - pad_mask=True, - mask_pad_value=0, - pad_seg=False) -] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size, img_pad_value=0, pad_mask=True, mask_pad_value=0, pad_seg=False)] data_preprocessor = dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, @@ -22,79 +14,56 @@ data_preprocessor = dict( pad_mask=True, mask_pad_value=0, pad_seg=False, - batch_augments=batch_augments) + batch_augments=batch_augments, +) model = dict( data_preprocessor=data_preprocessor, panoptic_head=dict( - num_things_classes=num_things_classes, - num_stuff_classes=num_stuff_classes, - loss_cls=dict(class_weight=[1.0] * 
num_classes + [0.1])), - panoptic_fusion_head=dict( - num_things_classes=num_things_classes, - num_stuff_classes=num_stuff_classes), - test_cfg=dict(panoptic_on=False)) + num_things_classes=num_things_classes, num_stuff_classes=num_stuff_classes, loss_cls=dict(class_weight=[1.0] * num_classes + [0.1]) + ), + panoptic_fusion_head=dict(num_things_classes=num_things_classes, num_stuff_classes=num_stuff_classes), + test_cfg=dict(panoptic_on=False), +) # dataset settings train_pipeline = [ - dict( - type='LoadImageFromFile', - to_float32=True, - backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", to_float32=True, backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomFlip", prob=0.5), # large scale jittering - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.1, 2.0), - resize_type='Resize', - keep_ratio=True), - dict( - type='RandomCrop', - crop_size=image_size, - crop_type='absolute', - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-5, 1e-5), by_mask=True), - dict(type='PackDetInputs') + dict(type="RandomResize", scale=image_size, ratio_range=(0.1, 2.0), resize_type="Resize", keep_ratio=True), + dict(type="RandomCrop", crop_size=image_size, crop_type="absolute", recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-5, 1e-5), by_mask=True), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict( - type='LoadImageFromFile', - to_float32=True, - backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", to_float32=True, backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" train_dataloader = dict( dataset=dict( - type=dataset_type, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), - pipeline=train_pipeline)) + type=dataset_type, ann_file="annotations/instances_train2017.json", data_prefix=dict(img="train2017/"), pipeline=train_pipeline + ) +) val_dataloader = dict( - dataset=dict( - type=dataset_type, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), - pipeline=test_pipeline)) + dataset=dict(type=dataset_type, ann_file="annotations/instances_val2017.json", data_prefix=dict(img="val2017/"), pipeline=test_pipeline) +) test_dataloader = val_dataloader val_evaluator = dict( _delete_=True, - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric=['bbox', 'segm'], + type="CocoMetric", + ann_file=data_root + "annotations/instances_val2017.json", + metric=["bbox", "segm"], format_only=False, - backend_args={{_base_.backend_args}}) + backend_args={{_base_.backend_args}}, +) test_evaluator = val_evaluator diff --git 
a/mmpose/configs/mmdet/mask2former/mask2former_swin-b-p4-w12-384-in21k_8xb2-lsj-50e_coco-panoptic.py b/mmpose/configs/mmdet/mask2former/mask2former_swin-b-p4-w12-384-in21k_8xb2-lsj-50e_coco-panoptic.py index b275f23175e8d8294b8bb76e9708dd014ef7030b..f61e00bb1f731e1ddca04af9d7bec40ce40250ab 100644 --- a/mmpose/configs/mmdet/mask2former/mask2former_swin-b-p4-w12-384-in21k_8xb2-lsj-50e_coco-panoptic.py +++ b/mmpose/configs/mmdet/mask2former/mask2former_swin-b-p4-w12-384-in21k_8xb2-lsj-50e_coco-panoptic.py @@ -1,5 +1,4 @@ -_base_ = ['./mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic.py'] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth' # noqa +_base_ = ["./mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic.py"] +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth" # noqa -model = dict( - backbone=dict(init_cfg=dict(type='Pretrained', checkpoint=pretrained))) +model = dict(backbone=dict(init_cfg=dict(type="Pretrained", checkpoint=pretrained))) diff --git a/mmpose/configs/mmdet/mask2former/mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic.py b/mmpose/configs/mmdet/mask2former/mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic.py index bd59400b4aed1aac97795e474633d5581705b899..938e0cac64760c317632046f0ab412e5de29c0ec 100644 --- a/mmpose/configs/mmdet/mask2former/mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic.py +++ b/mmpose/configs/mmdet/mask2former/mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic.py @@ -1,5 +1,5 @@ -_base_ = ['./mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco-panoptic.py'] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384.pth' # noqa +_base_ = ["./mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco-panoptic.py"] +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384.pth" # noqa depths = [2, 2, 18, 2] model = dict( @@ -9,8 +9,10 @@ model = dict( depths=depths, num_heads=[4, 8, 16, 32], window_size=12, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), - panoptic_head=dict(in_channels=[128, 256, 512, 1024])) + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), + panoptic_head=dict(in_channels=[128, 256, 512, 1024]), +) # set all layers in backbone to lr_mult=0.1 # set all norm layers, position_embeding, @@ -19,24 +21,22 @@ backbone_norm_multi = dict(lr_mult=0.1, decay_mult=0.0) backbone_embed_multi = dict(lr_mult=0.1, decay_mult=0.0) embed_multi = dict(lr_mult=1.0, decay_mult=0.0) custom_keys = { - 'backbone': dict(lr_mult=0.1, decay_mult=1.0), - 'backbone.patch_embed.norm': backbone_norm_multi, - 'backbone.norm': backbone_norm_multi, - 'absolute_pos_embed': backbone_embed_multi, - 'relative_position_bias_table': backbone_embed_multi, - 'query_embed': embed_multi, - 'query_feat': embed_multi, - 'level_embed': embed_multi + "backbone": dict(lr_mult=0.1, decay_mult=1.0), + "backbone.patch_embed.norm": backbone_norm_multi, + "backbone.norm": backbone_norm_multi, + "absolute_pos_embed": backbone_embed_multi, + "relative_position_bias_table": backbone_embed_multi, + "query_embed": embed_multi, + "query_feat": embed_multi, + "level_embed": embed_multi, } -custom_keys.update({ - f'backbone.stages.{stage_id}.blocks.{block_id}.norm': backbone_norm_multi - for stage_id, num_blocks in enumerate(depths) - for block_id in range(num_blocks) -}) -custom_keys.update({ - 
f'backbone.stages.{stage_id}.downsample.norm': backbone_norm_multi - for stage_id in range(len(depths) - 1) -}) +custom_keys.update( + { + f"backbone.stages.{stage_id}.blocks.{block_id}.norm": backbone_norm_multi + for stage_id, num_blocks in enumerate(depths) + for block_id in range(num_blocks) + } +) +custom_keys.update({f"backbone.stages.{stage_id}.downsample.norm": backbone_norm_multi for stage_id in range(len(depths) - 1)}) # optimizer -optim_wrapper = dict( - paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) +optim_wrapper = dict(paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) diff --git a/mmpose/configs/mmdet/mask2former/mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic.py b/mmpose/configs/mmdet/mask2former/mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic.py index e203ffc96c40098e4cf0788fc47b4438ebffbb41..74a18abaa527d43efce1b04a2f21db54d966f2e5 100644 --- a/mmpose/configs/mmdet/mask2former/mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic.py +++ b/mmpose/configs/mmdet/mask2former/mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic.py @@ -1,12 +1,10 @@ -_base_ = ['./mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic.py'] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa +_base_ = ["./mask2former_swin-b-p4-w12-384_8xb2-lsj-50e_coco-panoptic.py"] +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth" # noqa model = dict( - backbone=dict( - embed_dims=192, - num_heads=[6, 12, 24, 48], - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), - panoptic_head=dict(num_queries=200, in_channels=[192, 384, 768, 1536])) + backbone=dict(embed_dims=192, num_heads=[6, 12, 24, 48], init_cfg=dict(type="Pretrained", checkpoint=pretrained)), + panoptic_head=dict(num_queries=200, in_channels=[192, 384, 768, 1536]), +) train_dataloader = dict(batch_size=1, num_workers=1) @@ -19,7 +17,4 @@ param_scheduler = dict(end=max_iters, milestones=[655556, 710184]) # which means that we do evaluation at the end of training.' 
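# --- Editor's aside, not part of the patch: the `dynamic_intervals` expression
# below encodes exactly the evaluation schedule the comments describe. Plugging
# in the R-50 values given earlier in this diff (max_iters = 368750, interval = 5000):
max_iters = 368750
interval = 5000
# validate every `interval` iters up to the last full multiple, then once more at max_iters
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]
print(dynamic_intervals)  # [(365001, 368750)] -- matching the "365001th iteration" comment above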
interval = 5000
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]
-train_cfg = dict(
-    max_iters=max_iters,
-    val_interval=interval,
-    dynamic_intervals=dynamic_intervals)
+train_cfg = dict(max_iters=max_iters, val_interval=interval, dynamic_intervals=dynamic_intervals)
diff --git a/mmpose/configs/mmdet/mask2former/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco-panoptic.py b/mmpose/configs/mmdet/mask2former/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco-panoptic.py
index f9d081db58a74dd02b3b715c3777f077d42de7ca..7b5cb0b1db3e374ac868d8bbc6e7ae9f4d20de8d 100644
--- 
a/mmpose/configs/mmdet/mask2former/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco.py +++ b/mmpose/configs/mmdet/mask2former/mask2former_swin-s-p4-w7-224_8xb2-lsj-50e_coco.py @@ -1,11 +1,8 @@ -_base_ = ['./mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco.py'] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_small_patch4_window7_224.pth' # noqa +_base_ = ["./mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco.py"] +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_small_patch4_window7_224.pth" # noqa depths = [2, 2, 18, 2] -model = dict( - backbone=dict( - depths=depths, init_cfg=dict(type='Pretrained', - checkpoint=pretrained))) +model = dict(backbone=dict(depths=depths, init_cfg=dict(type="Pretrained", checkpoint=pretrained))) # set all layers in backbone to lr_mult=0.1 # set all norm layers, position_embeding, @@ -14,24 +11,22 @@ backbone_norm_multi = dict(lr_mult=0.1, decay_mult=0.0) backbone_embed_multi = dict(lr_mult=0.1, decay_mult=0.0) embed_multi = dict(lr_mult=1.0, decay_mult=0.0) custom_keys = { - 'backbone': dict(lr_mult=0.1, decay_mult=1.0), - 'backbone.patch_embed.norm': backbone_norm_multi, - 'backbone.norm': backbone_norm_multi, - 'absolute_pos_embed': backbone_embed_multi, - 'relative_position_bias_table': backbone_embed_multi, - 'query_embed': embed_multi, - 'query_feat': embed_multi, - 'level_embed': embed_multi + "backbone": dict(lr_mult=0.1, decay_mult=1.0), + "backbone.patch_embed.norm": backbone_norm_multi, + "backbone.norm": backbone_norm_multi, + "absolute_pos_embed": backbone_embed_multi, + "relative_position_bias_table": backbone_embed_multi, + "query_embed": embed_multi, + "query_feat": embed_multi, + "level_embed": embed_multi, } -custom_keys.update({ - f'backbone.stages.{stage_id}.blocks.{block_id}.norm': backbone_norm_multi - for stage_id, num_blocks in enumerate(depths) - for block_id in range(num_blocks) -}) -custom_keys.update({ - f'backbone.stages.{stage_id}.downsample.norm': backbone_norm_multi - for stage_id in range(len(depths) - 1) -}) +custom_keys.update( + { + f"backbone.stages.{stage_id}.blocks.{block_id}.norm": backbone_norm_multi + for stage_id, num_blocks in enumerate(depths) + for block_id in range(num_blocks) + } +) +custom_keys.update({f"backbone.stages.{stage_id}.downsample.norm": backbone_norm_multi for stage_id in range(len(depths) - 1)}) # optimizer -optim_wrapper = dict( - paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) +optim_wrapper = dict(paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) diff --git a/mmpose/configs/mmdet/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco-panoptic.py b/mmpose/configs/mmdet/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco-panoptic.py index 1c00d7a697f07ad618a0b4735432a0a74d4992a9..e7786b33f3bddee876af2dd40e08c5160c3b2b6b 100644 --- a/mmpose/configs/mmdet/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco-panoptic.py +++ b/mmpose/configs/mmdet/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco-panoptic.py @@ -1,12 +1,12 @@ -_base_ = ['./mask2former_r50_8xb2-lsj-50e_coco-panoptic.py'] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth' # noqa +_base_ = ["./mask2former_r50_8xb2-lsj-50e_coco-panoptic.py"] +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth" # noqa depths = [2, 2, 6, 2] model = dict( - type='Mask2Former', + type="Mask2Former", 
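# --- Editor's aside, not part of the patch: the `custom_keys` blocks reformatted
# throughout these Swin configs all build the same mapping: backbone norm layers
# and backbone position embeddings train at 0.1x LR with zero weight decay, while
# the query/level embeddings keep full LR but zero decay; keys are matched as
# substrings of parameter names by MMEngine's default optimizer constructor.
# What the comprehensions expand to for the tiny model (depths = [2, 2, 6, 2]):
depths = [2, 2, 6, 2]
backbone_norm_multi = dict(lr_mult=0.1, decay_mult=0.0)
custom_keys = {
    f"backbone.stages.{stage_id}.blocks.{block_id}.norm": backbone_norm_multi
    for stage_id, num_blocks in enumerate(depths)
    for block_id in range(num_blocks)
}
custom_keys.update({f"backbone.stages.{stage_id}.downsample.norm": backbone_norm_multi for stage_id in range(len(depths) - 1)})
print(len(custom_keys))        # 15 entries: 12 block norms + 3 downsample norms
print(sorted(custom_keys)[0])  # backbone.stages.0.blocks.0.norm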
backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", embed_dims=96, depths=depths, num_heads=[3, 6, 12, 24], @@ -14,18 +14,19 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(0, 1, 2, 3), with_cp=False, convert_weights=True, frozen_stages=-1, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), - panoptic_head=dict( - type='Mask2FormerHead', in_channels=[96, 192, 384, 768]), - init_cfg=None) + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), + panoptic_head=dict(type="Mask2FormerHead", in_channels=[96, 192, 384, 768]), + init_cfg=None, +) # set all layers in backbone to lr_mult=0.1 # set all norm layers, position_embeding, @@ -34,25 +35,23 @@ backbone_norm_multi = dict(lr_mult=0.1, decay_mult=0.0) backbone_embed_multi = dict(lr_mult=0.1, decay_mult=0.0) embed_multi = dict(lr_mult=1.0, decay_mult=0.0) custom_keys = { - 'backbone': dict(lr_mult=0.1, decay_mult=1.0), - 'backbone.patch_embed.norm': backbone_norm_multi, - 'backbone.norm': backbone_norm_multi, - 'absolute_pos_embed': backbone_embed_multi, - 'relative_position_bias_table': backbone_embed_multi, - 'query_embed': embed_multi, - 'query_feat': embed_multi, - 'level_embed': embed_multi + "backbone": dict(lr_mult=0.1, decay_mult=1.0), + "backbone.patch_embed.norm": backbone_norm_multi, + "backbone.norm": backbone_norm_multi, + "absolute_pos_embed": backbone_embed_multi, + "relative_position_bias_table": backbone_embed_multi, + "query_embed": embed_multi, + "query_feat": embed_multi, + "level_embed": embed_multi, } -custom_keys.update({ - f'backbone.stages.{stage_id}.blocks.{block_id}.norm': backbone_norm_multi - for stage_id, num_blocks in enumerate(depths) - for block_id in range(num_blocks) -}) -custom_keys.update({ - f'backbone.stages.{stage_id}.downsample.norm': backbone_norm_multi - for stage_id in range(len(depths) - 1) -}) +custom_keys.update( + { + f"backbone.stages.{stage_id}.blocks.{block_id}.norm": backbone_norm_multi + for stage_id, num_blocks in enumerate(depths) + for block_id in range(num_blocks) + } +) +custom_keys.update({f"backbone.stages.{stage_id}.downsample.norm": backbone_norm_multi for stage_id in range(len(depths) - 1)}) # optimizer -optim_wrapper = dict( - paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) +optim_wrapper = dict(paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) diff --git a/mmpose/configs/mmdet/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco.py b/mmpose/configs/mmdet/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco.py index 5bb9c21858ebe065691a8a963bf5dec85542fb57..6013f3ad58e7115dbc50f0d12b7ecf2ef27a0c2f 100644 --- a/mmpose/configs/mmdet/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco.py +++ b/mmpose/configs/mmdet/mask2former/mask2former_swin-t-p4-w7-224_8xb2-lsj-50e_coco.py @@ -1,11 +1,11 @@ -_base_ = ['./mask2former_r50_8xb2-lsj-50e_coco.py'] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth' # noqa +_base_ = ["./mask2former_r50_8xb2-lsj-50e_coco.py"] +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth" # noqa depths = [2, 2, 6, 2] model = dict( - type='Mask2Former', + type="Mask2Former", backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", embed_dims=96, depths=depths, num_heads=[3, 6, 12, 
24], @@ -13,18 +13,19 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(0, 1, 2, 3), with_cp=False, convert_weights=True, frozen_stages=-1, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), - panoptic_head=dict( - type='Mask2FormerHead', in_channels=[96, 192, 384, 768]), - init_cfg=None) + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), + panoptic_head=dict(type="Mask2FormerHead", in_channels=[96, 192, 384, 768]), + init_cfg=None, +) # set all layers in backbone to lr_mult=0.1 # set all norm layers, position_embeding, @@ -33,24 +34,22 @@ backbone_norm_multi = dict(lr_mult=0.1, decay_mult=0.0) backbone_embed_multi = dict(lr_mult=0.1, decay_mult=0.0) embed_multi = dict(lr_mult=1.0, decay_mult=0.0) custom_keys = { - 'backbone': dict(lr_mult=0.1, decay_mult=1.0), - 'backbone.patch_embed.norm': backbone_norm_multi, - 'backbone.norm': backbone_norm_multi, - 'absolute_pos_embed': backbone_embed_multi, - 'relative_position_bias_table': backbone_embed_multi, - 'query_embed': embed_multi, - 'query_feat': embed_multi, - 'level_embed': embed_multi + "backbone": dict(lr_mult=0.1, decay_mult=1.0), + "backbone.patch_embed.norm": backbone_norm_multi, + "backbone.norm": backbone_norm_multi, + "absolute_pos_embed": backbone_embed_multi, + "relative_position_bias_table": backbone_embed_multi, + "query_embed": embed_multi, + "query_feat": embed_multi, + "level_embed": embed_multi, } -custom_keys.update({ - f'backbone.stages.{stage_id}.blocks.{block_id}.norm': backbone_norm_multi - for stage_id, num_blocks in enumerate(depths) - for block_id in range(num_blocks) -}) -custom_keys.update({ - f'backbone.stages.{stage_id}.downsample.norm': backbone_norm_multi - for stage_id in range(len(depths) - 1) -}) +custom_keys.update( + { + f"backbone.stages.{stage_id}.blocks.{block_id}.norm": backbone_norm_multi + for stage_id, num_blocks in enumerate(depths) + for block_id in range(num_blocks) + } +) +custom_keys.update({f"backbone.stages.{stage_id}.downsample.norm": backbone_norm_multi for stage_id in range(len(depths) - 1)}) # optimizer -optim_wrapper = dict( - paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) +optim_wrapper = dict(paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) diff --git a/mmpose/configs/mmdet/mask2former_vis/mask2former_r101_8xb2-8e_youtubevis2019.py b/mmpose/configs/mmdet/mask2former_vis/mask2former_r101_8xb2-8e_youtubevis2019.py index 3ba4aea8eac72f347940fb12ac964e9bf67c2e0e..3f8150565839c3cf9ee491d72be7444b05e0b196 100644 --- a/mmpose/configs/mmdet/mask2former_vis/mask2former_r101_8xb2-8e_youtubevis2019.py +++ b/mmpose/configs/mmdet/mask2former_vis/mask2former_r101_8xb2-8e_youtubevis2019.py @@ -1,12 +1,11 @@ -_base_ = './mask2former_r50_8xb2-8e_youtubevis2019.py' +_base_ = "./mask2former_r50_8xb2-8e_youtubevis2019.py" model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101')), + backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'mask2former/mask2former_r101_8xb2-lsj-50e_coco/' - 'mask2former_r101_8xb2-lsj-50e_coco_20220426_100250-ecf181e2.pth')) + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "mask2former/mask2former_r101_8xb2-lsj-50e_coco/" + 
"mask2former_r101_8xb2-lsj-50e_coco_20220426_100250-ecf181e2.pth", + ), +) diff --git a/mmpose/configs/mmdet/mask2former_vis/mask2former_r101_8xb2-8e_youtubevis2021.py b/mmpose/configs/mmdet/mask2former_vis/mask2former_r101_8xb2-8e_youtubevis2021.py index 95f9ceeb38833aeef342e12178703db6901fe5f6..a642a30edaf40235bb1cf9191d86f608dfe27ea8 100644 --- a/mmpose/configs/mmdet/mask2former_vis/mask2former_r101_8xb2-8e_youtubevis2021.py +++ b/mmpose/configs/mmdet/mask2former_vis/mask2former_r101_8xb2-8e_youtubevis2021.py @@ -1,12 +1,11 @@ -_base_ = './mask2former_r50_8xb2-8e_youtubevis2021.py' +_base_ = "./mask2former_r50_8xb2-8e_youtubevis2021.py" model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101')), + backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'mask2former/mask2former_r101_8xb2-lsj-50e_coco/' - 'mask2former_r101_8xb2-lsj-50e_coco_20220426_100250-ecf181e2.pth')) + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "mask2former/mask2former_r101_8xb2-lsj-50e_coco/" + "mask2former_r101_8xb2-lsj-50e_coco_20220426_100250-ecf181e2.pth", + ), +) diff --git a/mmpose/configs/mmdet/mask2former_vis/mask2former_r50_8xb2-8e_youtubevis2019.py b/mmpose/configs/mmdet/mask2former_vis/mask2former_r50_8xb2-8e_youtubevis2019.py index 8dc03bf97a2ed2b90e097bbd9637a42bf4d64c35..34dcf7e9b3e72f331f3aa516e9b02147fade9f9e 100644 --- a/mmpose/configs/mmdet/mask2former_vis/mask2former_r50_8xb2-8e_youtubevis2019.py +++ b/mmpose/configs/mmdet/mask2former_vis/mask2former_r50_8xb2-8e_youtubevis2019.py @@ -1,28 +1,30 @@ -_base_ = ['../_base_/datasets/youtube_vis.py', '../_base_/default_runtime.py'] +_base_ = ["../_base_/datasets/youtube_vis.py", "../_base_/default_runtime.py"] num_classes = 40 num_frames = 2 model = dict( - type='Mask2FormerVideo', + type="Mask2FormerVideo", data_preprocessor=dict( - type='TrackDataPreprocessor', + type="TrackDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=True, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=-1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), track_head=dict( - type='Mask2FormerTrackHead', + type="Mask2FormerTrackHead", in_channels=[256, 512, 1024, 2048], # pass to pixel_decoder inside strides=[4, 8, 16, 32], feat_channels=256, @@ -32,143 +34,106 @@ model = dict( num_frames=num_frames, num_transformer_feat_level=3, pixel_decoder=dict( - type='MSDeformAttnPixelDecoder', + type="MSDeformAttnPixelDecoder", num_outs=3, - norm_cfg=dict(type='GN', num_groups=32), - act_cfg=dict(type='ReLU'), + norm_cfg=dict(type="GN", num_groups=32), + act_cfg=dict(type="ReLU"), encoder=dict( # DeformableDetrTransformerEncoder num_layers=6, layer_cfg=dict( # DeformableDetrTransformerEncoderLayer self_attn_cfg=dict( # MultiScaleDeformableAttention - embed_dims=256, - num_heads=8, - num_levels=3, - num_points=4, - im2col_step=128, - dropout=0.0, - batch_first=True), + embed_dims=256, num_heads=8, num_levels=3, num_points=4, 
im2col_step=128, dropout=0.0, batch_first=True + ), ffn_cfg=dict( - embed_dims=256, - feedforward_channels=1024, - num_fcs=2, - ffn_drop=0.0, - act_cfg=dict(type='ReLU', inplace=True)))), - positional_encoding=dict(num_feats=128, normalize=True)), + embed_dims=256, feedforward_channels=1024, num_fcs=2, ffn_drop=0.0, act_cfg=dict(type="ReLU", inplace=True) + ), + ), + ), + positional_encoding=dict(num_feats=128, normalize=True), + ), enforce_decoder_input_project=False, - positional_encoding=dict( - type='SinePositionalEncoding3D', num_feats=128, normalize=True), + positional_encoding=dict(type="SinePositionalEncoding3D", num_feats=128, normalize=True), transformer_decoder=dict( # Mask2FormerTransformerDecoder return_intermediate=True, num_layers=9, layer_cfg=dict( # Mask2FormerTransformerDecoderLayer - self_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.0, - batch_first=True), - cross_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.0, - batch_first=True), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, - num_fcs=2, - ffn_drop=0.0, - act_cfg=dict(type='ReLU', inplace=True))), - init_cfg=None), + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0, batch_first=True), # MultiheadAttention + cross_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0, batch_first=True), # MultiheadAttention + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, num_fcs=2, ffn_drop=0.0, act_cfg=dict(type="ReLU", inplace=True)), + ), + init_cfg=None, + ), loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=2.0, - reduction='mean', - class_weight=[1.0] * num_classes + [0.1]), - loss_mask=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - reduction='mean', - loss_weight=5.0), - loss_dice=dict( - type='DiceLoss', - use_sigmoid=True, - activate=True, - reduction='mean', - naive_dice=True, - eps=1.0, - loss_weight=5.0), + type="CrossEntropyLoss", use_sigmoid=False, loss_weight=2.0, reduction="mean", class_weight=[1.0] * num_classes + [0.1] + ), + loss_mask=dict(type="CrossEntropyLoss", use_sigmoid=True, reduction="mean", loss_weight=5.0), + loss_dice=dict(type="DiceLoss", use_sigmoid=True, activate=True, reduction="mean", naive_dice=True, eps=1.0, loss_weight=5.0), train_cfg=dict( num_points=12544, oversample_ratio=3.0, importance_sample_ratio=0.75, assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='ClassificationCost', weight=2.0), - dict( - type='CrossEntropyLossCost', - weight=5.0, - use_sigmoid=True), - dict(type='DiceCost', weight=5.0, pred_act=True, eps=1.0) - ]), - sampler=dict(type='MaskPseudoSampler'))), + dict(type="ClassificationCost", weight=2.0), + dict(type="CrossEntropyLossCost", weight=5.0, use_sigmoid=True), + dict(type="DiceCost", weight=5.0, pred_act=True, eps=1.0), + ], + ), + sampler=dict(type="MaskPseudoSampler"), + ), + ), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'mask2former/mask2former_r50_8xb2-lsj-50e_coco/' - 'mask2former_r50_8xb2-lsj-50e_coco_20220506_191028-41b088b6.pth')) + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "mask2former/mask2former_r50_8xb2-lsj-50e_coco/" + "mask2former_r50_8xb2-lsj-50e_coco_20220506_191028-41b088b6.pth", + ), +) # optimizer embed_multi = dict(lr_mult=1.0, decay_mult=0.0) optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict( - type='AdamW', - lr=0.0001, - weight_decay=0.05, - eps=1e-8, 
- betas=(0.9, 0.999)), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.05, eps=1e-8, betas=(0.9, 0.999)), paramwise_cfg=dict( custom_keys={ - 'backbone': dict(lr_mult=0.1, decay_mult=1.0), - 'query_embed': embed_multi, - 'query_feat': embed_multi, - 'level_embed': embed_multi, + "backbone": dict(lr_mult=0.1, decay_mult=1.0), + "query_embed": embed_multi, + "query_feat": embed_multi, + "level_embed": embed_multi, }, - norm_decay_mult=0.0), - clip_grad=dict(max_norm=0.01, norm_type=2)) + norm_decay_mult=0.0, + ), + clip_grad=dict(max_norm=0.01, norm_type=2), +) # learning policy max_iters = 6000 param_scheduler = dict( - type='MultiStepLR', + type="MultiStepLR", begin=0, end=max_iters, by_epoch=False, milestones=[ 4000, ], - gamma=0.1) + gamma=0.1, +) # runtime settings -train_cfg = dict( - type='IterBasedTrainLoop', max_iters=max_iters, val_interval=6001) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="IterBasedTrainLoop", max_iters=max_iters, val_interval=6001) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") -vis_backends = [dict(type='LocalVisBackend')] -visualizer = dict( - type='TrackLocalVisualizer', vis_backends=vis_backends, name='visualizer') +vis_backends = [dict(type="LocalVisBackend")] +visualizer = dict(type="TrackLocalVisualizer", vis_backends=vis_backends, name="visualizer") default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', by_epoch=False, save_last=True, interval=2000), - visualization=dict(type='TrackVisualizationHook', draw=False)) -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=False) + checkpoint=dict(type="CheckpointHook", by_epoch=False, save_last=True, interval=2000), + visualization=dict(type="TrackVisualizationHook", draw=False), +) +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=False) # evaluator -val_evaluator = dict( - type='YouTubeVISMetric', - metric='youtube_vis_ap', - outfile_prefix='./youtube_vis_results', - format_only=True) +val_evaluator = dict(type="YouTubeVISMetric", metric="youtube_vis_ap", outfile_prefix="./youtube_vis_results", format_only=True) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/mask2former_vis/mask2former_r50_8xb2-8e_youtubevis2021.py b/mmpose/configs/mmdet/mask2former_vis/mask2former_r50_8xb2-8e_youtubevis2021.py index 158fe52d20fccf162cb66202fbc9069ba0f4cb68..09d2433fc06ed0bfe4d476d04627346f3fd6eb0d 100644 --- a/mmpose/configs/mmdet/mask2former_vis/mask2former_r50_8xb2-8e_youtubevis2021.py +++ b/mmpose/configs/mmdet/mask2former_vis/mask2former_r50_8xb2-8e_youtubevis2021.py @@ -1,37 +1,31 @@ -_base_ = './mask2former_r50_8xb2-8e_youtubevis2019.py' +_base_ = "./mask2former_r50_8xb2-8e_youtubevis2019.py" -dataset_type = 'YouTubeVISDataset' -data_root = 'data/youtube_vis_2021/' +dataset_type = "YouTubeVISDataset" +data_root = "data/youtube_vis_2021/" dataset_version = data_root[-5:-1] # 2019 or 2021 train_dataloader = dict( - dataset=dict( - data_root=data_root, - dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2021_train.json')) + dataset=dict(data_root=data_root, dataset_version=dataset_version, ann_file="annotations/youtube_vis_2021_train.json") +) val_dataloader = dict( - dataset=dict( - data_root=data_root, - dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2021_valid.json')) + dataset=dict(data_root=data_root, dataset_version=dataset_version, ann_file="annotations/youtube_vis_2021_valid.json") +) test_dataloader = val_dataloader # 
learning policy max_iters = 8000 param_scheduler = dict( - type='MultiStepLR', + type="MultiStepLR", begin=0, end=max_iters, by_epoch=False, milestones=[ 5500, ], - gamma=0.1) + gamma=0.1, +) # runtime settings -train_cfg = dict( - type='IterBasedTrainLoop', max_iters=max_iters, val_interval=8001) +train_cfg = dict(type="IterBasedTrainLoop", max_iters=max_iters, val_interval=8001) -default_hooks = dict( - checkpoint=dict( - type='CheckpointHook', by_epoch=False, save_last=True, interval=500)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", by_epoch=False, save_last=True, interval=500)) diff --git a/mmpose/configs/mmdet/mask2former_vis/mask2former_swin-l-p4-w12-384-in21k_8xb2-8e_youtubevis2021.py b/mmpose/configs/mmdet/mask2former_vis/mask2former_swin-l-p4-w12-384-in21k_8xb2-8e_youtubevis2021.py index 94dcccf408dfb989ea264536a617a48ecc13171c..595d673b7ace8029d21a22178ca2476866dcd9db 100644 --- a/mmpose/configs/mmdet/mask2former_vis/mask2former_swin-l-p4-w12-384-in21k_8xb2-8e_youtubevis2021.py +++ b/mmpose/configs/mmdet/mask2former_vis/mask2former_swin-l-p4-w12-384-in21k_8xb2-8e_youtubevis2021.py @@ -1,10 +1,10 @@ -_base_ = ['./mask2former_r50_8xb2-8e_youtubevis2021.py'] +_base_ = ["./mask2former_r50_8xb2-8e_youtubevis2021.py"] depths = [2, 2, 18, 2] model = dict( - type='Mask2FormerVideo', + type="Mask2FormerVideo", backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=192, depths=depths, @@ -13,26 +13,25 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(0, 1, 2, 3), with_cp=False, convert_weights=True, frozen_stages=-1, - init_cfg=None), - track_head=dict( - type='Mask2FormerTrackHead', - in_channels=[192, 384, 768, 1536], - num_queries=200), + init_cfg=None, + ), + track_head=dict(type="Mask2FormerTrackHead", in_channels=[192, 384, 768, 1536], num_queries=200), init_cfg=dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmdetection/v3.0/mask2former/' - 'mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic/' - 'mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic_' - '20220407_104949-82f8d28d.pth')) + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/mask2former/" # noqa: E251 + "mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic/" + "mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic_" + "20220407_104949-82f8d28d.pth", + ), +) # set all layers in backbone to lr_mult=0.1 # set all norm layers, position_embeding, @@ -41,24 +40,22 @@ backbone_norm_multi = dict(lr_mult=0.1, decay_mult=0.0) backbone_embed_multi = dict(lr_mult=0.1, decay_mult=0.0) embed_multi = dict(lr_mult=1.0, decay_mult=0.0) custom_keys = { - 'backbone': dict(lr_mult=0.1, decay_mult=1.0), - 'backbone.patch_embed.norm': backbone_norm_multi, - 'backbone.norm': backbone_norm_multi, - 'absolute_pos_embed': backbone_embed_multi, - 'relative_position_bias_table': backbone_embed_multi, - 'query_embed': embed_multi, - 'query_feat': embed_multi, - 'level_embed': embed_multi + "backbone": dict(lr_mult=0.1, decay_mult=1.0), + "backbone.patch_embed.norm": backbone_norm_multi, + "backbone.norm": backbone_norm_multi, + "absolute_pos_embed": backbone_embed_multi, + "relative_position_bias_table": backbone_embed_multi, + "query_embed": embed_multi, + "query_feat": embed_multi, + "level_embed": embed_multi, } 
-custom_keys.update({ - f'backbone.stages.{stage_id}.blocks.{block_id}.norm': backbone_norm_multi - for stage_id, num_blocks in enumerate(depths) - for block_id in range(num_blocks) -}) -custom_keys.update({ - f'backbone.stages.{stage_id}.downsample.norm': backbone_norm_multi - for stage_id in range(len(depths) - 1) -}) +custom_keys.update( + { + f"backbone.stages.{stage_id}.blocks.{block_id}.norm": backbone_norm_multi + for stage_id, num_blocks in enumerate(depths) + for block_id in range(num_blocks) + } +) +custom_keys.update({f"backbone.stages.{stage_id}.downsample.norm": backbone_norm_multi for stage_id in range(len(depths) - 1)}) # optimizer -optim_wrapper = dict( - paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) +optim_wrapper = dict(paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101-caffe_fpn_1x_coco.py index 09808e4bcada43b1e935d5393894c7ba3401fc3d..34345d9cce5152797e93adf0f7c0bd001d843f1a 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101-caffe_fpn_1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './mask-rcnn_r50-caffe_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./mask-rcnn_r50-caffe_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101-caffe_fpn_ms-poly-3x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101-caffe_fpn_ms-poly-3x_coco.py index e723aea81ff82dfa842d7468e166f42ee9291669..c458f5aedaac566cf1b1198ac61a31adb3fb255c 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101-caffe_fpn_ms-poly-3x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101-caffe_fpn_ms-poly-3x_coco.py @@ -1,19 +1,13 @@ -_base_ = [ - '../common/ms-poly_3x_coco-instance.py', - '../_base_/models/mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms-poly_3x_coco-instance.py", "../_base_/models/mask-rcnn_r50_fpn.py"] model = dict( # use caffe img_norm - data_preprocessor=dict( - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False), + data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False), backbone=dict( depth=101, norm_cfg=dict(requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py index af91ff0b8349b0e9e658b69cf4c5dd138b7b8a5a..c5531e1923a112f3edcd2b51825891d3e6cb4e67 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './mask-rcnn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./mask-rcnn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git 
a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_2x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_2x_coco.py index a5599e7c4942b523d6500e2c7c8ad4638cab45c6..91a8dd0a6a2a99c12f89734a11a8d9707abe25b0 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_2x_coco.py @@ -1,6 +1,2 @@ -_base_ = './mask-rcnn_r50_fpn_2x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./mask-rcnn_r50_fpn_2x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py index 452351050238a4d4411b2bf6fc916e2d69804766..fc52cfbcb9d310406957ee55afd2d26b4175bc91 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,3 @@ -_base_ = './mask-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./mask-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_ms-poly-3x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_ms-poly-3x_coco.py index 384f6dcd3ca33cd91755b48dd525d747a358ee02..3dd7518c3fa8e0c5ca9abfcac35038c5d84ece84 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_ms-poly-3x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r101_fpn_ms-poly-3x_coco.py @@ -1,10 +1,3 @@ -_base_ = [ - '../common/ms-poly_3x_coco-instance.py', - '../_base_/models/mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms-poly_3x_coco-instance.py", "../_base_/models/mask-rcnn_r50_fpn.py"] -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py index 5b9219c9c1da8ca68cf7ada0881419b371a26a87..8ff7c80cfab9e502cf92ec6d857e2a921cc7c171 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r18_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,6 @@ -_base_ = './mask-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./mask-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py" model = dict( - backbone=dict( - depth=18, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[64, 128, 256, 512])) + backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")), + neck=dict(in_channels=[64, 128, 256, 512]), +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe-c4_1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe-c4_1x_coco.py index 9919f11c3fc7b68528bf6f690e39185d703aff43..a27deb9425cab59efa3522a3fb22696890d8d2bc 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe-c4_1x_coco.py +++ 
b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe-c4_1x_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50-caffe-c4.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50-caffe-c4.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_1x_coco.py index 4124f138d874def6810cea6c884a02eaacdf5f71..bc1588dc0d62b1ab27cbdac8931cdb3cb05c8bfc 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,13 +1,10 @@ -_base_ = './mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "./mask-rcnn_r50_fpn_1x_coco.py" model = dict( # use caffe img_norm - data_preprocessor=dict( - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False), + data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False), backbone=dict( norm_cfg=dict(requires_grad=False), - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-1x_coco.py index 7702ae14a9cc54686df6a3eadec5bc8cfeb8e0a8..2f2f05b44ddcbcf44e2b59e482a412f6e8664fc2 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-1x_coco.py @@ -1,28 +1,21 @@ -_base_ = './mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "./mask-rcnn_r50_fpn_1x_coco.py" model = dict( # use caffe img_norm - data_preprocessor=dict( - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False), + data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False), backbone=dict( norm_cfg=dict(requires_grad=False), - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs'), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py index 94d94dd3613e0599f51f113ccf12e568a5b29f8f..e1ac4bebe6d7337781455d84014afa35005d66a0 100644 --- 
a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py @@ -1,31 +1,20 @@ -_base_ = './mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "./mask-rcnn_r50_fpn_1x_coco.py" model = dict( # use caffe img_norm - data_preprocessor=dict( - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False), + data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False), backbone=dict( norm_cfg=dict(requires_grad=False), - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-2x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-2x_coco.py index dbf87bb8346dd351c8f16700df7b9640bcfa984a..e2c53a68af8a02223c1f83cf3973505d4fd7cbb6 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-2x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-2x_coco.py @@ -1,15 +1,8 @@ -_base_ = './mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py' +_base_ = "./mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py" train_cfg = dict(max_epochs=24) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=24, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=24, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-3x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-3x_coco.py index 45260e2e39b53c0107e257ef2d05a14f5d5c0323..d0190415bee2cb5028bb9721f3727e77124113c7 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-3x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-3x_coco.py @@ -1,15 +1,8 @@ -_base_ = './mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py' +_base_ = "./mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py" train_cfg = dict(max_epochs=36) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=24, - by_epoch=True, - milestones=[28, 34], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=24, by_epoch=True, milestones=[28, 34], gamma=0.1), ] diff 
--git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_poly-1x_coco_v1.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_poly-1x_coco_v1.py index 3baf00140ecfa57ea54b68b85ac826e14490daa4..0f2144e6905694907b5b5d8e98ab9cf98fcba4b0 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_poly-1x_coco_v1.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_poly-1x_coco_v1.py @@ -1,31 +1,17 @@ -_base_ = './mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "./mask-rcnn_r50_fpn_1x_coco.py" model = dict( # use caffe img_norm - data_preprocessor=dict( - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False), + data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False), backbone=dict( norm_cfg=dict(requires_grad=False), - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), - rpn_head=dict( - loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), + rpn_head=dict(loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0)), roi_head=dict( - bbox_roi_extractor=dict( - roi_layer=dict( - type='RoIAlign', - output_size=7, - sampling_ratio=2, - aligned=False)), - bbox_head=dict( - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)), - mask_roi_extractor=dict( - roi_layer=dict( - type='RoIAlign', - output_size=14, - sampling_ratio=2, - aligned=False)))) + bbox_roi_extractor=dict(roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2, aligned=False)), + bbox_head=dict(loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0)), + mask_roi_extractor=dict(roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=2, aligned=False)), + ), +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_1x-wandb_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_1x-wandb_coco.py index 28b125ccb94869aff2bb283e6533fd693c79a76e..6ce80054690d7afe8b13c9bae89d5b21147a2e90 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_1x-wandb_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_1x-wandb_coco.py @@ -1,10 +1,11 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -vis_backends = [dict(type='LocalVisBackend'), dict(type='WandbVisBackend')] +vis_backends = [dict(type="LocalVisBackend"), dict(type="WandbVisBackend")] visualizer = dict(vis_backends=vis_backends) # MMEngine support the following two ways, users can choose diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py index 0fc6b91aa895e044b3fc62a3cdedbc12a052e91b..847598b0cd4d27d74a59e4529072fd118b12ff43 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] 
diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_2x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_2x_coco.py index 87cb8b4bb7d2fbfcfe667e7bd6cfc08e01e28c1a..7f4cf9422a94499b58de752bd172fec96e4250cc 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_2x_coco.py @@ -1,5 +1,6 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_2x.py", + "../_base_/default_runtime.py", ] diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py index 7371b3646fdda7bdc1fcfcd44cf8a20df27c40b5..5089241e9e1cdaeb66072f646ebabbfbcb1878fb 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,20 +1,12 @@ -_base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../common/lsj-100e_coco-instance.py' -] +_base_ = ["../_base_/models/mask-rcnn_r50_fpn.py", "../common/lsj-100e_coco-instance.py"] image_size = (1024, 1024) -batch_augments = [ - dict(type='BatchFixedSizePad', size=image_size, pad_mask=True) -] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size, pad_mask=True)] model = dict(data_preprocessor=dict(batch_augments=batch_augments)) train_dataloader = dict(batch_size=8, num_workers=4) # Enable automatic-mixed-precision training with AmpOptimWrapper. -optim_wrapper = dict( - type='AmpOptimWrapper', - optimizer=dict( - type='SGD', lr=0.02 * 4, momentum=0.9, weight_decay=0.00004)) +optim_wrapper = dict(type="AmpOptimWrapper", optimizer=dict(type="SGD", lr=0.02 * 4, momentum=0.9, weight_decay=0.00004)) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_amp-1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_amp-1x_coco.py index a139c48b2091a3a40943ce7ec8301b06cea01d4f..c48e1a9ea4d2e4013c6036202d5e1d07e947b8ab 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_amp-1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_amp-1x_coco.py @@ -1,4 +1,4 @@ -_base_ = './mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "./mask-rcnn_r50_fpn_1x_coco.py" # Enable automatic-mixed-precision training with AmpOptimWrapper. 
-optim_wrapper = dict(type='AmpOptimWrapper') +optim_wrapper = dict(type="AmpOptimWrapper") diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_ms-poly-3x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_ms-poly-3x_coco.py index 417adc3cebb3acbcc987b3f0453a78204dde1ea9..1886d97d64beb7fe98647c7137bda5372b82c903 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_ms-poly-3x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_ms-poly-3x_coco.py @@ -1,4 +1 @@ -_base_ = [ - '../common/ms-poly_3x_coco-instance.py', - '../_base_/models/mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms-poly_3x_coco-instance.py", "../_base_/models/mask-rcnn_r50_fpn.py"] diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_poly-1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_poly-1x_coco.py index 826180ce0a831a1ee6206bd52ffa516df766136c..f52522269cb9f463bbcf133c10938b82af7d68ea 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_poly-1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_r50_fpn_poly-1x_coco.py @@ -1,18 +1,15 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs'), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py index 921ade81e30afb60a3a6f03d2f2aecef85767da8..bc93a20e7d3416adbb2d6597f71fe4cebe73c87a 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './mask-rcnn_r101_fpn_1x_coco.py' +_base_ = "./mask-rcnn_r101_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_2x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_2x_coco.py index db8157f80fac23f6216afbeefed6cb80398f7e0d..e427ccaf8204bf0638032e6e2215b24e6346f926 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_2x_coco.py @@ -1,14 +1,15 @@ -_base_ = './mask-rcnn_r101_fpn_2x_coco.py' +_base_ = "./mask-rcnn_r101_fpn_2x_coco.py" model = dict( 
backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_ms-poly-3x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_ms-poly-3x_coco.py index 83e5451f38cb01d3d30712f22633fed6234d06c9..610fa17ed45751869743d08854a6b6bed9604dc4 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_ms-poly-3x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x4d_fpn_ms-poly-3x_coco.py @@ -1,18 +1,16 @@ -_base_ = [ - '../common/ms-poly_3x_coco-instance.py', - '../_base_/models/mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms-poly_3x_coco-instance.py", "../_base_/models/mask-rcnn_r50_fpn.py"] model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_1x_coco.py index 3e9b1b6fe8fcb152d9ad22bc403da6e62e936f77..faa81be1d0300497df82e3180143a431e6ebf2ac 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_1x_coco.py @@ -1,22 +1,19 @@ -_base_ = './mask-rcnn_r101_fpn_1x_coco.py' +_base_ = "./mask-rcnn_r101_fpn_1x_coco.py" model = dict( # ResNeXt-101-32x8d model trained with Caffe2 at FB, # so the mean and std need to be changed. - data_preprocessor=dict( - mean=[103.530, 116.280, 123.675], - std=[57.375, 57.120, 58.395], - bgr_to_rgb=False), + data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[57.375, 57.120, 58.395], bgr_to_rgb=False), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=8, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), - style='pytorch', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnext101_32x8d'))) + norm_cfg=dict(type="BN", requires_grad=False), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnext101_32x8d"), + ), +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-1x_coco.py index 6ee204d90001edd3e8e08e4a59ba25dd1ec4195c..0424a41b53e78e3b30d6b4205134bb7431ca9998 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-1x_coco.py @@ -1,40 +1,29 @@ -_base_ = './mask-rcnn_r101_fpn_1x_coco.py' +_base_ = "./mask-rcnn_r101_fpn_1x_coco.py" model = dict( # ResNeXt-101-32x8d model trained with Caffe2 at FB, # so the mean and std need to be changed. 
- data_preprocessor=dict( - mean=[103.530, 116.280, 123.675], - std=[57.375, 57.120, 58.395], - bgr_to_rgb=False), + data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[57.375, 57.120, 58.395], bgr_to_rgb=False), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=8, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), - style='pytorch', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnext101_32x8d'))) + norm_cfg=dict(type="BN", requires_grad=False), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnext101_32x8d"), + ), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs'), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-3x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-3x_coco.py index 999a30c39fc083f26fe0cd9e2ec13bb4f6063268..bca68de53d622af8e9958754c3c3eec49990a6e9 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-3x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-32x8d_fpn_ms-poly-3x_coco.py @@ -1,25 +1,19 @@ -_base_ = [ - '../common/ms-poly_3x_coco-instance.py', - '../_base_/models/mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms-poly_3x_coco-instance.py", "../_base_/models/mask-rcnn_r50_fpn.py"] model = dict( # ResNeXt-101-32x8d model trained with Caffe2 at FB, # so the mean and std need to be changed. 
- data_preprocessor=dict( - mean=[103.530, 116.280, 123.675], - std=[57.375, 57.120, 58.395], - bgr_to_rgb=False), + data_preprocessor=dict(mean=[103.530, 116.280, 123.675], std=[57.375, 57.120, 58.395], bgr_to_rgb=False), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=8, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), - style='pytorch', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnext101_32x8d'))) + norm_cfg=dict(type="BN", requires_grad=False), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnext101_32x8d"), + ), +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_1x_coco.py index 2cbb658c1b053d6674694c1a09101e965d5724ba..e2245843cc9b1f83eab0108804e66029bb35aa8f 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './mask-rcnn_x101-32x4d_fpn_1x_coco.py' +_base_ = "./mask-rcnn_x101-32x4d_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_2x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_2x_coco.py index f21a55b00db77a3cf2386a738a3b8fb39bf2fa44..49ff8a8314127e164dc75188cd93783cd7ebf7be 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_2x_coco.py @@ -1,14 +1,15 @@ -_base_ = './mask-rcnn_x101-32x4d_fpn_2x_coco.py' +_base_ = "./mask-rcnn_x101-32x4d_fpn_2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_ms-poly_3x_coco.py b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_ms-poly_3x_coco.py index 09b49d47740b70c4a192d94a95b994d0a303f2d1..3b7593cf13da92cef26fa38291376ca80227e94e 100644 --- a/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_ms-poly_3x_coco.py +++ b/mmpose/configs/mmdet/mask_rcnn/mask-rcnn_x101-64x4d_fpn_ms-poly_3x_coco.py @@ -1,18 +1,16 @@ -_base_ = [ - '../common/ms-poly_3x_coco-instance.py', - '../_base_/models/mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms-poly_3x_coco-instance.py", "../_base_/models/mask-rcnn_r50_fpn.py"] model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', 
checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/maskformer/maskformer_r50_ms-16xb1-75e_coco.py b/mmpose/configs/mmdet/maskformer/maskformer_r50_ms-16xb1-75e_coco.py index 784ee7767bf1318e967444461028b49a38dc3dbc..feddd1cdca4a37a823f21f9b38746f8f66d4ef38 100644 --- a/mmpose/configs/mmdet/maskformer/maskformer_r50_ms-16xb1-75e_coco.py +++ b/mmpose/configs/mmdet/maskformer/maskformer_r50_ms-16xb1-75e_coco.py @@ -1,9 +1,7 @@ -_base_ = [ - '../_base_/datasets/coco_panoptic.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_panoptic.py", "../_base_/default_runtime.py"] data_preprocessor = dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, @@ -11,26 +9,28 @@ data_preprocessor = dict( pad_mask=True, mask_pad_value=0, pad_seg=True, - seg_pad_value=255) + seg_pad_value=255, +) num_things_classes = 80 num_stuff_classes = 53 num_classes = num_things_classes + num_stuff_classes model = dict( - type='MaskFormer', + type="MaskFormer", data_preprocessor=data_preprocessor, backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=-1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), panoptic_head=dict( - type='MaskFormerHead', + type="MaskFormerHead", in_channels=[256, 512, 1024, 2048], # pass to pixel_decoder inside feat_channels=256, out_channels=256, @@ -38,82 +38,55 @@ model = dict( num_stuff_classes=num_stuff_classes, num_queries=100, pixel_decoder=dict( - type='TransformerEncoderPixelDecoder', - norm_cfg=dict(type='GN', num_groups=32), - act_cfg=dict(type='ReLU'), + type="TransformerEncoderPixelDecoder", + norm_cfg=dict(type="GN", num_groups=32), + act_cfg=dict(type="ReLU"), encoder=dict( # DetrTransformerEncoder num_layers=6, layer_cfg=dict( # DetrTransformerEncoderLayer - self_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.1, - batch_first=True), + self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.1, batch_first=True), # MultiheadAttention ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, - num_fcs=2, - ffn_drop=0.1, - act_cfg=dict(type='ReLU', inplace=True)))), - positional_encoding=dict(num_feats=128, normalize=True)), + embed_dims=256, feedforward_channels=2048, num_fcs=2, ffn_drop=0.1, act_cfg=dict(type="ReLU", inplace=True) + ), + ), + ), + positional_encoding=dict(num_feats=128, normalize=True), + ), enforce_decoder_input_project=False, positional_encoding=dict(num_feats=128, normalize=True), transformer_decoder=dict( # DetrTransformerDecoder num_layers=6, layer_cfg=dict( # DetrTransformerDecoderLayer - self_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.1, - batch_first=True), - cross_attn_cfg=dict( # MultiheadAttention - embed_dims=256, - num_heads=8, - dropout=0.1, - batch_first=True), - ffn_cfg=dict( - embed_dims=256, - feedforward_channels=2048, - num_fcs=2, - ffn_drop=0.1, - act_cfg=dict(type='ReLU', inplace=True))), - return_intermediate=True), + self_attn_cfg=dict(embed_dims=256, num_heads=8, 
dropout=0.1, batch_first=True), # MultiheadAttention + cross_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.1, batch_first=True), # MultiheadAttention + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, num_fcs=2, ffn_drop=0.1, act_cfg=dict(type="ReLU", inplace=True)), + ), + return_intermediate=True, + ), loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0, - reduction='mean', - class_weight=[1.0] * num_classes + [0.1]), - loss_mask=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - reduction='mean', - loss_weight=20.0), - loss_dice=dict( - type='DiceLoss', - use_sigmoid=True, - activate=True, - reduction='mean', - naive_dice=True, - eps=1.0, - loss_weight=1.0)), + type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0, reduction="mean", class_weight=[1.0] * num_classes + [0.1] + ), + loss_mask=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, reduction="mean", loss_weight=20.0), + loss_dice=dict(type="DiceLoss", use_sigmoid=True, activate=True, reduction="mean", naive_dice=True, eps=1.0, loss_weight=1.0), + ), panoptic_fusion_head=dict( - type='MaskFormerFusionHead', + type="MaskFormerFusionHead", num_things_classes=num_things_classes, num_stuff_classes=num_stuff_classes, loss_panoptic=None, - init_cfg=None), + init_cfg=None, + ), train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='ClassificationCost', weight=1.0), - dict(type='FocalLossCost', weight=20.0, binary_input=True), - dict(type='DiceCost', weight=1.0, pred_act=True, eps=1.0) - ]), - sampler=dict(type='MaskPseudoSampler')), + dict(type="ClassificationCost", weight=1.0), + dict(type="FocalLossCost", weight=20.0, binary_input=True), + dict(type="DiceCost", weight=1.0, pred_act=True, eps=1.0), + ], + ), + sampler=dict(type="MaskPseudoSampler"), + ), test_cfg=dict( panoptic_on=True, # For now, the dataset does not support @@ -126,51 +99,65 @@ model = dict( iou_thr=0.8, # In MaskFormer's panoptic postprocessing, # it will not filter masks whose score is smaller than 0.5 . 
- filter_low_score=False), - init_cfg=None) + filter_low_score=False, + ), + init_cfg=None, +) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='LoadPanopticAnnotations', - with_bbox=True, - with_mask=True, - with_seg=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadPanopticAnnotations", with_bbox=True, with_mask=True, with_seg=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', - transforms=[[ - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoice", + transforms=[ + [ + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) + ], + [ + dict(type="RandomChoiceResize", scales=[(400, 1333), (500, 1333), (600, 1333)], keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], ], - [ - dict( - type='RandomChoiceResize', - scales=[(400, 1333), (500, 1333), (600, 1333)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), - (576, 1333), (608, 1333), (640, 1333), - (672, 1333), (704, 1333), (736, 1333), - (768, 1333), (800, 1333)], - keep_ratio=True) - ]]), - dict(type='PackDetInputs') + ), + dict(type="PackDetInputs"), ] -train_dataloader = dict( - batch_size=1, num_workers=1, dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=1, num_workers=1, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(batch_size=1, num_workers=1) @@ -178,36 +165,22 @@ test_dataloader = val_dataloader # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict( - type='AdamW', - lr=0.0001, - weight_decay=0.0001, - eps=1e-8, - betas=(0.9, 0.999)), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001, eps=1e-8, betas=(0.9, 0.999)), paramwise_cfg=dict( - custom_keys={ - 'backbone': dict(lr_mult=0.1, decay_mult=1.0), - 'query_embed': dict(lr_mult=1.0, decay_mult=0.0) - }, - norm_decay_mult=0.0), - clip_grad=dict(max_norm=0.01, norm_type=2)) + custom_keys={"backbone": dict(lr_mult=0.1, decay_mult=1.0), "query_embed": dict(lr_mult=1.0, decay_mult=0.0)}, norm_decay_mult=0.0 + ), + clip_grad=dict(max_norm=0.01, norm_type=2), +) max_epochs = 75 # learning rate -param_scheduler = dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[50], - gamma=0.1) +param_scheduler = dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[50], gamma=0.1) -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # Default setting for scaling LR 
automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/maskformer/maskformer_swin-l-p4-w12_64xb1-ms-300e_coco.py b/mmpose/configs/mmdet/maskformer/maskformer_swin-l-p4-w12_64xb1-ms-300e_coco.py index 9e4897f26d47c049f8791169867c2df307b87f61..841c6826cd82562503eff8371df377f3c8dc12fb 100644 --- a/mmpose/configs/mmdet/maskformer/maskformer_swin-l-p4-w12_64xb1-ms-300e_coco.py +++ b/mmpose/configs/mmdet/maskformer/maskformer_swin-l-p4-w12_64xb1-ms-300e_coco.py @@ -1,11 +1,11 @@ -_base_ = './maskformer_r50_ms-16xb1-75e_coco.py' +_base_ = "./maskformer_r50_ms-16xb1-75e_coco.py" -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth" # noqa depths = [2, 2, 18, 2] model = dict( backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=192, patch_size=4, @@ -15,22 +15,21 @@ model = dict( num_heads=[6, 12, 24, 48], qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(0, 1, 2, 3), with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), panoptic_head=dict( in_channels=[192, 384, 768, 1536], # pass to pixel_decoder inside - pixel_decoder=dict( - _delete_=True, - type='PixelDecoder', - norm_cfg=dict(type='GN', num_groups=32), - act_cfg=dict(type='ReLU')), - enforce_decoder_input_project=True)) + pixel_decoder=dict(_delete_=True, type="PixelDecoder", norm_cfg=dict(type="GN", num_groups=32), act_cfg=dict(type="ReLU")), + enforce_decoder_input_project=True, + ), +) # optimizer @@ -40,29 +39,20 @@ model = dict( embed_multi = dict(lr_mult=1.0, decay_mult=0.0) norm_multi = dict(lr_mult=1.0, decay_mult=0.0) custom_keys = { - 'norm': norm_multi, - 'absolute_pos_embed': embed_multi, - 'relative_position_bias_table': embed_multi, - 'query_embed': embed_multi + "norm": norm_multi, + "absolute_pos_embed": embed_multi, + "relative_position_bias_table": embed_multi, + "query_embed": embed_multi, } -optim_wrapper = dict( - optimizer=dict(lr=6e-5, weight_decay=0.01), - paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) +optim_wrapper = dict(optimizer=dict(lr=6e-5, weight_decay=0.01), paramwise_cfg=dict(custom_keys=custom_keys, norm_decay_mult=0.0)) max_epochs = 300 # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=1e-6, by_epoch=False, begin=0, end=1500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[250], - gamma=0.1) + dict(type="LinearLR", start_factor=1e-6, by_epoch=False, begin=0, end=1500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[250], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) diff --git a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r101_fpn_8xb1-12e_youtubevis2019.py b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r101_fpn_8xb1-12e_youtubevis2019.py index 4be492d5419b8598120faa29eed44eada0fb5ba2..ec7286542bd9c3b74a9e71a073c1741bb0cdf5e4 100644 --- a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r101_fpn_8xb1-12e_youtubevis2019.py +++ b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r101_fpn_8xb1-12e_youtubevis2019.py @@ 
-1,12 +1,10 @@ -_base_ = ['./masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py'] +_base_ = ["./masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py"] model = dict( detector=dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', checkpoint='torchvision://resnet101')), + backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")), init_cfg=dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r101_fpn_1x_coco/mask_rcnn_r101_fpn_1x_coco_20200204-1efe0ed5.pth' # noqa: E501 - ))) + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r101_fpn_1x_coco/mask_rcnn_r101_fpn_1x_coco_20200204-1efe0ed5.pth", # noqa: E501 + ), + ) +) diff --git a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r101_fpn_8xb1-12e_youtubevis2021.py b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r101_fpn_8xb1-12e_youtubevis2021.py index 81bae4af8d8945a024cd498a001e52059741f8a9..d6956eb350237589e2ca71db2c4b42783872f1d4 100644 --- a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r101_fpn_8xb1-12e_youtubevis2021.py +++ b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r101_fpn_8xb1-12e_youtubevis2021.py @@ -1,28 +1,22 @@ -_base_ = ['./masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py'] +_base_ = ["./masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py"] model = dict( detector=dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', checkpoint='torchvision://resnet101')), + backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")), init_cfg=dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r101_fpn_1x_coco/mask_rcnn_r101_fpn_1x_coco_20200204-1efe0ed5.pth' # noqa: E501 - ))) + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r101_fpn_1x_coco/mask_rcnn_r101_fpn_1x_coco_20200204-1efe0ed5.pth", # noqa: E501 + ), + ) +) -data_root = 'data/youtube_vis_2021/' +data_root = "data/youtube_vis_2021/" dataset_version = data_root[-5:-1] # dataloader train_dataloader = dict( - dataset=dict( - data_root=data_root, - dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2021_train.json')) + dataset=dict(data_root=data_root, dataset_version=dataset_version, ann_file="annotations/youtube_vis_2021_train.json") +) val_dataloader = dict( - dataset=dict( - data_root=data_root, - dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2021_valid.json')) + dataset=dict(data_root=data_root, dataset_version=dataset_version, ann_file="annotations/youtube_vis_2021_valid.json") +) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py index db1be7b0ddf00a07ce6e06e4e179059e68c103a3..ca6dbcbc1688137f9f0491d6f74fc54a9ded5cc4 100644 --- a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py +++ b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py @@ -1,10 +1,7 @@ -_base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/youtube_vis.py', '../_base_/default_runtime.py' -] +_base_ = 
["../_base_/models/mask-rcnn_r50_fpn.py", "../_base_/datasets/youtube_vis.py", "../_base_/default_runtime.py"] detector = _base_.model -detector.pop('data_preprocessor') +detector.pop("data_preprocessor") detector.roi_head.bbox_head.update(dict(num_classes=40)) detector.roi_head.mask_head.update(dict(num_classes=40)) detector.train_cfg.rpn.sampler.update(dict(num=64)) @@ -12,59 +9,46 @@ detector.train_cfg.rpn_proposal.update(dict(nms_pre=200, max_per_img=200)) detector.train_cfg.rcnn.sampler.update(dict(num=128)) detector.test_cfg.rpn.update(dict(nms_pre=200, max_per_img=200)) detector.test_cfg.rcnn.update(dict(score_thr=0.01)) -detector['init_cfg'] = dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth' # noqa: E501 +detector["init_cfg"] = dict( + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth", # noqa: E251 # noqa: E501 ) del _base_.model model = dict( - type='MaskTrackRCNN', + type="MaskTrackRCNN", data_preprocessor=dict( - type='TrackDataPreprocessor', + type="TrackDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=True, - pad_size_divisor=32), + pad_size_divisor=32, + ), detector=detector, track_head=dict( - type='RoITrackHead', + type="RoITrackHead", roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), - embed_head=dict( - type='RoIEmbedHead', - num_fcs=2, - roi_feat_size=7, - in_channels=256, - fc_out_channels=1024), + featmap_strides=[4, 8, 16, 32], + ), + embed_head=dict(type="RoIEmbedHead", num_fcs=2, roi_feat_size=7, in_channels=256, fc_out_channels=1024), train_cfg=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=128, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=128, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False)), - tracker=dict( - type='MaskTrackRCNNTracker', - match_weights=dict(det_score=1.0, iou=2.0, det_label=10.0), - num_frames_retain=20)) + debug=False, + ), + ), + tracker=dict(type="MaskTrackRCNNTracker", match_weights=dict(det_score=1.0, iou=2.0, det_label=10.0), num_frames_retain=20), +) -dataset_type = 'YouTubeVISDataset' -data_root = 'data/youtube_vis_2019/' +dataset_type = "YouTubeVISDataset" +data_root = "data/youtube_vis_2019/" dataset_version = data_root[-5:-1] # 2019 or 2021 # train_dataloader @@ -73,58 +57,42 @@ train_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, - sampler=dict(type='TrackImgSampler'), # image-based sampling - batch_sampler=dict(type='TrackAspectRatioBatchSampler'), + sampler=dict(type="TrackImgSampler"), # image-based sampling + batch_sampler=dict(type="TrackAspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2019_train.json', - 
data_prefix=dict(img_path='train/JPEGImages'), - pipeline=_base_.train_pipeline)) + ann_file="annotations/youtube_vis_2019_train.json", + data_prefix=dict(img_path="train/JPEGImages"), + pipeline=_base_.train_pipeline, + ), +) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.00125, momentum=0.9, weight_decay=0.0001), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.00125, momentum=0.9, weight_decay=0.0001), clip_grad=dict(max_norm=35, norm_type=2) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 3.0, - by_epoch=False, - begin=0, - end=500), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 3.0, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # visualizer -default_hooks = dict( - visualization=dict(type='TrackVisualizationHook', draw=False)) +default_hooks = dict(visualization=dict(type="TrackVisualizationHook", draw=False)) -vis_backends = [dict(type='LocalVisBackend')] -visualizer = dict( - type='TrackLocalVisualizer', vis_backends=vis_backends, name='visualizer') +vis_backends = [dict(type="LocalVisBackend")] +visualizer = dict(type="TrackLocalVisualizer", vis_backends=vis_backends, name="visualizer") # runtime settings -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_begin=13) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=12, val_begin=13) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # evaluator -val_evaluator = dict( - type='YouTubeVISMetric', - metric='youtube_vis_ap', - outfile_prefix='./youtube_vis_results', - format_only=True) +val_evaluator = dict(type="YouTubeVISMetric", metric="youtube_vis_ap", outfile_prefix="./youtube_vis_results", format_only=True) test_evaluator = val_evaluator del detector diff --git a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2021.py b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2021.py index 47263d5091c3b5b76056373558ce9a0a97bb071b..cca201f1f7580845394f3d5a408bf66d9128ca29 100644 --- a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2021.py +++ b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2021.py @@ -1,17 +1,13 @@ -_base_ = ['./masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py'] +_base_ = ["./masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py"] -data_root = 'data/youtube_vis_2021/' +data_root = "data/youtube_vis_2021/" dataset_version = data_root[-5:-1] # dataloader train_dataloader = dict( - dataset=dict( - data_root=data_root, - dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2021_train.json')) + dataset=dict(data_root=data_root, dataset_version=dataset_version, ann_file="annotations/youtube_vis_2021_train.json") +) val_dataloader = dict( - dataset=dict( - data_root=data_root, - dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2021_valid.json')) + dataset=dict(data_root=data_root, dataset_version=dataset_version, ann_file="annotations/youtube_vis_2021_valid.json") +) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_x101_fpn_8xb1-12e_youtubevis2019.py 
b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_x101_fpn_8xb1-12e_youtubevis2019.py index e7e3f11e13a3a20ba8e4311963db558a9e4fd247..30625ad8664ac786b2bf2d48fe12c7ce6039ab2c 100644 --- a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_x101_fpn_8xb1-12e_youtubevis2019.py +++ b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_x101_fpn_8xb1-12e_youtubevis2019.py @@ -1,16 +1,12 @@ -_base_ = ['./masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py'] +_base_ = ["./masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py"] model = dict( detector=dict( backbone=dict( - type='ResNeXt', - depth=101, - groups=64, - base_width=4, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://resnext101_64x4d')), + type="ResNeXt", depth=101, groups=64, base_width=4, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d") + ), init_cfg=dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_x101_64x4d_fpn_1x_coco/mask_rcnn_x101_64x4d_fpn_1x_coco_20200201-9352eb0d.pth' # noqa: E501 - ))) + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_x101_64x4d_fpn_1x_coco/mask_rcnn_x101_64x4d_fpn_1x_coco_20200201-9352eb0d.pth", # noqa: E501 + ), + ) +) diff --git a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_x101_fpn_8xb1-12e_youtubevis2021.py b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_x101_fpn_8xb1-12e_youtubevis2021.py index ea4c8b92483292cc7de1b2f321d4d514427f3cb5..4c00ce4ae105822271fe7dd8139836bef032891a 100644 --- a/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_x101_fpn_8xb1-12e_youtubevis2021.py +++ b/mmpose/configs/mmdet/masktrack_rcnn/masktrack-rcnn_mask-rcnn_x101_fpn_8xb1-12e_youtubevis2021.py @@ -1,32 +1,24 @@ -_base_ = ['./masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py'] +_base_ = ["./masktrack-rcnn_mask-rcnn_r50_fpn_8xb1-12e_youtubevis2019.py"] model = dict( detector=dict( backbone=dict( - type='ResNeXt', - depth=101, - groups=64, - base_width=4, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://resnext101_64x4d')), + type="ResNeXt", depth=101, groups=64, base_width=4, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d") + ), init_cfg=dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_x101_64x4d_fpn_1x_coco/mask_rcnn_x101_64x4d_fpn_1x_coco_20200201-9352eb0d.pth' # noqa: E501 - ))) + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_x101_64x4d_fpn_1x_coco/mask_rcnn_x101_64x4d_fpn_1x_coco_20200201-9352eb0d.pth", # noqa: E501 + ), + ) +) -data_root = 'data/youtube_vis_2021/' +data_root = "data/youtube_vis_2021/" dataset_version = data_root[-5:-1] # dataloader train_dataloader = dict( - dataset=dict( - data_root=data_root, - dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2021_train.json')) + dataset=dict(data_root=data_root, dataset_version=dataset_version, ann_file="annotations/youtube_vis_2021_train.json") +) val_dataloader = dict( - dataset=dict( - data_root=data_root, - dataset_version=dataset_version, - ann_file='annotations/youtube_vis_2021_valid.json')) + dataset=dict(data_root=data_root, dataset_version=dataset_version, ann_file="annotations/youtube_vis_2021_valid.json") +) test_dataloader = val_dataloader diff --git 
a/mmpose/configs/mmdet/misc/d2_faster-rcnn_r50-caffe_fpn_ms-90k_coco.py b/mmpose/configs/mmdet/misc/d2_faster-rcnn_r50-caffe_fpn_ms-90k_coco.py index d93e1562606b3d6bd657454c99220d329c526f30..60a30071f653bd334af33d88ca99a0e4dfb99cc3 100644 --- a/mmpose/configs/mmdet/misc/d2_faster-rcnn_r50-caffe_fpn_ms-90k_coco.py +++ b/mmpose/configs/mmdet/misc/d2_faster-rcnn_r50-caffe_fpn_ms-90k_coco.py @@ -1,43 +1,36 @@ -_base_ = '../common/ms-90k_coco.py' +_base_ = "../common/ms-90k_coco.py" # model settings model = dict( - type='Detectron2Wrapper', + type="Detectron2Wrapper", bgr_to_rgb=False, detector=dict( # The settings in `d2_detector` will merged into default settings # in detectron2. More details please refer to # https://github.com/facebookresearch/detectron2/blob/main/detectron2/config/defaults.py # noqa - meta_architecture='GeneralizedRCNN', + meta_architecture="GeneralizedRCNN", # If you want to finetune the detector, you can use the # checkpoint released by detectron2, for example: # weights='detectron2://COCO-Detection/faster_rcnn_R_50_FPN_1x/137257794/model_final_b275ba.pkl' # noqa - weights='detectron2://ImageNetPretrained/MSRA/R-50.pkl', + weights="detectron2://ImageNetPretrained/MSRA/R-50.pkl", mask_on=False, pixel_mean=[103.530, 116.280, 123.675], pixel_std=[1.0, 1.0, 1.0], - backbone=dict(name='build_resnet_fpn_backbone', freeze_at=2), - resnets=dict( - depth=50, - out_features=['res2', 'res3', 'res4', 'res5'], - num_groups=1, - norm='FrozenBN'), - fpn=dict( - in_features=['res2', 'res3', 'res4', 'res5'], out_channels=256), + backbone=dict(name="build_resnet_fpn_backbone", freeze_at=2), + resnets=dict(depth=50, out_features=["res2", "res3", "res4", "res5"], num_groups=1, norm="FrozenBN"), + fpn=dict(in_features=["res2", "res3", "res4", "res5"], out_channels=256), anchor_generator=dict( - name='DefaultAnchorGenerator', - sizes=[[32], [64], [128], [256], [512]], - aspect_ratios=[[0.5, 1.0, 2.0]], - angles=[[-90, 0, 90]]), - proposal_generator=dict(name='RPN'), + name="DefaultAnchorGenerator", sizes=[[32], [64], [128], [256], [512]], aspect_ratios=[[0.5, 1.0, 2.0]], angles=[[-90, 0, 90]] + ), + proposal_generator=dict(name="RPN"), rpn=dict( - head_name='StandardRPNHead', - in_features=['p2', 'p3', 'p4', 'p5', 'p6'], + head_name="StandardRPNHead", + in_features=["p2", "p3", "p4", "p5", "p6"], iou_thresholds=[0.3, 0.7], iou_labels=[0, -1, 1], batch_size_per_image=256, positive_fraction=0.5, - bbox_reg_loss_type='smooth_l1', + bbox_reg_loss_type="smooth_l1", bbox_reg_loss_weight=1.0, bbox_reg_weights=(1.0, 1.0, 1.0, 1.0), smooth_l1_beta=0.0, @@ -48,28 +41,33 @@ model = dict( pre_nms_topk_test=1000, post_nms_topk_test=1000, nms_thresh=0.7, - conv_dims=[-1]), + conv_dims=[-1], + ), roi_heads=dict( - name='StandardROIHeads', + name="StandardROIHeads", num_classes=80, - in_features=['p2', 'p3', 'p4', 'p5'], + in_features=["p2", "p3", "p4", "p5"], iou_thresholds=[0.5], iou_labels=[0, 1], batch_size_per_image=512, positive_fraction=0.25, score_thresh_test=0.05, nms_thresh_test=0.5, - proposal_append_gt=True), + proposal_append_gt=True, + ), roi_box_head=dict( - name='FastRCNNConvFCHead', + name="FastRCNNConvFCHead", num_fc=2, fc_dim=1024, conv_dim=256, - pooler_type='ROIAlignV2', + pooler_type="ROIAlignV2", pooler_resolution=7, pooler_sampling_ratio=0, - bbox_reg_loss_type='smooth_l1', + bbox_reg_loss_type="smooth_l1", bbox_reg_loss_weight=1.0, bbox_reg_weights=(10.0, 10.0, 5.0, 5.0), smooth_l1_beta=0.0, - cls_agnostic_bbox_reg=False))) + cls_agnostic_bbox_reg=False, + ), + ), +) diff --git 
a/mmpose/configs/mmdet/misc/d2_mask-rcnn_r50-caffe_fpn_ms-90k_coco.py b/mmpose/configs/mmdet/misc/d2_mask-rcnn_r50-caffe_fpn_ms-90k_coco.py index c0919c4593f028445dc033e85314320f88409a54..5a7fb803798b22333f09fd2836029967997e4254 100644 --- a/mmpose/configs/mmdet/misc/d2_mask-rcnn_r50-caffe_fpn_ms-90k_coco.py +++ b/mmpose/configs/mmdet/misc/d2_mask-rcnn_r50-caffe_fpn_ms-90k_coco.py @@ -1,43 +1,36 @@ -_base_ = '../common/ms-poly-90k_coco-instance.py' +_base_ = "../common/ms-poly-90k_coco-instance.py" # model settings model = dict( - type='Detectron2Wrapper', + type="Detectron2Wrapper", bgr_to_rgb=False, detector=dict( # The settings in `d2_detector` will merged into default settings # in detectron2. More details please refer to # https://github.com/facebookresearch/detectron2/blob/main/detectron2/config/defaults.py # noqa - meta_architecture='GeneralizedRCNN', + meta_architecture="GeneralizedRCNN", # If you want to finetune the detector, you can use the # checkpoint released by detectron2, for example: # weights='detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x/137260431/model_final_a54504.pkl' # noqa - weights='detectron2://ImageNetPretrained/MSRA/R-50.pkl', + weights="detectron2://ImageNetPretrained/MSRA/R-50.pkl", mask_on=True, pixel_mean=[103.530, 116.280, 123.675], pixel_std=[1.0, 1.0, 1.0], - backbone=dict(name='build_resnet_fpn_backbone', freeze_at=2), - resnets=dict( - depth=50, - out_features=['res2', 'res3', 'res4', 'res5'], - num_groups=1, - norm='FrozenBN'), - fpn=dict( - in_features=['res2', 'res3', 'res4', 'res5'], out_channels=256), + backbone=dict(name="build_resnet_fpn_backbone", freeze_at=2), + resnets=dict(depth=50, out_features=["res2", "res3", "res4", "res5"], num_groups=1, norm="FrozenBN"), + fpn=dict(in_features=["res2", "res3", "res4", "res5"], out_channels=256), anchor_generator=dict( - name='DefaultAnchorGenerator', - sizes=[[32], [64], [128], [256], [512]], - aspect_ratios=[[0.5, 1.0, 2.0]], - angles=[[-90, 0, 90]]), - proposal_generator=dict(name='RPN'), + name="DefaultAnchorGenerator", sizes=[[32], [64], [128], [256], [512]], aspect_ratios=[[0.5, 1.0, 2.0]], angles=[[-90, 0, 90]] + ), + proposal_generator=dict(name="RPN"), rpn=dict( - head_name='StandardRPNHead', - in_features=['p2', 'p3', 'p4', 'p5', 'p6'], + head_name="StandardRPNHead", + in_features=["p2", "p3", "p4", "p5", "p6"], iou_thresholds=[0.3, 0.7], iou_labels=[0, -1, 1], batch_size_per_image=256, positive_fraction=0.5, - bbox_reg_loss_type='smooth_l1', + bbox_reg_loss_type="smooth_l1", bbox_reg_loss_weight=1.0, bbox_reg_weights=(1.0, 1.0, 1.0, 1.0), smooth_l1_beta=0.0, @@ -48,36 +41,42 @@ model = dict( pre_nms_topk_test=1000, post_nms_topk_test=1000, nms_thresh=0.7, - conv_dims=[-1]), + conv_dims=[-1], + ), roi_heads=dict( - name='StandardROIHeads', + name="StandardROIHeads", num_classes=80, - in_features=['p2', 'p3', 'p4', 'p5'], + in_features=["p2", "p3", "p4", "p5"], iou_thresholds=[0.5], iou_labels=[0, 1], batch_size_per_image=512, positive_fraction=0.25, score_thresh_test=0.05, nms_thresh_test=0.5, - proposal_append_gt=True), + proposal_append_gt=True, + ), roi_box_head=dict( - name='FastRCNNConvFCHead', + name="FastRCNNConvFCHead", num_fc=2, fc_dim=1024, conv_dim=256, - pooler_type='ROIAlignV2', + pooler_type="ROIAlignV2", pooler_resolution=7, pooler_sampling_ratio=0, - bbox_reg_loss_type='smooth_l1', + bbox_reg_loss_type="smooth_l1", bbox_reg_loss_weight=1.0, bbox_reg_weights=(10.0, 10.0, 5.0, 5.0), smooth_l1_beta=0.0, - cls_agnostic_bbox_reg=False), + cls_agnostic_bbox_reg=False, + 
), roi_mask_head=dict( - name='MaskRCNNConvUpsampleHead', + name="MaskRCNNConvUpsampleHead", conv_dim=256, num_conv=4, - pooler_type='ROIAlignV2', + pooler_type="ROIAlignV2", pooler_resolution=14, pooler_sampling_ratio=0, - cls_agnostic_mask=False))) + cls_agnostic_mask=False, + ), + ), +) diff --git a/mmpose/configs/mmdet/misc/d2_retinanet_r50-caffe_fpn_ms-90k_coco.py b/mmpose/configs/mmdet/misc/d2_retinanet_r50-caffe_fpn_ms-90k_coco.py index d3f7587648bde1d15b5c3c1e1ace6c35bb7c20b0..6a6b6c08dcb98f5f6a56ffba8cbe9a4362235f57 100644 --- a/mmpose/configs/mmdet/misc/d2_retinanet_r50-caffe_fpn_ms-90k_coco.py +++ b/mmpose/configs/mmdet/misc/d2_retinanet_r50-caffe_fpn_ms-90k_coco.py @@ -1,48 +1,47 @@ -_base_ = '../common/ms-90k_coco.py' +_base_ = "../common/ms-90k_coco.py" # model settings model = dict( - type='Detectron2Wrapper', + type="Detectron2Wrapper", bgr_to_rgb=False, detector=dict( # The settings in `d2_detector` will merged into default settings # in detectron2. More details please refer to # https://github.com/facebookresearch/detectron2/blob/main/detectron2/config/defaults.py # noqa - meta_architecture='RetinaNet', + meta_architecture="RetinaNet", # If you want to finetune the detector, you can use the # checkpoint released by detectron2, for example: # weights='detectron2://COCO-Detection/retinanet_R_50_FPN_1x/190397773/model_final_bfca0b.pkl' # noqa - weights='detectron2://ImageNetPretrained/MSRA/R-50.pkl', + weights="detectron2://ImageNetPretrained/MSRA/R-50.pkl", mask_on=False, pixel_mean=[103.530, 116.280, 123.675], pixel_std=[1.0, 1.0, 1.0], - backbone=dict(name='build_retinanet_resnet_fpn_backbone', freeze_at=2), - resnets=dict( - depth=50, - out_features=['res3', 'res4', 'res5'], - num_groups=1, - norm='FrozenBN'), - fpn=dict(in_features=['res3', 'res4', 'res5'], out_channels=256), + backbone=dict(name="build_retinanet_resnet_fpn_backbone", freeze_at=2), + resnets=dict(depth=50, out_features=["res3", "res4", "res5"], num_groups=1, norm="FrozenBN"), + fpn=dict(in_features=["res3", "res4", "res5"], out_channels=256), anchor_generator=dict( - name='DefaultAnchorGenerator', - sizes=[[x, x * 2**(1.0 / 3), x * 2**(2.0 / 3)] - for x in [32, 64, 128, 256, 512]], + name="DefaultAnchorGenerator", + sizes=[[x, x * 2 ** (1.0 / 3), x * 2 ** (2.0 / 3)] for x in [32, 64, 128, 256, 512]], aspect_ratios=[[0.5, 1.0, 2.0]], - angles=[[-90, 0, 90]]), + angles=[[-90, 0, 90]], + ), retinanet=dict( num_classes=80, - in_features=['p3', 'p4', 'p5', 'p6', 'p7'], + in_features=["p3", "p4", "p5", "p6", "p7"], num_convs=4, iou_thresholds=[0.4, 0.5], iou_labels=[0, -1, 1], bbox_reg_weights=(1.0, 1.0, 1.0, 1.0), - bbox_reg_loss_type='smooth_l1', + bbox_reg_loss_type="smooth_l1", smooth_l1_loss_beta=0.0, focal_loss_gamma=2.0, focal_loss_alpha=0.25, prior_prob=0.01, score_thresh_test=0.05, topk_candidates_test=1000, - nms_thresh_test=0.5))) + nms_thresh_test=0.5, + ), + ), +) optim_wrapper = dict(optimizer=dict(lr=0.01)) diff --git a/mmpose/configs/mmdet/mm_grounding_dino/brain_tumor/grounding_dino_swin-t_finetune_8xb4_50e_brain_tumor.py b/mmpose/configs/mmdet/mm_grounding_dino/brain_tumor/grounding_dino_swin-t_finetune_8xb4_50e_brain_tumor.py index 1172da5b64102413eec11f223f467ad4c03a7cdf..fa04f85a056d89f79f25cd2c67c37433ededdcfb 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/brain_tumor/grounding_dino_swin-t_finetune_8xb4_50e_brain_tumor.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/brain_tumor/grounding_dino_swin-t_finetune_8xb4_50e_brain_tumor.py @@ -1,112 +1,118 @@ -_base_ = 
'../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" # https://universe.roboflow.com/roboflow-100/brain-tumor-m2pbp/dataset/2 -data_root = 'data/brain_tumor_v2/' -class_name = ('label0', 'label1', 'label2') -label_name = '_annotations.coco.json' +data_root = "data/brain_tumor_v2/" +class_name = ("label0", "label1", "label2") +label_name = "_annotations.coco.json" palette = [(220, 20, 60), (255, 0, 0), (0, 0, 142)] metainfo = dict(classes=class_name, palette=palette) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] train_dataloader = dict( - sampler=dict(_delete_=True, type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(_delete_=True, type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( _delete_=True, - type='RepeatDataset', + type="RepeatDataset", times=10, dataset=dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, metainfo=metainfo, filter_cfg=dict(filter_empty_gt=False, min_size=32), pipeline=train_pipeline, return_classes=True, - data_prefix=dict(img='train/'), - ann_file='train/' + label_name))) + data_prefix=dict(img="train/"), + ann_file="train/" + label_name, + ), + ), +) val_dataloader = dict( dataset=dict( - metainfo=metainfo, - data_root=data_root, - return_classes=True, - ann_file='valid/' + label_name, - data_prefix=dict(img='valid/'))) + metainfo=metainfo, data_root=data_root, return_classes=True, ann_file="valid/" + label_name, data_prefix=dict(img="valid/") + ) +) 
test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'valid/' + label_name, - metric='bbox', - format_only=False) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "valid/" + label_name, metric="bbox", format_only=False) test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1) - })) + paramwise_cfg=dict(custom_keys={"absolute_pos_embed": dict(decay_mult=0.0), "backbone": dict(lr_mult=0.1)}), +) # learning policy max_epochs = 5 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[4], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[4], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best='auto')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best="auto")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/cityscapes/grounding_dino_swin-t_finetune_8xb4_50e_cityscapes.py b/mmpose/configs/mmdet/mm_grounding_dino/cityscapes/grounding_dino_swin-t_finetune_8xb4_50e_cityscapes.py index c4283413c4ba0c060144d7fb85f7d064a60577c7..8721046be98bc5a61796a5f334f65360f7f8a559 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/cityscapes/grounding_dino_swin-t_finetune_8xb4_50e_cityscapes.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/cityscapes/grounding_dino_swin-t_finetune_8xb4_50e_cityscapes.py @@ -1,110 +1,120 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/cityscapes/' -class_name = ('person', 'rider', 'car', 'truck', 'bus', 'train', 'motorcycle', - 'bicycle') -palette = [(220, 20, 60), (255, 0, 0), (0, 0, 142), (0, 0, 70), (0, 60, 100), - (0, 80, 100), (0, 0, 230), (119, 11, 32)] +data_root = "data/cityscapes/" +class_name = ("person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle") +palette = [(220, 20, 60), (255, 0, 0), (0, 0, 142), (0, 0, 70), (0, 60, 100), (0, 80, 100), (0, 0, 230), (119, 11, 32)] metainfo = dict(classes=class_name, palette=palette) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + 
(512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] train_dataloader = dict( - sampler=dict(_delete_=True, type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(_delete_=True, type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( _delete_=True, - type='RepeatDataset', + type="RepeatDataset", times=10, dataset=dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, metainfo=metainfo, filter_cfg=dict(filter_empty_gt=False, min_size=32), pipeline=train_pipeline, return_classes=True, - data_prefix=dict(img='leftImg8bit/train/'), - ann_file='annotations/instancesonly_filtered_gtFine_train.json'))) + data_prefix=dict(img="leftImg8bit/train/"), + ann_file="annotations/instancesonly_filtered_gtFine_train.json", + ), + ), +) val_dataloader = dict( dataset=dict( metainfo=metainfo, data_root=data_root, return_classes=True, - ann_file='annotations/instancesonly_filtered_gtFine_val.json', - data_prefix=dict(img='leftImg8bit/val/'))) + ann_file="annotations/instancesonly_filtered_gtFine_val.json", + data_prefix=dict(img="leftImg8bit/val/"), + ) +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instancesonly_filtered_gtFine_val.json', - metric='bbox', - format_only=False) + type="CocoMetric", ann_file=data_root + "annotations/instancesonly_filtered_gtFine_val.json", metric="bbox", format_only=False +) test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1) - })) + paramwise_cfg=dict(custom_keys={"absolute_pos_embed": dict(decay_mult=0.0), "backbone": dict(lr_mult=0.1)}), +) # learning policy max_epochs = 5 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[4], - gamma=0.1) -] 
+param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[4], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best='auto')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best="auto")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_coco.py b/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_coco.py index 792297accd302d390f865bee294b1294863d6ac1..f5e0b94a93fd97e44437f4e40300ac487a481a34 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_coco.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_coco.py @@ -1,85 +1,100 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/coco/' +data_root = "data/coco/" train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='CocoDataset', + 
type="CocoDataset", data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), return_classes=True, filter_cfg=dict(filter_empty_gt=False, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + ) +) optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), - 'language_model': dict(lr_mult=0.1), - })) + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), + "language_model": dict(lr_mult=0.1), + } + ), +) # learning policy max_epochs = 12 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[8, 11], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best='auto')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best="auto")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_coco_48_17.py b/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_coco_48_17.py index e68afbb43286af24612321129042e7d0e0f34b29..d7720659179d607a6d9525a2d6a02166d2d8a8f9 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_coco_48_17.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_coco_48_17.py @@ -1,157 +1,272 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/coco/' -base_classes = ('person', 'bicycle', 'car', 'motorcycle', 'train', 'truck', - 'boat', 'bench', 'bird', 'horse', 'sheep', 'bear', 'zebra', - 'giraffe', 'backpack', 'handbag', 'suitcase', 'frisbee', - 'skis', 'kite', 'surfboard', 'bottle', 'fork', 'spoon', 'bowl', - 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', - 'pizza', 'donut', 'chair', 'bed', 'toilet', 'tv', 'laptop', - 'mouse', 'remote', 'microwave', 'oven', 'toaster', - 'refrigerator', 'book', 'clock', 'vase', 'toothbrush') # 48 -novel_classes = ('airplane', 'bus', 'cat', 'dog', 'cow', 'elephant', - 'umbrella', 'tie', 'snowboard', 'skateboard', 'cup', 'knife', - 'cake', 'couch', 'keyboard', 'sink', 'scissors') # 17 +data_root = "data/coco/" +base_classes = ( + "person", + "bicycle", + "car", + "motorcycle", + "train", + "truck", + "boat", + "bench", + "bird", + "horse", + "sheep", + "bear", + "zebra", + "giraffe", + "backpack", + "handbag", + "suitcase", + "frisbee", + "skis", + "kite", + "surfboard", + "bottle", + "fork", + "spoon", + "bowl", 
+ "banana", + "apple", + "sandwich", + "orange", + "broccoli", + "carrot", + "pizza", + "donut", + "chair", + "bed", + "toilet", + "tv", + "laptop", + "mouse", + "remote", + "microwave", + "oven", + "toaster", + "refrigerator", + "book", + "clock", + "vase", + "toothbrush", +) # 48 +novel_classes = ( + "airplane", + "bus", + "cat", + "dog", + "cow", + "elephant", + "umbrella", + "tie", + "snowboard", + "skateboard", + "cup", + "knife", + "cake", + "couch", + "keyboard", + "sink", + "scissors", +) # 17 all_classes = ( - 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', - 'truck', 'boat', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', - 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', - 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'kite', 'skateboard', - 'surfboard', 'bottle', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', - 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'pizza', 'donut', - 'cake', 'chair', 'couch', 'bed', 'toilet', 'tv', 'laptop', 'mouse', - 'remote', 'keyboard', 'microwave', 'oven', 'toaster', 'sink', - 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'toothbrush') # 65 + "person", + "bicycle", + "car", + "motorcycle", + "airplane", + "bus", + "train", + "truck", + "boat", + "bench", + "bird", + "cat", + "dog", + "horse", + "sheep", + "cow", + "elephant", + "bear", + "zebra", + "giraffe", + "backpack", + "umbrella", + "handbag", + "tie", + "suitcase", + "frisbee", + "skis", + "snowboard", + "kite", + "skateboard", + "surfboard", + "bottle", + "cup", + "fork", + "knife", + "spoon", + "bowl", + "banana", + "apple", + "sandwich", + "orange", + "broccoli", + "carrot", + "pizza", + "donut", + "cake", + "chair", + "couch", + "bed", + "toilet", + "tv", + "laptop", + "mouse", + "remote", + "keyboard", + "microwave", + "oven", + "toaster", + "sink", + "refrigerator", + "book", + "clock", + "vase", + "scissors", + "toothbrush", +) # 65 train_metainfo = dict(classes=base_classes) -test_metainfo = dict( - classes=all_classes, - base_classes=base_classes, - novel_classes=novel_classes) +test_metainfo = dict(classes=all_classes, base_classes=base_classes, novel_classes=novel_classes) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 
1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] test_pipeline = [ + dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='LoadImageFromFile', backend_args=None, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities', - 'tokens_positive')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities", "tokens_positive"), + ), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='CocoDataset', + type="CocoDataset", metainfo=train_metainfo, data_root=data_root, - ann_file='annotations/instances_train2017_seen_2.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017_seen_2.json", + data_prefix=dict(img="train2017/"), return_classes=True, filter_cfg=dict(filter_empty_gt=False, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + ) +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( - type='CocoDataset', + type="CocoDataset", metainfo=test_metainfo, data_root=data_root, - ann_file='annotations/instances_val2017_all_2.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017_all_2.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, return_classes=True, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='OVCocoMetric', - ann_file=data_root + 'annotations/instances_val2017_all_2.json', - metric='bbox', - format_only=False) +val_evaluator = dict(type="OVCocoMetric", ann_file=data_root + "annotations/instances_val2017_all_2.json", metric="bbox", format_only=False) test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.00005, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.00005, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), # 'language_model': dict(lr_mult=0), - })) + } + ), +) # learning policy max_epochs = 12 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, 
end=max_epochs, by_epoch=True, milestones=[8, 11], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -default_hooks = dict( - checkpoint=dict( - max_keep_ckpts=1, save_best='coco/novel_ap50', rule='greater')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best="coco/novel_ap50", rule="greater")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_sft_coco.py b/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_sft_coco.py index 5505df58b8b103a93570519c20aaf0fcc144e91c..550c1ccbba4af596badd0ed2e9601038c0a38461 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_sft_coco.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/coco/grounding_dino_swin-t_finetune_16xb4_1x_sft_coco.py @@ -1,93 +1,121 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/coco/' +data_root = "data/coco/" train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=20, # ======= important ===== - label_map_file='data/coco/annotations/coco2017_label_map.json', - max_tokens=256), + label_map_file="data/coco/annotations/coco2017_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - 
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='ODVGDataset', + type="ODVGDataset", need_text=False, data_root=data_root, - ann_file='annotations/instances_train2017_od.json', - label_map_file='annotations/coco2017_label_map.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017_od.json", + label_map_file="annotations/coco2017_label_map.json", + data_prefix=dict(img="train2017/"), return_classes=True, filter_cfg=dict(filter_empty_gt=False, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + ) +) optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.00005, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.00005, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), - 'language_model': dict(lr_mult=0.0), - })) + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), + "language_model": dict(lr_mult=0.0), + } + ), +) # learning policy max_epochs = 12 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[8, 11], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best='auto')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best="auto")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py b/mmpose/configs/mmdet/mm_grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py index e59a0a52518aa125d556aab12f8076a95f39ec22..fac8e98ad5d091150e71a6ad9139b51852cd162c 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py @@ -1,78 +1,64 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/d3/' +data_root = "data/d3/" test_pipeline = [ + dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='LoadImageFromFile', backend_args=None, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), 
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='PackDetInputs',
-        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
-                   'scale_factor', 'text', 'custom_entities', 'sent_ids'))
+    dict(
+        type="PackDetInputs",
+        meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities", "sent_ids"),
+    ),
 ]

 # -------------------------------------------------#
 val_dataset_full = dict(
-    type='DODDataset',
+    type="DODDataset",
     data_root=data_root,
-    ann_file='d3_json/d3_full_annotations.json',
-    data_prefix=dict(img='d3_images/', anno='d3_pkl'),
+    ann_file="d3_json/d3_full_annotations.json",
+    data_prefix=dict(img="d3_images/", anno="d3_pkl"),
     pipeline=test_pipeline,
     test_mode=True,
     backend_args=None,
-    return_classes=True)
+    return_classes=True,
+)

-val_evaluator_full = dict(
-    type='DODCocoMetric',
-    ann_file=data_root + 'd3_json/d3_full_annotations.json')
+val_evaluator_full = dict(type="DODCocoMetric", ann_file=data_root + "d3_json/d3_full_annotations.json")

 # -------------------------------------------------#
 val_dataset_pres = dict(
-    type='DODDataset',
+    type="DODDataset",
     data_root=data_root,
-    ann_file='d3_json/d3_pres_annotations.json',
-    data_prefix=dict(img='d3_images/', anno='d3_pkl'),
+    ann_file="d3_json/d3_pres_annotations.json",
+    data_prefix=dict(img="d3_images/", anno="d3_pkl"),
     pipeline=test_pipeline,
     test_mode=True,
     backend_args=None,
-    return_classes=True)
-val_evaluator_pres = dict(
-    type='DODCocoMetric',
-    ann_file=data_root + 'd3_json/d3_pres_annotations.json')
+    return_classes=True,
+)
+val_evaluator_pres = dict(type="DODCocoMetric", ann_file=data_root + "d3_json/d3_pres_annotations.json")

 # -------------------------------------------------#
 val_dataset_abs = dict(
-    type='DODDataset',
+    type="DODDataset",
     data_root=data_root,
-    ann_file='d3_json/d3_abs_annotations.json',
-    data_prefix=dict(img='d3_images/', anno='d3_pkl'),
+    ann_file="d3_json/d3_abs_annotations.json",
+    data_prefix=dict(img="d3_images/", anno="d3_pkl"),
     pipeline=test_pipeline,
     test_mode=True,
     backend_args=None,
-    return_classes=True)
-val_evaluator_abs = dict(
-    type='DODCocoMetric',
-    ann_file=data_root + 'd3_json/d3_abs_annotations.json')
+    return_classes=True,
+)
+val_evaluator_abs = dict(type="DODCocoMetric", ann_file=data_root + "d3_json/d3_abs_annotations.json")

 # -------------------------------------------------#
 datasets = [val_dataset_full, val_dataset_pres, val_dataset_abs]
-dataset_prefixes = ['FULL', 'PRES', 'ABS']
+dataset_prefixes = ["FULL", "PRES", "ABS"]
 metrics = [val_evaluator_full, val_evaluator_pres, val_evaluator_abs]

-val_dataloader = dict(
-    dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets))
+val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets))
 test_dataloader = val_dataloader

-val_evaluator = dict(
-    _delete_=True,
-    type='MultiDatasetsEvaluator',
-    metrics=metrics,
-    dataset_prefixes=dataset_prefixes)
+val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes)
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/mmdet/mm_grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_parallel_dod.py b/mmpose/configs/mmdet/mm_grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_parallel_dod.py
index 3d680091162e5ac96c15c76b58a18764e85d3233..cd9786f64eaa1987884d06dc67c06995b6b77b74 100644
--- a/mmpose/configs/mmdet/mm_grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_parallel_dod.py
+++ b/mmpose/configs/mmdet/mm_grounding_dino/dod/grounding_dino_swin-t_pretrain_zeroshot_parallel_dod.py
@@ -1,3 +1,3 @@
-_base_ = 'grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py'
+_base_ = "grounding_dino_swin-t_pretrain_zeroshot_concat_dod.py"

 model = dict(test_cfg=dict(chunked_size=1))
diff --git a/mmpose/configs/mmdet/mm_grounding_dino/flickr30k/grounding_dino_swin-t-pretrain_flickr30k.py b/mmpose/configs/mmdet/mm_grounding_dino/flickr30k/grounding_dino_swin-t-pretrain_flickr30k.py
index e9eb783da97a6d665002cc9192f740010282870e..ac00dae99e6a06150f65dae08ba8470a5576bfc6 100644
--- a/mmpose/configs/mmdet/mm_grounding_dino/flickr30k/grounding_dino_swin-t-pretrain_flickr30k.py
+++ b/mmpose/configs/mmdet/mm_grounding_dino/flickr30k/grounding_dino_swin-t-pretrain_flickr30k.py
@@ -1,57 +1,56 @@
-_base_ = '../grounding_dino_swin-t_pretrain_obj365.py'
+_base_ = "../grounding_dino_swin-t_pretrain_obj365.py"

-dataset_type = 'Flickr30kDataset'
-data_root = 'data/flickr30k_entities/'
+dataset_type = "Flickr30kDataset"
+data_root = "data/flickr30k_entities/"

 test_pipeline = [
+    dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"),
+    dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"),
+    dict(type="LoadAnnotations", with_bbox=True),
     dict(
-        type='LoadImageFromFile', backend_args=None,
-        imdecode_backend='pillow'),
-    dict(
-        type='FixScaleResize',
-        scale=(800, 1333),
-        keep_ratio=True,
-        backend='pillow'),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='PackDetInputs',
-        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
-                   'scale_factor', 'text', 'custom_entities',
-                   'tokens_positive', 'phrase_ids', 'phrases'))
+        type="PackDetInputs",
+        meta_keys=(
+            "img_id",
+            "img_path",
+            "ori_shape",
+            "img_shape",
+            "scale_factor",
+            "text",
+            "custom_entities",
+            "tokens_positive",
+            "phrase_ids",
+            "phrases",
+        ),
+    ),
 ]

 dataset_Flickr30k_val = dict(
     type=dataset_type,
     data_root=data_root,
-    ann_file='final_flickr_separateGT_val.json',
-    data_prefix=dict(img='flickr30k_images/'),
+    ann_file="final_flickr_separateGT_val.json",
+    data_prefix=dict(img="flickr30k_images/"),
     pipeline=test_pipeline,
 )

 dataset_Flickr30k_test = dict(
     type=dataset_type,
     data_root=data_root,
-    ann_file='final_flickr_separateGT_test.json',
-    data_prefix=dict(img='flickr30k_images/'),
+    ann_file="final_flickr_separateGT_test.json",
+    data_prefix=dict(img="flickr30k_images/"),
     pipeline=test_pipeline,
 )

-val_evaluator_Flickr30k = dict(type='Flickr30kMetric')
+val_evaluator_Flickr30k = dict(type="Flickr30kMetric")

-test_evaluator_Flickr30k = dict(type='Flickr30kMetric')
+test_evaluator_Flickr30k = dict(type="Flickr30kMetric")

 # ----------Config---------- #
-dataset_prefixes = ['Flickr30kVal', 'Flickr30kTest']
+dataset_prefixes = ["Flickr30kVal", "Flickr30kTest"]
 datasets = [dataset_Flickr30k_val, dataset_Flickr30k_test]
 metrics = [val_evaluator_Flickr30k, test_evaluator_Flickr30k]

-val_dataloader = dict(
-    dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets))
+val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets))
 test_dataloader = val_dataloader

-val_evaluator = dict(
-    _delete_=True,
-    type='MultiDatasetsEvaluator',
-    metrics=metrics,
-    dataset_prefixes=dataset_prefixes)
+val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes)
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-b_pretrain_all.py
b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-b_pretrain_all.py index eff58bba6b192fe43e62cb1e3ae40a546e1a3ddf..c086ba907ea2ce8534e23434877e527bc7331f36 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-b_pretrain_all.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-b_pretrain_all.py @@ -1,12 +1,12 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det/grounding_dino_swin-b_pretrain_obj365_goldg_v3de-f83eef00.pth" # noqa model = dict( use_autocast=True, backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=128, depths=[2, 2, 18, 2], @@ -15,24 +15,25 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(1, 2, 3), with_cp=True, convert_weights=True, frozen_stages=-1, - init_cfg=None), + init_cfg=None, + ), neck=dict(in_channels=[256, 512, 1024]), ) o365v1_od_dataset = dict( - type='ODVGDataset', - data_root='data/objects365v1/', - ann_file='o365v1_train_odvg.json', - label_map_file='o365v1_label_map.json', - data_prefix=dict(img='train/'), + type="ODVGDataset", + data_root="data/objects365v1/", + ann_file="o365v1_train_odvg.json", + label_map_file="o365v1_label_map.json", + data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, @@ -40,296 +41,367 @@ o365v1_od_dataset = dict( ) flickr30k_dataset = dict( - type='ODVGDataset', - data_root='data/flickr30k_entities/', - ann_file='final_flickr_separateGT_train_vg.json', + type="ODVGDataset", + data_root="data/flickr30k_entities/", + ann_file="final_flickr_separateGT_train_vg.json", label_map_file=None, - data_prefix=dict(img='flickr30k_images/'), + data_prefix=dict(img="flickr30k_images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) gqa_dataset = dict( - type='ODVGDataset', - data_root='data/gqa/', - ann_file='final_mixed_train_no_coco_vg.json', + type="ODVGDataset", + data_root="data/gqa/", + ann_file="final_mixed_train_no_coco_vg.json", label_map_file=None, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) v3d_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + 
(608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/V3Det/annotations/v3det_2023_v1_label_map.json', - max_tokens=256), + label_map_file="data/V3Det/annotations/v3det_2023_v1_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] v3det_dataset = dict( - type='ODVGDataset', - data_root='data/V3Det/', - ann_file='annotations/v3det_2023_v1_train_od.json', - label_map_file='annotations/v3det_2023_v1_label_map.json', - data_prefix=dict(img=''), + type="ODVGDataset", + data_root="data/V3Det/", + ann_file="annotations/v3det_2023_v1_train_od.json", + label_map_file="annotations/v3det_2023_v1_label_map.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), need_text=False, # change this pipeline=v3d_train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) grit_dataset = dict( - type='ODVGDataset', - data_root='grit_processed/', - ann_file='grit20m_vg.json', + type="ODVGDataset", + data_root="grit_processed/", + ann_file="grit20m_vg.json", label_map_file=None, - data_prefix=dict(img=''), + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) # --------------------------- lvis od dataset--------------------------- lvis_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 
1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/coco/annotations/lvis_v1_label_map.json', - max_tokens=256), + label_map_file="data/coco/annotations/lvis_v1_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] lvis_dataset = dict( - type='ClassBalancedDataset', + type="ClassBalancedDataset", oversample_thr=1e-3, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='annotations/lvis_v1_train_od.json', - label_map_file='annotations/lvis_v1_label_map.json', - data_prefix=dict(img=''), + type="ODVGDataset", + data_root="data/coco/", + ann_file="annotations/lvis_v1_train_od.json", + label_map_file="annotations/lvis_v1_label_map.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), need_text=False, # change this pipeline=lvis_train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- coco2017 od dataset--------------------------- coco2017_train_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='annotations/instance_train2017_norefval_od.json', - label_map_file='annotations/coco2017_label_map.json', - data_prefix=dict(img='train2017'), + type="ODVGDataset", + data_root="data/coco/", + ann_file="annotations/instance_train2017_norefval_od.json", + label_map_file="annotations/coco2017_label_map.json", + data_prefix=dict(img="train2017"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- coco2014 vg dataset--------------------------- coco2014_vg_dataset = dict( - 
type='ODVGDataset', - data_root='data/coco/', - ann_file='mdetr_annotations/final_mixed_train_only_coco_vg.json', + type="ODVGDataset", + data_root="data/coco/", + ann_file="mdetr_annotations/final_mixed_train_only_coco_vg.json", label_map_file=None, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) # --------------------------- refcoco vg dataset--------------------------- refcoco_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='mdetr_annotations/finetune_refcoco_train_vg.json', + type="ODVGDataset", + data_root="data/coco/", + ann_file="mdetr_annotations/finetune_refcoco_train_vg.json", label_map_file=None, - data_prefix=dict(img='train2014'), + data_prefix=dict(img="train2014"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- refcoco+ vg dataset--------------------------- refcoco_plus_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='mdetr_annotations/finetune_refcoco+_train_vg.json', + type="ODVGDataset", + data_root="data/coco/", + ann_file="mdetr_annotations/finetune_refcoco+_train_vg.json", label_map_file=None, - data_prefix=dict(img='train2014'), + data_prefix=dict(img="train2014"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- refcocog vg dataset--------------------------- refcocog_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=3, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='mdetr_annotations/finetune_refcocog_train_vg.json', + type="ODVGDataset", + data_root="data/coco/", + ann_file="mdetr_annotations/finetune_refcocog_train_vg.json", label_map_file=None, - data_prefix=dict(img='train2014'), + data_prefix=dict(img="train2014"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- grefcoco vg dataset--------------------------- grefcoco_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='mdetr_annotations/finetune_grefcoco_train_vg.json', + type="ODVGDataset", + data_root="data/coco/", + ann_file="mdetr_annotations/finetune_grefcoco_train_vg.json", label_map_file=None, - data_prefix=dict(img='train2014'), + data_prefix=dict(img="train2014"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- dataloader--------------------------- train_dataloader = dict( batch_size=4, num_workers=4, sampler=dict( - _delete_=True, - type='CustomSampleSizeSampler', - ratio_mode=True, - dataset_size=[-1, -1, 0.07, -1, -1, -1, -1, -1, -1, -1, -1, -1]), - dataset=dict(datasets=[ - o365v1_od_dataset, # 1.74M - v3det_dataset, # - grit_dataset, - lvis_dataset, - coco2017_train_dataset, # 0.12M - flickr30k_dataset, # 0.15M - gqa_dataset, # 0.62M - coco2014_vg_dataset, # 0.49M - refcoco_dataset, # 0.12M - 
refcoco_plus_dataset, # 0.12M - refcocog_dataset, # 0.08M - grefcoco_dataset, # 0.19M - ])) + _delete_=True, type="CustomSampleSizeSampler", ratio_mode=True, dataset_size=[-1, -1, 0.07, -1, -1, -1, -1, -1, -1, -1, -1, -1] + ), + dataset=dict( + datasets=[ + o365v1_od_dataset, # 1.74M + v3det_dataset, # + grit_dataset, + lvis_dataset, + coco2017_train_dataset, # 0.12M + flickr30k_dataset, # 0.15M + gqa_dataset, # 0.62M + coco2014_vg_dataset, # 0.49M + refcoco_dataset, # 0.12M + refcoco_plus_dataset, # 0.12M + refcocog_dataset, # 0.08M + grefcoco_dataset, # 0.19M + ] + ), +) optim_wrapper = dict(optimizer=dict(lr=0.0001)) # learning policy max_iter = 304680 -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=10000) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=10000) param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[228510], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[228510], gamma=0.1), ] -default_hooks = dict( - checkpoint=dict(by_epoch=False, interval=10000, max_keep_ckpts=20)) +default_hooks = dict(checkpoint=dict(by_epoch=False, interval=10000, max_keep_ckpts=20)) log_processor = dict(by_epoch=False) diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det.py b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det.py index 743d02cffbe9c38977edad2bce8a53bd6a8594af..64d618f012c2838c4dd98066b2b86d7a9d4f2ec1 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-b_pretrain_obj365_goldg_v3det.py @@ -1,11 +1,11 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth" # noqa model = dict( use_autocast=True, backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=128, depths=[2, 2, 18, 2], @@ -14,24 +14,25 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(1, 2, 3), with_cp=True, convert_weights=True, frozen_stages=-1, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), neck=dict(in_channels=[256, 512, 1024]), ) o365v1_od_dataset = dict( - type='ODVGDataset', - data_root='data/objects365v1/', - ann_file='o365v1_train_odvg.json', - label_map_file='o365v1_label_map.json', - data_prefix=dict(img='train/'), + type="ODVGDataset", + data_root="data/objects365v1/", + ann_file="o365v1_train_odvg.json", + label_map_file="o365v1_label_map.json", + data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, @@ -39,105 +40,130 @@ o365v1_od_dataset = dict( ) flickr30k_dataset = dict( - type='ODVGDataset', - 
data_root='data/flickr30k_entities/', - ann_file='final_flickr_separateGT_train_vg.json', + type="ODVGDataset", + data_root="data/flickr30k_entities/", + ann_file="final_flickr_separateGT_train_vg.json", label_map_file=None, - data_prefix=dict(img='flickr30k_images/'), + data_prefix=dict(img="flickr30k_images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) gqa_dataset = dict( - type='ODVGDataset', - data_root='data/gqa/', - ann_file='final_mixed_train_no_coco_vg.json', + type="ODVGDataset", + data_root="data/gqa/", + ann_file="final_mixed_train_no_coco_vg.json", label_map_file=None, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) v3d_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/V3Det/annotations/v3det_2023_v1_label_map.json', - max_tokens=256), + label_map_file="data/V3Det/annotations/v3det_2023_v1_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] v3det_dataset = 
dict( - type='ODVGDataset', - data_root='data/V3Det/', - ann_file='annotations/v3det_2023_v1_train_od.json', - label_map_file='annotations/v3det_2023_v1_label_map.json', - data_prefix=dict(img=''), + type="ODVGDataset", + data_root="data/V3Det/", + ann_file="annotations/v3det_2023_v1_train_od.json", + label_map_file="annotations/v3det_2023_v1_label_map.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), need_text=False, # change this pipeline=v3d_train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) -train_dataloader = dict( - dataset=dict(datasets=[ - o365v1_od_dataset, flickr30k_dataset, gqa_dataset, v3det_dataset - ])) +train_dataloader = dict(dataset=dict(datasets=[o365v1_od_dataset, flickr30k_dataset, gqa_dataset, v3det_dataset])) # learning policy max_epochs = 18 param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[13, 16], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[13, 16], gamma=0.1), ] -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-l_pretrain_all.py b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-l_pretrain_all.py index a17f2344e14d8af81bd267d8bd47662f7e6e059d..b0e5a0fe78a712647a41bb2d3cea87fb8d2ae792 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-l_pretrain_all.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-l_pretrain_all.py @@ -1,6 +1,6 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-l_pretrain_obj365_goldg/grounding_dino_swin-l_pretrain_obj365_goldg-34dcdc53.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-l_pretrain_obj365_goldg/grounding_dino_swin-l_pretrain_obj365_goldg-34dcdc53.pth" # noqa num_levels = 5 model = dict( @@ -8,7 +8,7 @@ model = dict( num_feature_levels=num_levels, backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=192, depths=[2, 2, 18, 2], @@ -17,8 +17,8 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(0, 1, 2, 3), @@ -27,10 +27,12 @@ model = dict( with_cp=True, convert_weights=True, frozen_stages=-1, - init_cfg=None), + init_cfg=None, + ), neck=dict(in_channels=[192, 384, 768, 1536], num_outs=num_levels), encoder=dict(layer_cfg=dict(self_attn_cfg=dict(num_levels=num_levels))), - decoder=dict(layer_cfg=dict(cross_attn_cfg=dict(num_levels=num_levels)))) + decoder=dict(layer_cfg=dict(cross_attn_cfg=dict(num_levels=num_levels))), +) # --------------------------- object365v2 od dataset--------------------------- # objv2_backend_args = dict( @@ -42,61 +44,93 @@ model = dict( objv2_backend_args = None objv2_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=objv2_backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + 
dict(type="LoadImageFromFile", backend_args=objv2_backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/objects365v2/annotations/o365v2_label_map.json', - max_tokens=256), + label_map_file="data/objects365v2/annotations/o365v2_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] o365v2_dataset = dict( - type='ODVGDataset', - data_root='data/objects365v2/', - ann_file='annotations/zhiyuan_objv2_train_od.json', - label_map_file='annotations/o365v2_label_map.json', - data_prefix=dict(img='train/'), + type="ODVGDataset", + data_root="data/objects365v2/", + ann_file="annotations/zhiyuan_objv2_train_od.json", + label_map_file="annotations/o365v2_label_map.json", + data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=False), pipeline=objv2_train_pipeline, return_classes=True, @@ -114,310 +148,425 @@ o365v2_dataset = dict( oi_backend_args = None oi_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=oi_backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=oi_backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 
1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/OpenImages/annotations/openimages_label_map.json', - max_tokens=256), + label_map_file="data/OpenImages/annotations/openimages_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] oiv6_dataset = dict( - type='ODVGDataset', - data_root='data/OpenImages/', - ann_file='annotations/oidv6-train-annotations_od.json', - label_map_file='annotations/openimages_label_map.json', - data_prefix=dict(img='OpenImages/train/'), + type="ODVGDataset", + data_root="data/OpenImages/", + ann_file="annotations/oidv6-train-annotations_od.json", + label_map_file="annotations/openimages_label_map.json", + data_prefix=dict(img="OpenImages/train/"), filter_cfg=dict(filter_empty_gt=False), need_text=False, pipeline=oi_train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) # --------------------------- v3det od dataset--------------------------- v3d_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + 
(672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/V3Det/annotations/v3det_2023_v1_label_map.json', - max_tokens=256), + label_map_file="data/V3Det/annotations/v3det_2023_v1_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] v3det_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( - type='ODVGDataset', - data_root='data/V3Det/', - ann_file='annotations/v3det_2023_v1_train_od.json', - label_map_file='annotations/v3det_2023_v1_label_map.json', - data_prefix=dict(img=''), + type="ODVGDataset", + data_root="data/V3Det/", + ann_file="annotations/v3det_2023_v1_train_od.json", + label_map_file="annotations/v3det_2023_v1_label_map.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), need_text=False, pipeline=v3d_train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- lvis od dataset--------------------------- lvis_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in 
train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/coco/annotations/lvis_v1_label_map.json', - max_tokens=256), + label_map_file="data/coco/annotations/lvis_v1_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] lvis_dataset = dict( - type='ClassBalancedDataset', + type="ClassBalancedDataset", oversample_thr=1e-3, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='annotations/lvis_v1_train_od.json', - label_map_file='annotations/lvis_v1_label_map.json', - data_prefix=dict(img=''), + type="ODVGDataset", + data_root="data/coco/", + ann_file="annotations/lvis_v1_train_od.json", + label_map_file="annotations/lvis_v1_label_map.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), need_text=False, # change this pipeline=lvis_train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- coco2017 od dataset--------------------------- coco2017_train_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='annotations/instance_train2017_norefval_od.json', - label_map_file='annotations/coco2017_label_map.json', - data_prefix=dict(img='train2017'), + type="ODVGDataset", + data_root="data/coco/", + ann_file="annotations/instance_train2017_norefval_od.json", + label_map_file="annotations/coco2017_label_map.json", + data_prefix=dict(img="train2017"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- flickr30k vg dataset--------------------------- flickr30k_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( - type='ODVGDataset', - data_root='data/flickr30k_entities/', - ann_file='final_flickr_separateGT_train_vg.json', + type="ODVGDataset", + data_root="data/flickr30k_entities/", + ann_file="final_flickr_separateGT_train_vg.json", label_map_file=None, - data_prefix=dict(img='flickr30k_images/'), + 
data_prefix=dict(img="flickr30k_images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- gqa vg dataset--------------------------- gqa_dataset = dict( - type='ODVGDataset', - data_root='data/gqa/', - ann_file='final_mixed_train_no_coco_vg.json', + type="ODVGDataset", + data_root="data/gqa/", + ann_file="final_mixed_train_no_coco_vg.json", label_map_file=None, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) # --------------------------- coco2014 vg dataset--------------------------- coco2014_vg_dataset = dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='mdetr_annotations/final_mixed_train_only_coco_vg.json', + type="ODVGDataset", + data_root="data/coco/", + ann_file="mdetr_annotations/final_mixed_train_only_coco_vg.json", label_map_file=None, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) # --------------------------- refcoco vg dataset--------------------------- refcoco_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='mdetr_annotations/finetune_refcoco_train_vg.json', + type="ODVGDataset", + data_root="data/coco/", + ann_file="mdetr_annotations/finetune_refcoco_train_vg.json", label_map_file=None, - data_prefix=dict(img='train2014'), + data_prefix=dict(img="train2014"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- refcoco+ vg dataset--------------------------- refcoco_plus_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='mdetr_annotations/finetune_refcoco+_train_vg.json', + type="ODVGDataset", + data_root="data/coco/", + ann_file="mdetr_annotations/finetune_refcoco+_train_vg.json", label_map_file=None, - data_prefix=dict(img='train2014'), + data_prefix=dict(img="train2014"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- refcocog vg dataset--------------------------- refcocog_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=3, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='mdetr_annotations/finetune_refcocog_train_vg.json', + type="ODVGDataset", + data_root="data/coco/", + ann_file="mdetr_annotations/finetune_refcocog_train_vg.json", label_map_file=None, - data_prefix=dict(img='train2014'), + data_prefix=dict(img="train2014"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- grefcoco vg dataset--------------------------- grefcoco_dataset = dict( - type='RepeatDataset', + type="RepeatDataset", times=2, dataset=dict( - type='ODVGDataset', - data_root='data/coco/', - ann_file='mdetr_annotations/finetune_grefcoco_train_vg.json', + type="ODVGDataset", + data_root="data/coco/", + 
ann_file="mdetr_annotations/finetune_grefcoco_train_vg.json", label_map_file=None, - data_prefix=dict(img='train2014'), + data_prefix=dict(img="train2014"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None)) + backend_args=None, + ), +) # --------------------------- grit vg dataset--------------------------- # grit_backend_args = dict( @@ -429,63 +578,91 @@ grefcoco_dataset = dict( grit_backend_args = None grit_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=grit_backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=grit_backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict( - type='RandomSamplingNegPos', - tokenizer_name=_base_.lang_model_name, - num_sample_negative=85, - max_tokens=256), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, max_tokens=256), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] grit_dataset = dict( - type='ODVGDataset', - data_root='data/grit/', - ann_file='grit20m_vg.json', + type="ODVGDataset", + data_root="data/grit/", + ann_file="grit20m_vg.json", label_map_file=None, - data_prefix=dict(img=''), + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), pipeline=grit_train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) # --------------------------- dataloader--------------------------- train_dataloader = dict( @@ -493,48 +670,41 @@ train_dataloader = 
dict( num_workers=4, sampler=dict( _delete_=True, - type='CustomSampleSizeSampler', + type="CustomSampleSizeSampler", ratio_mode=True, # OD ~ 1.74+1.67*0.5+0.18*2+0.12*2+0.1=3.2 # vg ~ 0.15*2+0.62*1+0.49*1+0.12*2+0.12*2+0.08*3+0.19*2+9*0.09=3.3 - dataset_size=[-1, 0.5, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0.09]), - dataset=dict(datasets=[ - o365v2_dataset, # 1.74M - oiv6_dataset, # 1.67M - v3det_dataset, # 0.18M - coco2017_train_dataset, # 0.12M - lvis_dataset, # 0.1M - flickr30k_dataset, # 0.15M - gqa_dataset, # 0.62M - coco2014_vg_dataset, # 0.49M - refcoco_dataset, # 0.12M - refcoco_plus_dataset, # 0.12M - refcocog_dataset, # 0.08M - grefcoco_dataset, # 0.19M - grit_dataset # 9M - ])) + dataset_size=[-1, 0.5, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0.09], + ), + dataset=dict( + datasets=[ + o365v2_dataset, # 1.74M + oiv6_dataset, # 1.67M + v3det_dataset, # 0.18M + coco2017_train_dataset, # 0.12M + lvis_dataset, # 0.1M + flickr30k_dataset, # 0.15M + gqa_dataset, # 0.62M + coco2014_vg_dataset, # 0.49M + refcoco_dataset, # 0.12M + refcoco_plus_dataset, # 0.12M + refcocog_dataset, # 0.08M + grefcoco_dataset, # 0.19M + grit_dataset, # 9M + ] + ), +) # 4NODES * 8GPU optim_wrapper = dict(optimizer=dict(lr=0.0001)) max_iter = 250000 -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=13000) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=13000) param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[210000], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[210000], gamma=0.1), ] -default_hooks = dict( - checkpoint=dict(by_epoch=False, interval=13000, max_keep_ckpts=30)) +default_hooks = dict(checkpoint=dict(by_epoch=False, interval=13000, max_keep_ckpts=30)) log_processor = dict(by_epoch=False) diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-l_pretrain_obj365_goldg.py b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-l_pretrain_obj365_goldg.py index 85d43f96b3bdf79081dfb091c1cc8b6c03de7252..d7d32ee93829c9e3da46dde49eaf1467d8765263 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-l_pretrain_obj365_goldg.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-l_pretrain_obj365_goldg.py @@ -1,13 +1,13 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth" # noqa num_levels = 5 model = dict( use_autocast=True, num_feature_levels=num_levels, backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=192, depths=[2, 2, 18, 2], @@ -16,8 +16,8 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(0, 1, 2, 3), @@ -26,10 +26,12 @@ model = dict( with_cp=True, convert_weights=True, frozen_stages=-1, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", 
checkpoint=pretrained), + ), neck=dict(in_channels=[192, 384, 768, 1536], num_outs=num_levels), encoder=dict(layer_cfg=dict(self_attn_cfg=dict(num_levels=num_levels))), - decoder=dict(layer_cfg=dict(cross_attn_cfg=dict(num_levels=num_levels)))) + decoder=dict(layer_cfg=dict(cross_attn_cfg=dict(num_levels=num_levels))), +) # --------------------------- object365v2 od dataset--------------------------- # objv2_backend_args = dict( @@ -41,61 +43,93 @@ model = dict( objv2_backend_args = None objv2_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=objv2_backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=objv2_backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/objects365v2/annotations/o365v2_label_map.json', - max_tokens=256), + label_map_file="data/objects365v2/annotations/o365v2_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] o365v2_dataset = dict( - type='ODVGDataset', - data_root='data/objects365v2/', - ann_file='annotations/zhiyuan_objv2_train_od.json', - label_map_file='annotations/o365v2_label_map.json', - data_prefix=dict(img='train/'), + type="ODVGDataset", + data_root="data/objects365v2/", + ann_file="annotations/zhiyuan_objv2_train_od.json", + label_map_file="annotations/o365v2_label_map.json", + 
data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=False), pipeline=objv2_train_pipeline, return_classes=True, @@ -113,115 +147,136 @@ o365v2_dataset = dict( oi_backend_args = None oi_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=oi_backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=oi_backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/OpenImages/annotations/openimages_label_map.json', - max_tokens=256), + label_map_file="data/OpenImages/annotations/openimages_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] oiv6_dataset = dict( - type='ODVGDataset', - data_root='data/OpenImages/', - ann_file='annotations/oidv6-train-annotations_od.json', - label_map_file='annotations/openimages_label_map.json', - data_prefix=dict(img='OpenImages/train/'), + type="ODVGDataset", + data_root="data/OpenImages/", + ann_file="annotations/oidv6-train-annotations_od.json", + label_map_file="annotations/openimages_label_map.json", + data_prefix=dict(img="OpenImages/train/"), filter_cfg=dict(filter_empty_gt=False), need_text=False, pipeline=oi_train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) flickr30k_dataset = dict( - type='ODVGDataset', - data_root='data/flickr30k_entities/', - 
ann_file='final_flickr_separateGT_train_vg.json', + type="ODVGDataset", + data_root="data/flickr30k_entities/", + ann_file="final_flickr_separateGT_train_vg.json", label_map_file=None, - data_prefix=dict(img='flickr30k_images/'), + data_prefix=dict(img="flickr30k_images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) gqa_dataset = dict( - type='ODVGDataset', - data_root='data/gqa/', - ann_file='final_mixed_train_no_coco_vg.json', + type="ODVGDataset", + data_root="data/gqa/", + ann_file="final_mixed_train_no_coco_vg.json", label_map_file=None, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) -train_dataloader = dict( - dataset=dict(datasets=[ - o365v2_dataset, oiv6_dataset, flickr30k_dataset, gqa_dataset - ])) +train_dataloader = dict(dataset=dict(datasets=[o365v2_dataset, oiv6_dataset, flickr30k_dataset, gqa_dataset])) # 4Nodex8GPU optim_wrapper = dict(optimizer=dict(lr=0.0002)) max_iter = 200000 -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=13000) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=13000) param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[156100], - gamma=0.5) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[156100], gamma=0.5), ] -default_hooks = dict( - checkpoint=dict(by_epoch=False, interval=13000, max_keep_ckpts=30)) +default_hooks = dict(checkpoint=dict(by_epoch=False, interval=13000, max_keep_ckpts=30)) log_processor = dict(by_epoch=False) diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_finetune_8xb4_20e_cat.py b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_finetune_8xb4_20e_cat.py index bf3b35894eb5fcee6db9f02c2ab8a837cd6da20b..713ea1e4286b2001c03027693f575356799cf819 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_finetune_8xb4_20e_cat.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_finetune_8xb4_20e_cat.py @@ -1,102 +1,108 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/cat/' -class_name = ('cat', ) +data_root = "data/cat/" +class_name = ("cat",) num_classes = len(class_name) metainfo = dict(classes=class_name, palette=[(220, 20, 60)]) model = dict(bbox_head=dict(num_classes=num_classes)) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 
1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='CocoDataset', + type="CocoDataset", data_root=data_root, metainfo=metainfo, return_classes=True, pipeline=train_pipeline, filter_cfg=dict(filter_empty_gt=False, min_size=32), - ann_file='annotations/trainval.json', - data_prefix=dict(img='images/'))) + ann_file="annotations/trainval.json", + data_prefix=dict(img="images/"), + ) +) val_dataloader = dict( - dataset=dict( - metainfo=metainfo, - data_root=data_root, - ann_file='annotations/test.json', - data_prefix=dict(img='images/'))) + dataset=dict(metainfo=metainfo, data_root=data_root, ann_file="annotations/test.json", data_prefix=dict(img="images/")) +) test_dataloader = val_dataloader -val_evaluator = dict(ann_file=data_root + 'annotations/test.json') +val_evaluator = dict(ann_file=data_root + "annotations/test.json") test_evaluator = val_evaluator max_epoch = 20 -default_hooks = dict( - checkpoint=dict(interval=1, max_keep_ckpts=1, save_best='auto'), - logger=dict(type='LoggerHook', interval=5)) +default_hooks = dict(checkpoint=dict(interval=1, max_keep_ckpts=1, save_best="auto"), logger=dict(type="LoggerHook", interval=5)) train_cfg = dict(max_epochs=max_epoch, val_interval=1) -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epoch, - by_epoch=True, - milestones=[15], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epoch, by_epoch=True, milestones=[15], gamma=0.1)] optim_wrapper = dict( optimizer=dict(lr=0.0001), paramwise_cfg=dict( - custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.0), - 'language_model': dict(lr_mult=0.0) - })) + custom_keys={"absolute_pos_embed": dict(decay_mult=0.0), "backbone": dict(lr_mult=0.0), "language_model": dict(lr_mult=0.0)} + ), +) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff 
--git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365.py b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365.py index 66060f45ea735ab5bbd8e1852c035ea20adcbd80..635b5bd0d84f25b6b2560606eb8aaa58e61dedb5 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365.py @@ -1,33 +1,30 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth' # noqa -lang_model_name = 'bert-base-uncased' +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth" # noqa +lang_model_name = "bert-base-uncased" model = dict( - type='GroundingDINO', + type="GroundingDINO", num_queries=900, with_box_refine=True, as_two_stage=True, data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=False, ), language_model=dict( - type='BertModel', + type="BertModel", name=lang_model_name, max_tokens=256, pad_to_max=False, use_sub_sentence_represent=True, - special_tokens_list=['[CLS]', '[SEP]', '.', '?'], + special_tokens_list=["[CLS]", "[SEP]", ".", "?"], add_pooling_layer=False, ), backbone=dict( - type='SwinTransformer', + type="SwinTransformer", embed_dims=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], @@ -35,44 +32,41 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(1, 2, 3), with_cp=True, convert_weights=True, frozen_stages=-1, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), neck=dict( - type='ChannelMapper', + type="ChannelMapper", in_channels=[192, 384, 768], kernel_size=1, out_channels=256, act_cfg=None, bias=True, - norm_cfg=dict(type='GN', num_groups=32), - num_outs=4), + norm_cfg=dict(type="GN", num_groups=32), + num_outs=4, + ), encoder=dict( num_layers=6, num_cp=6, # visual layer config layer_cfg=dict( self_attn_cfg=dict(embed_dims=256, num_levels=4, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), + ), # text layer config text_layer_cfg=dict( self_attn_cfg=dict(num_heads=4, embed_dims=256, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=1024, ffn_drop=0.0)), + ffn_cfg=dict(embed_dims=256, feedforward_channels=1024, ffn_drop=0.0), + ), # fusion layer config - fusion_layer_cfg=dict( - v_dim=256, - l_dim=256, - embed_dim=1024, - num_heads=4, - init_values=1e-4), + fusion_layer_cfg=dict(v_dim=256, l_dim=256, embed_dim=1024, num_heads=4, init_values=1e-4), ), decoder=dict( num_layers=6, @@ -84,164 +78,176 @@ model = dict( cross_attn_text_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), # cross attention layer query to image cross_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg=dict( - embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)), - post_norm_cfg=None), - positional_encoding=dict( - num_feats=128, 
normalize=True, offset=0.0, temperature=20), + ffn_cfg=dict(embed_dims=256, feedforward_channels=2048, ffn_drop=0.0), + ), + post_norm_cfg=None, + ), + positional_encoding=dict(num_feats=128, normalize=True, offset=0.0, temperature=20), bbox_head=dict( - type='GroundingDINOHead', + type="GroundingDINOHead", num_classes=256, sync_cls_avg_factor=True, - contrastive_cfg=dict(max_text_len=256, log_scale='auto', bias=True), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), # 2.0 in DeformDETR - loss_bbox=dict(type='L1Loss', loss_weight=5.0)), + contrastive_cfg=dict(max_text_len=256, log_scale="auto", bias=True), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), # 2.0 in DeformDETR + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + ), dn_cfg=dict( # TODO: Move to model.train_cfg ? - label_noise_scale=0.5, - box_noise_scale=1.0, # 0.4 for DN-DETR - group_cfg=dict(dynamic=True, num_groups=None, - num_dn_queries=100)), # TODO: half num_dn_queries + label_noise_scale=0.5, box_noise_scale=1.0, group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100) # 0.4 for DN-DETR + ), # TODO: half num_dn_queries # training and testing settings train_cfg=dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='BinaryFocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ])), - test_cfg=dict(max_per_img=300)) + dict(type="BinaryFocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xywh"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ) + ), + test_cfg=dict(max_per_img=300), +) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict( - type='RandomSamplingNegPos', - tokenizer_name=lang_model_name, - num_sample_negative=85, - max_tokens=256), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + 
(576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomSamplingNegPos", tokenizer_name=lang_model_name, num_sample_negative=85, max_tokens=256), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] test_pipeline = [ + dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='LoadImageFromFile', backend_args=None, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities', - 'tokens_positive')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities", "tokens_positive"), + ), ] -dataset_type = 'ODVGDataset' -data_root = 'data/objects365v1/' +dataset_type = "ODVGDataset" +data_root = "data/objects365v1/" coco_od_dataset = dict( type=dataset_type, data_root=data_root, - ann_file='o365v1_train_odvg.json', - label_map_file='o365v1_label_map.json', - data_prefix=dict(img='train/'), + ann_file="o365v1_train_odvg.json", + label_map_file="o365v1_label_map.json", + data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) train_dataloader = dict( _delete_=True, batch_size=4, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), - dataset=dict(type='ConcatDataset', datasets=[coco_od_dataset])) + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), + dataset=dict(type="ConcatDataset", datasets=[coco_od_dataset]), +) -val_dataloader = dict( - dataset=dict(pipeline=test_pipeline, return_classes=True)) +val_dataloader = dict(dataset=dict(pipeline=test_pipeline, return_classes=True)) test_dataloader = val_dataloader optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0004, - weight_decay=0.0001), # bs=16 0.0001 + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0004, weight_decay=0.0001), # bs=16 0.0001 clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), - 'language_model': dict(lr_mult=0.1), - })) + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), + "language_model": dict(lr_mult=0.1), + } + ), +) # learning policy max_epochs = 30 param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[19, 26], - gamma=0.1) + 
dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[19, 26], gamma=0.1), ] -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. # base_batch_size = (16 GPUs) x (2 samples per GPU) auto_scale_lr = dict(base_batch_size=64) -default_hooks = dict(visualization=dict(type='GroundingVisualizationHook')) +default_hooks = dict(visualization=dict(type="GroundingVisualizationHook")) diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg.py b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg.py index b7f388bdd4e8b61d1e7b6fd19445b3628164c4a0..03eb00f362fa57c764c4fb42f035c7b0abd210e1 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg.py @@ -1,11 +1,11 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" o365v1_od_dataset = dict( - type='ODVGDataset', - data_root='data/objects365v1/', - ann_file='o365v1_train_odvg.json', - label_map_file='o365v1_label_map.json', - data_prefix=dict(img='train/'), + type="ODVGDataset", + data_root="data/objects365v1/", + ann_file="o365v1_train_odvg.json", + label_map_file="o365v1_label_map.json", + data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, @@ -13,26 +13,27 @@ o365v1_od_dataset = dict( ) flickr30k_dataset = dict( - type='ODVGDataset', - data_root='data/flickr30k_entities/', - ann_file='final_flickr_separateGT_train_vg.json', + type="ODVGDataset", + data_root="data/flickr30k_entities/", + ann_file="final_flickr_separateGT_train_vg.json", label_map_file=None, - data_prefix=dict(img='flickr30k_images/'), + data_prefix=dict(img="flickr30k_images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) gqa_dataset = dict( - type='ODVGDataset', - data_root='data/gqa/', - ann_file='final_mixed_train_no_coco_vg.json', + type="ODVGDataset", + data_root="data/gqa/", + ann_file="final_mixed_train_no_coco_vg.json", label_map_file=None, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) -train_dataloader = dict( - dataset=dict(datasets=[o365v1_od_dataset, flickr30k_dataset, gqa_dataset])) +train_dataloader = dict(dataset=dict(datasets=[o365v1_od_dataset, flickr30k_dataset, gqa_dataset])) diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m.py b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m.py index 8e9f5ca4aaba7afb631f76b8a575101868fed2a4..0556a3940ebdc626282c788001cab1555fcda42e 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m.py @@ -1,11 +1,11 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" 
o365v1_od_dataset = dict( - type='ODVGDataset', - data_root='data/objects365v1/', - ann_file='o365v1_train_odvg.json', - label_map_file='o365v1_label_map.json', - data_prefix=dict(img='train/'), + type="ODVGDataset", + data_root="data/objects365v1/", + ann_file="o365v1_train_odvg.json", + label_map_file="o365v1_label_map.json", + data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, @@ -13,43 +13,42 @@ o365v1_od_dataset = dict( ) flickr30k_dataset = dict( - type='ODVGDataset', - data_root='data/flickr30k_entities/', - ann_file='final_flickr_separateGT_train_vg.json', + type="ODVGDataset", + data_root="data/flickr30k_entities/", + ann_file="final_flickr_separateGT_train_vg.json", label_map_file=None, - data_prefix=dict(img='flickr30k_images/'), + data_prefix=dict(img="flickr30k_images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) gqa_dataset = dict( - type='ODVGDataset', - data_root='data/gqa/', - ann_file='final_mixed_train_no_coco_vg.json', + type="ODVGDataset", + data_root="data/gqa/", + ann_file="final_mixed_train_no_coco_vg.json", label_map_file=None, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) grit_dataset = dict( - type='ODVGDataset', - data_root='grit_processed/', - ann_file='grit20m_vg.json', + type="ODVGDataset", + data_root="grit_processed/", + ann_file="grit20m_vg.json", label_map_file=None, - data_prefix=dict(img=''), + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) train_dataloader = dict( - sampler=dict( - _delete_=True, - type='CustomSampleSizeSampler', - dataset_size=[-1, -1, -1, 500000]), - dataset=dict(datasets=[ - o365v1_od_dataset, flickr30k_dataset, gqa_dataset, grit_dataset - ])) + sampler=dict(_delete_=True, type="CustomSampleSizeSampler", dataset_size=[-1, -1, -1, 500000]), + dataset=dict(datasets=[o365v1_od_dataset, flickr30k_dataset, gqa_dataset, grit_dataset]), +) diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det.py b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det.py index 56e500c86932a8e61dba88fde2bfc00c0ced5585..ef79b2b588c95aa15a9311301f5f70986cdb6a95 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det.py @@ -1,11 +1,11 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" o365v1_od_dataset = dict( - type='ODVGDataset', - data_root='data/objects365v1/', - ann_file='o365v1_train_odvg.json', - label_map_file='o365v1_label_map.json', - data_prefix=dict(img='train/'), + type="ODVGDataset", + data_root="data/objects365v1/", + ann_file="o365v1_train_odvg.json", + label_map_file="o365v1_label_map.json", + data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, @@ -13,105 +13,136 @@ o365v1_od_dataset = dict( ) flickr30k_dataset = dict( - type='ODVGDataset', - data_root='data/flickr30k_entities/', - 
ann_file='final_flickr_separateGT_train_vg.json', + type="ODVGDataset", + data_root="data/flickr30k_entities/", + ann_file="final_flickr_separateGT_train_vg.json", label_map_file=None, - data_prefix=dict(img='flickr30k_images/'), + data_prefix=dict(img="flickr30k_images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) gqa_dataset = dict( - type='ODVGDataset', - data_root='data/gqa/', - ann_file='final_mixed_train_no_coco_vg.json', + type="ODVGDataset", + data_root="data/gqa/", + ann_file="final_mixed_train_no_coco_vg.json", label_map_file=None, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) v3d_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/V3Det/annotations/v3det_2023_v1_label_map.json', - max_tokens=256), + label_map_file="data/V3Det/annotations/v3det_2023_v1_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] v3det_dataset = dict( - type='ODVGDataset', - 
data_root='data/V3Det/', - ann_file='annotations/v3det_2023_v1_train_od.json', - label_map_file='annotations/v3det_2023_v1_label_map.json', - data_prefix=dict(img=''), + type="ODVGDataset", + data_root="data/V3Det/", + ann_file="annotations/v3det_2023_v1_train_od.json", + label_map_file="annotations/v3det_2023_v1_label_map.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), need_text=False, # change this pipeline=v3d_train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) grit_dataset = dict( - type='ODVGDataset', - data_root='grit_processed/', - ann_file='grit20m_vg.json', + type="ODVGDataset", + data_root="grit_processed/", + ann_file="grit20m_vg.json", label_map_file=None, - data_prefix=dict(img=''), + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) train_dataloader = dict( - sampler=dict( - _delete_=True, - type='CustomSampleSizeSampler', - dataset_size=[-1, -1, -1, -1, 500000]), - dataset=dict(datasets=[ - o365v1_od_dataset, flickr30k_dataset, gqa_dataset, v3det_dataset, - grit_dataset - ])) + sampler=dict(_delete_=True, type="CustomSampleSizeSampler", dataset_size=[-1, -1, -1, -1, 500000]), + dataset=dict(datasets=[o365v1_od_dataset, flickr30k_dataset, gqa_dataset, v3det_dataset, grit_dataset]), +) diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_v3det.py b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_v3det.py index c89014fbbe43a1e7787fa46d7d850d42a64ff8a9..e41cbb75faad42839e3bb5879e662910e0dab934 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_v3det.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_v3det.py @@ -1,11 +1,11 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" o365v1_od_dataset = dict( - type='ODVGDataset', - data_root='data/objects365v1/', - ann_file='o365v1_train_odvg.json', - label_map_file='o365v1_label_map.json', - data_prefix=dict(img='train/'), + type="ODVGDataset", + data_root="data/objects365v1/", + ann_file="o365v1_train_odvg.json", + label_map_file="o365v1_label_map.json", + data_prefix=dict(img="train/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, @@ -13,89 +13,121 @@ o365v1_od_dataset = dict( ) flickr30k_dataset = dict( - type='ODVGDataset', - data_root='data/flickr30k_entities/', - ann_file='final_flickr_separateGT_train_vg.json', + type="ODVGDataset", + data_root="data/flickr30k_entities/", + ann_file="final_flickr_separateGT_train_vg.json", label_map_file=None, - data_prefix=dict(img='flickr30k_images/'), + data_prefix=dict(img="flickr30k_images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) gqa_dataset = dict( - type='ODVGDataset', - data_root='data/gqa/', - ann_file='final_mixed_train_no_coco_vg.json', + type="ODVGDataset", + data_root="data/gqa/", + ann_file="final_mixed_train_no_coco_vg.json", label_map_file=None, - data_prefix=dict(img='images/'), + data_prefix=dict(img="images/"), filter_cfg=dict(filter_empty_gt=False), pipeline=_base_.train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) v3d_train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - 
dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/V3Det/annotations/v3det_2023_v1_label_map.json', - max_tokens=256), + label_map_file="data/V3Det/annotations/v3det_2023_v1_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] v3det_dataset = dict( - type='ODVGDataset', - data_root='data/V3Det/', - ann_file='annotations/v3det_2023_v1_train_od.json', - label_map_file='annotations/v3det_2023_v1_label_map.json', - data_prefix=dict(img=''), + type="ODVGDataset", + data_root="data/V3Det/", + ann_file="annotations/v3det_2023_v1_train_od.json", + label_map_file="annotations/v3det_2023_v1_label_map.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), need_text=False, # change this pipeline=v3d_train_pipeline, return_classes=True, - backend_args=None) + backend_args=None, +) -train_dataloader = dict( - dataset=dict(datasets=[ - o365v1_od_dataset, flickr30k_dataset, gqa_dataset, v3det_dataset - ])) +train_dataloader = dict(dataset=dict(datasets=[o365v1_od_dataset, flickr30k_dataset, gqa_dataset, v3det_dataset])) diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_pseudo-labeling_cat.py 
b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_pseudo-labeling_cat.py index 6dc8dcd8df4b98a3fdb3aa26d73ce353b9251f50..e78c324f1ac73244f81824434a46de79d2996759 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_pseudo-labeling_cat.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_pseudo-labeling_cat.py @@ -1,43 +1,39 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" test_pipeline = [ + dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadTextAnnotations"), dict( - type='LoadImageFromFile', backend_args=None, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadTextAnnotations'), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities', - 'tokens_positive')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities", "tokens_positive"), + ), ] -data_root = 'data/cat/' +data_root = "data/cat/" val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=False, dataset=dict( - type='ODVGDataset', + type="ODVGDataset", data_root=data_root, - label_map_file='cat_label_map.json', - ann_file='cat_train_od.json', - data_prefix=dict(img='images/'), + label_map_file="cat_label_map.json", + ann_file="cat_train_od.json", + data_prefix=dict(img="images/"), pipeline=test_pipeline, - return_classes=True)) + return_classes=True, + ), +) test_dataloader = val_dataloader val_evaluator = dict( _delete_=True, - outfile_path=data_root + 'cat_train_od_v1.json', - img_prefix=data_root + 'images/', + outfile_path=data_root + "cat_train_od_v1.json", + img_prefix=data_root + "images/", score_thr=0.7, nms_thr=0.5, - type='DumpODVGResults') + type="DumpODVGResults", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_pseudo-labeling_flickr30k.py b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_pseudo-labeling_flickr30k.py index 78bf1c344bf7c795ace08283b745527dfc9b15f7..2ca2173764b94168c59ed94ab72553162981314d 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_pseudo-labeling_flickr30k.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/grounding_dino_swin-t_pretrain_pseudo-labeling_flickr30k.py @@ -1,42 +1,38 @@ -_base_ = 'grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "grounding_dino_swin-t_pretrain_obj365.py" test_pipeline = [ + dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadTextAnnotations"), dict( - type='LoadImageFromFile', backend_args=None, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadTextAnnotations'), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities', - 'tokens_positive')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities", "tokens_positive"), + ), ] -data_root = 'data/flickr30k_entities/' 
+data_root = "data/flickr30k_entities/" val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=False, dataset=dict( - type='ODVGDataset', + type="ODVGDataset", data_root=data_root, - ann_file='flickr_simple_train_vg.json', - data_prefix=dict(img='flickr30k_images/'), + ann_file="flickr_simple_train_vg.json", + data_prefix=dict(img="flickr30k_images/"), pipeline=test_pipeline, - return_classes=True)) + return_classes=True, + ), +) test_dataloader = val_dataloader val_evaluator = dict( _delete_=True, - outfile_path=data_root + 'flickr_simple_train_vg_v1.json', - img_prefix=data_root + 'flickr30k_images/', + outfile_path=data_root + "flickr_simple_train_vg_v1.json", + img_prefix=data_root + "flickr30k_images/", score_thr=0.4, nms_thr=0.5, - type='DumpODVGResults') + type="DumpODVGResults", +) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_finetune_16xb4_1x_lvis.py b/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_finetune_16xb4_1x_lvis.py index 3ba12c9067511b00b616781ca0cf2e477e5e689e..63a9d25396d1941bc827c4fe5624ac62ff3c0535 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_finetune_16xb4_1x_lvis.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_finetune_16xb4_1x_lvis.py @@ -1,120 +1,144 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/coco/' +data_root = "data/coco/" -model = dict(test_cfg=dict( - max_per_img=300, - chunked_size=40, -)) +model = dict( + test_cfg=dict( + max_per_img=300, + chunked_size=40, + ) +) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + 
type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/coco/annotations/lvis_v1_label_map.json', - max_tokens=256), + label_map_file="data/coco/annotations/lvis_v1_label_map.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='ClassBalancedDataset', + type="ClassBalancedDataset", oversample_thr=1e-3, dataset=dict( - type='ODVGDataset', + type="ODVGDataset", data_root=data_root, need_text=False, - label_map_file='annotations/lvis_v1_label_map.json', - ann_file='annotations/lvis_v1_train_od.json', - data_prefix=dict(img=''), + label_map_file="annotations/lvis_v1_label_map.json", + ann_file="annotations/lvis_v1_train_od.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False, min_size=32), return_classes=True, - pipeline=train_pipeline))) + pipeline=train_pipeline, + ), + ) +) val_dataloader = dict( dataset=dict( - data_root=data_root, - type='LVISV1Dataset', - ann_file='annotations/lvis_v1_minival_inserted_image_name.json', - data_prefix=dict(img=''))) + data_root=data_root, type="LVISV1Dataset", ann_file="annotations/lvis_v1_minival_inserted_image_name.json", data_prefix=dict(img="") + ) +) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='LVISFixedAPMetric', - ann_file=data_root + - 'annotations/lvis_v1_minival_inserted_image_name.json') +val_evaluator = dict(_delete_=True, type="LVISFixedAPMetric", ann_file=data_root + "annotations/lvis_v1_minival_inserted_image_name.json") test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), # 'language_model': dict(lr_mult=0), - })) + } + ), +) # learning policy max_epochs = 12 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[11], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=3) -default_hooks = dict( - checkpoint=dict( - max_keep_ckpts=1, save_best='lvis_fixed_ap/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best="lvis_fixed_ap/AP", rule="greater")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git 
a/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_finetune_16xb4_1x_lvis_866_337.py b/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_finetune_16xb4_1x_lvis_866_337.py index 28d0141d3e2c0feba26ae4ed924000960c311bf5..d7d552f8353cd085c92615a1d143e8d0e1e7014e 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_finetune_16xb4_1x_lvis_866_337.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_finetune_16xb4_1x_lvis_866_337.py @@ -1,120 +1,144 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/coco/' +data_root = "data/coco/" -model = dict(test_cfg=dict( - max_per_img=300, - chunked_size=40, -)) +model = dict( + test_cfg=dict( + max_per_img=300, + chunked_size=40, + ) +) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), dict( - type='RandomSamplingNegPos', + type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, # change this - label_map_file='data/coco/annotations/lvis_v1_label_map_norare.json', - max_tokens=256), + label_map_file="data/coco/annotations/lvis_v1_label_map_norare.json", + max_tokens=256, + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='ClassBalancedDataset', + type="ClassBalancedDataset", 
oversample_thr=1e-3, dataset=dict( - type='ODVGDataset', + type="ODVGDataset", data_root=data_root, need_text=False, - label_map_file='annotations/lvis_v1_label_map_norare.json', - ann_file='annotations/lvis_v1_train_od_norare.json', - data_prefix=dict(img=''), + label_map_file="annotations/lvis_v1_label_map_norare.json", + ann_file="annotations/lvis_v1_train_od_norare.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False, min_size=32), return_classes=True, - pipeline=train_pipeline))) + pipeline=train_pipeline, + ), + ) +) val_dataloader = dict( dataset=dict( - data_root=data_root, - type='LVISV1Dataset', - ann_file='annotations/lvis_v1_minival_inserted_image_name.json', - data_prefix=dict(img=''))) + data_root=data_root, type="LVISV1Dataset", ann_file="annotations/lvis_v1_minival_inserted_image_name.json", data_prefix=dict(img="") + ) +) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='LVISFixedAPMetric', - ann_file=data_root + - 'annotations/lvis_v1_minival_inserted_image_name.json') +val_evaluator = dict(_delete_=True, type="LVISFixedAPMetric", ann_file=data_root + "annotations/lvis_v1_minival_inserted_image_name.json") test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.00005, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.00005, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), # 'language_model': dict(lr_mult=0), - })) + } + ), +) # learning policy max_epochs = 12 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[8, 11], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=3) -default_hooks = dict( - checkpoint=dict( - max_keep_ckpts=3, save_best='lvis_fixed_ap/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=3, save_best="lvis_fixed_ap/AP", rule="greater")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_lvis.py b/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_lvis.py index fb4ed438e0b59ca4c991836310cf7103cc02f0f2..4fd2f8009bc7110bc2ec25a351a4dd0ae8204a99 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_lvis.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_lvis.py @@ -1,24 +1,20 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -model = dict(test_cfg=dict( - max_per_img=300, - chunked_size=40, -)) +model = dict( + test_cfg=dict( + max_per_img=300, + chunked_size=40, + ) +) -dataset_type = 'LVISV1Dataset' -data_root 
= 'data/coco/' +dataset_type = "LVISV1Dataset" +data_root = "data/coco/" val_dataloader = dict( - dataset=dict( - data_root=data_root, - type=dataset_type, - ann_file='annotations/lvis_od_val.json', - data_prefix=dict(img=''))) + dataset=dict(data_root=data_root, type=dataset_type, ann_file="annotations/lvis_od_val.json", data_prefix=dict(img="")) +) test_dataloader = val_dataloader # numpy < 1.24.0 -val_evaluator = dict( - _delete_=True, - type='LVISFixedAPMetric', - ann_file=data_root + 'annotations/lvis_od_val.json') +val_evaluator = dict(_delete_=True, type="LVISFixedAPMetric", ann_file=data_root + "annotations/lvis_od_val.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py b/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py index 406a39a4264a0d6ea5d7950a205b0bac72e8f846..0ee38b1b43733674cfaedd985d4fd7168fd6ad42 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/lvis/grounding_dino_swin-t_pretrain_zeroshot_mini-lvis.py @@ -1,25 +1,22 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -model = dict(test_cfg=dict( - max_per_img=300, - chunked_size=40, -)) +model = dict( + test_cfg=dict( + max_per_img=300, + chunked_size=40, + ) +) -dataset_type = 'LVISV1Dataset' -data_root = 'data/coco/' +dataset_type = "LVISV1Dataset" +data_root = "data/coco/" val_dataloader = dict( dataset=dict( - data_root=data_root, - type=dataset_type, - ann_file='annotations/lvis_v1_minival_inserted_image_name.json', - data_prefix=dict(img=''))) + data_root=data_root, type=dataset_type, ann_file="annotations/lvis_v1_minival_inserted_image_name.json", data_prefix=dict(img="") + ) +) test_dataloader = val_dataloader # numpy < 1.24.0 -val_evaluator = dict( - _delete_=True, - type='LVISFixedAPMetric', - ann_file=data_root + - 'annotations/lvis_v1_minival_inserted_image_name.json') +val_evaluator = dict(_delete_=True, type="LVISFixedAPMetric", ann_file=data_root + "annotations/lvis_v1_minival_inserted_image_name.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/mm_grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw13.py b/mmpose/configs/mmdet/mm_grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw13.py index d87ca7ca1ea48a3cff83e15f3e2ad66927598d7f..11eeac66796fc1f40a2a6c9ce1fb976832416f3f 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw13.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw13.py @@ -1,36 +1,42 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' # noqa +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" # noqa -dataset_type = 'CocoDataset' -data_root = 'data/odinw/' +dataset_type = "CocoDataset" +data_root = "data/odinw/" base_test_pipeline = _base_.test_pipeline -base_test_pipeline[-1]['meta_keys'] = ('img_id', 'img_path', 'ori_shape', - 'img_shape', 'scale_factor', 'text', - 'custom_entities', 'caption_prompt') +base_test_pipeline[-1]["meta_keys"] = ( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "text", + "custom_entities", + "caption_prompt", +) # ---------------------1 AerialMaritimeDrone---------------------# -class_name = ('boat', 'car', 'dock', 'jetski', 'lift') +class_name = ("boat", "car", "dock", "jetski", "lift") metainfo = 
dict(classes=class_name) -_data_root = data_root + 'AerialMaritimeDrone/large/' +_data_root = data_root + "AerialMaritimeDrone/large/" dataset_AerialMaritimeDrone = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), test_mode=True, pipeline=base_test_pipeline, - return_classes=True) + return_classes=True, +) val_evaluator_AerialMaritimeDrone = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------2 Aquarium---------------------# -class_name = ('fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', - 'stingray') +class_name = ("fish", "jellyfish", "penguin", "puffin", "shark", "starfish", "stingray") metainfo = dict(classes=class_name) -_data_root = data_root + 'Aquarium/Aquarium Combined.v2-raw-1024.coco/' +_data_root = data_root + "Aquarium/Aquarium Combined.v2-raw-1024.coco/" caption_prompt = None # caption_prompt = { @@ -48,291 +54,292 @@ dataset_Aquarium = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_Aquarium = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Aquarium = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------3 CottontailRabbits---------------------# -class_name = ('Cottontail-Rabbit', ) +class_name = ("Cottontail-Rabbit",) metainfo = dict(classes=class_name) -_data_root = data_root + 'CottontailRabbits/' +_data_root = data_root + "CottontailRabbits/" # caption_prompt = None -caption_prompt = {'Cottontail-Rabbit': {'name': 'rabbit'}} +caption_prompt = {"Cottontail-Rabbit": {"name": "rabbit"}} dataset_CottontailRabbits = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_CottontailRabbits = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_CottontailRabbits = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------4 EgoHands---------------------# -class_name = ('hand', ) +class_name = ("hand",) metainfo = dict(classes=class_name) -_data_root = data_root + 'EgoHands/generic/' +_data_root = data_root + "EgoHands/generic/" # caption_prompt = None -caption_prompt = {'hand': {'suffix': ' of a person'}} +caption_prompt = {"hand": {"suffix": " of a person"}} dataset_EgoHands = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + 
ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_EgoHands = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_EgoHands = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------5 NorthAmericaMushrooms---------------------# -class_name = ('CoW', 'chanterelle') +class_name = ("CoW", "chanterelle") metainfo = dict(classes=class_name) -_data_root = data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/' # noqa +_data_root = data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/" # noqa # caption_prompt = None -caption_prompt = { - 'CoW': { - 'name': 'flat mushroom' - }, - 'chanterelle': { - 'name': 'yellow mushroom' - } -} +caption_prompt = {"CoW": {"name": "flat mushroom"}, "chanterelle": {"name": "yellow mushroom"}} dataset_NorthAmericaMushrooms = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_NorthAmericaMushrooms = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------6 Packages---------------------# -class_name = ('package', ) +class_name = ("package",) metainfo = dict(classes=class_name) -_data_root = data_root + 'Packages/Raw/' +_data_root = data_root + "Packages/Raw/" # caption_prompt = None -caption_prompt = { - 'package': { - 'prefix': 'there is a ', - 'suffix': ' on the porch' - } -} +caption_prompt = {"package": {"prefix": "there is a ", "suffix": " on the porch"}} dataset_Packages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_Packages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Packages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------7 PascalVOC---------------------# -class_name = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', - 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', - 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', - 'tvmonitor') +class_name = ( + "aeroplane", + "bicycle", + "bird", + "boat", + "bottle", + "bus", + "car", + "cat", + "chair", + "cow", + "diningtable", + "dog", + "horse", + "motorbike", + "person", + "pottedplant", + "sheep", + "sofa", + "train", + "tvmonitor", +) metainfo = dict(classes=class_name) -_data_root = data_root + 'PascalVOC/' +_data_root = data_root + "PascalVOC/" dataset_PascalVOC = dict( type=dataset_type, 
metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_PascalVOC = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_PascalVOC = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------8 pistols---------------------# -class_name = ('pistol', ) +class_name = ("pistol",) metainfo = dict(classes=class_name) -_data_root = data_root + 'pistols/export/' +_data_root = data_root + "pistols/export/" dataset_pistols = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pistols = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pistols = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------9 pothole---------------------# -class_name = ('pothole', ) +class_name = ("pothole",) metainfo = dict(classes=class_name) -_data_root = data_root + 'pothole/' +_data_root = data_root + "pothole/" # caption_prompt = None -caption_prompt = { - 'pothole': { - 'prefix': 'there are some ', - 'name': 'holes', - 'suffix': ' on the road' - } -} +caption_prompt = {"pothole": {"prefix": "there are some ", "name": "holes", "suffix": " on the road"}} dataset_pothole = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pothole = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pothole = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------10 Raccoon---------------------# -class_name = ('raccoon', ) +class_name = ("raccoon",) metainfo = dict(classes=class_name) -_data_root = data_root + 'Raccoon/Raccoon.v2-raw.coco/' +_data_root = data_root + "Raccoon/Raccoon.v2-raw.coco/" dataset_Raccoon = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_Raccoon = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Raccoon = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------11 ShellfishOpenImages---------------------# -class_name = ('Crab', 'Lobster', 'Shrimp') +class_name = ("Crab", "Lobster", 
"Shrimp") metainfo = dict(classes=class_name) -_data_root = data_root + 'ShellfishOpenImages/raw/' +_data_root = data_root + "ShellfishOpenImages/raw/" dataset_ShellfishOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_ShellfishOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------12 thermalDogsAndPeople---------------------# -class_name = ('dog', 'person') +class_name = ("dog", "person") metainfo = dict(classes=class_name) -_data_root = data_root + 'thermalDogsAndPeople/' +_data_root = data_root + "thermalDogsAndPeople/" dataset_thermalDogsAndPeople = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_thermalDogsAndPeople = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------13 VehiclesOpenImages---------------------# -class_name = ('Ambulance', 'Bus', 'Car', 'Motorcycle', 'Truck') +class_name = ("Ambulance", "Bus", "Car", "Motorcycle", "Truck") metainfo = dict(classes=class_name) -_data_root = data_root + 'VehiclesOpenImages/416x416/' +_data_root = data_root + "VehiclesOpenImages/416x416/" dataset_VehiclesOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_VehiclesOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_VehiclesOpenImages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # --------------------- Config---------------------# dataset_prefixes = [ - 'AerialMaritimeDrone', 'Aquarium', 'CottontailRabbits', 'EgoHands', - 'NorthAmericaMushrooms', 'Packages', 'PascalVOC', 'pistols', 'pothole', - 'Raccoon', 'ShellfishOpenImages', 'thermalDogsAndPeople', - 'VehiclesOpenImages' + "AerialMaritimeDrone", + "Aquarium", + "CottontailRabbits", + "EgoHands", + "NorthAmericaMushrooms", + "Packages", + "PascalVOC", + "pistols", + "pothole", + "Raccoon", + "ShellfishOpenImages", + "thermalDogsAndPeople", + "VehiclesOpenImages", ] datasets = [ - dataset_AerialMaritimeDrone, dataset_Aquarium, dataset_CottontailRabbits, - dataset_EgoHands, dataset_NorthAmericaMushrooms, dataset_Packages, - dataset_PascalVOC, dataset_pistols, dataset_pothole, dataset_Raccoon, - dataset_ShellfishOpenImages, dataset_thermalDogsAndPeople, - dataset_VehiclesOpenImages + 
dataset_AerialMaritimeDrone, + dataset_Aquarium, + dataset_CottontailRabbits, + dataset_EgoHands, + dataset_NorthAmericaMushrooms, + dataset_Packages, + dataset_PascalVOC, + dataset_pistols, + dataset_pothole, + dataset_Raccoon, + dataset_ShellfishOpenImages, + dataset_thermalDogsAndPeople, + dataset_VehiclesOpenImages, ] metrics = [ - val_evaluator_AerialMaritimeDrone, val_evaluator_Aquarium, - val_evaluator_CottontailRabbits, val_evaluator_EgoHands, - val_evaluator_NorthAmericaMushrooms, val_evaluator_Packages, - val_evaluator_PascalVOC, val_evaluator_pistols, val_evaluator_pothole, - val_evaluator_Raccoon, val_evaluator_ShellfishOpenImages, - val_evaluator_thermalDogsAndPeople, val_evaluator_VehiclesOpenImages + val_evaluator_AerialMaritimeDrone, + val_evaluator_Aquarium, + val_evaluator_CottontailRabbits, + val_evaluator_EgoHands, + val_evaluator_NorthAmericaMushrooms, + val_evaluator_Packages, + val_evaluator_PascalVOC, + val_evaluator_pistols, + val_evaluator_pothole, + val_evaluator_Raccoon, + val_evaluator_ShellfishOpenImages, + val_evaluator_thermalDogsAndPeople, + val_evaluator_VehiclesOpenImages, ] # -------------------------------------------------# -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/mm_grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw35.py b/mmpose/configs/mmdet/mm_grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw35.py index a6b8566aed486ef48653b6e54200cb8817910f2f..62a13655d288fb5af90cf0cce95127f24cd04be1 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw35.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/odinw/grounding_dino_swin-t_pretrain_odinw35.py @@ -1,794 +1,949 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' # noqa +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" # noqa -dataset_type = 'CocoDataset' -data_root = 'data/odinw/' +dataset_type = "CocoDataset" +data_root = "data/odinw/" base_test_pipeline = _base_.test_pipeline -base_test_pipeline[-1]['meta_keys'] = ('img_id', 'img_path', 'ori_shape', - 'img_shape', 'scale_factor', 'text', - 'custom_entities', 'caption_prompt') +base_test_pipeline[-1]["meta_keys"] = ( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "text", + "custom_entities", + "caption_prompt", +) # ---------------------1 AerialMaritimeDrone_large---------------------# -class_name = ('boat', 'car', 'dock', 'jetski', 'lift') +class_name = ("boat", "car", "dock", "jetski", "lift") metainfo = dict(classes=class_name) -_data_root = data_root + 'AerialMaritimeDrone/large/' +_data_root = data_root + "AerialMaritimeDrone/large/" dataset_AerialMaritimeDrone_large = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_AerialMaritimeDrone_large = dict( - type='CocoMetric', 
- ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------2 AerialMaritimeDrone_tiled---------------------# -class_name = ('boat', 'car', 'dock', 'jetski', 'lift') +class_name = ("boat", "car", "dock", "jetski", "lift") metainfo = dict(classes=class_name) -_data_root = data_root + 'AerialMaritimeDrone/tiled/' +_data_root = data_root + "AerialMaritimeDrone/tiled/" dataset_AerialMaritimeDrone_tiled = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_AerialMaritimeDrone_tiled = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------3 AmericanSignLanguageLetters---------------------# -class_name = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', - 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z') -metainfo = dict(classes=class_name) -_data_root = data_root + 'AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/' # noqa +class_name = ( + "A", + "B", + "C", + "D", + "E", + "F", + "G", + "H", + "I", + "J", + "K", + "L", + "M", + "N", + "O", + "P", + "Q", + "R", + "S", + "T", + "U", + "V", + "W", + "X", + "Y", + "Z", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/" # noqa dataset_AmericanSignLanguageLetters = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_AmericanSignLanguageLetters = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------4 Aquarium---------------------# -class_name = ('fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', - 'stingray') +class_name = ("fish", "jellyfish", "penguin", "puffin", "shark", "starfish", "stingray") metainfo = dict(classes=class_name) -_data_root = data_root + 'Aquarium/Aquarium Combined.v2-raw-1024.coco/' +_data_root = data_root + "Aquarium/Aquarium Combined.v2-raw-1024.coco/" dataset_Aquarium = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_Aquarium = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Aquarium = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", 
metric="bbox") # ---------------------5 BCCD---------------------# -class_name = ('Platelets', 'RBC', 'WBC') +class_name = ("Platelets", "RBC", "WBC") metainfo = dict(classes=class_name) -_data_root = data_root + 'BCCD/BCCD.v3-raw.coco/' +_data_root = data_root + "BCCD/BCCD.v3-raw.coco/" dataset_BCCD = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_BCCD = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_BCCD = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------6 boggleBoards---------------------# -class_name = ('Q', 'a', 'an', 'b', 'c', 'd', 'e', 'er', 'f', 'g', 'h', 'he', - 'i', 'in', 'j', 'k', 'l', 'm', 'n', 'o', 'o ', 'p', 'q', 'qu', - 'r', 's', 't', 't\\', 'th', 'u', 'v', 'w', 'wild', 'x', 'y', 'z') -metainfo = dict(classes=class_name) -_data_root = data_root + 'boggleBoards/416x416AutoOrient/export/' +class_name = ( + "Q", + "a", + "an", + "b", + "c", + "d", + "e", + "er", + "f", + "g", + "h", + "he", + "i", + "in", + "j", + "k", + "l", + "m", + "n", + "o", + "o ", + "p", + "q", + "qu", + "r", + "s", + "t", + "t\\", + "th", + "u", + "v", + "w", + "wild", + "x", + "y", + "z", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "boggleBoards/416x416AutoOrient/export/" dataset_boggleBoards = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_boggleBoards = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_boggleBoards = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------7 brackishUnderwater---------------------# -class_name = ('crab', 'fish', 'jellyfish', 'shrimp', 'small_fish', 'starfish') +class_name = ("crab", "fish", "jellyfish", "shrimp", "small_fish", "starfish") metainfo = dict(classes=class_name) -_data_root = data_root + 'brackishUnderwater/960x540/' +_data_root = data_root + "brackishUnderwater/960x540/" dataset_brackishUnderwater = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_brackishUnderwater = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_brackishUnderwater = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------8 ChessPieces---------------------# -class_name = (' ', 'black bishop', 'black king', 'black knight', 'black pawn', - 'black queen', 'black rook', 'white bishop', 'white king', - 'white knight', 
'white pawn', 'white queen', 'white rook') -metainfo = dict(classes=class_name) -_data_root = data_root + 'ChessPieces/Chess Pieces.v23-raw.coco/' +class_name = ( + " ", + "black bishop", + "black king", + "black knight", + "black pawn", + "black queen", + "black rook", + "white bishop", + "white king", + "white knight", + "white pawn", + "white queen", + "white rook", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "ChessPieces/Chess Pieces.v23-raw.coco/" dataset_ChessPieces = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/new_annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/new_annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_ChessPieces = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/new_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_ChessPieces = dict(type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox") # ---------------------9 CottontailRabbits---------------------# -class_name = ('rabbit', ) +class_name = ("rabbit",) metainfo = dict(classes=class_name) -_data_root = data_root + 'CottontailRabbits/' +_data_root = data_root + "CottontailRabbits/" dataset_CottontailRabbits = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/new_annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/new_annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_CottontailRabbits = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/new_annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox" +) # ---------------------10 dice---------------------# -class_name = ('1', '2', '3', '4', '5', '6') +class_name = ("1", "2", "3", "4", "5", "6") metainfo = dict(classes=class_name) -_data_root = data_root + 'dice/mediumColor/export/' +_data_root = data_root + "dice/mediumColor/export/" dataset_dice = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_dice = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_dice = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------11 DroneControl---------------------# -class_name = ('follow', 'follow_hand', 'land', 'land_hand', 'null', 'object', - 'takeoff', 'takeoff-hand') +class_name = ("follow", "follow_hand", "land", "land_hand", "null", "object", "takeoff", "takeoff-hand") metainfo = dict(classes=class_name) -_data_root = data_root + 'DroneControl/Drone Control.v3-raw.coco/' +_data_root = data_root + "DroneControl/Drone Control.v3-raw.coco/" dataset_DroneControl = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + 
ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_DroneControl = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_DroneControl = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------12 EgoHands_generic---------------------# -class_name = ('hand', ) +class_name = ("hand",) metainfo = dict(classes=class_name) -_data_root = data_root + 'EgoHands/generic/' -caption_prompt = {'hand': {'suffix': ' of a person'}} +_data_root = data_root + "EgoHands/generic/" +caption_prompt = {"hand": {"suffix": " of a person"}} dataset_EgoHands_generic = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_EgoHands_generic = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_EgoHands_generic = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------13 EgoHands_specific---------------------# -class_name = ('myleft', 'myright', 'yourleft', 'yourright') +class_name = ("myleft", "myright", "yourleft", "yourright") metainfo = dict(classes=class_name) -_data_root = data_root + 'EgoHands/specific/' +_data_root = data_root + "EgoHands/specific/" dataset_EgoHands_specific = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_EgoHands_specific = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_EgoHands_specific = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------14 HardHatWorkers---------------------# -class_name = ('head', 'helmet', 'person') +class_name = ("head", "helmet", "person") metainfo = dict(classes=class_name) -_data_root = data_root + 'HardHatWorkers/raw/' +_data_root = data_root + "HardHatWorkers/raw/" dataset_HardHatWorkers = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_HardHatWorkers = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_HardHatWorkers = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------15 MaskWearing---------------------# -class_name = ('mask', 'no-mask') +class_name = ("mask", "no-mask") metainfo = 
dict(classes=class_name) -_data_root = data_root + 'MaskWearing/raw/' +_data_root = data_root + "MaskWearing/raw/" dataset_MaskWearing = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_MaskWearing = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_MaskWearing = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------16 MountainDewCommercial---------------------# -class_name = ('bottle', ) +class_name = ("bottle",) metainfo = dict(classes=class_name) -_data_root = data_root + 'MountainDewCommercial/' +_data_root = data_root + "MountainDewCommercial/" dataset_MountainDewCommercial = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_MountainDewCommercial = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------17 NorthAmericaMushrooms---------------------# -class_name = ('flat mushroom', 'yellow mushroom') +class_name = ("flat mushroom", "yellow mushroom") metainfo = dict(classes=class_name) -_data_root = data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/' # noqa +_data_root = data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/" # noqa dataset_NorthAmericaMushrooms = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/new_annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/new_annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_NorthAmericaMushrooms = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/new_annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/new_annotations_without_background.json", metric="bbox" +) # ---------------------18 openPoetryVision---------------------# -class_name = ('American Typewriter', 'Andale Mono', 'Apple Chancery', 'Arial', - 'Avenir', 'Baskerville', 'Big Caslon', 'Bradley Hand', - 'Brush Script MT', 'Chalkboard', 'Comic Sans MS', 'Copperplate', - 'Courier', 'Didot', 'Futura', 'Geneva', 'Georgia', 'Gill Sans', - 'Helvetica', 'Herculanum', 'Impact', 'Kefa', 'Lucida Grande', - 'Luminari', 'Marker Felt', 'Menlo', 'Monaco', 'Noteworthy', - 'Optima', 'PT Sans', 'PT Serif', 'Palatino', 'Papyrus', - 'Phosphate', 'Rockwell', 'SF Pro', 'SignPainter', 'Skia', - 'Snell Roundhand', 'Tahoma', 'Times New Roman', 'Trebuchet MS', - 'Verdana') -metainfo = dict(classes=class_name) -_data_root = data_root + 'openPoetryVision/512x512/' +class_name = ( + "American Typewriter", + "Andale Mono", + "Apple Chancery", + "Arial", + "Avenir", + 
"Baskerville", + "Big Caslon", + "Bradley Hand", + "Brush Script MT", + "Chalkboard", + "Comic Sans MS", + "Copperplate", + "Courier", + "Didot", + "Futura", + "Geneva", + "Georgia", + "Gill Sans", + "Helvetica", + "Herculanum", + "Impact", + "Kefa", + "Lucida Grande", + "Luminari", + "Marker Felt", + "Menlo", + "Monaco", + "Noteworthy", + "Optima", + "PT Sans", + "PT Serif", + "Palatino", + "Papyrus", + "Phosphate", + "Rockwell", + "SF Pro", + "SignPainter", + "Skia", + "Snell Roundhand", + "Tahoma", + "Times New Roman", + "Trebuchet MS", + "Verdana", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "openPoetryVision/512x512/" dataset_openPoetryVision = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_openPoetryVision = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_openPoetryVision = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------19 OxfordPets_by_breed---------------------# -class_name = ('cat-Abyssinian', 'cat-Bengal', 'cat-Birman', 'cat-Bombay', - 'cat-British_Shorthair', 'cat-Egyptian_Mau', 'cat-Maine_Coon', - 'cat-Persian', 'cat-Ragdoll', 'cat-Russian_Blue', 'cat-Siamese', - 'cat-Sphynx', 'dog-american_bulldog', - 'dog-american_pit_bull_terrier', 'dog-basset_hound', - 'dog-beagle', 'dog-boxer', 'dog-chihuahua', - 'dog-english_cocker_spaniel', 'dog-english_setter', - 'dog-german_shorthaired', 'dog-great_pyrenees', 'dog-havanese', - 'dog-japanese_chin', 'dog-keeshond', 'dog-leonberger', - 'dog-miniature_pinscher', 'dog-newfoundland', 'dog-pomeranian', - 'dog-pug', 'dog-saint_bernard', 'dog-samoyed', - 'dog-scottish_terrier', 'dog-shiba_inu', - 'dog-staffordshire_bull_terrier', 'dog-wheaten_terrier', - 'dog-yorkshire_terrier') -metainfo = dict(classes=class_name) -_data_root = data_root + 'OxfordPets/by-breed/' # noqa +class_name = ( + "cat-Abyssinian", + "cat-Bengal", + "cat-Birman", + "cat-Bombay", + "cat-British_Shorthair", + "cat-Egyptian_Mau", + "cat-Maine_Coon", + "cat-Persian", + "cat-Ragdoll", + "cat-Russian_Blue", + "cat-Siamese", + "cat-Sphynx", + "dog-american_bulldog", + "dog-american_pit_bull_terrier", + "dog-basset_hound", + "dog-beagle", + "dog-boxer", + "dog-chihuahua", + "dog-english_cocker_spaniel", + "dog-english_setter", + "dog-german_shorthaired", + "dog-great_pyrenees", + "dog-havanese", + "dog-japanese_chin", + "dog-keeshond", + "dog-leonberger", + "dog-miniature_pinscher", + "dog-newfoundland", + "dog-pomeranian", + "dog-pug", + "dog-saint_bernard", + "dog-samoyed", + "dog-scottish_terrier", + "dog-shiba_inu", + "dog-staffordshire_bull_terrier", + "dog-wheaten_terrier", + "dog-yorkshire_terrier", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "OxfordPets/by-breed/" # noqa dataset_OxfordPets_by_breed = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) 
val_evaluator_OxfordPets_by_breed = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------20 OxfordPets_by_species---------------------# -class_name = ('cat', 'dog') +class_name = ("cat", "dog") metainfo = dict(classes=class_name) -_data_root = data_root + 'OxfordPets/by-species/' # noqa +_data_root = data_root + "OxfordPets/by-species/" # noqa dataset_OxfordPets_by_species = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_OxfordPets_by_species = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------21 PKLot---------------------# -class_name = ('space-empty', 'space-occupied') +class_name = ("space-empty", "space-occupied") metainfo = dict(classes=class_name) -_data_root = data_root + 'PKLot/640/' # noqa +_data_root = data_root + "PKLot/640/" # noqa dataset_PKLot = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_PKLot = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_PKLot = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------22 Packages---------------------# -class_name = ('package', ) -metainfo = dict(classes=class_name) -_data_root = data_root + 'Packages/Raw/' -caption_prompt = { - 'package': { - 'prefix': 'there is a ', - 'suffix': ' on the porch' - } -} +class_name = ("package",) +metainfo = dict(classes=class_name) +_data_root = data_root + "Packages/Raw/" +caption_prompt = {"package": {"prefix": "there is a ", "suffix": " on the porch"}} dataset_Packages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=base_test_pipeline, caption_prompt=caption_prompt, test_mode=True, - return_classes=True) -val_evaluator_Packages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Packages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------23 PascalVOC---------------------# -class_name = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', - 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', - 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', - 'tvmonitor') -metainfo = dict(classes=class_name) -_data_root = data_root + 'PascalVOC/' +class_name = ( + 
"aeroplane", + "bicycle", + "bird", + "boat", + "bottle", + "bus", + "car", + "cat", + "chair", + "cow", + "diningtable", + "dog", + "horse", + "motorbike", + "person", + "pottedplant", + "sheep", + "sofa", + "train", + "tvmonitor", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "PascalVOC/" dataset_PascalVOC = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_PascalVOC = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_PascalVOC = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------24 pistols---------------------# -class_name = ('pistol', ) +class_name = ("pistol",) metainfo = dict(classes=class_name) -_data_root = data_root + 'pistols/export/' +_data_root = data_root + "pistols/export/" dataset_pistols = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pistols = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pistols = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------25 plantdoc---------------------# -class_name = ('Apple Scab Leaf', 'Apple leaf', 'Apple rust leaf', - 'Bell_pepper leaf', 'Bell_pepper leaf spot', 'Blueberry leaf', - 'Cherry leaf', 'Corn Gray leaf spot', 'Corn leaf blight', - 'Corn rust leaf', 'Peach leaf', 'Potato leaf', - 'Potato leaf early blight', 'Potato leaf late blight', - 'Raspberry leaf', 'Soyabean leaf', 'Soybean leaf', - 'Squash Powdery mildew leaf', 'Strawberry leaf', - 'Tomato Early blight leaf', 'Tomato Septoria leaf spot', - 'Tomato leaf', 'Tomato leaf bacterial spot', - 'Tomato leaf late blight', 'Tomato leaf mosaic virus', - 'Tomato leaf yellow virus', 'Tomato mold leaf', - 'Tomato two spotted spider mites leaf', 'grape leaf', - 'grape leaf black rot') -metainfo = dict(classes=class_name) -_data_root = data_root + 'plantdoc/416x416/' +class_name = ( + "Apple Scab Leaf", + "Apple leaf", + "Apple rust leaf", + "Bell_pepper leaf", + "Bell_pepper leaf spot", + "Blueberry leaf", + "Cherry leaf", + "Corn Gray leaf spot", + "Corn leaf blight", + "Corn rust leaf", + "Peach leaf", + "Potato leaf", + "Potato leaf early blight", + "Potato leaf late blight", + "Raspberry leaf", + "Soyabean leaf", + "Soybean leaf", + "Squash Powdery mildew leaf", + "Strawberry leaf", + "Tomato Early blight leaf", + "Tomato Septoria leaf spot", + "Tomato leaf", + "Tomato leaf bacterial spot", + "Tomato leaf late blight", + "Tomato leaf mosaic virus", + "Tomato leaf yellow virus", + "Tomato mold leaf", + "Tomato two spotted spider mites leaf", + "grape leaf", + "grape leaf black rot", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "plantdoc/416x416/" dataset_plantdoc = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - 
ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_plantdoc = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_plantdoc = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------26 pothole---------------------# -class_name = ('pothole', ) -metainfo = dict(classes=class_name) -_data_root = data_root + 'pothole/' -caption_prompt = { - 'pothole': { - 'name': 'holes', - 'prefix': 'there are some ', - 'suffix': ' on the road' - } -} +class_name = ("pothole",) +metainfo = dict(classes=class_name) +_data_root = data_root + "pothole/" +caption_prompt = {"pothole": {"name": "holes", "prefix": "there are some ", "suffix": " on the road"}} dataset_pothole = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), caption_prompt=caption_prompt, pipeline=base_test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_pothole = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_pothole = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------27 Raccoon---------------------# -class_name = ('raccoon', ) +class_name = ("raccoon",) metainfo = dict(classes=class_name) -_data_root = data_root + 'Raccoon/Raccoon.v2-raw.coco/' +_data_root = data_root + "Raccoon/Raccoon.v2-raw.coco/" dataset_Raccoon = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_Raccoon = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_Raccoon = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------28 selfdrivingCar---------------------# -class_name = ('biker', 'car', 'pedestrian', 'trafficLight', - 'trafficLight-Green', 'trafficLight-GreenLeft', - 'trafficLight-Red', 'trafficLight-RedLeft', - 'trafficLight-Yellow', 'trafficLight-YellowLeft', 'truck') -metainfo = dict(classes=class_name) -_data_root = data_root + 'selfdrivingCar/fixedLarge/export/' +class_name = ( + "biker", + "car", + "pedestrian", + "trafficLight", + "trafficLight-Green", + "trafficLight-GreenLeft", + "trafficLight-Red", + "trafficLight-RedLeft", + "trafficLight-Yellow", + "trafficLight-YellowLeft", + "truck", +) +metainfo = dict(classes=class_name) +_data_root = data_root + "selfdrivingCar/fixedLarge/export/" dataset_selfdrivingCar = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='val_annotations_without_background.json', - data_prefix=dict(img=''), + ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""), 
+ ann_file="val_annotations_without_background.json", + data_prefix=dict(img=""),
pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_selfdrivingCar = dict( - type='CocoMetric', - ann_file=_data_root + 'val_annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_selfdrivingCar = dict(type="CocoMetric", ann_file=_data_root + "val_annotations_without_background.json", metric="bbox") # ---------------------29 ShellfishOpenImages---------------------# -class_name = ('Crab', 'Lobster', 'Shrimp') +class_name = ("Crab", "Lobster", "Shrimp") metainfo = dict(classes=class_name) -_data_root = data_root + 'ShellfishOpenImages/raw/' +_data_root = data_root + "ShellfishOpenImages/raw/" dataset_ShellfishOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_ShellfishOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------30 ThermalCheetah---------------------# -class_name = ('cheetah', 'human') +class_name = ("cheetah", "human") metainfo = dict(classes=class_name) -_data_root = data_root + 'ThermalCheetah/' +_data_root = data_root + "ThermalCheetah/" dataset_ThermalCheetah = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_ThermalCheetah = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_ThermalCheetah = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------31 thermalDogsAndPeople---------------------# -class_name = ('dog', 'person') +class_name = ("dog", "person") metainfo = dict(classes=class_name) -_data_root = data_root + 'thermalDogsAndPeople/' +_data_root = data_root + "thermalDogsAndPeople/" dataset_thermalDogsAndPeople = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) + return_classes=True, +) val_evaluator_thermalDogsAndPeople = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox" +) # ---------------------32 UnoCards---------------------# -class_name = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', - '12', '13', '14') +class_name = ("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14") metainfo = dict(classes=class_name) -_data_root = data_root + 'UnoCards/raw/' +_data_root = data_root + "UnoCards/raw/" dataset_UnoCards = dict( type=dataset_type, metainfo=metainfo, 
data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_UnoCards = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_UnoCards = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------33 VehiclesOpenImages---------------------# -class_name = ('Ambulance', 'Bus', 'Car', 'Motorcycle', 'Truck') +class_name = ("Ambulance", "Bus", "Car", "Motorcycle", "Truck") metainfo = dict(classes=class_name) -_data_root = data_root + 'VehiclesOpenImages/416x416/' +_data_root = data_root + "VehiclesOpenImages/416x416/" dataset_VehiclesOpenImages = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_VehiclesOpenImages = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_VehiclesOpenImages = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------34 WildfireSmoke---------------------# -class_name = ('smoke', ) +class_name = ("smoke",) metainfo = dict(classes=class_name) -_data_root = data_root + 'WildfireSmoke/' +_data_root = data_root + "WildfireSmoke/" dataset_WildfireSmoke = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_WildfireSmoke = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_WildfireSmoke = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # ---------------------35 websiteScreenshots---------------------# -class_name = ('button', 'field', 'heading', 'iframe', 'image', 'label', 'link', - 'text') +class_name = ("button", "field", "heading", "iframe", "image", "label", "link", "text") metainfo = dict(classes=class_name) -_data_root = data_root + 'websiteScreenshots/' +_data_root = data_root + "websiteScreenshots/" dataset_websiteScreenshots = dict( type=dataset_type, metainfo=metainfo, data_root=_data_root, - ann_file='valid/annotations_without_background.json', - data_prefix=dict(img='valid/'), + ann_file="valid/annotations_without_background.json", + data_prefix=dict(img="valid/"), pipeline=_base_.test_pipeline, test_mode=True, - return_classes=True) -val_evaluator_websiteScreenshots = dict( - type='CocoMetric', - ann_file=_data_root + 'valid/annotations_without_background.json', - metric='bbox') + return_classes=True, +) +val_evaluator_websiteScreenshots = dict(type="CocoMetric", ann_file=_data_root + "valid/annotations_without_background.json", metric="bbox") # --------------------- Config---------------------# 
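+# NOTE: the three lists below (dataset_prefixes, datasets, metrics) are
+# parallel and must stay index-aligned -- MultiDatasetsEvaluator pairs
+# metrics[i] with the i-th dataset of the ConcatDataset and reports its
+# results under the name dataset_prefixes[i].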
dataset_prefixes = [ - 'AerialMaritimeDrone_large', - 'AerialMaritimeDrone_tiled', - 'AmericanSignLanguageLetters', - 'Aquarium', - 'BCCD', - 'boggleBoards', - 'brackishUnderwater', - 'ChessPieces', - 'CottontailRabbits', - 'dice', - 'DroneControl', - 'EgoHands_generic', - 'EgoHands_specific', - 'HardHatWorkers', - 'MaskWearing', - 'MountainDewCommercial', - 'NorthAmericaMushrooms', - 'openPoetryVision', - 'OxfordPets_by_breed', - 'OxfordPets_by_species', - 'PKLot', - 'Packages', - 'PascalVOC', - 'pistols', - 'plantdoc', - 'pothole', - 'Raccoons', - 'selfdrivingCar', - 'ShellfishOpenImages', - 'ThermalCheetah', - 'thermalDogsAndPeople', - 'UnoCards', - 'VehiclesOpenImages', - 'WildfireSmoke', - 'websiteScreenshots', + "AerialMaritimeDrone_large", + "AerialMaritimeDrone_tiled", + "AmericanSignLanguageLetters", + "Aquarium", + "BCCD", + "boggleBoards", + "brackishUnderwater", + "ChessPieces", + "CottontailRabbits", + "dice", + "DroneControl", + "EgoHands_generic", + "EgoHands_specific", + "HardHatWorkers", + "MaskWearing", + "MountainDewCommercial", + "NorthAmericaMushrooms", + "openPoetryVision", + "OxfordPets_by_breed", + "OxfordPets_by_species", + "PKLot", + "Packages", + "PascalVOC", + "pistols", + "plantdoc", + "pothole", + "Raccoons", + "selfdrivingCar", + "ShellfishOpenImages", + "ThermalCheetah", + "thermalDogsAndPeople", + "UnoCards", + "VehiclesOpenImages", + "WildfireSmoke", + "websiteScreenshots", ] datasets = [ - dataset_AerialMaritimeDrone_large, dataset_AerialMaritimeDrone_tiled, - dataset_AmericanSignLanguageLetters, dataset_Aquarium, dataset_BCCD, - dataset_boggleBoards, dataset_brackishUnderwater, dataset_ChessPieces, - dataset_CottontailRabbits, dataset_dice, dataset_DroneControl, - dataset_EgoHands_generic, dataset_EgoHands_specific, - dataset_HardHatWorkers, dataset_MaskWearing, dataset_MountainDewCommercial, - dataset_NorthAmericaMushrooms, dataset_openPoetryVision, - dataset_OxfordPets_by_breed, dataset_OxfordPets_by_species, dataset_PKLot, - dataset_Packages, dataset_PascalVOC, dataset_pistols, dataset_plantdoc, - dataset_pothole, dataset_Raccoon, dataset_selfdrivingCar, - dataset_ShellfishOpenImages, dataset_ThermalCheetah, - dataset_thermalDogsAndPeople, dataset_UnoCards, dataset_VehiclesOpenImages, - dataset_WildfireSmoke, dataset_websiteScreenshots + dataset_AerialMaritimeDrone_large, + dataset_AerialMaritimeDrone_tiled, + dataset_AmericanSignLanguageLetters, + dataset_Aquarium, + dataset_BCCD, + dataset_boggleBoards, + dataset_brackishUnderwater, + dataset_ChessPieces, + dataset_CottontailRabbits, + dataset_dice, + dataset_DroneControl, + dataset_EgoHands_generic, + dataset_EgoHands_specific, + dataset_HardHatWorkers, + dataset_MaskWearing, + dataset_MountainDewCommercial, + dataset_NorthAmericaMushrooms, + dataset_openPoetryVision, + dataset_OxfordPets_by_breed, + dataset_OxfordPets_by_species, + dataset_PKLot, + dataset_Packages, + dataset_PascalVOC, + dataset_pistols, + dataset_plantdoc, + dataset_pothole, + dataset_Raccoon, + dataset_selfdrivingCar, + dataset_ShellfishOpenImages, + dataset_ThermalCheetah, + dataset_thermalDogsAndPeople, + dataset_UnoCards, + dataset_VehiclesOpenImages, + dataset_WildfireSmoke, + dataset_websiteScreenshots, ] metrics = [ val_evaluator_AerialMaritimeDrone_large, val_evaluator_AerialMaritimeDrone_tiled, - val_evaluator_AmericanSignLanguageLetters, val_evaluator_Aquarium, - val_evaluator_BCCD, val_evaluator_boggleBoards, - val_evaluator_brackishUnderwater, val_evaluator_ChessPieces, - val_evaluator_CottontailRabbits, 
val_evaluator_dice, - val_evaluator_DroneControl, val_evaluator_EgoHands_generic, - val_evaluator_EgoHands_specific, val_evaluator_HardHatWorkers, - val_evaluator_MaskWearing, val_evaluator_MountainDewCommercial, - val_evaluator_NorthAmericaMushrooms, val_evaluator_openPoetryVision, - val_evaluator_OxfordPets_by_breed, val_evaluator_OxfordPets_by_species, - val_evaluator_PKLot, val_evaluator_Packages, val_evaluator_PascalVOC, - val_evaluator_pistols, val_evaluator_plantdoc, val_evaluator_pothole, - val_evaluator_Raccoon, val_evaluator_selfdrivingCar, - val_evaluator_ShellfishOpenImages, val_evaluator_ThermalCheetah, - val_evaluator_thermalDogsAndPeople, val_evaluator_UnoCards, - val_evaluator_VehiclesOpenImages, val_evaluator_WildfireSmoke, - val_evaluator_websiteScreenshots + val_evaluator_AmericanSignLanguageLetters, + val_evaluator_Aquarium, + val_evaluator_BCCD, + val_evaluator_boggleBoards, + val_evaluator_brackishUnderwater, + val_evaluator_ChessPieces, + val_evaluator_CottontailRabbits, + val_evaluator_dice, + val_evaluator_DroneControl, + val_evaluator_EgoHands_generic, + val_evaluator_EgoHands_specific, + val_evaluator_HardHatWorkers, + val_evaluator_MaskWearing, + val_evaluator_MountainDewCommercial, + val_evaluator_NorthAmericaMushrooms, + val_evaluator_openPoetryVision, + val_evaluator_OxfordPets_by_breed, + val_evaluator_OxfordPets_by_species, + val_evaluator_PKLot, + val_evaluator_Packages, + val_evaluator_PascalVOC, + val_evaluator_pistols, + val_evaluator_plantdoc, + val_evaluator_pothole, + val_evaluator_Raccoon, + val_evaluator_selfdrivingCar, + val_evaluator_ShellfishOpenImages, + val_evaluator_ThermalCheetah, + val_evaluator_thermalDogsAndPeople, + val_evaluator_UnoCards, + val_evaluator_VehiclesOpenImages, + val_evaluator_WildfireSmoke, + val_evaluator_websiteScreenshots, ] # -------------------------------------------------# -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/mm_grounding_dino/odinw/override_category.py b/mmpose/configs/mmdet/mm_grounding_dino/odinw/override_category.py index 9ff05fc6e5e4d0989cf7fcf7af4dc902ee99f3a3..aeadada4e6f6c8f154ca6c413e573793b2189e48 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/odinw/override_category.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/odinw/override_category.py @@ -5,105 +5,52 @@ import mmengine def parse_args(): - parser = argparse.ArgumentParser(description='Override Category') - parser.add_argument('data_root') + parser = argparse.ArgumentParser(description="Override Category") + parser.add_argument("data_root") return parser.parse_args() def main(): args = parse_args() - ChessPieces = [{ - 'id': 1, - 'name': ' ', - 'supercategory': 'pieces' - }, { - 'id': 2, - 'name': 'black bishop', - 'supercategory': 'pieces' - }, { - 'id': 3, - 'name': 'black king', - 'supercategory': 'pieces' - }, { - 'id': 4, - 'name': 'black knight', - 'supercategory': 'pieces' - }, { - 'id': 5, - 'name': 'black pawn', - 'supercategory': 'pieces' - }, { - 'id': 6, - 'name': 'black queen', - 'supercategory': 'pieces' - }, { - 'id': 7, - 'name': 'black 
rook', - 'supercategory': 'pieces' - }, { - 'id': 8, - 'name': 'white bishop', - 'supercategory': 'pieces' - }, { - 'id': 9, - 'name': 'white king', - 'supercategory': 'pieces' - }, { - 'id': 10, - 'name': 'white knight', - 'supercategory': 'pieces' - }, { - 'id': 11, - 'name': 'white pawn', - 'supercategory': 'pieces' - }, { - 'id': 12, - 'name': 'white queen', - 'supercategory': 'pieces' - }, { - 'id': 13, - 'name': 'white rook', - 'supercategory': 'pieces' - }] - - _data_root = args.data_root + 'ChessPieces/Chess Pieces.v23-raw.coco/' - json_data = mmengine.load(_data_root + - 'valid/annotations_without_background.json') - json_data['categories'] = ChessPieces - mmengine.dump(json_data, - _data_root + 'valid/new_annotations_without_background.json') - - CottontailRabbits = [{ - 'id': 1, - 'name': 'rabbit', - 'supercategory': 'Cottontail-Rabbit' - }] - - _data_root = args.data_root + 'CottontailRabbits/' - json_data = mmengine.load(_data_root + - 'valid/annotations_without_background.json') - json_data['categories'] = CottontailRabbits - mmengine.dump(json_data, - _data_root + 'valid/new_annotations_without_background.json') - - NorthAmericaMushrooms = [{ - 'id': 1, - 'name': 'flat mushroom', - 'supercategory': 'mushroom' - }, { - 'id': 2, - 'name': 'yellow mushroom', - 'supercategory': 'mushroom' - }] - - _data_root = args.data_root + 'NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/' # noqa - json_data = mmengine.load(_data_root + - 'valid/annotations_without_background.json') - json_data['categories'] = NorthAmericaMushrooms - mmengine.dump(json_data, - _data_root + 'valid/new_annotations_without_background.json') - - -if __name__ == '__main__': + ChessPieces = [ + {"id": 1, "name": " ", "supercategory": "pieces"}, + {"id": 2, "name": "black bishop", "supercategory": "pieces"}, + {"id": 3, "name": "black king", "supercategory": "pieces"}, + {"id": 4, "name": "black knight", "supercategory": "pieces"}, + {"id": 5, "name": "black pawn", "supercategory": "pieces"}, + {"id": 6, "name": "black queen", "supercategory": "pieces"}, + {"id": 7, "name": "black rook", "supercategory": "pieces"}, + {"id": 8, "name": "white bishop", "supercategory": "pieces"}, + {"id": 9, "name": "white king", "supercategory": "pieces"}, + {"id": 10, "name": "white knight", "supercategory": "pieces"}, + {"id": 11, "name": "white pawn", "supercategory": "pieces"}, + {"id": 12, "name": "white queen", "supercategory": "pieces"}, + {"id": 13, "name": "white rook", "supercategory": "pieces"}, + ] + + _data_root = args.data_root + "ChessPieces/Chess Pieces.v23-raw.coco/" + json_data = mmengine.load(_data_root + "valid/annotations_without_background.json") + json_data["categories"] = ChessPieces + mmengine.dump(json_data, _data_root + "valid/new_annotations_without_background.json") + + CottontailRabbits = [{"id": 1, "name": "rabbit", "supercategory": "Cottontail-Rabbit"}] + + _data_root = args.data_root + "CottontailRabbits/" + json_data = mmengine.load(_data_root + "valid/annotations_without_background.json") + json_data["categories"] = CottontailRabbits + mmengine.dump(json_data, _data_root + "valid/new_annotations_without_background.json") + + NorthAmericaMushrooms = [ + {"id": 1, "name": "flat mushroom", "supercategory": "mushroom"}, + {"id": 2, "name": "yellow mushroom", "supercategory": "mushroom"}, + ] + + _data_root = args.data_root + "NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/" # noqa + json_data = mmengine.load(_data_root + "valid/annotations_without_background.json") + 
json_data["categories"] = NorthAmericaMushrooms + mmengine.dump(json_data, _data_root + "valid/new_annotations_without_background.json") + + +if __name__ == "__main__": main() diff --git a/mmpose/configs/mmdet/mm_grounding_dino/people_in_painting/grounding_dino_swin-t_finetune_8xb4_50e_people_in_painting.py b/mmpose/configs/mmdet/mm_grounding_dino/people_in_painting/grounding_dino_swin-t_finetune_8xb4_50e_people_in_painting.py index 449d8682f896c3857e6a50b16a13b43acc77ebc2..738bc628e5da1250e0175d469a794dc3536be792 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/people_in_painting/grounding_dino_swin-t_finetune_8xb4_50e_people_in_painting.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/people_in_painting/grounding_dino_swin-t_finetune_8xb4_50e_people_in_painting.py @@ -1,109 +1,115 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" # https://universe.roboflow.com/roboflow-100/people-in-paintings/dataset/2 -data_root = 'data/people_in_painting_v2/' -class_name = ('Human', ) +data_root = "data/people_in_painting_v2/" +class_name = ("Human",) palette = [(220, 20, 60)] metainfo = dict(classes=class_name, palette=palette) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] train_dataloader = dict( - sampler=dict(_delete_=True, type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(_delete_=True, type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( _delete_=True, - type='RepeatDataset', + 
type="RepeatDataset", times=10, dataset=dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, metainfo=metainfo, filter_cfg=dict(filter_empty_gt=False, min_size=32), pipeline=train_pipeline, return_classes=True, - data_prefix=dict(img='train/'), - ann_file='train/_annotations.coco.json'))) + data_prefix=dict(img="train/"), + ann_file="train/_annotations.coco.json", + ), + ), +) val_dataloader = dict( dataset=dict( - metainfo=metainfo, - data_root=data_root, - return_classes=True, - ann_file='valid/_annotations.coco.json', - data_prefix=dict(img='valid/'))) + metainfo=metainfo, data_root=data_root, return_classes=True, ann_file="valid/_annotations.coco.json", data_prefix=dict(img="valid/") + ) +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'valid/_annotations.coco.json', - metric='bbox', - format_only=False) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "valid/_annotations.coco.json", metric="bbox", format_only=False) test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1) - })) + paramwise_cfg=dict(custom_keys={"absolute_pos_embed": dict(decay_mult=0.0), "backbone": dict(lr_mult=0.1)}), +) # learning policy max_epochs = 5 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[4], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[4], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best='auto')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best="auto")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_grefcoco.py b/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_grefcoco.py index 983ffe5c6f3f6e59cf1616a0b22c17f065e08437..1a12f5b23ba6a5f92f3a05ba11761df5846cf3d0 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_grefcoco.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_grefcoco.py @@ -1,170 +1,174 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/coco/' +data_root = "data/coco/" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), # change this - dict(type='RandomFlip', prob=0.0), + dict(type="RandomFlip", prob=0.0), dict( - type='RandomChoice', + 
type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict( - type='RandomSamplingNegPos', - tokenizer_name=_base_.lang_model_name, - num_sample_negative=85, - max_tokens=256), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, max_tokens=256), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='ODVGDataset', + type="ODVGDataset", data_root=data_root, - ann_file='mdetr_annotations/finetune_grefcoco_train_vg.json', - data_prefix=dict(img='train2014/'), + ann_file="mdetr_annotations/finetune_grefcoco_train_vg.json", + data_prefix=dict(img="train2014/"), filter_cfg=dict(filter_empty_gt=False, min_size=32), return_classes=True, - pipeline=train_pipeline)) + pipeline=train_pipeline, + ) +) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_grefcoco_val.json' +ann_file = "mdetr_annotations/finetune_grefcoco_val.json" val_dataset_all_val = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) + backend_args=None, +) val_evaluator_all_val = dict( - type='gRefCOCOMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - thresh_score=0.7, - thresh_f1=1.0) + type="gRefCOCOMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, thresh_score=0.7, thresh_f1=1.0 +) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_grefcoco_testA.json' 
+ann_file = "mdetr_annotations/finetune_grefcoco_testA.json" val_dataset_refcoco_testA = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) + backend_args=None, +) val_evaluator_refcoco_testA = dict( - type='gRefCOCOMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - thresh_score=0.7, - thresh_f1=1.0) + type="gRefCOCOMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, thresh_score=0.7, thresh_f1=1.0 +) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_grefcoco_testB.json' +ann_file = "mdetr_annotations/finetune_grefcoco_testB.json" val_dataset_refcoco_testB = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) + backend_args=None, +) val_evaluator_refcoco_testB = dict( - type='gRefCOCOMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - thresh_score=0.7, - thresh_f1=1.0) + type="gRefCOCOMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, thresh_score=0.7, thresh_f1=1.0 +) # -------------------------------------------------# -datasets = [ - val_dataset_all_val, val_dataset_refcoco_testA, val_dataset_refcoco_testB -] -dataset_prefixes = ['grefcoco_val', 'grefcoco_testA', 'grefcoco_testB'] -metrics = [ - val_evaluator_all_val, val_evaluator_refcoco_testA, - val_evaluator_refcoco_testB -] +datasets = [val_dataset_all_val, val_dataset_refcoco_testA, val_dataset_refcoco_testB] +dataset_prefixes = ["grefcoco_val", "grefcoco_testA", "grefcoco_testB"] +metrics = [val_evaluator_all_val, val_evaluator_refcoco_testA, val_evaluator_refcoco_testB] -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), # 'language_model': dict(lr_mult=0), - })) + } + ), +) # learning policy max_epochs = 5 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[3], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[3], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -load_from = 
'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcoco.py b/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcoco.py index d91af473a239f2f48a09a272d926e00c52da987b..98bd7b202ddd9e9e35f7f0e23782802ebf8e8dbc 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcoco.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcoco.py @@ -1,167 +1,168 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/coco/' +data_root = "data/coco/" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), # change this - dict(type='RandomFlip', prob=0.0), + dict(type="RandomFlip", prob=0.0), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict( - type='RandomSamplingNegPos', - tokenizer_name=_base_.lang_model_name, - num_sample_negative=85, - max_tokens=256), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, max_tokens=256), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + 
"ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='ODVGDataset', + type="ODVGDataset", data_root=data_root, - ann_file='mdetr_annotations/finetune_refcoco_train_vg.json', - data_prefix=dict(img='train2014/'), + ann_file="mdetr_annotations/finetune_refcoco_train_vg.json", + data_prefix=dict(img="train2014/"), filter_cfg=dict(filter_empty_gt=False, min_size=32), return_classes=True, - pipeline=train_pipeline)) + pipeline=train_pipeline, + ) +) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco_val.json' +ann_file = "mdetr_annotations/finetune_refcoco_val.json" val_dataset_all_val = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) -val_evaluator_all_val = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) + backend_args=None, +) +val_evaluator_all_val = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco_testA.json' +ann_file = "mdetr_annotations/finetune_refcoco_testA.json" val_dataset_refcoco_testA = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_testA = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_testA = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco_testB.json' +ann_file = "mdetr_annotations/finetune_refcoco_testB.json" val_dataset_refcoco_testB = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_testB = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_testB = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -datasets = [ - val_dataset_all_val, val_dataset_refcoco_testA, val_dataset_refcoco_testB -] -dataset_prefixes = ['refcoco_val', 'refcoco_testA', 'refcoco_testB'] -metrics = [ - val_evaluator_all_val, val_evaluator_refcoco_testA, - val_evaluator_refcoco_testB -] +datasets = [val_dataset_all_val, val_dataset_refcoco_testA, val_dataset_refcoco_testB] +dataset_prefixes = ["refcoco_val", "refcoco_testA", "refcoco_testB"] +metrics = [val_evaluator_all_val, val_evaluator_refcoco_testA, val_evaluator_refcoco_testB] -val_dataloader = dict( - 
dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), # 'language_model': dict(lr_mult=0), - })) + } + ), +) # learning policy max_epochs = 5 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[3], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[3], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcoco_plus.py b/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcoco_plus.py index 871adc8efb48532fb5e0fbfa07e6019c37911712..0a00db793bbe1ebc64ce0fbfa0b545f3a30d77b7 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcoco_plus.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcoco_plus.py @@ -1,167 +1,168 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/coco/' +data_root = "data/coco/" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), # change this - dict(type='RandomFlip', prob=0.0), + dict(type="RandomFlip", prob=0.0), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + 
keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict( - type='RandomSamplingNegPos', - tokenizer_name=_base_.lang_model_name, - num_sample_negative=85, - max_tokens=256), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, max_tokens=256), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='ODVGDataset', + type="ODVGDataset", data_root=data_root, - ann_file='mdetr_annotations/finetune_refcoco+_train_vg.json', - data_prefix=dict(img='train2014/'), + ann_file="mdetr_annotations/finetune_refcoco+_train_vg.json", + data_prefix=dict(img="train2014/"), filter_cfg=dict(filter_empty_gt=False, min_size=32), return_classes=True, - pipeline=train_pipeline)) + pipeline=train_pipeline, + ) +) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco+_val.json' +ann_file = "mdetr_annotations/finetune_refcoco+_val.json" val_dataset_all_val = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) -val_evaluator_all_val = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) + backend_args=None, +) +val_evaluator_all_val = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco+_testA.json' +ann_file = "mdetr_annotations/finetune_refcoco+_testA.json" val_dataset_refcoco_testA = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_testA = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_testA = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # 
-------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco+_testB.json' +ann_file = "mdetr_annotations/finetune_refcoco+_testB.json" val_dataset_refcoco_testB = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_testB = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_testB = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -datasets = [ - val_dataset_all_val, val_dataset_refcoco_testA, val_dataset_refcoco_testB -] -dataset_prefixes = ['refcoco+_val', 'refcoco+_testA', 'refcoco+_testB'] -metrics = [ - val_evaluator_all_val, val_evaluator_refcoco_testA, - val_evaluator_refcoco_testB -] +datasets = [val_dataset_all_val, val_dataset_refcoco_testA, val_dataset_refcoco_testB] +dataset_prefixes = ["refcoco+_val", "refcoco+_testA", "refcoco+_testB"] +metrics = [val_evaluator_all_val, val_evaluator_refcoco_testA, val_evaluator_refcoco_testB] -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), # 'language_model': dict(lr_mult=0), - })) + } + ), +) # learning policy max_epochs = 5 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[3], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[3], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcocog.py b/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcocog.py index a351d6f9d123fc8f2000990a5e6d02adbb3eb2fa..c541ebe587dc0c044a045a4346d00cda2fb78a62 100644 --- 
a/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcocog.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_finetune_8xb4_5e_refcocog.py @@ -1,145 +1,155 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/coco/' +data_root = "data/coco/" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), # change this - dict(type='RandomFlip', prob=0.0), + dict(type="RandomFlip", prob=0.0), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict( - type='RandomSamplingNegPos', - tokenizer_name=_base_.lang_model_name, - num_sample_negative=85, - max_tokens=256), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomSamplingNegPos", tokenizer_name=_base_.lang_model_name, num_sample_negative=85, max_tokens=256), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities', 'tokens_positive', 'dataset_mode')) + type="PackDetInputs", + meta_keys=( + "img_id", + "img_path", + "ori_shape", + "img_shape", + "scale_factor", + "flip", + "flip_direction", + "text", + "custom_entities", + "tokens_positive", + "dataset_mode", + ), + ), ] train_dataloader = dict( dataset=dict( _delete_=True, - type='ODVGDataset', + type="ODVGDataset", data_root=data_root, - ann_file='mdetr_annotations/finetune_refcocog_train_vg.json', - data_prefix=dict(img='train2014/'), + ann_file="mdetr_annotations/finetune_refcocog_train_vg.json", + data_prefix=dict(img="train2014/"), filter_cfg=dict(filter_empty_gt=False, min_size=32), return_classes=True, - pipeline=train_pipeline)) + pipeline=train_pipeline, + ) +) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcocog_val.json' +ann_file = 
"mdetr_annotations/finetune_refcocog_val.json" val_dataset_all_val = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) -val_evaluator_all_val = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) + backend_args=None, +) +val_evaluator_all_val = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcocog_test.json' +ann_file = "mdetr_annotations/finetune_refcocog_test.json" val_dataset_refcoco_test = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=_base_.test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_test = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_test = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# datasets = [val_dataset_all_val, val_dataset_refcoco_test] -dataset_prefixes = ['refcocog_val', 'refcocog_test'] +dataset_prefixes = ["refcocog_val", "refcocog_test"] metrics = [val_evaluator_all_val, val_evaluator_refcoco_test] -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0002, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0002, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1), + "absolute_pos_embed": dict(decay_mult=0.0), + "backbone": dict(lr_mult=0.1), # 'language_model': dict(lr_mult=0), - })) + } + ), +) # learning policy max_epochs = 5 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[3], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[3], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best='auto')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best="auto")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = 
"https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_pretrain_zeroshot_refexp.py b/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_pretrain_zeroshot_refexp.py index 437d71c6b357eda85d13b5efd4c81d4d32f91120..2998820750261989769238161ea9f5efc5a7f880 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_pretrain_zeroshot_refexp.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/refcoco/grounding_dino_swin-t_pretrain_zeroshot_refexp.py @@ -1,228 +1,198 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" # 30 is an empirical value, just set it to the maximum value # without affecting the evaluation result model = dict(test_cfg=dict(max_per_img=30)) -data_root = 'data/coco/' +data_root = "data/coco/" test_pipeline = [ + dict(type="LoadImageFromFile", backend_args=None, imdecode_backend="pillow"), + dict(type="FixScaleResize", scale=(800, 1333), keep_ratio=True, backend="pillow"), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='LoadImageFromFile', backend_args=None, - imdecode_backend='pillow'), - dict( - type='FixScaleResize', - scale=(800, 1333), - keep_ratio=True, - backend='pillow'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'text', 'custom_entities', - 'tokens_positive')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "text", "custom_entities", "tokens_positive"), + ), ] # -------------------------------------------------# -ann_file = 'mdetr_annotations/final_refexp_val.json' +ann_file = "mdetr_annotations/final_refexp_val.json" val_dataset_all_val = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) -val_evaluator_all_val = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) + backend_args=None, +) +val_evaluator_all_val = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco_testA.json' +ann_file = "mdetr_annotations/finetune_refcoco_testA.json" val_dataset_refcoco_testA = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_testA = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_testA = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco_testB.json' +ann_file = "mdetr_annotations/finetune_refcoco_testB.json" 
val_dataset_refcoco_testB = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_testB = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_testB = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco+_testA.json' +ann_file = "mdetr_annotations/finetune_refcoco+_testA.json" val_dataset_refcoco_plus_testA = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_plus_testA = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_plus_testA = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcoco+_testB.json' +ann_file = "mdetr_annotations/finetune_refcoco+_testB.json" val_dataset_refcoco_plus_testB = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcoco_plus_testB = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcoco_plus_testB = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_refcocog_test.json' +ann_file = "mdetr_annotations/finetune_refcocog_test.json" val_dataset_refcocog_test = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) -val_evaluator_refcocog_test = dict( - type='RefExpMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - topk=(1, 5, 10)) +val_evaluator_refcocog_test = dict(type="RefExpMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, topk=(1, 5, 10)) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_grefcoco_val.json' +ann_file = "mdetr_annotations/finetune_grefcoco_val.json" val_dataset_grefcoco_val = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) val_evaluator_grefcoco_val = dict( - type='gRefCOCOMetric', - ann_file=data_root + 
ann_file, - metric='bbox', - iou_thrs=0.5, - thresh_score=0.7, - thresh_f1=1.0) + type="gRefCOCOMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, thresh_score=0.7, thresh_f1=1.0 +) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_grefcoco_testA.json' +ann_file = "mdetr_annotations/finetune_grefcoco_testA.json" val_dataset_grefcoco_testA = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) val_evaluator_grefcoco_testA = dict( - type='gRefCOCOMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - thresh_score=0.7, - thresh_f1=1.0) + type="gRefCOCOMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, thresh_score=0.7, thresh_f1=1.0 +) # -------------------------------------------------# -ann_file = 'mdetr_annotations/finetune_grefcoco_testB.json' +ann_file = "mdetr_annotations/finetune_grefcoco_testB.json" val_dataset_grefcoco_testB = dict( - type='MDETRStyleRefCocoDataset', + type="MDETRStyleRefCocoDataset", data_root=data_root, ann_file=ann_file, - data_prefix=dict(img='train2014/'), + data_prefix=dict(img="train2014/"), test_mode=True, return_classes=True, pipeline=test_pipeline, - backend_args=None) + backend_args=None, +) val_evaluator_grefcoco_testB = dict( - type='gRefCOCOMetric', - ann_file=data_root + ann_file, - metric='bbox', - iou_thrs=0.5, - thresh_score=0.7, - thresh_f1=1.0) + type="gRefCOCOMetric", ann_file=data_root + ann_file, metric="bbox", iou_thrs=0.5, thresh_score=0.7, thresh_f1=1.0 +) # -------------------------------------------------# datasets = [ - val_dataset_all_val, val_dataset_refcoco_testA, val_dataset_refcoco_testB, - val_dataset_refcoco_plus_testA, val_dataset_refcoco_plus_testB, - val_dataset_refcocog_test, val_dataset_grefcoco_val, - val_dataset_grefcoco_testA, val_dataset_grefcoco_testB + val_dataset_all_val, + val_dataset_refcoco_testA, + val_dataset_refcoco_testB, + val_dataset_refcoco_plus_testA, + val_dataset_refcoco_plus_testB, + val_dataset_refcocog_test, + val_dataset_grefcoco_val, + val_dataset_grefcoco_testA, + val_dataset_grefcoco_testB, ] dataset_prefixes = [ - 'val', 'refcoco_testA', 'refcoco_testB', 'refcoco+_testA', - 'refcoco+_testB', 'refcocog_test', 'grefcoco_val', 'grefcoco_testA', - 'grefcoco_testB' + "val", + "refcoco_testA", + "refcoco_testB", + "refcoco+_testA", + "refcoco+_testB", + "refcocog_test", + "grefcoco_val", + "grefcoco_testA", + "grefcoco_testB", ] metrics = [ - val_evaluator_all_val, val_evaluator_refcoco_testA, - val_evaluator_refcoco_testB, val_evaluator_refcoco_plus_testA, - val_evaluator_refcoco_plus_testB, val_evaluator_refcocog_test, - val_evaluator_grefcoco_val, val_evaluator_grefcoco_testA, - val_evaluator_grefcoco_testB + val_evaluator_all_val, + val_evaluator_refcoco_testA, + val_evaluator_refcoco_testB, + val_evaluator_refcoco_plus_testA, + val_evaluator_refcoco_plus_testB, + val_evaluator_refcocog_test, + val_evaluator_grefcoco_val, + val_evaluator_grefcoco_testA, + val_evaluator_grefcoco_testB, ] -val_dataloader = dict( - dataset=dict(_delete_=True, type='ConcatDataset', datasets=datasets)) +val_dataloader = dict(dataset=dict(_delete_=True, type="ConcatDataset", datasets=datasets)) test_dataloader = val_dataloader -val_evaluator = dict( - _delete_=True, - 
type='MultiDatasetsEvaluator', - metrics=metrics, - dataset_prefixes=dataset_prefixes) +val_evaluator = dict(_delete_=True, type="MultiDatasetsEvaluator", metrics=metrics, dataset_prefixes=dataset_prefixes) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/mm_grounding_dino/rtts/grounding_dino_swin-t_finetune_8xb4_1x_rtts.py b/mmpose/configs/mmdet/mm_grounding_dino/rtts/grounding_dino_swin-t_finetune_8xb4_1x_rtts.py index 95c2be058e2c407fc92de93f4b79ec8b36e25c18..5d8bdd1fe59a45f49fddfe05fd6a88003e573c85 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/rtts/grounding_dino_swin-t_finetune_8xb4_1x_rtts.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/rtts/grounding_dino_swin-t_finetune_8xb4_1x_rtts.py @@ -1,106 +1,110 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/RTTS/' -class_name = ('bicycle', 'bus', 'car', 'motorbike', 'person') -palette = [(255, 97, 0), (0, 201, 87), (176, 23, 31), (138, 43, 226), - (30, 144, 255)] +data_root = "data/RTTS/" +class_name = ("bicycle", "bus", "car", "motorbike", "person") +palette = [(255, 97, 0), (0, 201, 87), (176, 23, 31), (138, 43, 226), (30, 144, 255)] metainfo = dict(classes=class_name, palette=palette) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] train_dataloader = dict( - sampler=dict(_delete_=True, type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(_delete_=True, type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), 
dataset=dict( _delete_=True, - type='CocoDataset', + type="CocoDataset", data_root=data_root, metainfo=metainfo, filter_cfg=dict(filter_empty_gt=False, min_size=32), pipeline=train_pipeline, return_classes=True, - ann_file='annotations_json/rtts_train.json', - data_prefix=dict(img=''))) + ann_file="annotations_json/rtts_train.json", + data_prefix=dict(img=""), + ), +) val_dataloader = dict( dataset=dict( - metainfo=metainfo, - data_root=data_root, - return_classes=True, - ann_file='annotations_json/rtts_val.json', - data_prefix=dict(img=''))) + metainfo=metainfo, data_root=data_root, return_classes=True, ann_file="annotations_json/rtts_val.json", data_prefix=dict(img="") + ) +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations_json/rtts_val.json', - metric='bbox', - format_only=False) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations_json/rtts_val.json", metric="bbox", format_only=False) test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1) - })) + paramwise_cfg=dict(custom_keys={"absolute_pos_embed": dict(decay_mult=0.0), "backbone": dict(lr_mult=0.1)}), +) # learning policy max_epochs = 12 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[11], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best='auto')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best="auto")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/mm_grounding_dino/ruod/grounding_dino_swin-t_finetune_8xb4_1x_ruod.py b/mmpose/configs/mmdet/mm_grounding_dino/ruod/grounding_dino_swin-t_finetune_8xb4_1x_ruod.py index f57682b29d970fb6d46c2f459f773b03e803695d..b2a2cc2427e7883b87ae571bfeaf51565a625e83 100644 --- a/mmpose/configs/mmdet/mm_grounding_dino/ruod/grounding_dino_swin-t_finetune_8xb4_1x_ruod.py +++ b/mmpose/configs/mmdet/mm_grounding_dino/ruod/grounding_dino_swin-t_finetune_8xb4_1x_ruod.py @@ -1,108 +1,125 @@ -_base_ = '../grounding_dino_swin-t_pretrain_obj365.py' +_base_ = "../grounding_dino_swin-t_pretrain_obj365.py" -data_root = 'data/RUOD/' -class_name = ('holothurian', 'echinus', 'scallop', 'starfish', 'fish', - 'corals', 'diver', 'cuttlefish', 'turtle', 'jellyfish') -palette = [(235, 211, 70), (106, 90, 205), (160, 32, 240), (176, 23, 31), - (142, 0, 0), (230, 0, 0), (106, 0, 228), (60, 100, 0), (80, 100, 0), - (70, 0, 0)] +data_root = "data/RUOD/" +class_name = ("holothurian", "echinus", "scallop", "starfish", "fish", "corals", "diver", "cuttlefish", "turtle", "jellyfish") +palette = 
[ + (235, 211, 70), + (106, 90, 205), + (160, 32, 240), + (176, 23, 31), + (142, 0, 0), + (230, 0, 0), + (106, 0, 228), + (60, 100, 0), + (80, 100, 0), + (70, 0, 0), +] metainfo = dict(classes=class_name, palette=palette) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction', 'text', - 'custom_entities')) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction", "text", "custom_entities"), + ), ] train_dataloader = dict( - sampler=dict(_delete_=True, type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(_delete_=True, type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( _delete_=True, - type='CocoDataset', + type="CocoDataset", data_root=data_root, metainfo=metainfo, filter_cfg=dict(filter_empty_gt=False, min_size=32), pipeline=train_pipeline, return_classes=True, - ann_file='RUOD_ANN/instances_train.json', - data_prefix=dict(img='RUOD_pic/train/'))) + ann_file="RUOD_ANN/instances_train.json", + data_prefix=dict(img="RUOD_pic/train/"), + ), +) val_dataloader = dict( dataset=dict( metainfo=metainfo, data_root=data_root, return_classes=True, - ann_file='RUOD_ANN/instances_test.json', - data_prefix=dict(img='RUOD_pic/test/'))) + ann_file="RUOD_ANN/instances_test.json", + data_prefix=dict(img="RUOD_pic/test/"), + ) +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'RUOD_ANN/instances_test.json', - metric='bbox', - format_only=False) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "RUOD_ANN/instances_test.json", metric="bbox", format_only=False) test_evaluator = val_evaluator optim_wrapper = dict( _delete_=True, - 
type='OptimWrapper', - optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0001), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=0.0001, weight_decay=0.0001), clip_grad=dict(max_norm=0.1, norm_type=2), - paramwise_cfg=dict(custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'backbone': dict(lr_mult=0.1) - })) + paramwise_cfg=dict(custom_keys={"absolute_pos_embed": dict(decay_mult=0.0), "backbone": dict(lr_mult=0.1)}), +) # learning policy max_epochs = 12 -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[11], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[11], gamma=0.1)] train_cfg = dict(max_epochs=max_epochs, val_interval=1) -default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best='auto')) +default_hooks = dict(checkpoint=dict(max_keep_ckpts=1, save_best="auto")) -load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth' # noqa +load_from = "https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth" # noqa diff --git a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r101-caffe_fpn_1x_coco.py index 2ff4f2d66ae6de88ba9d5d8fb5cf31abaa4cb3c5..9bccd23dffd11ed067aafa6e203a125291808bf9 100644 --- a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r101-caffe_fpn_1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './ms-rcnn_r50-caffe_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./ms-rcnn_r50-caffe_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r101-caffe_fpn_2x_coco.py b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r101-caffe_fpn_2x_coco.py index 54b29e4f7aea547e2b26782b71ada8053930d325..b2b997e6c1daa68fcd49cbbd34d5a1a53b88fe4e 100644 --- a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r101-caffe_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r101-caffe_fpn_2x_coco.py @@ -1,17 +1,9 @@ -_base_ = './ms-rcnn_r101-caffe_fpn_1x_coco.py' +_base_ = "./ms-rcnn_r101-caffe_fpn_1x_coco.py" # learning policy max_epochs = 24 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50-caffe_fpn_1x_coco.py index e7fbc51f1ba431ca7c22ff3d2c74cfc9e1263ffb..7dbe8ecc409b6e9f00f32f1673c8fe79361bf6a2 100644 --- a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50-caffe_fpn_1x_coco.py +++ 
b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50-caffe_fpn_1x_coco.py @@ -1,16 +1,19 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50-caffe_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50-caffe_fpn_1x_coco.py" model = dict( - type='MaskScoringRCNN', + type="MaskScoringRCNN", roi_head=dict( - type='MaskScoringRoIHead', + type="MaskScoringRoIHead", mask_iou_head=dict( - type='MaskIoUHead', + type="MaskIoUHead", num_convs=4, num_fcs=2, roi_feat_size=14, in_channels=256, conv_out_channels=256, fc_out_channels=1024, - num_classes=80)), + num_classes=80, + ), + ), # model training and testing settings - train_cfg=dict(rcnn=dict(mask_thr_binary=0.5))) + train_cfg=dict(rcnn=dict(mask_thr_binary=0.5)), +) diff --git a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50-caffe_fpn_2x_coco.py b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50-caffe_fpn_2x_coco.py index 033488229220e5b044c30c43f5e72f8468f68224..f2b64180792a59907e61553f2a9111025e7b0776 100644 --- a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50-caffe_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50-caffe_fpn_2x_coco.py @@ -1,17 +1,9 @@ -_base_ = './ms-rcnn_r50-caffe_fpn_1x_coco.py' +_base_ = "./ms-rcnn_r50-caffe_fpn_1x_coco.py" # learning policy max_epochs = 24 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50_fpn_1x_coco.py index 0ae47d1c38daa4430de4b4264bbb2aef0eb7f7ea..cd10d4649bf032314dbc9afe64d532ff7716a566 100644 --- a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_r50_fpn_1x_coco.py @@ -1,16 +1,19 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" model = dict( - type='MaskScoringRCNN', + type="MaskScoringRCNN", roi_head=dict( - type='MaskScoringRoIHead', + type="MaskScoringRoIHead", mask_iou_head=dict( - type='MaskIoUHead', + type="MaskIoUHead", num_convs=4, num_fcs=2, roi_feat_size=14, in_channels=256, conv_out_channels=256, fc_out_channels=1024, - num_classes=80)), + num_classes=80, + ), + ), # model training and testing settings - train_cfg=dict(rcnn=dict(mask_thr_binary=0.5))) + train_cfg=dict(rcnn=dict(mask_thr_binary=0.5)), +) diff --git a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-32x4d_fpn_1x_coco.py index 1a5d0d0f3188e8e661cc9ab7a731fc631dd950ac..4696ae0a5bf8434d0f05fc3632561e407921c177 100644 --- a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-32x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-32x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './ms-rcnn_r50_fpn_1x_coco.py' +_base_ = "./ms-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + 
norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-64x4d_fpn_1x_coco.py index 16290076c07d7a97108b89e4a41b5ff51cbbcdc1..c904ea8d78ad7978331d8c5649fcfe5c0ad46fb9 100644 --- a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-64x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './ms-rcnn_r50_fpn_1x_coco.py' +_base_ = "./ms-rcnn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-64x4d_fpn_2x_coco.py b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-64x4d_fpn_2x_coco.py index 7aec1874394692a63dc8caeef2609cf01b7bfd7c..8cb93d92e0893b0b365ff1bfdafa1410766d4a10 100644 --- a/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-64x4d_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/ms_rcnn/ms-rcnn_x101-64x4d_fpn_2x_coco.py @@ -1,17 +1,9 @@ -_base_ = './ms-rcnn_x101-64x4d_fpn_1x_coco.py' +_base_ = "./ms-rcnn_x101-64x4d_fpn_1x_coco.py" # learning policy max_epochs = 24 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/nas_fcos/nas-fcos_r50-caffe_fpn_fcoshead-gn-head_4xb4-1x_coco.py b/mmpose/configs/mmdet/nas_fcos/nas-fcos_r50-caffe_fpn_fcoshead-gn-head_4xb4-1x_coco.py index ba207c9fbdddc5cd30e4d4d86add2c98664e7ffb..1b9dc4923f06685179b72198e1e5018a3877c480 100644 --- a/mmpose/configs/mmdet/nas_fcos/nas-fcos_r50-caffe_fpn_fcoshead-gn-head_4xb4-1x_coco.py +++ b/mmpose/configs/mmdet/nas_fcos/nas-fcos_r50-caffe_fpn_fcoshead-gn-head_4xb4-1x_coco.py @@ -1,75 +1,54 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings model = dict( - type='NASFCOS', + type="NASFCOS", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False, eps=0), - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), + 
norm_cfg=dict(type="BN", requires_grad=False, eps=0), + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), neck=dict( - type='NASFCOS_FPN', + type="NASFCOS_FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs=True, num_outs=5, - norm_cfg=dict(type='BN'), - conv_cfg=dict(type='DCNv2', deform_groups=2)), + norm_cfg=dict(type="BN"), + conv_cfg=dict(type="DCNv2", deform_groups=2), + ), bbox_head=dict( - type='FCOSHead', + type="FCOSHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, strides=[8, 16, 32, 64, 128], - norm_cfg=dict(type='GN', num_groups=32), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='IoULoss', loss_weight=1.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + norm_cfg=dict(type="GN", num_groups=32), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", loss_weight=1.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), train_cfg=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0, ignore_iof_thr=-1), allowed_border=-1, pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # dataset settings train_dataloader = dict(batch_size=4, num_workers=2) # optimizer -optim_wrapper = dict( - optimizer=dict(lr=0.01), - paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.)) +optim_wrapper = dict(optimizer=dict(lr=0.01), paramwise_cfg=dict(bias_lr_mult=2.0, bias_decay_mult=0.0)) diff --git a/mmpose/configs/mmdet/nas_fcos/nas-fcos_r50-caffe_fpn_nashead-gn-head_4xb4-1x_coco.py b/mmpose/configs/mmdet/nas_fcos/nas-fcos_r50-caffe_fpn_nashead-gn-head_4xb4-1x_coco.py index 329f34c45ca0ea3f95e8da8505717df86b7c79c0..d94b7ad628a5ed14cf153dedec790757cac48815 100644 --- a/mmpose/configs/mmdet/nas_fcos/nas-fcos_r50-caffe_fpn_nashead-gn-head_4xb4-1x_coco.py +++ b/mmpose/configs/mmdet/nas_fcos/nas-fcos_r50-caffe_fpn_nashead-gn-head_4xb4-1x_coco.py @@ -1,74 +1,53 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings model = dict( - type='NASFCOS', + type="NASFCOS", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False, eps=0), - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), + norm_cfg=dict(type="BN", requires_grad=False, eps=0), + style="caffe", + init_cfg=dict(type="Pretrained", 
checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), neck=dict( - type='NASFCOS_FPN', + type="NASFCOS_FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs=True, num_outs=5, - norm_cfg=dict(type='BN'), - conv_cfg=dict(type='DCNv2', deform_groups=2)), + norm_cfg=dict(type="BN"), + conv_cfg=dict(type="DCNv2", deform_groups=2), + ), bbox_head=dict( - type='NASFCOSHead', + type="NASFCOSHead", num_classes=80, in_channels=256, feat_channels=256, strides=[8, 16, 32, 64, 128], - norm_cfg=dict(type='GN', num_groups=32), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='IoULoss', loss_weight=1.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + norm_cfg=dict(type="GN", num_groups=32), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", loss_weight=1.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), train_cfg=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0, ignore_iof_thr=-1), allowed_border=-1, pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # dataset settings train_dataloader = dict(batch_size=4, num_workers=2) # optimizer -optim_wrapper = dict( - optimizer=dict(lr=0.01), - paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.)) +optim_wrapper = dict(optimizer=dict(lr=0.01), paramwise_cfg=dict(bias_lr_mult=2.0, bias_decay_mult=0.0)) diff --git a/mmpose/configs/mmdet/nas_fpn/retinanet_r50_fpn_crop640-50e_coco.py b/mmpose/configs/mmdet/nas_fpn/retinanet_r50_fpn_crop640-50e_coco.py index 11c34f6758a4862571e3f840424341c3964115be..16c0e5b0e79bbb20327e8e7b932f3f1056467f64 100644 --- a/mmpose/configs/mmdet/nas_fpn/retinanet_r50_fpn_crop640-50e_coco.py +++ b/mmpose/configs/mmdet/nas_fpn/retinanet_r50_fpn_crop640-50e_coco.py @@ -1,51 +1,43 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -norm_cfg = dict(type='BN', requires_grad=True) +norm_cfg = dict(type="BN", requires_grad=True) model = dict( data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=64, - batch_augments=[dict(type='BatchFixedSizePad', size=(640, 640))]), + batch_augments=[dict(type="BatchFixedSizePad", size=(640, 640))], + ), backbone=dict(norm_eval=False), - neck=dict( - relu_before_extra_convs=True, - no_norm_on_lateral=True, - norm_cfg=norm_cfg), - bbox_head=dict(type='RetinaSepBNHead', num_ins=5, norm_cfg=norm_cfg), + neck=dict(relu_before_extra_convs=True, no_norm_on_lateral=True, norm_cfg=norm_cfg), + bbox_head=dict(type="RetinaSepBNHead", num_ins=5, norm_cfg=norm_cfg), # training and testing settings - 
train_cfg=dict(assigner=dict(neg_iou_thr=0.5))) + train_cfg=dict(assigner=dict(neg_iou_thr=0.5)), +) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=(640, 640), - ratio_range=(0.8, 1.2), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(640, 640)), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=(640, 640), ratio_range=(0.8, 1.2), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640)), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(640, 640), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(640, 640), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=8, num_workers=4, dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=8, num_workers=4, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader @@ -55,20 +47,14 @@ train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[30, 40], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[30, 40], gamma=0.1), ] # optimizer optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.08, momentum=0.9, weight_decay=0.0001), - paramwise_cfg=dict(norm_decay_mult=0, bypass_duplicate=True)) + optimizer=dict(type="SGD", lr=0.08, momentum=0.9, weight_decay=0.0001), paramwise_cfg=dict(norm_decay_mult=0, bypass_duplicate=True) +) env_cfg = dict(cudnn_benchmark=True) diff --git a/mmpose/configs/mmdet/nas_fpn/retinanet_r50_nasfpn_crop640-50e_coco.py b/mmpose/configs/mmdet/nas_fpn/retinanet_r50_nasfpn_crop640-50e_coco.py index a851b745defb72aa05df289a3002c1534655d118..5c89a2ab15acf09d19f7d084d944b8afb2d38a85 100644 --- a/mmpose/configs/mmdet/nas_fpn/retinanet_r50_nasfpn_crop640-50e_coco.py +++ b/mmpose/configs/mmdet/nas_fpn/retinanet_r50_nasfpn_crop640-50e_coco.py @@ -1,4 +1,4 @@ -_base_ = './retinanet_r50_fpn_crop640-50e_coco.py' +_base_ = "./retinanet_r50_fpn_crop640-50e_coco.py" # model settings model = dict( @@ -7,10 +7,12 @@ model = dict( data_preprocessor=dict(pad_size_divisor=128), neck=dict( _delete_=True, - type='NASFPN', + type="NASFPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5, stack_times=7, start_level=1, - norm_cfg=dict(type='BN', requires_grad=True))) + norm_cfg=dict(type="BN", requires_grad=True), + ), +) diff --git a/mmpose/configs/mmdet/objects365/faster-rcnn_r50-syncbn_fpn_1350k_objects365v1.py 
b/mmpose/configs/mmdet/objects365/faster-rcnn_r50-syncbn_fpn_1350k_objects365v1.py index ff7d0a360b95b1a72f779a8f7ad22a7e03235720..c4855ae39dd12859407373ed573bfcc2c1632a40 100644 --- a/mmpose/configs/mmdet/objects365/faster-rcnn_r50-syncbn_fpn_1350k_objects365v1.py +++ b/mmpose/configs/mmdet/objects365/faster-rcnn_r50-syncbn_fpn_1350k_objects365v1.py @@ -1,44 +1,27 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/objects365v2_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/objects365v2_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -model = dict( - backbone=dict(norm_cfg=dict(type='SyncBN', requires_grad=True)), - roi_head=dict(bbox_head=dict(num_classes=365))) +model = dict(backbone=dict(norm_cfg=dict(type="SyncBN", requires_grad=True)), roi_head=dict(bbox_head=dict(num_classes=365))) # training schedule for 1350K -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=1350000, # 36 epochs - val_interval=150000) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=1350000, val_interval=150000) # 36 epochs # Using 8 GPUS while training optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001), clip_grad=dict(max_norm=35, norm_type=2) +) # learning rate policy param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 1000, - by_epoch=False, - begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=1350000, - by_epoch=False, - milestones=[900000, 1200000], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 1000, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=1350000, by_epoch=False, milestones=[900000, 1200000], gamma=0.1), ] -train_dataloader = dict(sampler=dict(type='InfiniteSampler')) +train_dataloader = dict(sampler=dict(type="InfiniteSampler")) default_hooks = dict(checkpoint=dict(by_epoch=False, interval=150000)) log_processor = dict(by_epoch=False) diff --git a/mmpose/configs/mmdet/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v1.py b/mmpose/configs/mmdet/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v1.py index bc0d96fa22920a34f9ab9437a0f15cc93f46d0fa..bcc9c2225ca24bd1f63f332125af1289cd1f654e 100644 --- a/mmpose/configs/mmdet/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v1.py +++ b/mmpose/configs/mmdet/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v1.py @@ -1,7 +1,8 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/objects365v1_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/objects365v1_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict(roi_head=dict(bbox_head=dict(num_classes=365))) @@ -12,25 +13,13 @@ train_dataloader = dict( # Using 32 GPUS while training optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.08, momentum=0.9, weight_decay=0.0001), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.08, momentum=0.9, weight_decay=0.0001), clip_grad=dict(max_norm=35, norm_type=2) +) # learning rate param_scheduler = [ - dict( - 
type='LinearLR', - start_factor=1.0 / 1000, - by_epoch=False, - begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 1000, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v2.py b/mmpose/configs/mmdet/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v2.py index 1090678f652444c82a627fbf8bdda39fe0077f1e..e26c58a9aa833256178edf2dc9cf7039986f0609 100644 --- a/mmpose/configs/mmdet/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v2.py +++ b/mmpose/configs/mmdet/objects365/faster-rcnn_r50_fpn_16xb4-1x_objects365v2.py @@ -1,7 +1,8 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/objects365v2_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/objects365v2_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict(roi_head=dict(bbox_head=dict(num_classes=365))) @@ -12,25 +13,13 @@ train_dataloader = dict( # Using 32 GPUS while training optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.08, momentum=0.9, weight_decay=0.0001), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.08, momentum=0.9, weight_decay=0.0001), clip_grad=dict(max_norm=35, norm_type=2) +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 1000, - by_epoch=False, - begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 1000, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/objects365/retinanet_r50-syncbn_fpn_1350k_objects365v1.py b/mmpose/configs/mmdet/objects365/retinanet_r50-syncbn_fpn_1350k_objects365v1.py index c41dfce8bc67e7f4d18434a2c10a33c66da403c1..984a395adefc42c13c0debe65e18f47c133fab23 100644 --- a/mmpose/configs/mmdet/objects365/retinanet_r50-syncbn_fpn_1350k_objects365v1.py +++ b/mmpose/configs/mmdet/objects365/retinanet_r50-syncbn_fpn_1350k_objects365v1.py @@ -1,44 +1,27 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/objects365v2_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/objects365v2_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -model = dict( - backbone=dict(norm_cfg=dict(type='SyncBN', requires_grad=True)), - bbox_head=dict(num_classes=365)) +model = dict(backbone=dict(norm_cfg=dict(type="SyncBN", requires_grad=True)), bbox_head=dict(num_classes=365)) # training schedule for 1350K -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=1350000, # 36 epochs - val_interval=150000) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=1350000, val_interval=150000) # 36 epochs # Using 8 GPUS while training optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, 
weight_decay=0.0001), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001), clip_grad=dict(max_norm=35, norm_type=2) +) # learning rate policy param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 1000, - by_epoch=False, - begin=0, - end=10000), - dict( - type='MultiStepLR', - begin=0, - end=1350000, - by_epoch=False, - milestones=[900000, 1200000], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 1000, by_epoch=False, begin=0, end=10000), + dict(type="MultiStepLR", begin=0, end=1350000, by_epoch=False, milestones=[900000, 1200000], gamma=0.1), ] -train_dataloader = dict(sampler=dict(type='InfiniteSampler')) +train_dataloader = dict(sampler=dict(type="InfiniteSampler")) default_hooks = dict(checkpoint=dict(by_epoch=False, interval=150000)) log_processor = dict(by_epoch=False) diff --git a/mmpose/configs/mmdet/objects365/retinanet_r50_fpn_1x_objects365v1.py b/mmpose/configs/mmdet/objects365/retinanet_r50_fpn_1x_objects365v1.py index 72144192aaa36d757053a982ed7ad2a886916b75..8a0eddc86fe90f29453bf8cb61afaa5f80d9bf42 100644 --- a/mmpose/configs/mmdet/objects365/retinanet_r50_fpn_1x_objects365v1.py +++ b/mmpose/configs/mmdet/objects365/retinanet_r50_fpn_1x_objects365v1.py @@ -1,32 +1,21 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/objects365v1_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/objects365v1_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict(bbox_head=dict(num_classes=365)) # Using 8 GPUS while training optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001), clip_grad=dict(max_norm=35, norm_type=2) +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 1000, - by_epoch=False, - begin=0, - end=10000), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 1000, by_epoch=False, begin=0, end=10000), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/objects365/retinanet_r50_fpn_1x_objects365v2.py b/mmpose/configs/mmdet/objects365/retinanet_r50_fpn_1x_objects365v2.py index 219544126ab0ab6e93d50f1962ffaf40f25b14f0..85565db9bb35a3b0c764dd034df8f05ff3a01e82 100644 --- a/mmpose/configs/mmdet/objects365/retinanet_r50_fpn_1x_objects365v2.py +++ b/mmpose/configs/mmdet/objects365/retinanet_r50_fpn_1x_objects365v2.py @@ -1,32 +1,21 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/objects365v2_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/objects365v2_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict(bbox_head=dict(num_classes=365)) # Using 8 GPUS while training optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", 
lr=0.01, momentum=0.9, weight_decay=0.0001), clip_grad=dict(max_norm=35, norm_type=2) +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 1000, - by_epoch=False, - begin=0, - end=10000), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 1000, by_epoch=False, begin=0, end=10000), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/ocsort/ocsort_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py b/mmpose/configs/mmdet/ocsort/ocsort_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py index ea04923d6aec237c51b7e23d0348c487cb9d697b..669ef8a9aa6af8d3e16f345cb7cbf6c72b93bea9 100644 --- a/mmpose/configs/mmdet/ocsort/ocsort_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py +++ b/mmpose/configs/mmdet/ocsort/ocsort_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py @@ -1,13 +1,13 @@ _base_ = [ - '../bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py', # noqa: E501 + "../bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py", # noqa: E501 ] model = dict( - type='OCSORT', + type="OCSORT", tracker=dict( _delete_=True, - type='OCSORTTracker', - motion=dict(type='KalmanFilter'), + type="OCSORTTracker", + motion=dict(type="KalmanFilter"), obj_score_thr=0.3, init_track_thr=0.7, weight_iou_with_det_scores=True, @@ -15,4 +15,6 @@ model = dict( num_tentatives=3, vel_consist_weight=0.2, vel_delta_t=3, - num_frames_retain=30)) + num_frames_retain=30, + ), +) diff --git a/mmpose/configs/mmdet/ocsort/ocsort_yolox_x_8xb4-amp-80e_crowdhuman-mot20train_test-mot20test.py b/mmpose/configs/mmdet/ocsort/ocsort_yolox_x_8xb4-amp-80e_crowdhuman-mot20train_test-mot20test.py index ea04923d6aec237c51b7e23d0348c487cb9d697b..669ef8a9aa6af8d3e16f345cb7cbf6c72b93bea9 100644 --- a/mmpose/configs/mmdet/ocsort/ocsort_yolox_x_8xb4-amp-80e_crowdhuman-mot20train_test-mot20test.py +++ b/mmpose/configs/mmdet/ocsort/ocsort_yolox_x_8xb4-amp-80e_crowdhuman-mot20train_test-mot20test.py @@ -1,13 +1,13 @@ _base_ = [ - '../bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py', # noqa: E501 + "../bytetrack/bytetrack_yolox_x_8xb4-amp-80e_crowdhuman-mot17halftrain_test-mot17halfval.py", # noqa: E501 ] model = dict( - type='OCSORT', + type="OCSORT", tracker=dict( _delete_=True, - type='OCSORTTracker', - motion=dict(type='KalmanFilter'), + type="OCSORTTracker", + motion=dict(type="KalmanFilter"), obj_score_thr=0.3, init_track_thr=0.7, weight_iou_with_det_scores=True, @@ -15,4 +15,6 @@ model = dict( num_tentatives=3, vel_consist_weight=0.2, vel_delta_t=3, - num_frames_retain=30)) + num_frames_retain=30, + ), +) diff --git a/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-1x_openimages-challenge.py b/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-1x_openimages-challenge.py index e79a92cccb2e432e5dd60bc080dab76781eb32bc..01a1157b4b63de2cd0a23a143d3e0e878c1f097e 100644 --- a/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-1x_openimages-challenge.py +++ b/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-1x_openimages-challenge.py @@ -1,37 +1,39 @@ -_base_ = ['faster-rcnn_r50_fpn_32xb2-1x_openimages.py'] +_base_ = ["faster-rcnn_r50_fpn_32xb2-1x_openimages.py"] -model = dict( - 
roi_head=dict(bbox_head=dict(num_classes=500)), - test_cfg=dict(rcnn=dict(score_thr=0.01))) +model = dict(roi_head=dict(bbox_head=dict(num_classes=500)), test_cfg=dict(rcnn=dict(score_thr=0.01))) # dataset settings -dataset_type = 'OpenImagesChallengeDataset' +dataset_type = "OpenImagesChallengeDataset" train_dataloader = dict( dataset=dict( type=dataset_type, - ann_file='challenge2019/challenge-2019-train-detection-bbox.txt', - label_file='challenge2019/cls-label-description.csv', - hierarchy_file='challenge2019/class_label_tree.np', - meta_file='challenge2019/challenge-2019-train-metas.pkl')) + ann_file="challenge2019/challenge-2019-train-detection-bbox.txt", + label_file="challenge2019/cls-label-description.csv", + hierarchy_file="challenge2019/class_label_tree.np", + meta_file="challenge2019/challenge-2019-train-metas.pkl", + ) +) val_dataloader = dict( dataset=dict( type=dataset_type, - ann_file='challenge2019/challenge-2019-validation-detection-bbox.txt', - data_prefix=dict(img='OpenImages/'), - label_file='challenge2019/cls-label-description.csv', - hierarchy_file='challenge2019/class_label_tree.np', - meta_file='challenge2019/challenge-2019-validation-metas.pkl', - image_level_ann_file='challenge2019/challenge-2019-validation-' - 'detection-human-imagelabels.csv')) + ann_file="challenge2019/challenge-2019-validation-detection-bbox.txt", + data_prefix=dict(img="OpenImages/"), + label_file="challenge2019/cls-label-description.csv", + hierarchy_file="challenge2019/class_label_tree.np", + meta_file="challenge2019/challenge-2019-validation-metas.pkl", + image_level_ann_file="challenge2019/challenge-2019-validation-detection-human-imagelabels.csv", + ) +) test_dataloader = dict( dataset=dict( type=dataset_type, - ann_file='challenge2019/challenge-2019-validation-detection-bbox.txt', - label_file='challenge2019/cls-label-description.csv', - hierarchy_file='challenge2019/class_label_tree.np', - meta_file='challenge2019/challenge-2019-validation-metas.pkl', - image_level_ann_file='challenge2019/challenge-2019-validation-' - 'detection-human-imagelabels.csv')) + ann_file="challenge2019/challenge-2019-validation-detection-bbox.txt", + label_file="challenge2019/cls-label-description.csv", + hierarchy_file="challenge2019/class_label_tree.np", + meta_file="challenge2019/challenge-2019-validation-metas.pkl", + image_level_ann_file="challenge2019/challenge-2019-validation-detection-human-imagelabels.csv", + ) +) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
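The overrides running through these configs rely on MMEngine-style `_base_` inheritance: a child config is merged recursively into the file(s) named in `_base_`, and `_delete_=True` tells the merger to drop the inherited node and take the child's dict wholesale, as in the `sampler=dict(_delete_=True, type="DefaultSampler", shuffle=True)` and `train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", ...)` overrides above. Below is a minimal sketch of those merge semantics on plain dicts; `merge_cfg` and the toy base values are illustrative assumptions, not MMEngine's actual implementation.

```python
# Illustrative helper (hypothetical, not MMEngine's real merger): child keys
# merge recursively into the inherited base, and `_delete_=True` replaces the
# inherited node outright instead of merging into it.
def merge_cfg(base: dict, child: dict) -> dict:
    child = dict(child)  # copy so the caller's config is not mutated
    if child.pop("_delete_", False):
        return child  # discard everything inherited for this node
    merged = dict(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = value
    return merged


# Toy example mirroring the sampler overrides above (base values made up):
base = {"batch_size": 4, "num_workers": 2, "sampler": {"type": "InfiniteSampler"}}
child = {"sampler": {"_delete_": True, "type": "DefaultSampler", "shuffle": True}}
print(merge_cfg(base, child))
# {'batch_size': 4, 'num_workers': 2, 'sampler': {'type': 'DefaultSampler', 'shuffle': True}}
```

Without `_delete_`, the child's `sampler` keys would merge into the inherited sampler node and leave its stale keys behind; the flag is what makes replacements like the ones in these diffs clean.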
diff --git a/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-1x_openimages.py b/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-1x_openimages.py index f3f0aa0a0ff0ef16cd6e55543a72b5fe405ec5a8..1b0d81ff4babb13140234c7a8d52c9aae17564a6 100644 --- a/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-1x_openimages.py +++ b/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-1x_openimages.py @@ -1,32 +1,21 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/openimages_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/openimages_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict(roi_head=dict(bbox_head=dict(num_classes=601))) # Using 32 GPUS while training optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.08, momentum=0.9, weight_decay=0.0001), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.08, momentum=0.9, weight_decay=0.0001), clip_grad=dict(max_norm=35, norm_type=2) +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 64, - by_epoch=False, - begin=0, - end=26000), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 64, by_epoch=False, begin=0, end=26000), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-cas-1x_openimages-challenge.py b/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-cas-1x_openimages-challenge.py index 9e428725bcc39d2c009a2382c191fa53fe5ce284..edd8dd168bb4dac3d1d63ebe70571f837814e29d 100644 --- a/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-cas-1x_openimages-challenge.py +++ b/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-cas-1x_openimages-challenge.py @@ -1,5 +1,4 @@ -_base_ = ['faster-rcnn_r50_fpn_32xb2-1x_openimages-challenge.py'] +_base_ = ["faster-rcnn_r50_fpn_32xb2-1x_openimages-challenge.py"] # Use ClassAwareSampler -train_dataloader = dict( - sampler=dict(_delete_=True, type='ClassAwareSampler', num_sample_class=1)) +train_dataloader = dict(sampler=dict(_delete_=True, type="ClassAwareSampler", num_sample_class=1)) diff --git a/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-cas-1x_openimages.py b/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-cas-1x_openimages.py index 803190abfee63ea87e70dfe1b0fddca02f3556b8..34edacc92569e7824c1d4f4592e5b85271b3aa0f 100644 --- a/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-cas-1x_openimages.py +++ b/mmpose/configs/mmdet/openimages/faster-rcnn_r50_fpn_32xb2-cas-1x_openimages.py @@ -1,5 +1,4 @@ -_base_ = ['faster-rcnn_r50_fpn_32xb2-1x_openimages.py'] +_base_ = ["faster-rcnn_r50_fpn_32xb2-1x_openimages.py"] # Use ClassAwareSampler -train_dataloader = dict( - sampler=dict(_delete_=True, type='ClassAwareSampler', num_sample_class=1)) +train_dataloader = dict(sampler=dict(_delete_=True, type="ClassAwareSampler", num_sample_class=1)) diff --git a/mmpose/configs/mmdet/openimages/retinanet_r50_fpn_32xb2-1x_openimages.py b/mmpose/configs/mmdet/openimages/retinanet_r50_fpn_32xb2-1x_openimages.py index 97a0eb075c730ceeaa494190e0b8369706c7d7c3..d83db30b37d4f740313d6afa2d30ec091f28093f 
100644 --- a/mmpose/configs/mmdet/openimages/retinanet_r50_fpn_32xb2-1x_openimages.py +++ b/mmpose/configs/mmdet/openimages/retinanet_r50_fpn_32xb2-1x_openimages.py @@ -1,33 +1,22 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/openimages_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/openimages_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict(bbox_head=dict(num_classes=601)) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 64, - by_epoch=False, - begin=0, - end=26000), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 64, by_epoch=False, begin=0, end=26000), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.08, momentum=0.9, weight_decay=0.0001), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.08, momentum=0.9, weight_decay=0.0001), clip_grad=dict(max_norm=35, norm_type=2) +) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/openimages/ssd300_32xb8-36e_openimages.py b/mmpose/configs/mmdet/openimages/ssd300_32xb8-36e_openimages.py index 9cb51cae00a8707c0a901b99620851132e9eaccf..3895ae13db8ee8e91ba970a883089f28635947fc 100644 --- a/mmpose/configs/mmdet/openimages/ssd300_32xb8-36e_openimages.py +++ b/mmpose/configs/mmdet/openimages/ssd300_32xb8-36e_openimages.py @@ -1,46 +1,35 @@ _base_ = [ - '../_base_/models/ssd300.py', '../_base_/datasets/openimages_detection.py', - '../_base_/default_runtime.py', '../_base_/schedules/schedule_1x.py' + "../_base_/models/ssd300.py", + "../_base_/datasets/openimages_detection.py", + "../_base_/default_runtime.py", + "../_base_/schedules/schedule_1x.py", ] -model = dict( - bbox_head=dict( - num_classes=601, - anchor_generator=dict(basesize_ratio_range=(0.2, 0.9)))) +model = dict(bbox_head=dict(num_classes=601, anchor_generator=dict(basesize_ratio_range=(0.2, 0.9)))) # dataset settings -dataset_type = 'OpenImagesDataset' -data_root = 'data/OpenImages/' +dataset_type = "OpenImagesDataset" +data_root = "data/OpenImages/" input_size = 300 train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict( - type='Expand', + type="Expand", mean={{_base_.model.data_preprocessor.mean}}, to_rgb={{_base_.model.data_preprocessor.bgr_to_rgb}}, - ratio_range=(1, 4)), - dict( - type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + ratio_range=(1, 4), + ), + dict(type="MinIoURandomCrop", min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3), + dict(type="Resize", 
scale=(input_size, input_size), keep_ratio=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), # avoid bboxes being resized - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'instances')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "instances")), ] train_dataloader = dict( @@ -48,38 +37,29 @@ train_dataloader = dict( batch_sampler=None, dataset=dict( _delete_=True, - type='RepeatDataset', + type="RepeatDataset", times=3, # repeat 3 times, total epochs are 12 x 3 dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/oidv6-train-annotations-bbox.csv', - data_prefix=dict(img='OpenImages/train/'), - label_file='annotations/class-descriptions-boxable.csv', - hierarchy_file='annotations/bbox_labels_600_hierarchy.json', - meta_file='annotations/train-image-metas.pkl', - pipeline=train_pipeline))) + ann_file="annotations/oidv6-train-annotations-bbox.csv", + data_prefix=dict(img="OpenImages/train/"), + label_file="annotations/class-descriptions-boxable.csv", + hierarchy_file="annotations/bbox_labels_600_hierarchy.json", + meta_file="annotations/train-image-metas.pkl", + pipeline=train_pipeline, + ), + ), +) val_dataloader = dict(batch_size=8, dataset=dict(pipeline=test_pipeline)) test_dataloader = dict(batch_size=8, dataset=dict(pipeline=test_pipeline)) # optimizer -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.04, momentum=0.9, weight_decay=5e-4)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.04, momentum=0.9, weight_decay=5e-4)) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=0.001, - by_epoch=False, - begin=0, - end=20000), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=20000), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/paa/paa_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/paa/paa_r101_fpn_1x_coco.py index 94f1c278dc16c1befbca510ca0ac5ba407969f6d..a493666facc06cbdb5009daf45a61f281a404878 100644 --- a/mmpose/configs/mmdet/paa/paa_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/paa/paa_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './paa_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./paa_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/paa/paa_r101_fpn_2x_coco.py b/mmpose/configs/mmdet/paa/paa_r101_fpn_2x_coco.py index c6136f3bb404df6a6fc18536e6770116738af6c7..54e3486fa80c5ee3341feee2e5dbe3da6910e325 100644 --- a/mmpose/configs/mmdet/paa/paa_r101_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/paa/paa_r101_fpn_2x_coco.py @@ -1,17 +1,10 @@ -_base_ = './paa_r101_fpn_1x_coco.py' +_base_ = 
"./paa_r101_fpn_1x_coco.py" max_epochs = 24 # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] # training schedule for 2x diff --git a/mmpose/configs/mmdet/paa/paa_r101_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/paa/paa_r101_fpn_ms-3x_coco.py index 8529dcdb90adb2b02162f4d2268088f5f376fcb0..3d66e449bc61c8d28f2375ccd6166f6f00019f39 100644 --- a/mmpose/configs/mmdet/paa/paa_r101_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/paa/paa_r101_fpn_ms-3x_coco.py @@ -1,6 +1,2 @@ -_base_ = './paa_r50_fpn_ms-3x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./paa_r50_fpn_ms-3x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/paa/paa_r50_fpn_1.5x_coco.py b/mmpose/configs/mmdet/paa/paa_r50_fpn_1.5x_coco.py index ae993b5c4370c8fc3e450f84fb7058528b853727..9f1cdfe555d2cdd55a19adc3cba5561d5fe0342d 100644 --- a/mmpose/configs/mmdet/paa/paa_r50_fpn_1.5x_coco.py +++ b/mmpose/configs/mmdet/paa/paa_r50_fpn_1.5x_coco.py @@ -1,17 +1,10 @@ -_base_ = './paa_r50_fpn_1x_coco.py' +_base_ = "./paa_r50_fpn_1x_coco.py" max_epochs = 18 # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[12, 16], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[12, 16], gamma=0.1), ] # training schedule for 1.5x diff --git a/mmpose/configs/mmdet/paa/paa_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/paa/paa_r50_fpn_1x_coco.py index f806a3ea65ffb9ee8b898122fb678b94ef212637..00b76f3cb8828b624f058c9a10d78150d4755fd3 100644 --- a/mmpose/configs/mmdet/paa/paa_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/paa/paa_r50_fpn_1x_coco.py @@ -1,36 +1,25 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings model = dict( - type='PAA', + type="PAA", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", 
in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), bbox_head=dict( - type='PAAHead', + type="PAAHead", reg_decoded_bbox=True, score_voting=True, topk=9, @@ -38,43 +27,20 @@ model = dict( in_channels=256, stacked_convs=4, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=1.3), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.5)), + anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=1.3), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=0.5), + ), # training and testing settings train_cfg=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.1, - neg_iou_thr=0.1, - min_pos_iou=0, - ignore_iof_thr=-1), + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.1, neg_iou_thr=0.1, min_pos_iou=0, ignore_iof_thr=-1), allowed_border=-1, pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/paa/paa_r50_fpn_2x_coco.py b/mmpose/configs/mmdet/paa/paa_r50_fpn_2x_coco.py index 6908e4eb97fcfa92a20d486ceab9a7ddfaf480b7..b4e546f94ebbdf97a0897d70881ce0dfd6d9bd19 100644 --- a/mmpose/configs/mmdet/paa/paa_r50_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/paa/paa_r50_fpn_2x_coco.py @@ -1,17 +1,10 @@ -_base_ = './paa_r50_fpn_1x_coco.py' +_base_ = "./paa_r50_fpn_1x_coco.py" max_epochs = 24 # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] # training schedule for 2x diff --git a/mmpose/configs/mmdet/paa/paa_r50_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/paa/paa_r50_fpn_ms-3x_coco.py index fed8b90a0fde7a1d344160a6658be04d1f9c654e..9d3f284c5eb14487a6bd48b29233fbf613b79a20 100644 --- a/mmpose/configs/mmdet/paa/paa_r50_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/paa/paa_r50_fpn_ms-3x_coco.py @@ -1,29 +1,20 @@ -_base_ = './paa_r50_fpn_1x_coco.py' +_base_ = "./paa_r50_fpn_1x_coco.py" max_epochs = 36 # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, 
end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[28, 34], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[28, 34], gamma=0.1), ] # training schedule for 3x train_cfg = dict(max_epochs=max_epochs) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', scale=[(1333, 640), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=[(1333, 640), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/pafpn/faster-rcnn_r50_pafpn_1x_coco.py b/mmpose/configs/mmdet/pafpn/faster-rcnn_r50_pafpn_1x_coco.py index 1452baeca7e680b11f9b2ec654abe689d3e53042..429c35f2f240e34c969ab19288341003d1ee83b0 100644 --- a/mmpose/configs/mmdet/pafpn/faster-rcnn_r50_pafpn_1x_coco.py +++ b/mmpose/configs/mmdet/pafpn/faster-rcnn_r50_pafpn_1x_coco.py @@ -1,8 +1,3 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" -model = dict( - neck=dict( - type='PAFPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5)) +model = dict(neck=dict(type="PAFPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5)) diff --git a/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r101_fpn_1x_coco.py index b960254ef5ecfac1de790a66a5378535114e9ba3..006e986172119d18c0f1a7a5a8c76dc4d6a2d669 100644 --- a/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './panoptic-fpn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./panoptic-fpn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r101_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r101_fpn_ms-3x_coco.py index 268782ee2cca31796e43423300319176556cfef7..2176ba17cfc579ce92a02e1e3c427b665ec7a746 100644 --- a/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r101_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r101_fpn_ms-3x_coco.py @@ -1,6 +1,2 @@ -_base_ = './panoptic-fpn_r50_fpn_ms-3x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./panoptic-fpn_r50_fpn_ms-3x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r50_fpn_1x_coco.py index c2c89ef520124a43c910b35a4808153e4c455d3a..42ebbd57616e3d89ec8812b5257d68e54e603ac3 100644 --- a/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r50_fpn_1x_coco.py +++ 
b/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r50_fpn_1x_coco.py @@ -1,13 +1,14 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_panoptic.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_panoptic.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( - type='PanopticFPN', + type="PanopticFPN", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, @@ -15,31 +16,27 @@ model = dict( pad_mask=True, mask_pad_value=0, pad_seg=True, - seg_pad_value=255), + seg_pad_value=255, + ), semantic_head=dict( - type='PanopticFPNHead', + type="PanopticFPNHead", num_things_classes=80, num_stuff_classes=53, in_channels=256, inner_channels=128, start_level=0, end_level=4, - norm_cfg=dict(type='GN', num_groups=32, requires_grad=True), + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), conv_cfg=None, - loss_seg=dict( - type='CrossEntropyLoss', ignore_index=255, loss_weight=0.5)), - panoptic_fusion_head=dict( - type='HeuristicFusionHead', - num_things_classes=80, - num_stuff_classes=53), + loss_seg=dict(type="CrossEntropyLoss", ignore_index=255, loss_weight=0.5), + ), + panoptic_fusion_head=dict(type="HeuristicFusionHead", num_things_classes=80, num_stuff_classes=53), test_cfg=dict( - rcnn=dict( - score_thr=0.6, - nms=dict(type='nms', iou_threshold=0.5, class_agnostic=True), - max_per_img=100, - mask_thr_binary=0.5), + rcnn=dict(score_thr=0.6, nms=dict(type="nms", iou_threshold=0.5, class_agnostic=True), max_per_img=100, mask_thr_binary=0.5), # used in HeuristicFusionHead - panoptic=dict(mask_overlap=0.5, stuff_area_limit=4096))) + panoptic=dict(mask_overlap=0.5, stuff_area_limit=4096), + ), +) # Forced to remove NumClassCheckHook custom_hooks = [] diff --git a/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r50_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r50_fpn_ms-3x_coco.py index b18a8f8dd7eb6c49e277346ffe71c6e36c9d3b68..59868d0e849e0ec43667f575580dd0cd1d0e028d 100644 --- a/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r50_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/panoptic_fpn/panoptic-fpn_r50_fpn_ms-3x_coco.py @@ -1,19 +1,13 @@ -_base_ = './panoptic-fpn_r50_fpn_1x_coco.py' +_base_ = "./panoptic-fpn_r50_fpn_1x_coco.py" # In mstrain 3x config, img_scale=[(1333, 640), (1333, 800)], # multiscale_mode='range' train_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='LoadPanopticAnnotations', - with_bbox=True, - with_mask=True, - with_seg=True), - dict( - type='RandomResize', scale=[(1333, 640), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile"), + dict(type="LoadPanopticAnnotations", with_bbox=True, with_mask=True, with_seg=True), + dict(type="RandomResize", scale=[(1333, 640), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) @@ -23,13 +17,6 @@ train_cfg = dict(max_epochs=36, val_interval=3) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=36, - by_epoch=True, - milestones=[24, 33], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + 
dict(type="MultiStepLR", begin=0, end=36, by_epoch=True, milestones=[24, 33], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50-caffe-c4_ms-18k_voc0712.py b/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50-caffe-c4_ms-18k_voc0712.py index dddc0bbdf33948478e11bb701f844a8473ddf165..faa07a84b1e26441993621f67b5994417aad81df 100644 --- a/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50-caffe-c4_ms-18k_voc0712.py +++ b/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50-caffe-c4_ms-18k_voc0712.py @@ -1,86 +1,86 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50-caffe-c4.py', - '../_base_/schedules/schedule_1x.py', '../_base_/datasets/voc0712.py', - '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50-caffe-c4.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/datasets/voc0712.py", + "../_base_/default_runtime.py", ] model = dict(roi_head=dict(bbox_head=dict(num_classes=20))) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='RandomChoiceResize', - scales=[(1333, 480), (1333, 512), (1333, 544), (1333, 576), - (1333, 608), (1333, 640), (1333, 672), (1333, 704), - (1333, 736), (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (1333, 480), + (1333, 512), + (1333, 544), + (1333, 576), + (1333, 608), + (1333, 640), + (1333, 672), + (1333, 704), + (1333, 736), + (1333, 768), + (1333, 800), + ], + keep_ratio=True, + ), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # avoid bboxes being resized - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( - sampler=dict(type='InfiniteSampler', shuffle=True), + sampler=dict(type="InfiniteSampler", shuffle=True), dataset=dict( _delete_=True, - type='ConcatDataset', + type="ConcatDataset", datasets=[ dict( - type='VOCDataset', + type="VOCDataset", data_root={{_base_.data_root}}, - ann_file='VOC2007/ImageSets/Main/trainval.txt', - data_prefix=dict(sub_data_root='VOC2007/'), + ann_file="VOC2007/ImageSets/Main/trainval.txt", + data_prefix=dict(sub_data_root="VOC2007/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args={{_base_.backend_args}}), + backend_args={{_base_.backend_args}}, + ), dict( - type='VOCDataset', + type="VOCDataset", data_root={{_base_.data_root}}, - ann_file='VOC2012/ImageSets/Main/trainval.txt', - data_prefix=dict(sub_data_root='VOC2012/'), + ann_file="VOC2012/ImageSets/Main/trainval.txt", + data_prefix=dict(sub_data_root="VOC2012/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args={{_base_.backend_args}}) - ])) + backend_args={{_base_.backend_args}}, + ), + ], + ), +) val_dataloader = 
dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader # training schedule for 18k max_iter = 18000 -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=3000) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=3000) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=100), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[12000, 16000], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=100), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[12000, 16000], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)) default_hooks = dict(checkpoint=dict(by_epoch=False, interval=3000)) log_processor = dict(by_epoch=False) diff --git a/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712-cocofmt.py b/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712-cocofmt.py index 0b0aa41d67fc4edfde6d534e2e54a135f5de6e44..af03f3f16debc39161c05dd1ff4811d16e29e359 100644 --- a/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712-cocofmt.py +++ b/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712-cocofmt.py @@ -1,97 +1,117 @@ -_base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', '../_base_/datasets/voc0712.py', - '../_base_/default_runtime.py' -] +_base_ = ["../_base_/models/faster-rcnn_r50_fpn.py", "../_base_/datasets/voc0712.py", "../_base_/default_runtime.py"] model = dict(roi_head=dict(bbox_head=dict(num_classes=20))) METAINFO = { - 'classes': - ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', - 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', - 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'), + "classes": ( + "aeroplane", + "bicycle", + "bird", + "boat", + "bottle", + "bus", + "car", + "cat", + "chair", + "cow", + "diningtable", + "dog", + "horse", + "motorbike", + "person", + "pottedplant", + "sheep", + "sofa", + "train", + "tvmonitor", + ), # palette is a list of color tuples, which is used for visualization. 
- 'palette': [(106, 0, 228), (119, 11, 32), (165, 42, 42), (0, 0, 192), - (197, 226, 255), (0, 60, 100), (0, 0, 142), (255, 77, 255), - (153, 69, 1), (120, 166, 157), (0, 182, 199), (0, 226, 252), - (182, 182, 255), (0, 0, 230), (220, 20, 60), (163, 255, 0), - (0, 82, 0), (3, 95, 161), (0, 80, 100), (183, 130, 88)] + "palette": [ + (106, 0, 228), + (119, 11, 32), + (165, 42, 42), + (0, 0, 192), + (197, 226, 255), + (0, 60, 100), + (0, 0, 142), + (255, 77, 255), + (153, 69, 1), + (120, 166, 157), + (0, 182, 199), + (0, 226, 252), + (182, 182, 255), + (0, 0, 230), + (220, 20, 60), + (163, 255, 0), + (0, 82, 0), + (3, 95, 161), + (0, 80, 100), + (183, 130, 88), + ], } # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/VOCdevkit/' +dataset_type = "CocoDataset" +data_root = "data/VOCdevkit/" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1000, 600), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1000, 600), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(1000, 600), keep_ratio=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(1000, 600), keep_ratio=True), # avoid bboxes being resized - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( dataset=dict( - type='RepeatDataset', + type="RepeatDataset", times=3, dataset=dict( _delete_=True, type=dataset_type, data_root=data_root, - ann_file='annotations/voc0712_trainval.json', - data_prefix=dict(img=''), + ann_file="annotations/voc0712_trainval.json", + data_prefix=dict(img=""), metainfo=METAINFO, filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args={{_base_.backend_args}}))) + backend_args={{_base_.backend_args}}, + ), + ) +) val_dataloader = dict( dataset=dict( - type=dataset_type, - ann_file='annotations/voc07_test.json', - data_prefix=dict(img=''), - metainfo=METAINFO, - pipeline=test_pipeline)) + type=dataset_type, ann_file="annotations/voc07_test.json", data_prefix=dict(img=""), metainfo=METAINFO, pipeline=test_pipeline + ) +) test_dataloader = val_dataloader val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/voc07_test.json', - metric='bbox', + type="CocoMetric", + ann_file=data_root + "annotations/voc07_test.json", + metric="bbox", format_only=False, - backend_args={{_base_.backend_args}}) + backend_args={{_base_.backend_args}}, +) test_evaluator = val_evaluator # training schedule, the dataset is repeated 3 times, so the # actual epoch = 4 * 3 = 12 max_epochs = 4 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # 
learning rate -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[3], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[3], gamma=0.1)] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712.py b/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712.py index 07391667b35c9db9e352a03624411bb568f5396a..5e5c65f67a1be0363c4a883d7dc3cb3d06209d8d 100644 --- a/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712.py +++ b/mmpose/configs/mmdet/pascal_voc/faster-rcnn_r50_fpn_1x_voc0712.py @@ -1,32 +1,18 @@ -_base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', '../_base_/datasets/voc0712.py', - '../_base_/default_runtime.py' -] +_base_ = ["../_base_/models/faster-rcnn_r50_fpn.py", "../_base_/datasets/voc0712.py", "../_base_/default_runtime.py"] model = dict(roi_head=dict(bbox_head=dict(num_classes=20))) # training schedule, voc dataset is repeated 3 times, in # `_base_/datasets/voc0712.py`, so the actual epoch = 4 * 3 = 12 max_epochs = 4 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[3], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[3], gamma=0.1)] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/pascal_voc/retinanet_r50_fpn_1x_voc0712.py b/mmpose/configs/mmdet/pascal_voc/retinanet_r50_fpn_1x_voc0712.py index c86a6f199c9317804692189975f3abaff24f6aff..2e056ec903d058ba6450f2b81b2fe8e963afb775 100644 --- a/mmpose/configs/mmdet/pascal_voc/retinanet_r50_fpn_1x_voc0712.py +++ b/mmpose/configs/mmdet/pascal_voc/retinanet_r50_fpn_1x_voc0712.py @@ -1,31 +1,17 @@ -_base_ = [ - '../_base_/models/retinanet_r50_fpn.py', '../_base_/datasets/voc0712.py', - '../_base_/default_runtime.py' -] +_base_ = ["../_base_/models/retinanet_r50_fpn.py", "../_base_/datasets/voc0712.py", "../_base_/default_runtime.py"] model = dict(bbox_head=dict(num_classes=20)) # training schedule, voc dataset is repeated 3 times, in # `_base_/datasets/voc0712.py`, so the actual epoch = 4 * 3 = 12 max_epochs = 4 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate -param_scheduler = [ - dict( - type='MultiStepLR', - 
begin=0, - end=max_epochs, - by_epoch=True, - milestones=[3], - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[3], gamma=0.1)] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically diff --git a/mmpose/configs/mmdet/pascal_voc/ssd300_voc0712.py b/mmpose/configs/mmdet/pascal_voc/ssd300_voc0712.py index ff7a1368b76aa53700bd81a912b54e84ab58e53a..3336e9a037dd6023c7306c4fcea6e75a5882b18e 100644 --- a/mmpose/configs/mmdet/pascal_voc/ssd300_voc0712.py +++ b/mmpose/configs/mmdet/pascal_voc/ssd300_voc0712.py @@ -1,46 +1,35 @@ _base_ = [ - '../_base_/models/ssd300.py', '../_base_/datasets/voc0712.py', - '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py' + "../_base_/models/ssd300.py", + "../_base_/datasets/voc0712.py", + "../_base_/schedules/schedule_2x.py", + "../_base_/default_runtime.py", ] -model = dict( - bbox_head=dict( - num_classes=20, anchor_generator=dict(basesize_ratio_range=(0.2, - 0.9)))) +model = dict(bbox_head=dict(num_classes=20, anchor_generator=dict(basesize_ratio_range=(0.2, 0.9)))) # dataset settings -dataset_type = 'VOCDataset' -data_root = 'data/VOCdevkit/' +dataset_type = "VOCDataset" +data_root = "data/VOCdevkit/" input_size = 300 train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='Expand', + type="Expand", mean={{_base_.model.data_preprocessor.mean}}, to_rgb={{_base_.model.data_preprocessor.bgr_to_rgb}}, - ratio_range=(1, 4)), - dict( - type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='RandomFlip', prob=0.5), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict(type='PackDetInputs') + ratio_range=(1, 4), + ), + dict(type="MinIoURandomCrop", min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), # avoid bboxes being resized - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=8, @@ -53,47 +42,40 @@ train_dataloader = dict( # VOCDataset will add different `dataset_type` in dataset.metainfo, # which will get error if using ConcatDataset. Adding # `ignore_keys` can avoid this error. 
- ignore_keys=['dataset_type'], + ignore_keys=["dataset_type"], datasets=[ dict( type=dataset_type, data_root=data_root, - ann_file='VOC2007/ImageSets/Main/trainval.txt', - data_prefix=dict(sub_data_root='VOC2007/'), + ann_file="VOC2007/ImageSets/Main/trainval.txt", + data_prefix=dict(sub_data_root="VOC2007/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline), + pipeline=train_pipeline, + ), dict( type=dataset_type, data_root=data_root, - ann_file='VOC2012/ImageSets/Main/trainval.txt', - data_prefix=dict(sub_data_root='VOC2012/'), + ann_file="VOC2012/ImageSets/Main/trainval.txt", + data_prefix=dict(sub_data_root="VOC2012/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline) - ]))) + pipeline=train_pipeline, + ), + ], + ), + ), +) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader -custom_hooks = [ - dict(type='NumClassCheckHook'), - dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW') -] +custom_hooks = [dict(type="NumClassCheckHook"), dict(type="CheckInvalidLossHook", interval=50, priority="VERY_LOW")] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=1e-3, momentum=0.9, weight_decay=5e-4)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=1e-3, momentum=0.9, weight_decay=5e-4)) # learning policy param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=24, - by_epoch=True, - milestones=[16, 20], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=24, by_epoch=True, milestones=[16, 20], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/pascal_voc/ssd512_voc0712.py b/mmpose/configs/mmdet/pascal_voc/ssd512_voc0712.py index 6c4dc8a3eec86ccced7d44120b254463d18c00f5..0ad1d9a88883ebf4a72b44020261afe12665b197 100644 --- a/mmpose/configs/mmdet/pascal_voc/ssd512_voc0712.py +++ b/mmpose/configs/mmdet/pascal_voc/ssd512_voc0712.py @@ -1,54 +1,45 @@ -_base_ = 'ssd300_voc0712.py' +_base_ = "ssd300_voc0712.py" input_size = 512 model = dict( neck=dict( - out_channels=(512, 1024, 512, 256, 256, 256, 256), - level_strides=(2, 2, 2, 2, 1), - level_paddings=(1, 1, 1, 1, 1), - last_kernel_size=4), + out_channels=(512, 1024, 512, 256, 256, 256, 256), level_strides=(2, 2, 2, 2, 1), level_paddings=(1, 1, 1, 1, 1), last_kernel_size=4 + ), bbox_head=dict( in_channels=(512, 1024, 512, 256, 256, 256, 256), anchor_generator=dict( input_size=input_size, strides=[8, 16, 32, 64, 128, 256, 512], basesize_ratio_range=(0.15, 0.9), - ratios=([2], [2, 3], [2, 3], [2, 3], [2, 3], [2], [2])))) + ratios=([2], [2, 3], [2, 3], [2, 3], [2, 3], [2], [2]), + ), + ), +) # dataset settings -dataset_type = 'VOCDataset' -data_root = 'data/VOCdevkit/' +dataset_type = "VOCDataset" +data_root = "data/VOCdevkit/" train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='Expand', + type="Expand", mean={{_base_.model.data_preprocessor.mean}}, to_rgb={{_base_.model.data_preprocessor.bgr_to_rgb}}, - ratio_range=(1, 4)), - dict( - type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='RandomFlip', 
prob=0.5), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict(type='PackDetInputs') + ratio_range=(1, 4), + ), + dict(type="MinIoURandomCrop", min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), # avoid bboxes being resized - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=8, @@ -61,22 +52,27 @@ train_dataloader = dict( # VOCDataset will add different `dataset_type` in dataset.metainfo, # which will get error if using ConcatDataset. Adding # `ignore_keys` can avoid this error. - ignore_keys=['dataset_type'], + ignore_keys=["dataset_type"], datasets=[ dict( type=dataset_type, data_root=data_root, - ann_file='VOC2007/ImageSets/Main/trainval.txt', - data_prefix=dict(sub_data_root='VOC2007/'), + ann_file="VOC2007/ImageSets/Main/trainval.txt", + data_prefix=dict(sub_data_root="VOC2007/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline), + pipeline=train_pipeline, + ), dict( type=dataset_type, data_root=data_root, - ann_file='VOC2012/ImageSets/Main/trainval.txt', - data_prefix=dict(sub_data_root='VOC2012/'), + ann_file="VOC2012/ImageSets/Main/trainval.txt", + data_prefix=dict(sub_data_root="VOC2012/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline) - ]))) + pipeline=train_pipeline, + ), + ], + ), + ), +) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/pisa/faster-rcnn_r50_fpn_pisa_1x_coco.py b/mmpose/configs/mmdet/pisa/faster-rcnn_r50_fpn_pisa_1x_coco.py index 237a3b13aa5e61f04579670af01df8f481d80dd1..dcea3c2bf0f25c7e284765721b17d7ea87cf2611 100644 --- a/mmpose/configs/mmdet/pisa/faster-rcnn_r50_fpn_pisa_1x_coco.py +++ b/mmpose/configs/mmdet/pisa/faster-rcnn_r50_fpn_pisa_1x_coco.py @@ -1,30 +1,14 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" model = dict( - roi_head=dict( - type='PISARoIHead', - bbox_head=dict( - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))), + roi_head=dict(type="PISARoIHead", bbox_head=dict(loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0))), train_cfg=dict( - rpn_proposal=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + rpn_proposal=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( - sampler=dict( - type='ScoreHLRSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True, - k=0.5, - bias=0.), + sampler=dict(type="ScoreHLRSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True, k=0.5, 
bias=0.0), isr=dict(k=2, bias=0), - carl=dict(k=1, bias=0.2))), - test_cfg=dict( - rpn=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0))) + carl=dict(k=1, bias=0.2), + ), + ), + test_cfg=dict(rpn=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0)), +) diff --git a/mmpose/configs/mmdet/pisa/faster-rcnn_x101-32x4d_fpn_pisa_1x_coco.py b/mmpose/configs/mmdet/pisa/faster-rcnn_x101-32x4d_fpn_pisa_1x_coco.py index 4b2c8d9a20ac7adf1965bb3d98e868c785cb23c3..f339ce7bc6e15dc42dad27b098bf68c14b9b679e 100644 --- a/mmpose/configs/mmdet/pisa/faster-rcnn_x101-32x4d_fpn_pisa_1x_coco.py +++ b/mmpose/configs/mmdet/pisa/faster-rcnn_x101-32x4d_fpn_pisa_1x_coco.py @@ -1,30 +1,14 @@ -_base_ = '../faster_rcnn/faster-rcnn_x101-32x4d_fpn_1x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_x101-32x4d_fpn_1x_coco.py" model = dict( - roi_head=dict( - type='PISARoIHead', - bbox_head=dict( - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))), + roi_head=dict(type="PISARoIHead", bbox_head=dict(loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0))), train_cfg=dict( - rpn_proposal=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + rpn_proposal=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( - sampler=dict( - type='ScoreHLRSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True, - k=0.5, - bias=0.), + sampler=dict(type="ScoreHLRSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True, k=0.5, bias=0.0), isr=dict(k=2, bias=0), - carl=dict(k=1, bias=0.2))), - test_cfg=dict( - rpn=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0))) + carl=dict(k=1, bias=0.2), + ), + ), + test_cfg=dict(rpn=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0)), +) diff --git a/mmpose/configs/mmdet/pisa/mask-rcnn_r50_fpn_pisa_1x_coco.py b/mmpose/configs/mmdet/pisa/mask-rcnn_r50_fpn_pisa_1x_coco.py index d6a6823591b1d7780c7f9d49029579afede239aa..436b2d152272b33224bbdd8af4f934fc17ae6624 100644 --- a/mmpose/configs/mmdet/pisa/mask-rcnn_r50_fpn_pisa_1x_coco.py +++ b/mmpose/configs/mmdet/pisa/mask-rcnn_r50_fpn_pisa_1x_coco.py @@ -1,30 +1,14 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" model = dict( - roi_head=dict( - type='PISARoIHead', - bbox_head=dict( - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))), + roi_head=dict(type="PISARoIHead", bbox_head=dict(loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0))), train_cfg=dict( - rpn_proposal=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + rpn_proposal=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( - sampler=dict( - type='ScoreHLRSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True, - k=0.5, - bias=0.), + sampler=dict(type="ScoreHLRSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True, k=0.5, bias=0.0), isr=dict(k=2, bias=0), - carl=dict(k=1, bias=0.2))), - test_cfg=dict( - rpn=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0))) + carl=dict(k=1, bias=0.2), + ), + ), + test_cfg=dict(rpn=dict(nms_pre=2000, 
max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0)), +) diff --git a/mmpose/configs/mmdet/pisa/mask-rcnn_x101-32x4d_fpn_pisa_1x_coco.py b/mmpose/configs/mmdet/pisa/mask-rcnn_x101-32x4d_fpn_pisa_1x_coco.py index f2ac19fe75ba8c5b2440772eced16397e2273735..a1faca7919828e2807d340c743772b0872c2477a 100644 --- a/mmpose/configs/mmdet/pisa/mask-rcnn_x101-32x4d_fpn_pisa_1x_coco.py +++ b/mmpose/configs/mmdet/pisa/mask-rcnn_x101-32x4d_fpn_pisa_1x_coco.py @@ -1,30 +1,14 @@ -_base_ = '../mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_x101-32x4d_fpn_1x_coco.py" model = dict( - roi_head=dict( - type='PISARoIHead', - bbox_head=dict( - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))), + roi_head=dict(type="PISARoIHead", bbox_head=dict(loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0))), train_cfg=dict( - rpn_proposal=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + rpn_proposal=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( - sampler=dict( - type='ScoreHLRSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True, - k=0.5, - bias=0.), + sampler=dict(type="ScoreHLRSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True, k=0.5, bias=0.0), isr=dict(k=2, bias=0), - carl=dict(k=1, bias=0.2))), - test_cfg=dict( - rpn=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0))) + carl=dict(k=1, bias=0.2), + ), + ), + test_cfg=dict(rpn=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0)), +) diff --git a/mmpose/configs/mmdet/pisa/retinanet-r50_fpn_pisa_1x_coco.py b/mmpose/configs/mmdet/pisa/retinanet-r50_fpn_pisa_1x_coco.py index 70f89e227ec64b5c7224375aac0cf7ae3a10a29e..1f4add64ffe0bbb49af8d7ce8df8029ef6f40436 100644 --- a/mmpose/configs/mmdet/pisa/retinanet-r50_fpn_pisa_1x_coco.py +++ b/mmpose/configs/mmdet/pisa/retinanet-r50_fpn_pisa_1x_coco.py @@ -1,7 +1,6 @@ -_base_ = '../retinanet/retinanet_r50_fpn_1x_coco.py' +_base_ = "../retinanet/retinanet_r50_fpn_1x_coco.py" model = dict( - bbox_head=dict( - type='PISARetinaHead', - loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0)), - train_cfg=dict(isr=dict(k=2., bias=0.), carl=dict(k=1., bias=0.2))) + bbox_head=dict(type="PISARetinaHead", loss_bbox=dict(type="SmoothL1Loss", beta=0.11, loss_weight=1.0)), + train_cfg=dict(isr=dict(k=2.0, bias=0.0), carl=dict(k=1.0, bias=0.2)), +) diff --git a/mmpose/configs/mmdet/pisa/retinanet_x101-32x4d_fpn_pisa_1x_coco.py b/mmpose/configs/mmdet/pisa/retinanet_x101-32x4d_fpn_pisa_1x_coco.py index 9caad45d34a9cde84a3c29ad45e3080bb831bb76..74730d188e9fc29412a8b5958e40ceacab6bbf1f 100644 --- a/mmpose/configs/mmdet/pisa/retinanet_x101-32x4d_fpn_pisa_1x_coco.py +++ b/mmpose/configs/mmdet/pisa/retinanet_x101-32x4d_fpn_pisa_1x_coco.py @@ -1,7 +1,6 @@ -_base_ = '../retinanet/retinanet_x101-32x4d_fpn_1x_coco.py' +_base_ = "../retinanet/retinanet_x101-32x4d_fpn_1x_coco.py" model = dict( - bbox_head=dict( - type='PISARetinaHead', - loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0)), - train_cfg=dict(isr=dict(k=2., bias=0.), carl=dict(k=1., bias=0.2))) + bbox_head=dict(type="PISARetinaHead", loss_bbox=dict(type="SmoothL1Loss", beta=0.11, loss_weight=1.0)), + train_cfg=dict(isr=dict(k=2.0, bias=0.0), carl=dict(k=1.0, bias=0.2)), +) diff --git 
a/mmpose/configs/mmdet/pisa/ssd300_pisa_coco.py b/mmpose/configs/mmdet/pisa/ssd300_pisa_coco.py index b10236baeb1925483c2fdb025d86c45d51ba0276..00b527062bec5774a9166f7f099bc857f61de4e0 100644 --- a/mmpose/configs/mmdet/pisa/ssd300_pisa_coco.py +++ b/mmpose/configs/mmdet/pisa/ssd300_pisa_coco.py @@ -1,7 +1,5 @@ -_base_ = '../ssd/ssd300_coco.py' +_base_ = "../ssd/ssd300_coco.py" -model = dict( - bbox_head=dict(type='PISASSDHead'), - train_cfg=dict(isr=dict(k=2., bias=0.), carl=dict(k=1., bias=0.2))) +model = dict(bbox_head=dict(type="PISASSDHead"), train_cfg=dict(isr=dict(k=2.0, bias=0.0), carl=dict(k=1.0, bias=0.2))) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/pisa/ssd512_pisa_coco.py b/mmpose/configs/mmdet/pisa/ssd512_pisa_coco.py index 939c7f453d4d881324c3b0443b0696eb96b3df4f..ecf89e6a364810b8fdae64aba0ee0ae91e6e254a 100644 --- a/mmpose/configs/mmdet/pisa/ssd512_pisa_coco.py +++ b/mmpose/configs/mmdet/pisa/ssd512_pisa_coco.py @@ -1,7 +1,5 @@ -_base_ = '../ssd/ssd512_coco.py' +_base_ = "../ssd/ssd512_coco.py" -model = dict( - bbox_head=dict(type='PISASSDHead'), - train_cfg=dict(isr=dict(k=2., bias=0.), carl=dict(k=1., bias=0.2))) +model = dict(bbox_head=dict(type="PISASSDHead"), train_cfg=dict(isr=dict(k=2.0, bias=0.0), carl=dict(k=1.0, bias=0.2))) optim_wrapper = dict(clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/point_rend/point-rend_r50-caffe_fpn_ms-1x_coco.py b/mmpose/configs/mmdet/point_rend/point-rend_r50-caffe_fpn_ms-1x_coco.py index 8b17f5a340bad54a8fe9b366ccc7d5574f687b17..fd3109b51e6c6a71479441ec0fc810ec1e9fdacd 100644 --- a/mmpose/configs/mmdet/point_rend/point-rend_r50-caffe_fpn_ms-1x_coco.py +++ b/mmpose/configs/mmdet/point_rend/point-rend_r50-caffe_fpn_ms-1x_coco.py @@ -1,44 +1,37 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-1x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-1x_coco.py" # model settings model = dict( - type='PointRend', + type="PointRend", roi_head=dict( - type='PointRendRoIHead', + type="PointRendRoIHead", mask_roi_extractor=dict( - type='GenericRoIExtractor', - aggregation='concat', - roi_layer=dict( - _delete_=True, type='SimpleRoIAlign', output_size=14), + type="GenericRoIExtractor", + aggregation="concat", + roi_layer=dict(_delete_=True, type="SimpleRoIAlign", output_size=14), out_channels=256, - featmap_strides=[4]), + featmap_strides=[4], + ), mask_head=dict( _delete_=True, - type='CoarseMaskHead', + type="CoarseMaskHead", num_fcs=2, in_channels=256, conv_out_channels=256, fc_out_channels=1024, num_classes=80, - loss_mask=dict( - type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)), + loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0), + ), point_head=dict( - type='MaskPointHead', + type="MaskPointHead", num_fcs=3, in_channels=256, fc_channels=256, num_classes=80, coarse_pred_each_layer=True, - loss_point=dict( - type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), + loss_point=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0), + ), + ), # model training and testing settings - train_cfg=dict( - rcnn=dict( - mask_size=7, - num_points=14 * 14, - oversample_ratio=3, - importance_sample_ratio=0.75)), - test_cfg=dict( - rcnn=dict( - subdivision_steps=5, - subdivision_num_points=28 * 28, - scale_factor=2))) + train_cfg=dict(rcnn=dict(mask_size=7, num_points=14 * 14, oversample_ratio=3, importance_sample_ratio=0.75)), + test_cfg=dict(rcnn=dict(subdivision_steps=5, subdivision_num_points=28 * 28, scale_factor=2)), 
+) diff --git a/mmpose/configs/mmdet/point_rend/point-rend_r50-caffe_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/point_rend/point-rend_r50-caffe_fpn_ms-3x_coco.py index b11faaa98ebc5b61f086a2297debda6769dc6270..b8b9e250ae93bdcea4e0d3d13b7395a1a1986463 100644 --- a/mmpose/configs/mmdet/point_rend/point-rend_r50-caffe_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/point_rend/point-rend_r50-caffe_fpn_ms-3x_coco.py @@ -1,18 +1,11 @@ -_base_ = './point-rend_r50-caffe_fpn_ms-1x_coco.py' +_base_ = "./point-rend_r50-caffe_fpn_ms-1x_coco.py" max_epochs = 36 # learning policy param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[28, 34], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[28, 34], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) diff --git a/mmpose/configs/mmdet/pvt/retinanet_pvt-l_fpn_1x_coco.py b/mmpose/configs/mmdet/pvt/retinanet_pvt-l_fpn_1x_coco.py index 1a6f604bdb367106bc75680808ce6fabc2740ed1..8dcdd265c8887d8841212b16b7dc645c2a031518 100644 --- a/mmpose/configs/mmdet/pvt/retinanet_pvt-l_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/pvt/retinanet_pvt-l_fpn_1x_coco.py @@ -1,8 +1,8 @@ -_base_ = 'retinanet_pvt-t_fpn_1x_coco.py' +_base_ = "retinanet_pvt-t_fpn_1x_coco.py" model = dict( backbone=dict( - num_layers=[3, 8, 27, 3], - init_cfg=dict(checkpoint='https://github.com/whai362/PVT/' - 'releases/download/v2/pvt_large.pth'))) + num_layers=[3, 8, 27, 3], init_cfg=dict(checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_large.pth") + ) +) # Enable automatic-mixed-precision training with AmpOptimWrapper. 
-optim_wrapper = dict(type='AmpOptimWrapper')
+optim_wrapper = dict(type="AmpOptimWrapper")
diff --git a/mmpose/configs/mmdet/pvt/retinanet_pvt-m_fpn_1x_coco.py b/mmpose/configs/mmdet/pvt/retinanet_pvt-m_fpn_1x_coco.py
index b888f788b6c7310491751774238451bb7107dccc..619152883d65b5a9fd2dfd35dcd0e0679ef8c6f1 100644
--- a/mmpose/configs/mmdet/pvt/retinanet_pvt-m_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/pvt/retinanet_pvt-m_fpn_1x_coco.py
@@ -1,6 +1,6 @@
-_base_ = 'retinanet_pvt-t_fpn_1x_coco.py'
+_base_ = "retinanet_pvt-t_fpn_1x_coco.py"
 model = dict(
     backbone=dict(
-        num_layers=[3, 4, 18, 3],
-        init_cfg=dict(checkpoint='https://github.com/whai362/PVT/'
-                      'releases/download/v2/pvt_medium.pth')))
+        num_layers=[3, 4, 18, 3], init_cfg=dict(checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_medium.pth")
+    )
+)
diff --git a/mmpose/configs/mmdet/pvt/retinanet_pvt-s_fpn_1x_coco.py b/mmpose/configs/mmdet/pvt/retinanet_pvt-s_fpn_1x_coco.py
index 46603488bb3ceb4fc1052139da53340a3d595256..a16a8a766849912a7fbd538d5dd197f6889c3da9 100644
--- a/mmpose/configs/mmdet/pvt/retinanet_pvt-s_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/pvt/retinanet_pvt-s_fpn_1x_coco.py
@@ -1,6 +1,4 @@
-_base_ = 'retinanet_pvt-t_fpn_1x_coco.py'
+_base_ = "retinanet_pvt-t_fpn_1x_coco.py"
 model = dict(
-    backbone=dict(
-        num_layers=[3, 4, 6, 3],
-        init_cfg=dict(checkpoint='https://github.com/whai362/PVT/'
-                      'releases/download/v2/pvt_small.pth')))
+    backbone=dict(num_layers=[3, 4, 6, 3], init_cfg=dict(checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_small.pth"))
+)
diff --git a/mmpose/configs/mmdet/pvt/retinanet_pvt-t_fpn_1x_coco.py b/mmpose/configs/mmdet/pvt/retinanet_pvt-t_fpn_1x_coco.py
index 5f67c444f262613d615b8b7331991ca7e2f57935..46b9886dadab1d305894508cb0c28ff0bb78e80e 100644
--- a/mmpose/configs/mmdet/pvt/retinanet_pvt-t_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/pvt/retinanet_pvt-t_fpn_1x_coco.py
@@ -1,18 +1,18 @@
 _base_ = [
-    '../_base_/models/retinanet_r50_fpn.py',
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+    "../_base_/models/retinanet_r50_fpn.py",
+    "../_base_/datasets/coco_detection.py",
+    "../_base_/schedules/schedule_1x.py",
+    "../_base_/default_runtime.py",
 ]
 model = dict(
-    type='RetinaNet',
+    type="RetinaNet",
     backbone=dict(
         _delete_=True,
-        type='PyramidVisionTransformer',
+        type="PyramidVisionTransformer",
         num_layers=[2, 2, 2, 2],
-        init_cfg=dict(checkpoint='https://github.com/whai362/PVT/'
-                      'releases/download/v2/pvt_tiny.pth')),
-    neck=dict(in_channels=[64, 128, 320, 512]))
+        init_cfg=dict(checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_tiny.pth"),
+    ),
+    neck=dict(in_channels=[64, 128, 320, 512]),
+)
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(
-        _delete_=True, type='AdamW', lr=0.0001, weight_decay=0.0001))
+optim_wrapper = dict(optimizer=dict(_delete_=True, type="AdamW", lr=0.0001, weight_decay=0.0001))
diff --git a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b0_fpn_1x_coco.py b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b0_fpn_1x_coco.py
index cbebf90fb89d81bd2f4c0874dc2c82cf7c7393d0..f7e68ca119894a330a55bd9d0b07fefb0a2a6016 100644
--- a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b0_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b0_fpn_1x_coco.py
@@ -1,19 +1,19 @@
 _base_ = [
-    '../_base_/models/retinanet_r50_fpn.py',
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+    "../_base_/models/retinanet_r50_fpn.py",
+    "../_base_/datasets/coco_detection.py",
+    "../_base_/schedules/schedule_1x.py",
+    "../_base_/default_runtime.py",
 ]
 model = dict(
-    type='RetinaNet',
+    type="RetinaNet",
     backbone=dict(
         _delete_=True,
-        type='PyramidVisionTransformerV2',
+        type="PyramidVisionTransformerV2",
         embed_dims=32,
         num_layers=[2, 2, 2, 2],
-        init_cfg=dict(checkpoint='https://github.com/whai362/PVT/'
-                      'releases/download/v2/pvt_v2_b0.pth')),
-    neck=dict(in_channels=[32, 64, 160, 256]))
+        init_cfg=dict(checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_v2_b0.pth"),
+    ),
+    neck=dict(in_channels=[32, 64, 160, 256]),
+)
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(
-        _delete_=True, type='AdamW', lr=0.0001, weight_decay=0.0001))
+optim_wrapper = dict(optimizer=dict(_delete_=True, type="AdamW", lr=0.0001, weight_decay=0.0001))
diff --git a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b1_fpn_1x_coco.py b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b1_fpn_1x_coco.py
index 5374c50925f5c7ed8a761eda40dc4bf374df3aeb..5a8743220f66793311680c771d75eba4cba82fdd 100644
--- a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b1_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b1_fpn_1x_coco.py
@@ -1,7 +1,5 @@
-_base_ = 'retinanet_pvtv2-b0_fpn_1x_coco.py'
+_base_ = "retinanet_pvtv2-b0_fpn_1x_coco.py"
 model = dict(
-    backbone=dict(
-        embed_dims=64,
-        init_cfg=dict(checkpoint='https://github.com/whai362/PVT/'
-                      'releases/download/v2/pvt_v2_b1.pth')),
-    neck=dict(in_channels=[64, 128, 320, 512]))
+    backbone=dict(embed_dims=64, init_cfg=dict(checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_v2_b1.pth")),
+    neck=dict(in_channels=[64, 128, 320, 512]),
+)
diff --git a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b2_fpn_1x_coco.py b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b2_fpn_1x_coco.py
index cf9a18debbe5f8b9918e0d086ad6d54d203ef310..428675c9dfa588eb08ffe680484a78b4a4aa6fe0 100644
--- a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b2_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b2_fpn_1x_coco.py
@@ -1,8 +1,9 @@
-_base_ = 'retinanet_pvtv2-b0_fpn_1x_coco.py'
+_base_ = "retinanet_pvtv2-b0_fpn_1x_coco.py"
 model = dict(
     backbone=dict(
         embed_dims=64,
         num_layers=[3, 4, 6, 3],
-        init_cfg=dict(checkpoint='https://github.com/whai362/PVT/'
-                      'releases/download/v2/pvt_v2_b2.pth')),
-    neck=dict(in_channels=[64, 128, 320, 512]))
+        init_cfg=dict(checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_v2_b2.pth"),
+    ),
+    neck=dict(in_channels=[64, 128, 320, 512]),
+)
diff --git a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b3_fpn_1x_coco.py b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b3_fpn_1x_coco.py
index 7a47f820324af7fecf773640d7d1829b0c115471..40a6d612c06528813c4f88536a2486c964856daa 100644
--- a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b3_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b3_fpn_1x_coco.py
@@ -1,8 +1,9 @@
-_base_ = 'retinanet_pvtv2-b0_fpn_1x_coco.py'
+_base_ = "retinanet_pvtv2-b0_fpn_1x_coco.py"
 model = dict(
     backbone=dict(
         embed_dims=64,
         num_layers=[3, 4, 18, 3],
-        init_cfg=dict(checkpoint='https://github.com/whai362/PVT/'
-                      'releases/download/v2/pvt_v2_b3.pth')),
-    neck=dict(in_channels=[64, 128, 320, 512]))
+        init_cfg=dict(checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_v2_b3.pth"),
+    ),
+    neck=dict(in_channels=[64, 128, 320, 512]),
+)
diff --git a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b4_fpn_1x_coco.py b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b4_fpn_1x_coco.py
index 5faf4c507ba89ffe614b2b9d34d452e4c106b0fe..90cd5de3e479d86835c59bb554bc77f6ba8254de 100644
--- a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b4_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b4_fpn_1x_coco.py
@@ -1,15 +1,14 @@
-_base_ = 'retinanet_pvtv2-b0_fpn_1x_coco.py'
+_base_ = "retinanet_pvtv2-b0_fpn_1x_coco.py"
 model = dict(
     backbone=dict(
         embed_dims=64,
         num_layers=[3, 8, 27, 3],
-        init_cfg=dict(checkpoint='https://github.com/whai362/PVT/'
-                      'releases/download/v2/pvt_v2_b4.pth')),
-    neck=dict(in_channels=[64, 128, 320, 512]))
+        init_cfg=dict(checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_v2_b4.pth"),
+    ),
+    neck=dict(in_channels=[64, 128, 320, 512]),
+)
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(
-        _delete_=True, type='AdamW', lr=0.0001 / 1.4, weight_decay=0.0001))
+optim_wrapper = dict(optimizer=dict(_delete_=True, type="AdamW", lr=0.0001 / 1.4, weight_decay=0.0001))
 
 # dataset settings
 train_dataloader = dict(batch_size=1, num_workers=1)
diff --git a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b5_fpn_1x_coco.py b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b5_fpn_1x_coco.py
index afff8719ece41dbfbbe23e2259b9973bb29871f6..e5e4e07369039e890ec101fb99c761f192b6275f 100644
--- a/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b5_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/pvt/retinanet_pvtv2-b5_fpn_1x_coco.py
@@ -1,16 +1,15 @@
-_base_ = 'retinanet_pvtv2-b0_fpn_1x_coco.py'
+_base_ = "retinanet_pvtv2-b0_fpn_1x_coco.py"
 model = dict(
     backbone=dict(
         embed_dims=64,
         num_layers=[3, 6, 40, 3],
         mlp_ratios=(4, 4, 4, 4),
-        init_cfg=dict(checkpoint='https://github.com/whai362/PVT/'
-                      'releases/download/v2/pvt_v2_b5.pth')),
-    neck=dict(in_channels=[64, 128, 320, 512]))
+        init_cfg=dict(checkpoint="https://github.com/whai362/PVT/" "releases/download/v2/pvt_v2_b5.pth"),
+    ),
+    neck=dict(in_channels=[64, 128, 320, 512]),
+)
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(
-        _delete_=True, type='AdamW', lr=0.0001 / 1.4, weight_decay=0.0001))
+optim_wrapper = dict(optimizer=dict(_delete_=True, type="AdamW", lr=0.0001 / 1.4, weight_decay=0.0001))
 
 # dataset settings
 train_dataloader = dict(batch_size=1, num_workers=1)
diff --git a/mmpose/configs/mmdet/qdtrack/qdtrack_faster-rcnn_r50_fpn_4e_base.py b/mmpose/configs/mmdet/qdtrack/qdtrack_faster-rcnn_r50_fpn_4e_base.py
index e3c17c3eb97eedef88949c841364b858a3a1d6e9..ec003e437a9d90798c381f8ea924eb48eac20cf0 100644
--- a/mmpose/configs/mmdet/qdtrack/qdtrack_faster-rcnn_r50_fpn_4e_base.py
+++ b/mmpose/configs/mmdet/qdtrack/qdtrack_faster-rcnn_r50_fpn_4e_base.py
@@ -1,81 +1,69 @@
-_base_ = [
-    '../_base_/models/faster-rcnn_r50_fpn.py', '../_base_/default_runtime.py'
-]
+_base_ = ["../_base_/models/faster-rcnn_r50_fpn.py", "../_base_/default_runtime.py"]
 detector = _base_.model
-detector.pop('data_preprocessor')
+detector.pop("data_preprocessor")
 
-detector['backbone'].update(
+detector["backbone"].update(
     dict(
-        norm_cfg=dict(type='BN', requires_grad=False),
-        style='caffe',
-        init_cfg=dict(
-            type='Pretrained',
-            checkpoint='open-mmlab://detectron2/resnet50_caffe')))
-detector.rpn_head.loss_bbox.update(
-    dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0))
+        norm_cfg=dict(type="BN", requires_grad=False),
+        style="caffe",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"),
+    )
+)
+detector.rpn_head.loss_bbox.update(dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0))
 
 detector.rpn_head.bbox_coder.update(dict(clip_border=False))
 detector.roi_head.bbox_head.update(dict(num_classes=1))
detector.roi_head.bbox_head.bbox_coder.update(dict(clip_border=False)) -detector['init_cfg'] = dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/' - 'faster_rcnn_r50_fpn_1x_coco-person/' - 'faster_rcnn_r50_fpn_1x_coco-person_20201216_175929-d022e227.pth' +detector["init_cfg"] = dict( + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/" # noqa: E251 + "faster_rcnn_r50_fpn_1x_coco-person/" + "faster_rcnn_r50_fpn_1x_coco-person_20201216_175929-d022e227.pth", # noqa: E501 ) del _base_.model model = dict( - type='QDTrack', + type="QDTrack", data_preprocessor=dict( - type='TrackDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="TrackDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), detector=detector, track_head=dict( - type='QuasiDenseTrackHead', + type="QuasiDenseTrackHead", roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), embed_head=dict( - type='QuasiDenseEmbedHead', + type="QuasiDenseEmbedHead", num_convs=4, num_fcs=1, embed_channels=256, - norm_cfg=dict(type='GN', num_groups=32), - loss_track=dict(type='MultiPosCrossEntropyLoss', loss_weight=0.25), - loss_track_aux=dict( - type='MarginL2Loss', - neg_pos_ub=3, - pos_margin=0, - neg_margin=0.1, - hard_mining=True, - loss_weight=1.0)), - loss_bbox=dict(type='L1Loss', loss_weight=1.0), + norm_cfg=dict(type="GN", num_groups=32), + loss_track=dict(type="MultiPosCrossEntropyLoss", loss_weight=0.25), + loss_track_aux=dict(type="MarginL2Loss", neg_pos_ub=3, pos_margin=0, neg_margin=0.1, hard_mining=True, loss_weight=1.0), + ), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), train_cfg=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), sampler=dict( - type='CombinedSampler', + type="CombinedSampler", num=256, pos_fraction=0.5, neg_pos_ub=3, add_gt_as_proposals=True, - pos_sampler=dict(type='InstanceBalancedPosSampler'), - neg_sampler=dict(type='RandomSampler')))), + pos_sampler=dict(type="InstanceBalancedPosSampler"), + neg_sampler=dict(type="RandomSampler"), + ), + ), + ), tracker=dict( - type='QuasiDenseTracker', + type="QuasiDenseTracker", init_score_thr=0.9, obj_score_thr=0.5, match_score_thr=0.5, @@ -86,33 +74,29 @@ model = dict( nms_backdrop_iou_thr=0.3, nms_class_iou_thr=0.7, with_cats=True, - match_metric='bisoftmax')) + match_metric="bisoftmax", + ), +) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001), clip_grad=dict(max_norm=35, norm_type=2) +) # learning policy -param_scheduler = [ - dict(type='MultiStepLR', begin=0, end=4, by_epoch=True, milestones=[3]) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=4, by_epoch=True, milestones=[3])] # runtime settings -train_cfg = 
dict(type='EpochBasedTrainLoop', max_epochs=4, val_interval=4) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=4, val_interval=4) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") -default_hooks = dict( - logger=dict(type='LoggerHook', interval=50), - visualization=dict(type='TrackVisualizationHook', draw=False)) +default_hooks = dict(logger=dict(type="LoggerHook", interval=50), visualization=dict(type="TrackVisualizationHook", draw=False)) -vis_backends = [dict(type='LocalVisBackend')] -visualizer = dict( - type='TrackLocalVisualizer', vis_backends=vis_backends, name='visualizer') +vis_backends = [dict(type="LocalVisBackend")] +visualizer = dict(type="TrackLocalVisualizer", vis_backends=vis_backends, name="visualizer") # custom hooks custom_hooks = [ # Synchronize model buffers such as running_mean and running_var in BN # at the end of each epoch - dict(type='SyncBuffersHook') + dict(type="SyncBuffersHook") ] diff --git a/mmpose/configs/mmdet/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py b/mmpose/configs/mmdet/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py index d87604dad6bf39028a8111708307482186118b19..2a59d081841d5fc10c5839ef55e7f3f73d8534a4 100644 --- a/mmpose/configs/mmdet/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py +++ b/mmpose/configs/mmdet/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py @@ -1,12 +1,12 @@ _base_ = [ - './qdtrack_faster-rcnn_r50_fpn_4e_base.py', - '../_base_/datasets/mot_challenge.py', + "./qdtrack_faster-rcnn_r50_fpn_4e_base.py", + "../_base_/datasets/mot_challenge.py", ] # evaluator val_evaluator = [ - dict(type='CocoVideoMetric', metric=['bbox'], classwise=True), - dict(type='MOTChallengeMetric', metric=['HOTA', 'CLEAR', 'Identity']) + dict(type="CocoVideoMetric", metric=["bbox"], classwise=True), + dict(type="MOTChallengeMetric", metric=["HOTA", "CLEAR", "Identity"]), ] test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/queryinst/queryinst_r101_fpn_300-proposals_crop-ms-480-800-3x_coco.py b/mmpose/configs/mmdet/queryinst/queryinst_r101_fpn_300-proposals_crop-ms-480-800-3x_coco.py index 1692c134698a98da33612487a9fb703117fdb8b6..82b0fac74ea50d85e9fb62b1623dae0b31e5055d 100644 --- a/mmpose/configs/mmdet/queryinst/queryinst_r101_fpn_300-proposals_crop-ms-480-800-3x_coco.py +++ b/mmpose/configs/mmdet/queryinst/queryinst_r101_fpn_300-proposals_crop-ms-480-800-3x_coco.py @@ -1,7 +1,3 @@ -_base_ = './queryinst_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py' +_base_ = "./queryinst_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/queryinst/queryinst_r101_fpn_ms-480-800-3x_coco.py b/mmpose/configs/mmdet/queryinst/queryinst_r101_fpn_ms-480-800-3x_coco.py index dd5b7f452e583eb362e0bb05f272a771d68b6e48..d1e92c59bc7bd021f2f21448da5697c19bdad679 100644 --- a/mmpose/configs/mmdet/queryinst/queryinst_r101_fpn_ms-480-800-3x_coco.py +++ b/mmpose/configs/mmdet/queryinst/queryinst_r101_fpn_ms-480-800-3x_coco.py @@ -1,7 +1,3 @@ -_base_ = './queryinst_r50_fpn_ms-480-800-3x_coco.py' +_base_ = "./queryinst_r50_fpn_ms-480-800-3x_coco.py" -model = dict( - backbone=dict( - depth=101, - 
init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_1x_coco.py index 63d61d78872b452bdd8d2607fc03181b169ea845..8d3018990b137bb69036e80cb6e260b59d638289 100644 --- a/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_1x_coco.py @@ -1,57 +1,49 @@ -_base_ = [ - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_instance.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] num_stages = 6 num_proposals = 100 model = dict( - type='QueryInst', + type="QueryInst", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=True, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=0, - add_extra_convs='on_input', - num_outs=4), - rpn_head=dict( - type='EmbeddingRPNHead', - num_proposals=num_proposals, - proposal_feature_channel=256), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=0, add_extra_convs="on_input", num_outs=4), + rpn_head=dict(type="EmbeddingRPNHead", num_proposals=num_proposals, proposal_feature_channel=256), roi_head=dict( - type='SparseRoIHead', + type="SparseRoIHead", num_stages=num_stages, stage_loss_weights=[1] * num_stages, proposal_feature_channel=256, bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), mask_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=2), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=2), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=[ dict( - type='DIIHead', + type="DIIHead", num_classes=80, num_ffn_fcs=2, num_heads=8, @@ -60,41 +52,38 @@ model = dict( feedforward_channels=2048, in_channels=256, dropout=0.0, - ffn_act_cfg=dict(type='ReLU', inplace=True), + ffn_act_cfg=dict(type="ReLU", inplace=True), dynamic_conv_cfg=dict( - type='DynamicConv', + type="DynamicConv", in_channels=256, feat_channels=64, out_channels=256, input_feat_shape=7, - act_cfg=dict(type='ReLU', inplace=True), - norm_cfg=dict(type='LN')), - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=2.0), + act_cfg=dict(type="ReLU", 
inplace=True), + norm_cfg=dict(type="LN"), + ), + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=2.0), bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - clip_border=False, - target_means=[0., 0., 0., 0.], - target_stds=[0.5, 0.5, 1., 1.])) for _ in range(num_stages) + type="DeltaXYWHBBoxCoder", clip_border=False, target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.5, 0.5, 1.0, 1.0] + ), + ) + for _ in range(num_stages) ], mask_head=[ dict( - type='DynamicMaskHead', + type="DynamicMaskHead", dynamic_conv_cfg=dict( - type='DynamicConv', + type="DynamicConv", in_channels=256, feat_channels=64, out_channels=256, input_feat_shape=14, with_proj=False, - act_cfg=dict(type='ReLU', inplace=True), - norm_cfg=dict(type='LN')), + act_cfg=dict(type="ReLU", inplace=True), + norm_cfg=dict(type="LN"), + ), num_convs=4, num_classes=80, roi_feat_size=14, @@ -102,54 +91,46 @@ model = dict( conv_kernel_size=3, conv_out_channels=256, class_agnostic=False, - norm_cfg=dict(type='BN'), - upsample_cfg=dict(type='deconv', scale_factor=2), - loss_mask=dict( - type='DiceLoss', - loss_weight=8.0, - use_sigmoid=True, - activate=False, - eps=1e-5)) for _ in range(num_stages) - ]), + norm_cfg=dict(type="BN"), + upsample_cfg=dict(type="deconv", scale_factor=2), + loss_mask=dict(type="DiceLoss", loss_weight=8.0, use_sigmoid=True, activate=False, eps=1e-5), + ) + for _ in range(num_stages) + ], + ), # training and testing settings train_cfg=dict( rpn=None, rcnn=[ dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='FocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xyxy'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ]), - sampler=dict(type='PseudoSampler'), + dict(type="FocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, box_format="xyxy"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ), + sampler=dict(type="PseudoSampler"), pos_weight=1, mask_size=28, - ) for _ in range(num_stages) - ]), - test_cfg=dict( - rpn=None, rcnn=dict(max_per_img=num_proposals, mask_thr_binary=0.5))) + ) + for _ in range(num_stages) + ], + ), + test_cfg=dict(rpn=None, rcnn=dict(max_per_img=num_proposals, mask_thr_binary=0.5)), +) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict( - _delete_=True, type='AdamW', lr=0.0001, weight_decay=0.0001), - paramwise_cfg=dict( - custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=1.0)}), - clip_grad=dict(max_norm=0.1, norm_type=2)) + type="OptimWrapper", + optimizer=dict(_delete_=True, type="AdamW", lr=0.0001, weight_decay=0.0001), + paramwise_cfg=dict(custom_keys={"backbone": dict(lr_mult=0.1, decay_mult=1.0)}), + clip_grad=dict(max_norm=0.1, norm_type=2), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py b/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py index 33ab061267bc9753f490acc57ed8d4193f1250b4..519d76e2dd0932b233e0af0c57ebddaa7236c10c 100644 
--- a/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py +++ b/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py @@ -1,45 +1,60 @@ -_base_ = './queryinst_r50_fpn_ms-480-800-3x_coco.py' +_base_ = "./queryinst_r50_fpn_ms-480-800-3x_coco.py" num_proposals = 300 model = dict( rpn_head=dict(num_proposals=num_proposals), - test_cfg=dict( - _delete_=True, - rpn=None, - rcnn=dict(max_per_img=num_proposals, mask_thr_binary=0.5))) + test_cfg=dict(_delete_=True, rpn=None, rcnn=dict(max_per_img=num_proposals, mask_thr_binary=0.5)), +) # augmentation strategy originates from DETR. train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', - transforms=[[ - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoice", + transforms=[ + [ + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) + ], + [ + dict(type="RandomChoiceResize", scales=[(400, 1333), (500, 1333), (600, 1333)], keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], ], - [ - dict( - type='RandomChoiceResize', - scales=[(400, 1333), (500, 1333), (600, 1333)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), - (576, 1333), (608, 1333), (640, 1333), - (672, 1333), (704, 1333), (736, 1333), - (768, 1333), (800, 1333)], - keep_ratio=True) - ]]), - dict(type='PackDetInputs') + ), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_ms-480-800-3x_coco.py b/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_ms-480-800-3x_coco.py index 6b99374ef4364dc76a60c2dd74377f92c15780ed..6a934a6fdf80b6405fe131edd1d449b8b5ff1e0d 100644 --- a/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_ms-480-800-3x_coco.py +++ b/mmpose/configs/mmdet/queryinst/queryinst_r50_fpn_ms-480-800-3x_coco.py @@ -1,32 +1,36 @@ -_base_ = './queryinst_r50_fpn_1x_coco.py' +_base_ = "./queryinst_r50_fpn_1x_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - 
(736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) # learning policy max_epochs = 36 -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=max_epochs) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[27, 33], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[27, 33], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-1.6GF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-1.6GF_fpn_ms-3x_coco.py index 74e6adaba5c262d45aaec876d1225b0061bb290b..0bef77402a78755be81770b4db02b5d730470330 100644 --- a/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-1.6GF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-1.6GF_fpn_ms-3x_coco.py @@ -1,17 +1,14 @@ -_base_ = 'cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py' +_base_ = "cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_1.6gf', + type="RegNet", + arch="regnetx_1.6gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_1.6gf')), - neck=dict( - type='FPN', - in_channels=[72, 168, 408, 912], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_1.6gf"), + ), + neck=dict(type="FPN", in_channels=[72, 168, 408, 912], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py index ea219021260b6aa3a844eb6b4780e9669e50ed3b..f6902f6566bf1f1a26ccb62f3b79d5129a58580c 100644 --- a/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py @@ -1,28 +1,23 @@ -_base_ = [ - '../common/ms_3x_coco-instance.py', - '../_base_/models/cascade-mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms_3x_coco-instance.py", "../_base_/models/cascade-mask-rcnn_r50_fpn.py"] model = dict( data_preprocessor=dict( # The mean and std are used in PyCls when training RegNets mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], - bgr_to_rgb=False), + bgr_to_rgb=False, + ), backbone=dict( _delete_=True, - type='RegNet', - arch='regnetx_3.2gf', + type="RegNet", + arch="regnetx_3.2gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_3.2gf')), - neck=dict( - type='FPN', - 
in_channels=[96, 192, 432, 1008], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_3.2gf"), + ), + neck=dict(type="FPN", in_channels=[96, 192, 432, 1008], out_channels=256, num_outs=5), +) optim_wrapper = dict(optimizer=dict(weight_decay=0.00005)) diff --git a/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-400MF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-400MF_fpn_ms-3x_coco.py index 3fe47f837437163710ecd28f1bb217c643464965..e17220a4c4eeddabbb613e5685b44b779f8984ee 100644 --- a/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-400MF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-400MF_fpn_ms-3x_coco.py @@ -1,17 +1,14 @@ -_base_ = 'cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py' +_base_ = "cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_400mf', + type="RegNet", + arch="regnetx_400mf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_400mf')), - neck=dict( - type='FPN', - in_channels=[32, 64, 160, 384], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_400mf"), + ), + neck=dict(type="FPN", in_channels=[32, 64, 160, 384], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-4GF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-4GF_fpn_ms-3x_coco.py index e22886a80f92ba4269477a307b2689c45468381c..fcd509e206bdc2f7b437e6f4d863fee2521d5dcd 100644 --- a/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-4GF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-4GF_fpn_ms-3x_coco.py @@ -1,17 +1,14 @@ -_base_ = 'cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py' +_base_ = "cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_4.0gf', + type="RegNet", + arch="regnetx_4.0gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_4.0gf')), - neck=dict( - type='FPN', - in_channels=[80, 240, 560, 1360], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_4.0gf"), + ), + neck=dict(type="FPN", in_channels=[80, 240, 560, 1360], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-800MF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-800MF_fpn_ms-3x_coco.py index 655bdc60c772875e0a1ed871bd6bf02aab8e39cc..4d2ba676f48199636ccce3d1ec936a47a6586b8a 100644 --- a/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-800MF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/cascade-mask-rcnn_regnetx-800MF_fpn_ms-3x_coco.py @@ -1,17 +1,14 @@ -_base_ = 'cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py' +_base_ = "cascade-mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_800mf', + type="RegNet", + arch="regnetx_800mf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', 
requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_800mf')), - neck=dict( - type='FPN', - in_channels=[64, 128, 288, 672], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_800mf"), + ), + neck=dict(type="FPN", in_channels=[64, 128, 288, 672], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-1.6GF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-1.6GF_fpn_ms-3x_coco.py index e9e8302bdd1537b825f36777e3211d27dec8fb0c..53cca39772802808ae106d11a67f43658d07c235 100644 --- a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-1.6GF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-1.6GF_fpn_ms-3x_coco.py @@ -1,17 +1,14 @@ -_base_ = 'faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py' +_base_ = "faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_1.6gf', + type="RegNet", + arch="regnetx_1.6gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_1.6gf')), - neck=dict( - type='FPN', - in_channels=[72, 168, 408, 912], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_1.6gf"), + ), + neck=dict(type="FPN", in_channels=[72, 168, 408, 912], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_1x_coco.py b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_1x_coco.py index db49092e2fb7e1cf3dbcad2bb99aa08396ea35e7..727d0c93c5b33282ed270398684bcd1f294e90dc 100644 --- a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_1x_coco.py @@ -1,30 +1,28 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( data_preprocessor=dict( # The mean and std are used in PyCls when training RegNets mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], - bgr_to_rgb=False), + bgr_to_rgb=False, + ), backbone=dict( _delete_=True, - type='RegNet', - arch='regnetx_3.2gf', + type="RegNet", + arch="regnetx_3.2gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_3.2gf')), - neck=dict( - type='FPN', - in_channels=[96, 192, 432, 1008], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_3.2gf"), + ), + neck=dict(type="FPN", in_channels=[96, 192, 432, 1008], out_channels=256, num_outs=5), +) -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.00005)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.00005)) diff --git 
a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_2x_coco.py b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_2x_coco.py index be533603085a89b65556b47f5e333fdde734bbd1..25768a0f8bcccb584c808b45a94d705a1468e86f 100644 --- a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_2x_coco.py @@ -1,16 +1,9 @@ -_base_ = './faster-rcnn_regnetx-3.2GF_fpn_1x_coco.py' +_base_ = "./faster-rcnn_regnetx-3.2GF_fpn_1x_coco.py" # learning policy max_epochs = 24 train_cfg = dict(max_epochs=max_epochs) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py index d3d5d5d689162d805c0cfb4d84f9a128faf90c25..2448bf9fa9047efc925a8996c5bc37f05a15f845 100644 --- a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py @@ -1,25 +1,23 @@ -_base_ = ['../common/ms_3x_coco.py', '../_base_/models/faster-rcnn_r50_fpn.py'] +_base_ = ["../common/ms_3x_coco.py", "../_base_/models/faster-rcnn_r50_fpn.py"] model = dict( data_preprocessor=dict( # The mean and std are used in PyCls when training RegNets mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], - bgr_to_rgb=False), + bgr_to_rgb=False, + ), backbone=dict( _delete_=True, - type='RegNet', - arch='regnetx_3.2gf', + type="RegNet", + arch="regnetx_3.2gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_3.2gf')), - neck=dict( - type='FPN', - in_channels=[96, 192, 432, 1008], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_3.2gf"), + ), + neck=dict(type="FPN", in_channels=[96, 192, 432, 1008], out_channels=256, num_outs=5), +) optim_wrapper = dict(optimizer=dict(weight_decay=0.00005)) diff --git a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-400MF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-400MF_fpn_ms-3x_coco.py index 2edeff9c1f5a794ed14dc8723917986ac26e3d36..ab3e3c7f8a04e6d538e21c82b44e58dc6fbaf8be 100644 --- a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-400MF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-400MF_fpn_ms-3x_coco.py @@ -1,17 +1,14 @@ -_base_ = 'faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py' +_base_ = "faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_400mf', + type="RegNet", + arch="regnetx_400mf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_400mf')), - neck=dict( - type='FPN', - in_channels=[32, 64, 160, 384], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", 
checkpoint="open-mmlab://regnetx_400mf"), + ), + neck=dict(type="FPN", in_channels=[32, 64, 160, 384], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-4GF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-4GF_fpn_ms-3x_coco.py index afcbb5d5d1a8aee47267d1f82fff8d40fa0d8e9b..1eb6a59c1e39916e8ffce9d680e07fe7b66a0893 100644 --- a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-4GF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-4GF_fpn_ms-3x_coco.py @@ -1,17 +1,14 @@ -_base_ = 'faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py' +_base_ = "faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_4.0gf', + type="RegNet", + arch="regnetx_4.0gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_4.0gf')), - neck=dict( - type='FPN', - in_channels=[80, 240, 560, 1360], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_4.0gf"), + ), + neck=dict(type="FPN", in_channels=[80, 240, 560, 1360], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-800MF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-800MF_fpn_ms-3x_coco.py index f659ec9689068afd94aa3bc545d4fed91ffb5eb4..cdb54d7f4948789cf8b05cafacd45f42a532d74e 100644 --- a/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-800MF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/faster-rcnn_regnetx-800MF_fpn_ms-3x_coco.py @@ -1,17 +1,14 @@ -_base_ = 'faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py' +_base_ = "faster-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_800mf', + type="RegNet", + arch="regnetx_800mf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_800mf')), - neck=dict( - type='FPN', - in_channels=[64, 128, 288, 672], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_800mf"), + ), + neck=dict(type="FPN", in_channels=[64, 128, 288, 672], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-1.6GF_fpn_ms-poly-3x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-1.6GF_fpn_ms-poly-3x_coco.py index 60874c66dbc37df824a9c44bb8c28a441f7f84e4..535f4324d555d4ea31bc35c8dc9b82994467b889 100644 --- a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-1.6GF_fpn_ms-poly-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-1.6GF_fpn_ms-poly-3x_coco.py @@ -1,26 +1,18 @@ -_base_ = [ - '../common/ms-poly_3x_coco-instance.py', - '../_base_/models/mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms-poly_3x_coco-instance.py", "../_base_/models/mask-rcnn_r50_fpn.py"] model = dict( backbone=dict( _delete_=True, - type='RegNet', - arch='regnetx_1.6gf', + type="RegNet", + arch="regnetx_1.6gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_1.6gf')), - neck=dict( - 
type='FPN', - in_channels=[72, 168, 408, 912], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_1.6gf"), + ), + neck=dict(type="FPN", in_channels=[72, 168, 408, 912], out_channels=256, num_outs=5), +) -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.00005), - clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.00005), clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-12GF_fpn_1x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-12GF_fpn_1x_coco.py index e82cecea010fb32143f809add198a052285a6897..14d6ae16af89c4bcebca3e9de86e14b5b9e463e4 100644 --- a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-12GF_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-12GF_fpn_1x_coco.py @@ -1,17 +1,14 @@ -_base_ = './mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py' +_base_ = "./mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_12gf', + type="RegNet", + arch="regnetx_12gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_12gf')), - neck=dict( - type='FPN', - in_channels=[224, 448, 896, 2240], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_12gf"), + ), + neck=dict(type="FPN", in_channels=[224, 448, 896, 2240], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF-mdconv-c3-c5_fpn_1x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF-mdconv-c3-c5_fpn_1x_coco.py index c7c1d1ac3a7bd87bd210b4cd2194dd7e430f8d96..042607dbc3128529826b74f77ce967f9e29cea97 100644 --- a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF-mdconv-c3-c5_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF-mdconv-c3-c5_fpn_1x_coco.py @@ -1,7 +1,8 @@ -_base_ = 'mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py' +_base_ = "mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py" model = dict( backbone=dict( - dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), + dcn=dict(type="DCNv2", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_3.2gf'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_3.2gf"), + ) +) diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py index c52bf13ff6df5cda353c21ac32a950602620dbde..078d5436daf86a10515ca7512a4e87b9e8eac4b4 100644 --- a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py @@ -1,30 +1,28 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( data_preprocessor=dict( # The mean and std are used in PyCls when training RegNets mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], - 
bgr_to_rgb=False), + bgr_to_rgb=False, + ), backbone=dict( _delete_=True, - type='RegNet', - arch='regnetx_3.2gf', + type="RegNet", + arch="regnetx_3.2gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_3.2gf')), - neck=dict( - type='FPN', - in_channels=[96, 192, 432, 1008], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_3.2gf"), + ), + neck=dict(type="FPN", in_channels=[96, 192, 432, 1008], out_channels=256, num_outs=5), +) -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.00005)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.00005)) diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py index 36482c939dc3e600171b98bc159440e5fb740ffa..3acf1f4251ebf8d9dc95755103447bdc0bb7d653 100644 --- a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-3.2GF_fpn_ms-3x_coco.py @@ -1,60 +1,46 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( data_preprocessor=dict( # The mean and std are used in PyCls when training RegNets mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], - bgr_to_rgb=False), + bgr_to_rgb=False, + ), backbone=dict( _delete_=True, - type='RegNet', - arch='regnetx_3.2gf', + type="RegNet", + arch="regnetx_3.2gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_3.2gf')), - neck=dict( - type='FPN', - in_channels=[96, 192, 432, 1008], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_3.2gf"), + ), + neck=dict(type="FPN", in_channels=[96, 192, 432, 1008], out_channels=256, num_outs=5), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.00005), - clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.00005), clip_grad=dict(max_norm=35, 
norm_type=2)) # learning policy max_epochs = 36 train_cfg = dict(max_epochs=max_epochs) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[28, 34], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[28, 34], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-400MF_fpn_ms-poly-3x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-400MF_fpn_ms-poly-3x_coco.py index b96e1921f0dae8ad6656a7785d9d4655f9f349b3..594f75290de4b3f2a7fb115d9929dc19f5a399b9 100644 --- a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-400MF_fpn_ms-poly-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-400MF_fpn_ms-poly-3x_coco.py @@ -1,26 +1,18 @@ -_base_ = [ - '../common/ms-poly_3x_coco-instance.py', - '../_base_/models/mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms-poly_3x_coco-instance.py", "../_base_/models/mask-rcnn_r50_fpn.py"] model = dict( backbone=dict( _delete_=True, - type='RegNet', - arch='regnetx_400mf', + type="RegNet", + arch="regnetx_400mf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_400mf')), - neck=dict( - type='FPN', - in_channels=[32, 64, 160, 384], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_400mf"), + ), + neck=dict(type="FPN", in_channels=[32, 64, 160, 384], out_channels=256, num_outs=5), +) -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.00005), - clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.00005), clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-4GF_fpn_1x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-4GF_fpn_1x_coco.py index ce9f8ef4ffbcce66ec0184b3ff06a92425231597..4b33820ca9b2970522e5691de2c760981659d495 100644 --- a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-4GF_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-4GF_fpn_1x_coco.py @@ -1,17 +1,14 @@ -_base_ = './mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py' +_base_ = "./mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_4.0gf', + type="RegNet", + arch="regnetx_4.0gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_4.0gf')), - neck=dict( - type='FPN', - in_channels=[80, 240, 560, 1360], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_4.0gf"), + ), + neck=dict(type="FPN", in_channels=[80, 240, 560, 1360], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-4GF_fpn_ms-poly-3x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-4GF_fpn_ms-poly-3x_coco.py index f160ccf66700d98a6403ed736928e529368e800c..292ad488c085884ae7834aa49388aa0a7daafdca 100644 --- 
a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-4GF_fpn_ms-poly-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-4GF_fpn_ms-poly-3x_coco.py @@ -1,26 +1,18 @@ -_base_ = [ - '../common/ms-poly_3x_coco-instance.py', - '../_base_/models/mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms-poly_3x_coco-instance.py", "../_base_/models/mask-rcnn_r50_fpn.py"] model = dict( backbone=dict( _delete_=True, - type='RegNet', - arch='regnetx_4.0gf', + type="RegNet", + arch="regnetx_4.0gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_4.0gf')), - neck=dict( - type='FPN', - in_channels=[80, 240, 560, 1360], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_4.0gf"), + ), + neck=dict(type="FPN", in_channels=[80, 240, 560, 1360], out_channels=256, num_outs=5), +) -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.00005), - clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.00005), clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-6.4GF_fpn_1x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-6.4GF_fpn_1x_coco.py index e17a3d7695fa7ba9e135d7a436118aae29be4747..65b47bd22b3422ae1a97dfc7a9eea851a39d172c 100644 --- a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-6.4GF_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-6.4GF_fpn_1x_coco.py @@ -1,17 +1,14 @@ -_base_ = './mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py' +_base_ = "./mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_6.4gf', + type="RegNet", + arch="regnetx_6.4gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_6.4gf')), - neck=dict( - type='FPN', - in_channels=[168, 392, 784, 1624], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_6.4gf"), + ), + neck=dict(type="FPN", in_channels=[168, 392, 784, 1624], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-800MF_fpn_ms-poly-3x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-800MF_fpn_ms-poly-3x_coco.py index 93851fdbb99e5d8e3a58062c7ad83d2acad14ac6..4322885058a1443f376be61797ba67cc1d39bd70 100644 --- a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-800MF_fpn_ms-poly-3x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-800MF_fpn_ms-poly-3x_coco.py @@ -1,26 +1,18 @@ -_base_ = [ - '../common/ms-poly_3x_coco-instance.py', - '../_base_/models/mask-rcnn_r50_fpn.py' -] +_base_ = ["../common/ms-poly_3x_coco-instance.py", "../_base_/models/mask-rcnn_r50_fpn.py"] model = dict( backbone=dict( _delete_=True, - type='RegNet', - arch='regnetx_800mf', + type="RegNet", + arch="regnetx_800mf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_800mf')), - neck=dict( - type='FPN', - in_channels=[64, 
128, 288, 672], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_800mf"), + ), + neck=dict(type="FPN", in_channels=[64, 128, 288, 672], out_channels=256, num_outs=5), +) -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.00005), - clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.00005), clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-8GF_fpn_1x_coco.py b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-8GF_fpn_1x_coco.py index 62a4c931512e6b46093b03fd4e80741a93151c6a..e27499b23a60cce2743a22b0b5ac3c5c897db3df 100644 --- a/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-8GF_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/regnet/mask-rcnn_regnetx-8GF_fpn_1x_coco.py @@ -1,17 +1,14 @@ -_base_ = './mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py' +_base_ = "./mask-rcnn_regnetx-3.2GF_fpn_1x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_8.0gf', + type="RegNet", + arch="regnetx_8.0gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_8.0gf')), - neck=dict( - type='FPN', - in_channels=[80, 240, 720, 1920], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_8.0gf"), + ), + neck=dict(type="FPN", in_channels=[80, 240, 720, 1920], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/retinanet_regnetx-1.6GF_fpn_1x_coco.py b/mmpose/configs/mmdet/regnet/retinanet_regnetx-1.6GF_fpn_1x_coco.py index 7395c1bfbfa16670294c721f9f3135da9b9e69ae..2efe73732d322b7c95a76fd091ec6f68717c6d22 100644 --- a/mmpose/configs/mmdet/regnet/retinanet_regnetx-1.6GF_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/regnet/retinanet_regnetx-1.6GF_fpn_1x_coco.py @@ -1,17 +1,14 @@ -_base_ = './retinanet_regnetx-3.2GF_fpn_1x_coco.py' +_base_ = "./retinanet_regnetx-3.2GF_fpn_1x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_1.6gf', + type="RegNet", + arch="regnetx_1.6gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_1.6gf')), - neck=dict( - type='FPN', - in_channels=[72, 168, 408, 912], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_1.6gf"), + ), + neck=dict(type="FPN", in_channels=[72, 168, 408, 912], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/regnet/retinanet_regnetx-3.2GF_fpn_1x_coco.py b/mmpose/configs/mmdet/regnet/retinanet_regnetx-3.2GF_fpn_1x_coco.py index 8b8a32cec195901e2f1326bf62f4fa4508e744d2..9a11849de78798797ae82be0d16cdd912d8b1761 100644 --- a/mmpose/configs/mmdet/regnet/retinanet_regnetx-3.2GF_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/regnet/retinanet_regnetx-3.2GF_fpn_1x_coco.py @@ -1,31 +1,28 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + 
"../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( data_preprocessor=dict( # The mean and std are used in PyCls when training RegNets mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], - bgr_to_rgb=False), + bgr_to_rgb=False, + ), backbone=dict( _delete_=True, - type='RegNet', - arch='regnetx_3.2gf', + type="RegNet", + arch="regnetx_3.2gf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_3.2gf')), - neck=dict( - type='FPN', - in_channels=[96, 192, 432, 1008], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_3.2gf"), + ), + neck=dict(type="FPN", in_channels=[96, 192, 432, 1008], out_channels=256, num_outs=5), +) -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.00005), - clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.00005), clip_grad=dict(max_norm=35, norm_type=2)) diff --git a/mmpose/configs/mmdet/regnet/retinanet_regnetx-800MF_fpn_1x_coco.py b/mmpose/configs/mmdet/regnet/retinanet_regnetx-800MF_fpn_1x_coco.py index f6f8989320d6ffbcd55148471f62a962c52f9131..08b5484afb9ee9f8b60dcc0fcbb9a0e4133e6281 100644 --- a/mmpose/configs/mmdet/regnet/retinanet_regnetx-800MF_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/regnet/retinanet_regnetx-800MF_fpn_1x_coco.py @@ -1,17 +1,14 @@ -_base_ = './retinanet_regnetx-3.2GF_fpn_1x_coco.py' +_base_ = "./retinanet_regnetx-3.2GF_fpn_1x_coco.py" model = dict( backbone=dict( - type='RegNet', - arch='regnetx_800mf', + type="RegNet", + arch="regnetx_800mf", out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://regnetx_800mf')), - neck=dict( - type='FPN', - in_channels=[64, 128, 288, 672], - out_channels=256, - num_outs=5)) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://regnetx_800mf"), + ), + neck=dict(type="FPN", in_channels=[64, 128, 288, 672], out_channels=256, num_outs=5), +) diff --git a/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot15train80_test-mot15val20.py b/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot15train80_test-mot15val20.py index 4e30b22964d0504771678dbd0a551bc16a0714ea..2963c5e0fb77a526819ea7c76dd9fbadd17c28ad 100644 --- a/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot15train80_test-mot15val20.py +++ b/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot15train80_test-mot15val20.py @@ -1,7 +1,7 @@ -_base_ = ['./reid_r50_8xb32-6e_mot17train80_test-mot17val20.py'] +_base_ = ["./reid_r50_8xb32-6e_mot17train80_test-mot17val20.py"] model = dict(head=dict(num_classes=368)) # data -data_root = 'data/MOT15/' +data_root = "data/MOT15/" train_dataloader = dict(dataset=dict(data_root=data_root)) val_dataloader = dict(dataset=dict(data_root=data_root)) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot16train80_test-mot16val20.py b/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot16train80_test-mot16val20.py index 468b9bfb2453f97c83282cc2f383c7592694269c..0eb6cf3de4bfe01c1cc8835657084985ffb60d35 100644 --- 
a/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot16train80_test-mot16val20.py +++ b/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot16train80_test-mot16val20.py @@ -1,7 +1,7 @@ -_base_ = ['./reid_r50_8xb32-6e_mot17train80_test-mot17val20.py'] +_base_ = ["./reid_r50_8xb32-6e_mot17train80_test-mot17val20.py"] model = dict(head=dict(num_classes=371)) # data -data_root = 'data/MOT16/' +data_root = "data/MOT16/" train_dataloader = dict(dataset=dict(data_root=data_root)) val_dataloader = dict(dataset=dict(data_root=data_root)) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot17train80_test-mot17val20.py b/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot17train80_test-mot17val20.py index 83669de7c170c5de0e2054808ef7a76878bc1f24..e9b399fa75cd769f1a0e29a2db15e581df3d2616 100644 --- a/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot17train80_test-mot17val20.py +++ b/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot17train80_test-mot17val20.py @@ -1,61 +1,37 @@ -_base_ = [ - '../_base_/datasets/mot_challenge_reid.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/mot_challenge_reid.py", "../_base_/default_runtime.py"] model = dict( - type='BaseReID', - data_preprocessor=dict( - type='ReIDDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - backbone=dict( - type='mmpretrain.ResNet', - depth=50, - num_stages=4, - out_indices=(3, ), - style='pytorch'), - neck=dict(type='GlobalAveragePooling', kernel_size=(8, 4), stride=1), + type="BaseReID", + data_preprocessor=dict(type="ReIDDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + backbone=dict(type="mmpretrain.ResNet", depth=50, num_stages=4, out_indices=(3,), style="pytorch"), + neck=dict(type="GlobalAveragePooling", kernel_size=(8, 4), stride=1), head=dict( - type='LinearReIDHead', + type="LinearReIDHead", num_fcs=1, in_channels=2048, fc_channels=1024, out_channels=128, num_classes=380, - loss_cls=dict(type='mmpretrain.CrossEntropyLoss', loss_weight=1.0), - loss_triplet=dict(type='TripletLoss', margin=0.3, loss_weight=1.0), - norm_cfg=dict(type='BN1d'), - act_cfg=dict(type='ReLU')), + loss_cls=dict(type="mmpretrain.CrossEntropyLoss", loss_weight=1.0), + loss_triplet=dict(type="TripletLoss", margin=0.3, loss_weight=1.0), + norm_cfg=dict(type="BN1d"), + act_cfg=dict(type="ReLU"), + ), init_cfg=dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_batch256_imagenet_20200708-cfb998bf.pth' # noqa: E501 - )) + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_batch256_imagenet_20200708-cfb998bf.pth", # noqa: E501 + ), +) # optimizer -optim_wrapper = dict( - type='OptimWrapper', - clip_grad=None, - optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", clip_grad=None, optimizer=dict(type="SGD", lr=0.1, momentum=0.9, weight_decay=0.0001)) # learning policy param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 1000, - by_epoch=False, - begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=6, - by_epoch=True, - milestones=[5], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 1000, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=6, by_epoch=True, milestones=[5], gamma=0.1), ] # train, val, test setting -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=6, 
val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=6, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") diff --git a/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot20train80_test-mot20val20.py b/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot20train80_test-mot20val20.py index 8a807996186c35f91e23f6e0ec95a2191479c15b..13e8521b40c6f3716d18dfe200b08edc1659a510 100644 --- a/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot20train80_test-mot20val20.py +++ b/mmpose/configs/mmdet/reid/reid_r50_8xb32-6e_mot20train80_test-mot20val20.py @@ -1,10 +1,10 @@ -_base_ = ['./reid_r50_8xb32-6e_mot17train80_test-mot17val20.py'] +_base_ = ["./reid_r50_8xb32-6e_mot17train80_test-mot17val20.py"] model = dict(head=dict(num_classes=1701)) # data -data_root = 'data/MOT20/' +data_root = "data/MOT20/" train_dataloader = dict(dataset=dict(data_root=data_root)) val_dataloader = dict(dataset=dict(data_root=data_root)) test_dataloader = val_dataloader # train, val, test setting -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=6, val_interval=7) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=6, val_interval=7) diff --git a/mmpose/configs/mmdet/reppoints/reppoints-bbox_r50-center_fpn-gn_head-gn-grid_1x_coco.py b/mmpose/configs/mmdet/reppoints/reppoints-bbox_r50-center_fpn-gn_head-gn-grid_1x_coco.py index f116e53f6ded9468098733c1bab938831fee041d..a731ab445d2e88ee46305b823b77a116ee9c8fa7 100644 --- a/mmpose/configs/mmdet/reppoints/reppoints-bbox_r50-center_fpn-gn_head-gn-grid_1x_coco.py +++ b/mmpose/configs/mmdet/reppoints/reppoints-bbox_r50-center_fpn-gn_head-gn-grid_1x_coco.py @@ -1,2 +1,2 @@ -_base_ = './reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py' -model = dict(bbox_head=dict(transform_method='minmax', use_grid_points=True)) +_base_ = "./reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py" +model = dict(bbox_head=dict(transform_method="minmax", use_grid_points=True)) diff --git a/mmpose/configs/mmdet/reppoints/reppoints-bbox_r50_fpn-gn_head-gn-grid_1x_coco.py b/mmpose/configs/mmdet/reppoints/reppoints-bbox_r50_fpn-gn_head-gn-grid_1x_coco.py index 76be39b8de8f52d48c6cdd4626f23221e35164ab..145b409539ac33199928f6521c4fc965e10eb8da 100644 --- a/mmpose/configs/mmdet/reppoints/reppoints-bbox_r50_fpn-gn_head-gn-grid_1x_coco.py +++ b/mmpose/configs/mmdet/reppoints/reppoints-bbox_r50_fpn-gn_head-gn-grid_1x_coco.py @@ -1,13 +1,8 @@ -_base_ = './reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py' +_base_ = "./reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py" model = dict( - bbox_head=dict(transform_method='minmax', use_grid_points=True), + bbox_head=dict(transform_method="minmax", use_grid_points=True), # training and testing settings train_cfg=dict( - init=dict( - assigner=dict( - _delete_=True, - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1)))) + init=dict(assigner=dict(_delete_=True, type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0, ignore_iof_thr=-1)) + ), +) diff --git a/mmpose/configs/mmdet/reppoints/reppoints-minmax_r50_fpn-gn_head-gn_1x_coco.py b/mmpose/configs/mmdet/reppoints/reppoints-minmax_r50_fpn-gn_head-gn_1x_coco.py index 0e7dffe77a062268737205fd86ab23f22cd85479..b9a25fab548321fb829066f78935d54e1d983869 100644 --- a/mmpose/configs/mmdet/reppoints/reppoints-minmax_r50_fpn-gn_head-gn_1x_coco.py +++ b/mmpose/configs/mmdet/reppoints/reppoints-minmax_r50_fpn-gn_head-gn_1x_coco.py @@ -1,2 +1,2 @@ -_base_ = 
'./reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py' -model = dict(bbox_head=dict(transform_method='minmax')) +_base_ = "./reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py" +model = dict(bbox_head=dict(transform_method="minmax")) diff --git a/mmpose/configs/mmdet/reppoints/reppoints-moment_r101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py b/mmpose/configs/mmdet/reppoints/reppoints-moment_r101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py index 5c2bfab40020d7508ba90029ad29b24da8a7ad78..ebf5b8a74cc78f0befabc64078cd53f1fbd6b2dc 100644 --- a/mmpose/configs/mmdet/reppoints/reppoints-moment_r101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py +++ b/mmpose/configs/mmdet/reppoints/reppoints-moment_r101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py @@ -1,8 +1,9 @@ -_base_ = './reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py' +_base_ = "./reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py" model = dict( backbone=dict( depth=101, - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), + dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True), - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), + ) +) diff --git a/mmpose/configs/mmdet/reppoints/reppoints-moment_r101_fpn-gn_head-gn_2x_coco.py b/mmpose/configs/mmdet/reppoints/reppoints-moment_r101_fpn-gn_head-gn_2x_coco.py index 02c447ada075ca6b076a5e7ff2ed74fb3b80c30d..345462bbd191605220d5f2c76c2ee2e8b9c80575 100644 --- a/mmpose/configs/mmdet/reppoints/reppoints-moment_r101_fpn-gn_head-gn_2x_coco.py +++ b/mmpose/configs/mmdet/reppoints/reppoints-moment_r101_fpn-gn_head-gn_2x_coco.py @@ -1,6 +1,2 @@ -_base_ = './reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py b/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py index cedf2226b5ecd2e5dd207041523ab4a2627a1734..aa1648a33b4a655d712da08d6646cbe625734315 100644 --- a/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py +++ b/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py @@ -1,3 +1,3 @@ -_base_ = './reppoints-moment_r50_fpn_1x_coco.py' -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) +_base_ = "./reppoints-moment_r50_fpn_1x_coco.py" +norm_cfg = dict(type="GN", num_groups=32, requires_grad=True) model = dict(neck=dict(norm_cfg=norm_cfg), bbox_head=dict(norm_cfg=norm_cfg)) diff --git a/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py b/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py index 4490d4496af6d680fbed2eedcaf73e138afff0cc..ff23aaaa56524e2411e79c9d59d184b6d142eafb 100644 --- a/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py +++ b/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py @@ -1,17 +1,9 @@ -_base_ = './reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py' +_base_ = "./reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py" max_epochs = 24 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) 
param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn_1x_coco.py index df7e72a80c66f42fe8554cfb344fee87ee5fe24a..5ac675219b9d8f4fe777a34738138086399dc0b8 100644 --- a/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/reppoints/reppoints-moment_r50_fpn_1x_coco.py @@ -1,34 +1,23 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] model = dict( - type='RepPointsDetector', + type="RepPointsDetector", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_input', - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_input", num_outs=5), bbox_head=dict( - type='RepPointsHead', + type="RepPointsHead", num_classes=80, in_channels=256, feat_channels=256, @@ -38,37 +27,22 @@ model = dict( gradient_mul=0.1, point_strides=[8, 16, 32, 64, 128], point_base_scale=4, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='moment'), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox_init=dict(type="SmoothL1Loss", beta=0.11, loss_weight=0.5), + loss_bbox_refine=dict(type="SmoothL1Loss", beta=0.11, loss_weight=1.0), + transform_method="moment", + ), # training and testing settings train_cfg=dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), + init=dict(assigner=dict(type="PointAssigner", scale=4, pos_num=1), allowed_border=-1, pos_weight=-1, debug=False), refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0, ignore_iof_thr=-1), allowed_border=-1, pos_weight=-1, - debug=False)), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - 
nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100)) + debug=False, + ), + ), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), +) optim_wrapper = dict(optimizer=dict(lr=0.01)) diff --git a/mmpose/configs/mmdet/reppoints/reppoints-moment_x101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py b/mmpose/configs/mmdet/reppoints/reppoints-moment_x101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py index a9909efe511da9423859de6ce096b1b1524a9b6f..1f934b66ac92dbc4071af8d4f1b2c6e79903ef67 100644 --- a/mmpose/configs/mmdet/reppoints/reppoints-moment_x101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py +++ b/mmpose/configs/mmdet/reppoints/reppoints-moment_x101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py @@ -1,16 +1,17 @@ -_base_ = './reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py' +_base_ = "./reppoints-moment_r50_fpn-gn_head-gn_2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False), + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + dcn=dict(type="DCN", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/reppoints/reppoints-partial-minmax_r50_fpn-gn_head-gn_1x_coco.py b/mmpose/configs/mmdet/reppoints/reppoints-partial-minmax_r50_fpn-gn_head-gn_1x_coco.py index 30f7844b8344110896c5d885bd0ca340322045e4..fa5c84aa4f767d6e68c01995af65dea0ef2f09d4 100644 --- a/mmpose/configs/mmdet/reppoints/reppoints-partial-minmax_r50_fpn-gn_head-gn_1x_coco.py +++ b/mmpose/configs/mmdet/reppoints/reppoints-partial-minmax_r50_fpn-gn_head-gn_1x_coco.py @@ -1,2 +1,2 @@ -_base_ = './reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py' -model = dict(bbox_head=dict(transform_method='partial_minmax')) +_base_ = "./reppoints-moment_r50_fpn-gn_head-gn_1x_coco.py" +model = dict(bbox_head=dict(transform_method="partial_minmax")) diff --git a/mmpose/configs/mmdet/res2net/cascade-mask-rcnn_res2net-101_fpn_20e_coco.py b/mmpose/configs/mmdet/res2net/cascade-mask-rcnn_res2net-101_fpn_20e_coco.py index 21b6d2ea1c0167b8dd643211b520ac89ddd63e10..2cd25a2300230269d0eef40ca778d6305f7199bb 100644 --- a/mmpose/configs/mmdet/res2net/cascade-mask-rcnn_res2net-101_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/res2net/cascade-mask-rcnn_res2net-101_fpn_20e_coco.py @@ -1,10 +1,10 @@ -_base_ = '../cascade_rcnn/cascade-mask-rcnn_r50_fpn_20e_coco.py' +_base_ = "../cascade_rcnn/cascade-mask-rcnn_r50_fpn_20e_coco.py" model = dict( backbone=dict( - type='Res2Net', + type="Res2Net", depth=101, scales=4, base_width=26, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://res2net101_v1d_26w_4s'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://res2net101_v1d_26w_4s"), + ) +) diff --git a/mmpose/configs/mmdet/res2net/cascade-rcnn_res2net-101_fpn_20e_coco.py b/mmpose/configs/mmdet/res2net/cascade-rcnn_res2net-101_fpn_20e_coco.py index 670a77454e060f8f639dbdc40064b71cd82520e9..6a7e97355a1534e05fb924af1142cef545fb43be 100644 --- a/mmpose/configs/mmdet/res2net/cascade-rcnn_res2net-101_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/res2net/cascade-rcnn_res2net-101_fpn_20e_coco.py @@ -1,10 +1,10 @@ -_base_ = 
'../cascade_rcnn/cascade-rcnn_r50_fpn_20e_coco.py' +_base_ = "../cascade_rcnn/cascade-rcnn_r50_fpn_20e_coco.py" model = dict( backbone=dict( - type='Res2Net', + type="Res2Net", depth=101, scales=4, base_width=26, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://res2net101_v1d_26w_4s'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://res2net101_v1d_26w_4s"), + ) +) diff --git a/mmpose/configs/mmdet/res2net/faster-rcnn_res2net-101_fpn_2x_coco.py b/mmpose/configs/mmdet/res2net/faster-rcnn_res2net-101_fpn_2x_coco.py index 033cf574962f51a75c3fce1e74a22efb9c6320f2..aabc2ab788f72889d46c31eab0c91372d74f0071 100644 --- a/mmpose/configs/mmdet/res2net/faster-rcnn_res2net-101_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/res2net/faster-rcnn_res2net-101_fpn_2x_coco.py @@ -1,10 +1,10 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py' +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_2x_coco.py" model = dict( backbone=dict( - type='Res2Net', + type="Res2Net", depth=101, scales=4, base_width=26, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://res2net101_v1d_26w_4s'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://res2net101_v1d_26w_4s"), + ) +) diff --git a/mmpose/configs/mmdet/res2net/htc_res2net-101_fpn_20e_coco.py b/mmpose/configs/mmdet/res2net/htc_res2net-101_fpn_20e_coco.py index d5542fda4c8181a417f14817180296e84944b832..5d89f3031dcf8c145f879235397222c3c097545f 100644 --- a/mmpose/configs/mmdet/res2net/htc_res2net-101_fpn_20e_coco.py +++ b/mmpose/configs/mmdet/res2net/htc_res2net-101_fpn_20e_coco.py @@ -1,10 +1,10 @@ -_base_ = '../htc/htc_r50_fpn_20e_coco.py' +_base_ = "../htc/htc_r50_fpn_20e_coco.py" model = dict( backbone=dict( - type='Res2Net', + type="Res2Net", depth=101, scales=4, base_width=26, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://res2net101_v1d_26w_4s'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://res2net101_v1d_26w_4s"), + ) +) diff --git a/mmpose/configs/mmdet/res2net/mask-rcnn_res2net-101_fpn_2x_coco.py b/mmpose/configs/mmdet/res2net/mask-rcnn_res2net-101_fpn_2x_coco.py index 3a2d57304d07d9b3dbc58ee9a5d8f2355c6b4427..8e81bdfcfba894bc2d578027728ffdbd0285425d 100644 --- a/mmpose/configs/mmdet/res2net/mask-rcnn_res2net-101_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/res2net/mask-rcnn_res2net-101_fpn_2x_coco.py @@ -1,10 +1,10 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_2x_coco.py' +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_2x_coco.py" model = dict( backbone=dict( - type='Res2Net', + type="Res2Net", depth=101, scales=4, base_width=26, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://res2net101_v1d_26w_4s'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://res2net101_v1d_26w_4s"), + ) +) diff --git a/mmpose/configs/mmdet/resnest/cascade-mask-rcnn_s101_fpn_syncbn-backbone+head_ms-1x_coco.py b/mmpose/configs/mmdet/resnest/cascade-mask-rcnn_s101_fpn_syncbn-backbone+head_ms-1x_coco.py index f4f19925788acc357e9720513d4f388598927a70..5e3d1f7f3562bd5d263d00dfff80d922574c0430 100644 --- a/mmpose/configs/mmdet/resnest/cascade-mask-rcnn_s101_fpn_syncbn-backbone+head_ms-1x_coco.py +++ b/mmpose/configs/mmdet/resnest/cascade-mask-rcnn_s101_fpn_syncbn-backbone+head_ms-1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './cascade-mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py' -model = dict( - backbone=dict( - stem_channels=128, - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='open-mmlab://resnest101'))) +_base_ = 
"./cascade-mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py" +model = dict(backbone=dict(stem_channels=128, depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnest101"))) diff --git a/mmpose/configs/mmdet/resnest/cascade-mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py b/mmpose/configs/mmdet/resnest/cascade-mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py index c6ef41c05cd97d19320c02fb065b0cde1dda54d7..7a17baab7b39eb33580c3f93490a99743a4af90f 100644 --- a/mmpose/configs/mmdet/resnest/cascade-mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py +++ b/mmpose/configs/mmdet/resnest/cascade-mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py @@ -1,14 +1,11 @@ -_base_ = '../cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py' -norm_cfg = dict(type='SyncBN', requires_grad=True) +_base_ = "../cascade_rcnn/cascade-mask-rcnn_r50_fpn_1x_coco.py" +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( # use ResNeSt img_norm - data_preprocessor=dict( - mean=[123.68, 116.779, 103.939], - std=[58.393, 57.12, 57.375], - bgr_to_rgb=True), + data_preprocessor=dict(mean=[123.68, 116.779, 103.939], std=[58.393, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeSt', + type="ResNeSt", stem_channels=64, depth=50, radix=2, @@ -19,83 +16,61 @@ model = dict( frozen_stages=1, norm_cfg=norm_cfg, norm_eval=False, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://resnest50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnest50"), + ), roi_head=dict( bbox_head=[ dict( - type='Shared4Conv1FCBBoxHead', + type="Shared4Conv1FCBBoxHead", in_channels=256, conv_out_channels=256, fc_out_channels=1024, norm_cfg=norm_cfg, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared4Conv1FCBBoxHead', + type="Shared4Conv1FCBBoxHead", in_channels=256, conv_out_channels=256, fc_out_channels=1024, norm_cfg=norm_cfg, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.05, 0.05, 0.1, 0.1]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared4Conv1FCBBoxHead', + type="Shared4Conv1FCBBoxHead", in_channels=256, conv_out_channels=256, fc_out_channels=1024, norm_cfg=norm_cfg, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.033, 0.033, 0.067, 0.067]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=True, - loss_cls=dict( - 
type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), ], - mask_head=dict(norm_cfg=norm_cfg))) + mask_head=dict(norm_cfg=norm_cfg), + ), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/resnest/cascade-rcnn_s101_fpn_syncbn-backbone+head_ms-range-1x_coco.py b/mmpose/configs/mmdet/resnest/cascade-rcnn_s101_fpn_syncbn-backbone+head_ms-range-1x_coco.py index 9dbf3fae5ffb9382b053852c35e263f109668020..0b62c7940dc72b17915e33bf75a9db329bef673e 100644 --- a/mmpose/configs/mmdet/resnest/cascade-rcnn_s101_fpn_syncbn-backbone+head_ms-range-1x_coco.py +++ b/mmpose/configs/mmdet/resnest/cascade-rcnn_s101_fpn_syncbn-backbone+head_ms-range-1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './cascade-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py' -model = dict( - backbone=dict( - stem_channels=128, - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='open-mmlab://resnest101'))) +_base_ = "./cascade-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py" +model = dict(backbone=dict(stem_channels=128, depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnest101"))) diff --git a/mmpose/configs/mmdet/resnest/cascade-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py b/mmpose/configs/mmdet/resnest/cascade-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py index 7ce7b56320a6511376237710c25061edd44b17dd..3b34c9f73fd76ff6732355877a4996f16637ac36 100644 --- a/mmpose/configs/mmdet/resnest/cascade-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py +++ b/mmpose/configs/mmdet/resnest/cascade-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py @@ -1,13 +1,10 @@ -_base_ = '../cascade_rcnn/cascade-rcnn_r50_fpn_1x_coco.py' -norm_cfg = dict(type='SyncBN', requires_grad=True) +_base_ = "../cascade_rcnn/cascade-rcnn_r50_fpn_1x_coco.py" +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( # use ResNeSt img_norm - data_preprocessor=dict( - mean=[123.68, 116.779, 103.939], - std=[58.393, 57.12, 57.375], - bgr_to_rgb=True), + data_preprocessor=dict(mean=[123.68, 116.779, 103.939], std=[58.393, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeSt', + type="ResNeSt", stem_channels=64, depth=50, radix=2, @@ -18,76 +15,60 @@ model = dict( frozen_stages=1, norm_cfg=norm_cfg, norm_eval=False, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://resnest50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnest50"), + ), roi_head=dict( bbox_head=[ dict( - type='Shared4Conv1FCBBoxHead', + 
type="Shared4Conv1FCBBoxHead", in_channels=256, conv_out_channels=256, fc_out_channels=1024, norm_cfg=norm_cfg, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared4Conv1FCBBoxHead', + type="Shared4Conv1FCBBoxHead", in_channels=256, conv_out_channels=256, fc_out_channels=1024, norm_cfg=norm_cfg, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.05, 0.05, 0.1, 0.1]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared4Conv1FCBBoxHead', + type="Shared4Conv1FCBBoxHead", in_channels=256, conv_out_channels=256, fc_out_channels=1024, norm_cfg=norm_cfg, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.033, 0.033, 0.067, 0.067]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) - ], )) + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), + ], + ), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', scale=[(1333, 640), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=[(1333, 640), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/resnest/faster-rcnn_s101_fpn_syncbn-backbone+head_ms-range-1x_coco.py b/mmpose/configs/mmdet/resnest/faster-rcnn_s101_fpn_syncbn-backbone+head_ms-range-1x_coco.py index f1e16321adff643d593268f868c09f5a318e7e93..070fee0a60e4f905579d3ae4509a1d3d001d621b 100644 --- a/mmpose/configs/mmdet/resnest/faster-rcnn_s101_fpn_syncbn-backbone+head_ms-range-1x_coco.py +++ b/mmpose/configs/mmdet/resnest/faster-rcnn_s101_fpn_syncbn-backbone+head_ms-range-1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './faster-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py' -model = dict( - backbone=dict( - stem_channels=128, - depth=101, - init_cfg=dict(type='Pretrained', - 
checkpoint='open-mmlab://resnest101'))) +_base_ = "./faster-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py" +model = dict(backbone=dict(stem_channels=128, depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnest101"))) diff --git a/mmpose/configs/mmdet/resnest/faster-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py b/mmpose/configs/mmdet/resnest/faster-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py index 8f0ec6e07af1fcd250171cb769252eeb03f92da8..4a4fcd6b997f348fa199953d92902798ffbb9ef2 100644 --- a/mmpose/configs/mmdet/resnest/faster-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py +++ b/mmpose/configs/mmdet/resnest/faster-rcnn_s50_fpn_syncbn-backbone+head_ms-range-1x_coco.py @@ -1,13 +1,10 @@ -_base_ = '../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py' -norm_cfg = dict(type='SyncBN', requires_grad=True) +_base_ = "../faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py" +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( # use ResNeSt img_norm - data_preprocessor=dict( - mean=[123.68, 116.779, 103.939], - std=[58.393, 57.12, 57.375], - bgr_to_rgb=True), + data_preprocessor=dict(mean=[123.68, 116.779, 103.939], std=[58.393, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeSt', + type="ResNeSt", stem_channels=64, depth=50, radix=2, @@ -18,22 +15,18 @@ model = dict( frozen_stages=1, norm_cfg=norm_cfg, norm_eval=False, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://resnest50')), - roi_head=dict( - bbox_head=dict( - type='Shared4Conv1FCBBoxHead', - conv_out_channels=256, - norm_cfg=norm_cfg))) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnest50"), + ), + roi_head=dict(bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, norm_cfg=norm_cfg)), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', scale=[(1333, 640), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=[(1333, 640), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/resnest/mask-rcnn_s101_fpn_syncbn-backbone+head_ms-1x_coco.py b/mmpose/configs/mmdet/resnest/mask-rcnn_s101_fpn_syncbn-backbone+head_ms-1x_coco.py index 3edf49f052f1f3c875cca2c061276cc1aca77604..635a05e8453b03a5c5e7ab85b170859819cf34e5 100644 --- a/mmpose/configs/mmdet/resnest/mask-rcnn_s101_fpn_syncbn-backbone+head_ms-1x_coco.py +++ b/mmpose/configs/mmdet/resnest/mask-rcnn_s101_fpn_syncbn-backbone+head_ms-1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py' -model = dict( - backbone=dict( - stem_channels=128, - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='open-mmlab://resnest101'))) +_base_ = "./mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py" +model = dict(backbone=dict(stem_channels=128, depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnest101"))) diff --git a/mmpose/configs/mmdet/resnest/mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py b/mmpose/configs/mmdet/resnest/mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py index 
c6f27000862d74e23a665f3bf8caae0ec4a3d6f5..0fb0d0cb22db436b03d00daea54d60fd2e6bbfc0 100644 --- a/mmpose/configs/mmdet/resnest/mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py +++ b/mmpose/configs/mmdet/resnest/mask-rcnn_s50_fpn_syncbn-backbone+head_ms-1x_coco.py @@ -1,13 +1,10 @@ -_base_ = '../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py' -norm_cfg = dict(type='SyncBN', requires_grad=True) +_base_ = "../mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py" +norm_cfg = dict(type="SyncBN", requires_grad=True) model = dict( # use ResNeSt img_norm - data_preprocessor=dict( - mean=[123.68, 116.779, 103.939], - std=[58.393, 57.12, 57.375], - bgr_to_rgb=True), + data_preprocessor=dict(mean=[123.68, 116.779, 103.939], std=[58.393, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNeSt', + type="ResNeSt", stem_channels=64, depth=50, radix=2, @@ -18,29 +15,20 @@ model = dict( frozen_stages=1, norm_cfg=norm_cfg, norm_eval=False, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://resnest50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnest50"), + ), roi_head=dict( - bbox_head=dict( - type='Shared4Conv1FCBBoxHead', - conv_out_channels=256, - norm_cfg=norm_cfg), - mask_head=dict(norm_cfg=norm_cfg))) + bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, norm_cfg=norm_cfg), mask_head=dict(norm_cfg=norm_cfg) + ), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/resnet_strikes_back/cascade-mask-rcnn_r50-rsb-pre_fpn_1x_coco.py b/mmpose/configs/mmdet/resnet_strikes_back/cascade-mask-rcnn_r50-rsb-pre_fpn_1x_coco.py index de7b95b0863d1ea89382fd9fa5852eccf0f34150..6d61f7cddae72261f5d8bb383062e1768b1720bf 100644 --- a/mmpose/configs/mmdet/resnet_strikes_back/cascade-mask-rcnn_r50-rsb-pre_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/resnet_strikes_back/cascade-mask-rcnn_r50-rsb-pre_fpn_1x_coco.py @@ -1,15 +1,14 @@ _base_ = [ - '../_base_/models/cascade-mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth' # noqa -model = dict( - backbone=dict( - init_cfg=dict( - type='Pretrained', prefix='backbone.', checkpoint=checkpoint))) +checkpoint = "https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth" # noqa +model = dict(backbone=dict(init_cfg=dict(type="Pretrained", prefix="backbone.", checkpoint=checkpoint))) optim_wrapper = 
dict( - optimizer=dict(_delete_=True, type='AdamW', lr=0.0002, weight_decay=0.05), - paramwise_cfg=dict(norm_decay_mult=0., bypass_duplicate=True)) + optimizer=dict(_delete_=True, type="AdamW", lr=0.0002, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0.0, bypass_duplicate=True), +) diff --git a/mmpose/configs/mmdet/resnet_strikes_back/faster-rcnn_r50-rsb-pre_fpn_1x_coco.py b/mmpose/configs/mmdet/resnet_strikes_back/faster-rcnn_r50-rsb-pre_fpn_1x_coco.py index 8c60f66a7ba8e5b6a7ee6af06e771b3c6ad71f6c..2b51c5b05903d49aff75344256a47f15d38a4b42 100644 --- a/mmpose/configs/mmdet/resnet_strikes_back/faster-rcnn_r50-rsb-pre_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/resnet_strikes_back/faster-rcnn_r50-rsb-pre_fpn_1x_coco.py @@ -1,15 +1,14 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth' # noqa -model = dict( - backbone=dict( - init_cfg=dict( - type='Pretrained', prefix='backbone.', checkpoint=checkpoint))) +checkpoint = "https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth" # noqa +model = dict(backbone=dict(init_cfg=dict(type="Pretrained", prefix="backbone.", checkpoint=checkpoint))) optim_wrapper = dict( - optimizer=dict(_delete_=True, type='AdamW', lr=0.0002, weight_decay=0.05), - paramwise_cfg=dict(norm_decay_mult=0., bypass_duplicate=True)) + optimizer=dict(_delete_=True, type="AdamW", lr=0.0002, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0.0, bypass_duplicate=True), +) diff --git a/mmpose/configs/mmdet/resnet_strikes_back/mask-rcnn_r50-rsb-pre_fpn_1x_coco.py b/mmpose/configs/mmdet/resnet_strikes_back/mask-rcnn_r50-rsb-pre_fpn_1x_coco.py index 85e25d392359b1a7811fb0c933ede5edacbfb9c3..e125b1152724c87feee3542e92ac9f80d856054c 100644 --- a/mmpose/configs/mmdet/resnet_strikes_back/mask-rcnn_r50-rsb-pre_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/resnet_strikes_back/mask-rcnn_r50-rsb-pre_fpn_1x_coco.py @@ -1,15 +1,14 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth' # noqa -model = dict( - backbone=dict( - init_cfg=dict( - type='Pretrained', prefix='backbone.', checkpoint=checkpoint))) +checkpoint = "https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth" # noqa +model = dict(backbone=dict(init_cfg=dict(type="Pretrained", prefix="backbone.", checkpoint=checkpoint))) optim_wrapper = dict( - optimizer=dict(_delete_=True, type='AdamW', lr=0.0002, weight_decay=0.05), - paramwise_cfg=dict(norm_decay_mult=0., bypass_duplicate=True)) + optimizer=dict(_delete_=True, type="AdamW", lr=0.0002, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0.0, bypass_duplicate=True), +) diff --git 
a/mmpose/configs/mmdet/resnet_strikes_back/retinanet_r50-rsb-pre_fpn_1x_coco.py b/mmpose/configs/mmdet/resnet_strikes_back/retinanet_r50-rsb-pre_fpn_1x_coco.py index 7ce7bfd87d6b41a36acc4ff207695e38ef89700c..acd61255deeb69dbad5ecad1e75fa69566bd95e9 100644 --- a/mmpose/configs/mmdet/resnet_strikes_back/retinanet_r50-rsb-pre_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/resnet_strikes_back/retinanet_r50-rsb-pre_fpn_1x_coco.py @@ -1,15 +1,14 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth' # noqa -model = dict( - backbone=dict( - init_cfg=dict( - type='Pretrained', prefix='backbone.', checkpoint=checkpoint))) +checkpoint = "https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb256-rsb-a1-600e_in1k_20211228-20e21305.pth" # noqa +model = dict(backbone=dict(init_cfg=dict(type="Pretrained", prefix="backbone.", checkpoint=checkpoint))) optim_wrapper = dict( - optimizer=dict(_delete_=True, type='AdamW', lr=0.0001, weight_decay=0.05), - paramwise_cfg=dict(norm_decay_mult=0., bypass_duplicate=True)) + optimizer=dict(_delete_=True, type="AdamW", lr=0.0001, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0.0, bypass_duplicate=True), +) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r101-caffe_fpn_1x_coco.py index 1f3a4487103eea868eafe8539517b38455025bbe..57f6e514b77281575669c847418a7f807a829857 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r101-caffe_fpn_1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './retinanet_r50-caffe_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./retinanet_r50-caffe_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r101-caffe_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r101-caffe_fpn_ms-3x_coco.py index cfe773459c2529079274b241f5f99ae66d8906ad..cb47957670af362c528a3da172d0faa2234288de 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r101-caffe_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r101-caffe_fpn_ms-3x_coco.py @@ -1,8 +1,3 @@ -_base_ = './retinanet_r50-caffe_fpn_ms-3x_coco.py' +_base_ = "./retinanet_r50-caffe_fpn_ms-3x_coco.py" # learning policy -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_1x_coco.py index a7f06002413dcdf2716975655a582a3eefaf007a..9ad6ca968b65aa53f2f182a750a3902827e7774b 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = 
'./retinanet_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./retinanet_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_2x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_2x_coco.py index 721112a221953bb86dc3259e3991d7f0f740b26c..8297720844d56f6824f8276301b63de6b2354170 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_2x_coco.py @@ -1,6 +1,2 @@ -_base_ = './retinanet_r50_fpn_2x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./retinanet_r50_fpn_2x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_8xb8-amp-lsj-200e_coco.py index be018eaac672a4c1c3a61eac9940c4d28ea4fb40..4ee60b57775a4783035cd69e9e181c536e5e4a31 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,3 @@ -_base_ = './retinanet_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./retinanet_r50_fpn_8xb8-amp-lsj-200e_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_ms-640-800-3x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_ms-640-800-3x_coco.py index 566397227f7861a268c4cc4e111279b95b620ab8..de7eb70b12d5cd9a33520d718fc71aa027855298 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_ms-640-800-3x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r101_fpn_ms-640-800-3x_coco.py @@ -1,9 +1,4 @@ -_base_ = ['../_base_/models/retinanet_r50_fpn.py', '../common/ms_3x_coco.py'] +_base_ = ["../_base_/models/retinanet_r50_fpn.py", "../common/ms_3x_coco.py"] # optimizer -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_1x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_1x_coco.py index 960211806756d38cf74eed998addcca3f8467a4d..935959e193cc0d2e70d46e0a92043e219f8dd35c 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_1x_coco.py @@ -1,17 +1,16 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # model model = 
dict( - backbone=dict( - depth=18, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[64, 128, 256, 512])) -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) + backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")), + neck=dict(in_channels=[64, 128, 256, 512]), +) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) # TODO: support auto scaling lr # NOTE: `auto_scale_lr` is for automatically scaling LR, diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_1xb8-1x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_1xb8-1x_coco.py index d2e88d68e3366671e402b1766d3b456593262a9b..60475e4b5ae9e606ca2cfa81a2ca788d6a96d28a 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_1xb8-1x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_1xb8-1x_coco.py @@ -1,7 +1,8 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # data @@ -9,14 +10,12 @@ train_dataloader = dict(batch_size=8) # model model = dict( - backbone=dict( - depth=18, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[64, 128, 256, 512])) + backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")), + neck=dict(in_channels=[64, 128, 256, 512]), +) # Note: If the learning rate is set to 0.0025, the mAP will be 32.4. -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.005, momentum=0.9, weight_decay=0.0001)) # TODO: support auto scaling lr # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_8xb8-amp-lsj-200e_coco.py index d6833f3f4711ec28a25ae8a51687fc4ac13ffb89..d9da846f582ee36caf60bb422a3690920ae3c8b6 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r18_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,7 +1,6 @@ -_base_ = './retinanet_r50_fpn_8xb8-amp-lsj-200e_coco.py' +_base_ = "./retinanet_r50_fpn_8xb8-amp-lsj-200e_coco.py" model = dict( - backbone=dict( - depth=18, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[64, 128, 256, 512])) + backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")), + neck=dict(in_channels=[64, 128, 256, 512]), +) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_1x_coco.py index 6ba1cdddc4707b40f549189f768457312635669d..4d7c428ba05bfd659ac8175675e5c92cc0e52b05 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_1x_coco.py @@ -1,16 +1,17 @@ -_base_ = './retinanet_r50_fpn_1x_coco.py' +_base_ = "./retinanet_r50_fpn_1x_coco.py" model = dict( data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", # use caffe img_norm mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( norm_cfg=dict(requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-1x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-1x_coco.py index 93687d8c27b73ae2a172b45a733345e5fc036f03..97e85a2870627fed662c61059581c6ffeba37204 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-1x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-1x_coco.py @@ -1,15 +1,11 @@ -_base_ = './retinanet_r50-caffe_fpn_1x_coco.py' +_base_ = "./retinanet_r50-caffe_fpn_1x_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-2x_coco.py index 6d1604fb9efd5deb11ffc04f6f9685739f82aea9..d99f3bd8a343e70e8d50273442be5e9e90f2afd6 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-2x_coco.py @@ -1,16 +1,9 @@ -_base_ = './retinanet_r50-caffe_fpn_ms-1x_coco.py' +_base_ = 
"./retinanet_r50-caffe_fpn_ms-1x_coco.py" # training schedule for 2x train_cfg = dict(max_epochs=24) # learning rate policy param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=24, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=24, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-3x_coco.py index 5a6d42a13c27d5fc0b8072e2c96ef5d15a0f248c..b78c6c9ddc3332f64d5ca6f6d9eb656c0fab84c8 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r50-caffe_fpn_ms-3x_coco.py @@ -1,17 +1,10 @@ -_base_ = './retinanet_r50-caffe_fpn_ms-1x_coco.py' +_base_ = "./retinanet_r50-caffe_fpn_ms-1x_coco.py" # training schedule for 2x train_cfg = dict(max_epochs=36) # learning rate policy param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=36, - by_epoch=True, - milestones=[28, 34], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=36, by_epoch=True, milestones=[28, 34], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_1x_coco.py index 00d2567b245dba2b2be815a92146ea1364e1e799..f3264da937514ea68f837fc02e35a8284d2f3714 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_1x_coco.py @@ -1,10 +1,10 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py', - './retinanet_tta.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", + "./retinanet_tta.py", ] # optimizer -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_2x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_2x_coco.py index 47511b78ed2edb43121de2fc27986f6bb81abcfa..a99115d6ce060e99dfe8550c8ba94a3cac02e441 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_2x_coco.py @@ -1,7 +1,8 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # training schedule for 2x @@ -9,17 +10,9 @@ train_cfg = dict(max_epochs=24) # learning rate policy param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=24, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + 
dict(type="MultiStepLR", begin=0, end=24, by_epoch=True, milestones=[16, 22], gamma=0.1), ] # optimizer -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_8xb8-amp-lsj-200e_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_8xb8-amp-lsj-200e_coco.py index 2f10db2f3c84d4b1970f13f54c563408487d04af..0fa39b888bfe7fda4d242d18e1dc7fa04af6777f 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_8xb8-amp-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_8xb8-amp-lsj-200e_coco.py @@ -1,19 +1,13 @@ -_base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../common/lsj-200e_coco-detection.py' -] +_base_ = ["../_base_/models/retinanet_r50_fpn.py", "../common/lsj-200e_coco-detection.py"] image_size = (1024, 1024) -batch_augments = [dict(type='BatchFixedSizePad', size=image_size)] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size)] model = dict(data_preprocessor=dict(batch_augments=batch_augments)) train_dataloader = dict(batch_size=8, num_workers=4) # Enable automatic-mixed-precision training with AmpOptimWrapper. -optim_wrapper = dict( - type='AmpOptimWrapper', - optimizer=dict( - type='SGD', lr=0.01 * 4, momentum=0.9, weight_decay=0.00004)) +optim_wrapper = dict(type="AmpOptimWrapper", optimizer=dict(type="SGD", lr=0.01 * 4, momentum=0.9, weight_decay=0.00004)) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_90k_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_90k_coco.py index 1e1b2fd950a0293220cc93ce3f3b377b4163f3aa..f1852030aac05a88bda3d530ac864bd3d6691c4a 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_90k_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_90k_coco.py @@ -1,24 +1,13 @@ -_base_ = 'retinanet_r50_fpn_1x_coco.py' +_base_ = "retinanet_r50_fpn_1x_coco.py" # training schedule for 90k -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=90000, - val_interval=10000) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=90000, val_interval=10000) # learning rate policy param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=90000, - by_epoch=False, - milestones=[60000, 80000], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=90000, by_epoch=False, milestones=[60000, 80000], gamma=0.1), ] -train_dataloader = dict(sampler=dict(type='InfiniteSampler')) +train_dataloader = dict(sampler=dict(type="InfiniteSampler")) default_hooks = dict(checkpoint=dict(by_epoch=False, interval=10000)) log_processor = dict(by_epoch=False) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_amp-1x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_amp-1x_coco.py index acf5266337b8e73957a1cdf2b06076c1733b4d56..17bc9bc2357cf10cd20dc13cfdaaf317e80889ea 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_amp-1x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_amp-1x_coco.py @@ -1,6 +1,6 @@ -_base_ = './retinanet_r50_fpn_1x_coco.py' +_base_ = "./retinanet_r50_fpn_1x_coco.py" # MMEngine support the following two ways, users can choose # according to convenience # 
optim_wrapper = dict(type='AmpOptimWrapper') -_base_.optim_wrapper.type = 'AmpOptimWrapper' +_base_.optim_wrapper.type = "AmpOptimWrapper" diff --git a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_ms-640-800-3x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_ms-640-800-3x_coco.py index d91cf8ce0df15968706631d7eac76e834cba93dc..dc88e1d5b35a4177cf27c78c8b7417b074b0d174 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_ms-640-800-3x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_r50_fpn_ms-640-800-3x_coco.py @@ -1,4 +1,3 @@ -_base_ = ['../_base_/models/retinanet_r50_fpn.py', '../common/ms_3x_coco.py'] +_base_ = ["../_base_/models/retinanet_r50_fpn.py", "../common/ms_3x_coco.py"] # optimizer -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_tta.py b/mmpose/configs/mmdet/retinanet/retinanet_tta.py index d0f37e0ab25e2aff1ad55e76a7ee02777293d507..e140d0210a9666a0a845c09d14dc6932e358b4c3 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_tta.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_tta.py @@ -1,23 +1,20 @@ -tta_model = dict( - type='DetTTAModel', - tta_cfg=dict(nms=dict(type='nms', iou_threshold=0.5), max_per_img=100)) +tta_model = dict(type="DetTTAModel", tta_cfg=dict(nms=dict(type="nms", iou_threshold=0.5), max_per_img=100)) img_scales = [(1333, 800), (666, 400), (2000, 1200)] tta_pipeline = [ - dict(type='LoadImageFromFile', backend_args=None), + dict(type="LoadImageFromFile", backend_args=None), dict( - type='TestTimeAug', - transforms=[[ - dict(type='Resize', scale=s, keep_ratio=True) for s in img_scales - ], [ - dict(type='RandomFlip', prob=1.), - dict(type='RandomFlip', prob=0.) 
- ], [dict(type='LoadAnnotations', with_bbox=True)], - [ - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', - 'img_shape', 'scale_factor', 'flip', - 'flip_direction')) - ]]) + type="TestTimeAug", + transforms=[ + [dict(type="Resize", scale=s, keep_ratio=True) for s in img_scales], + [dict(type="RandomFlip", prob=1.0), dict(type="RandomFlip", prob=0.0)], + [dict(type="LoadAnnotations", with_bbox=True)], + [ + dict( + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction"), + ) + ], + ], + ), ] diff --git a/mmpose/configs/mmdet/retinanet/retinanet_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_x101-32x4d_fpn_1x_coco.py index 765a4c2cc0f69bf13891bf371c94c17b6cd5f30c..2cade1fa27fde99770988eccfca32dabcc03702d 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_x101-32x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_x101-32x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './retinanet_r50_fpn_1x_coco.py' +_base_ = "./retinanet_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_x101-32x4d_fpn_2x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_x101-32x4d_fpn_2x_coco.py index 14de96faf70180d7828a670630a8f48a3cd1081d..2fe34579ce39056ab3027877432fc2f851399e29 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_x101-32x4d_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_x101-32x4d_fpn_2x_coco.py @@ -1,14 +1,15 @@ -_base_ = './retinanet_r50_fpn_2x_coco.py' +_base_ = "./retinanet_r50_fpn_2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_1x_coco.py index 948cd18e4d995d18d947b345ba7229b5cad60eb1..28fbd5bbe37fdc814e456ed21bd46f5ca37fba63 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './retinanet_r50_fpn_1x_coco.py' +_base_ = "./retinanet_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_2x_coco.py 
b/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_2x_coco.py index ad04b6eea793add40c81d1d7096481597357d5bd..fd4945813cc8995f61e721330d9b4b112c0fc027 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_2x_coco.py @@ -1,14 +1,15 @@ -_base_ = './retinanet_r50_fpn_2x_coco.py' +_base_ = "./retinanet_r50_fpn_2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_ms-640-800-3x_coco.py b/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_ms-640-800-3x_coco.py index 853134160cd2128cac7954cca7e008444522fd2c..efeb72f1437c0ac125f086cf23a3187f896dacf1 100644 --- a/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_ms-640-800-3x_coco.py +++ b/mmpose/configs/mmdet/retinanet/retinanet_x101-64x4d_fpn_ms-640-800-3x_coco.py @@ -1,11 +1,8 @@ -_base_ = ['../_base_/models/retinanet_r50_fpn.py', '../common/ms_3x_coco.py'] +_base_ = ["../_base_/models/retinanet_r50_fpn.py", "../common/ms_3x_coco.py"] # optimizer model = dict( backbone=dict( - type='ResNeXt', - depth=101, - groups=64, - base_width=4, - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) -optim_wrapper = dict(optimizer=dict(type='SGD', lr=0.01)) + type="ResNeXt", depth=101, groups=64, base_width=4, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d") + ) +) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01)) diff --git a/mmpose/configs/mmdet/rpn/rpn_r101-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/rpn/rpn_r101-caffe_fpn_1x_coco.py index 22977af8cb761f9415c55f8fa6d458937a00ba06..ab193bcb8d09bc0057de95e53694cb5711358517 100644 --- a/mmpose/configs/mmdet/rpn/rpn_r101-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/rpn/rpn_r101-caffe_fpn_1x_coco.py @@ -1,7 +1,2 @@ -_base_ = './rpn_r50-caffe_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet101_caffe'))) +_base_ = "./rpn_r50-caffe_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet101_caffe"))) diff --git a/mmpose/configs/mmdet/rpn/rpn_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/rpn/rpn_r101_fpn_1x_coco.py index 962728ff08abb4652c617a085649575b6cfdcbf8..07f606da62bee17449be52fb359b5ac43d659bc3 100644 --- a/mmpose/configs/mmdet/rpn/rpn_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/rpn/rpn_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './rpn_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./rpn_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/rpn/rpn_r101_fpn_2x_coco.py b/mmpose/configs/mmdet/rpn/rpn_r101_fpn_2x_coco.py index ac7671c1c2421c0caa7b42d012cc3a2edc068934..c475676ed42c5bdb44e648440f1f19a407a0640a 100644 --- a/mmpose/configs/mmdet/rpn/rpn_r101_fpn_2x_coco.py +++ 
b/mmpose/configs/mmdet/rpn/rpn_r101_fpn_2x_coco.py @@ -1,6 +1,2 @@ -_base_ = './rpn_r50_fpn_2x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./rpn_r50_fpn_2x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/rpn/rpn_r50-caffe-c4_1x_coco.py b/mmpose/configs/mmdet/rpn/rpn_r50-caffe-c4_1x_coco.py index 76b878c874d6545e537ee8a9618e83bb095de281..2c2362c2ccd29765fd9da25af961a9e9572370d2 100644 --- a/mmpose/configs/mmdet/rpn/rpn_r50-caffe-c4_1x_coco.py +++ b/mmpose/configs/mmdet/rpn/rpn_r50-caffe-c4_1x_coco.py @@ -1,8 +1,9 @@ _base_ = [ - '../_base_/models/rpn_r50-caffe-c4.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/rpn_r50-caffe-c4.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -val_evaluator = dict(metric='proposal_fast') +val_evaluator = dict(metric="proposal_fast") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/rpn/rpn_r50-caffe_fpn_1x_coco.py b/mmpose/configs/mmdet/rpn/rpn_r50-caffe_fpn_1x_coco.py index 530f365210572f9bf55ca2775bfdbeba98567076..97df68d7c6ae9c8f25cdd1fa9c13659e1d5c9448 100644 --- a/mmpose/configs/mmdet/rpn/rpn_r50-caffe_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/rpn/rpn_r50-caffe_fpn_1x_coco.py @@ -1,16 +1,13 @@ -_base_ = './rpn_r50_fpn_1x_coco.py' +_base_ = "./rpn_r50_fpn_1x_coco.py" # use caffe img_norm model = dict( data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( norm_cfg=dict(requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe'))) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), +) diff --git a/mmpose/configs/mmdet/rpn/rpn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/rpn/rpn_r50_fpn_1x_coco.py index 7fe88d395b8a32e7513ede3c0c724e29b3554da6..7205788648cdabac98867efeede5d0733df99f61 100644 --- a/mmpose/configs/mmdet/rpn/rpn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/rpn/rpn_r50_fpn_1x_coco.py @@ -1,9 +1,11 @@ _base_ = [ - '../_base_/models/rpn_r50_fpn.py', '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/rpn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -val_evaluator = dict(metric='proposal_fast') +val_evaluator = dict(metric="proposal_fast") test_evaluator = val_evaluator # inference on val dataset and dump the proposals with evaluate metric diff --git a/mmpose/configs/mmdet/rpn/rpn_r50_fpn_2x_coco.py b/mmpose/configs/mmdet/rpn/rpn_r50_fpn_2x_coco.py index 0ebccbcfaf394fcbb4fbdaea51abdd583f628cac..62cb8b784dd2d4171412ef258e3c6416636259f0 100644 --- a/mmpose/configs/mmdet/rpn/rpn_r50_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/rpn/rpn_r50_fpn_2x_coco.py @@ -1,17 +1,9 @@ -_base_ = './rpn_r50_fpn_1x_coco.py' +_base_ = "./rpn_r50_fpn_1x_coco.py" # learning policy max_epochs = 24 -train_cfg = dict( - type='EpochBasedTrainLoop', 
max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/rpn/rpn_x101-32x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/rpn/rpn_x101-32x4d_fpn_1x_coco.py index d0c73948ac56afa34b9d6c8d22d6158271306b8c..f38c10bbaa0a96a1e04a3fd5c8889aa389d583c7 100644 --- a/mmpose/configs/mmdet/rpn/rpn_x101-32x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/rpn/rpn_x101-32x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './rpn_r50_fpn_1x_coco.py' +_base_ = "./rpn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/rpn/rpn_x101-32x4d_fpn_2x_coco.py b/mmpose/configs/mmdet/rpn/rpn_x101-32x4d_fpn_2x_coco.py index c6880b762abc8f5d3bf12f278054d76958756fb2..b785b72811ed4e8c3900bcd27060e1869ba93738 100644 --- a/mmpose/configs/mmdet/rpn/rpn_x101-32x4d_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/rpn/rpn_x101-32x4d_fpn_2x_coco.py @@ -1,14 +1,15 @@ -_base_ = './rpn_r50_fpn_2x_coco.py' +_base_ = "./rpn_r50_fpn_2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/rpn/rpn_x101-64x4d_fpn_1x_coco.py b/mmpose/configs/mmdet/rpn/rpn_x101-64x4d_fpn_1x_coco.py index 96e691a912c424f09add038c75631a2e1fefeffc..38dfdf3c5a51493d21100b245e042aa3b7817573 100644 --- a/mmpose/configs/mmdet/rpn/rpn_x101-64x4d_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/rpn/rpn_x101-64x4d_fpn_1x_coco.py @@ -1,14 +1,15 @@ -_base_ = './rpn_r50_fpn_1x_coco.py' +_base_ = "./rpn_r50_fpn_1x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/rpn/rpn_x101-64x4d_fpn_2x_coco.py b/mmpose/configs/mmdet/rpn/rpn_x101-64x4d_fpn_2x_coco.py index 4182a39667c47d774a1df9d34a1bc2fe60b45538..0249d8a5ecf638bf110a33c842fa392caaa85a16 100644 --- a/mmpose/configs/mmdet/rpn/rpn_x101-64x4d_fpn_2x_coco.py +++ 
b/mmpose/configs/mmdet/rpn/rpn_x101-64x4d_fpn_2x_coco.py @@ -1,14 +1,15 @@ -_base_ = './rpn_r50_fpn_2x_coco.py' +_base_ = "./rpn_r50_fpn_2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/rtmdet/classification/cspnext-l_8xb256-rsb-a1-600e_in1k.py b/mmpose/configs/mmdet/rtmdet/classification/cspnext-l_8xb256-rsb-a1-600e_in1k.py index d2e70539f05da69cca53f273d11e3296c87c4eda..fe674a98669b2538f1b1cb7d50b3458f8fb142a1 100644 --- a/mmpose/configs/mmdet/rtmdet/classification/cspnext-l_8xb256-rsb-a1-600e_in1k.py +++ b/mmpose/configs/mmdet/rtmdet/classification/cspnext-l_8xb256-rsb-a1-600e_in1k.py @@ -1,5 +1,3 @@ -_base_ = './cspnext-s_8xb256-rsb-a1-600e_in1k.py' +_base_ = "./cspnext-s_8xb256-rsb-a1-600e_in1k.py" -model = dict( - backbone=dict(deepen_factor=1, widen_factor=1), - head=dict(in_channels=1024)) +model = dict(backbone=dict(deepen_factor=1, widen_factor=1), head=dict(in_channels=1024)) diff --git a/mmpose/configs/mmdet/rtmdet/classification/cspnext-m_8xb256-rsb-a1-600e_in1k.py b/mmpose/configs/mmdet/rtmdet/classification/cspnext-m_8xb256-rsb-a1-600e_in1k.py index e1b1352dd91a803eeafe80f587203f96a247c27f..90e48be9b635a972e7b64e1abc45c305ee01ec0c 100644 --- a/mmpose/configs/mmdet/rtmdet/classification/cspnext-m_8xb256-rsb-a1-600e_in1k.py +++ b/mmpose/configs/mmdet/rtmdet/classification/cspnext-m_8xb256-rsb-a1-600e_in1k.py @@ -1,5 +1,3 @@ -_base_ = './cspnext-s_8xb256-rsb-a1-600e_in1k.py' +_base_ = "./cspnext-s_8xb256-rsb-a1-600e_in1k.py" -model = dict( - backbone=dict(deepen_factor=0.67, widen_factor=0.75), - head=dict(in_channels=768)) +model = dict(backbone=dict(deepen_factor=0.67, widen_factor=0.75), head=dict(in_channels=768)) diff --git a/mmpose/configs/mmdet/rtmdet/classification/cspnext-s_8xb256-rsb-a1-600e_in1k.py b/mmpose/configs/mmdet/rtmdet/classification/cspnext-s_8xb256-rsb-a1-600e_in1k.py index dcfd2ea47d54408ef6d2fe225b57c5c9e540918a..b4a071e4db21e58f5b77e945c9d033627d3ad3aa 100644 --- a/mmpose/configs/mmdet/rtmdet/classification/cspnext-s_8xb256-rsb-a1-600e_in1k.py +++ b/mmpose/configs/mmdet/rtmdet/classification/cspnext-s_8xb256-rsb-a1-600e_in1k.py @@ -1,64 +1,55 @@ _base_ = [ - 'mmpretrain::_base_/datasets/imagenet_bs256_rsb_a12.py', - 'mmpretrain::_base_/schedules/imagenet_bs2048_rsb.py', - 'mmpretrain::_base_/default_runtime.py' + "mmpretrain::_base_/datasets/imagenet_bs256_rsb_a12.py", + "mmpretrain::_base_/schedules/imagenet_bs2048_rsb.py", + "mmpretrain::_base_/default_runtime.py", ] model = dict( - type='ImageClassifier', + type="ImageClassifier", backbone=dict( - type='mmdet.CSPNeXt', - arch='P5', - out_indices=(4, ), + type="mmdet.CSPNeXt", + arch="P5", + out_indices=(4,), expand_ratio=0.5, deepen_factor=0.33, widen_factor=0.5, channel_attention=True, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='mmdet.SiLU')), - neck=dict(type='GlobalAveragePooling'), + norm_cfg=dict(type="BN"), + act_cfg=dict(type="mmdet.SiLU"), + ), + neck=dict(type="GlobalAveragePooling"), head=dict( - type='LinearClsHead', + type="LinearClsHead", num_classes=1000, in_channels=512, - loss=dict( - type='LabelSmoothLoss', - 
label_smooth_val=0.1, - mode='original', - loss_weight=1.0), - topk=(1, 5)), - train_cfg=dict(augments=[ - dict(type='Mixup', alpha=0.2), - dict(type='CutMix', alpha=1.0) - ])) + loss=dict(type="LabelSmoothLoss", label_smooth_val=0.1, mode="original", loss_weight=1.0), + topk=(1, 5), + ), + train_cfg=dict(augments=[dict(type="Mixup", alpha=0.2), dict(type="CutMix", alpha=1.0)]), +) # dataset settings -train_dataloader = dict(sampler=dict(type='RepeatAugSampler', shuffle=True)) +train_dataloader = dict(sampler=dict(type="RepeatAugSampler", shuffle=True)) # schedule settings optim_wrapper = dict( optimizer=dict(weight_decay=0.01), - paramwise_cfg=dict(bias_decay_mult=0., norm_decay_mult=0.), + paramwise_cfg=dict(bias_decay_mult=0.0, norm_decay_mult=0.0), ) param_scheduler = [ # warm up learning rate scheduler dict( - type='LinearLR', + type="LinearLR", start_factor=0.0001, by_epoch=True, begin=0, end=5, # update by iter - convert_to_iter_based=True), + convert_to_iter_based=True, + ), # main learning rate scheduler - dict( - type='CosineAnnealingLR', - T_max=595, - eta_min=1.0e-6, - by_epoch=True, - begin=5, - end=600) + dict(type="CosineAnnealingLR", T_max=595, eta_min=1.0e-6, by_epoch=True, begin=5, end=600), ] train_cfg = dict(by_epoch=True, max_epochs=600) diff --git a/mmpose/configs/mmdet/rtmdet/classification/cspnext-tiny_8xb256-rsb-a1-600e_in1k.py b/mmpose/configs/mmdet/rtmdet/classification/cspnext-tiny_8xb256-rsb-a1-600e_in1k.py index af3170bdc51778c4601d4426aa88cc27c608f100..5aa6d8e03747ec189a5e9b29b1e2df6282f7bf63 100644 --- a/mmpose/configs/mmdet/rtmdet/classification/cspnext-tiny_8xb256-rsb-a1-600e_in1k.py +++ b/mmpose/configs/mmdet/rtmdet/classification/cspnext-tiny_8xb256-rsb-a1-600e_in1k.py @@ -1,5 +1,3 @@ -_base_ = './cspnext-s_8xb256-rsb-a1-600e_in1k.py' +_base_ = "./cspnext-s_8xb256-rsb-a1-600e_in1k.py" -model = dict( - backbone=dict(deepen_factor=0.167, widen_factor=0.375), - head=dict(in_channels=384)) +model = dict(backbone=dict(deepen_factor=0.167, widen_factor=0.375), head=dict(in_channels=384)) diff --git a/mmpose/configs/mmdet/rtmdet/classification/cspnext-x_8xb256-rsb-a1-600e_in1k.py b/mmpose/configs/mmdet/rtmdet/classification/cspnext-x_8xb256-rsb-a1-600e_in1k.py index edec48d78dbefdb7783c5dd50e97873e29ea6497..ef6e9776dbb3ef21159638c71dcb908ea31b7cf3 100644 --- a/mmpose/configs/mmdet/rtmdet/classification/cspnext-x_8xb256-rsb-a1-600e_in1k.py +++ b/mmpose/configs/mmdet/rtmdet/classification/cspnext-x_8xb256-rsb-a1-600e_in1k.py @@ -1,5 +1,3 @@ -_base_ = './cspnext-s_8xb256-rsb-a1-600e_in1k.py' +_base_ = "./cspnext-s_8xb256-rsb-a1-600e_in1k.py" -model = dict( - backbone=dict(deepen_factor=1.33, widen_factor=1.25), - head=dict(in_channels=1280)) +model = dict(backbone=dict(deepen_factor=1.33, widen_factor=1.25), head=dict(in_channels=1280)) diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_8xb32-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_8xb32-300e_coco.py index a9c62c27b6da6a0cb9006bf99ab88171ce6aea4d..17790930eacb7c6fe7c0e67e0a08e29ef9673c0d 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_8xb32-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_8xb32-300e_coco.py @@ -1,104 +1,58 @@ -_base_ = './rtmdet_l_8xb32-300e_coco.py' +_base_ = "./rtmdet_l_8xb32-300e_coco.py" model = dict( bbox_head=dict( _delete_=True, - type='RTMDetInsSepBNHead', + type="RTMDetInsSepBNHead", num_classes=80, in_channels=256, stacked_convs=2, share_conv=True, pred_kernel_size=1, feat_channels=256, - act_cfg=dict(type='SiLU', inplace=True), - 
norm_cfg=dict(type='SyncBN', requires_grad=True), - anchor_generator=dict( - type='MlvlPointGenerator', offset=0, strides=[8, 16, 32]), - bbox_coder=dict(type='DistancePointBBoxCoder'), - loss_cls=dict( - type='QualityFocalLoss', - use_sigmoid=True, - beta=2.0, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), - loss_mask=dict( - type='DiceLoss', loss_weight=2.0, eps=5e-6, reduction='mean')), + act_cfg=dict(type="SiLU", inplace=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), + anchor_generator=dict(type="MlvlPointGenerator", offset=0, strides=[8, 16, 32]), + bbox_coder=dict(type="DistancePointBBoxCoder"), + loss_cls=dict(type="QualityFocalLoss", use_sigmoid=True, beta=2.0, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + loss_mask=dict(type="DiceLoss", loss_weight=2.0, eps=5e-6, reduction="mean"), + ), test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=40, - mask_thr_binary=0.5), + nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=40, mask_thr_binary=0.5 + ), ) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0), - dict( - type='RandomResize', - scale=(1280, 1280), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_size=(640, 640), - recompute_bbox=True, - allow_negative_crop=True), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict( - type='CachedMixUp', - img_scale=(640, 640), - ratio_range=(1.0, 1.0), - max_cached_images=20, - pad_val=(114, 114, 114)), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="CachedMosaic", img_scale=(640, 640), pad_val=114.0), + dict(type="RandomResize", scale=(1280, 1280), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640), recompute_bbox=True, allow_negative_crop=True), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="CachedMixUp", img_scale=(640, 640), ratio_range=(1.0, 1.0), max_cached_images=20, pad_val=(114, 114, 114)), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1)), + dict(type="PackDetInputs"), ] train_dataloader = dict(pin_memory=True, dataset=dict(pipeline=train_pipeline)) train_pipeline_stage2 = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='RandomResize', - scale=(640, 640), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_size=(640, 640), - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, 
with_mask=True, poly2mask=False), + dict(type="RandomResize", scale=(640, 640), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640), recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs"), ] custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=280, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=280, switch_pipeline=train_pipeline_stage2), ] -val_evaluator = dict(metric=['bbox', 'segm']) +val_evaluator = dict(metric=["bbox", "segm"]) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_8xb32-300e_cocoHuman.py b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_8xb32-300e_cocoHuman.py index 7e66b4558754305a47a0cd193c80eabb4dc30d65..bcf98ee094bbe1b09a7fd2a1bd695ee171146330 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_8xb32-300e_cocoHuman.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_8xb32-300e_cocoHuman.py @@ -1,4 +1,4 @@ -_base_ = './rtmdet_l_8xb32-300e_coco.py' +_base_ = "./rtmdet_l_8xb32-300e_coco.py" # _base_ = [ # '../_base_/default_runtime.py', '../_base_/schedules/schedule_1x.py', # '../_base_/datasets/coco_human_instance.py', './rtmdet_tta.py' @@ -6,147 +6,95 @@ _base_ = './rtmdet_l_8xb32-300e_coco.py' BATCH_SIZE = 16 -load_from = 'models/pretrained/rtmdet-ins_l_8xb32-300e_coco_20221124_103237-78d1d652.pth' +load_from = "models/pretrained/rtmdet-ins_l_8xb32-300e_coco_20221124_103237-78d1d652.pth" model = dict( - type='RTMDet', + type="RTMDet", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.53, 116.28, 123.675], - std=[57.375, 57.12, 58.395], - bgr_to_rgb=False, - batch_augments=None), + type="DetDataPreprocessor", mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], bgr_to_rgb=False, batch_augments=None + ), backbone=dict( - type='CSPNeXt', - arch='P5', + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=1, widen_factor=1, channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), neck=dict( - type='CSPNeXtPAFPN', + type="CSPNeXtPAFPN", in_channels=[256, 512, 1024], out_channels=256, num_csp_blocks=3, expand_ratio=0.5, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), bbox_head=dict( _delete_=True, - type='RTMDetInsSepBNHead', + type="RTMDetInsSepBNHead", num_classes=1, in_channels=256, stacked_convs=2, share_conv=True, pred_kernel_size=1, feat_channels=256, - act_cfg=dict(type='SiLU', inplace=True), - norm_cfg=dict(type='SyncBN', requires_grad=True), - anchor_generator=dict( - type='MlvlPointGenerator', offset=0, strides=[8, 16, 32]), - bbox_coder=dict(type='DistancePointBBoxCoder'), - loss_cls=dict( - type='QualityFocalLoss', - use_sigmoid=True, - beta=2.0, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), - loss_mask=dict( - type='DiceLoss', loss_weight=2.0, eps=5e-6, reduction='mean')), - train_cfg=dict( - 
assigner=dict(type='DynamicSoftLabelAssigner', topk=13), - allowed_border=-1, - pos_weight=-1, - debug=False), + act_cfg=dict(type="SiLU", inplace=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), + anchor_generator=dict(type="MlvlPointGenerator", offset=0, strides=[8, 16, 32]), + bbox_coder=dict(type="DistancePointBBoxCoder"), + loss_cls=dict(type="QualityFocalLoss", use_sigmoid=True, beta=2.0, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + loss_mask=dict(type="DiceLoss", loss_weight=2.0, eps=5e-6, reduction="mean"), + ), + train_cfg=dict(assigner=dict(type="DynamicSoftLabelAssigner", topk=13), allowed_border=-1, pos_weight=-1, debug=False), test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100, - mask_thr_binary=0.5), + nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100, mask_thr_binary=0.5 + ), ) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="CachedMosaic", img_scale=(640, 640), pad_val=114.0), # dict(type='RemoveRandomInstances'), - dict( - type='RandomResize', - scale=(1280, 1280), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_size=(640, 640), - recompute_bbox=True, - allow_negative_crop=True), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict( - type='CachedMixUp', - img_scale=(640, 640), - ratio_range=(1.0, 1.0), - max_cached_images=20, - pad_val=(114, 114, 114)), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)), - dict(type='PackDetInputs') + dict(type="RandomResize", scale=(1280, 1280), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640), recompute_bbox=True, allow_negative_crop=True), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="CachedMixUp", img_scale=(640, 640), ratio_range=(1.0, 1.0), max_cached_images=20, pad_val=(114, 114, 114)), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1)), + dict(type="PackDetInputs"), ] train_dataloader = dict(pin_memory=True, dataset=dict(pipeline=train_pipeline)) train_pipeline_stage2 = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='RandomResize', - scale=(640, 640), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_size=(640, 640), - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="RandomResize", scale=(640, 640), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", 
crop_size=(640, 640), recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(640, 640), keep_ratio=True), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(640, 640), keep_ratio=True), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=BATCH_SIZE, - num_workers=10, - batch_sampler=None, - pin_memory=True, - dataset=dict(pipeline=train_pipeline)) -val_dataloader = dict( - batch_size=BATCH_SIZE//2, num_workers=10, dataset=dict(pipeline=test_pipeline)) +train_dataloader = dict(batch_size=BATCH_SIZE, num_workers=10, batch_sampler=None, pin_memory=True, dataset=dict(pipeline=train_pipeline)) +val_dataloader = dict(batch_size=BATCH_SIZE // 2, num_workers=10, dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader max_epochs = 300 @@ -154,10 +102,7 @@ stage2_num_epochs = 20 base_lr = 0.004 * BATCH_SIZE / 32 interval = 10 -train_cfg = dict( - max_epochs=max_epochs, - val_interval=interval, - dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)]) +train_cfg = dict(max_epochs=max_epochs, val_interval=interval, dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)]) val_evaluator = dict(proposal_nums=(100, 1, 10)) test_evaluator = val_evaluator @@ -165,49 +110,33 @@ test_evaluator = val_evaluator # optimizer optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 150 to 300 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # hooks -default_hooks = dict( - checkpoint=dict( - interval=interval, - max_keep_ckpts=3 # only keep latest 3 checkpoints - )) +default_hooks = dict(checkpoint=dict(interval=interval, max_keep_ckpts=3)) # only keep latest 3 checkpoints custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=280, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + 
dict(type="PipelineSwitchHook", switch_epoch=280, switch_pipeline=train_pipeline_stage2), ] -val_evaluator = dict(metric=['bbox', 'segm']) +val_evaluator = dict(metric=["bbox", "segm"]) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_insmask_8xb32-300e_cocoHuman.py b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_insmask_8xb32-300e_cocoHuman.py index 8bfb7ceafbfb74c6d24f100d4abfb4fbe315cd2b..d2ae46cf2c85268d327371cb260d9353c3308735 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_insmask_8xb32-300e_cocoHuman.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_l_insmask_8xb32-300e_cocoHuman.py @@ -1,151 +1,106 @@ # _base_ = './rtmdet_l_8xb32-300e_coco.py' _base_ = [ - '../_base_/default_runtime.py', '../_base_/schedules/schedule_1x.py', - '../_base_/datasets/coco_human_instance.py', './rtmdet_tta.py' + "../_base_/default_runtime.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/datasets/coco_human_instance.py", + "./rtmdet_tta.py", ] BATCH_SIZE = 16 -load_from = 'models/pretrained/rtmdet-ins_l_8xb32-300e_coco_20221124_103237-78d1d652.pth' +load_from = "models/pretrained/rtmdet-ins_l_8xb32-300e_coco_20221124_103237-78d1d652.pth" model = dict( - type='RTMDet', + type="RTMDet", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.53, 116.28, 123.675], - std=[57.375, 57.12, 58.395], - bgr_to_rgb=False, - batch_augments=None), + type="DetDataPreprocessor", mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], bgr_to_rgb=False, batch_augments=None + ), backbone=dict( - type='CSPNeXt', - arch='P5', + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=1, widen_factor=1, channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), neck=dict( - type='CSPNeXtPAFPN', + type="CSPNeXtPAFPN", in_channels=[256, 512, 1024], out_channels=256, num_csp_blocks=3, expand_ratio=0.5, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), bbox_head=dict( - type='RTMDetInsSepBNHead', + type="RTMDetInsSepBNHead", num_classes=1, in_channels=256, stacked_convs=2, share_conv=True, pred_kernel_size=1, feat_channels=256, - act_cfg=dict(type='SiLU', inplace=True), - norm_cfg=dict(type='SyncBN', requires_grad=True), - anchor_generator=dict( - type='MlvlPointGenerator', offset=0, strides=[8, 16, 32]), - bbox_coder=dict(type='DistancePointBBoxCoder'), - loss_cls=dict( - type='QualityFocalLoss', - use_sigmoid=True, - beta=2.0, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), - loss_mask=dict( - type='DiceLoss', loss_weight=2.0, eps=5e-6, reduction='mean')), - train_cfg=dict( - assigner=dict(type='DynamicSoftLabelAssigner', topk=13), - allowed_border=-1, - pos_weight=-1, - debug=False), + act_cfg=dict(type="SiLU", inplace=True), + norm_cfg=dict(type="SyncBN", requires_grad=True), + anchor_generator=dict(type="MlvlPointGenerator", offset=0, strides=[8, 16, 32]), + bbox_coder=dict(type="DistancePointBBoxCoder"), + loss_cls=dict(type="QualityFocalLoss", use_sigmoid=True, beta=2.0, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + loss_mask=dict(type="DiceLoss", loss_weight=2.0, eps=5e-6, reduction="mean"), + ), + train_cfg=dict(assigner=dict(type="DynamicSoftLabelAssigner", topk=13), allowed_border=-1, pos_weight=-1, debug=False), test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - 
nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100, - mask_thr_binary=0.5), + nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100, mask_thr_binary=0.5 + ), ) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), # dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0), - dict(type='RemoveRandomInstances'), - dict( - type='RandomResize', - scale=(1280, 1280), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_size=(640, 640), - recompute_bbox=True, - allow_negative_crop=True), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="RemoveRandomInstances"), + dict(type="RandomResize", scale=(1280, 1280), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640), recompute_bbox=True, allow_negative_crop=True), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), # dict( # type='CachedMixUp', # img_scale=(640, 640), # ratio_range=(1.0, 1.0), # max_cached_images=20, # pad_val=(114, 114, 114)), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)), - dict(type='PackDetInputs') + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1)), + dict(type="PackDetInputs"), ] train_dataloader = dict(pin_memory=True, dataset=dict(pipeline=train_pipeline)) train_pipeline_stage2 = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='RandomResize', - scale=(640, 640), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_size=(640, 640), - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="RandomResize", scale=(640, 640), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640), recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(640, 640), keep_ratio=True), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(640, 640), keep_ratio=True), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + 
dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=BATCH_SIZE, - num_workers=10, - batch_sampler=None, - pin_memory=True, - dataset=dict(pipeline=train_pipeline)) -val_dataloader = dict( - batch_size=BATCH_SIZE//2, num_workers=10, dataset=dict(pipeline=test_pipeline)) +train_dataloader = dict(batch_size=BATCH_SIZE, num_workers=10, batch_sampler=None, pin_memory=True, dataset=dict(pipeline=train_pipeline)) +val_dataloader = dict(batch_size=BATCH_SIZE // 2, num_workers=10, dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader max_epochs = 300 @@ -153,10 +108,7 @@ stage2_num_epochs = 20 base_lr = 0.004 * BATCH_SIZE / 32 interval = 10 -train_cfg = dict( - max_epochs=max_epochs, - val_interval=interval, - dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)]) +train_cfg = dict(max_epochs=max_epochs, val_interval=interval, dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)]) val_evaluator = dict(proposal_nums=(100, 1, 10)) test_evaluator = val_evaluator @@ -164,49 +116,33 @@ test_evaluator = val_evaluator # optimizer optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 150 to 300 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # hooks -default_hooks = dict( - checkpoint=dict( - interval=interval, - max_keep_ckpts=3 # only keep latest 3 checkpoints - )) +default_hooks = dict(checkpoint=dict(interval=interval, max_keep_ckpts=3)) # only keep latest 3 checkpoints custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=280, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=280, switch_pipeline=train_pipeline_stage2), ] -val_evaluator = dict(metric=['bbox', 'segm']) +val_evaluator = dict(metric=["bbox", "segm"]) test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_m_8xb32-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_m_8xb32-300e_coco.py index 66da9148775b425c6b0052beb04f9c8ca17257d9..89396d9ede82feb2bf317b06138576e2af6f8052 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_m_8xb32-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_m_8xb32-300e_coco.py @@ -1,6 +1,7 @@ -_base_ = './rtmdet-ins_l_8xb32-300e_coco.py' +_base_ = "./rtmdet-ins_l_8xb32-300e_coco.py" model = dict( backbone=dict(deepen_factor=0.67, widen_factor=0.75), neck=dict(in_channels=[192, 384, 768], out_channels=192, num_csp_blocks=2), - bbox_head=dict(in_channels=192, feat_channels=192)) + 
bbox_head=dict(in_channels=192, feat_channels=192), +) diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py index 28bc21cc93bb36d2d2fc8601b06bb0f0c58d6d49..3bc7b8ef33bc348f3777853a46776e7aeb7ceda6 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py @@ -1,80 +1,40 @@ -_base_ = './rtmdet-ins_l_8xb32-300e_coco.py' -checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-s_imagenet_600e.pth' # noqa +_base_ = "./rtmdet-ins_l_8xb32-300e_coco.py" +checkpoint = "https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-s_imagenet_600e.pth" # noqa model = dict( - backbone=dict( - deepen_factor=0.33, - widen_factor=0.5, - init_cfg=dict( - type='Pretrained', prefix='backbone.', checkpoint=checkpoint)), + backbone=dict(deepen_factor=0.33, widen_factor=0.5, init_cfg=dict(type="Pretrained", prefix="backbone.", checkpoint=checkpoint)), neck=dict(in_channels=[128, 256, 512], out_channels=128, num_csp_blocks=1), - bbox_head=dict(in_channels=128, feat_channels=128)) + bbox_head=dict(in_channels=128, feat_channels=128), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0), - dict( - type='RandomResize', - scale=(1280, 1280), - ratio_range=(0.5, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_size=(640, 640), - recompute_bbox=True, - allow_negative_crop=True), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict( - type='CachedMixUp', - img_scale=(640, 640), - ratio_range=(1.0, 1.0), - max_cached_images=20, - pad_val=(114, 114, 114)), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="CachedMosaic", img_scale=(640, 640), pad_val=114.0), + dict(type="RandomResize", scale=(1280, 1280), ratio_range=(0.5, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640), recompute_bbox=True, allow_negative_crop=True), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="CachedMixUp", img_scale=(640, 640), ratio_range=(1.0, 1.0), max_cached_images=20, pad_val=(114, 114, 114)), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1)), + dict(type="PackDetInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='RandomResize', - scale=(640, 640), - ratio_range=(0.5, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_size=(640, 640), - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", 
with_bbox=True, with_mask=True, poly2mask=False), + dict(type="RandomResize", scale=(640, 640), ratio_range=(0.5, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640), recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=280, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=280, switch_pipeline=train_pipeline_stage2), ] diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_tiny_8xb32-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_tiny_8xb32-300e_coco.py index 954f911614e75eb9910effbf1bbc1d7b01120276..7f39351933b30fc111db567218d964bbd2363cbe 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_tiny_8xb32-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_tiny_8xb32-300e_coco.py @@ -1,48 +1,33 @@ -_base_ = './rtmdet-ins_s_8xb32-300e_coco.py' +_base_ = "./rtmdet-ins_s_8xb32-300e_coco.py" -checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e.pth' # noqa +checkpoint = "https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e.pth" # noqa model = dict( - backbone=dict( - deepen_factor=0.167, - widen_factor=0.375, - init_cfg=dict( - type='Pretrained', prefix='backbone.', checkpoint=checkpoint)), + backbone=dict(deepen_factor=0.167, widen_factor=0.375, init_cfg=dict(type="Pretrained", prefix="backbone.", checkpoint=checkpoint)), neck=dict(in_channels=[96, 192, 384], out_channels=96, num_csp_blocks=1), - bbox_head=dict(in_channels=96, feat_channels=96)) + bbox_head=dict(in_channels=96, feat_channels=96), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False), + dict(type="CachedMosaic", img_scale=(640, 640), pad_val=114.0, max_cached_images=20, random_pop=False), + dict(type="RandomResize", scale=(1280, 1280), ratio_range=(0.5, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), dict( - type='LoadAnnotations', - with_bbox=True, - with_mask=True, - poly2mask=False), - dict( - type='CachedMosaic', - img_scale=(640, 640), - pad_val=114.0, - max_cached_images=20, - random_pop=False), - dict( - type='RandomResize', - scale=(1280, 1280), - ratio_range=(0.5, 2.0), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(640, 640)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict( - type='CachedMixUp', + type="CachedMixUp", img_scale=(640, 640), ratio_range=(1.0, 1.0), max_cached_images=10, random_pop=False, pad_val=(114, 114, 114), - prob=0.5), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1)), - 
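The s/tiny hunks here differ from the `l` base mainly through `deepen_factor`/`widen_factor`; the neck `in_channels` are the base widths `[256, 512, 1024]` multiplied by `widen_factor`. A quick sanity check of that relationship (base widths read off the `l` config; this snippet is illustration only, not part of the patch):

```python
# RTMDet variant widths are the base (l) channel widths times widen_factor.
BASE_CHANNELS = (256, 512, 1024)  # neck in_channels of rtmdet(-ins)_l

VARIANTS = {
    "tiny": 0.375,  # -> [96, 192, 384]
    "s": 0.5,       # -> [128, 256, 512]
    "m": 0.75,      # -> [192, 384, 768]
    "x": 1.25,      # -> [320, 640, 1280]
}

for name, widen in VARIANTS.items():
    channels = [int(c * widen) for c in BASE_CHANNELS]
    print(f"rtmdet-{name}: widen_factor={widen} -> in_channels={channels}")
```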
dict(type='PackDetInputs') + prob=0.5, + ), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1)), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_x_8xb16-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_x_8xb16-300e_coco.py index daaa640edac6b2114caf13b650d99d7c7632629a..65155f942e24d052544ae4f79b76fd5007b39acd 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet-ins_x_8xb16-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet-ins_x_8xb16-300e_coco.py @@ -1,10 +1,10 @@ -_base_ = './rtmdet-ins_l_8xb32-300e_coco.py' +_base_ = "./rtmdet-ins_l_8xb32-300e_coco.py" model = dict( backbone=dict(deepen_factor=1.33, widen_factor=1.25), - neck=dict( - in_channels=[320, 640, 1280], out_channels=320, num_csp_blocks=4), - bbox_head=dict(in_channels=320, feat_channels=320)) + neck=dict(in_channels=[320, 640, 1280], out_channels=320, num_csp_blocks=4), + bbox_head=dict(in_channels=320, feat_channels=320), +) base_lr = 0.002 @@ -13,19 +13,15 @@ optim_wrapper = dict(optimizer=dict(lr=base_lr)) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 150 to 300 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=_base_.max_epochs // 2, end=_base_.max_epochs, T_max=_base_.max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet_l_8xb32-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet_l_8xb32-300e_coco.py index 1cce4d89c84a81d7aa22197cd6dd70fe08637a35..a835b04b5c6172938bf0f3597970d6c663c55edb 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet_l_8xb32-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet_l_8xb32-300e_coco.py @@ -1,122 +1,83 @@ -_base_ = [ - '../_base_/default_runtime.py', '../_base_/schedules/schedule_1x.py', - '../_base_/datasets/coco_detection.py', './rtmdet_tta.py' -] +_base_ = ["../_base_/default_runtime.py", "../_base_/schedules/schedule_1x.py", "../_base_/datasets/coco_detection.py", "./rtmdet_tta.py"] model = dict( - type='RTMDet', + type="RTMDet", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.53, 116.28, 123.675], - std=[57.375, 57.12, 58.395], - bgr_to_rgb=False, - batch_augments=None), + type="DetDataPreprocessor", mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], bgr_to_rgb=False, batch_augments=None + ), backbone=dict( - type='CSPNeXt', - arch='P5', + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=1, widen_factor=1, channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), neck=dict( - type='CSPNeXtPAFPN', + type="CSPNeXtPAFPN", in_channels=[256, 512, 1024], out_channels=256, num_csp_blocks=3, expand_ratio=0.5, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), bbox_head=dict( - type='RTMDetSepBNHead', + type="RTMDetSepBNHead", num_classes=80, in_channels=256, stacked_convs=2, feat_channels=256, - anchor_generator=dict( - type='MlvlPointGenerator', offset=0, strides=[8, 16, 32]), - bbox_coder=dict(type='DistancePointBBoxCoder'), - loss_cls=dict( - type='QualityFocalLoss', - use_sigmoid=True, - beta=2.0, - 
loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0), + anchor_generator=dict(type="MlvlPointGenerator", offset=0, strides=[8, 16, 32]), + bbox_coder=dict(type="DistancePointBBoxCoder"), + loss_cls=dict(type="QualityFocalLoss", use_sigmoid=True, beta=2.0, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), with_objectness=False, exp_on_reg=True, share_conv=True, pred_kernel_size=1, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), - train_cfg=dict( - assigner=dict(type='DynamicSoftLabelAssigner', topk=13), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=30000, - min_bbox_size=0, - score_thr=0.001, - nms=dict(type='nms', iou_threshold=0.65), - max_per_img=300), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), + train_cfg=dict(assigner=dict(type="DynamicSoftLabelAssigner", topk=13), allowed_border=-1, pos_weight=-1, debug=False), + test_cfg=dict(nms_pre=30000, min_bbox_size=0, score_thr=0.001, nms=dict(type="nms", iou_threshold=0.65), max_per_img=300), ) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0), - dict( - type='RandomResize', - scale=(1280, 1280), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(640, 640)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict( - type='CachedMixUp', - img_scale=(640, 640), - ratio_range=(1.0, 1.0), - max_cached_images=20, - pad_val=(114, 114, 114)), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="CachedMosaic", img_scale=(640, 640), pad_val=114.0), + dict(type="RandomResize", scale=(1280, 1280), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="CachedMixUp", img_scale=(640, 640), ratio_range=(1.0, 1.0), max_cached_images=20, pad_val=(114, 114, 114)), + dict(type="PackDetInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=(640, 640), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(640, 640)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=(640, 640), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(640, 640), keep_ratio=True), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict(type='LoadAnnotations', with_bbox=True), - dict( - 
type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(640, 640), keep_ratio=True), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=32, - num_workers=10, - batch_sampler=None, - pin_memory=True, - dataset=dict(pipeline=train_pipeline)) -val_dataloader = dict( - batch_size=5, num_workers=10, dataset=dict(pipeline=test_pipeline)) +train_dataloader = dict(batch_size=32, num_workers=10, batch_sampler=None, pin_memory=True, dataset=dict(pipeline=train_pipeline)) +val_dataloader = dict(batch_size=5, num_workers=10, dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader max_epochs = 300 @@ -124,10 +85,7 @@ stage2_num_epochs = 20 base_lr = 0.004 interval = 10 -train_cfg = dict( - max_epochs=max_epochs, - val_interval=interval, - dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)]) +train_cfg = dict(max_epochs=max_epochs, val_interval=interval, dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)]) val_evaluator = dict(proposal_nums=(100, 1, 10)) test_evaluator = val_evaluator @@ -135,45 +93,29 @@ test_evaluator = val_evaluator # optimizer optim_wrapper = dict( _delete_=True, - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 150 to 300 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # hooks -default_hooks = dict( - checkpoint=dict( - interval=interval, - max_keep_ckpts=3 # only keep latest 3 checkpoints - )) +default_hooks = dict(checkpoint=dict(interval=interval, max_keep_ckpts=3)) # only keep latest 3 checkpoints custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet_l_convnext_b_4xb32-100e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet_l_convnext_b_4xb32-100e_coco.py index 85af292bcaba2e1853ed4f3a3f5818c0c0d5813e..79daa6eaa3c003b081cf25365690e2d2327e4b24 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet_l_convnext_b_4xb32-100e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet_l_convnext_b_4xb32-100e_coco.py @@ -1,81 +1,65 @@ -_base_ = './rtmdet_l_8xb32-300e_coco.py' +_base_ = "./rtmdet_l_8xb32-300e_coco.py" -custom_imports = dict( - 
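Several hunks below (the ConvNeXt and Swin backbones, the rebuilt `optim_wrapper`) rely on MMEngine's `_delete_=True` key: without it a child config is merged field-by-field into the inherited dict, with it the inherited dict is discarded first. A simplified sketch of the merge semantics (the real logic lives in `mmengine.config`; this is not the actual implementation):

```python
# Simplified illustration of MMEngine config merging with _delete_.
def merge(base: dict, child: dict) -> dict:
    child = dict(child)  # do not mutate the caller's dict
    if child.pop("_delete_", False):
        return child  # drop the inherited dict entirely
    out = dict(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)  # recursive field-wise merge
        else:
            out[key] = value
    return out

base_backbone = {"type": "CSPNeXt", "arch": "P5", "deepen_factor": 1}
child_backbone = {"_delete_": True, "type": "mmpretrain.ConvNeXt", "arch": "base"}
print(merge(base_backbone, child_backbone))
# -> {'type': 'mmpretrain.ConvNeXt', 'arch': 'base'}; no CSPNeXt keys survive.
```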
imports=['mmpretrain.models'], allow_failed_imports=False) +custom_imports = dict(imports=["mmpretrain.models"], allow_failed_imports=False) -norm_cfg = dict(type='GN', num_groups=32) -checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/convnext/convnext-base_in21k-pre-3rdparty_in1k-384px_20221219-4570f792.pth' # noqa +norm_cfg = dict(type="GN", num_groups=32) +checkpoint_file = ( + "https://download.openmmlab.com/mmclassification/v0/convnext/convnext-base_in21k-pre-3rdparty_in1k-384px_20221219-4570f792.pth" # noqa +) model = dict( - type='RTMDet', + type="RTMDet", data_preprocessor=dict( _delete_=True, - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, - batch_augments=None), + batch_augments=None, + ), backbone=dict( _delete_=True, - type='mmpretrain.ConvNeXt', - arch='base', + type="mmpretrain.ConvNeXt", + arch="base", out_indices=[1, 2, 3], drop_path_rate=0.7, layer_scale_init_value=1.0, gap_before_final_norm=False, with_cp=True, - init_cfg=dict( - type='Pretrained', checkpoint=checkpoint_file, - prefix='backbone.')), + init_cfg=dict(type="Pretrained", checkpoint=checkpoint_file, prefix="backbone."), + ), neck=dict(in_channels=[256, 512, 1024], norm_cfg=norm_cfg), - bbox_head=dict(norm_cfg=norm_cfg)) + bbox_head=dict(norm_cfg=norm_cfg), +) max_epochs = 100 stage2_num_epochs = 10 interval = 10 base_lr = 0.001 -train_cfg = dict( - max_epochs=max_epochs, - val_interval=interval, - dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)]) +train_cfg = dict(max_epochs=max_epochs, val_interval=interval, dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)]) optim_wrapper = dict( - constructor='LearningRateDecayOptimizerConstructor', - paramwise_cfg={ - 'decay_rate': 0.8, - 'decay_type': 'layer_wise', - 'num_layers': 12 - }, - optimizer=dict(lr=base_lr)) + constructor="LearningRateDecayOptimizerConstructor", + paramwise_cfg={"decay_rate": 0.8, "decay_type": "layer_wise", "num_layers": 12}, + optimizer=dict(lr=base_lr), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 50 to 100 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline={{_base_.train_pipeline_stage2}}) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline={{_base_.train_pipeline_stage2}}), ] diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet_l_swin_b_4xb32-100e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet_l_swin_b_4xb32-100e_coco.py index 84b0e0fa7d18848a4c1e305985e33e69e3196790..4017721d94606b93870031aef940261a0481833a 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet_l_swin_b_4xb32-100e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet_l_swin_b_4xb32-100e_coco.py @@ -1,19 +1,20 @@ -_base_ = './rtmdet_l_8xb32-300e_coco.py' +_base_ = "./rtmdet_l_8xb32-300e_coco.py" -norm_cfg = dict(type='GN', 
num_groups=32) -checkpoint = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth' # noqa +norm_cfg = dict(type="GN", num_groups=32) +checkpoint = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth" # noqa model = dict( - type='RTMDet', + type="RTMDet", data_preprocessor=dict( _delete_=True, - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, - batch_augments=None), + batch_augments=None, + ), backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", pretrain_img_size=384, embed_dims=128, depths=[2, 2, 18, 2], @@ -22,57 +23,44 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(1, 2, 3), with_cp=True, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=checkpoint)), + init_cfg=dict(type="Pretrained", checkpoint=checkpoint), + ), neck=dict(in_channels=[256, 512, 1024], norm_cfg=norm_cfg), - bbox_head=dict(norm_cfg=norm_cfg)) + bbox_head=dict(norm_cfg=norm_cfg), +) max_epochs = 100 stage2_num_epochs = 10 interval = 10 base_lr = 0.001 -train_cfg = dict( - max_epochs=max_epochs, - val_interval=interval, - dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)]) +train_cfg = dict(max_epochs=max_epochs, val_interval=interval, dynamic_intervals=[(max_epochs - stage2_num_epochs, 1)]) optim_wrapper = dict(optimizer=dict(lr=base_lr)) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 50 to 100 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline={{_base_.train_pipeline_stage2}}) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline={{_base_.train_pipeline_stage2}}), ] diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet_l_swin_b_p6_4xb16-100e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet_l_swin_b_p6_4xb16-100e_coco.py index 37d4215c3f014ef20c7817875cbc1689186e0766..ababe410a36823c6a014eb37857175290c96a09f 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet_l_swin_b_p6_4xb16-100e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet_l_swin_b_p6_4xb16-100e_coco.py @@ -1,66 +1,44 @@ -_base_ = './rtmdet_l_swin_b_4xb32-100e_coco.py' +_base_ = "./rtmdet_l_swin_b_4xb32-100e_coco.py" model = dict( - backbone=dict( - depths=[2, 2, 18, 2, 1], - num_heads=[4, 8, 16, 32, 64], - strides=(4, 2, 2, 2, 2), - out_indices=(1, 2, 3, 4)), + backbone=dict(depths=[2, 2, 18, 2, 1], num_heads=[4, 8, 16, 32, 64], strides=(4, 2, 2, 2, 2), out_indices=(1, 2, 3, 4)), neck=dict(in_channels=[256, 512, 1024, 2048]), - bbox_head=dict( - anchor_generator=dict( - type='MlvlPointGenerator', offset=0, strides=[8, 16, 32, 64]))) + 
bbox_head=dict(anchor_generator=dict(type="MlvlPointGenerator", offset=0, strides=[8, 16, 32, 64])), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='CachedMosaic', img_scale=(1280, 1280), pad_val=114.0), - dict( - type='RandomResize', - scale=(2560, 2560), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(1280, 1280)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), - dict( - type='CachedMixUp', - img_scale=(1280, 1280), - ratio_range=(1.0, 1.0), - max_cached_images=20, - pad_val=(114, 114, 114)), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="CachedMosaic", img_scale=(1280, 1280), pad_val=114.0), + dict(type="RandomResize", scale=(2560, 2560), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(1280, 1280)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), + dict(type="CachedMixUp", img_scale=(1280, 1280), ratio_range=(1.0, 1.0), max_cached_images=20, pad_val=(114, 114, 114)), + dict(type="PackDetInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=(1280, 1280), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(1280, 1280)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=(1280, 1280), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(1280, 1280)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(1280, 1280), keep_ratio=True), - dict(type='Pad', size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(1280, 1280), keep_ratio=True), + dict(type="Pad", size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=16, num_workers=20, dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=16, num_workers=20, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(num_workers=20, dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader @@ -68,47 +46,34 @@ max_epochs = 100 stage2_num_epochs = 10 custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - 
type='PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] img_scales = [(1280, 1280), (640, 640), (1920, 1920)] tta_pipeline = [ - dict(type='LoadImageFromFile', backend_args=None), + dict(type="LoadImageFromFile", backend_args=None), dict( - type='TestTimeAug', + type="TestTimeAug", transforms=[ - [ - dict(type='Resize', scale=s, keep_ratio=True) - for s in img_scales - ], + [dict(type="Resize", scale=s, keep_ratio=True) for s in img_scales], [ # ``RandomFlip`` must be placed before ``Pad``, otherwise # bounding box coordinates after flipping cannot be # recovered correctly. - dict(type='RandomFlip', prob=1.), - dict(type='RandomFlip', prob=0.) + dict(type="RandomFlip", prob=1.0), + dict(type="RandomFlip", prob=0.0), ], [ - dict( - type='Pad', - size=(1920, 1920), - pad_val=dict(img=(114, 114, 114))), + dict(type="Pad", size=(1920, 1920), pad_val=dict(img=(114, 114, 114))), ], - [dict(type='LoadAnnotations', with_bbox=True)], + [dict(type="LoadAnnotations", with_bbox=True)], [ dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction')) - ] - ]) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction"), + ) + ], + ], + ), ] diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet_m_8xb32-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet_m_8xb32-300e_coco.py index c83f5a60bd7d9f85f46574ee4cd19027391b5e1e..50fd8bf9b2f3a0b3f3ac4af05333bc6629d8e90e 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet_m_8xb32-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet_m_8xb32-300e_coco.py @@ -1,6 +1,7 @@ -_base_ = './rtmdet_l_8xb32-300e_coco.py' +_base_ = "./rtmdet_l_8xb32-300e_coco.py" model = dict( backbone=dict(deepen_factor=0.67, widen_factor=0.75), neck=dict(in_channels=[192, 384, 768], out_channels=192, num_csp_blocks=2), - bbox_head=dict(in_channels=192, feat_channels=192)) + bbox_head=dict(in_channels=192, feat_channels=192), +) diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet_s_8xb32-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet_s_8xb32-300e_coco.py index cbf76247b74e94735eea0dd70ce6ac9e57f4dadf..bdedf356cc3c4a633a3b086dfd87b79de62105f5 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet_s_8xb32-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet_s_8xb32-300e_coco.py @@ -1,62 +1,38 @@ -_base_ = './rtmdet_l_8xb32-300e_coco.py' -checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-s_imagenet_600e.pth' # noqa +_base_ = "./rtmdet_l_8xb32-300e_coco.py" +checkpoint = "https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-s_imagenet_600e.pth" # noqa model = dict( - backbone=dict( - deepen_factor=0.33, - widen_factor=0.5, - init_cfg=dict( - type='Pretrained', prefix='backbone.', checkpoint=checkpoint)), + backbone=dict(deepen_factor=0.33, widen_factor=0.5, init_cfg=dict(type="Pretrained", prefix="backbone.", checkpoint=checkpoint)), neck=dict(in_channels=[128, 256, 512], out_channels=128, num_csp_blocks=1), - bbox_head=dict(in_channels=128, feat_channels=128, exp_on_reg=False)) + bbox_head=dict(in_channels=128, feat_channels=128, exp_on_reg=False), +) train_pipeline = [ - dict(type='LoadImageFromFile', 
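The `TestTimeAug` block just above (the Swin-B P6 TTA pipeline) composes one transform from each inner list, so the test-time views are the Cartesian product of the resize scales and the two flip states, each then padded and packed. As I read the config, that yields six views per image:

```python
from itertools import product

# Slots from the P6 TTA pipeline above: three scales, flipped and unflipped.
img_scales = [(1280, 1280), (640, 640), (1920, 1920)]
flips = [True, False]  # RandomFlip prob=1.0 and prob=0.0

views = list(product(img_scales, flips))
print(f"{len(views)} augmented views per test image")  # 3 scales x 2 flips = 6
for scale, flipped in views:
    print(f"  resize to {scale}, flipped={flipped}")
```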
backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0), - dict( - type='RandomResize', - scale=(1280, 1280), - ratio_range=(0.5, 2.0), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(640, 640)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict( - type='CachedMixUp', - img_scale=(640, 640), - ratio_range=(1.0, 1.0), - max_cached_images=20, - pad_val=(114, 114, 114)), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="CachedMosaic", img_scale=(640, 640), pad_val=114.0), + dict(type="RandomResize", scale=(1280, 1280), ratio_range=(0.5, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="CachedMixUp", img_scale=(640, 640), ratio_range=(1.0, 1.0), max_cached_images=20, pad_val=(114, 114, 114)), + dict(type="PackDetInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=(640, 640), - ratio_range=(0.5, 2.0), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(640, 640)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=(640, 640), ratio_range=(0.5, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=280, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=280, switch_pipeline=train_pipeline_stage2), ] diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet_tiny_8xb32-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet_tiny_8xb32-300e_coco.py index a686f4a7f0c4c3bed956c2a3fa504ea8863c669d..391de512f5e181458ae9128ecdd8b4e9d1799dbf 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet_tiny_8xb32-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet_tiny_8xb32-300e_coco.py @@ -1,43 +1,32 @@ -_base_ = './rtmdet_s_8xb32-300e_coco.py' +_base_ = "./rtmdet_s_8xb32-300e_coco.py" -checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e.pth' # noqa +checkpoint = "https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e.pth" # noqa model = dict( - backbone=dict( - deepen_factor=0.167, - widen_factor=0.375, - init_cfg=dict( - type='Pretrained', prefix='backbone.', checkpoint=checkpoint)), + backbone=dict(deepen_factor=0.167, widen_factor=0.375, 
init_cfg=dict(type="Pretrained", prefix="backbone.", checkpoint=checkpoint)), neck=dict(in_channels=[96, 192, 384], out_channels=96, num_csp_blocks=1), - bbox_head=dict(in_channels=96, feat_channels=96, exp_on_reg=False)) + bbox_head=dict(in_channels=96, feat_channels=96, exp_on_reg=False), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="CachedMosaic", img_scale=(640, 640), pad_val=114.0, max_cached_images=20, random_pop=False), + dict(type="RandomResize", scale=(1280, 1280), ratio_range=(0.5, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(640, 640)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(640, 640), pad_val=dict(img=(114, 114, 114))), dict( - type='CachedMosaic', - img_scale=(640, 640), - pad_val=114.0, - max_cached_images=20, - random_pop=False), - dict( - type='RandomResize', - scale=(1280, 1280), - ratio_range=(0.5, 2.0), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(640, 640)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))), - dict( - type='CachedMixUp', + type="CachedMixUp", img_scale=(640, 640), ratio_range=(1.0, 1.0), max_cached_images=10, random_pop=False, pad_val=(114, 114, 114), - prob=0.5), - dict(type='PackDetInputs') + prob=0.5, + ), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet_tta.py b/mmpose/configs/mmdet/rtmdet/rtmdet_tta.py index 6dde36de3ff06576944a351de9daf53746103f21..60be3cfda14e07d44d58ba415953fa7c8febbf8e 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet_tta.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet_tta.py @@ -1,36 +1,29 @@ -tta_model = dict( - type='DetTTAModel', - tta_cfg=dict(nms=dict(type='nms', iou_threshold=0.6), max_per_img=100)) +tta_model = dict(type="DetTTAModel", tta_cfg=dict(nms=dict(type="nms", iou_threshold=0.6), max_per_img=100)) img_scales = [(640, 640), (320, 320), (960, 960)] tta_pipeline = [ - dict(type='LoadImageFromFile', backend_args=None), + dict(type="LoadImageFromFile", backend_args=None), dict( - type='TestTimeAug', + type="TestTimeAug", transforms=[ - [ - dict(type='Resize', scale=s, keep_ratio=True) - for s in img_scales - ], + [dict(type="Resize", scale=s, keep_ratio=True) for s in img_scales], [ # ``RandomFlip`` must be placed before ``Pad``, otherwise # bounding box coordinates after flipping cannot be # recovered correctly. - dict(type='RandomFlip', prob=1.), - dict(type='RandomFlip', prob=0.) 
+ dict(type="RandomFlip", prob=1.0), + dict(type="RandomFlip", prob=0.0), ], [ - dict( - type='Pad', - size=(960, 960), - pad_val=dict(img=(114, 114, 114))), + dict(type="Pad", size=(960, 960), pad_val=dict(img=(114, 114, 114))), ], - [dict(type='LoadAnnotations', with_bbox=True)], + [dict(type="LoadAnnotations", with_bbox=True)], [ dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction')) - ] - ]) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction"), + ) + ], + ], + ), ] diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet_x_8xb32-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet_x_8xb32-300e_coco.py index 16a33632c00b19b270b237f5dcd8f603350ac0c9..4bdd6aac12d3943966047db2e6ec904205140d5b 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet_x_8xb32-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet_x_8xb32-300e_coco.py @@ -1,7 +1,7 @@ -_base_ = './rtmdet_l_8xb32-300e_coco.py' +_base_ = "./rtmdet_l_8xb32-300e_coco.py" model = dict( backbone=dict(deepen_factor=1.33, widen_factor=1.25), - neck=dict( - in_channels=[320, 640, 1280], out_channels=320, num_csp_blocks=4), - bbox_head=dict(in_channels=320, feat_channels=320)) + neck=dict(in_channels=[320, 640, 1280], out_channels=320, num_csp_blocks=4), + bbox_head=dict(in_channels=320, feat_channels=320), +) diff --git a/mmpose/configs/mmdet/rtmdet/rtmdet_x_p6_4xb8-300e_coco.py b/mmpose/configs/mmdet/rtmdet/rtmdet_x_p6_4xb8-300e_coco.py index d1bb7fa6a78812e5a415acfb60eccedae9b884e2..ccb3b887a7d781d095a096d26a24adf6c78eed50 100644 --- a/mmpose/configs/mmdet/rtmdet/rtmdet_x_p6_4xb8-300e_coco.py +++ b/mmpose/configs/mmdet/rtmdet/rtmdet_x_p6_4xb8-300e_coco.py @@ -1,64 +1,45 @@ -_base_ = './rtmdet_x_8xb32-300e_coco.py' +_base_ = "./rtmdet_x_8xb32-300e_coco.py" model = dict( - backbone=dict(arch='P6', out_indices=(2, 3, 4, 5)), + backbone=dict(arch="P6", out_indices=(2, 3, 4, 5)), neck=dict(in_channels=[320, 640, 960, 1280]), - bbox_head=dict( - anchor_generator=dict( - type='MlvlPointGenerator', offset=0, strides=[8, 16, 32, 64]))) + bbox_head=dict(anchor_generator=dict(type="MlvlPointGenerator", offset=0, strides=[8, 16, 32, 64])), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='CachedMosaic', img_scale=(1280, 1280), pad_val=114.0), - dict( - type='RandomResize', - scale=(2560, 2560), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(1280, 1280)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), - dict( - type='CachedMixUp', - img_scale=(1280, 1280), - ratio_range=(1.0, 1.0), - max_cached_images=20, - pad_val=(114, 114, 114)), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="CachedMosaic", img_scale=(1280, 1280), pad_val=114.0), + dict(type="RandomResize", scale=(2560, 2560), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(1280, 1280)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), + dict(type="CachedMixUp", img_scale=(1280, 1280), ratio_range=(1.0, 1.0), max_cached_images=20, pad_val=(114, 114, 114)), + dict(type="PackDetInputs"), ] 
train_pipeline_stage2 = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=(1280, 1280), - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(1280, 1280)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Pad', size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=(1280, 1280), ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_size=(1280, 1280)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(1280, 1280), keep_ratio=True), - dict(type='Pad', size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(1280, 1280), keep_ratio=True), + dict(type="Pad", size=(1280, 1280), pad_val=dict(img=(114, 114, 114))), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=8, num_workers=20, dataset=dict(pipeline=train_pipeline)) -val_dataloader = dict( - batch_size=5, num_workers=20, dataset=dict(pipeline=test_pipeline)) +train_dataloader = dict(batch_size=8, num_workers=20, dataset=dict(pipeline=train_pipeline)) +val_dataloader = dict(batch_size=5, num_workers=20, dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader max_epochs = 300 @@ -68,65 +49,48 @@ base_lr = 0.004 * 32 / 256 optim_wrapper = dict(optimizer=dict(lr=base_lr)) param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 150 to 300 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] img_scales = [(1280, 1280), (640, 640), (1920, 1920)] tta_pipeline = [ - dict(type='LoadImageFromFile', backend_args=None), + dict(type="LoadImageFromFile", backend_args=None), dict( - type='TestTimeAug', + type="TestTimeAug", transforms=[ - [ - dict(type='Resize', scale=s, keep_ratio=True) - for s in img_scales - ], + [dict(type="Resize", scale=s, keep_ratio=True) for s in img_scales], [ # ``RandomFlip`` must be placed before 
``Pad``, otherwise # bounding box coordinates after flipping cannot be # recovered correctly. - dict(type='RandomFlip', prob=1.), - dict(type='RandomFlip', prob=0.) + dict(type="RandomFlip", prob=1.0), + dict(type="RandomFlip", prob=0.0), ], [ - dict( - type='Pad', - size=(1920, 1920), - pad_val=dict(img=(114, 114, 114))), + dict(type="Pad", size=(1920, 1920), pad_val=dict(img=(114, 114, 114))), ], - [dict(type='LoadAnnotations', with_bbox=True)], + [dict(type="LoadAnnotations", with_bbox=True)], [ dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor', 'flip', 'flip_direction')) - ] - ]) + type="PackDetInputs", + meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction"), + ) + ], + ], + ), ] diff --git a/mmpose/configs/mmdet/sabl/sabl-cascade-rcnn_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/sabl/sabl-cascade-rcnn_r101_fpn_1x_coco.py index 404e7fcb2ac52773c9bc74f411e66584114f378e..93abb24385b2ef10de93851a84738f1d42f6e07a 100644 --- a/mmpose/configs/mmdet/sabl/sabl-cascade-rcnn_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/sabl/sabl-cascade-rcnn_r101_fpn_1x_coco.py @@ -1,90 +1,83 @@ _base_ = [ - '../_base_/models/cascade-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # model settings model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101')), - roi_head=dict(bbox_head=[ - dict( - type='SABLHead', - num_classes=80, - cls_in_channels=256, - reg_in_channels=256, - roi_feat_size=7, - reg_feat_up_ratio=2, - reg_pre_kernel=3, - reg_post_kernel=3, - reg_pre_num=2, - reg_post_num=1, - cls_out_channels=1024, - reg_offset_out_channels=256, - reg_cls_out_channels=256, - num_cls_fcs=1, - num_reg_fcs=0, - reg_class_agnostic=True, - norm_cfg=None, - bbox_coder=dict( - type='BucketingBBoxCoder', num_buckets=14, scale_factor=1.7), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox_reg=dict(type='SmoothL1Loss', beta=0.1, - loss_weight=1.0)), - dict( - type='SABLHead', - num_classes=80, - cls_in_channels=256, - reg_in_channels=256, - roi_feat_size=7, - reg_feat_up_ratio=2, - reg_pre_kernel=3, - reg_post_kernel=3, - reg_pre_num=2, - reg_post_num=1, - cls_out_channels=1024, - reg_offset_out_channels=256, - reg_cls_out_channels=256, - num_cls_fcs=1, - num_reg_fcs=0, - reg_class_agnostic=True, - norm_cfg=None, - bbox_coder=dict( - type='BucketingBBoxCoder', num_buckets=14, scale_factor=1.5), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox_reg=dict(type='SmoothL1Loss', beta=0.1, - loss_weight=1.0)), - dict( - type='SABLHead', - num_classes=80, - cls_in_channels=256, - reg_in_channels=256, - roi_feat_size=7, - reg_feat_up_ratio=2, - reg_pre_kernel=3, - reg_post_kernel=3, - reg_pre_num=2, - reg_post_num=1, - cls_out_channels=1024, - reg_offset_out_channels=256, - reg_cls_out_channels=256, - num_cls_fcs=1, - num_reg_fcs=0, - reg_class_agnostic=True, - norm_cfg=None, - bbox_coder=dict( - type='BucketingBBoxCoder', num_buckets=14, scale_factor=1.3), - loss_cls=dict( - 
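The SABL cascade configs in the hunks around here repeat one `SABLHead` per cascade stage, identical except for the `BucketingBBoxCoder` `scale_factor`, which tightens from 1.7 to 1.5 to 1.3: later stages refine already-better proposals, so they search a smaller window. A rough sketch of the shrinking window (the 100 px proposal is hypothetical, and the bucket-width formula is a simplification, not the exact mmdet implementation):

```python
# BucketingBBoxCoder scales the proposal by scale_factor, then splits the
# result into buckets for boundary classification plus offset regression.
proposal_w = 100.0  # hypothetical proposal width in pixels
num_buckets = 14    # as in the SABL configs in this patch

for stage, scale_factor in enumerate([1.7, 1.5, 1.3], start=1):
    search_w = proposal_w * scale_factor
    bucket_w = search_w / num_buckets  # simplified bucket width
    print(f"stage {stage}: window {search_w:.0f}px, bucket ~{bucket_w:.1f}px")
```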
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox_reg=dict(type='SmoothL1Loss', beta=0.1, loss_weight=1.0)) - ])) + backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")), + roi_head=dict( + bbox_head=[ + dict( + type="SABLHead", + num_classes=80, + cls_in_channels=256, + reg_in_channels=256, + roi_feat_size=7, + reg_feat_up_ratio=2, + reg_pre_kernel=3, + reg_post_kernel=3, + reg_pre_num=2, + reg_post_num=1, + cls_out_channels=1024, + reg_offset_out_channels=256, + reg_cls_out_channels=256, + num_cls_fcs=1, + num_reg_fcs=0, + reg_class_agnostic=True, + norm_cfg=None, + bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=1.7), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox_reg=dict(type="SmoothL1Loss", beta=0.1, loss_weight=1.0), + ), + dict( + type="SABLHead", + num_classes=80, + cls_in_channels=256, + reg_in_channels=256, + roi_feat_size=7, + reg_feat_up_ratio=2, + reg_pre_kernel=3, + reg_post_kernel=3, + reg_pre_num=2, + reg_post_num=1, + cls_out_channels=1024, + reg_offset_out_channels=256, + reg_cls_out_channels=256, + num_cls_fcs=1, + num_reg_fcs=0, + reg_class_agnostic=True, + norm_cfg=None, + bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=1.5), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox_reg=dict(type="SmoothL1Loss", beta=0.1, loss_weight=1.0), + ), + dict( + type="SABLHead", + num_classes=80, + cls_in_channels=256, + reg_in_channels=256, + roi_feat_size=7, + reg_feat_up_ratio=2, + reg_pre_kernel=3, + reg_post_kernel=3, + reg_pre_num=2, + reg_post_num=1, + cls_out_channels=1024, + reg_offset_out_channels=256, + reg_cls_out_channels=256, + num_cls_fcs=1, + num_reg_fcs=0, + reg_class_agnostic=True, + norm_cfg=None, + bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=1.3), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox_reg=dict(type="SmoothL1Loss", beta=0.1, loss_weight=1.0), + ), + ] + ), +) diff --git a/mmpose/configs/mmdet/sabl/sabl-cascade-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/sabl/sabl-cascade-rcnn_r50_fpn_1x_coco.py index 69c59ca20d6c16e458292a55b8e4258a3d9a06bb..8790988d30fbf191fa5ce88857d5318b793a27a4 100644 --- a/mmpose/configs/mmdet/sabl/sabl-cascade-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/sabl/sabl-cascade-rcnn_r50_fpn_1x_coco.py @@ -1,86 +1,82 @@ _base_ = [ - '../_base_/models/cascade-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # model settings model = dict( - roi_head=dict(bbox_head=[ - dict( - type='SABLHead', - num_classes=80, - cls_in_channels=256, - reg_in_channels=256, - roi_feat_size=7, - reg_feat_up_ratio=2, - reg_pre_kernel=3, - reg_post_kernel=3, - reg_pre_num=2, - reg_post_num=1, - cls_out_channels=1024, - reg_offset_out_channels=256, - reg_cls_out_channels=256, - num_cls_fcs=1, - num_reg_fcs=0, - 
reg_class_agnostic=True, - norm_cfg=None, - bbox_coder=dict( - type='BucketingBBoxCoder', num_buckets=14, scale_factor=1.7), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox_reg=dict(type='SmoothL1Loss', beta=0.1, - loss_weight=1.0)), - dict( - type='SABLHead', - num_classes=80, - cls_in_channels=256, - reg_in_channels=256, - roi_feat_size=7, - reg_feat_up_ratio=2, - reg_pre_kernel=3, - reg_post_kernel=3, - reg_pre_num=2, - reg_post_num=1, - cls_out_channels=1024, - reg_offset_out_channels=256, - reg_cls_out_channels=256, - num_cls_fcs=1, - num_reg_fcs=0, - reg_class_agnostic=True, - norm_cfg=None, - bbox_coder=dict( - type='BucketingBBoxCoder', num_buckets=14, scale_factor=1.5), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox_reg=dict(type='SmoothL1Loss', beta=0.1, - loss_weight=1.0)), - dict( - type='SABLHead', - num_classes=80, - cls_in_channels=256, - reg_in_channels=256, - roi_feat_size=7, - reg_feat_up_ratio=2, - reg_pre_kernel=3, - reg_post_kernel=3, - reg_pre_num=2, - reg_post_num=1, - cls_out_channels=1024, - reg_offset_out_channels=256, - reg_cls_out_channels=256, - num_cls_fcs=1, - num_reg_fcs=0, - reg_class_agnostic=True, - norm_cfg=None, - bbox_coder=dict( - type='BucketingBBoxCoder', num_buckets=14, scale_factor=1.3), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox_reg=dict(type='SmoothL1Loss', beta=0.1, loss_weight=1.0)) - ])) + roi_head=dict( + bbox_head=[ + dict( + type="SABLHead", + num_classes=80, + cls_in_channels=256, + reg_in_channels=256, + roi_feat_size=7, + reg_feat_up_ratio=2, + reg_pre_kernel=3, + reg_post_kernel=3, + reg_pre_num=2, + reg_post_num=1, + cls_out_channels=1024, + reg_offset_out_channels=256, + reg_cls_out_channels=256, + num_cls_fcs=1, + num_reg_fcs=0, + reg_class_agnostic=True, + norm_cfg=None, + bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=1.7), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox_reg=dict(type="SmoothL1Loss", beta=0.1, loss_weight=1.0), + ), + dict( + type="SABLHead", + num_classes=80, + cls_in_channels=256, + reg_in_channels=256, + roi_feat_size=7, + reg_feat_up_ratio=2, + reg_pre_kernel=3, + reg_post_kernel=3, + reg_pre_num=2, + reg_post_num=1, + cls_out_channels=1024, + reg_offset_out_channels=256, + reg_cls_out_channels=256, + num_cls_fcs=1, + num_reg_fcs=0, + reg_class_agnostic=True, + norm_cfg=None, + bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=1.5), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox_reg=dict(type="SmoothL1Loss", beta=0.1, loss_weight=1.0), + ), + dict( + type="SABLHead", + num_classes=80, + cls_in_channels=256, + reg_in_channels=256, + roi_feat_size=7, + reg_feat_up_ratio=2, + reg_pre_kernel=3, + reg_post_kernel=3, + reg_pre_num=2, + reg_post_num=1, + cls_out_channels=1024, + reg_offset_out_channels=256, + reg_cls_out_channels=256, + num_cls_fcs=1, + num_reg_fcs=0, + reg_class_agnostic=True, + norm_cfg=None, + 
bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=1.3), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox_reg=dict(type="SmoothL1Loss", beta=0.1, loss_weight=1.0), + ), + ] + ) +) diff --git a/mmpose/configs/mmdet/sabl/sabl-faster-rcnn_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/sabl/sabl-faster-rcnn_r101_fpn_1x_coco.py index d1bf8b9c8cf1ac62d351456e7b19f75259ec0625..8bc0edc5899b02b655fda5c5b63fa181bd145520 100644 --- a/mmpose/configs/mmdet/sabl/sabl-faster-rcnn_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/sabl/sabl-faster-rcnn_r101_fpn_1x_coco.py @@ -1,17 +1,15 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101')), + backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")), roi_head=dict( bbox_head=dict( _delete_=True, - type='SABLHead', + type="SABLHead", num_classes=80, cls_in_channels=256, reg_in_channels=256, @@ -28,11 +26,10 @@ model = dict( num_reg_fcs=0, reg_class_agnostic=True, norm_cfg=None, - bbox_coder=dict( - type='BucketingBBoxCoder', num_buckets=14, scale_factor=1.7), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox_reg=dict(type='SmoothL1Loss', beta=0.1, - loss_weight=1.0)))) + bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=1.7), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox_reg=dict(type="SmoothL1Loss", beta=0.1, loss_weight=1.0), + ) + ), +) diff --git a/mmpose/configs/mmdet/sabl/sabl-faster-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/sabl/sabl-faster-rcnn_r50_fpn_1x_coco.py index a727bd6d3da09c86908c3c584509c5313cf732b5..825dd5cebf358618fde39aa746fc787f79bbe9f2 100644 --- a/mmpose/configs/mmdet/sabl/sabl-faster-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/sabl/sabl-faster-rcnn_r50_fpn_1x_coco.py @@ -1,13 +1,14 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( roi_head=dict( bbox_head=dict( _delete_=True, - type='SABLHead', + type="SABLHead", num_classes=80, cls_in_channels=256, reg_in_channels=256, @@ -24,11 +25,10 @@ model = dict( num_reg_fcs=0, reg_class_agnostic=True, norm_cfg=None, - bbox_coder=dict( - type='BucketingBBoxCoder', num_buckets=14, scale_factor=1.7), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox_reg=dict(type='SmoothL1Loss', beta=0.1, - loss_weight=1.0)))) + bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=1.7), + 
loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox_reg=dict(type="SmoothL1Loss", beta=0.1, loss_weight=1.0), + ) + ) +) diff --git a/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_1x_coco.py b/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_1x_coco.py index f181ad6813e4c6e3729ff80b3b8d915d84b53bf2..4778cb542dd61c99b7eeba7bc6b297faa179c980 100644 --- a/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_1x_coco.py @@ -1,57 +1,37 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) +norm_cfg = dict(type="GN", num_groups=32, requires_grad=True) model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101')), + backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")), bbox_head=dict( _delete_=True, - type='SABLRetinaHead', + type="SABLRetinaHead", num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, approx_anchor_generator=dict( - type='AnchorGenerator', - octave_base_scale=4, - scales_per_octave=3, - ratios=[0.5, 1.0, 2.0], - strides=[8, 16, 32, 64, 128]), - square_anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - scales=[4], - strides=[8, 16, 32, 64, 128]), + type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128] + ), + square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[4], strides=[8, 16, 32, 64, 128]), norm_cfg=norm_cfg, - bbox_coder=dict( - type='BucketingBBoxCoder', num_buckets=14, scale_factor=3.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.5), - loss_bbox_reg=dict( - type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.5)), + bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=3.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.5), + loss_bbox_reg=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.5), + ), # training and testing settings train_cfg=dict( - assigner=dict( - type='ApproxMaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0.0, - ignore_iof_thr=-1), + assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0.0, ignore_iof_thr=-1), allowed_border=-1, pos_weight=-1, - debug=False)) + debug=False, + ), +) # optimizer -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_ms-480-960-2x_coco.py b/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_ms-480-960-2x_coco.py index dc7209aebad3efcb88945460cf20b36e6ec4b419..de727d1b516a6474944b263e1f94a1097398ad49 100644 --- 
+++ b/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_ms-480-960-2x_coco.py
@@ -1,68 +1,46 @@
 _base_ = [
-    '../_base_/models/retinanet_r50_fpn.py',
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py'
+    "../_base_/models/retinanet_r50_fpn.py",
+    "../_base_/datasets/coco_detection.py",
+    "../_base_/schedules/schedule_2x.py",
+    "../_base_/default_runtime.py",
 ]
 # model settings
-norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+norm_cfg = dict(type="GN", num_groups=32, requires_grad=True)
 model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')),
+    backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")),
     bbox_head=dict(
         _delete_=True,
-        type='SABLRetinaHead',
+        type="SABLRetinaHead",
         num_classes=80,
         in_channels=256,
         stacked_convs=4,
         feat_channels=256,
         approx_anchor_generator=dict(
-            type='AnchorGenerator',
-            octave_base_scale=4,
-            scales_per_octave=3,
-            ratios=[0.5, 1.0, 2.0],
-            strides=[8, 16, 32, 64, 128]),
-        square_anchor_generator=dict(
-            type='AnchorGenerator',
-            ratios=[1.0],
-            scales=[4],
-            strides=[8, 16, 32, 64, 128]),
+            type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128]
+        ),
+        square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[4], strides=[8, 16, 32, 64, 128]),
         norm_cfg=norm_cfg,
-        bbox_coder=dict(
-            type='BucketingBBoxCoder', num_buckets=14, scale_factor=3.0),
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox_cls=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.5),
-        loss_bbox_reg=dict(
-            type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.5)),
+        bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=3.0),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.5),
+        loss_bbox_reg=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.5),
+    ),
     # training and testing settings
     train_cfg=dict(
-        assigner=dict(
-            type='ApproxMaxIoUAssigner',
-            pos_iou_thr=0.5,
-            neg_iou_thr=0.4,
-            min_pos_iou=0.0,
-            ignore_iof_thr=-1),
+        assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0.0, ignore_iof_thr=-1),
         allowed_border=-1,
         pos_weight=-1,
-        debug=False))
+        debug=False,
+    ),
+)
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='RandomResize', scale=[(1333, 480), (1333, 960)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="RandomResize", scale=[(1333, 480), (1333, 960)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))
+optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001))
diff --git a/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_ms-640-800-2x_coco.py b/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_ms-640-800-2x_coco.py
index ac5f6d9811dc8e45cfc036b3a3d4a04e7fa5ee60..48e5019ad78eb41ae89b3e4e158a0e026fb94758 100644
--- a/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_ms-640-800-2x_coco.py
+++ b/mmpose/configs/mmdet/sabl/sabl-retinanet_r101-gn_fpn_ms-640-800-2x_coco.py
@@ -1,68 +1,46 @@
 _base_ = [
-    '../_base_/models/retinanet_r50_fpn.py',
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py'
+    "../_base_/models/retinanet_r50_fpn.py",
+    "../_base_/datasets/coco_detection.py",
+    "../_base_/schedules/schedule_2x.py",
+    "../_base_/default_runtime.py",
 ]
 # model settings
-norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+norm_cfg = dict(type="GN", num_groups=32, requires_grad=True)
 model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')),
+    backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")),
     bbox_head=dict(
         _delete_=True,
-        type='SABLRetinaHead',
+        type="SABLRetinaHead",
         num_classes=80,
         in_channels=256,
         stacked_convs=4,
         feat_channels=256,
         approx_anchor_generator=dict(
-            type='AnchorGenerator',
-            octave_base_scale=4,
-            scales_per_octave=3,
-            ratios=[0.5, 1.0, 2.0],
-            strides=[8, 16, 32, 64, 128]),
-        square_anchor_generator=dict(
-            type='AnchorGenerator',
-            ratios=[1.0],
-            scales=[4],
-            strides=[8, 16, 32, 64, 128]),
+            type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128]
+        ),
+        square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[4], strides=[8, 16, 32, 64, 128]),
         norm_cfg=norm_cfg,
-        bbox_coder=dict(
-            type='BucketingBBoxCoder', num_buckets=14, scale_factor=3.0),
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox_cls=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.5),
-        loss_bbox_reg=dict(
-            type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.5)),
+        bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=3.0),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.5),
+        loss_bbox_reg=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.5),
+    ),
     # training and testing settings
     train_cfg=dict(
-        assigner=dict(
-            type='ApproxMaxIoUAssigner',
-            pos_iou_thr=0.5,
-            neg_iou_thr=0.4,
-            min_pos_iou=0.0,
-            ignore_iof_thr=-1),
+        assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0.0, ignore_iof_thr=-1),
         allowed_border=-1,
         pos_weight=-1,
-        debug=False))
+        debug=False,
+    ),
+)
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='RandomResize', scale=[(1333, 480), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="RandomResize", scale=[(1333, 480), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))
+optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001))
diff --git a/mmpose/configs/mmdet/sabl/sabl-retinanet_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/sabl/sabl-retinanet_r101_fpn_1x_coco.py
index 409695b5dbccfe20bb6e85ee16231211c2ebcdba..5bb2502e6bcf7a67493f858e55a848aef4829bad 100644
--- a/mmpose/configs/mmdet/sabl/sabl-retinanet_r101_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/sabl/sabl-retinanet_r101_fpn_1x_coco.py
@@ -1,55 +1,35 @@
 _base_ = [
-    '../_base_/models/retinanet_r50_fpn.py',
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+    "../_base_/models/retinanet_r50_fpn.py",
+    "../_base_/datasets/coco_detection.py",
+    "../_base_/schedules/schedule_1x.py",
+    "../_base_/default_runtime.py",
 ]
 # model settings
 model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')),
+    backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")),
     bbox_head=dict(
         _delete_=True,
-        type='SABLRetinaHead',
+        type="SABLRetinaHead",
         num_classes=80,
         in_channels=256,
         stacked_convs=4,
         feat_channels=256,
         approx_anchor_generator=dict(
-            type='AnchorGenerator',
-            octave_base_scale=4,
-            scales_per_octave=3,
-            ratios=[0.5, 1.0, 2.0],
-            strides=[8, 16, 32, 64, 128]),
-        square_anchor_generator=dict(
-            type='AnchorGenerator',
-            ratios=[1.0],
-            scales=[4],
-            strides=[8, 16, 32, 64, 128]),
-        bbox_coder=dict(
-            type='BucketingBBoxCoder', num_buckets=14, scale_factor=3.0),
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox_cls=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.5),
-        loss_bbox_reg=dict(
-            type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.5)),
+            type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128]
+        ),
+        square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[4], strides=[8, 16, 32, 64, 128]),
+        bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=3.0),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.5),
+        loss_bbox_reg=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.5),
+    ),
     # training and testing settings
     train_cfg=dict(
-        assigner=dict(
-            type='ApproxMaxIoUAssigner',
-            pos_iou_thr=0.5,
-            neg_iou_thr=0.4,
-            min_pos_iou=0.0,
-            ignore_iof_thr=-1),
+        assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0.0, ignore_iof_thr=-1),
         allowed_border=-1,
         pos_weight=-1,
-        debug=False))
+        debug=False,
+    ),
+)
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))
+optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001))
diff --git a/mmpose/configs/mmdet/sabl/sabl-retinanet_r50-gn_fpn_1x_coco.py b/mmpose/configs/mmdet/sabl/sabl-retinanet_r50-gn_fpn_1x_coco.py
index 4facdb6aaab05fd04b95e8c3ba2f0460090b1d6c..47665a59f051b05afe4a8c449d42f94fdc74f519 100644
--- a/mmpose/configs/mmdet/sabl/sabl-retinanet_r50-gn_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/sabl/sabl-retinanet_r50-gn_fpn_1x_coco.py
@@ -1,53 +1,36 @@
 _base_ = [
-    '../_base_/models/retinanet_r50_fpn.py',
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+    "../_base_/models/retinanet_r50_fpn.py",
+    "../_base_/datasets/coco_detection.py",
+    "../_base_/schedules/schedule_1x.py",
+    "../_base_/default_runtime.py",
 ]
 # model settings
-norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+norm_cfg = dict(type="GN", num_groups=32, requires_grad=True)
 model = dict(
     bbox_head=dict(
         _delete_=True,
-        type='SABLRetinaHead',
+        type="SABLRetinaHead",
         num_classes=80,
         in_channels=256,
         stacked_convs=4,
         feat_channels=256,
         approx_anchor_generator=dict(
-            type='AnchorGenerator',
-            octave_base_scale=4,
-            scales_per_octave=3,
-            ratios=[0.5, 1.0, 2.0],
-            strides=[8, 16, 32, 64, 128]),
-        square_anchor_generator=dict(
-            type='AnchorGenerator',
-            ratios=[1.0],
-            scales=[4],
-            strides=[8, 16, 32, 64, 128]),
+            type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128]
+        ),
+        square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[4], strides=[8, 16, 32, 64, 128]),
         norm_cfg=norm_cfg,
-        bbox_coder=dict(
-            type='BucketingBBoxCoder', num_buckets=14, scale_factor=3.0),
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox_cls=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.5),
-        loss_bbox_reg=dict(
-            type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.5)),
+        bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=3.0),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.5),
+        loss_bbox_reg=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.5),
+    ),
     # training and testing settings
     train_cfg=dict(
-        assigner=dict(
-            type='ApproxMaxIoUAssigner',
-            pos_iou_thr=0.5,
-            neg_iou_thr=0.4,
-            min_pos_iou=0.0,
-            ignore_iof_thr=-1),
+        assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0.0, ignore_iof_thr=-1),
         allowed_border=-1,
         pos_weight=-1,
-        debug=False))
+        debug=False,
+    ),
+)
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))
+optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001))
diff --git a/mmpose/configs/mmdet/sabl/sabl-retinanet_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/sabl/sabl-retinanet_r50_fpn_1x_coco.py
index 9073d6f002fcb49aecc280f318b8769b477d2d82..22a2045b959b664b905b144e2128082a10d0bda5 100644
--- a/mmpose/configs/mmdet/sabl/sabl-retinanet_r50_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/sabl/sabl-retinanet_r50_fpn_1x_coco.py
@@ -1,51 +1,34 @@
 _base_ = [
-    '../_base_/models/retinanet_r50_fpn.py',
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+    "../_base_/models/retinanet_r50_fpn.py",
+    "../_base_/datasets/coco_detection.py",
+    "../_base_/schedules/schedule_1x.py",
+    "../_base_/default_runtime.py",
 ]
 # model settings
 model = dict(
     bbox_head=dict(
         _delete_=True,
-        type='SABLRetinaHead',
+        type="SABLRetinaHead",
         num_classes=80,
         in_channels=256,
         stacked_convs=4,
         feat_channels=256,
         approx_anchor_generator=dict(
-            type='AnchorGenerator',
-            octave_base_scale=4,
-            scales_per_octave=3,
-            ratios=[0.5, 1.0, 2.0],
-            strides=[8, 16, 32, 64, 128]),
-        square_anchor_generator=dict(
-            type='AnchorGenerator',
-            ratios=[1.0],
-            scales=[4],
-            strides=[8, 16, 32, 64, 128]),
-        bbox_coder=dict(
-            type='BucketingBBoxCoder', num_buckets=14, scale_factor=3.0),
-        loss_cls=dict(
-            type='FocalLoss',
-            use_sigmoid=True,
-            gamma=2.0,
-            alpha=0.25,
-            loss_weight=1.0),
-        loss_bbox_cls=dict(
-            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.5),
-        loss_bbox_reg=dict(
-            type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.5)),
+            type="AnchorGenerator", octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128]
+        ),
+        square_anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[4], strides=[8, 16, 32, 64, 128]),
+        bbox_coder=dict(type="BucketingBBoxCoder", num_buckets=14, scale_factor=3.0),
+        loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
+        loss_bbox_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.5),
+        loss_bbox_reg=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.5),
+    ),
     # training and testing settings
     train_cfg=dict(
-        assigner=dict(
-            type='ApproxMaxIoUAssigner',
-            pos_iou_thr=0.5,
-            neg_iou_thr=0.4,
-            min_pos_iou=0.0,
-            ignore_iof_thr=-1),
+        assigner=dict(type="ApproxMaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0.0, ignore_iof_thr=-1),
         allowed_border=-1,
         pos_weight=-1,
-        debug=False))
+        debug=False,
+    ),
+)
 # optimizer
-optim_wrapper = dict(
-    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))
+optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001))
diff --git a/mmpose/configs/mmdet/scnet/scnet_r101_fpn_20e_coco.py b/mmpose/configs/mmdet/scnet/scnet_r101_fpn_20e_coco.py
index ebba52978b23c07a68e3563033c860a95dd515b6..fd335ba6389c7851243460b08d240d33a8b9ffea 100644
--- a/mmpose/configs/mmdet/scnet/scnet_r101_fpn_20e_coco.py
+++ b/mmpose/configs/mmdet/scnet/scnet_r101_fpn_20e_coco.py
@@ -1,6 +1,2 @@
-_base_ = './scnet_r50_fpn_20e_coco.py'
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')))
+_base_ = "./scnet_r50_fpn_20e_coco.py"
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")))
diff --git a/mmpose/configs/mmdet/scnet/scnet_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/scnet/scnet_r50_fpn_1x_coco.py
index a0210fdb456c26b2c05d99a2435da14fc30f088d..1c0a192248de9e47fcae8cefc2fc9fc8d108912c 100644
--- a/mmpose/configs/mmdet/scnet/scnet_r50_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/scnet/scnet_r50_fpn_1x_coco.py
@@ -1,93 +1,79 @@
-_base_ = '../htc/htc_r50_fpn_1x_coco.py'
+_base_ = "../htc/htc_r50_fpn_1x_coco.py"
 # model settings
 model = dict(
-    type='SCNet',
+    type="SCNet",
     roi_head=dict(
         _delete_=True,
-        type='SCNetRoIHead',
+        type="SCNetRoIHead",
         num_stages=3,
         stage_loss_weights=[1, 0.5, 0.25],
         bbox_roi_extractor=dict(
-            type='SingleRoIExtractor',
-            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
+            type="SingleRoIExtractor",
+            roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0),
             out_channels=256,
-            featmap_strides=[4, 8, 16, 32]),
+            featmap_strides=[4, 8, 16, 32],
+        ),
         bbox_head=[
             dict(
-                type='SCNetBBoxHead',
+                type="SCNetBBoxHead",
                 num_shared_fcs=2,
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=80,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.1, 0.1, 0.2, 0.2]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]),
                 reg_class_agnostic=True,
-                loss_cls=dict(
-                    type='CrossEntropyLoss',
-                    use_sigmoid=False,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
-                               loss_weight=1.0)),
+                loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
             dict(
-                type='SCNetBBoxHead',
+                type="SCNetBBoxHead",
                 num_shared_fcs=2,
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=80,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.05, 0.05, 0.1, 0.1]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]),
                 reg_class_agnostic=True,
-                loss_cls=dict(
-                    type='CrossEntropyLoss',
-                    use_sigmoid=False,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
-                               loss_weight=1.0)),
+                loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
             dict(
-                type='SCNetBBoxHead',
+                type="SCNetBBoxHead",
                 num_shared_fcs=2,
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=80,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.033, 0.033, 0.067, 0.067]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]),
                 reg_class_agnostic=True,
-                loss_cls=dict(
-                    type='CrossEntropyLoss',
-                    use_sigmoid=False,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
+                loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
         ],
         mask_roi_extractor=dict(
-            type='SingleRoIExtractor',
-            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
+            type="SingleRoIExtractor",
+            roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0),
             out_channels=256,
-            featmap_strides=[4, 8, 16, 32]),
+            featmap_strides=[4, 8, 16, 32],
+        ),
         mask_head=dict(
-            type='SCNetMaskHead',
+            type="SCNetMaskHead",
             num_convs=12,
             in_channels=256,
             conv_out_channels=256,
             num_classes=80,
             conv_to_res=True,
-            loss_mask=dict(
-                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)),
+            loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0),
+        ),
         semantic_roi_extractor=dict(
-            type='SingleRoIExtractor',
-            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
+            type="SingleRoIExtractor",
+            roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0),
             out_channels=256,
-            featmap_strides=[8]),
+            featmap_strides=[8],
+        ),
         semantic_head=dict(
-            type='SCNetSemanticHead',
+            type="SCNetSemanticHead",
             num_ins=5,
             fusion_level=1,
             seg_scale_factor=1 / 8,
@@ -95,23 +81,15 @@ model = dict(
             in_channels=256,
             conv_out_channels=256,
             num_classes=183,
-            loss_seg=dict(
-                type='CrossEntropyLoss', ignore_index=255, loss_weight=0.2),
-            conv_to_res=True),
+            loss_seg=dict(type="CrossEntropyLoss", ignore_index=255, loss_weight=0.2),
+            conv_to_res=True,
+        ),
         glbctx_head=dict(
-            type='GlobalContextHead',
-            num_convs=4,
-            in_channels=256,
-            conv_out_channels=256,
-            num_classes=80,
-            loss_weight=3.0,
-            conv_to_res=True),
-        feat_relay_head=dict(
-            type='FeatureRelayHead',
-            in_channels=1024,
-            out_conv_channels=256,
-            roi_feat_size=7,
-            scale_factor=2)))
+            type="GlobalContextHead", num_convs=4, in_channels=256, conv_out_channels=256, num_classes=80, loss_weight=3.0, conv_to_res=True
+        ),
+        feat_relay_head=dict(type="FeatureRelayHead", in_channels=1024, out_conv_channels=256, roi_feat_size=7, scale_factor=2),
+    ),
+)
 
 # TODO
 # uncomment below code to enable test time augmentations
diff --git a/mmpose/configs/mmdet/scnet/scnet_r50_fpn_20e_coco.py b/mmpose/configs/mmdet/scnet/scnet_r50_fpn_20e_coco.py
index 533e1b5f3253387788fbf1a9d6d7a38c7c5c5f30..6d7068e8f7c2267b544edfb37c40e78027e334a8 100644
--- a/mmpose/configs/mmdet/scnet/scnet_r50_fpn_20e_coco.py
+++ b/mmpose/configs/mmdet/scnet/scnet_r50_fpn_20e_coco.py
@@ -1,15 +1,8 @@
-_base_ = './scnet_r50_fpn_1x_coco.py'
+_base_ = "./scnet_r50_fpn_1x_coco.py"
 # learning policy
 max_epochs = 20
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[16, 19],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 19], gamma=0.1),
 ]
 train_cfg = dict(max_epochs=max_epochs)
diff --git a/mmpose/configs/mmdet/scnet/scnet_x101-64x4d_fpn_20e_coco.py b/mmpose/configs/mmdet/scnet/scnet_x101-64x4d_fpn_20e_coco.py
index 1e54b030fa68f76f22edf66e3594d66a13c2c672..2e4566088a59909c252adaddc71b27ada74b1bff 100644
--- a/mmpose/configs/mmdet/scnet/scnet_x101-64x4d_fpn_20e_coco.py
+++ b/mmpose/configs/mmdet/scnet/scnet_x101-64x4d_fpn_20e_coco.py
@@ -1,15 +1,16 @@
-_base_ = './scnet_r50_fpn_20e_coco.py'
+_base_ = "./scnet_r50_fpn_20e_coco.py"
 model = dict(
     backbone=dict(
-        type='ResNeXt',
+        type="ResNeXt",
         depth=101,
         groups=64,
         base_width=4,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_cfg=dict(type="BN", requires_grad=True),
         norm_eval=True,
-        style='pytorch',
-        init_cfg=dict(
-            type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))
+        style="pytorch",
+        init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"),
+    )
+)
diff --git a/mmpose/configs/mmdet/scnet/scnet_x101-64x4d_fpn_8xb1-20e_coco.py b/mmpose/configs/mmdet/scnet/scnet_x101-64x4d_fpn_8xb1-20e_coco.py
index 3cdce7d54248e77e98639d68490cc30dfd625c87..fc564ee39072660f3c4767e4642443ed6afe0241 100644
--- a/mmpose/configs/mmdet/scnet/scnet_x101-64x4d_fpn_8xb1-20e_coco.py
+++ b/mmpose/configs/mmdet/scnet/scnet_x101-64x4d_fpn_8xb1-20e_coco.py
@@ -1,4 +1,4 @@
-_base_ = './scnet_x101-64x4d_fpn_20e_coco.py'
+_base_ = "./scnet_x101-64x4d_fpn_20e_coco.py"
 
 train_dataloader = dict(batch_size=1, num_workers=1)
 optim_wrapper = dict(optimizer=dict(lr=0.01))
diff --git a/mmpose/configs/mmdet/scratch/faster-rcnn_r50-scratch_fpn_gn-all_6x_coco.py b/mmpose/configs/mmdet/scratch/faster-rcnn_r50-scratch_fpn_gn-all_6x_coco.py
index 6e632b9a150871a44b698dfdb0fdc3f07308ef81..4bca01b52591ea189eb4d1c1882e6b4bd96c315b 100644
--- a/mmpose/configs/mmdet/scratch/faster-rcnn_r50-scratch_fpn_gn-all_6x_coco.py
+++ b/mmpose/configs/mmdet/scratch/faster-rcnn_r50-scratch_fpn_gn-all_6x_coco.py
@@ -1,36 +1,23 @@
 _base_ = [
-    '../_base_/models/faster-rcnn_r50_fpn.py',
-    '../_base_/datasets/coco_detection.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+    "../_base_/models/faster-rcnn_r50_fpn.py",
+    "../_base_/datasets/coco_detection.py",
+    "../_base_/schedules/schedule_1x.py",
+    "../_base_/default_runtime.py",
 ]
-norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+norm_cfg = dict(type="GN", num_groups=32, requires_grad=True)
 model = dict(
-    backbone=dict(
-        frozen_stages=-1,
-        zero_init_residual=False,
-        norm_cfg=norm_cfg,
-        init_cfg=None),
+    backbone=dict(frozen_stages=-1, zero_init_residual=False, norm_cfg=norm_cfg, init_cfg=None),
     neck=dict(norm_cfg=norm_cfg),
-    roi_head=dict(
-        bbox_head=dict(
-            type='Shared4Conv1FCBBoxHead',
-            conv_out_channels=256,
-            norm_cfg=norm_cfg)))
+    roi_head=dict(bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, norm_cfg=norm_cfg)),
+)
 
-optim_wrapper = dict(paramwise_cfg=dict(norm_decay_mult=0.))
+optim_wrapper = dict(paramwise_cfg=dict(norm_decay_mult=0.0))
 
 max_epochs = 73
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[65, 71],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[65, 71], gamma=0.1),
 ]
 train_cfg = dict(max_epochs=max_epochs)
diff --git a/mmpose/configs/mmdet/scratch/mask-rcnn_r50-scratch_fpn_gn-all_6x_coco.py b/mmpose/configs/mmdet/scratch/mask-rcnn_r50-scratch_fpn_gn-all_6x_coco.py
index 9796f504b677a841919bb058ded414de25e74a50..82cafd211a78f0378ac25cb306e0399c30dcbf63 100644
--- a/mmpose/configs/mmdet/scratch/mask-rcnn_r50-scratch_fpn_gn-all_6x_coco.py
+++ b/mmpose/configs/mmdet/scratch/mask-rcnn_r50-scratch_fpn_gn-all_6x_coco.py
@@ -1,37 +1,25 @@
 _base_ = [
-    '../_base_/models/mask-rcnn_r50_fpn.py',
-    '../_base_/datasets/coco_instance.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+    "../_base_/models/mask-rcnn_r50_fpn.py",
+    "../_base_/datasets/coco_instance.py",
+    "../_base_/schedules/schedule_1x.py",
+    "../_base_/default_runtime.py",
 ]
-norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+norm_cfg = dict(type="GN", num_groups=32, requires_grad=True)
 model = dict(
-    backbone=dict(
-        frozen_stages=-1,
-        zero_init_residual=False,
-        norm_cfg=norm_cfg,
-        init_cfg=None),
+    backbone=dict(frozen_stages=-1, zero_init_residual=False, norm_cfg=norm_cfg, init_cfg=None),
     neck=dict(norm_cfg=norm_cfg),
     roi_head=dict(
-        bbox_head=dict(
-            type='Shared4Conv1FCBBoxHead',
-            conv_out_channels=256,
-            norm_cfg=norm_cfg),
-        mask_head=dict(norm_cfg=norm_cfg)))
+        bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, norm_cfg=norm_cfg), mask_head=dict(norm_cfg=norm_cfg)
+    ),
+)
 
-optim_wrapper = dict(paramwise_cfg=dict(norm_decay_mult=0.))
+optim_wrapper = dict(paramwise_cfg=dict(norm_decay_mult=0.0))
 
 max_epochs = 73
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=max_epochs,
-        by_epoch=True,
-        milestones=[65, 71],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[65, 71], gamma=0.1),
 ]
 train_cfg = dict(max_epochs=max_epochs)
diff --git a/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py
index 2de87dcca59ccac7fc96c10c2a069fcf0464aeff..4a6b6fb9a8a46adab9edbbdf17caa0a7de5d2c6a 100644
--- a/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py
@@ -1,5 +1,2 @@
-_base_ = './cascade-mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py'  # noqa: E501
-model = dict(
-    roi_head=dict(
-        mask_head=dict(
-            predictor_cfg=dict(type='NormedConv2d', tempearture=20))))
+_base_ = "./cascade-mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py"  # noqa: E501
+model = dict(roi_head=dict(mask_head=dict(predictor_cfg=dict(type="NormedConv2d", tempearture=20))))
diff --git a/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py
index 4d67ad7d4817a32b365bc2567937f69b68a9c97c..1a68ddf50acde45751c80b7f36f3b88942c9a7b5 100644
--- a/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py
@@ -1,5 +1,2 @@
-_base_ = './cascade-mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py'  # noqa: E501
-model = dict(
-    roi_head=dict(
-        mask_head=dict(
-            predictor_cfg=dict(type='NormedConv2d', tempearture=20))))
+_base_ = "./cascade-mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py"  # noqa: E501
+model = dict(roi_head=dict(mask_head=dict(predictor_cfg=dict(type="NormedConv2d", tempearture=20))))
diff --git a/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py
index 2a1a87d4203a12a78a26fd873bd6017fafb49cdf..27f07d50b1c7b8ff3e9d01a865b0878f3132ac78 100644
--- a/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py
@@ -1,116 +1,83 @@
 _base_ = [
-    '../_base_/models/cascade-mask-rcnn_r50_fpn.py',
-    '../_base_/datasets/coco_instance.py',
-    '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py'
+    "../_base_/models/cascade-mask-rcnn_r50_fpn.py",
+    "../_base_/datasets/coco_instance.py",
+    "../_base_/schedules/schedule_2x.py",
+    "../_base_/default_runtime.py",
 ]
 model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')),
+    backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")),
     roi_head=dict(
         bbox_head=[
             dict(
-                type='Shared2FCBBoxHead',
+                type="Shared2FCBBoxHead",
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=1203,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.1, 0.1, 0.2, 0.2]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]),
                 reg_class_agnostic=True,
-                cls_predictor_cfg=dict(type='NormedLinear', tempearture=20),
-                loss_cls=dict(
-                    type='SeesawLoss',
-                    p=0.8,
-                    q=2.0,
-                    num_classes=1203,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
-                               loss_weight=1.0)),
+                cls_predictor_cfg=dict(type="NormedLinear", tempearture=20),
+                loss_cls=dict(type="SeesawLoss", p=0.8, q=2.0, num_classes=1203, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
             dict(
-                type='Shared2FCBBoxHead',
+                type="Shared2FCBBoxHead",
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=1203,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.05, 0.05, 0.1, 0.1]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]),
                 reg_class_agnostic=True,
-                cls_predictor_cfg=dict(type='NormedLinear', tempearture=20),
-                loss_cls=dict(
-                    type='SeesawLoss',
-                    p=0.8,
-                    q=2.0,
-                    num_classes=1203,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
-                               loss_weight=1.0)),
+                cls_predictor_cfg=dict(type="NormedLinear", tempearture=20),
+                loss_cls=dict(type="SeesawLoss", p=0.8, q=2.0, num_classes=1203, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
             dict(
-                type='Shared2FCBBoxHead',
+                type="Shared2FCBBoxHead",
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=1203,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.033, 0.033, 0.067, 0.067]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]),
                 reg_class_agnostic=True,
-                cls_predictor_cfg=dict(type='NormedLinear', tempearture=20),
-                loss_cls=dict(
-                    type='SeesawLoss',
-                    p=0.8,
-                    q=2.0,
-                    num_classes=1203,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
+                cls_predictor_cfg=dict(type="NormedLinear", tempearture=20),
+                loss_cls=dict(type="SeesawLoss", p=0.8, q=2.0, num_classes=1203, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
         ],
-        mask_head=dict(num_classes=1203)),
+        mask_head=dict(num_classes=1203),
+    ),
     test_cfg=dict(
         rcnn=dict(
             score_thr=0.0001,
             # LVIS allows up to 300
-            max_per_img=300)))
+            max_per_img=300,
+        )
+    ),
+)
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
-    dict(
-        type='RandomChoiceResize',
-        scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
-                (1333, 768), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True, with_mask=True),
+    dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
-dataset_type = 'LVISV1Dataset'
-data_root = 'data/lvis_v1/'
+dataset_type = "LVISV1Dataset"
+data_root = "data/lvis_v1/"
 train_dataloader = dict(
     dataset=dict(
-        type=dataset_type,
-        data_root=data_root,
-        ann_file='annotations/lvis_v1_train.json',
-        data_prefix=dict(img=''),
-        pipeline=train_pipeline))
+        type=dataset_type, data_root=data_root, ann_file="annotations/lvis_v1_train.json", data_prefix=dict(img=""), pipeline=train_pipeline
+    )
+)
 val_dataloader = dict(
-    dataset=dict(
-        type=dataset_type,
-        data_root=data_root,
-        ann_file='annotations/lvis_v1_val.json',
-        data_prefix=dict(img='')))
+    dataset=dict(type=dataset_type, data_root=data_root, ann_file="annotations/lvis_v1_val.json", data_prefix=dict(img=""))
+)
 test_dataloader = val_dataloader
-val_evaluator = dict(
-    type='LVISMetric',
-    ann_file=data_root + 'annotations/lvis_v1_val.json',
-    metric=['bbox', 'segm'])
+val_evaluator = dict(type="LVISMetric", ann_file=data_root + "annotations/lvis_v1_val.json", metric=["bbox", "segm"])
 test_evaluator = val_evaluator
 train_cfg = dict(val_interval=24)
diff --git a/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py
index 0e7b4df91368d23092a68f16ba4a35660ea23130..8cd784e57ec039bdc49c200f0c3f05c54ebdf67e 100644
--- a/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/cascade-mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py
@@ -1,94 +1,69 @@
 _base_ = [
-    '../_base_/models/cascade-mask-rcnn_r50_fpn.py',
-    '../_base_/datasets/lvis_v1_instance.py',
-    '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py'
+    "../_base_/models/cascade-mask-rcnn_r50_fpn.py",
+    "../_base_/datasets/lvis_v1_instance.py",
+    "../_base_/schedules/schedule_2x.py",
+    "../_base_/default_runtime.py",
 ]
 model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')),
+    backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")),
     roi_head=dict(
         bbox_head=[
             dict(
-                type='Shared2FCBBoxHead',
+                type="Shared2FCBBoxHead",
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=1203,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.1, 0.1, 0.2, 0.2]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]),
                 reg_class_agnostic=True,
-                cls_predictor_cfg=dict(type='NormedLinear', tempearture=20),
-                loss_cls=dict(
-                    type='SeesawLoss',
-                    p=0.8,
-                    q=2.0,
-                    num_classes=1203,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
-                               loss_weight=1.0)),
+                cls_predictor_cfg=dict(type="NormedLinear", tempearture=20),
+                loss_cls=dict(type="SeesawLoss", p=0.8, q=2.0, num_classes=1203, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
             dict(
-                type='Shared2FCBBoxHead',
+                type="Shared2FCBBoxHead",
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=1203,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.05, 0.05, 0.1, 0.1]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]),
                 reg_class_agnostic=True,
-                cls_predictor_cfg=dict(type='NormedLinear', tempearture=20),
-                loss_cls=dict(
-                    type='SeesawLoss',
-                    p=0.8,
-                    q=2.0,
-                    num_classes=1203,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
-                               loss_weight=1.0)),
+                cls_predictor_cfg=dict(type="NormedLinear", tempearture=20),
+                loss_cls=dict(type="SeesawLoss", p=0.8, q=2.0, num_classes=1203, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
             dict(
-                type='Shared2FCBBoxHead',
+                type="Shared2FCBBoxHead",
                 in_channels=256,
                 fc_out_channels=1024,
                 roi_feat_size=7,
                 num_classes=1203,
-                bbox_coder=dict(
-                    type='DeltaXYWHBBoxCoder',
-                    target_means=[0., 0., 0., 0.],
-                    target_stds=[0.033, 0.033, 0.067, 0.067]),
+                bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]),
                 reg_class_agnostic=True,
-                cls_predictor_cfg=dict(type='NormedLinear', tempearture=20),
-                loss_cls=dict(
-                    type='SeesawLoss',
-                    p=0.8,
-                    q=2.0,
-                    num_classes=1203,
-                    loss_weight=1.0),
-                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
+                cls_predictor_cfg=dict(type="NormedLinear", tempearture=20),
+                loss_cls=dict(type="SeesawLoss", p=0.8, q=2.0, num_classes=1203, loss_weight=1.0),
+                loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0),
+            ),
         ],
-        mask_head=dict(num_classes=1203)),
+        mask_head=dict(num_classes=1203),
+    ),
     test_cfg=dict(
         rcnn=dict(
            score_thr=0.0001,
             # LVIS allows up to 300
-            max_per_img=300)))
+            max_per_img=300,
+        )
+    ),
+)
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
-    dict(
-        type='RandomChoiceResize',
-        scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
-                (1333, 768), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True, with_mask=True),
+    dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(dataset=dict(pipeline=train_pipeline)))
diff --git a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py
index b518c2135acb39a3d1119a8892c72816910ca496..6281cf73823cc9fa9501e01e50292f6762566443 100644
--- a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py
@@ -1,6 +1,2 @@
-_base_ = './mask-rcnn_r50_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py'  # noqa: E501
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')))
+_base_ = "./mask-rcnn_r50_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py"  # noqa: E501
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")))
diff --git a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py
index 008bbcae6eb8d189bdd0688b42d663eeba2a661e..bec83719c46b0bcd6e9c836afcfab0f9701c1407 100644
--- a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py
@@ -1,6 +1,2 @@
-_base_ = './mask-rcnn_r50_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py'  # noqa: E501
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')))
+_base_ = "./mask-rcnn_r50_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py"  # noqa: E501
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")))
diff --git a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py
index 8a0b6755bf6f218c337d9ee16677e3e64886c019..af89e9186ac234ad47e146b43d81cba9550bbdf3 100644
--- a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss_random-ms-2x_lvis-v1.py
@@ -1,6 +1,2 @@
-_base_ = './mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py'
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')))
+_base_ = "./mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py"
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")))
diff --git a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py
index 6143231918e028523b6bb1792887ef7ce16dde02..332d2db3aa5bb649e6e98971a8d7f1b1a226573b 100644
--- a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r101_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py
@@ -1,6 +1,2 @@
-_base_ = './mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py'
-model = dict(
-    backbone=dict(
-        depth=101,
-        init_cfg=dict(type='Pretrained',
-                      checkpoint='torchvision://resnet101')))
+_base_ = "./mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py"
+model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101")))
diff --git a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py
index 06d2438cf7c351a2fb352f787bc434cc6afc3ebb..e812033bfc0e59e80ee748cb8102ed0c3dfa956b 100644
--- a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss-normed-mask_random-ms-2x_lvis-v1.py
@@ -1,5 +1,2 @@
-_base_ = './mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py'
-model = dict(
-    roi_head=dict(
-        mask_head=dict(
-            predictor_cfg=dict(type='NormedConv2d', tempearture=20))))
+_base_ = "./mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py"
+model = dict(roi_head=dict(mask_head=dict(predictor_cfg=dict(type="NormedConv2d", tempearture=20))))
diff --git a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py
index 5fc68d3df32015e0fc8d5dd2bc92df416a8fc5fd..26246cb7f9b61c749dc91695590502f59315f6c2 100644
--- a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss-normed-mask_sample1e-3-ms-2x_lvis-v1.py
@@ -1,5 +1,2 @@
-_base_ = './mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py'
-model = dict(
-    roi_head=dict(
-        mask_head=dict(
-            predictor_cfg=dict(type='NormedConv2d', tempearture=20))))
+_base_ = "./mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py"
+model = dict(roi_head=dict(mask_head=dict(predictor_cfg=dict(type="NormedConv2d", tempearture=20))))
diff --git a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py
index 25c646c9c75c4468e71442049876a77382528e02..e6d4b963e72890b0c803a2976166705cfe5e67de 100644
--- a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_random-ms-2x_lvis-v1.py
@@ -1,59 +1,48 @@
 _base_ = [
-    '../_base_/models/mask-rcnn_r50_fpn.py',
-    '../_base_/datasets/coco_instance.py',
-    '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py'
+    "../_base_/models/mask-rcnn_r50_fpn.py",
+    "../_base_/datasets/coco_instance.py",
+    "../_base_/schedules/schedule_2x.py",
+    "../_base_/default_runtime.py",
 ]
 model = dict(
     roi_head=dict(
         bbox_head=dict(
             num_classes=1203,
-            cls_predictor_cfg=dict(type='NormedLinear', tempearture=20),
-            loss_cls=dict(
-                type='SeesawLoss',
-                p=0.8,
-                q=2.0,
-                num_classes=1203,
-                loss_weight=1.0)),
-        mask_head=dict(num_classes=1203)),
+            cls_predictor_cfg=dict(type="NormedLinear", tempearture=20),
+            loss_cls=dict(type="SeesawLoss", p=0.8, q=2.0, num_classes=1203, loss_weight=1.0),
+        ),
+        mask_head=dict(num_classes=1203),
+    ),
     test_cfg=dict(
         rcnn=dict(
             score_thr=0.0001,
             # LVIS allows up to 300
-            max_per_img=300)))
+            max_per_img=300,
+        )
+    ),
+)
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
-    dict(
-        type='RandomChoiceResize',
-        scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
-                (1333, 768), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True, with_mask=True),
+    dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
-dataset_type = 'LVISV1Dataset'
-data_root = 'data/lvis_v1/'
+dataset_type = "LVISV1Dataset"
+data_root = "data/lvis_v1/"
 train_dataloader = dict(
     dataset=dict(
-        type=dataset_type,
-        data_root=data_root,
-        ann_file='annotations/lvis_v1_train.json',
-        data_prefix=dict(img=''),
-        pipeline=train_pipeline))
+        type=dataset_type, data_root=data_root, ann_file="annotations/lvis_v1_train.json", data_prefix=dict(img=""), pipeline=train_pipeline
+    )
+)
 val_dataloader = dict(
-    dataset=dict(
-        type=dataset_type,
-        data_root=data_root,
-        ann_file='annotations/lvis_v1_val.json',
-        data_prefix=dict(img='')))
+    dataset=dict(type=dataset_type, data_root=data_root, ann_file="annotations/lvis_v1_val.json", data_prefix=dict(img=""))
+)
 test_dataloader = val_dataloader
-val_evaluator = dict(
-    type='LVISMetric',
-    ann_file=data_root + 'annotations/lvis_v1_val.json',
-    metric=['bbox', 'segm'])
+val_evaluator = dict(type="LVISMetric", ann_file=data_root + "annotations/lvis_v1_val.json", metric=["bbox", "segm"])
 test_evaluator = val_evaluator
 train_cfg = dict(val_interval=24)
diff --git a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py
index d60320e0b78035d24adb86f3aa184433951481fe..11139de89b3a67c068eb12b8602a484f9f819a39 100644
--- a/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py
+++ b/mmpose/configs/mmdet/seesaw_loss/mask-rcnn_r50_fpn_seesaw-loss_sample1e-3-ms-2x_lvis-v1.py
@@ -1,37 +1,34 @@
 _base_ = [
-    '../_base_/models/mask-rcnn_r50_fpn.py',
-    '../_base_/datasets/lvis_v1_instance.py',
-    '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py'
+    "../_base_/models/mask-rcnn_r50_fpn.py",
+    "../_base_/datasets/lvis_v1_instance.py",
+    "../_base_/schedules/schedule_2x.py",
+    "../_base_/default_runtime.py",
 ]
 model = dict(
     roi_head=dict(
         bbox_head=dict(
             num_classes=1203,
-            cls_predictor_cfg=dict(type='NormedLinear', tempearture=20),
-            loss_cls=dict(
-                type='SeesawLoss',
-                p=0.8,
-                q=2.0,
-                num_classes=1203,
-                loss_weight=1.0)),
-        mask_head=dict(num_classes=1203)),
+            cls_predictor_cfg=dict(type="NormedLinear", tempearture=20),
+            loss_cls=dict(type="SeesawLoss", p=0.8, q=2.0, num_classes=1203, loss_weight=1.0),
+        ),
+        mask_head=dict(num_classes=1203),
+    ),
     test_cfg=dict(
         rcnn=dict(
             score_thr=0.0001,
             # LVIS allows up to 300
-            max_per_img=300)))
+            max_per_img=300,
+        )
+    ),
+)
 # dataset settings
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
-    dict(
-        type='RandomChoiceResize',
-        scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
-                (1333, 768), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True, with_mask=True),
+    dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(dataset=dict(pipeline=train_pipeline)))
diff --git a/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_1x_coco.py b/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_1x_coco.py
index 91d45add8aba54de4b25fba11ecf5e18bca0084f..b37aed7c937da8eb6d2a2f4d23d18fbfb69b6151 100644
--- a/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_1x_coco.py
@@ -1,13 +1,15 @@
 _base_ = [
-    '../_base_/models/mask-rcnn_r50_fpn.py',
-    '../_base_/datasets/coco_instance.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+    "../_base_/models/mask-rcnn_r50_fpn.py",
+    "../_base_/datasets/coco_instance.py",
+    "../_base_/schedules/schedule_1x.py",
+    "../_base_/default_runtime.py",
 ]
 
 model = dict(
     backbone=dict(
         frozen_stages=0,
-        norm_cfg=dict(type='SyncBN', requires_grad=True),
+        norm_cfg=dict(type="SyncBN", requires_grad=True),
         norm_eval=False,
-        init_cfg=dict(
-            type='Pretrained', checkpoint='./mocov2_r50_800ep_pretrain.pth')))
+        init_cfg=dict(type="Pretrained", checkpoint="./mocov2_r50_800ep_pretrain.pth"),
+    )
+)
diff --git a/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_ms-2x_coco.py
index ddaebf5558a22680d556aa8b3fe79541d634d910..885370fe9af8cdf56b0446c241688b1cf38c3736 100644
--- a/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_ms-2x_coco.py
+++ b/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-mocov2-pre_fpn_ms-2x_coco.py
@@ -1,25 +1,25 @@
 _base_ = [
-    '../_base_/models/mask-rcnn_r50_fpn.py',
-    '../_base_/datasets/coco_instance.py',
-    '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py'
+    "../_base_/models/mask-rcnn_r50_fpn.py",
+    "../_base_/datasets/coco_instance.py",
+    "../_base_/schedules/schedule_2x.py",
+    "../_base_/default_runtime.py",
 ]
 
 model = dict(
     backbone=dict(
         frozen_stages=0,
-        norm_cfg=dict(type='SyncBN', requires_grad=True),
+        norm_cfg=dict(type="SyncBN", requires_grad=True),
         norm_eval=False,
-        init_cfg=dict(
-            type='Pretrained', checkpoint='./mocov2_r50_800ep_pretrain.pth')))
+        init_cfg=dict(type="Pretrained", checkpoint="./mocov2_r50_800ep_pretrain.pth"),
+    )
+)
 
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
-    dict(
-        type='RandomResize', scale=[(1333, 640), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True, with_mask=True),
+    dict(type="RandomResize", scale=[(1333, 640), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
diff --git a/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_1x_coco.py b/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_1x_coco.py
index 785c80ec9d14c8e4b54b2e3359f9b4c680eaca17..fcf781c3ee106a8b16a6d86d2fda8c4fb41f6f41 100644
--- a/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_1x_coco.py
+++ b/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_1x_coco.py
@@ -1,13 +1,15 @@
 _base_ = [
-    '../_base_/models/mask-rcnn_r50_fpn.py',
-    '../_base_/datasets/coco_instance.py',
-    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
+    "../_base_/models/mask-rcnn_r50_fpn.py",
+    "../_base_/datasets/coco_instance.py",
+    "../_base_/schedules/schedule_1x.py",
+    "../_base_/default_runtime.py",
 ]
 
 model = dict(
     backbone=dict(
         frozen_stages=0,
-        norm_cfg=dict(type='SyncBN', requires_grad=True),
+        norm_cfg=dict(type="SyncBN", requires_grad=True),
         norm_eval=False,
-        init_cfg=dict(
-            type='Pretrained', checkpoint='./swav_800ep_pretrain.pth.tar')))
+        init_cfg=dict(type="Pretrained", checkpoint="./swav_800ep_pretrain.pth.tar"),
+    )
+)
diff --git a/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_ms-2x_coco.py
index c393e0b36047f731c91c3f0963ef90347a0910e9..8d96391890a9e35dff24efe50dd9751585000201 100644
--- a/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_ms-2x_coco.py
+++ b/mmpose/configs/mmdet/selfsup_pretrain/mask-rcnn_r50-swav-pre_fpn_ms-2x_coco.py
@@ -1,25 +1,25 @@
 _base_ = [
-    '../_base_/models/mask-rcnn_r50_fpn.py',
-    '../_base_/datasets/coco_instance.py',
-    '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py'
+    "../_base_/models/mask-rcnn_r50_fpn.py",
+    "../_base_/datasets/coco_instance.py",
+    "../_base_/schedules/schedule_2x.py",
+    "../_base_/default_runtime.py",
 ]
 
 model = dict(
     backbone=dict(
         frozen_stages=0,
-        norm_cfg=dict(type='SyncBN', requires_grad=True),
+        norm_cfg=dict(type="SyncBN", requires_grad=True),
         norm_eval=False,
-        init_cfg=dict(
-            type='Pretrained', checkpoint='./swav_800ep_pretrain.pth.tar')))
+        init_cfg=dict(type="Pretrained", checkpoint="./swav_800ep_pretrain.pth.tar"),
+    )
+)
 
 train_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
-    dict(
-        type='RandomResize', scale=[(1333, 640), (1333, 800)],
-        keep_ratio=True),
-    dict(type='RandomFlip', prob=0.5),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="LoadAnnotations", with_bbox=True, with_mask=True),
+    dict(type="RandomResize", scale=[(1333, 640), (1333, 800)], keep_ratio=True),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="PackDetInputs"),
 ]
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
diff --git a/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-270k_coco.py b/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-270k_coco.py
index 0c6e081e860e1240f8d35efa8176563a8b5be845..208c2e1c88a1c909cf0e55f67ae148ecc2eaea17 100644
--- a/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-270k_coco.py
+++ b/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-270k_coco.py
@@ -1,31 +1,27 @@
 _base_ = [
-    '../_base_/models/mask-rcnn_r50_fpn.py',
+    "../_base_/models/mask-rcnn_r50_fpn.py",
     # 270k iterations with batch_size 64 is roughly equivalent to 144 epochs
-    '../common/ssj_270k_coco-instance.py',
+    "../common/ssj_270k_coco-instance.py",
 ]
 
 image_size = (1024, 1024)
-batch_augments = [
-    dict(type='BatchFixedSizePad', size=image_size, pad_mask=True)
-]
-norm_cfg = dict(type='SyncBN', requires_grad=True)
+batch_augments = [dict(type="BatchFixedSizePad", size=image_size, pad_mask=True)]
+norm_cfg = dict(type="SyncBN", requires_grad=True)
 # Use MMSyncBN that handles empty tensor in head. It can be changed to
 # SyncBN after https://github.com/pytorch/pytorch/issues/36530 is fixed
-head_norm_cfg = dict(type='MMSyncBN', requires_grad=True)
+head_norm_cfg = dict(type="MMSyncBN", requires_grad=True)
 model = dict(
     # the model is trained from scratch, so init_cfg is None
     data_preprocessor=dict(
         # pad_size_divisor=32 is unnecessary in training but necessary
         # in testing.
         pad_size_divisor=32,
-        batch_augments=batch_augments),
-    backbone=dict(
-        frozen_stages=-1, norm_eval=False, norm_cfg=norm_cfg, init_cfg=None),
+        batch_augments=batch_augments,
+    ),
+    backbone=dict(frozen_stages=-1, norm_eval=False, norm_cfg=norm_cfg, init_cfg=None),
     neck=dict(norm_cfg=norm_cfg),
     rpn_head=dict(num_convs=2),  # leads to 0.1+ mAP
     roi_head=dict(
-        bbox_head=dict(
-            type='Shared4Conv1FCBBoxHead',
-            conv_out_channels=256,
-            norm_cfg=head_norm_cfg),
-        mask_head=dict(norm_cfg=head_norm_cfg)))
+        bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, norm_cfg=head_norm_cfg), mask_head=dict(norm_cfg=head_norm_cfg)
+    ),
+)
diff --git a/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-90k_coco.py b/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-90k_coco.py
index abe8962ac69184241e30628242e5313c52f503f4..143dc17fdb034678290469c592eae1ccb1dcef4b 100644
--- a/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-90k_coco.py
+++ b/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-90k_coco.py
@@ -1,4 +1,4 @@
-_base_ = 'mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-270k_coco.py'  # noqa
+_base_ = "mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-270k_coco.py"  # noqa
 
 # training schedule for 90k
 max_iters = 90000
@@ -6,13 +6,6 @@ max_iters = 90000
 # learning rate policy
 # lr steps at [0.9, 0.95, 0.975] of the maximum iterations
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.067, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=90000,
-        by_epoch=False,
-        milestones=[81000, 85500, 87750],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.067, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=90000, by_epoch=False, milestones=[81000, 85500, 87750], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-scp-270k_coco.py b/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-scp-270k_coco.py
index f0ea57d19728d7c563e56d139888059dd9c81317..9502193a7aec7c79927742eff0d0d6f83ab03726 100644
--- a/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-scp-270k_coco.py
+++ b/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-scp-270k_coco.py
@@ -1,31 +1,27 @@
 _base_ = [
-    '../_base_/models/mask-rcnn_r50_fpn.py',
+    "../_base_/models/mask-rcnn_r50_fpn.py",
     # 270k iterations with batch_size 64 is roughly equivalent to 144 epochs
-    '../common/ssj_scp_270k_coco-instance.py'
+    "../common/ssj_scp_270k_coco-instance.py",
 ]
 
 image_size = (1024, 1024)
-batch_augments = [
-    dict(type='BatchFixedSizePad', size=image_size, pad_mask=True)
-]
-norm_cfg = dict(type='SyncBN', requires_grad=True)
+batch_augments = [dict(type="BatchFixedSizePad", size=image_size, pad_mask=True)]
+norm_cfg = dict(type="SyncBN", requires_grad=True)
 # Use MMSyncBN that handles empty tensor in head. It can be changed to
 # SyncBN after https://github.com/pytorch/pytorch/issues/36530 is fixed
-head_norm_cfg = dict(type='MMSyncBN', requires_grad=True)
+head_norm_cfg = dict(type="MMSyncBN", requires_grad=True)
 model = dict(
     # the model is trained from scratch, so init_cfg is None
     data_preprocessor=dict(
         # pad_size_divisor=32 is unnecessary in training but necessary
         # in testing.
         pad_size_divisor=32,
-        batch_augments=batch_augments),
-    backbone=dict(
-        frozen_stages=-1, norm_eval=False, norm_cfg=norm_cfg, init_cfg=None),
+        batch_augments=batch_augments,
+    ),
+    backbone=dict(frozen_stages=-1, norm_eval=False, norm_cfg=norm_cfg, init_cfg=None),
     neck=dict(norm_cfg=norm_cfg),
     rpn_head=dict(num_convs=2),  # leads to 0.1+ mAP
     roi_head=dict(
-        bbox_head=dict(
-            type='Shared4Conv1FCBBoxHead',
-            conv_out_channels=256,
-            norm_cfg=head_norm_cfg),
-        mask_head=dict(norm_cfg=head_norm_cfg)))
+        bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, norm_cfg=head_norm_cfg), mask_head=dict(norm_cfg=head_norm_cfg)
+    ),
+)
diff --git a/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-scp-90k_coco.py b/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-scp-90k_coco.py
index e158b5c05aae3345ba9d4d1a55d1bbb82a789726..1d2a9b7ea764e696c6e510ee6753c72ed525336c 100644
--- a/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-scp-90k_coco.py
+++ b/mmpose/configs/mmdet/simple_copy_paste/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-scp-90k_coco.py
@@ -1,4 +1,4 @@
-_base_ = 'mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-scp-270k_coco.py'  # noqa
+_base_ = "mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_32xb2-ssj-scp-270k_coco.py"  # noqa
 
 # training schedule for 90k
 max_iters = 90000
@@ -6,13 +6,6 @@ max_iters = 90000
 # learning rate policy
 # lr steps at [0.9, 0.95, 0.975] of the maximum iterations
 param_scheduler = [
-    dict(
-        type='LinearLR', start_factor=0.067, by_epoch=False, begin=0, end=500),
-    dict(
-        type='MultiStepLR',
-        begin=0,
-        end=90000,
-        by_epoch=False,
-        milestones=[81000, 85500, 87750],
-        gamma=0.1)
+    dict(type="LinearLR", start_factor=0.067, by_epoch=False, begin=0, end=500),
+    dict(type="MultiStepLR", begin=0, end=90000, by_epoch=False, milestones=[81000, 85500, 87750], gamma=0.1),
 ]
diff --git a/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.01-coco.py b/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.01-coco.py
index 2bd09645598204482e9f88f6baf00d32eba9cab6..5f81720954917f8572b0e2f56334de3ae51b587f 100644
--- a/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.01-coco.py
+++ b/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.01-coco.py
@@ -1,9 +1,8 @@
-_base_ = ['soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.1-coco.py']
+_base_ = ["soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.1-coco.py"]
 
 # 1% coco train2017 is set as labeled dataset
 labeled_dataset = _base_.labeled_dataset
 unlabeled_dataset = _base_.unlabeled_dataset
-labeled_dataset.ann_file = 'semi_anns/instances_train2017.1@1.json'
-unlabeled_dataset.ann_file = 'semi_anns/instances_train2017.1@1-unlabeled.json'
-train_dataloader = dict(
-    dataset=dict(datasets=[labeled_dataset, unlabeled_dataset]))
+labeled_dataset.ann_file = "semi_anns/instances_train2017.1@1.json"
+unlabeled_dataset.ann_file = "semi_anns/instances_train2017.1@1-unlabeled.json"
+train_dataloader = dict(dataset=dict(datasets=[labeled_dataset, unlabeled_dataset]))
diff --git a/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.02-coco.py b/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.02-coco.py
index 8ca38c931926cef33321f931b0c6d5c66824ff55..d64e6866a39f26e418238e55cc203987244f0643 100644
--- a/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.02-coco.py
+++ b/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.02-coco.py
@@ -1,9 +1,8 @@
-_base_ = ['soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.1-coco.py']
+_base_ = ["soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.1-coco.py"]
 
 # 2% coco train2017 is set as labeled dataset
 labeled_dataset = _base_.labeled_dataset
 unlabeled_dataset = _base_.unlabeled_dataset
-labeled_dataset.ann_file = 'semi_anns/instances_train2017.1@2.json'
-unlabeled_dataset.ann_file = 'semi_anns/instances_train2017.1@2-unlabeled.json'
-train_dataloader = dict(
-    dataset=dict(datasets=[labeled_dataset, unlabeled_dataset]))
+labeled_dataset.ann_file = "semi_anns/instances_train2017.1@2.json"
+unlabeled_dataset.ann_file = "semi_anns/instances_train2017.1@2-unlabeled.json"
+train_dataloader = dict(dataset=dict(datasets=[labeled_dataset, unlabeled_dataset]))
diff --git a/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.05-coco.py b/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.05-coco.py
index 750b7ed6df6c91bab8f68f58f339b2f3696fa693..db9f1f904bd3ffe7288fec5bb5df37a4385afe2d 100644
--- a/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.05-coco.py
+++ b/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.05-coco.py
@@ -1,9 +1,8 @@
-_base_ = ['soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.1-coco.py']
+_base_ = ["soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.1-coco.py"]
 
 # 5% coco train2017 is set as labeled dataset
 labeled_dataset = _base_.labeled_dataset
 unlabeled_dataset = _base_.unlabeled_dataset
-labeled_dataset.ann_file = 'semi_anns/instances_train2017.1@5.json'
-unlabeled_dataset.ann_file =
'semi_anns/instances_train2017.1@5-unlabeled.json' -train_dataloader = dict( - dataset=dict(datasets=[labeled_dataset, unlabeled_dataset])) +labeled_dataset.ann_file = "semi_anns/instances_train2017.1@5.json" +unlabeled_dataset.ann_file = "semi_anns/instances_train2017.1@5-unlabeled.json" +train_dataloader = dict(dataset=dict(datasets=[labeled_dataset, unlabeled_dataset])) diff --git a/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.1-coco.py b/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.1-coco.py index 3713aef442f4add55efafde08b2c98da1773bab0..ce071daa31c9cdf8bc9b0a3f35339e6696b6c8c6 100644 --- a/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.1-coco.py +++ b/mmpose/configs/mmdet/soft_teacher/soft-teacher_faster-rcnn_r50-caffe_fpn_180k_semi-0.1-coco.py @@ -1,35 +1,26 @@ -_base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', '../_base_/default_runtime.py', - '../_base_/datasets/semi_coco_detection.py' -] +_base_ = ["../_base_/models/faster-rcnn_r50_fpn.py", "../_base_/default_runtime.py", "../_base_/datasets/semi_coco_detection.py"] detector = _base_.model detector.data_preprocessor = dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32) + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 +) detector.backbone = dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')) + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), +) model = dict( _delete_=True, - type='SoftTeacher', + type="SoftTeacher", detector=detector, - data_preprocessor=dict( - type='MultiBranchDataPreprocessor', - data_preprocessor=detector.data_preprocessor), + data_preprocessor=dict(type="MultiBranchDataPreprocessor", data_preprocessor=detector.data_preprocessor), semi_train_cfg=dict( freeze_teacher=True, sup_weight=1.0, @@ -40,45 +31,34 @@ model = dict( reg_pseudo_thr=0.02, jitter_times=10, jitter_scale=0.06, - min_pseudo_bbox_wh=(1e-2, 1e-2)), - semi_test_cfg=dict(predict_on='teacher')) + min_pseudo_bbox_wh=(1e-2, 1e-2), + ), + semi_test_cfg=dict(predict_on="teacher"), +) # 10% coco train2017 is set as labeled dataset labeled_dataset = _base_.labeled_dataset unlabeled_dataset = _base_.unlabeled_dataset -labeled_dataset.ann_file = 'semi_anns/instances_train2017.1@10.json' -unlabeled_dataset.ann_file = 'semi_anns/' \ - 'instances_train2017.1@10-unlabeled.json' -unlabeled_dataset.data_prefix = dict(img='train2017/') -train_dataloader = dict( - dataset=dict(datasets=[labeled_dataset, unlabeled_dataset])) +labeled_dataset.ann_file = "semi_anns/instances_train2017.1@10.json" +unlabeled_dataset.ann_file = "semi_anns/" "instances_train2017.1@10-unlabeled.json" +unlabeled_dataset.data_prefix = dict(img="train2017/") +train_dataloader = dict(dataset=dict(datasets=[labeled_dataset, unlabeled_dataset])) # training schedule for 180k -train_cfg = dict( - type='IterBasedTrainLoop', max_iters=180000, val_interval=5000) -val_cfg = dict(type='TeacherStudentValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="IterBasedTrainLoop", 
max_iters=180000, val_interval=5000) +val_cfg = dict(type="TeacherStudentValLoop") +test_cfg = dict(type="TestLoop") # learning rate policy param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=180000, - by_epoch=False, - milestones=[120000, 160000], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=180000, by_epoch=False, milestones=[120000, 160000], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) -default_hooks = dict( - checkpoint=dict(by_epoch=False, interval=10000, max_keep_ckpts=2)) +default_hooks = dict(checkpoint=dict(by_epoch=False, interval=10000, max_keep_ckpts=2)) log_processor = dict(by_epoch=False) -custom_hooks = [dict(type='MeanTeacherHook')] +custom_hooks = [dict(type="MeanTeacherHook")] diff --git a/mmpose/configs/mmdet/solo/decoupled-solo-light_r50_fpn_3x_coco.py b/mmpose/configs/mmdet/solo/decoupled-solo-light_r50_fpn_3x_coco.py index fc35df3c3cbbd70532e066de27b06418549eb906..fd246226d77506637878ffa4c029cf08685f082c 100644 --- a/mmpose/configs/mmdet/solo/decoupled-solo-light_r50_fpn_3x_coco.py +++ b/mmpose/configs/mmdet/solo/decoupled-solo-light_r50_fpn_3x_coco.py @@ -1,9 +1,9 @@ -_base_ = './decoupled-solo_r50_fpn_3x_coco.py' +_base_ = "./decoupled-solo_r50_fpn_3x_coco.py" # model settings model = dict( mask_head=dict( - type='DecoupledSOLOLightHead', + type="DecoupledSOLOLightHead", num_classes=80, in_channels=256, stacked_convs=4, @@ -13,36 +13,24 @@ model = dict( pos_scale=0.2, num_grids=[40, 36, 24, 16, 12], cls_down_index=0, - loss_mask=dict( - type='DiceLoss', use_sigmoid=True, activate=False, - loss_weight=3.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - norm_cfg=dict(type='GN', num_groups=32, requires_grad=True))) + loss_mask=dict(type="DiceLoss", use_sigmoid=True, activate=False, loss_weight=3.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), + ) +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomChoiceResize', - scales=[(852, 512), (852, 480), (852, 448), (852, 416), (852, 384), - (852, 352)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomChoiceResize", scales=[(852, 512), (852, 480), (852, 448), (852, 416), (852, 384), (852, 352)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(852, 512), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(852, 512), keep_ratio=True), + 
dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/solo/decoupled-solo_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/solo/decoupled-solo_r50_fpn_1x_coco.py index 6d7f4b90c19d9fdcc3c895deb4101cf7acd7bd8e..d4b4581cb9d4fda6e4f03534812ba29257096a64 100644 --- a/mmpose/configs/mmdet/solo/decoupled-solo_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/solo/decoupled-solo_r50_fpn_1x_coco.py @@ -1,8 +1,8 @@ -_base_ = './solo_r50_fpn_1x_coco.py' +_base_ = "./solo_r50_fpn_1x_coco.py" # model settings model = dict( mask_head=dict( - type='DecoupledSOLOHead', + type="DecoupledSOLOHead", num_classes=80, in_channels=256, stacked_convs=7, @@ -12,13 +12,8 @@ model = dict( pos_scale=0.2, num_grids=[40, 36, 24, 16, 12], cls_down_index=0, - loss_mask=dict( - type='DiceLoss', use_sigmoid=True, activate=False, - loss_weight=3.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - norm_cfg=dict(type='GN', num_groups=32, requires_grad=True))) + loss_mask=dict(type="DiceLoss", use_sigmoid=True, activate=False, loss_weight=3.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), + ) +) diff --git a/mmpose/configs/mmdet/solo/decoupled-solo_r50_fpn_3x_coco.py b/mmpose/configs/mmdet/solo/decoupled-solo_r50_fpn_3x_coco.py index 4a8c19decb72a3d904a277faac06670999f6b322..09b0369faeea27de4ab5664c56e6efe956cac0f5 100644 --- a/mmpose/configs/mmdet/solo/decoupled-solo_r50_fpn_3x_coco.py +++ b/mmpose/configs/mmdet/solo/decoupled-solo_r50_fpn_3x_coco.py @@ -1,9 +1,9 @@ -_base_ = './solo_r50_fpn_3x_coco.py' +_base_ = "./solo_r50_fpn_3x_coco.py" # model settings model = dict( mask_head=dict( - type='DecoupledSOLOHead', + type="DecoupledSOLOHead", num_classes=80, in_channels=256, stacked_convs=7, @@ -13,13 +13,8 @@ model = dict( pos_scale=0.2, num_grids=[40, 36, 24, 16, 12], cls_down_index=0, - loss_mask=dict( - type='DiceLoss', use_sigmoid=True, activate=False, - loss_weight=3.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - norm_cfg=dict(type='GN', num_groups=32, requires_grad=True))) + loss_mask=dict(type="DiceLoss", use_sigmoid=True, activate=False, loss_weight=3.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), + ) +) diff --git a/mmpose/configs/mmdet/solo/solo_r101_fpn_8xb8-lsj-200e_coco.py b/mmpose/configs/mmdet/solo/solo_r101_fpn_8xb8-lsj-200e_coco.py index 0f49c5c1ce67973d15b3fad3ad8c966af8203af7..66097fdf41833dec1e7df4df4a4ebd0fc5ee5b8b 100644 --- a/mmpose/configs/mmdet/solo/solo_r101_fpn_8xb8-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/solo/solo_r101_fpn_8xb8-lsj-200e_coco.py @@ -1,7 +1,3 @@ -_base_ = './solo_r50_fpn_8xb8-lsj-200e_coco.py' +_base_ = "./solo_r50_fpn_8xb8-lsj-200e_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/solo/solo_r18_fpn_8xb8-lsj-200e_coco.py b/mmpose/configs/mmdet/solo/solo_r18_fpn_8xb8-lsj-200e_coco.py index 
977ae54dc28e56802289ac552ce20815b7d1d761..f4fd0162689f4001c953140092cc526b88271e1b 100644 --- a/mmpose/configs/mmdet/solo/solo_r18_fpn_8xb8-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/solo/solo_r18_fpn_8xb8-lsj-200e_coco.py @@ -1,7 +1,6 @@ -_base_ = './solo_r50_fpn_8xb8-lsj-200e_coco.py' +_base_ = "./solo_r50_fpn_8xb8-lsj-200e_coco.py" model = dict( - backbone=dict( - depth=18, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[64, 128, 256, 512])) + backbone=dict(depth=18, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet18")), + neck=dict(in_channels=[64, 128, 256, 512]), +) diff --git a/mmpose/configs/mmdet/solo/solo_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/solo/solo_r50_fpn_1x_coco.py index 595e9ffe148be84dcc3d5c89e5315e8ef3a24477..bd9546479dd07056da7c4b111d7655373172b8bc 100644 --- a/mmpose/configs/mmdet/solo/solo_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/solo/solo_r50_fpn_1x_coco.py @@ -1,33 +1,27 @@ -_base_ = [ - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_instance.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings model = dict( - type='SOLO', + type="SOLO", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=True, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=0, - num_outs=5), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + style="pytorch", + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=0, num_outs=5), mask_head=dict( - type='SOLOHead', + type="SOLOHead", num_classes=80, in_channels=256, stacked_convs=7, @@ -37,26 +31,18 @@ model = dict( pos_scale=0.2, num_grids=[40, 36, 24, 16, 12], cls_down_index=0, - loss_mask=dict(type='DiceLoss', use_sigmoid=True, loss_weight=3.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - norm_cfg=dict(type='GN', num_groups=32, requires_grad=True)), + loss_mask=dict(type="DiceLoss", use_sigmoid=True, loss_weight=3.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), + ), # model training and testing settings test_cfg=dict( - nms_pre=500, - score_thr=0.1, - mask_thr=0.5, - filter_thr=0.05, - kernel='gaussian', # gaussian/linear - sigma=2.0, - max_per_img=100)) + nms_pre=500, score_thr=0.1, mask_thr=0.5, filter_thr=0.05, kernel="gaussian", sigma=2.0, max_per_img=100 # gaussian/linear + ), +) # optimizer optim_wrapper = dict(optimizer=dict(lr=0.01)) -val_evaluator = dict(metric='segm') +val_evaluator = dict(metric="segm") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/solo/solo_r50_fpn_3x_coco.py b/mmpose/configs/mmdet/solo/solo_r50_fpn_3x_coco.py index 0d5abbd2f4d4e1fdc2e3cb92c8e0157188b0aa9a..a84d5de6fcebcf90e1e0632c2e7f625b9bb18cd5 100644 --- a/mmpose/configs/mmdet/solo/solo_r50_fpn_3x_coco.py +++ b/mmpose/configs/mmdet/solo/solo_r50_fpn_3x_coco.py @@ -1,15 +1,11 
@@ -_base_ = './solo_r50_fpn_1x_coco.py' +_base_ = "./solo_r50_fpn_1x_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 800), (1333, 768), (1333, 736), (1333, 704), - (1333, 672), (1333, 640)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomChoiceResize", scales=[(1333, 800), (1333, 768), (1333, 736), (1333, 704), (1333, 672), (1333, 640)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) @@ -19,17 +15,6 @@ train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 3, - by_epoch=False, - begin=0, - end=500), - dict( - type='MultiStepLR', - begin=0, - end=36, - by_epoch=True, - milestones=[27, 33], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 3, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=36, by_epoch=True, milestones=[27, 33], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/solo/solo_r50_fpn_8xb8-lsj-200e_coco.py b/mmpose/configs/mmdet/solo/solo_r50_fpn_8xb8-lsj-200e_coco.py index d46bf391c907707d222756e9450b661b6edd6985..b7f3098ed4194234ceaaa9711165a1394c837c9b 100644 --- a/mmpose/configs/mmdet/solo/solo_r50_fpn_8xb8-lsj-200e_coco.py +++ b/mmpose/configs/mmdet/solo/solo_r50_fpn_8xb8-lsj-200e_coco.py @@ -1,34 +1,31 @@ -_base_ = '../common/lsj-200e_coco-instance.py' +_base_ = "../common/lsj-200e_coco-instance.py" image_size = (1024, 1024) -batch_augments = [dict(type='BatchFixedSizePad', size=image_size)] +batch_augments = [dict(type="BatchFixedSizePad", size=image_size)] # model settings model = dict( - type='SOLO', + type="SOLO", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32, - batch_augments=batch_augments), + batch_augments=batch_augments, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=0, - num_outs=5), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + style="pytorch", + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=0, num_outs=5), mask_head=dict( - type='SOLOHead', + type="SOLOHead", num_classes=80, in_channels=256, stacked_convs=7, @@ -38,32 +35,24 @@ model = dict( pos_scale=0.2, num_grids=[40, 36, 24, 16, 12], cls_down_index=0, - loss_mask=dict(type='DiceLoss', use_sigmoid=True, loss_weight=3.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - norm_cfg=dict(type='GN', num_groups=32, requires_grad=True)), + loss_mask=dict(type="DiceLoss", use_sigmoid=True, loss_weight=3.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), + ), # model training and testing settings test_cfg=dict( - nms_pre=500, - 
score_thr=0.1, - mask_thr=0.5, - filter_thr=0.05, - kernel='gaussian', # gaussian/linear - sigma=2.0, - max_per_img=100)) + nms_pre=500, score_thr=0.1, mask_thr=0.5, filter_thr=0.05, kernel="gaussian", sigma=2.0, max_per_img=100 # gaussian/linear + ), +) train_dataloader = dict(batch_size=8, num_workers=4) # Enable automatic-mixed-precision training with AmpOptimWrapper. optim_wrapper = dict( - type='AmpOptimWrapper', - optimizer=dict( - type='SGD', lr=0.01 * 4, momentum=0.9, weight_decay=0.00004), - clip_grad=dict(max_norm=35, norm_type=2)) + type="AmpOptimWrapper", + optimizer=dict(type="SGD", lr=0.01 * 4, momentum=0.9, weight_decay=0.00004), + clip_grad=dict(max_norm=35, norm_type=2), +) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/solov2/solov2-light_r18_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/solov2/solov2-light_r18_fpn_ms-3x_coco.py index f8fc53e0aed9dd4479f9cd8dcc98ca61db2e50bf..99508b2cbe5e62d001f18560f5c1da0076f0516e 100644 --- a/mmpose/configs/mmdet/solov2/solov2-light_r18_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/solov2/solov2-light_r18_fpn_ms-3x_coco.py @@ -1,7 +1,4 @@ -_base_ = './solov2-light_r50_fpn_ms-3x_coco.py' +_base_ = "./solov2-light_r50_fpn_ms-3x_coco.py" # model settings -model = dict( - backbone=dict( - depth=18, init_cfg=dict(checkpoint='torchvision://resnet18')), - neck=dict(in_channels=[64, 128, 256, 512])) +model = dict(backbone=dict(depth=18, init_cfg=dict(checkpoint="torchvision://resnet18")), neck=dict(in_channels=[64, 128, 256, 512])) diff --git a/mmpose/configs/mmdet/solov2/solov2-light_r34_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/solov2/solov2-light_r34_fpn_ms-3x_coco.py index 149b336655349c70233e78d03f72d7ee3f1a75f3..6e9f64dccc5567b3b0be51f84c9a19adc157afef 100644 --- a/mmpose/configs/mmdet/solov2/solov2-light_r34_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/solov2/solov2-light_r34_fpn_ms-3x_coco.py @@ -1,7 +1,4 @@ -_base_ = './solov2-light_r50_fpn_ms-3x_coco.py' +_base_ = "./solov2-light_r50_fpn_ms-3x_coco.py" # model settings -model = dict( - backbone=dict( - depth=34, init_cfg=dict(checkpoint='torchvision://resnet34')), - neck=dict(in_channels=[64, 128, 256, 512])) +model = dict(backbone=dict(depth=34, init_cfg=dict(checkpoint="torchvision://resnet34")), neck=dict(in_channels=[64, 128, 256, 512])) diff --git a/mmpose/configs/mmdet/solov2/solov2-light_r50-dcn_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/solov2/solov2-light_r50-dcn_fpn_ms-3x_coco.py index 05391944b683985ab975dc8f66be0c8a12f7d255..e24578ed24bff0e6a4e02049a8b44bc809b25b07 100644 --- a/mmpose/configs/mmdet/solov2/solov2-light_r50-dcn_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/solov2/solov2-light_r50-dcn_fpn_ms-3x_coco.py @@ -1,14 +1,14 @@ -_base_ = './solov2-light_r50_fpn_ms-3x_coco.py' +_base_ = "./solov2-light_r50_fpn_ms-3x_coco.py" # model settings model = dict( - backbone=dict( - dcn=dict(type='DCNv2', deformable_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True)), + backbone=dict(dcn=dict(type="DCNv2", deformable_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True)), mask_head=dict( feat_channels=256, stacked_convs=3, scale_ranges=((1, 64), (32, 128), (64, 256), (128, 512), (256, 2048)), mask_feature_head=dict(out_channels=128), - dcn_cfg=dict(type='DCNv2'), - dcn_apply_to_all_conv=False)) # light solov2 head + dcn_cfg=dict(type="DCNv2"), + dcn_apply_to_all_conv=False, + ), +) # light solov2 head diff --git 
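A hedged reading of `lr=0.01 * 4` in the SOLO LSJ config above: the `auto_scale_lr` note refers to the linear scaling rule, under which the learning rate grows with total batch size. Assuming the common reference point of lr=0.01 at a total batch of 16 (an assumption, not stated in this diff), the "8xb8" setup of 8 GPUs with 8 images each is a total batch of 64 and hence a 4x rate:

```python
def scale_lr(base_lr: float, base_batch: int, num_gpus: int, per_gpu_batch: int) -> float:
    """Linear LR scaling: the rate grows proportionally with total batch size."""
    return base_lr * (num_gpus * per_gpu_batch) / base_batch

# 8 GPUs x 8 images each = total batch 64 -> 4x the batch-16 reference rate
print(scale_lr(0.01, base_batch=16, num_gpus=8, per_gpu_batch=8))  # 0.04 == 0.01 * 4
```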
a/mmpose/configs/mmdet/solov2/solov2-light_r50_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/solov2/solov2-light_r50_fpn_ms-3x_coco.py index cf0a7f779c0f587d11c86a31aca19b2663f79a57..bed064f40ccd3f39c04cf3d648e7459a71fa6a1a 100644 --- a/mmpose/configs/mmdet/solov2/solov2-light_r50_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/solov2/solov2-light_r50_fpn_ms-3x_coco.py @@ -1,4 +1,4 @@ -_base_ = './solov2_r50_fpn_1x_coco.py' +_base_ = "./solov2_r50_fpn_1x_coco.py" # model settings model = dict( @@ -6,28 +6,23 @@ model = dict( stacked_convs=2, feat_channels=256, scale_ranges=((1, 56), (28, 112), (56, 224), (112, 448), (224, 896)), - mask_feature_head=dict(out_channels=128))) + mask_feature_head=dict(out_channels=128), + ) +) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomChoiceResize', - scales=[(768, 512), (768, 480), (768, 448), (768, 416), (768, 384), - (768, 352)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomChoiceResize", scales=[(768, 512), (768, 480), (768, 448), (768, 416), (768, 384), (768, 352)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(448, 768), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(448, 768), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) @@ -40,17 +35,6 @@ train_cfg = dict(by_epoch=True, max_epochs=max_epochs) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 3, - by_epoch=False, - begin=0, - end=500), - dict( - type='MultiStepLR', - begin=0, - end=36, - by_epoch=True, - milestones=[27, 33], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 3, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=36, by_epoch=True, milestones=[27, 33], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/solov2/solov2_r101-dcn_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/solov2/solov2_r101-dcn_fpn_ms-3x_coco.py index 370a4eb7db811b285cc55282e4b66360ca338a31..67853880b8bb708f9614592c32546ac392f4d29d 100644 --- a/mmpose/configs/mmdet/solov2/solov2_r101-dcn_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/solov2/solov2_r101-dcn_fpn_ms-3x_coco.py @@ -1,13 +1,12 @@ -_base_ = './solov2_r50_fpn_ms-3x_coco.py' +_base_ = "./solov2_r50_fpn_ms-3x_coco.py" # model settings model = dict( backbone=dict( depth=101, - init_cfg=dict(checkpoint='torchvision://resnet101'), - dcn=dict(type='DCNv2', deformable_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True)), - mask_head=dict( - mask_feature_head=dict(conv_cfg=dict(type='DCNv2')), - dcn_cfg=dict(type='DCNv2'), - dcn_apply_to_all_conv=True)) + init_cfg=dict(checkpoint="torchvision://resnet101"), + dcn=dict(type="DCNv2", deformable_groups=1, 
fallback_on_stride=False), + stage_with_dcn=(False, True, True, True), + ), + mask_head=dict(mask_feature_head=dict(conv_cfg=dict(type="DCNv2")), dcn_cfg=dict(type="DCNv2"), dcn_apply_to_all_conv=True), +) diff --git a/mmpose/configs/mmdet/solov2/solov2_r101_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/solov2/solov2_r101_fpn_ms-3x_coco.py index 96aaac0a7c2689a125ac0a68edaff2a76dfc773d..98bf203c6930cd0f1735f21c246cab6ae195a713 100644 --- a/mmpose/configs/mmdet/solov2/solov2_r101_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/solov2/solov2_r101_fpn_ms-3x_coco.py @@ -1,6 +1,4 @@ -_base_ = './solov2_r50_fpn_ms-3x_coco.py' +_base_ = "./solov2_r50_fpn_ms-3x_coco.py" # model settings -model = dict( - backbone=dict( - depth=101, init_cfg=dict(checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/solov2/solov2_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/solov2/solov2_r50_fpn_1x_coco.py index 138ca010b5f3f96a4f296ffbe66cb1be3add7ec2..267a32d5def333f98dd3e12cfa2c5b1484989bed 100644 --- a/mmpose/configs/mmdet/solov2/solov2_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/solov2/solov2_r50_fpn_1x_coco.py @@ -1,34 +1,28 @@ -_base_ = [ - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_instance.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings model = dict( - type='SOLOv2', + type="SOLOv2", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=True, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=0, - num_outs=5), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + style="pytorch", + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=0, num_outs=5), mask_head=dict( - type='SOLOV2Head', + type="SOLOV2Head", num_classes=80, in_channels=256, feat_channels=512, @@ -44,27 +38,19 @@ model = dict( end_level=3, out_channels=256, mask_stride=4, - norm_cfg=dict(type='GN', num_groups=32, requires_grad=True)), - loss_mask=dict(type='DiceLoss', use_sigmoid=True, loss_weight=3.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0)), + norm_cfg=dict(type="GN", num_groups=32, requires_grad=True), + ), + loss_mask=dict(type="DiceLoss", use_sigmoid=True, loss_weight=3.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + ), # model training and testing settings test_cfg=dict( - nms_pre=500, - score_thr=0.1, - mask_thr=0.5, - filter_thr=0.05, - kernel='gaussian', # gaussian/linear - sigma=2.0, - max_per_img=100)) + nms_pre=500, score_thr=0.1, mask_thr=0.5, filter_thr=0.05, kernel="gaussian", sigma=2.0, max_per_img=100 # gaussian/linear + ), +) # optimizer -optim_wrapper = dict( - optimizer=dict(lr=0.01), clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(optimizer=dict(lr=0.01), clip_grad=dict(max_norm=35, norm_type=2)) -val_evaluator = dict(metric='segm') +val_evaluator = 
dict(metric="segm") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/solov2/solov2_r50_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/solov2/solov2_r50_fpn_ms-3x_coco.py index d6f09827efbe4e135a784b0808604dbc855ed47e..e7552d63791e1ee0950ffaca72b86cb0ac42465d 100644 --- a/mmpose/configs/mmdet/solov2/solov2_r50_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/solov2/solov2_r50_fpn_ms-3x_coco.py @@ -1,15 +1,11 @@ -_base_ = './solov2_r50_fpn_1x_coco.py' +_base_ = "./solov2_r50_fpn_1x_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 800), (1333, 768), (1333, 736), (1333, 704), - (1333, 672), (1333, 640)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomChoiceResize", scales=[(1333, 800), (1333, 768), (1333, 736), (1333, 704), (1333, 672), (1333, 640)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) @@ -19,17 +15,6 @@ train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 3, - by_epoch=False, - begin=0, - end=500), - dict( - type='MultiStepLR', - begin=0, - end=36, - by_epoch=True, - milestones=[27, 33], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 3, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=36, by_epoch=True, milestones=[27, 33], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/solov2/solov2_x101-dcn_fpn_ms-3x_coco.py b/mmpose/configs/mmdet/solov2/solov2_x101-dcn_fpn_ms-3x_coco.py index 612c45eb437efc481948edb660ef1a3eebbcfebe..eaab13b06f9dcb6318c9074e15a7b078b8d0ffab 100644 --- a/mmpose/configs/mmdet/solov2/solov2_x101-dcn_fpn_ms-3x_coco.py +++ b/mmpose/configs/mmdet/solov2/solov2_x101-dcn_fpn_ms-3x_coco.py @@ -1,17 +1,15 @@ -_base_ = './solov2_r50_fpn_ms-3x_coco.py' +_base_ = "./solov2_r50_fpn_ms-3x_coco.py" # model settings model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, - dcn=dict(type='DCNv2', deformable_groups=1, fallback_on_stride=False), + dcn=dict(type="DCNv2", deformable_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')), - mask_head=dict( - mask_feature_head=dict(conv_cfg=dict(type='DCNv2')), - dcn_cfg=dict(type='DCNv2'), - dcn_apply_to_all_conv=True)) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ), + mask_head=dict(mask_feature_head=dict(conv_cfg=dict(type="DCNv2")), dcn_cfg=dict(type="DCNv2"), dcn_apply_to_all_conv=True), +) diff --git a/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py b/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py index f1d5b72ce3fff73504a0c032867d246bc4e30123..a31968c3526dddab887ba9d5bb5062aed01676f5 100644 --- a/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py +++ b/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py @@ -1,41 +1,24 @@ -_base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - 
'../_base_/datasets/mot_challenge_det.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/models/faster-rcnn_r50_fpn.py", "../_base_/datasets/mot_challenge_det.py", "../_base_/default_runtime.py"] model = dict( - rpn_head=dict( - bbox_coder=dict(clip_border=False), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), - roi_head=dict( - bbox_head=dict( - num_classes=1, - bbox_coder=dict(clip_border=False), - loss_bbox=dict(type='SmoothL1Loss', loss_weight=1.0))), + rpn_head=dict(bbox_coder=dict(clip_border=False), loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0)), + roi_head=dict(bbox_head=dict(num_classes=1, bbox_coder=dict(clip_border=False), loss_bbox=dict(type="SmoothL1Loss", loss_weight=1.0))), init_cfg=dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth' # noqa: E501 - )) + type="Pretrained", + checkpoint="http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth", # noqa: E251 # noqa: E501 + ), +) # training schedule for 4e -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=4, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=4, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate param_scheduler = [ - dict(type='LinearLR', start_factor=0.01, by_epoch=False, begin=0, end=100), - dict( - type='MultiStepLR', - begin=0, - end=4, - by_epoch=True, - milestones=[3], - gamma=0.1) + dict(type="LinearLR", start_factor=0.01, by_epoch=False, begin=0, end=100), + dict(type="MultiStepLR", begin=0, end=4, by_epoch=True, milestones=[3], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17train.py b/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17train.py index 83647061c7f59dc8a6e8d033cdb8dc81de648df4..fc48a9e69c37fab97cc6082293728c92a8c31566 100644 --- a/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17train.py +++ b/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17train.py @@ -1,11 +1,9 @@ -_base_ = ['./faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval'] +_base_ = ["./faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval"] # data -data_root = 'data/MOT17/' -train_dataloader = dict( - dataset=dict(ann_file='annotations/train_cocoformat.json')) -val_dataloader = dict( - dataset=dict(ann_file='annotations/train_cocoformat.json')) +data_root = "data/MOT17/" +train_dataloader = dict(dataset=dict(ann_file="annotations/train_cocoformat.json")) +val_dataloader = dict(dataset=dict(ann_file="annotations/train_cocoformat.json")) test_dataloader = val_dataloader -val_evaluator = dict(ann_file=data_root + 'annotations/train_cocoformat.json') +val_evaluator = dict(ann_file=data_root + "annotations/train_cocoformat.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-8e_mot20halftrain_test-mot20halfval.py 
b/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-8e_mot20halftrain_test-mot20halfval.py index a6d14ad8be2a939bce168f4f09f08dde50f140c8..54fa708ecfcf6f3d90622ac2dbd091613ae38be5 100644 --- a/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-8e_mot20halftrain_test-mot20halfval.py +++ b/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-8e_mot20halftrain_test-mot20halfval.py @@ -1,29 +1,21 @@ -_base_ = ['./faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval'] +_base_ = ["./faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval"] model = dict( - rpn_head=dict(bbox_coder=dict(clip_border=True)), - roi_head=dict( - bbox_head=dict(bbox_coder=dict(clip_border=True), num_classes=1))) + rpn_head=dict(bbox_coder=dict(clip_border=True)), roi_head=dict(bbox_head=dict(bbox_coder=dict(clip_border=True), num_classes=1)) +) # data -data_root = 'data/MOT20/' +data_root = "data/MOT20/" train_dataloader = dict(dataset=dict(data_root=data_root)) val_dataloader = dict(dataset=dict(data_root=data_root)) test_dataloader = val_dataloader -val_evaluator = dict(ann_file=data_root + - 'annotations/half-val_cocoformat.json') +val_evaluator = dict(ann_file=data_root + "annotations/half-val_cocoformat.json") test_evaluator = val_evaluator # training schedule for 8e -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=8, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=8, val_interval=1) # learning rate param_scheduler = [ - dict(type='LinearLR', start_factor=0.01, by_epoch=False, begin=0, end=100), - dict( - type='MultiStepLR', - begin=0, - end=8, - by_epoch=True, - milestones=[6], - gamma=0.1) + dict(type="LinearLR", start_factor=0.01, by_epoch=False, begin=0, end=100), + dict(type="MultiStepLR", begin=0, end=8, by_epoch=True, milestones=[6], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-8e_mot20train_test-mot20train.py b/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-8e_mot20train_test-mot20train.py index 85c859732cb3e4742d3003d555f72f4cc7ac2e05..a80c125984bab3a5112351f9095212a4c0bef682 100644 --- a/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-8e_mot20train_test-mot20train.py +++ b/mmpose/configs/mmdet/sort/faster-rcnn_r50_fpn_8xb2-8e_mot20train_test-mot20train.py @@ -1,32 +1,21 @@ -_base_ = ['./faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval'] +_base_ = ["./faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval"] model = dict( - rpn_head=dict(bbox_coder=dict(clip_border=True)), - roi_head=dict( - bbox_head=dict(bbox_coder=dict(clip_border=True), num_classes=1))) + rpn_head=dict(bbox_coder=dict(clip_border=True)), roi_head=dict(bbox_head=dict(bbox_coder=dict(clip_border=True), num_classes=1)) +) # data -data_root = 'data/MOT20/' -train_dataloader = dict( - dataset=dict( - data_root=data_root, ann_file='annotations/train_cocoformat.json')) -val_dataloader = dict( - dataset=dict( - data_root=data_root, ann_file='annotations/train_cocoformat.json')) +data_root = "data/MOT20/" +train_dataloader = dict(dataset=dict(data_root=data_root, ann_file="annotations/train_cocoformat.json")) +val_dataloader = dict(dataset=dict(data_root=data_root, ann_file="annotations/train_cocoformat.json")) test_dataloader = val_dataloader -val_evaluator = dict(ann_file=data_root + 'annotations/train_cocoformat.json') +val_evaluator = dict(ann_file=data_root + "annotations/train_cocoformat.json") test_evaluator = val_evaluator # training schedule for 8e -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=8, val_interval=1) +train_cfg 
= dict(type="EpochBasedTrainLoop", max_epochs=8, val_interval=1) # learning rate param_scheduler = [ - dict(type='LinearLR', start_factor=0.01, by_epoch=False, begin=0, end=100), - dict( - type='MultiStepLR', - begin=0, - end=8, - by_epoch=True, - milestones=[6], - gamma=0.1) + dict(type="LinearLR", start_factor=0.01, by_epoch=False, begin=0, end=100), + dict(type="MultiStepLR", begin=0, end=8, by_epoch=True, milestones=[6], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py b/mmpose/configs/mmdet/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py index 78acb774ec22b7555e633b541c21fe20beb75ce9..0cf0764715b46d9fd74357635590c4bb34a86f66 100644 --- a/mmpose/configs/mmdet/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py +++ b/mmpose/configs/mmdet/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py @@ -1,54 +1,44 @@ -_base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', - '../_base_/datasets/mot_challenge.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/models/faster-rcnn_r50_fpn.py", "../_base_/datasets/mot_challenge.py", "../_base_/default_runtime.py"] -default_hooks = dict( - logger=dict(type='LoggerHook', interval=1), - visualization=dict(type='TrackVisualizationHook', draw=False)) +default_hooks = dict(logger=dict(type="LoggerHook", interval=1), visualization=dict(type="TrackVisualizationHook", draw=False)) -vis_backends = [dict(type='LocalVisBackend')] -visualizer = dict( - type='TrackLocalVisualizer', vis_backends=vis_backends, name='visualizer') +vis_backends = [dict(type="LocalVisBackend")] +visualizer = dict(type="TrackLocalVisualizer", vis_backends=vis_backends, name="visualizer") # custom hooks custom_hooks = [ # Synchronize model buffers such as running_mean and running_var in BN # at the end of each epoch - dict(type='SyncBuffersHook') + dict(type="SyncBuffersHook") ] detector = _base_.model -detector.pop('data_preprocessor') +detector.pop("data_preprocessor") detector.rpn_head.bbox_coder.update(dict(clip_border=False)) detector.roi_head.bbox_head.update(dict(num_classes=1)) detector.roi_head.bbox_head.bbox_coder.update(dict(clip_border=False)) -detector['init_cfg'] = dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmtracking/mot/' - 'faster_rcnn/faster-rcnn_r50_fpn_4e_mot17-half-64ee2ed4.pth') # noqa: E501 +detector["init_cfg"] = dict( + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmtracking/mot/" "faster_rcnn/faster-rcnn_r50_fpn_4e_mot17-half-64ee2ed4.pth", # noqa: E251 +) # noqa: E501 del _base_.model model = dict( - type='DeepSORT', + type="DeepSORT", data_preprocessor=dict( - type='TrackDataPreprocessor', + type="TrackDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, rgb_to_bgr=False, - pad_size_divisor=32), + pad_size_divisor=32, + ), detector=detector, - tracker=dict( - type='SORTTracker', - motion=dict(type='KalmanFilter', center_only=False), - obj_score_thr=0.5, - match_iou_thr=0.5, - reid=None)) + tracker=dict(type="SORTTracker", motion=dict(type="KalmanFilter", center_only=False), obj_score_thr=0.5, match_iou_thr=0.5, reid=None), +) train_dataloader = None train_cfg = None -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") diff --git a/mmpose/configs/mmdet/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17test.py 
b/mmpose/configs/mmdet/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17test.py index 921652c4430ccf63cd5850884b2a064e8dc73251..db0420c1f9b39ec50c9e0c62cbda280052a9e0b6 100644 --- a/mmpose/configs/mmdet/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17test.py +++ b/mmpose/configs/mmdet/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17train_test-mot17test.py @@ -1,15 +1,8 @@ -_base_ = [ - './sort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain' - '_test-mot17halfval.py' -] +_base_ = ["./sort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain" "_test-mot17halfval.py"] # dataloader -val_dataloader = dict( - dataset=dict(ann_file='annotations/train_cocoformat.json')) -test_dataloader = dict( - dataset=dict( - ann_file='annotations/test_cocoformat.json', - data_prefix=dict(img_path='test'))) +val_dataloader = dict(dataset=dict(ann_file="annotations/train_cocoformat.json")) +test_dataloader = dict(dataset=dict(ann_file="annotations/test_cocoformat.json", data_prefix=dict(img_path="test"))) # evaluator -test_evaluator = dict(format_only=True, outfile_prefix='./mot_17_test_res') +test_evaluator = dict(format_only=True, outfile_prefix="./mot_17_test_res") diff --git a/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco.py b/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco.py index 09c11c6565ea2444fe8ffc930ca49fbffff3e8fa..cc0d1025fe3e2f60d30af683be9478d2968bc160 100644 --- a/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco.py +++ b/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r101_fpn_300-proposals_crop-ms-480-800-3x_coco.py @@ -1,7 +1,3 @@ -_base_ = './sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py' +_base_ = "./sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r101_fpn_ms-480-800-3x_coco.py b/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r101_fpn_ms-480-800-3x_coco.py index a51f11ce5b6d55b2037461a93aa2bd18c8f2639d..55848ce89c9598417e4ff0264c9e3d692c71955c 100644 --- a/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r101_fpn_ms-480-800-3x_coco.py +++ b/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r101_fpn_ms-480-800-3x_coco.py @@ -1,7 +1,3 @@ -_base_ = './sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py' +_base_ = "./sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_1x_coco.py index 88354427b4138f4f5587f2a4a047bad654693780..bfd048b10f4c5365f176001d5b560325150247a2 100644 --- a/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_1x_coco.py @@ -1,51 +1,38 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] num_stages = 6 num_proposals = 100 model = dict( - 
type='SparseRCNN', + type="SparseRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=0, - add_extra_convs='on_input', - num_outs=4), - rpn_head=dict( - type='EmbeddingRPNHead', - num_proposals=num_proposals, - proposal_feature_channel=256), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=0, add_extra_convs="on_input", num_outs=4), + rpn_head=dict(type="EmbeddingRPNHead", num_proposals=num_proposals, proposal_feature_channel=256), roi_head=dict( - type='SparseRoIHead', + type="SparseRoIHead", num_stages=num_stages, stage_loss_weights=[1] * num_stages, proposal_feature_channel=256, bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=2), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=[ dict( - type='DIIHead', + type="DIIHead", num_classes=80, num_ffn_fcs=2, num_heads=8, @@ -54,48 +41,47 @@ model = dict( feedforward_channels=2048, in_channels=256, dropout=0.0, - ffn_act_cfg=dict(type='ReLU', inplace=True), + ffn_act_cfg=dict(type="ReLU", inplace=True), dynamic_conv_cfg=dict( - type='DynamicConv', + type="DynamicConv", in_channels=256, feat_channels=64, out_channels=256, input_feat_shape=7, - act_cfg=dict(type='ReLU', inplace=True), - norm_cfg=dict(type='LN')), - loss_bbox=dict(type='L1Loss', loss_weight=5.0), - loss_iou=dict(type='GIoULoss', loss_weight=2.0), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=2.0), + act_cfg=dict(type="ReLU", inplace=True), + norm_cfg=dict(type="LN"), + ), + loss_bbox=dict(type="L1Loss", loss_weight=5.0), + loss_iou=dict(type="GIoULoss", loss_weight=2.0), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=2.0), bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - clip_border=False, - target_means=[0., 0., 0., 0.], - target_stds=[0.5, 0.5, 1., 1.])) for _ in range(num_stages) - ]), + type="DeltaXYWHBBoxCoder", clip_border=False, target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.5, 0.5, 1.0, 1.0] + ), + ) + for _ in range(num_stages) + ], + ), # training and testing settings train_cfg=dict( rpn=None, rcnn=[ dict( assigner=dict( - type='HungarianAssigner', + type="HungarianAssigner", match_costs=[ - dict(type='FocalLossCost', weight=2.0), - dict(type='BBoxL1Cost', weight=5.0, box_format='xyxy'), - dict(type='IoUCost', iou_mode='giou', weight=2.0) - ]), - sampler=dict(type='PseudoSampler'), - pos_weight=1) for _ in range(num_stages) - ]), - test_cfg=dict(rpn=None, rcnn=dict(max_per_img=num_proposals))) + dict(type="FocalLossCost", weight=2.0), + dict(type="BBoxL1Cost", weight=5.0, 
box_format="xyxy"), + dict(type="IoUCost", iou_mode="giou", weight=2.0), + ], + ), + sampler=dict(type="PseudoSampler"), + pos_weight=1, + ) + for _ in range(num_stages) + ], + ), + test_cfg=dict(rpn=None, rcnn=dict(max_per_img=num_proposals)), +) # optimizer -optim_wrapper = dict( - optimizer=dict( - _delete_=True, type='AdamW', lr=0.000025, weight_decay=0.0001), - clip_grad=dict(max_norm=1, norm_type=2)) +optim_wrapper = dict(optimizer=dict(_delete_=True, type="AdamW", lr=0.000025, weight_decay=0.0001), clip_grad=dict(max_norm=1, norm_type=2)) diff --git a/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py b/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py index 93edc0314b510c635f703f82e39c446ed056c6ea..31470bb2e7824e63d6de8ed59f1d8c44a331d2ff 100644 --- a/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py +++ b/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_300-proposals_crop-ms-480-800-3x_coco.py @@ -1,43 +1,57 @@ -_base_ = './sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py' +_base_ = "./sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py" num_proposals = 300 -model = dict( - rpn_head=dict(num_proposals=num_proposals), - test_cfg=dict( - _delete_=True, rpn=None, rcnn=dict(max_per_img=num_proposals))) +model = dict(rpn_head=dict(num_proposals=num_proposals), test_cfg=dict(_delete_=True, rpn=None, rcnn=dict(max_per_img=num_proposals))) # augmentation strategy originates from DETR. train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', - transforms=[[ - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoice", + transforms=[ + [ + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) + ], + [ + dict(type="RandomChoiceResize", scales=[(400, 1333), (500, 1333), (600, 1333)], keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], ], - [ - dict( - type='RandomChoiceResize', - scales=[(400, 1333), (500, 1333), (600, 1333)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), - (576, 1333), (608, 1333), (640, 1333), - (672, 1333), (704, 1333), (736, 1333), - (768, 1333), (800, 1333)], - keep_ratio=True) - ]]), - dict(type='PackDetInputs') + ), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py 
b/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py index 156028d7cdd22c32c00a765c6cf86b8f9e2df48b..40de30167b3446b5896fbea8060fb8fd71546471 100644 --- a/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py +++ b/mmpose/configs/mmdet/sparse_rcnn/sparse-rcnn_r50_fpn_ms-480-800-3x_coco.py @@ -1,32 +1,36 @@ -_base_ = './sparse-rcnn_r50_fpn_1x_coco.py' +_base_ = "./sparse-rcnn_r50_fpn_1x_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) # learning policy max_epochs = 36 -train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=max_epochs) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[27, 33], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[27, 33], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/ssd/ssd300_coco.py b/mmpose/configs/mmdet/ssd/ssd300_coco.py index 796d25c905350a8ed263b9cd1d2f8027b8c9a3ca..416fd2693b13c6f376b6d576dcaef29d899dbaa7 100644 --- a/mmpose/configs/mmdet/ssd/ssd300_coco.py +++ b/mmpose/configs/mmdet/ssd/ssd300_coco.py @@ -1,40 +1,32 @@ _base_ = [ - '../_base_/models/ssd300.py', '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py' + "../_base_/models/ssd300.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_2x.py", + "../_base_/default_runtime.py", ] # dataset settings input_size = 300 train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='Expand', + type="Expand", mean={{_base_.model.data_preprocessor.mean}}, to_rgb={{_base_.model.data_preprocessor.bgr_to_rgb}}, - ratio_range=(1, 4)), - dict( - type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='RandomFlip', prob=0.5), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict(type='PackDetInputs') + ratio_range=(1, 4), + ), + dict(type="MinIoURandomCrop", min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + 
dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=8, @@ -42,28 +34,26 @@ train_dataloader = dict( batch_sampler=None, dataset=dict( _delete_=True, - type='RepeatDataset', + type="RepeatDataset", times=5, dataset=dict( type={{_base_.dataset_type}}, data_root={{_base_.data_root}}, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args={{_base_.backend_args}}))) + backend_args={{_base_.backend_args}}, + ), + ), +) val_dataloader = dict(batch_size=8, dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=2e-3, momentum=0.9, weight_decay=5e-4)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=2e-3, momentum=0.9, weight_decay=5e-4)) -custom_hooks = [ - dict(type='NumClassCheckHook'), - dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW') -] +custom_hooks = [dict(type="NumClassCheckHook"), dict(type="CheckInvalidLossHook", interval=50, priority="VERY_LOW")] # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
diff --git a/mmpose/configs/mmdet/ssd/ssd512_coco.py b/mmpose/configs/mmdet/ssd/ssd512_coco.py index 7acd6144202e8fee232e3ed49a557d3cf7c53e15..437c4059e4098e5a87e07d72e1a75a04685b98eb 100644 --- a/mmpose/configs/mmdet/ssd/ssd512_coco.py +++ b/mmpose/configs/mmdet/ssd/ssd512_coco.py @@ -1,54 +1,45 @@ -_base_ = 'ssd300_coco.py' +_base_ = "ssd300_coco.py" # model settings input_size = 512 model = dict( neck=dict( - out_channels=(512, 1024, 512, 256, 256, 256, 256), - level_strides=(2, 2, 2, 2, 1), - level_paddings=(1, 1, 1, 1, 1), - last_kernel_size=4), + out_channels=(512, 1024, 512, 256, 256, 256, 256), level_strides=(2, 2, 2, 2, 1), level_paddings=(1, 1, 1, 1, 1), last_kernel_size=4 + ), bbox_head=dict( in_channels=(512, 1024, 512, 256, 256, 256, 256), anchor_generator=dict( - type='SSDAnchorGenerator', + type="SSDAnchorGenerator", scale_major=False, input_size=input_size, basesize_ratio_range=(0.1, 0.9), strides=[8, 16, 32, 64, 128, 256, 512], - ratios=[[2], [2, 3], [2, 3], [2, 3], [2, 3], [2], [2]]))) + ratios=[[2], [2, 3], [2, 3], [2, 3], [2, 3], [2], [2]], + ), + ), +) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), dict( - type='Expand', + type="Expand", mean={{_base_.model.data_preprocessor.mean}}, to_rgb={{_base_.model.data_preprocessor.bgr_to_rgb}}, - ratio_range=(1, 4)), - dict( - type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='RandomFlip', prob=0.5), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict(type='PackDetInputs') + ratio_range=(1, 4), + ), + dict(type="MinIoURandomCrop", min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict(dataset=dict(dataset=dict(pipeline=train_pipeline))) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) diff --git a/mmpose/configs/mmdet/ssd/ssdlite_mobilenetv2-scratch_8xb24-600e_coco.py b/mmpose/configs/mmdet/ssd/ssdlite_mobilenetv2-scratch_8xb24-600e_coco.py index 4e508f20ecf33e58ddfe6ff8ee94f516d3e03f79..c21ed6c1404adaf0b9ae440aa5c61495da5533c7 100644 --- a/mmpose/configs/mmdet/ssd/ssdlite_mobilenetv2-scratch_8xb24-600e_coco.py +++ b/mmpose/configs/mmdet/ssd/ssdlite_mobilenetv2-scratch_8xb24-600e_coco.py @@ -1,111 +1,81 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - 
'../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings data_preprocessor = dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1) + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 +) model = dict( - type='SingleStageDetector', + type="SingleStageDetector", data_preprocessor=data_preprocessor, backbone=dict( - type='MobileNetV2', + type="MobileNetV2", out_indices=(4, 7), - norm_cfg=dict(type='BN', eps=0.001, momentum=0.03), - init_cfg=dict(type='TruncNormal', layer='Conv2d', std=0.03)), + norm_cfg=dict(type="BN", eps=0.001, momentum=0.03), + init_cfg=dict(type="TruncNormal", layer="Conv2d", std=0.03), + ), neck=dict( - type='SSDNeck', + type="SSDNeck", in_channels=(96, 1280), out_channels=(96, 1280, 512, 256, 256, 128), level_strides=(2, 2, 2, 2), level_paddings=(1, 1, 1, 1), l2_norm_scale=None, use_depthwise=True, - norm_cfg=dict(type='BN', eps=0.001, momentum=0.03), - act_cfg=dict(type='ReLU6'), - init_cfg=dict(type='TruncNormal', layer='Conv2d', std=0.03)), + norm_cfg=dict(type="BN", eps=0.001, momentum=0.03), + act_cfg=dict(type="ReLU6"), + init_cfg=dict(type="TruncNormal", layer="Conv2d", std=0.03), + ), bbox_head=dict( - type='SSDHead', + type="SSDHead", in_channels=(96, 1280, 512, 256, 256, 128), num_classes=80, use_depthwise=True, - norm_cfg=dict(type='BN', eps=0.001, momentum=0.03), - act_cfg=dict(type='ReLU6'), - init_cfg=dict(type='Normal', layer='Conv2d', std=0.001), - + norm_cfg=dict(type="BN", eps=0.001, momentum=0.03), + act_cfg=dict(type="ReLU6"), + init_cfg=dict(type="Normal", layer="Conv2d", std=0.001), # set anchor size manually instead of using the predefined # SSD300 setting. 
anchor_generator=dict( - type='SSDAnchorGenerator', + type="SSDAnchorGenerator", scale_major=False, strides=[16, 32, 64, 107, 160, 320], ratios=[[2, 3], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]], min_sizes=[48, 100, 150, 202, 253, 304], - max_sizes=[100, 150, 202, 253, 304, 320]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2])), + max_sizes=[100, 150, 202, 253, 304, 320], + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + ), # model training and testing settings train_cfg=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0., - ignore_iof_thr=-1, - gt_max_assign_all=False), - sampler=dict(type='PseudoSampler'), - smoothl1_beta=1., + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.0, ignore_iof_thr=-1, gt_max_assign_all=False), + sampler=dict(type="PseudoSampler"), + smoothl1_beta=1.0, allowed_border=-1, pos_weight=-1, neg_pos_ratio=3, - debug=False), - test_cfg=dict( - nms_pre=1000, - nms=dict(type='nms', iou_threshold=0.45), - min_bbox_size=0, - score_thr=0.02, - max_per_img=200)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, nms=dict(type="nms", iou_threshold=0.45), min_bbox_size=0, score_thr=0.02, max_per_img=200), +) env_cfg = dict(cudnn_benchmark=True) # dataset settings input_size = 320 train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='Expand', - mean=data_preprocessor['mean'], - to_rgb=data_preprocessor['bgr_to_rgb'], - ratio_range=(1, 4)), - dict( - type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='RandomFlip', prob=0.5), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Expand", mean=data_preprocessor["mean"], to_rgb=data_preprocessor["bgr_to_rgb"], ratio_range=(1, 4)), + dict(type="MinIoURandomCrop", min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=24, @@ -113,15 +83,18 @@ train_dataloader = dict( batch_sampler=None, dataset=dict( _delete_=True, - type='RepeatDataset', + type="RepeatDataset", times=5, dataset=dict( type={{_base_.dataset_type}}, data_root={{_base_.data_root}}, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + 
data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + ), + ), +) val_dataloader = dict(batch_size=8, dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader @@ -131,26 +104,14 @@ train_cfg = dict(max_epochs=max_epochs, val_interval=5) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='CosineAnnealingLR', - begin=0, - T_max=max_epochs, - end=max_epochs, - by_epoch=True, - eta_min=0) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="CosineAnnealingLR", begin=0, T_max=max_epochs, end=max_epochs, by_epoch=True, eta_min=0), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.015, momentum=0.9, weight_decay=4.0e-5)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=0.015, momentum=0.9, weight_decay=4.0e-5)) -custom_hooks = [ - dict(type='NumClassCheckHook'), - dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW') -] +custom_hooks = [dict(type="NumClassCheckHook"), dict(type="CheckInvalidLossHook", interval=50, priority="VERY_LOW")] # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. diff --git a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_amp-lsj-100e_coco.py b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_amp-lsj-100e_coco.py index b004d740a8f1e303bc4ad32593baad021ccae710..078712193b7f3cd66b26c14d17d6de2a5bc4aed0 100644 --- a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_amp-lsj-100e_coco.py +++ b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_amp-lsj-100e_coco.py @@ -1,4 +1,4 @@ -_base_ = 'mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py' # noqa +_base_ = "mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py" # noqa # Enable automatic-mixed-precision training with AmpOptimWrapper. -optim_wrapper = dict(type='AmpOptimWrapper') +optim_wrapper = dict(type="AmpOptimWrapper") diff --git a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py index 70e92a82e0cd1f083fbb87035f61877da4c11022..312fc6563648a0ec6951ba46865bef1e4babe4af 100644 --- a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py +++ b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py @@ -1,67 +1,43 @@ -_base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../common/lsj-100e_coco-instance.py' -] +_base_ = ["../_base_/models/mask-rcnn_r50_fpn.py", "../common/lsj-100e_coco-instance.py"] image_size = (1024, 1024) -batch_augments = [ - dict(type='BatchFixedSizePad', size=image_size, pad_mask=True) -] -norm_cfg = dict(type='SyncBN', requires_grad=True) +batch_augments = [dict(type="BatchFixedSizePad", size=image_size, pad_mask=True)] +norm_cfg = dict(type="SyncBN", requires_grad=True) # Use MMSyncBN that handles empty tensor in head. 
It can be changed to # SyncBN after https://github.com/pytorch/pytorch/issues/36530 is fixed -head_norm_cfg = dict(type='MMSyncBN', requires_grad=True) +head_norm_cfg = dict(type="MMSyncBN", requires_grad=True) model = dict( # use caffe norm data_preprocessor=dict( mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, - # pad_size_divisor=32 is unnecessary in training but necessary # in testing. pad_size_divisor=32, - batch_augments=batch_augments), - backbone=dict( - frozen_stages=-1, - norm_eval=False, - norm_cfg=norm_cfg, - init_cfg=None, - style='caffe'), + batch_augments=batch_augments, + ), + backbone=dict(frozen_stages=-1, norm_eval=False, norm_cfg=norm_cfg, init_cfg=None, style="caffe"), neck=dict(norm_cfg=norm_cfg), rpn_head=dict(num_convs=2), roi_head=dict( - bbox_head=dict( - type='Shared4Conv1FCBBoxHead', - conv_out_channels=256, - norm_cfg=head_norm_cfg), - mask_head=dict(norm_cfg=head_norm_cfg))) + bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, norm_cfg=head_norm_cfg), mask_head=dict(norm_cfg=head_norm_cfg) + ), +) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='RandomResize', - scale=image_size, - ratio_range=(0.1, 2.0), - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=image_size, - recompute_bbox=True, - allow_negative_crop=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomResize", scale=image_size, ratio_range=(0.1, 2.0), keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=image_size, recompute_bbox=True, allow_negative_crop=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1e-2, 1e-2)), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] # Use RepeatDataset to speed up training diff --git a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-400e_coco.py b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-400e_coco.py index cb64c9b6865634412c8b9d951b588cf0fb8cd32b..9dcdd0bb6e9429038fef633d55dfa8b95fad29ef 100644 --- a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-400e_coco.py +++ b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-400e_coco.py @@ -1,20 +1,9 @@ -_base_ = './mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py' # noqa +_base_ = "./mask-rcnn_r50-caffe_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py" # noqa # Use RepeatDataset to speed up training # 
change repeat time from 4 (for 100 epochs) to 16 (for 400 epochs) train_dataloader = dict(dataset=dict(times=4 * 4)) param_scheduler = [ - dict( - type='LinearLR', - start_factor=0.067, - by_epoch=False, - begin=0, - end=500 * 4), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[22, 24], - gamma=0.1) + dict(type="LinearLR", start_factor=0.067, by_epoch=False, begin=0, end=500 * 4), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[22, 24], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_amp-lsj-100e_coco.py b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_amp-lsj-100e_coco.py index 7fab2c72114cbe8a4d6cd3bdddb4e7c3b8dc2d0c..52398c413dec79a36e8af111cba2c0f9a060a9df 100644 --- a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_amp-lsj-100e_coco.py +++ b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_amp-lsj-100e_coco.py @@ -1,4 +1,4 @@ -_base_ = 'mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py' +_base_ = "mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py" # Enable automatic-mixed-precision training with AmpOptimWrapper. -optim_wrapper = dict(type='AmpOptimWrapper') +optim_wrapper = dict(type="AmpOptimWrapper") diff --git a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py index 8e06587fb03d42958142cac9ce7b15e7a19a9f6d..43fb8d893e6f66b30ace294d841690d1c8f92227 100644 --- a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py +++ b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py @@ -1,30 +1,23 @@ -_base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../common/lsj-100e_coco-instance.py' -] +_base_ = ["../_base_/models/mask-rcnn_r50_fpn.py", "../common/lsj-100e_coco-instance.py"] image_size = (1024, 1024) -batch_augments = [ - dict(type='BatchFixedSizePad', size=image_size, pad_mask=True) -] -norm_cfg = dict(type='SyncBN', requires_grad=True) +batch_augments = [dict(type="BatchFixedSizePad", size=image_size, pad_mask=True)] +norm_cfg = dict(type="SyncBN", requires_grad=True) # Use MMSyncBN that handles empty tensor in head. It can be changed to # SyncBN after https://github.com/pytorch/pytorch/issues/36530 is fixed -head_norm_cfg = dict(type='MMSyncBN', requires_grad=True) +head_norm_cfg = dict(type="MMSyncBN", requires_grad=True) model = dict( # the model is trained from scratch, so init_cfg is None data_preprocessor=dict( # pad_size_divisor=32 is unnecessary in training but necessary # in testing. 
pad_size_divisor=32, - batch_augments=batch_augments), - backbone=dict( - frozen_stages=-1, norm_eval=False, norm_cfg=norm_cfg, init_cfg=None), + batch_augments=batch_augments, + ), + backbone=dict(frozen_stages=-1, norm_eval=False, norm_cfg=norm_cfg, init_cfg=None), neck=dict(norm_cfg=norm_cfg), rpn_head=dict(num_convs=2), # leads to 0.1+ mAP roi_head=dict( - bbox_head=dict( - type='Shared4Conv1FCBBoxHead', - conv_out_channels=256, - norm_cfg=head_norm_cfg), - mask_head=dict(norm_cfg=head_norm_cfg))) + bbox_head=dict(type="Shared4Conv1FCBBoxHead", conv_out_channels=256, norm_cfg=head_norm_cfg), mask_head=dict(norm_cfg=head_norm_cfg) + ), +) diff --git a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-50e_coco.py b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-50e_coco.py index 6621d28c0a80bd669fa857ce4eb7058a6f82296c..621299c815c2200487b6f003a6f7fb3d4c2c5179 100644 --- a/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-50e_coco.py +++ b/mmpose/configs/mmdet/strong_baselines/mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-50e_coco.py @@ -1,4 +1,4 @@ -_base_ = 'mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py' +_base_ = "mask-rcnn_r50_fpn_rpn-2conv_4conv1fc_syncbn-all_lsj-100e_coco.py" # Use RepeatDataset to speed up training # change repeat time from 4 (for 100 epochs) to 2 (for 50 epochs) diff --git a/mmpose/configs/mmdet/strongsort/strongsort_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py b/mmpose/configs/mmdet/strongsort/strongsort_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py index 532e2aee718fb481bc81759a2853ac0fddf80e0e..a08122dd6ad387c30c6430c2440595b0ae406d74 100644 --- a/mmpose/configs/mmdet/strongsort/strongsort_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py +++ b/mmpose/configs/mmdet/strongsort/strongsort_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py @@ -1,107 +1,95 @@ _base_ = [ - './yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py', # noqa: E501 + "./yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py", # noqa: E501 ] -dataset_type = 'MOTChallengeDataset' +dataset_type = "MOTChallengeDataset" detector = _base_.model -detector.pop('data_preprocessor') +detector.pop("data_preprocessor") del _base_.model model = dict( - type='StrongSORT', + type="StrongSORT", data_preprocessor=dict( - type='TrackDataPreprocessor', + type="TrackDataPreprocessor", pad_size_divisor=32, - batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(576, 1024), - size_divisor=32, - interval=10) - ]), + batch_augments=[dict(type="BatchSyncRandomResize", random_size_range=(576, 1024), size_divisor=32, interval=10)], + ), detector=detector, reid=dict( - type='BaseReID', - data_preprocessor=dict(type='mmpretrain.ClsDataPreprocessor'), - backbone=dict( - type='mmpretrain.ResNet', - depth=50, - num_stages=4, - out_indices=(3, ), - style='pytorch'), - neck=dict(type='GlobalAveragePooling', kernel_size=(8, 4), stride=1), + type="BaseReID", + data_preprocessor=dict(type="mmpretrain.ClsDataPreprocessor"), + backbone=dict(type="mmpretrain.ResNet", depth=50, num_stages=4, out_indices=(3,), style="pytorch"), + neck=dict(type="GlobalAveragePooling", kernel_size=(8, 4), stride=1), head=dict( - type='LinearReIDHead', + type="LinearReIDHead", num_fcs=1, in_channels=2048, fc_channels=1024, out_channels=128, num_classes=380, - 
loss_cls=dict(type='mmpretrain.CrossEntropyLoss', loss_weight=1.0), - loss_triplet=dict(type='TripletLoss', margin=0.3, loss_weight=1.0), - norm_cfg=dict(type='BN1d'), - act_cfg=dict(type='ReLU'))), - cmc=dict( - type='CameraMotionCompensation', - warp_mode='cv2.MOTION_EUCLIDEAN', - num_iters=100, - stop_eps=0.00001), + loss_cls=dict(type="mmpretrain.CrossEntropyLoss", loss_weight=1.0), + loss_triplet=dict(type="TripletLoss", margin=0.3, loss_weight=1.0), + norm_cfg=dict(type="BN1d"), + act_cfg=dict(type="ReLU"), + ), + ), + cmc=dict(type="CameraMotionCompensation", warp_mode="cv2.MOTION_EUCLIDEAN", num_iters=100, stop_eps=0.00001), tracker=dict( - type='StrongSORTTracker', - motion=dict(type='KalmanFilter', center_only=False, use_nsa=True), + type="StrongSORTTracker", + motion=dict(type="KalmanFilter", center_only=False, use_nsa=True), obj_score_thr=0.6, reid=dict( num_samples=None, img_scale=(256, 128), - img_norm_cfg=dict( - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), + img_norm_cfg=dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), match_score_thr=0.3, motion_weight=0.02, ), match_iou_thr=0.7, - momentums=dict(embeds=0.1, ), + momentums=dict( + embeds=0.1, + ), num_tentatives=2, - num_frames_retain=100), + num_frames_retain=100, + ), postprocess_model=dict( - type='AppearanceFreeLink', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmtracking/mot/strongsort/mot_dataset/aflink_motchallenge_20220812_190310-a7578ad3.pth', # noqa: E501 + type="AppearanceFreeLink", + checkpoint="https://download.openmmlab.com/mmtracking/mot/strongsort/mot_dataset/aflink_motchallenge_20220812_190310-a7578ad3.pth", # noqa: E501 temporal_threshold=(0, 30), spatial_threshold=50, confidence_threshold=0.95, - )) + ), +) train_pipeline = None test_pipeline = [ dict( - type='TransformBroadcaster', + type="TransformBroadcaster", transforms=[ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='Resize', scale=_base_.img_scale, keep_ratio=True), - dict( - type='Pad', - size_divisor=32, - pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='LoadTrackAnnotations'), - ]), - dict(type='PackTrackInputs') + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="Resize", scale=_base_.img_scale, keep_ratio=True), + dict(type="Pad", size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="LoadTrackAnnotations"), + ], + ), + dict(type="PackTrackInputs"), ] train_dataloader = None val_dataloader = dict( # Now StrongSORT only support video_based sampling - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( _delete_=True, type=dataset_type, data_root=_base_.data_root, - ann_file='annotations/half-val_cocoformat.json', - data_prefix=dict(img_path='train'), + ann_file="annotations/half-val_cocoformat.json", + data_prefix=dict(img_path="train"), # when you evaluate track performance, you need to remove metainfo test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader train_cfg = None @@ -110,21 +98,15 @@ optim_wrapper = None # evaluator val_evaluator = dict( _delete_=True, - type='MOTChallengeMetric', - metric=['HOTA', 'CLEAR', 'Identity'], + type="MOTChallengeMetric", + metric=["HOTA", "CLEAR", "Identity"], # use_postprocess to support AppearanceFreeLink in val_evaluator use_postprocess=True, - postprocess_tracklet_cfg=[ - 
dict( - type='InterpolateTracklets', - min_num_frames=5, - max_num_frames=20, - use_gsi=True, - smooth_tau=10) - ]) + postprocess_tracklet_cfg=[dict(type="InterpolateTracklets", min_num_frames=5, max_num_frames=20, use_gsi=True, smooth_tau=10)], +) test_evaluator = val_evaluator -default_hooks = dict(logger=dict(type='LoggerHook', interval=1)) +default_hooks = dict(logger=dict(type="LoggerHook", interval=1)) del _base_.param_scheduler del _base_.custom_hooks diff --git a/mmpose/configs/mmdet/strongsort/strongsort_yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py b/mmpose/configs/mmdet/strongsort/strongsort_yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py index eab97063932528df7e17c7d65bf9f0d13f5dfa73..8e3d6a53e0a7a714caa2b127b827e7f5edd613eb 100644 --- a/mmpose/configs/mmdet/strongsort/strongsort_yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py +++ b/mmpose/configs/mmdet/strongsort/strongsort_yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py @@ -1,44 +1,37 @@ -_base_ = [ - './strongsort_yolox_x_8xb4-80e_crowdhuman-mot17halftrain' - '_test-mot17halfval.py' -] +_base_ = ["./strongsort_yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py"] img_scale = (1600, 896) # width, height model = dict( data_preprocessor=dict( - type='TrackDataPreprocessor', + type="TrackDataPreprocessor", pad_size_divisor=32, - batch_augments=[ - dict(type='BatchSyncRandomResize', random_size_range=(640, 1152)) - ])) + batch_augments=[dict(type="BatchSyncRandomResize", random_size_range=(640, 1152))], + ) +) test_pipeline = [ dict( - type='TransformBroadcaster', + type="TransformBroadcaster", transforms=[ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='Resize', scale=img_scale, keep_ratio=True), - dict( - type='Pad', - size_divisor=32, - pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='LoadTrackAnnotations'), - ]), - dict(type='PackTrackInputs') + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="Resize", scale=img_scale, keep_ratio=True), + dict(type="Pad", size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="LoadTrackAnnotations"), + ], + ), + dict(type="PackTrackInputs"), ] val_dataloader = dict( dataset=dict( - data_root='data/MOT17', - ann_file='annotations/train_cocoformat.json', - data_prefix=dict(img_path='train'), - pipeline=test_pipeline)) + data_root="data/MOT17", ann_file="annotations/train_cocoformat.json", data_prefix=dict(img_path="train"), pipeline=test_pipeline + ) +) test_dataloader = dict( dataset=dict( - data_root='data/MOT20', - ann_file='annotations/test_cocoformat.json', - data_prefix=dict(img_path='test'), - pipeline=test_pipeline)) + data_root="data/MOT20", ann_file="annotations/test_cocoformat.json", data_prefix=dict(img_path="test"), pipeline=test_pipeline + ) +) -test_evaluator = dict(format_only=True, outfile_prefix='./mot_20_test_res') +test_evaluator = dict(format_only=True, outfile_prefix="./mot_20_test_res") diff --git a/mmpose/configs/mmdet/strongsort/yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py b/mmpose/configs/mmdet/strongsort/yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py index 59a52e4394b5825d40a99e08793147fe836b4c19..5bc159eac64d334450abd2c5897d2a4ecc2457d9 100644 --- a/mmpose/configs/mmdet/strongsort/yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py +++ b/mmpose/configs/mmdet/strongsort/yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py @@ -1,6 +1,6 @@ -_base_ = 
['../yolox/yolox_x_8xb8-300e_coco.py'] +_base_ = ["../yolox/yolox_x_8xb8-300e_coco.py"] -data_root = 'data/MOT17/' +data_root = "data/MOT17/" img_scale = (1440, 800) # width, height batch_size = 4 @@ -10,49 +10,29 @@ model = dict( bbox_head=dict(num_classes=1), test_cfg=dict(nms=dict(iou_threshold=0.7)), init_cfg=dict( - type='Pretrained', - checkpoint= # noqa: E251 - 'https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth' # noqa: E501 - )) + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth", # noqa: E501 + ), +) train_pipeline = [ - dict( - type='Mosaic', - img_scale=img_scale, - pad_val=114.0, - bbox_clip_border=False), - dict( - type='RandomAffine', - scaling_ratio_range=(0.1, 2), - border=(-img_scale[0] // 2, -img_scale[1] // 2), - bbox_clip_border=False), - dict( - type='MixUp', - img_scale=img_scale, - ratio_range=(0.8, 1.6), - pad_val=114.0, - bbox_clip_border=False), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict( - type='Resize', - scale=img_scale, - keep_ratio=True, - clip_object_border=False), - dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False), - dict(type='PackDetInputs') + dict(type="Mosaic", img_scale=img_scale, pad_val=114.0, bbox_clip_border=False), + dict(type="RandomAffine", scaling_ratio_range=(0.1, 2), border=(-img_scale[0] // 2, -img_scale[1] // 2), bbox_clip_border=False), + dict(type="MixUp", img_scale=img_scale, ratio_range=(0.8, 1.6), pad_val=114.0, bbox_clip_border=False), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Resize", scale=img_scale, keep_ratio=True, clip_object_border=False), + dict(type="Pad", size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1), keep_empty=False), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='Resize', scale=img_scale, keep_ratio=True), - dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="Resize", scale=img_scale, keep_ratio=True), + dict(type="Pad", size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( @@ -61,63 +41,65 @@ train_dataloader = dict( num_workers=4, persistent_workers=True, pin_memory=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='MultiImageMixDataset', + type="MultiImageMixDataset", dataset=dict( - type='ConcatDataset', + type="ConcatDataset", datasets=[ dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, - ann_file='annotations/half-train_cocoformat.json', - data_prefix=dict(img='train'), + ann_file="annotations/half-train_cocoformat.json", + data_prefix=dict(img="train"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + 
metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), dict( - type='CocoDataset', - data_root='data/crowdhuman', - ann_file='annotations/crowdhuman_train.json', - data_prefix=dict(img='train'), + type="CocoDataset", + data_root="data/crowdhuman", + ann_file="annotations/crowdhuman_train.json", + data_prefix=dict(img="train"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), dict( - type='CocoDataset', - data_root='data/crowdhuman', - ann_file='annotations/crowdhuman_val.json', - data_prefix=dict(img='val'), + type="CocoDataset", + data_root="data/crowdhuman", + ann_file="annotations/crowdhuman_val.json", + data_prefix=dict(img="val"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), - ]), - pipeline=train_pipeline)) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), + ], + ), + pipeline=train_pipeline, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, dataset=dict( data_root=data_root, - ann_file='annotations/half-val_cocoformat.json', - data_prefix=dict(img='train'), - metainfo=dict(classes=('pedestrian', )), - pipeline=test_pipeline)) + ann_file="annotations/half-val_cocoformat.json", + data_prefix=dict(img="train"), + metainfo=dict(classes=("pedestrian",)), + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader # training settings @@ -134,53 +116,35 @@ optim_wrapper = dict(optimizer=dict(lr=base_lr)) # learning rate param_scheduler = [ + dict(type="QuadraticWarmupLR", by_epoch=True, begin=0, end=1, convert_to_iter_based=True), dict( - type='QuadraticWarmupLR', - by_epoch=True, - begin=0, - end=1, - convert_to_iter_based=True), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=1, T_max=max_epochs - num_last_epochs, end=max_epochs - num_last_epochs, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), dict( - type='ConstantLR', + type="ConstantLR", by_epoch=True, factor=1, begin=max_epochs - num_last_epochs, end=max_epochs, - ) + ), ] -default_hooks = dict( - checkpoint=dict( - interval=1, - max_keep_ckpts=5 # only keep latest 5 checkpoints - )) +default_hooks = dict(checkpoint=dict(interval=1, max_keep_ckpts=5)) # only keep latest 5 checkpoints custom_hooks = [ - dict( - type='YOLOXModeSwitchHook', - num_last_epochs=num_last_epochs, - priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0001, - update_buffers=True, - priority=49) + dict(type="YOLOXModeSwitchHook", num_last_epochs=num_last_epochs, priority=48), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0001, update_buffers=True, 
priority=49), ] # evaluator -val_evaluator = dict( - ann_file=data_root + 'annotations/half-val_cocoformat.json', - format_only=False) +val_evaluator = dict(ann_file=data_root + "annotations/half-val_cocoformat.json", format_only=False) test_evaluator = val_evaluator del _base_.tta_model diff --git a/mmpose/configs/mmdet/strongsort/yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py b/mmpose/configs/mmdet/strongsort/yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py index d4eb3cb2c9804f0219ba91d0b5d460da342ab668..abe60beef99cc1704f42640f581e4e7c7c3dacee 100644 --- a/mmpose/configs/mmdet/strongsort/yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py +++ b/mmpose/configs/mmdet/strongsort/yolox_x_8xb4-80e_crowdhuman-mot20train_test-mot20test.py @@ -1,108 +1,83 @@ -_base_ = ['./yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py'] +_base_ = ["./yolox_x_8xb4-80e_crowdhuman-mot17halftrain_test-mot17halfval.py"] -data_root = 'data/MOT20/' +data_root = "data/MOT20/" img_scale = (1600, 896) # width, height # model settings -model = dict( - data_preprocessor=dict(batch_augments=[ - dict(type='BatchSyncRandomResize', random_size_range=(640, 1152)) - ])) +model = dict(data_preprocessor=dict(batch_augments=[dict(type="BatchSyncRandomResize", random_size_range=(640, 1152))])) train_pipeline = [ - dict( - type='Mosaic', - img_scale=img_scale, - pad_val=114.0, - bbox_clip_border=True), - dict( - type='RandomAffine', - scaling_ratio_range=(0.1, 2), - border=(-img_scale[0] // 2, -img_scale[1] // 2), - bbox_clip_border=True), - dict( - type='MixUp', - img_scale=img_scale, - ratio_range=(0.8, 1.6), - pad_val=114.0, - bbox_clip_border=True), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict( - type='Resize', - scale=img_scale, - keep_ratio=True, - clip_object_border=True), - dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False), - dict(type='PackDetInputs') + dict(type="Mosaic", img_scale=img_scale, pad_val=114.0, bbox_clip_border=True), + dict(type="RandomAffine", scaling_ratio_range=(0.1, 2), border=(-img_scale[0] // 2, -img_scale[1] // 2), bbox_clip_border=True), + dict(type="MixUp", img_scale=img_scale, ratio_range=(0.8, 1.6), pad_val=114.0, bbox_clip_border=True), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Resize", scale=img_scale, keep_ratio=True, clip_object_border=True), + dict(type="Pad", size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1), keep_empty=False), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='Resize', scale=img_scale, keep_ratio=True), - dict(type='Pad', size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="Resize", scale=img_scale, keep_ratio=True), + dict(type="Pad", size_divisor=32, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( dataset=dict( - type='MultiImageMixDataset', + type="MultiImageMixDataset", dataset=dict( - 
type='ConcatDataset', + type="ConcatDataset", datasets=[ dict( - type='CocoDataset', + type="CocoDataset", data_root=data_root, - ann_file='annotations/train_cocoformat.json', - data_prefix=dict(img='train'), + ann_file="annotations/train_cocoformat.json", + data_prefix=dict(img="train"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), dict( - type='CocoDataset', - data_root='data/crowdhuman', - ann_file='annotations/crowdhuman_train.json', - data_prefix=dict(img='train'), + type="CocoDataset", + data_root="data/crowdhuman", + ann_file="annotations/crowdhuman_train.json", + data_prefix=dict(img="train"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), dict( - type='CocoDataset', - data_root='data/crowdhuman', - ann_file='annotations/crowdhuman_val.json', - data_prefix=dict(img='val'), + type="CocoDataset", + data_root="data/crowdhuman", + ann_file="annotations/crowdhuman_val.json", + data_prefix=dict(img="val"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - metainfo=dict(classes=('pedestrian', )), + metainfo=dict(classes=("pedestrian",)), pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), - ]), - ]), - pipeline=train_pipeline)) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + ], + ), + ], + ), + pipeline=train_pipeline, + ) +) -val_dataloader = dict( - dataset=dict( - data_root='data/MOT17', ann_file='annotations/train_cocoformat.json')) +val_dataloader = dict(dataset=dict(data_root="data/MOT17", ann_file="annotations/train_cocoformat.json")) test_dataloader = val_dataloader # evaluator -val_evaluator = dict(ann_file='data/MOT17/annotations/train_cocoformat.json') +val_evaluator = dict(ann_file="data/MOT17/annotations/train_cocoformat.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/mmdet/swin/mask-rcnn_swin-s-p4-w7_fpn_amp-ms-crop-3x_coco.py b/mmpose/configs/mmdet/swin/mask-rcnn_swin-s-p4-w7_fpn_amp-ms-crop-3x_coco.py index 4a3e8ad900553c38d11ddc7747cbc0f244f6b4c7..537228b409f50a1c4656c5ebd418b4075b26c3d9 100644 --- a/mmpose/configs/mmdet/swin/mask-rcnn_swin-s-p4-w7_fpn_amp-ms-crop-3x_coco.py +++ b/mmpose/configs/mmdet/swin/mask-rcnn_swin-s-p4-w7_fpn_amp-ms-crop-3x_coco.py @@ -1,6 +1,3 @@ -_base_ = './mask-rcnn_swin-t-p4-w7_fpn_amp-ms-crop-3x_coco.py' -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_small_patch4_window7_224.pth' # noqa -model = dict( - backbone=dict( - depths=[2, 2, 18, 2], - init_cfg=dict(type='Pretrained', checkpoint=pretrained))) +_base_ = "./mask-rcnn_swin-t-p4-w7_fpn_amp-ms-crop-3x_coco.py" +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_small_patch4_window7_224.pth" # noqa +model = dict(backbone=dict(depths=[2, 2, 18, 2], 
init_cfg=dict(type="Pretrained", checkpoint=pretrained))) diff --git a/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_1x_coco.py b/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_1x_coco.py index 5471caa139c0b7670f995501347ddf80383e9268..1e97962552775db768ce182b59e0f886091bf2dd 100644 --- a/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_1x_coco.py @@ -1,14 +1,15 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth" # noqa model = dict( - type='MaskRCNN', + type="MaskRCNN", backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", embed_dims=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], @@ -16,45 +17,36 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(0, 1, 2, 3), with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), - neck=dict(in_channels=[96, 192, 384, 768])) + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), + neck=dict(in_channels=[96, 192, 384, 768]), +) max_epochs = 12 train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[8, 11], gamma=0.1), ] # optimizer optim_wrapper = dict( - type='OptimWrapper', + type="OptimWrapper", paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'relative_position_bias_table': dict(decay_mult=0.), - 'norm': dict(decay_mult=0.) - }), - optimizer=dict( - _delete_=True, - type='AdamW', - lr=0.0001, - betas=(0.9, 0.999), - weight_decay=0.05)) + "absolute_pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + } + ), + optimizer=dict(_delete_=True, type="AdamW", lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05), +) diff --git a/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_amp-ms-crop-3x_coco.py b/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_amp-ms-crop-3x_coco.py index 622087ba7164fda53a70eb927b9258572b7c8ef0..440b248311a9dfd866b0b2e80f874db0ddfcd5b6 100644 --- a/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_amp-ms-crop-3x_coco.py +++ b/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_amp-ms-crop-3x_coco.py @@ -1,3 +1,3 @@ -_base_ = './mask-rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py' +_base_ = "./mask-rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py" # Enable automatic-mixed-precision training with AmpOptimWrapper. 
-optim_wrapper = dict(type='AmpOptimWrapper') +optim_wrapper = dict(type="AmpOptimWrapper") diff --git a/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py b/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py index 7024b73249ca8c77da89ab9e4653757f36a1d1d2..6527085fee4dbbca4d670576ace985fa9199e1c2 100644 --- a/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py +++ b/mmpose/configs/mmdet/swin/mask-rcnn_swin-t-p4-w7_fpn_ms-crop-3x_coco.py @@ -1,16 +1,17 @@ _base_ = [ - '../_base_/models/mask-rcnn_r50_fpn.py', - '../_base_/datasets/coco_instance.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/mask-rcnn_r50_fpn.py", + "../_base_/datasets/coco_instance.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth" # noqa model = dict( - type='MaskRCNN', + type="MaskRCNN", backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", embed_dims=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], @@ -18,50 +19,69 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(0, 1, 2, 3), with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), - neck=dict(in_channels=[96, 192, 384, 768])) + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), + neck=dict(in_channels=[96, 192, 384, 768]), +) # augmentation strategy originates from DETR / Sparse RCNN train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', - transforms=[[ - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoice", + transforms=[ + [ + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) + ], + [ + dict(type="RandomChoiceResize", scales=[(400, 1333), (500, 1333), (600, 1333)], keep_ratio=True), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), + dict( + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], ], - [ - dict( - type='RandomChoiceResize', - scales=[(400, 1333), (500, 1333), (600, 1333)], - keep_ratio=True), - dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), - (576, 1333), (608, 1333), (640, 1333), - (672, 
1333), (704, 1333), (736, 1333), - (768, 1333), (800, 1333)], - keep_ratio=True) - ]]), - dict(type='PackDetInputs') + ), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) @@ -70,30 +90,19 @@ train_cfg = dict(max_epochs=max_epochs) # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[27, 33], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[27, 33], gamma=0.1), ] # optimizer optim_wrapper = dict( - type='OptimWrapper', + type="OptimWrapper", paramwise_cfg=dict( custom_keys={ - 'absolute_pos_embed': dict(decay_mult=0.), - 'relative_position_bias_table': dict(decay_mult=0.), - 'norm': dict(decay_mult=0.) - }), - optimizer=dict( - _delete_=True, - type='AdamW', - lr=0.0001, - betas=(0.9, 0.999), - weight_decay=0.05)) + "absolute_pos_embed": dict(decay_mult=0.0), + "relative_position_bias_table": dict(decay_mult=0.0), + "norm": dict(decay_mult=0.0), + } + ), + optimizer=dict(_delete_=True, type="AdamW", lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05), +) diff --git a/mmpose/configs/mmdet/swin/retinanet_swin-t-p4-w7_fpn_1x_coco.py b/mmpose/configs/mmdet/swin/retinanet_swin-t-p4-w7_fpn_1x_coco.py index 2f40a87e8cf8593edd92f024d0bb0ed43a87b4fb..39e95d7f4a1aed387dc6ebdf9dfc13a6841e14d6 100644 --- a/mmpose/configs/mmdet/swin/retinanet_swin-t-p4-w7_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/swin/retinanet_swin-t-p4-w7_fpn_1x_coco.py @@ -1,13 +1,14 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth" # noqa model = dict( backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", embed_dims=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], @@ -15,8 +16,8 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.2, patch_norm=True, out_indices=(1, 2, 3), @@ -24,8 +25,10 @@ model = dict( # in FPN, otherwise some parameter will not be used with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), - neck=dict(in_channels=[192, 384, 768], start_level=0, num_outs=5)) + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), + neck=dict(in_channels=[192, 384, 768], start_level=0, num_outs=5), +) # optimizer optim_wrapper = dict(optimizer=dict(lr=0.01)) diff --git a/mmpose/configs/mmdet/timm_example/retinanet_timm-efficientnet-b1_fpn_1x_coco.py b/mmpose/configs/mmdet/timm_example/retinanet_timm-efficientnet-b1_fpn_1x_coco.py index b87dddf50f7179dc143b9ab9aecb07d09d4dea4b..6ecbba0848cd301cd3deb63e6b4e3d9df74a24e8 100644 --- a/mmpose/configs/mmdet/timm_example/retinanet_timm-efficientnet-b1_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/timm_example/retinanet_timm-efficientnet-b1_fpn_1x_coco.py @@ -1,23 +1,25 @@ _base_ = [ - 
'../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # please install mmpretrain # import mmpretrain.models to trigger register_module in mmpretrain -custom_imports = dict( - imports=['mmpretrain.models'], allow_failed_imports=False) +custom_imports = dict(imports=["mmpretrain.models"], allow_failed_imports=False) model = dict( backbone=dict( _delete_=True, - type='mmpretrain.TIMMBackbone', - model_name='efficientnet_b1', + type="mmpretrain.TIMMBackbone", + model_name="efficientnet_b1", features_only=True, pretrained=True, - out_indices=(1, 2, 3, 4)), - neck=dict(in_channels=[24, 40, 112, 320])) + out_indices=(1, 2, 3, 4), + ), + neck=dict(in_channels=[24, 40, 112, 320]), +) # optimizer optim_wrapper = dict(optimizer=dict(lr=0.01)) diff --git a/mmpose/configs/mmdet/timm_example/retinanet_timm-tv-resnet50_fpn_1x_coco.py b/mmpose/configs/mmdet/timm_example/retinanet_timm-tv-resnet50_fpn_1x_coco.py index 74e43506959574abbf08feb44848f4bfa8d65719..8d2110cad0fc022b3477990a180d25befcc56110 100644 --- a/mmpose/configs/mmdet/timm_example/retinanet_timm-tv-resnet50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/timm_example/retinanet_timm-tv-resnet50_fpn_1x_coco.py @@ -1,22 +1,24 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # please install mmpretrain # import mmpretrain.models to trigger register_module in mmpretrain -custom_imports = dict( - imports=['mmpretrain.models'], allow_failed_imports=False) +custom_imports = dict(imports=["mmpretrain.models"], allow_failed_imports=False) model = dict( backbone=dict( _delete_=True, - type='mmpretrain.TIMMBackbone', - model_name='tv_resnet50', # ResNet-50 with torchvision weights + type="mmpretrain.TIMMBackbone", + model_name="tv_resnet50", # ResNet-50 with torchvision weights features_only=True, pretrained=True, - out_indices=(1, 2, 3, 4))) + out_indices=(1, 2, 3, 4), + ) +) # optimizer optim_wrapper = dict(optimizer=dict(lr=0.01)) diff --git a/mmpose/configs/mmdet/tood/tood_r101-dconv-c3-c5_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/tood/tood_r101-dconv-c3-c5_fpn_ms-2x_coco.py index 45030a6832db39a329d0901dde4a5320f34a9b6e..8777ea4aba086fa300f069e68da880ca36f5230e 100644 --- a/mmpose/configs/mmdet/tood/tood_r101-dconv-c3-c5_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/tood/tood_r101-dconv-c3-c5_fpn_ms-2x_coco.py @@ -1,7 +1,6 @@ -_base_ = './tood_r101_fpn_ms-2x_coco.py' +_base_ = "./tood_r101_fpn_ms-2x_coco.py" model = dict( - backbone=dict( - dcn=dict(type='DCNv2', deformable_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True)), - bbox_head=dict(num_dcn=2)) + backbone=dict(dcn=dict(type="DCNv2", deformable_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True)), + bbox_head=dict(num_dcn=2), +) diff --git a/mmpose/configs/mmdet/tood/tood_r101_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/tood/tood_r101_fpn_ms-2x_coco.py index fc6ae5d942e05ac90162ca9ac67adb311d581e5b..4f703ba6cab2d1c1d633e0ccd545b8b6d2eb51dd 100644 --- a/mmpose/configs/mmdet/tood/tood_r101_fpn_ms-2x_coco.py +++ 
b/mmpose/configs/mmdet/tood/tood_r101_fpn_ms-2x_coco.py @@ -1,7 +1,3 @@ -_base_ = './tood_r50_fpn_ms-2x_coco.py' +_base_ = "./tood_r50_fpn_ms-2x_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/tood/tood_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/tood/tood_r50_fpn_1x_coco.py index e4839d9d77e64d61b504ed8789bda225cc878da1..7b885fd50bb98eaf700873e08f967496aa1dd0f6 100644 --- a/mmpose/configs/mmdet/tood/tood_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/tood/tood_r50_fpn_1x_coco.py @@ -1,80 +1,60 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings model = dict( - type='TOOD', + type="TOOD", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs='on_output', - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs="on_output", num_outs=5), bbox_head=dict( - type='TOODHead', + type="TOODHead", num_classes=80, in_channels=256, stacked_convs=6, feat_channels=256, - anchor_type='anchor_free', - anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - octave_base_scale=8, - scales_per_octave=1, - strides=[8, 16, 32, 64, 128]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), + anchor_type="anchor_free", + anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], octave_base_scale=8, scales_per_octave=1, strides=[8, 16, 32, 64, 128]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), initial_loss_cls=dict( - type='FocalLoss', + type="FocalLoss", use_sigmoid=True, activated=True, # use probability instead of logit as input gamma=2.0, alpha=0.25, - loss_weight=1.0), + loss_weight=1.0, + ), loss_cls=dict( - type='QualityFocalLoss', + type="QualityFocalLoss", use_sigmoid=True, activated=True, # use probability instead of logit as input beta=2.0, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=2.0)), + loss_weight=1.0, + ), + loss_bbox=dict(type="GIoULoss", loss_weight=2.0), + ), train_cfg=dict( initial_epoch=4, - initial_assigner=dict(type='ATSSAssigner', topk=9), - assigner=dict(type='TaskAlignedAssigner', topk=13), + initial_assigner=dict(type="ATSSAssigner", topk=9), + assigner=dict(type="TaskAlignedAssigner", topk=13), alpha=1, beta=6, allowed_border=-1, pos_weight=-1, - 
debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # optimizer -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/tood/tood_r50_fpn_anchor-based_1x_coco.py b/mmpose/configs/mmdet/tood/tood_r50_fpn_anchor-based_1x_coco.py index c7fbf6aff197b821de07f8d4a73f9c72e5f76288..c1cf3e39ae45605f0a26aefe4e5b5af6b8a35836 100644 --- a/mmpose/configs/mmdet/tood/tood_r50_fpn_anchor-based_1x_coco.py +++ b/mmpose/configs/mmdet/tood/tood_r50_fpn_anchor-based_1x_coco.py @@ -1,2 +1,2 @@ -_base_ = './tood_r50_fpn_1x_coco.py' -model = dict(bbox_head=dict(anchor_type='anchor_based')) +_base_ = "./tood_r50_fpn_1x_coco.py" +model = dict(bbox_head=dict(anchor_type="anchor_based")) diff --git a/mmpose/configs/mmdet/tood/tood_r50_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/tood/tood_r50_fpn_ms-2x_coco.py index ffb296dccee30438977bac61b970f5844d647cfa..71fff99bb713af6555e744c927277e4427362cee 100644 --- a/mmpose/configs/mmdet/tood/tood_r50_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/tood/tood_r50_fpn_ms-2x_coco.py @@ -1,17 +1,10 @@ -_base_ = './tood_r50_fpn_1x_coco.py' +_base_ = "./tood_r50_fpn_1x_coco.py" max_epochs = 24 # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] # training schedule for 2x @@ -19,12 +12,10 @@ train_cfg = dict(max_epochs=max_epochs) # multi-scale training train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', scale=[(1333, 480), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=[(1333, 480), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/tood/tood_x101-64x4d-dconv-c4-c5_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/tood/tood_x101-64x4d-dconv-c4-c5_fpn_ms-2x_coco.py index 43405196184715923bb22499958c74fe9bf4a2da..5aacff1ae11217fa1750c0c7c767d47deb803c59 100644 --- a/mmpose/configs/mmdet/tood/tood_x101-64x4d-dconv-c4-c5_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/tood/tood_x101-64x4d-dconv-c4-c5_fpn_ms-2x_coco.py @@ -1,7 +1,8 @@ -_base_ = './tood_x101-64x4d_fpn_ms-2x_coco.py' +_base_ = "./tood_x101-64x4d_fpn_ms-2x_coco.py" model = dict( backbone=dict( - dcn=dict(type='DCNv2', deformable_groups=1, fallback_on_stride=False), + dcn=dict(type="DCNv2", deformable_groups=1, fallback_on_stride=False), stage_with_dcn=(False, False, True, True), ), - bbox_head=dict(num_dcn=2)) + bbox_head=dict(num_dcn=2), +) diff --git a/mmpose/configs/mmdet/tood/tood_x101-64x4d_fpn_ms-2x_coco.py 
b/mmpose/configs/mmdet/tood/tood_x101-64x4d_fpn_ms-2x_coco.py index 1651542c7562553f206ba763fb9a43838e042450..c85f848fd7197402f6be6dd89ab52a83953acaf2 100644 --- a/mmpose/configs/mmdet/tood/tood_x101-64x4d_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/tood/tood_x101-64x4d_fpn_ms-2x_coco.py @@ -1,16 +1,17 @@ -_base_ = './tood_r50_fpn_ms-2x_coco.py' +_base_ = "./tood_r50_fpn_ms-2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_1x_coco.py b/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_1x_coco.py index 26a4c12316ee80c7dfae1624af3f4146dba0a414..3d4104828880e7426dc5ef5f6a13f855ebef4075 100644 --- a/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_1x_coco.py +++ b/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_1x_coco.py @@ -1,22 +1,19 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50-caffe-c4.py', - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50-caffe-c4.py", + "../_base_/datasets/coco_detection.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] model = dict( - type='TridentFasterRCNN', + type="TridentFasterRCNN", backbone=dict( - type='TridentResNet', + type="TridentResNet", trident_dilations=(1, 2, 3), num_branch=3, test_branch_idx=1, - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron2/resnet50_caffe')), - roi_head=dict(type='TridentRoIHead', num_branch=3, test_branch_idx=1), - train_cfg=dict( - rpn_proposal=dict(max_per_img=500), - rcnn=dict( - sampler=dict(num=128, pos_fraction=0.5, - add_gt_as_proposals=False)))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron2/resnet50_caffe"), + ), + roi_head=dict(type="TridentRoIHead", num_branch=3, test_branch_idx=1), + train_cfg=dict(rpn_proposal=dict(max_per_img=500), rcnn=dict(sampler=dict(num=128, pos_fraction=0.5, add_gt_as_proposals=False))), +) diff --git a/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_ms-1x_coco.py b/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_ms-1x_coco.py index 806d20b90c96be9357eccd9f9ca8c880b0716cae..4589eadabc9876dfa2b6994b69fb840016ddd594 100644 --- a/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_ms-1x_coco.py +++ b/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_ms-1x_coco.py @@ -1,15 +1,11 @@ -_base_ = 'tridentnet_r50-caffe_1x_coco.py' +_base_ = "tridentnet_r50-caffe_1x_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomChoiceResize', - scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), - (1333, 768), (1333, 800)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomChoiceResize", scales=[(1333, 640), (1333, 672), (1333, 704), (1333, 736), (1333, 768), (1333, 800)], keep_ratio=True), + dict(type="RandomFlip", 
prob=0.5), + dict(type="PackDetInputs"), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) diff --git a/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_ms-3x_coco.py b/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_ms-3x_coco.py index 4de249c60c234a9d301658594f7b072b0b48017b..733e8bddd2d2ee4c089207b2bc947d3b46ac2c2b 100644 --- a/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_ms-3x_coco.py +++ b/mmpose/configs/mmdet/tridentnet/tridentnet_r50-caffe_ms-3x_coco.py @@ -1,18 +1,10 @@ -_base_ = 'tridentnet_r50-caffe_ms-1x_coco.py' +_base_ = "tridentnet_r50-caffe_ms-1x_coco.py" # learning rate max_epochs = 36 -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[28, 34], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[28, 34], gamma=0.1), ] diff --git a/mmpose/configs/mmdet/v3det/cascade_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py b/mmpose/configs/mmdet/v3det/cascade_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py index 567c31bd0e986e071b50ff2aac9cb896d4daf6fd..f5689a501711e2651cfc3476a65ff197a6d315ef 100644 --- a/mmpose/configs/mmdet/v3det/cascade_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py +++ b/mmpose/configs/mmdet/v3det/cascade_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py @@ -1,164 +1,121 @@ _base_ = [ - '../_base_/models/cascade-rcnn_r50_fpn.py', '../_base_/datasets/v3det.py', - '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py' + "../_base_/models/cascade-rcnn_r50_fpn.py", + "../_base_/datasets/v3det.py", + "../_base_/schedules/schedule_2x.py", + "../_base_/default_runtime.py", ] # model settings model = dict( - rpn_head=dict( - loss_bbox=dict(_delete_=True, type='L1Loss', loss_weight=1.0)), - roi_head=dict(bbox_head=[ - dict( - type='Shared2FCBBoxHead', - in_channels=256, - fc_out_channels=1024, - roi_feat_size=7, - num_classes=13204, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), - reg_class_agnostic=True, - cls_predictor_cfg=dict( - type='NormedLinear', tempearture=50, bias=True), - loss_cls=dict( - type='CrossEntropyCustomLoss', + rpn_head=dict(loss_bbox=dict(_delete_=True, type="L1Loss", loss_weight=1.0)), + roi_head=dict( + bbox_head=[ + dict( + type="Shared2FCBBoxHead", + in_channels=256, + fc_out_channels=1024, + roi_feat_size=7, num_classes=13204, - use_sigmoid=True, - loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), - dict( - type='Shared2FCBBoxHead', - in_channels=256, - fc_out_channels=1024, - roi_feat_size=7, - num_classes=13204, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.05, 0.05, 0.1, 0.1]), - reg_class_agnostic=True, - cls_predictor_cfg=dict( - type='NormedLinear', tempearture=50, bias=True), - loss_cls=dict( - type='CrossEntropyCustomLoss', + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + reg_class_agnostic=True, + cls_predictor_cfg=dict(type="NormedLinear", tempearture=50, bias=True), + loss_cls=dict(type="CrossEntropyCustomLoss", num_classes=13204, use_sigmoid=True, 
loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), + dict( + type="Shared2FCBBoxHead", + in_channels=256, + fc_out_channels=1024, + roi_feat_size=7, num_classes=13204, - use_sigmoid=True, - loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), - dict( - type='Shared2FCBBoxHead', - in_channels=256, - fc_out_channels=1024, - roi_feat_size=7, - num_classes=13204, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.033, 0.033, 0.067, 0.067]), - reg_class_agnostic=True, - cls_predictor_cfg=dict( - type='NormedLinear', tempearture=50, bias=True), - loss_cls=dict( - type='CrossEntropyCustomLoss', + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), + reg_class_agnostic=True, + cls_predictor_cfg=dict(type="NormedLinear", tempearture=50, bias=True), + loss_cls=dict(type="CrossEntropyCustomLoss", num_classes=13204, use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), + dict( + type="Shared2FCBBoxHead", + in_channels=256, + fc_out_channels=1024, + roi_feat_size=7, num_classes=13204, - use_sigmoid=True, - loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)) - ]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), + reg_class_agnostic=True, + cls_predictor_cfg=dict(type="NormedLinear", tempearture=50, bias=True), + loss_cls=dict(type="CrossEntropyCustomLoss", num_classes=13204, use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), + ] + ), # model training and testing settings train_cfg=dict( rpn_proposal=dict(nms_pre=4000, max_per_img=2000), rcnn=[ dict( assigner=dict( - type='MaxIoUAssigner', + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1, - perm_repeat_gt_cfg=dict(iou_thr=0.7, perm_range=0.01)), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + perm_repeat_gt_cfg=dict(iou_thr=0.7, perm_range=0.01), + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False), + debug=False, + ), dict( assigner=dict( - type='MaxIoUAssigner', + type="MaxIoUAssigner", pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6, match_low_quality=False, ignore_iof_thr=-1, - perm_repeat_gt_cfg=dict(iou_thr=0.7, perm_range=0.01)), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + perm_repeat_gt_cfg=dict(iou_thr=0.7, perm_range=0.01), + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False), + debug=False, + ), dict( assigner=dict( - type='MaxIoUAssigner', + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.7, match_low_quality=False, ignore_iof_thr=-1, - perm_repeat_gt_cfg=dict(iou_thr=0.7, perm_range=0.01)), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + perm_repeat_gt_cfg=dict(iou_thr=0.7, perm_range=0.01), + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False) - ]), - test_cfg=dict( - rcnn=dict( - score_thr=0.0001, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=300))) 
+ debug=False, + ), + ], + ), + test_cfg=dict(rcnn=dict(score_thr=0.0001, nms=dict(type="nms", iou_threshold=0.6), max_per_img=300)), +) # dataset settings train_dataloader = dict(batch_size=4, num_workers=8) # training schedule for 1x max_iter = 68760 * 2 -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=max_iter) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=max_iter) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 2048, - by_epoch=False, - begin=0, - end=5000), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[45840 * 2, 63030 * 2], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 2048, by_epoch=False, begin=0, end=5000), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[45840 * 2, 63030 * 2], gamma=0.1), ] # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(_delete_=True, type='AdamW', lr=1e-4 * 1, weight_decay=0.1), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", + optimizer=dict(_delete_=True, type="AdamW", lr=1e-4 * 1, weight_decay=0.1), + clip_grad=dict(max_norm=35, norm_type=2), +) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically @@ -166,6 +123,5 @@ optim_wrapper = dict( # - `base_batch_size` = (8 GPUs) x (2 samples per GPU). auto_scale_lr = dict(enable=False, base_batch_size=32) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=5730 * 2)) -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=False) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", by_epoch=False, interval=5730 * 2)) +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=False) diff --git a/mmpose/configs/mmdet/v3det/cascade_rcnn_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py b/mmpose/configs/mmdet/v3det/cascade_rcnn_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py index f6493323ba8d92d2628fb4784f5a12dd564460be..20394eabf4882fd3e2c2a20f59a27dc46d0a1c91 100644 --- a/mmpose/configs/mmdet/v3det/cascade_rcnn_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py +++ b/mmpose/configs/mmdet/v3det/cascade_rcnn_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py @@ -1,14 +1,14 @@ _base_ = [ - './cascade_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py', + "./cascade_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py", ] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth" # noqa # model settings model = dict( backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", embed_dims=128, depths=[2, 2, 18, 2], num_heads=[4, 8, 16, 32], @@ -16,12 +16,14 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(0, 1, 2, 3), with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), - neck=dict(in_channels=[128, 256, 512, 1024])) + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), + neck=dict(in_channels=[128, 256, 512, 1024]), +) diff --git a/mmpose/configs/mmdet/v3det/deformable-detr-refine-twostage_r50_8xb4_sample1e-3_v3det_50e.py 
b/mmpose/configs/mmdet/v3det/deformable-detr-refine-twostage_r50_8xb4_sample1e-3_v3det_50e.py index 97544a27edfd75eef4ba25fd12a122f03b392c1f..134a833f0e2d6847da97eec058fdbd4ae44b1e75 100644 --- a/mmpose/configs/mmdet/v3det/deformable-detr-refine-twostage_r50_8xb4_sample1e-3_v3det_50e.py +++ b/mmpose/configs/mmdet/v3det/deformable-detr-refine-twostage_r50_8xb4_sample1e-3_v3det_50e.py @@ -1,47 +1,67 @@ -_base_ = '../deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco.py' # noqa +_base_ = "../deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco.py" # noqa model = dict( bbox_head=dict(num_classes=13204), test_cfg=dict(max_per_img=300), ) -data_root = 'data/V3Det/' +data_root = "data/V3Det/" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="PackDetInputs"), ] train_dataloader = dict( @@ -49,60 +69,44 @@ train_dataloader = dict( batch_size=4, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='ClassBalancedDataset', + type="ClassBalancedDataset", oversample_thr=1e-3, dataset=dict( - type='V3DetDataset', + type="V3DetDataset", data_root=data_root, - ann_file='annotations/v3det_2023_v1_train.json', - data_prefix=dict(img=''), + ann_file="annotations/v3det_2023_v1_train.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline, - backend_args=None))) + backend_args=None, + ), + ), +) val_dataloader = dict( - dataset=dict( - type='V3DetDataset', - data_root=data_root, - ann_file='annotations/v3det_2023_v1_val.json', - data_prefix=dict(img=''))) + dataset=dict(type="V3DetDataset", 
data_root=data_root, ann_file="annotations/v3det_2023_v1_val.json", data_prefix=dict(img="")) +) test_dataloader = val_dataloader -val_evaluator = dict( - ann_file=data_root + 'annotations/v3det_2023_v1_val.json', - use_mp_eval=True, - proposal_nums=[300]) +val_evaluator = dict(ann_file=data_root + "annotations/v3det_2023_v1_val.json", use_mp_eval=True, proposal_nums=[300]) test_evaluator = val_evaluator # training schedule for 50e # when using RFS, bs32, each epoch ~ 5730 iter max_iter = 286500 -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=max_iter / 5) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=max_iter / 5) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[229200], # 40e - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[229200], gamma=0.1)] # 40e default_hooks = dict( - timer=dict(type='IterTimerHook'), - param_scheduler=dict(type='ParamSchedulerHook'), - checkpoint=dict( - type='CheckpointHook', by_epoch=False, interval=5730, - max_keep_ckpts=3)) + timer=dict(type="IterTimerHook"), + param_scheduler=dict(type="ParamSchedulerHook"), + checkpoint=dict(type="CheckpointHook", by_epoch=False, interval=5730, max_keep_ckpts=3), +) -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=False) +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=False) diff --git a/mmpose/configs/mmdet/v3det/deformable-detr-refine-twostage_swin_16xb2_sample1e-3_v3det_50e.py b/mmpose/configs/mmdet/v3det/deformable-detr-refine-twostage_swin_16xb2_sample1e-3_v3det_50e.py index e640cd604a97813a70588d5ffe23701543ab0087..8c5cf304500457743c7be20d8ac7b71c8b3e6594 100644 --- a/mmpose/configs/mmdet/v3det/deformable-detr-refine-twostage_swin_16xb2_sample1e-3_v3det_50e.py +++ b/mmpose/configs/mmdet/v3det/deformable-detr-refine-twostage_swin_16xb2_sample1e-3_v3det_50e.py @@ -1,11 +1,11 @@ -_base_ = 'deformable-detr-refine-twostage_r50_8xb4_sample1e-3_v3det_50e.py' +_base_ = "deformable-detr-refine-twostage_r50_8xb4_sample1e-3_v3det_50e.py" -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth" # noqa model = dict( backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", embed_dims=128, depths=[2, 2, 18, 2], num_heads=[4, 8, 16, 32], @@ -13,14 +13,15 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(1, 2, 3), with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), neck=dict(in_channels=[256, 512, 1024]), ) diff --git a/mmpose/configs/mmdet/v3det/dino-4scale_r50_8xb2_sample1e-3_v3det_36e.py b/mmpose/configs/mmdet/v3det/dino-4scale_r50_8xb2_sample1e-3_v3det_36e.py index d9e6e6be0715512b111171c4b60cca7433f8ca34..dd80db01223ddc501e3075fd18f6d1968a18aef1 100644 --- a/mmpose/configs/mmdet/v3det/dino-4scale_r50_8xb2_sample1e-3_v3det_36e.py +++ 
b/mmpose/configs/mmdet/v3det/dino-4scale_r50_8xb2_sample1e-3_v3det_36e.py @@ -1,109 +1,111 @@ -_base_ = '../dino/dino-4scale_r50_8xb2-36e_coco.py' +_base_ = "../dino/dino-4scale_r50_8xb2-36e_coco.py" model = dict( bbox_head=dict(num_classes=13204), test_cfg=dict(max_per_img=300), ) -data_root = 'data/V3Det/' +data_root = "data/V3Det/" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='RandomFlip', prob=0.5), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomFlip", prob=0.5), dict( - type='RandomChoice', + type="RandomChoice", transforms=[ [ dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ) ], [ dict( - type='RandomChoiceResize', + type="RandomChoiceResize", # The radio of all image in train dataset < 7 # follow the original implement scales=[(400, 4200), (500, 4200), (600, 4200)], - keep_ratio=True), + keep_ratio=True, + ), + dict(type="RandomCrop", crop_type="absolute_range", crop_size=(384, 600), allow_negative_crop=True), dict( - type='RandomCrop', - crop_type='absolute_range', - crop_size=(384, 600), - allow_negative_crop=True), - dict( - type='RandomChoiceResize', - scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), - (608, 1333), (640, 1333), (672, 1333), (704, 1333), - (736, 1333), (768, 1333), (800, 1333)], - keep_ratio=True) - ] - ]), - dict(type='PackDetInputs') + type="RandomChoiceResize", + scales=[ + (480, 1333), + (512, 1333), + (544, 1333), + (576, 1333), + (608, 1333), + (640, 1333), + (672, 1333), + (704, 1333), + (736, 1333), + (768, 1333), + (800, 1333), + ], + keep_ratio=True, + ), + ], + ], + ), + dict(type="PackDetInputs"), ] train_dataloader = dict( _delete_=True, batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='ClassBalancedDataset', + type="ClassBalancedDataset", oversample_thr=1e-3, dataset=dict( - type='V3DetDataset', + type="V3DetDataset", data_root=data_root, - ann_file='annotations/v3det_2023_v1_train.json', - data_prefix=dict(img=''), + ann_file="annotations/v3det_2023_v1_train.json", + data_prefix=dict(img=""), filter_cfg=dict(filter_empty_gt=False), pipeline=train_pipeline, - backend_args=None))) + backend_args=None, + ), + ), +) val_dataloader = dict( - dataset=dict( - type='V3DetDataset', - data_root=data_root, - ann_file='annotations/v3det_2023_v1_val.json', - data_prefix=dict(img=''))) + dataset=dict(type="V3DetDataset", data_root=data_root, ann_file="annotations/v3det_2023_v1_val.json", data_prefix=dict(img="")) +) test_dataloader = val_dataloader -val_evaluator = dict( - ann_file=data_root + 'annotations/v3det_2023_v1_val.json', - use_mp_eval=True, - proposal_nums=[300]) +val_evaluator = dict(ann_file=data_root + "annotations/v3det_2023_v1_val.json", use_mp_eval=True, proposal_nums=[300]) test_evaluator = val_evaluator # training 
schedule for 36e # when using RFS, bs16, each epoch ~ 11460 iter max_iter = 412560 -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=max_iter / 5) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=max_iter / 5) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate -param_scheduler = [ - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[343800], # 30e - gamma=0.1) -] +param_scheduler = [dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[343800], gamma=0.1)] # 30e default_hooks = dict( - timer=dict(type='IterTimerHook'), - param_scheduler=dict(type='ParamSchedulerHook'), - checkpoint=dict( - type='CheckpointHook', - by_epoch=False, - interval=11460, - max_keep_ckpts=3)) + timer=dict(type="IterTimerHook"), + param_scheduler=dict(type="ParamSchedulerHook"), + checkpoint=dict(type="CheckpointHook", by_epoch=False, interval=11460, max_keep_ckpts=3), +) -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=False) +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=False) diff --git a/mmpose/configs/mmdet/v3det/dino-4scale_swin_16xb1_sample1e-3_v3det_36e.py b/mmpose/configs/mmdet/v3det/dino-4scale_swin_16xb1_sample1e-3_v3det_36e.py index 100c4ba4b8cb2c0ac3e44f5e9ddcfc37bbfe6b55..f78846539347e220869a80845d24743a618b1f31 100644 --- a/mmpose/configs/mmdet/v3det/dino-4scale_swin_16xb1_sample1e-3_v3det_36e.py +++ b/mmpose/configs/mmdet/v3det/dino-4scale_swin_16xb1_sample1e-3_v3det_36e.py @@ -1,11 +1,11 @@ -_base_ = 'dino-4scale_r50_8xb2_sample1e-3_v3det_36e.py' +_base_ = "dino-4scale_r50_8xb2_sample1e-3_v3det_36e.py" -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth" # noqa model = dict( backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", embed_dims=128, depths=[2, 2, 18, 2], num_heads=[4, 8, 16, 32], @@ -13,14 +13,15 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(1, 2, 3), with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), neck=dict(in_channels=[256, 512, 1024]), ) diff --git a/mmpose/configs/mmdet/v3det/faster_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py b/mmpose/configs/mmdet/v3det/faster_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py index 3d306fb094806d75ec614b52a43bf6614d13eed4..b04c1a86d45899c7cc0b0aa09b20533a5c6bb594 100644 --- a/mmpose/configs/mmdet/v3det/faster_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py +++ b/mmpose/configs/mmdet/v3det/faster_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py @@ -1,6 +1,8 @@ _base_ = [ - '../_base_/models/faster-rcnn_r50_fpn.py', '../_base_/datasets/v3det.py', - '../_base_/schedules/schedule_2x.py', '../_base_/default_runtime.py' + "../_base_/models/faster-rcnn_r50_fpn.py", + "../_base_/datasets/v3det.py", + "../_base_/schedules/schedule_2x.py", + "../_base_/default_runtime.py", ] # model settings model = dict( @@ -8,58 +10,36 @@ model = dict( bbox_head=dict( num_classes=13204, reg_class_agnostic=True, - 
cls_predictor_cfg=dict( - type='NormedLinear', tempearture=50, bias=True), - loss_cls=dict( - type='CrossEntropyCustomLoss', - num_classes=13204, - use_sigmoid=True, - loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0))), + cls_predictor_cfg=dict(type="NormedLinear", tempearture=50, bias=True), + loss_cls=dict(type="CrossEntropyCustomLoss", num_classes=13204, use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ) + ), # model training and testing settings train_cfg=dict( - rpn_proposal=dict(nms_pre=4000, max_per_img=2000), - rcnn=dict( - assigner=dict( - perm_repeat_gt_cfg=dict(iou_thr=0.7, perm_range=0.01)))), - test_cfg=dict( - rcnn=dict( - score_thr=0.0001, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=300))) + rpn_proposal=dict(nms_pre=4000, max_per_img=2000), rcnn=dict(assigner=dict(perm_repeat_gt_cfg=dict(iou_thr=0.7, perm_range=0.01))) + ), + test_cfg=dict(rcnn=dict(score_thr=0.0001, nms=dict(type="nms", iou_threshold=0.6), max_per_img=300)), +) # dataset settings train_dataloader = dict(batch_size=4, num_workers=8) # training schedule for 2x max_iter = 68760 * 2 -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=max_iter) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=max_iter) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 2048, - by_epoch=False, - begin=0, - end=5000), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[45840 * 2, 63030 * 2], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 2048, by_epoch=False, begin=0, end=5000), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[45840 * 2, 63030 * 2], gamma=0.1), ] # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(_delete_=True, type='AdamW', lr=1e-4 * 1, weight_decay=0.1), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", + optimizer=dict(_delete_=True, type="AdamW", lr=1e-4 * 1, weight_decay=0.1), + clip_grad=dict(max_norm=35, norm_type=2), +) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically @@ -67,6 +47,5 @@ optim_wrapper = dict( # - `base_batch_size` = (8 GPUs) x (2 samples per GPU). 
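# A rough sketch of the linear scaling rule this flag controls (an assumption
# about MMEngine's default behaviour, not part of the original config): when
# `enable=True`, the runner rescales the optimizer LR by the ratio of the
# actual total batch size to `base_batch_size`, i.e.
#   scaled_lr = base_lr * (num_gpus * samples_per_gpu) / base_batch_size
# With `batch_size=4` per GPU as set in the dataloader above, 8 GPUs reproduce
# the `base_batch_size` of 32 used here, so the ratio would be 1.0.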
auto_scale_lr = dict(enable=False, base_batch_size=32) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=5730 * 2)) -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=False) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", by_epoch=False, interval=5730 * 2)) +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=False) diff --git a/mmpose/configs/mmdet/v3det/faster_rcnn_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py b/mmpose/configs/mmdet/v3det/faster_rcnn_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py index b0b1110811230b4bda27da9fd2e58067c7326c52..8e4f0c75a4057f57b334960658de72a437feeeb5 100644 --- a/mmpose/configs/mmdet/v3det/faster_rcnn_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py +++ b/mmpose/configs/mmdet/v3det/faster_rcnn_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py @@ -1,14 +1,14 @@ _base_ = [ - './faster_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py', + "./faster_rcnn_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py", ] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth" # noqa # model settings model = dict( backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", embed_dims=128, depths=[2, 2, 18, 2], num_heads=[4, 8, 16, 32], @@ -16,12 +16,14 @@ model = dict( mlp_ratio=4, qkv_bias=True, qk_scale=None, - drop_rate=0., - attn_drop_rate=0., + drop_rate=0.0, + attn_drop_rate=0.0, drop_path_rate=0.3, patch_norm=True, out_indices=(0, 1, 2, 3), with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), - neck=dict(in_channels=[128, 256, 512, 1024])) + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), + neck=dict(in_channels=[128, 256, 512, 1024]), +) diff --git a/mmpose/configs/mmdet/v3det/fcos_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py b/mmpose/configs/mmdet/v3det/fcos_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py index b78e38c93cb0fdedff3948f1ce7b5b7787efcaea..6fc1f4f87f78959100a27cd0617f9f312431b8e2 100644 --- a/mmpose/configs/mmdet/v3det/fcos_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py +++ b/mmpose/configs/mmdet/v3det/fcos_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py @@ -1,70 +1,58 @@ -_base_ = [ - '../_base_/datasets/v3det.py', '../_base_/schedules/schedule_2x.py', - '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/v3det.py", "../_base_/schedules/schedule_2x.py", "../_base_/default_runtime.py"] # model settings model = dict( - type='FCOS', + type="FCOS", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='FPN', + type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, - add_extra_convs='on_output', # use P5 + add_extra_convs="on_output", # use P5 
num_outs=5, - relu_before_extra_convs=True), + relu_before_extra_convs=True, + ), bbox_head=dict( - type='FCOSHead', + type="FCOSHead", num_classes=13204, in_channels=256, stacked_convs=4, feat_channels=256, strides=[8, 16, 32, 64, 128], - cls_predictor_cfg=dict(type='NormedLinear', tempearture=50, bias=True), - loss_cls=dict( - type='FocalCustomLoss', - use_sigmoid=True, - num_classes=13204, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='IoULoss', loss_weight=1.0), - loss_centerness=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + cls_predictor_cfg=dict(type="NormedLinear", tempearture=50, bias=True), + loss_cls=dict(type="FocalCustomLoss", use_sigmoid=True, num_classes=13204, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="IoULoss", loss_weight=1.0), + loss_centerness=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), # model training and testing settings train_cfg=dict( assigner=dict( - type='MaxIoUAssigner', + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0, ignore_iof_thr=-1, - perm_repeat_gt_cfg=dict(iou_thr=0.7, perm_range=0.01)), + perm_repeat_gt_cfg=dict(iou_thr=0.7, perm_range=0.01), + ), allowed_border=-1, pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.0001, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=300)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.0001, nms=dict(type="nms", iou_threshold=0.6), max_per_img=300), +) # dataset settings backend_args = None @@ -73,35 +61,20 @@ train_dataloader = dict(batch_size=2, num_workers=8) # training schedule for 2x max_iter = 68760 * 2 * 2 -train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=max_iter, - val_interval=max_iter) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=max_iter, val_interval=max_iter) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0 / 2048, - by_epoch=False, - begin=0, - end=5000 * 2), - dict( - type='MultiStepLR', - begin=0, - end=max_iter, - by_epoch=False, - milestones=[45840 * 2 * 2, 63030 * 2 * 2], - gamma=0.1) + dict(type="LinearLR", start_factor=1.0 / 2048, by_epoch=False, begin=0, end=5000 * 2), + dict(type="MultiStepLR", begin=0, end=max_iter, by_epoch=False, milestones=[45840 * 2 * 2, 63030 * 2 * 2], gamma=0.1), ] # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict( - _delete_=True, type='AdamW', lr=1e-4 * 0.25, weight_decay=0.1), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", + optimizer=dict(_delete_=True, type="AdamW", lr=1e-4 * 0.25, weight_decay=0.1), + clip_grad=dict(max_norm=35, norm_type=2), +) # Default setting for scaling LR automatically # - `enable` means enable scaling LR automatically @@ -109,8 +82,7 @@ optim_wrapper = dict( # - `base_batch_size` = (8 GPUs) x (2 samples per GPU). 
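# For reference (again assuming MMEngine's default linear scaling rule, not
# part of the original config): with `enable=True` the LR would scale as
#   scaled_lr = base_lr * (num_gpus * samples_per_gpu) / base_batch_size
# Here the dataloader above uses `batch_size=2`, so 8 GPUs give a total batch
# of 16 against a `base_batch_size` of 32; scaling is left disabled below.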
auto_scale_lr = dict(enable=False, base_batch_size=32) -default_hooks = dict( - checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=5730 * 2)) -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=False) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", by_epoch=False, interval=5730 * 2)) +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=False) find_unused_parameters = True diff --git a/mmpose/configs/mmdet/v3det/fcos_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py b/mmpose/configs/mmdet/v3det/fcos_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py index 6ca952a28fc08ae9b14ad30308eff823b1bba55e..dc1ea1a95c2e352b2370935b7a03650bf299d350 100644 --- a/mmpose/configs/mmdet/v3det/fcos_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py +++ b/mmpose/configs/mmdet/v3det/fcos_swinb_fpn_8x4_sample1e-3_mstrain_v3det_2x.py @@ -1,14 +1,14 @@ _base_ = [ - './fcos_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py', + "./fcos_r50_fpn_8x4_sample1e-3_mstrain_v3det_2x.py", ] -pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth' # noqa +pretrained = "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224.pth" # noqa # model settings model = dict( backbone=dict( _delete_=True, - type='SwinTransformer', + type="SwinTransformer", embed_dims=128, depths=[2, 2, 18, 2], num_heads=[4, 8, 16, 32], @@ -23,5 +23,7 @@ model = dict( out_indices=(0, 1, 2, 3), with_cp=False, convert_weights=True, - init_cfg=dict(type='Pretrained', checkpoint=pretrained)), - neck=dict(in_channels=[128, 256, 512, 1024], force_grad_on_level=True)) + init_cfg=dict(type="Pretrained", checkpoint=pretrained), + ), + neck=dict(in_channels=[128, 256, 512, 1024], force_grad_on_level=True), +) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_r101-mdconv-c3-c5_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_r101-mdconv-c3-c5_fpn_ms-2x_coco.py index 2dd67a3bcce3bbb66531997133880d65af0c856a..8f22835a3836cbe22057537589e1158277424e29 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_r101-mdconv-c3-c5_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_r101-mdconv-c3-c5_fpn_ms-2x_coco.py @@ -1,15 +1,16 @@ -_base_ = './vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py' +_base_ = "./vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py" model = dict( backbone=dict( - type='ResNet', + type="ResNet", depth=101, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), + style="pytorch", + dcn=dict(type="DCNv2", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True), - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), + ) +) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_1x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_1x_coco.py index b296a07959e43517d792f36f356404a232fb0dc3..3a344e3da3db709b9f1433ca57f8ab46e66abeeb 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_1x_coco.py @@ -1,6 +1,2 @@ -_base_ = './vfnet_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./vfnet_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, 
init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_2x_coco.py index 37a7bacb5e409a75ae2cd71fc022837f09537aa7..ece73da850e2504bc31160b446631a7bd5321062 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_2x_coco.py @@ -1,20 +1,10 @@ -_base_ = './vfnet_r50_fpn_1x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./vfnet_r50_fpn_1x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) # learning policy max_epochs = 24 param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_ms-2x_coco.py index 62f064b7473f4e6fec3ac50962240ac1f828753f..0b7f6759b467abba7d3e46a90011c047e240711e 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_r101_fpn_ms-2x_coco.py @@ -1,6 +1,2 @@ -_base_ = './vfnet_r50_fpn_ms-2x_coco.py' -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +_base_ = "./vfnet_r50_fpn_ms-2x_coco.py" +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py index 08adf927599b7759dea0e2d14c37ce716482b301..a60e1f9fe7c6d550a2b4dd7e8644fa71c983a3b2 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py @@ -1,6 +1,5 @@ -_base_ = './vfnet_r50_fpn_ms-2x_coco.py' +_base_ = "./vfnet_r50_fpn_ms-2x_coco.py" model = dict( - backbone=dict( - dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True)), - bbox_head=dict(dcn_on_last_conv=True)) + backbone=dict(dcn=dict(type="DCNv2", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True)), + bbox_head=dict(dcn_on_last_conv=True), +) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_r50_fpn_1x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_r50_fpn_1x_coco.py index 99bc3b5f4c78c7a7cda11e20f209ea40af7dfd80..a8b76ae7c755f8ad8e6a855ea4993615f9c71fef 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_r50_fpn_1x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_r50_fpn_1x_coco.py @@ -1,36 +1,32 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings model = dict( - type='VFNet', + type="VFNet", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), 
+ type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='FPN', + type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, - add_extra_convs='on_output', # use P5 + add_extra_convs="on_output", # use P5 num_outs=5, - relu_before_extra_convs=True), + relu_before_extra_convs=True, + ), bbox_head=dict( - type='VFNetHead', + type="VFNetHead", num_classes=80, in_channels=256, stacked_convs=3, @@ -40,65 +36,40 @@ model = dict( dcn_on_last_conv=False, use_atss=True, use_vfl=True, - loss_cls=dict( - type='VarifocalLoss', - use_sigmoid=True, - alpha=0.75, - gamma=2.0, - iou_weighted=True, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=1.5), - loss_bbox_refine=dict(type='GIoULoss', loss_weight=2.0)), + loss_cls=dict(type="VarifocalLoss", use_sigmoid=True, alpha=0.75, gamma=2.0, iou_weighted=True, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=1.5), + loss_bbox_refine=dict(type="GIoULoss", loss_weight=2.0), + ), # training and testing settings - train_cfg=dict( - assigner=dict(type='ATSSAssigner', topk=9), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + train_cfg=dict(assigner=dict(type="ATSSAssigner", topk=9), allowed_border=-1, pos_weight=-1, debug=False), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # data setting train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader # optimizer -optim_wrapper = dict( - optimizer=dict(lr=0.01), - paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.), - clip_grad=None) +optim_wrapper = dict(optimizer=dict(lr=0.01), paramwise_cfg=dict(bias_lr_mult=2.0, bias_decay_mult=0.0), clip_grad=None) # learning 
rate max_epochs = 12 param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[8, 11], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_r50_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_r50_fpn_ms-2x_coco.py index 0f8eed298e81967582420ac45a241b2726c47f6a..6d68301d9e060d1139a838cf45b4643cf41ba4f8 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_r50_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_r50_fpn_ms-2x_coco.py @@ -1,21 +1,16 @@ -_base_ = './vfnet_r50_fpn_1x_coco.py' +_base_ = "./vfnet_r50_fpn_1x_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', scale=[(1333, 480), (1333, 960)], - keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=[(1333, 480), (1333, 960)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) @@ -23,14 +18,8 @@ test_dataloader = val_dataloader # learning policy max_epochs = 24 param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[16, 22], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[16, 22], gamma=0.1), ] train_cfg = dict(max_epochs=max_epochs) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_res2net-101_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_res2net-101_fpn_ms-2x_coco.py index 94288e8e80e5be2c6e8effd38e30e239cd1e3c5f..526eda3e2fdf95ed0d9711873682a7c3d731b68e 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_res2net-101_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_res2net-101_fpn_ms-2x_coco.py @@ -1,16 +1,16 @@ -_base_ = './vfnet_r50_fpn_ms-2x_coco.py' +_base_ = "./vfnet_r50_fpn_ms-2x_coco.py" model = dict( backbone=dict( - type='Res2Net', + type="Res2Net", depth=101, scales=4, base_width=26, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://res2net101_v1d_26w_4s'))) + style="pytorch", 
+ init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://res2net101_v1d_26w_4s"), + ) +) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_res2net101-mdconv-c3-c5_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_res2net101-mdconv-c3-c5_fpn_ms-2x_coco.py index 269330d3d8c218e51c3e65b550e4afc3296f2ec4..2104e20ced58149a6b2d4d6fbd0a0bbd9967cd53 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_res2net101-mdconv-c3-c5_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_res2net101-mdconv-c3-c5_fpn_ms-2x_coco.py @@ -1,18 +1,18 @@ -_base_ = './vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py' +_base_ = "./vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py" model = dict( backbone=dict( - type='Res2Net', + type="Res2Net", depth=101, scales=4, base_width=26, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), + style="pytorch", + dcn=dict(type="DCNv2", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True), - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://res2net101_v1d_26w_4s'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://res2net101_v1d_26w_4s"), + ) +) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_x101-32x4d-mdconv-c3-c5_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_x101-32x4d-mdconv-c3-c5_fpn_ms-2x_coco.py index 465da0cbdf4c4ae34d648349f4f9fa2d3fb13fe6..4949694386988899968c3af295f0a0bfc300c458 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_x101-32x4d-mdconv-c3-c5_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_x101-32x4d-mdconv-c3-c5_fpn_ms-2x_coco.py @@ -1,17 +1,18 @@ -_base_ = './vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py' +_base_ = "./vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), + style="pytorch", + dcn=dict(type="DCNv2", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_x101-32x4d_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_x101-32x4d_fpn_ms-2x_coco.py index 486bcfe5ebd85f8c4ac3b211694e7dd9d13aa302..5dbb5eb16e5de25ad0ab6bcecd535a6ecf811843 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_x101-32x4d_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_x101-32x4d_fpn_ms-2x_coco.py @@ -1,15 +1,16 @@ -_base_ = './vfnet_r50_fpn_ms-2x_coco.py' +_base_ = "./vfnet_r50_fpn_ms-2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=32, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_32x4d'))) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_32x4d"), + ) +) diff --git 
a/mmpose/configs/mmdet/vfnet/vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco.py index 14a070e73ff54d6833aced096e2d94da4171ca42..00e125e5a3390e0df2bc564d2c52bedfad1fe757 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_x101-64x4d-mdconv-c3-c5_fpn_ms-2x_coco.py @@ -1,17 +1,18 @@ -_base_ = './vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py' +_base_ = "./vfnet_r50-mdconv-c3-c5_fpn_ms-2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), + style="pytorch", + dcn=dict(type="DCNv2", deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, True, True, True), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/vfnet/vfnet_x101-64x4d_fpn_ms-2x_coco.py b/mmpose/configs/mmdet/vfnet/vfnet_x101-64x4d_fpn_ms-2x_coco.py index 92e3f71df6818a5653ec9c0475c277d89a1adb47..2f8de509947fe66bc31a5f14ef173d8a086c93f9 100644 --- a/mmpose/configs/mmdet/vfnet/vfnet_x101-64x4d_fpn_ms-2x_coco.py +++ b/mmpose/configs/mmdet/vfnet/vfnet_x101-64x4d_fpn_ms-2x_coco.py @@ -1,15 +1,16 @@ -_base_ = './vfnet_r50_fpn_ms-2x_coco.py' +_base_ = "./vfnet_r50_fpn_ms-2x_coco.py" model = dict( backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'))) + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ) +) diff --git a/mmpose/configs/mmdet/wider_face/retinanet_r50_fpn_1x_widerface.py b/mmpose/configs/mmdet/wider_face/retinanet_r50_fpn_1x_widerface.py index 78067255f8f69f9d193e8d3ae2fe8a685e4defe1..a4b394a72840c2fe270e71e9257134efa39c3a4f 100644 --- a/mmpose/configs/mmdet/wider_face/retinanet_r50_fpn_1x_widerface.py +++ b/mmpose/configs/mmdet/wider_face/retinanet_r50_fpn_1x_widerface.py @@ -1,10 +1,10 @@ _base_ = [ - '../_base_/models/retinanet_r50_fpn.py', - '../_base_/datasets/wider_face.py', '../_base_/schedules/schedule_1x.py', - '../_base_/default_runtime.py' + "../_base_/models/retinanet_r50_fpn.py", + "../_base_/datasets/wider_face.py", + "../_base_/schedules/schedule_1x.py", + "../_base_/default_runtime.py", ] # model settings model = dict(bbox_head=dict(num_classes=1)) # optimizer -optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)) +optim_wrapper = dict(optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0001)) diff --git a/mmpose/configs/mmdet/wider_face/ssd300_8xb32-24e_widerface.py b/mmpose/configs/mmdet/wider_face/ssd300_8xb32-24e_widerface.py index 02c3c927f78ff022b03bf180789ce91d6061ec9e..f241a599b534f5ce92e7fe7718c160192bc0accd 100644 --- a/mmpose/configs/mmdet/wider_face/ssd300_8xb32-24e_widerface.py +++ b/mmpose/configs/mmdet/wider_face/ssd300_8xb32-24e_widerface.py @@ -1,62 +1,49 @@ _base_ = [ - 
'../_base_/models/ssd300.py', '../_base_/datasets/wider_face.py', - '../_base_/default_runtime.py', '../_base_/schedules/schedule_2x.py' + "../_base_/models/ssd300.py", + "../_base_/datasets/wider_face.py", + "../_base_/default_runtime.py", + "../_base_/schedules/schedule_2x.py", ] model = dict(bbox_head=dict(num_classes=1)) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict( - type='Expand', + type="Expand", mean={{_base_.model.data_preprocessor.mean}}, to_rgb={{_base_.model.data_preprocessor.bgr_to_rgb}}, - ratio_range=(1, 4)), - dict( - type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=(300, 300), keep_ratio=False), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + ratio_range=(1, 4), + ), + dict(type="MinIoURandomCrop", min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3), + dict(type="Resize", scale=(300, 300), keep_ratio=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=_base_.backend_args), - dict(type='Resize', scale=(300, 300), keep_ratio=False), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=_base_.backend_args), + dict(type="Resize", scale=(300, 300), keep_ratio=False), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -dataset_type = 'WIDERFaceDataset' -data_root = 'data/WIDERFace/' -train_dataloader = dict( - batch_size=32, num_workers=8, dataset=dict(pipeline=train_pipeline)) +dataset_type = "WIDERFaceDataset" +data_root = "data/WIDERFace/" +train_dataloader = dict(batch_size=32, num_workers=8, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader # learning rate param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, - end=1000), - dict(type='MultiStepLR', by_epoch=True, milestones=[16, 20], gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", by_epoch=True, milestones=[16, 20], gamma=0.1), ] # optimizer -optim_wrapper = dict( - optimizer=dict(lr=0.012, momentum=0.9, weight_decay=5e-4), - clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(optimizer=dict(lr=0.012, momentum=0.9, weight_decay=5e-4), clip_grad=dict(max_norm=35, norm_type=2)) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
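The WIDER Face configs above lean heavily on MMEngine's `_base_` interpolation (e.g. `{{_base_.model.data_preprocessor.mean}}`). A minimal sketch of how those references resolve at load time, assuming mmengine is installed and this is run from the repository root:

from mmengine.config import Config

# Loading a config resolves `_base_` inheritance and `{{_base_.*}}` placeholders.
cfg = Config.fromfile("mmpose/configs/mmdet/wider_face/ssd300_8xb32-24e_widerface.py")
print(cfg.model.bbox_head.num_classes)  # 1 -- the single "face" class
print(cfg.train_dataloader.batch_size)  # 32, as set above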
diff --git a/mmpose/configs/mmdet/yolact/yolact_r101_1xb8-55e_coco.py b/mmpose/configs/mmdet/yolact/yolact_r101_1xb8-55e_coco.py index e6ffe29627ff5bd24b8e53be8d7defaa9eb91df7..95271ede009075f1b2d5b91223f280e14643ebdf 100644 --- a/mmpose/configs/mmdet/yolact/yolact_r101_1xb8-55e_coco.py +++ b/mmpose/configs/mmdet/yolact/yolact_r101_1xb8-55e_coco.py @@ -1,7 +1,3 @@ -_base_ = './yolact_r50_1xb8-55e_coco.py' +_base_ = "./yolact_r50_1xb8-55e_coco.py" -model = dict( - backbone=dict( - depth=101, - init_cfg=dict(type='Pretrained', - checkpoint='torchvision://resnet101'))) +model = dict(backbone=dict(depth=101, init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"))) diff --git a/mmpose/configs/mmdet/yolact/yolact_r50_1xb8-55e_coco.py b/mmpose/configs/mmdet/yolact/yolact_r50_1xb8-55e_coco.py index b7dabf1548a733cbf18b8007ae2fa9033a340af6..7f8ecd19c77d1e6b46f83727279753b04897cb36 100644 --- a/mmpose/configs/mmdet/yolact/yolact_r50_1xb8-55e_coco.py +++ b/mmpose/configs/mmdet/yolact/yolact_r50_1xb8-55e_coco.py @@ -1,166 +1,116 @@ -_base_ = [ - '../_base_/datasets/coco_instance.py', '../_base_/default_runtime.py' -] -img_norm_cfg = dict( - mean=[123.68, 116.78, 103.94], std=[58.40, 57.12, 57.38], to_rgb=True) +_base_ = ["../_base_/datasets/coco_instance.py", "../_base_/default_runtime.py"] +img_norm_cfg = dict(mean=[123.68, 116.78, 103.94], std=[58.40, 57.12, 57.38], to_rgb=True) # model settings input_size = 550 model = dict( - type='YOLACT', + type="YOLACT", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=img_norm_cfg['mean'], - std=img_norm_cfg['std'], - bgr_to_rgb=img_norm_cfg['to_rgb'], - pad_mask=True), + type="DetDataPreprocessor", mean=img_norm_cfg["mean"], std=img_norm_cfg["std"], bgr_to_rgb=img_norm_cfg["to_rgb"], pad_mask=True + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=-1, # do not freeze stem - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=False, # update the statistics of bn zero_init_residual=False, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), neck=dict( - type='FPN', + type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, - add_extra_convs='on_input', + add_extra_convs="on_input", num_outs=5, - upsample_cfg=dict(mode='bilinear')), + upsample_cfg=dict(mode="bilinear"), + ), bbox_head=dict( - type='YOLACTHead', + type="YOLACTHead", num_classes=80, in_channels=256, feat_channels=256, anchor_generator=dict( - type='AnchorGenerator', + type="AnchorGenerator", octave_base_scale=3, scales_per_octave=1, base_sizes=[8, 16, 32, 64, 128], ratios=[0.5, 1.0, 2.0], strides=[550.0 / x for x in [69, 35, 18, 9, 5]], - centers=[(550 * 0.5 / x, 550 * 0.5 / x) - for x in [69, 35, 18, 9, 5]]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2]), - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - reduction='none', - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.5), + centers=[(550 * 0.5 / x, 550 * 0.5 / x) for x in [69, 35, 18, 9, 5]], + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, reduction="none", loss_weight=1.0), + 
loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.5), num_head_convs=1, num_protos=32, - use_ohem=True), + use_ohem=True, + ), mask_head=dict( - type='YOLACTProtonet', + type="YOLACTProtonet", in_channels=256, num_protos=32, num_classes=80, max_masks_to_train=100, loss_mask_weight=6.125, with_seg_branch=True, - loss_segm=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)), + loss_segm=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + ), # training and testing settings train_cfg=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0., - ignore_iof_thr=-1, - gt_max_assign_all=False), - sampler=dict(type='PseudoSampler'), # YOLACT should use PseudoSampler + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0.0, ignore_iof_thr=-1, gt_max_assign_all=False), + sampler=dict(type="PseudoSampler"), # YOLACT should use PseudoSampler # smoothl1_beta=1., allowed_border=-1, pos_weight=-1, neg_pos_ratio=3, - debug=False), + debug=False, + ), test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - mask_thr=0.5, - iou_thr=0.5, - top_k=200, - max_per_img=100, - mask_thr_binary=0.5)) + nms_pre=1000, min_bbox_size=0, score_thr=0.05, mask_thr=0.5, iou_thr=0.5, top_k=200, max_per_img=100, mask_thr_binary=0.5 + ), +) # dataset settings train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='FilterAnnotations', min_gt_bbox_wh=(4.0, 4.0)), - dict( - type='Expand', - mean=img_norm_cfg['mean'], - to_rgb=img_norm_cfg['to_rgb'], - ratio_range=(1, 4)), - dict( - type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='RandomFlip', prob=0.5), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="FilterAnnotations", min_gt_bbox_wh=(4.0, 4.0)), + dict(type="Expand", mean=img_norm_cfg["mean"], to_rgb=img_norm_cfg["to_rgb"], ratio_range=(1, 4)), + dict(type="MinIoURandomCrop", min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=8, - num_workers=4, - batch_sampler=None, - dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=8, 
num_workers=4, batch_sampler=None, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader max_epochs = 55 # training schedule for 55e -train_cfg = dict( - type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1) -val_cfg = dict(type='ValLoop') -test_cfg = dict(type='TestLoop') +train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs, val_interval=1) +val_cfg = dict(type="ValLoop") +test_cfg = dict(type="TestLoop") # learning rate param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[20, 42, 49, 52], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[20, 42, 49, 52], gamma=0.1), ] # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=1e-3, momentum=0.9, weight_decay=5e-4)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(type="SGD", lr=1e-3, momentum=0.9, weight_decay=5e-4)) -custom_hooks = [ - dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW') -] +custom_hooks = [dict(type="CheckInvalidLossHook", interval=50, priority="VERY_LOW")] env_cfg = dict(cudnn_benchmark=True) diff --git a/mmpose/configs/mmdet/yolact/yolact_r50_8xb8-55e_coco.py b/mmpose/configs/mmdet/yolact/yolact_r50_8xb8-55e_coco.py index e39c285da10ef4821343ebf3c0d0d4c094a97198..04504c892d06d3f4c5ae9111d0750b546195e78b 100644 --- a/mmpose/configs/mmdet/yolact/yolact_r50_8xb8-55e_coco.py +++ b/mmpose/configs/mmdet/yolact/yolact_r50_8xb8-55e_coco.py @@ -1,21 +1,12 @@ -_base_ = 'yolact_r50_1xb8-55e_coco.py' +_base_ = "yolact_r50_1xb8-55e_coco.py" # optimizer -optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(lr=8e-3), - clip_grad=dict(max_norm=35, norm_type=2)) +optim_wrapper = dict(type="OptimWrapper", optimizer=dict(lr=8e-3), clip_grad=dict(max_norm=35, norm_type=2)) # learning rate max_epochs = 55 param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=1000), - dict( - type='MultiStepLR', - begin=0, - end=max_epochs, - by_epoch=True, - milestones=[20, 42, 49, 52], - gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=1000), + dict(type="MultiStepLR", begin=0, end=max_epochs, by_epoch=True, milestones=[20, 42, 49, 52], gamma=0.1), ] # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
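The two YOLACT configs above differ in learning rate by exactly the batch-size ratio: 1e-3 for 1 GPU x 8 images, 8e-3 for 8 GPUs x 8 images. A minimal sketch of that linear scaling rule, which the `auto_scale_lr` setting mentioned in the NOTE automates; the variable names here are illustrative:

# Linear scaling: lr grows with the total batch size (GPUs x imgs per GPU).
base_lr, base_total_batch = 1e-3, 1 * 8  # yolact_r50_1xb8-55e_coco.py
num_gpus, imgs_per_gpu = 8, 8            # yolact_r50_8xb8-55e_coco.py
scaled_lr = base_lr * (num_gpus * imgs_per_gpu) / base_total_batch
assert abs(scaled_lr - 8e-3) < 1e-12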
diff --git a/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-320-273e_coco.py b/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-320-273e_coco.py index a3d08dd7706e5ba5bec5fc9e8da6fab120ed813d..0493ebd3c296c6b350d11a002b471dc048314899 100644 --- a/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-320-273e_coco.py +++ b/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-320-273e_coco.py @@ -1,28 +1,22 @@ -_base_ = './yolov3_d53_8xb8-ms-608-273e_coco.py' +_base_ = "./yolov3_d53_8xb8-ms-608-273e_coco.py" input_size = (320, 320) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), # `mean` and `to_rgb` should be the same with the `preprocess_cfg` - dict(type='Expand', mean=[0, 0, 0], to_rgb=True, ratio_range=(1, 2)), - dict( - type='MinIoURandomCrop', - min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=input_size, keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PhotoMetricDistortion'), - dict(type='PackDetInputs') + dict(type="Expand", mean=[0, 0, 0], to_rgb=True, ratio_range=(1, 2)), + dict(type="MinIoURandomCrop", min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), min_crop_size=0.3), + dict(type="Resize", scale=input_size, keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion"), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=input_size, keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=input_size, keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) diff --git a/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-amp-ms-608-273e_coco.py b/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-amp-ms-608-273e_coco.py index 173d8ee22227b3c3f4aa0488cb4e6f131d7dbee4..536e4db13092d55716c9d72f7b226bdae8b53461 100644 --- a/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-amp-ms-608-273e_coco.py +++ b/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-amp-ms-608-273e_coco.py @@ -1,3 +1,3 @@ -_base_ = './yolov3_d53_8xb8-ms-608-273e_coco.py' +_base_ = "./yolov3_d53_8xb8-ms-608-273e_coco.py" # fp16 settings -optim_wrapper = dict(type='AmpOptimWrapper', loss_scale='dynamic') +optim_wrapper = dict(type="AmpOptimWrapper", loss_scale="dynamic") diff --git a/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-ms-416-273e_coco.py b/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-ms-416-273e_coco.py index ca0127e83edaeb8d5851ed089f6bd6d7385a1f86..b334ab54c41108d323979103b176cf0717418c48 100644 --- a/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-ms-416-273e_coco.py +++ b/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-ms-416-273e_coco.py @@ -1,27 +1,21 @@ -_base_ = './yolov3_d53_8xb8-ms-608-273e_coco.py' +_base_ = "./yolov3_d53_8xb8-ms-608-273e_coco.py" train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", 
backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), # `mean` and `to_rgb` should be the same with the `preprocess_cfg` - dict(type='Expand', mean=[0, 0, 0], to_rgb=True, ratio_range=(1, 2)), - dict( - type='MinIoURandomCrop', - min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), - min_crop_size=0.3), - dict(type='RandomResize', scale=[(320, 320), (416, 416)], keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PhotoMetricDistortion'), - dict(type='PackDetInputs') + dict(type="Expand", mean=[0, 0, 0], to_rgb=True, ratio_range=(1, 2)), + dict(type="MinIoURandomCrop", min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), min_crop_size=0.3), + dict(type="RandomResize", scale=[(320, 320), (416, 416)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion"), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(416, 416), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(416, 416), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict(dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) diff --git a/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py b/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py index d4a36dfdaaf9b9e013882a6c28d42cca5942be20..dc915d4a50b244c11cc5f3a4277f2f19c8b453e1 100644 --- a/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py +++ b/mmpose/configs/mmdet/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py @@ -1,70 +1,35 @@ -_base_ = ['../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'] +_base_ = ["../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings -data_preprocessor = dict( - type='DetDataPreprocessor', - mean=[0, 0, 0], - std=[255., 255., 255.], - bgr_to_rgb=True, - pad_size_divisor=32) +data_preprocessor = dict(type="DetDataPreprocessor", mean=[0, 0, 0], std=[255.0, 255.0, 255.0], bgr_to_rgb=True, pad_size_divisor=32) model = dict( - type='YOLOV3', + type="YOLOV3", data_preprocessor=data_preprocessor, - backbone=dict( - type='Darknet', - depth=53, - out_indices=(3, 4, 5), - init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://darknet53')), - neck=dict( - type='YOLOV3Neck', - num_scales=3, - in_channels=[1024, 512, 256], - out_channels=[512, 256, 128]), + backbone=dict(type="Darknet", depth=53, out_indices=(3, 4, 5), init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://darknet53")), + neck=dict(type="YOLOV3Neck", num_scales=3, in_channels=[1024, 512, 256], out_channels=[512, 256, 128]), bbox_head=dict( - type='YOLOV3Head', + type="YOLOV3Head", num_classes=80, in_channels=[512, 256, 128], out_channels=[1024, 512, 256], anchor_generator=dict( - type='YOLOAnchorGenerator', - base_sizes=[[(116, 90), (156, 198), (373, 326)], - [(30, 61), (62, 45), (59, 119)], - [(10, 13), (16, 30), (33, 23)]], - strides=[32, 16, 8]), - bbox_coder=dict(type='YOLOBBoxCoder'), + type="YOLOAnchorGenerator", + base_sizes=[[(116, 90), (156, 198), (373, 326)], [(30, 61), (62, 45), (59, 119)], [(10, 13), (16, 30), (33, 23)]], + strides=[32, 16, 8], + 
), + bbox_coder=dict(type="YOLOBBoxCoder"), featmap_strides=[32, 16, 8], - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - loss_weight=1.0, - reduction='sum'), - loss_conf=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - loss_weight=1.0, - reduction='sum'), - loss_xy=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - loss_weight=2.0, - reduction='sum'), - loss_wh=dict(type='MSELoss', loss_weight=2.0, reduction='sum')), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0, reduction="sum"), + loss_conf=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0, reduction="sum"), + loss_xy=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=2.0, reduction="sum"), + loss_wh=dict(type="MSELoss", loss_weight=2.0, reduction="sum"), + ), # training and testing settings - train_cfg=dict( - assigner=dict( - type='GridAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0)), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - conf_thr=0.005, - nms=dict(type='nms', iou_threshold=0.45), - max_per_img=100)) + train_cfg=dict(assigner=dict(type="GridAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0)), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, conf_thr=0.005, nms=dict(type="nms", iou_threshold=0.45), max_per_img=100), +) # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -82,84 +47,73 @@ data_root = 'data/coco/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='Expand', - mean=data_preprocessor['mean'], - to_rgb=data_preprocessor['bgr_to_rgb'], - ratio_range=(1, 2)), - dict( - type='MinIoURandomCrop', - min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), - min_crop_size=0.3), - dict(type='RandomResize', scale=[(320, 320), (608, 608)], keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PhotoMetricDistortion'), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Expand", mean=data_preprocessor["mean"], to_rgb=data_preprocessor["bgr_to_rgb"], ratio_range=(1, 2)), + dict(type="MinIoURandomCrop", min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), min_crop_size=0.3), + dict(type="RandomResize", scale=[(320, 320), (608, 608)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion"), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(608, 608), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(608, 608), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=8, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), 
dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox', - backend_args=backend_args) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/instances_val2017.json", metric="bbox", backend_args=backend_args) test_evaluator = val_evaluator train_cfg = dict(max_epochs=273, val_interval=7) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0005), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.001, momentum=0.9, weight_decay=0.0005), clip_grad=dict(max_norm=35, norm_type=2) +) # learning policy param_scheduler = [ - dict(type='LinearLR', start_factor=0.1, by_epoch=False, begin=0, end=2000), - dict(type='MultiStepLR', by_epoch=True, milestones=[218, 246], gamma=0.1) + dict(type="LinearLR", start_factor=0.1, by_epoch=False, begin=0, end=2000), + dict(type="MultiStepLR", by_epoch=True, milestones=[218, 246], gamma=0.1), ] -default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=7)) +default_hooks = dict(checkpoint=dict(type="CheckpointHook", interval=7)) # NOTE: `auto_scale_lr` is for automatically scaling LR, # USER SHOULD NOT CHANGE ITS VALUES. 
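In the YOLOV3Head above, `base_sizes[i]` pairs with `strides[i]`: the largest anchors sit on the coarsest (stride-32) feature map. A quick sanity check of that pairing, using the values from the config:

# Each anchor group belongs to one feature level; the order matches `strides`.
base_sizes = [
    [(116, 90), (156, 198), (373, 326)],  # stride 32: coarse map, large objects
    [(30, 61), (62, 45), (59, 119)],      # stride 16
    [(10, 13), (16, 30), (33, 23)],       # stride 8: fine map, small objects
]
strides = [32, 16, 8]
mean_area = [sum(w * h for w, h in group) / len(group) for group in base_sizes]
assert mean_area == sorted(mean_area, reverse=True)  # bigger anchors on coarser maps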
diff --git a/mmpose/configs/mmdet/yolo/yolov3_mobilenetv2_8xb24-320-300e_coco.py b/mmpose/configs/mmdet/yolo/yolov3_mobilenetv2_8xb24-320-300e_coco.py index 07b393734329fd3ed5f4bd11fbc15b4abf7846bb..ce276c7f64b4363324486876d83f6db72ffab04e 100644 --- a/mmpose/configs/mmdet/yolo/yolov3_mobilenetv2_8xb24-320-300e_coco.py +++ b/mmpose/configs/mmdet/yolo/yolov3_mobilenetv2_8xb24-320-300e_coco.py @@ -1,41 +1,32 @@ -_base_ = ['./yolov3_mobilenetv2_8xb24-ms-416-300e_coco.py'] +_base_ = ["./yolov3_mobilenetv2_8xb24-ms-416-300e_coco.py"] # yapf:disable model = dict( bbox_head=dict( anchor_generator=dict( - base_sizes=[[(220, 125), (128, 222), (264, 266)], - [(35, 87), (102, 96), (60, 170)], - [(10, 15), (24, 36), (72, 42)]]))) + base_sizes=[[(220, 125), (128, 222), (264, 266)], [(35, 87), (102, 96), (60, 170)], [(10, 15), (24, 36), (72, 42)]] + ) + ) +) # yapf:enable input_size = (320, 320) train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), # `mean` and `to_rgb` should be the same with the `preprocess_cfg` - dict( - type='Expand', - mean=[123.675, 116.28, 103.53], - to_rgb=True, - ratio_range=(1, 2)), - dict( - type='MinIoURandomCrop', - min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=input_size, keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PhotoMetricDistortion'), - dict(type='PackDetInputs') + dict(type="Expand", mean=[123.675, 116.28, 103.53], to_rgb=True, ratio_range=(1, 2)), + dict(type="MinIoURandomCrop", min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), min_crop_size=0.3), + dict(type="Resize", scale=input_size, keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion"), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=input_size, keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=input_size, keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict(dataset=dict(dataset=dict(pipeline=train_pipeline))) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) diff --git a/mmpose/configs/mmdet/yolo/yolov3_mobilenetv2_8xb24-ms-416-300e_coco.py b/mmpose/configs/mmdet/yolo/yolov3_mobilenetv2_8xb24-ms-416-300e_coco.py index 9a161b66fe92666e904a9580ab5a1ff16d630ab7..5c8cdb443048ca9296c4f132c0cff6edfcf7391c 100644 --- a/mmpose/configs/mmdet/yolo/yolov3_mobilenetv2_8xb24-ms-416-300e_coco.py +++ b/mmpose/configs/mmdet/yolo/yolov3_mobilenetv2_8xb24-ms-416-300e_coco.py @@ -1,71 +1,42 @@ -_base_ = ['../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'] +_base_ = ["../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] # model settings data_preprocessor = dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32) + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 +) 
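# Aside (sketch, not part of the patch): this mean/std pair is the standard
# ImageNet RGB statistics scaled to the [0, 255] range, matching the
# ImageNet-pretrained MobileNetV2 backbone below:
imagenet_mean = [round(255 * m, 3) for m in (0.485, 0.456, 0.406)]
imagenet_std = [round(255 * s, 3) for s in (0.229, 0.224, 0.225)]
assert imagenet_mean == [123.675, 116.28, 103.53]
assert imagenet_std == [58.395, 57.12, 57.375]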
model = dict( - type='YOLOV3', + type="YOLOV3", data_preprocessor=data_preprocessor, backbone=dict( - type='MobileNetV2', + type="MobileNetV2", out_indices=(2, 4, 6), - act_cfg=dict(type='LeakyReLU', negative_slope=0.1), - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://mmdet/mobilenet_v2')), - neck=dict( - type='YOLOV3Neck', - num_scales=3, - in_channels=[320, 96, 32], - out_channels=[96, 96, 96]), + act_cfg=dict(type="LeakyReLU", negative_slope=0.1), + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://mmdet/mobilenet_v2"), + ), + neck=dict(type="YOLOV3Neck", num_scales=3, in_channels=[320, 96, 32], out_channels=[96, 96, 96]), bbox_head=dict( - type='YOLOV3Head', + type="YOLOV3Head", num_classes=80, in_channels=[96, 96, 96], out_channels=[96, 96, 96], anchor_generator=dict( - type='YOLOAnchorGenerator', - base_sizes=[[(116, 90), (156, 198), (373, 326)], - [(30, 61), (62, 45), (59, 119)], - [(10, 13), (16, 30), (33, 23)]], - strides=[32, 16, 8]), - bbox_coder=dict(type='YOLOBBoxCoder'), + type="YOLOAnchorGenerator", + base_sizes=[[(116, 90), (156, 198), (373, 326)], [(30, 61), (62, 45), (59, 119)], [(10, 13), (16, 30), (33, 23)]], + strides=[32, 16, 8], + ), + bbox_coder=dict(type="YOLOBBoxCoder"), featmap_strides=[32, 16, 8], - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - loss_weight=1.0, - reduction='sum'), - loss_conf=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - loss_weight=1.0, - reduction='sum'), - loss_xy=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - loss_weight=2.0, - reduction='sum'), - loss_wh=dict(type='MSELoss', loss_weight=2.0, reduction='sum')), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0, reduction="sum"), + loss_conf=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0, reduction="sum"), + loss_xy=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=2.0, reduction="sum"), + loss_wh=dict(type="MSELoss", loss_weight=2.0, reduction="sum"), + ), # training and testing settings - train_cfg=dict( - assigner=dict( - type='GridAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0)), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - conf_thr=0.005, - nms=dict(type='nms', iou_threshold=0.45), - max_per_img=100)) + train_cfg=dict(assigner=dict(type="GridAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0)), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, conf_thr=0.005, nms=dict(type="nms", iou_threshold=0.45), max_per_img=100), +) # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" # Example to use different file client # Method 1: simply set the data root and let the file I/O module @@ -83,89 +54,74 @@ data_root = 'data/coco/' backend_args = None train_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='Expand', - mean=data_preprocessor['mean'], - to_rgb=data_preprocessor['bgr_to_rgb'], - ratio_range=(1, 2)), - dict( - type='MinIoURandomCrop', - min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), - min_crop_size=0.3), - dict(type='RandomResize', scale=[(320, 320), (416, 416)], keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PhotoMetricDistortion'), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Expand", mean=data_preprocessor["mean"], 
to_rgb=data_preprocessor["bgr_to_rgb"], ratio_range=(1, 2)), + dict(type="MinIoURandomCrop", min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), min_crop_size=0.3), + dict(type="RandomResize", scale=[(320, 320), (416, 416)], keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion"), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=backend_args), - dict(type='Resize', scale=(416, 416), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=backend_args), + dict(type="Resize", scale=(416, 416), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=24, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( - type='RepeatDataset', # use RepeatDataset to speed up training + type="RepeatDataset", # use RepeatDataset to speed up training times=10, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=train_pipeline, - backend_args=backend_args))) + backend_args=backend_args, + ), + ), +) val_dataloader = dict( batch_size=24, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=test_pipeline, - backend_args=backend_args)) + backend_args=backend_args, + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox', - backend_args=backend_args) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/instances_val2017.json", metric="bbox", backend_args=backend_args) test_evaluator = val_evaluator train_cfg = dict(max_epochs=30) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='SGD', lr=0.003, momentum=0.9, weight_decay=0.0005), - clip_grad=dict(max_norm=35, norm_type=2)) + type="OptimWrapper", optimizer=dict(type="SGD", lr=0.003, momentum=0.9, weight_decay=0.0005), clip_grad=dict(max_norm=35, norm_type=2) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', - start_factor=0.0001, - by_epoch=False, - begin=0, - end=4000), - dict(type='MultiStepLR', by_epoch=True, milestones=[24, 28], gamma=0.1) + dict(type="LinearLR", start_factor=0.0001, by_epoch=False, begin=0, end=4000), + dict(type="MultiStepLR", by_epoch=True, milestones=[24, 28], gamma=0.1), ] find_unused_parameters = True diff --git a/mmpose/configs/mmdet/yolof/yolof_r50-c5_8xb8-1x_coco.py b/mmpose/configs/mmdet/yolof/yolof_r50-c5_8xb8-1x_coco.py index 
5ea228e3e3270e07a4e5b171ab544c704fb172f3..655c5a9f6522904bc4a6aedc5bbfa0fdcdb28e7b 100644 --- a/mmpose/configs/mmdet/yolof/yolof_r50-c5_8xb8-1x_coco.py +++ b/mmpose/configs/mmdet/yolof/yolof_r50-c5_8xb8-1x_coco.py @@ -1,112 +1,73 @@ -_base_ = [ - '../_base_/datasets/coco_detection.py', - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' -] +_base_ = ["../_base_/datasets/coco_detection.py", "../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py"] model = dict( - type='YOLOF', + type="YOLOF", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[103.530, 116.280, 123.675], - std=[1.0, 1.0, 1.0], - bgr_to_rgb=False, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], bgr_to_rgb=False, pad_size_divisor=32 + ), backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, - out_indices=(3, ), + out_indices=(3,), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=False), + norm_cfg=dict(type="BN", requires_grad=False), norm_eval=True, - style='caffe', - init_cfg=dict( - type='Pretrained', - checkpoint='open-mmlab://detectron/resnet50_caffe')), + style="caffe", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://detectron/resnet50_caffe"), + ), neck=dict( - type='DilatedEncoder', + type="DilatedEncoder", in_channels=2048, out_channels=512, block_mid_channels=128, num_residual_blocks=4, - block_dilations=[2, 4, 6, 8]), + block_dilations=[2, 4, 6, 8], + ), bbox_head=dict( - type='YOLOFHead', + type="YOLOFHead", num_classes=80, in_channels=512, reg_decoded_bbox=True, - anchor_generator=dict( - type='AnchorGenerator', - ratios=[1.0], - scales=[1, 2, 4, 8, 16], - strides=[32]), + anchor_generator=dict(type="AnchorGenerator", ratios=[1.0], scales=[1, 2, 4, 8, 16], strides=[32]), bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1., 1., 1., 1.], - add_ctr_clamp=True, - ctr_clamp=32), - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox=dict(type='GIoULoss', loss_weight=1.0)), + type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0], add_ctr_clamp=True, ctr_clamp=32 + ), + loss_cls=dict(type="FocalLoss", use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), + loss_bbox=dict(type="GIoULoss", loss_weight=1.0), + ), # training and testing settings train_cfg=dict( - assigner=dict( - type='UniformAssigner', pos_ignore_thr=0.15, neg_ignore_thr=0.7), - allowed_border=-1, - pos_weight=-1, - debug=False), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + assigner=dict(type="UniformAssigner", pos_ignore_thr=0.15, neg_ignore_thr=0.7), allowed_border=-1, pos_weight=-1, debug=False + ), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # optimizer optim_wrapper = dict( - optimizer=dict(type='SGD', lr=0.12, momentum=0.9, weight_decay=0.0001), - paramwise_cfg=dict( - norm_decay_mult=0., custom_keys={'backbone': dict(lr_mult=1. 
/ 3)})) + optimizer=dict(type="SGD", lr=0.12, momentum=0.9, weight_decay=0.0001), + paramwise_cfg=dict(norm_decay_mult=0.0, custom_keys={"backbone": dict(lr_mult=1.0 / 3)}), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=0.00066667, - by_epoch=False, - begin=0, - end=1500), - dict( - type='MultiStepLR', - begin=0, - end=12, - by_epoch=True, - milestones=[8, 11], - gamma=0.1) + dict(type="LinearLR", start_factor=0.00066667, by_epoch=False, begin=0, end=1500), + dict(type="MultiStepLR", begin=0, end=12, by_epoch=True, milestones=[8, 11], gamma=0.1), ] train_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='RandomShift', prob=0.5, max_shift_px=32), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="RandomShift", prob=0.5, max_shift_px=32), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - batch_size=8, num_workers=8, dataset=dict(pipeline=train_pipeline)) +train_dataloader = dict(batch_size=8, num_workers=8, dataset=dict(pipeline=train_pipeline)) val_dataloader = dict(dataset=dict(pipeline=test_pipeline)) test_dataloader = val_dataloader diff --git a/mmpose/configs/mmdet/yolof/yolof_r50-c5_8xb8-iter-1x_coco.py b/mmpose/configs/mmdet/yolof/yolof_r50-c5_8xb8-iter-1x_coco.py index 466a820099e3ac1760371e8352a89f93fbeef5ee..9213a453a6365e32907627fd717bc2cabe57c629 100644 --- a/mmpose/configs/mmdet/yolof/yolof_r50-c5_8xb8-iter-1x_coco.py +++ b/mmpose/configs/mmdet/yolof/yolof_r50-c5_8xb8-iter-1x_coco.py @@ -1,4 +1,4 @@ -_base_ = './yolof_r50-c5_8xb8-1x_coco.py' +_base_ = "./yolof_r50-c5_8xb8-1x_coco.py" # We implemented the iter-based config according to the source code. # COCO dataset has 117266 images after filtering. We use 8 gpu and @@ -8,25 +8,14 @@ _base_ = './yolof_r50-c5_8xb8-1x_coco.py' # the iter-based and epoch-based setting have about 0.2 difference on # the mAP evaluation value. 
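# Aside, checking that equivalence (assumes the 8 GPU x 8 batch setup above):
#   117266 images / 64 images per iter ~= 1832 iters per epoch, so
#   max_iters = 22500 ~= 12.3 epochs            (epoch-based 1x: 12 epochs)
#   milestones = [15000, 20000] ~= [8.2, 10.9]  (epoch-based: [8, 11])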
-train_cfg = dict( - _delete_=True, - type='IterBasedTrainLoop', - max_iters=22500, - val_interval=4500) +train_cfg = dict(_delete_=True, type="IterBasedTrainLoop", max_iters=22500, val_interval=4500) # learning rate policy param_scheduler = [ - dict( - type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500), - dict( - type='MultiStepLR', - begin=0, - end=22500, - by_epoch=False, - milestones=[15000, 20000], - gamma=0.1) + dict(type="LinearLR", start_factor=0.001, by_epoch=False, begin=0, end=500), + dict(type="MultiStepLR", begin=0, end=22500, by_epoch=False, milestones=[15000, 20000], gamma=0.1), ] -train_dataloader = dict(sampler=dict(type='InfiniteSampler')) +train_dataloader = dict(sampler=dict(type="InfiniteSampler")) default_hooks = dict(checkpoint=dict(by_epoch=False, interval=2500)) log_processor = dict(by_epoch=False) diff --git a/mmpose/configs/mmdet/yolox/yolox_l_8xb8-300e_coco.py b/mmpose/configs/mmdet/yolox/yolox_l_8xb8-300e_coco.py index 2a4b287bad595db65df69b7d6f80163bd4a49e44..4d9a46476a628466c07e621badf6fbfe18126e7c 100644 --- a/mmpose/configs/mmdet/yolox/yolox_l_8xb8-300e_coco.py +++ b/mmpose/configs/mmdet/yolox/yolox_l_8xb8-300e_coco.py @@ -1,8 +1,8 @@ -_base_ = './yolox_s_8xb8-300e_coco.py' +_base_ = "./yolox_s_8xb8-300e_coco.py" # model settings model = dict( backbone=dict(deepen_factor=1.0, widen_factor=1.0), - neck=dict( - in_channels=[256, 512, 1024], out_channels=256, num_csp_blocks=3), - bbox_head=dict(in_channels=256, feat_channels=256)) + neck=dict(in_channels=[256, 512, 1024], out_channels=256, num_csp_blocks=3), + bbox_head=dict(in_channels=256, feat_channels=256), +) diff --git a/mmpose/configs/mmdet/yolox/yolox_m_8xb8-300e_coco.py b/mmpose/configs/mmdet/yolox/yolox_m_8xb8-300e_coco.py index d82f9e98f1fcd4a1c6089807adc3cca2b48d6b5e..b4df2981ba2f1b4a0d3baddbb1f60b65db15438d 100644 --- a/mmpose/configs/mmdet/yolox/yolox_m_8xb8-300e_coco.py +++ b/mmpose/configs/mmdet/yolox/yolox_m_8xb8-300e_coco.py @@ -1,4 +1,4 @@ -_base_ = './yolox_s_8xb8-300e_coco.py' +_base_ = "./yolox_s_8xb8-300e_coco.py" # model settings model = dict( diff --git a/mmpose/configs/mmdet/yolox/yolox_nano_8xb8-300e_coco.py b/mmpose/configs/mmdet/yolox/yolox_nano_8xb8-300e_coco.py index 3f7a1c5ab066439c78ffa005a2a60c9057223849..a9813b701a709daa3d6d702694c6589498b77658 100644 --- a/mmpose/configs/mmdet/yolox/yolox_nano_8xb8-300e_coco.py +++ b/mmpose/configs/mmdet/yolox/yolox_nano_8xb8-300e_coco.py @@ -1,11 +1,8 @@ -_base_ = './yolox_tiny_8xb8-300e_coco.py' +_base_ = "./yolox_tiny_8xb8-300e_coco.py" # model settings model = dict( backbone=dict(deepen_factor=0.33, widen_factor=0.25, use_depthwise=True), - neck=dict( - in_channels=[64, 128, 256], - out_channels=64, - num_csp_blocks=1, - use_depthwise=True), - bbox_head=dict(in_channels=64, feat_channels=64, use_depthwise=True)) + neck=dict(in_channels=[64, 128, 256], out_channels=64, num_csp_blocks=1, use_depthwise=True), + bbox_head=dict(in_channels=64, feat_channels=64, use_depthwise=True), +) diff --git a/mmpose/configs/mmdet/yolox/yolox_s_8xb8-300e_coco.py b/mmpose/configs/mmdet/yolox/yolox_s_8xb8-300e_coco.py index 3e324eb5b99202fd42c8d67847a1be1c165b4057..a66298fb1f33022bac7f73da0b7c0de0cde0e1e5 100644 --- a/mmpose/configs/mmdet/yolox/yolox_s_8xb8-300e_coco.py +++ b/mmpose/configs/mmdet/yolox/yolox_s_8xb8-300e_coco.py @@ -1,77 +1,59 @@ -_base_ = [ - '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py', - './yolox_tta.py' -] +_base_ = ["../_base_/schedules/schedule_1x.py", "../_base_/default_runtime.py", 
"./yolox_tta.py"] img_scale = (640, 640) # width, height # model settings model = dict( - type='YOLOX', + type="YOLOX", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", pad_size_divisor=32, - batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=10) - ]), + batch_augments=[dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=10)], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=0.33, widen_factor=0.5, out_indices=(2, 3, 4), use_depthwise=False, spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), ), neck=dict( - type='YOLOXPAFPN', + type="YOLOXPAFPN", in_channels=[128, 256, 512], out_channels=128, num_csp_blocks=1, use_depthwise=False, - upsample_cfg=dict(scale_factor=2, mode='nearest'), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), + upsample_cfg=dict(scale_factor=2, mode="nearest"), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), bbox_head=dict( - type='YOLOXHead', + type="YOLOXHead", num_classes=80, in_channels=128, feat_channels=128, stacked_convs=2, strides=(8, 16, 32), use_depthwise=False, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - reduction='sum', - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_obj=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - reduction='sum', - loss_weight=1.0), - loss_l1=dict(type='L1Loss', reduction='sum', loss_weight=1.0)), - train_cfg=dict(assigner=dict(type='SimOTAAssigner', center_radius=2.5)), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, reduction="sum", loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_obj=dict(type="CrossEntropyLoss", use_sigmoid=True, reduction="sum", loss_weight=1.0), + loss_l1=dict(type="L1Loss", reduction="sum", loss_weight=1.0), + ), + train_cfg=dict(assigner=dict(type="SimOTAAssigner", center_radius=2.5)), # In order to align the source code, the threshold of the val phase is # 0.01, and the threshold of the test phase is 0.001. 
-    test_cfg=dict(score_thr=0.01, nms=dict(type='nms', iou_threshold=0.65)))
+    test_cfg=dict(score_thr=0.01, nms=dict(type="nms", iou_threshold=0.65)),
+)
 
 # dataset settings
-data_root = 'data/coco/'
-dataset_type = 'CocoDataset'
+data_root = "data/coco/"
+dataset_type = "CocoDataset"
 
 # Example to use different file client
 # Method 1: simply set the data root and let the file I/O module
@@ -89,92 +71,78 @@ dataset_type = 'CocoDataset'
 backend_args = None
 
 train_pipeline = [
-    dict(type='Mosaic', img_scale=img_scale, pad_val=114.0),
+    dict(type="Mosaic", img_scale=img_scale, pad_val=114.0),
     dict(
-        type='RandomAffine',
+        type="RandomAffine",
         scaling_ratio_range=(0.1, 2),
         # img_scale is (width, height)
-        border=(-img_scale[0] // 2, -img_scale[1] // 2)),
-    dict(
-        type='MixUp',
-        img_scale=img_scale,
-        ratio_range=(0.8, 1.6),
-        pad_val=114.0),
-    dict(type='YOLOXHSVRandomAug'),
-    dict(type='RandomFlip', prob=0.5),
+        border=(-img_scale[0] // 2, -img_scale[1] // 2),
+    ),
+    dict(type="MixUp", img_scale=img_scale, ratio_range=(0.8, 1.6), pad_val=114.0),
+    dict(type="YOLOXHSVRandomAug"),
+    dict(type="RandomFlip", prob=0.5),
     # According to the official implementation, multi-scale
     # training is not considered here but in the
     # 'mmdet/models/detectors/yolox.py'.
     # Resize and Pad are for the last 15 epochs when Mosaic,
     # RandomAffine, and MixUp are closed by YOLOXModeSwitchHook.
-    dict(type='Resize', scale=img_scale, keep_ratio=True),
+    dict(type="Resize", scale=img_scale, keep_ratio=True),
     dict(
-        type='Pad',
+        type="Pad",
         pad_to_square=True,
         # If the image is three-channel, the pad value needs
         # to be set separately for each channel.
-        pad_val=dict(img=(114.0, 114.0, 114.0))),
-    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
-    dict(type='PackDetInputs')
+        pad_val=dict(img=(114.0, 114.0, 114.0)),
+    ),
+    dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1), keep_empty=False),
+    dict(type="PackDetInputs"),
 ]
 
 train_dataset = dict(
     # use MultiImageMixDataset wrapper to support mosaic and mixup
-    type='MultiImageMixDataset',
+    type="MultiImageMixDataset",
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
-        ann_file='annotations/instances_train2017.json',
-        data_prefix=dict(img='train2017/'),
-        pipeline=[
-            dict(type='LoadImageFromFile', backend_args=backend_args),
-            dict(type='LoadAnnotations', with_bbox=True)
-        ],
+        ann_file="annotations/instances_train2017.json",
+        data_prefix=dict(img="train2017/"),
+        pipeline=[dict(type="LoadImageFromFile", backend_args=backend_args), dict(type="LoadAnnotations", with_bbox=True)],
         filter_cfg=dict(filter_empty_gt=False, min_size=32),
-        backend_args=backend_args),
-    pipeline=train_pipeline)
+        backend_args=backend_args,
+    ),
+    pipeline=train_pipeline,
+)
 
 test_pipeline = [
-    dict(type='LoadImageFromFile', backend_args=backend_args),
-    dict(type='Resize', scale=img_scale, keep_ratio=True),
-    dict(
-        type='Pad',
-        pad_to_square=True,
-        pad_val=dict(img=(114.0, 114.0, 114.0))),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='PackDetInputs',
-        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
-                   'scale_factor'))
+    dict(type="LoadImageFromFile", backend_args=backend_args),
+    dict(type="Resize", scale=img_scale, keep_ratio=True),
+    dict(type="Pad", pad_to_square=True, pad_val=dict(img=(114.0, 114.0, 114.0))),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")),
 ]
 
 train_dataloader = dict(
-    batch_size=8,
-    num_workers=4,
-    persistent_workers=True,
-    sampler=dict(type='DefaultSampler', shuffle=True),
-    dataset=train_dataset)
+    batch_size=8, num_workers=4, persistent_workers=True, sampler=dict(type="DefaultSampler", shuffle=True), dataset=train_dataset
+)
 val_dataloader = dict(
     batch_size=8,
     num_workers=4,
     persistent_workers=True,
     drop_last=False,
-    sampler=dict(type='DefaultSampler', shuffle=False),
+    sampler=dict(type="DefaultSampler", shuffle=False),
     dataset=dict(
         type=dataset_type,
         data_root=data_root,
-        ann_file='annotations/instances_val2017.json',
-        data_prefix=dict(img='val2017/'),
+        ann_file="annotations/instances_val2017.json",
+        data_prefix=dict(img="val2017/"),
         test_mode=True,
         pipeline=test_pipeline,
-        backend_args=backend_args))
+        backend_args=backend_args,
+    ),
+)
 test_dataloader = val_dataloader
 
-val_evaluator = dict(
-    type='CocoMetric',
-    ann_file=data_root + 'annotations/instances_val2017.json',
-    metric='bbox',
-    backend_args=backend_args)
+val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/instances_val2017.json", metric="bbox", backend_args=backend_args)
 test_evaluator = val_evaluator
 
 # training settings
@@ -188,11 +156,10 @@ train_cfg = dict(max_epochs=max_epochs, val_interval=interval)
 # default 8 gpu
 base_lr = 0.01
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(
-        type='SGD', lr=base_lr, momentum=0.9, weight_decay=5e-4,
-        nesterov=True),
-    paramwise_cfg=dict(norm_decay_mult=0., bias_decay_mult=0.))
+    type="OptimWrapper",
+    optimizer=dict(type="SGD", lr=base_lr, momentum=0.9, weight_decay=5e-4, nesterov=True),
+    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0),
+)
 
 # learning rate
 param_scheduler = [
@@ -200,48 +167,38 @@ param_scheduler = [
     dict(
         # use quadratic formula to warm up 5 epochs
         # and lr is updated by iteration
         # TODO: fix default scope in get function
-        type='mmdet.QuadraticWarmupLR',
+        type="mmdet.QuadraticWarmupLR",
         by_epoch=True,
         begin=0,
         end=5,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
     dict(
         # use cosine lr from 5 to 285 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=5,
         T_max=max_epochs - num_last_epochs,
         end=max_epochs - num_last_epochs,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
     dict(
         # use fixed lr during last 15 epochs
-        type='ConstantLR',
+        type="ConstantLR",
         by_epoch=True,
         factor=1,
         begin=max_epochs - num_last_epochs,
         end=max_epochs,
-    )
+    ),
 ]
 
-default_hooks = dict(
-    checkpoint=dict(
-        interval=interval,
-        max_keep_ckpts=3  # only keep latest 3 checkpoints
-    ))
+default_hooks = dict(checkpoint=dict(interval=interval, max_keep_ckpts=3))  # only keep latest 3 checkpoints
 
 custom_hooks = [
-    dict(
-        type='YOLOXModeSwitchHook',
-        num_last_epochs=num_last_epochs,
-        priority=48),
-    dict(type='SyncNormHook', priority=48),
-    dict(
-        type='EMAHook',
-        ema_type='ExpMomentumEMA',
-        momentum=0.0001,
-        update_buffers=True,
-        priority=49)
+    dict(type="YOLOXModeSwitchHook", num_last_epochs=num_last_epochs, priority=48),
+    dict(type="SyncNormHook", priority=48),
+    dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0001, update_buffers=True, priority=49),
 ]
 
 # NOTE: `auto_scale_lr` is for automatically scaling LR,
diff --git a/mmpose/configs/mmdet/yolox/yolox_tiny_8xb8-300e_coco.py b/mmpose/configs/mmdet/yolox/yolox_tiny_8xb8-300e_coco.py
index 86f7e9a6191066ab9b672d548b93a29e64746f29..007a064f741ebd0b3f52f4e8691641a596bf3f9b 100644
--- a/mmpose/configs/mmdet/yolox/yolox_tiny_8xb8-300e_coco.py
+++ b/mmpose/configs/mmdet/yolox/yolox_tiny_8xb8-300e_coco.py
@@ -1,52 +1,39 @@
-_base_ = './yolox_s_8xb8-300e_coco.py'
+_base_ = "./yolox_s_8xb8-300e_coco.py"
 
 # model settings
 model = dict(
-    data_preprocessor=dict(batch_augments=[
-        dict(
-            type='BatchSyncRandomResize',
-            random_size_range=(320, 640),
-            size_divisor=32,
-            interval=10)
-    ]),
+    data_preprocessor=dict(batch_augments=[dict(type="BatchSyncRandomResize", random_size_range=(320, 640), size_divisor=32, interval=10)]),
     backbone=dict(deepen_factor=0.33, widen_factor=0.375),
     neck=dict(in_channels=[96, 192, 384], out_channels=96),
-    bbox_head=dict(in_channels=96, feat_channels=96))
+    bbox_head=dict(in_channels=96, feat_channels=96),
+)
 
 img_scale = (640, 640)  # width, height
 
 train_pipeline = [
-    dict(type='Mosaic', img_scale=img_scale, pad_val=114.0),
+    dict(type="Mosaic", img_scale=img_scale, pad_val=114.0),
     dict(
-        type='RandomAffine',
+        type="RandomAffine",
         scaling_ratio_range=(0.5, 1.5),
         # img_scale is (width, height)
-        border=(-img_scale[0] // 2, -img_scale[1] // 2)),
-    dict(type='YOLOXHSVRandomAug'),
-    dict(type='RandomFlip', prob=0.5),
+        border=(-img_scale[0] // 2, -img_scale[1] // 2),
+    ),
+    dict(type="YOLOXHSVRandomAug"),
+    dict(type="RandomFlip", prob=0.5),
     # Resize and Pad are for the last 15 epochs when Mosaic and
     # RandomAffine are closed by YOLOXModeSwitchHook.
-    dict(type='Resize', scale=img_scale, keep_ratio=True),
-    dict(
-        type='Pad',
-        pad_to_square=True,
-        pad_val=dict(img=(114.0, 114.0, 114.0))),
-    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
-    dict(type='PackDetInputs')
+    dict(type="Resize", scale=img_scale, keep_ratio=True),
+    dict(type="Pad", pad_to_square=True, pad_val=dict(img=(114.0, 114.0, 114.0))),
+    dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1), keep_empty=False),
+    dict(type="PackDetInputs"),
 ]
 
 test_pipeline = [
-    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
-    dict(type='Resize', scale=(416, 416), keep_ratio=True),
-    dict(
-        type='Pad',
-        pad_to_square=True,
-        pad_val=dict(img=(114.0, 114.0, 114.0))),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='PackDetInputs',
-        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
-                   'scale_factor'))
+    dict(type="LoadImageFromFile", backend_args={{_base_.backend_args}}),
+    dict(type="Resize", scale=(416, 416), keep_ratio=True),
+    dict(type="Pad", pad_to_square=True, pad_val=dict(img=(114.0, 114.0, 114.0))),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")),
 ]
 
 train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
diff --git a/mmpose/configs/mmdet/yolox/yolox_tta.py b/mmpose/configs/mmdet/yolox/yolox_tta.py
index e65244be6e1bb70393d111ef4d25334d3b2ce8a6..1a7bec18672fe0273c93bc88b220b0eda22a2ab6 100644
--- a/mmpose/configs/mmdet/yolox/yolox_tta.py
+++ b/mmpose/configs/mmdet/yolox/yolox_tta.py
@@ -1,36 +1,29 @@
-tta_model = dict(
-    type='DetTTAModel',
-    tta_cfg=dict(nms=dict(type='nms', iou_threshold=0.65), max_per_img=100))
+tta_model = dict(type="DetTTAModel", tta_cfg=dict(nms=dict(type="nms", iou_threshold=0.65), max_per_img=100))
 
 img_scales = [(640, 640), (320, 320), (960, 960)]
 tta_pipeline = [
-    dict(type='LoadImageFromFile', backend_args=None),
+    dict(type="LoadImageFromFile", backend_args=None),
     dict(
-        type='TestTimeAug',
+        type="TestTimeAug",
         transforms=[
-            [
-                dict(type='Resize', scale=s, keep_ratio=True)
-                for s in img_scales
-            ],
+            [dict(type="Resize", scale=s, keep_ratio=True) for s in img_scales],
             [
                 # ``RandomFlip`` must be placed before ``Pad``, otherwise
                 # bounding box coordinates after flipping cannot be
                 # recovered correctly.
-                dict(type='RandomFlip', prob=1.),
-                dict(type='RandomFlip', prob=0.)
+                dict(type="RandomFlip", prob=1.0),
+                dict(type="RandomFlip", prob=0.0),
             ],
             [
-                dict(
-                    type='Pad',
-                    pad_to_square=True,
-                    pad_val=dict(img=(114.0, 114.0, 114.0))),
+                dict(type="Pad", pad_to_square=True, pad_val=dict(img=(114.0, 114.0, 114.0))),
             ],
-            [dict(type='LoadAnnotations', with_bbox=True)],
+            [dict(type="LoadAnnotations", with_bbox=True)],
             [
                 dict(
-                    type='PackDetInputs',
-                    meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
-                               'scale_factor', 'flip', 'flip_direction'))
-            ]
-        ])
+                    type="PackDetInputs",
+                    meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor", "flip", "flip_direction"),
+                )
+            ],
+        ],
+    ),
 ]
diff --git a/mmpose/configs/mmdet/yolox/yolox_x_8xb8-300e_coco.py b/mmpose/configs/mmdet/yolox/yolox_x_8xb8-300e_coco.py
index 34828e0363a2f282af59da74e805e59772dfeb69..3ba5eacd358f8a6c08e6026029e580d3c29f99c4 100644
--- a/mmpose/configs/mmdet/yolox/yolox_x_8xb8-300e_coco.py
+++ b/mmpose/configs/mmdet/yolox/yolox_x_8xb8-300e_coco.py
@@ -1,8 +1,8 @@
-_base_ = './yolox_s_8xb8-300e_coco.py'
+_base_ = "./yolox_s_8xb8-300e_coco.py"
 
 # model settings
 model = dict(
     backbone=dict(deepen_factor=1.33, widen_factor=1.25),
-    neck=dict(
-        in_channels=[320, 640, 1280], out_channels=320, num_csp_blocks=4),
-    bbox_head=dict(in_channels=320, feat_channels=320))
+    neck=dict(in_channels=[320, 640, 1280], out_channels=320, num_csp_blocks=4),
+    bbox_head=dict(in_channels=320, feat_channels=320),
+)
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s1_dis/dwpose_l_dis_m_coco-256x192.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s1_dis/dwpose_l_dis_m_coco-256x192.py
index 422871acbb08f9ecbb67144c3a76166151b37387..23740eb3685cedd040acd30cf7efb0eacf1ee17e 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s1_dis/dwpose_l_dis_m_coco-256x192.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s1_dis/dwpose_l_dis_m_coco-256x192.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/coco-wholebody/rtmpose-m_8xb64-270e_coco-wholebody-256x192.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/coco-wholebody/rtmpose-m_8xb64-270e_coco-wholebody-256x192.py"]  # noqa: E501
 
 # model settings
 find_unused_parameters = False
@@ -12,37 +10,35 @@ logit = True
 # method details
 model = dict(
     _delete_=True,
-    type='DWPoseDistiller',
-    teacher_pretrained='https://download.openmmlab.com/mmpose/v1/projects/'
-    'rtmpose/rtmpose-l_simcc-coco-wholebody_pt-aic-coco_270e-256x192-6f206314_20230124.pth',  # noqa: E501
-    teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/'
-    'rtmpose-l_8xb64-270e_coco-wholebody-256x192.py',  # noqa: E501
-    student_cfg='configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/'
-    'rtmpose-m_8xb64-270e_coco-wholebody-256x192.py',  # noqa: E501
+    type="DWPoseDistiller",
+    teacher_pretrained="https://download.openmmlab.com/mmpose/v1/projects/"
+    "rtmpose/rtmpose-l_simcc-coco-wholebody_pt-aic-coco_270e-256x192-6f206314_20230124.pth",  # noqa: E501
+    teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/" "rtmpose-l_8xb64-270e_coco-wholebody-256x192.py",  # noqa: E501
+    student_cfg="configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/" "rtmpose-m_8xb64-270e_coco-wholebody-256x192.py",  # noqa: E501
     distill_cfg=[
-        dict(methods=[
-            dict(
-                type='FeaLoss',
-                name='loss_fea',
-                use_this=fea,
-                student_channels=768,
-                teacher_channels=1024,
-                alpha_fea=0.00007,
-            )
-        ]),
-        dict(methods=[
-            dict(
-                type='KDLoss',
-                name='loss_logit',
-                use_this=logit,
-                weight=0.1,
-            )
-        ]),
+        dict(
+            methods=[
+                dict(
+                    type="FeaLoss",
+                    name="loss_fea",
+                    use_this=fea,
+                    student_channels=768,
+                    teacher_channels=1024,
+                    alpha_fea=0.00007,
+                )
+            ]
+        ),
+        dict(
+            methods=[
+                dict(
+                    type="KDLoss",
+                    name="loss_logit",
+                    use_this=logit,
+                    weight=0.1,
+                )
+            ]
+        ),
     ],
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
 )
 
-optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2))
+optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2))
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s1_dis/dwpose_x_dis_l_coco-384x288.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s1_dis/dwpose_x_dis_l_coco-384x288.py
index 150cb2bbe62ba7117b79ecbd3cceec3f6a8f64bf..8b09cd459a1148cb830b6cb9654a2f1eaa23d57e 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s1_dis/dwpose_x_dis_l_coco-384x288.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s1_dis/dwpose_x_dis_l_coco-384x288.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/coco-wholebody/rtmpose-l_8xb32-270e_coco-wholebody-384x288.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/coco-wholebody/rtmpose-l_8xb32-270e_coco-wholebody-384x288.py"]  # noqa: E501
 
 # model settings
 find_unused_parameters = False
@@ -12,37 +10,35 @@ logit = True
 # method details
 model = dict(
     _delete_=True,
-    type='DWPoseDistiller',
-    teacher_pretrained='https://download.openmmlab.com/mmpose/v1/projects/'
-    'rtmposev1/rtmpose-x_simcc-coco-wholebody_pt-body7_270e-384x288-401dfc90_20230629.pth',  # noqa: E501
-    teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/'
-    'rtmpose-x_8xb32-270e_coco-wholebody-384x288.py',  # noqa: E501
-    student_cfg='configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/'
-    'rtmpose-l_8xb32-270e_coco-wholebody-384x288.py',  # noqa: E501
+    type="DWPoseDistiller",
+    teacher_pretrained="https://download.openmmlab.com/mmpose/v1/projects/"
+    "rtmposev1/rtmpose-x_simcc-coco-wholebody_pt-body7_270e-384x288-401dfc90_20230629.pth",  # noqa: E501
+    teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/" "rtmpose-x_8xb32-270e_coco-wholebody-384x288.py",  # noqa: E501
+    student_cfg="configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/" "rtmpose-l_8xb32-270e_coco-wholebody-384x288.py",  # noqa: E501
     distill_cfg=[
-        dict(methods=[
-            dict(
-                type='FeaLoss',
-                name='loss_fea',
-                use_this=fea,
-                student_channels=1024,
-                teacher_channels=1280,
-                alpha_fea=0.00007,
-            )
-        ]),
-        dict(methods=[
-            dict(
-                type='KDLoss',
-                name='loss_logit',
-                use_this=logit,
-                weight=0.1,
-            )
-        ]),
+        dict(
+            methods=[
+                dict(
+                    type="FeaLoss",
+                    name="loss_fea",
+                    use_this=fea,
+                    student_channels=1024,
+                    teacher_channels=1280,
+                    alpha_fea=0.00007,
+                )
+            ]
+        ),
+        dict(
+            methods=[
+                dict(
+                    type="KDLoss",
+                    name="loss_logit",
+                    use_this=logit,
+                    weight=0.1,
+                )
+            ]
+        ),
     ],
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
 )
 
-optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2))
+optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2))
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s2_dis/dwpose_l-ll_coco-384x288.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s2_dis/dwpose_l-ll_coco-384x288.py
index 6c63f99b0cee942b7667ac193d1b46e8c8b00196..9ee24c7beb5aac6967a1ec501704b08cdbf7c246 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s2_dis/dwpose_l-ll_coco-384x288.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s2_dis/dwpose_l-ll_coco-384x288.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/coco-wholebody/rtmpose-l_8xb32-270e_coco-wholebody-384x288.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/coco-wholebody/rtmpose-l_8xb32-270e_coco-wholebody-384x288.py"]  # noqa: E501
 
 # model settings
 find_unused_parameters = True
@@ -16,30 +14,25 @@ train_cfg = dict(max_epochs=60, val_interval=10)
 # method details
 model = dict(
     _delete_=True,
-    type='DWPoseDistiller',
+    type="DWPoseDistiller",
     two_dis=second_dis,
-    teacher_pretrained='work_dirs/'
-    'dwpose_x_dis_l_coco-384x288/dw-x-l_coco_384.pth',  # noqa: E501
-    teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/'
-    'rtmpose-l_8xb32-270e_coco-wholebody-384x288.py',  # noqa: E501
-    student_cfg='configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/'
-    'rtmpose-l_8xb32-270e_coco-wholebody-384x288.py',  # noqa: E501
+    teacher_pretrained="work_dirs/" "dwpose_x_dis_l_coco-384x288/dw-x-l_coco_384.pth",  # noqa: E501
+    teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/" "rtmpose-l_8xb32-270e_coco-wholebody-384x288.py",  # noqa: E501
+    student_cfg="configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/" "rtmpose-l_8xb32-270e_coco-wholebody-384x288.py",  # noqa: E501
     distill_cfg=[
-        dict(methods=[
-            dict(
-                type='KDLoss',
-                name='loss_logit',
-                use_this=logit,
-                weight=1,
-            )
-        ]),
+        dict(
+            methods=[
+                dict(
+                    type="KDLoss",
+                    name="loss_logit",
+                    use_this=logit,
+                    weight=1,
+                )
+            ]
+        ),
     ],
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     train_cfg=train_cfg,
 )
 
-optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2))
+optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2))
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s2_dis/dwpose_m-mm_coco-256x192.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s2_dis/dwpose_m-mm_coco-256x192.py
index 943ec60184aa6b2b264eabc219385c93080a04bc..03b0ba7e369acb6e3e7d08a12b794450e22a0262 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s2_dis/dwpose_m-mm_coco-256x192.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/coco-wholebody/s2_dis/dwpose_m-mm_coco-256x192.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/coco-wholebody/rtmpose-m_8xb64-270e_coco-wholebody-256x192.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/coco-wholebody/rtmpose-m_8xb64-270e_coco-wholebody-256x192.py"]  # noqa: E501
 
 # model settings
 find_unused_parameters = True
@@ -16,30 +14,25 @@ train_cfg = dict(max_epochs=60, val_interval=10)
 # method details
 model = dict(
     _delete_=True,
-    type='DWPoseDistiller',
+    type="DWPoseDistiller",
     two_dis=second_dis,
-    teacher_pretrained='work_dirs/'
-    'dwpose_l_dis_m_coco-256x192/dw-l-m_coco_256.pth',  # noqa: E501
-    teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/'
-    'rtmpose-m_8xb64-270e_coco-wholebody-256x192.py',  # noqa: E501
-    student_cfg='configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/'
-    'rtmpose-m_8xb64-270e_coco-wholebody-256x192.py',  # noqa: E501
+    teacher_pretrained="work_dirs/" "dwpose_l_dis_m_coco-256x192/dw-l-m_coco_256.pth",  # noqa: E501
+    teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/" "rtmpose-m_8xb64-270e_coco-wholebody-256x192.py",  # noqa: E501
+    student_cfg="configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/" "rtmpose-m_8xb64-270e_coco-wholebody-256x192.py",  # noqa: E501
     distill_cfg=[
-        dict(methods=[
-            dict(
-                type='KDLoss',
-                name='loss_logit',
-                use_this=logit,
-                weight=1,
-            )
-        ]),
+        dict(
+            methods=[
+                dict(
+                    type="KDLoss",
+                    name="loss_logit",
+                    use_this=logit,
+                    weight=1,
+                )
+            ]
+        ),
     ],
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     train_cfg=train_cfg,
 )
 
-optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2))
+optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2))
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_m_coco-ubody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_m_coco-ubody-256x192.py
index b3a917b96e855b869844b45f3cc02910ce6b1d52..56cf112e97aefba028afeb8daab20e9d0fdbe864 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_m_coco-ubody-256x192.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_m_coco-ubody-256x192.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/ubody/rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/ubody/rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py"]  # noqa: E501
 
 # model settings
 find_unused_parameters = False
@@ -12,37 +10,35 @@ logit = True
 # method details
 model = dict(
     _delete_=True,
-    type='DWPoseDistiller',
-    teacher_pretrained='https://download.openmmlab.com/mmpose/v1/projects/'
-    'rtmposev1/rtmpose-l_ucoco_256x192-95bb32f5_20230822.pth',  # noqa: E501
-    teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py',  # noqa: E501
-    student_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py',  # noqa: E501
+    type="DWPoseDistiller",
+    teacher_pretrained="https://download.openmmlab.com/mmpose/v1/projects/"
+    "rtmposev1/rtmpose-l_ucoco_256x192-95bb32f5_20230822.pth",  # noqa: E501
+    teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py",  # noqa: E501
+    student_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py",  # noqa: E501
     distill_cfg=[
-        dict(methods=[
-            dict(
-                type='FeaLoss',
-                name='loss_fea',
-                use_this=fea,
-                student_channels=768,
-                teacher_channels=1024,
-                alpha_fea=0.00007,
-            )
-        ]),
-        dict(methods=[
-            dict(
-                type='KDLoss',
-                name='loss_logit',
-                use_this=logit,
-                weight=0.1,
-            )
-        ]),
+        dict(
+            methods=[
+                dict(
+                    type="FeaLoss",
+                    name="loss_fea",
+                    use_this=fea,
+                    student_channels=768,
+                    teacher_channels=1024,
+                    alpha_fea=0.00007,
+                )
+            ]
+        ),
+        dict(
+            methods=[
+                dict(
+                    type="KDLoss",
+                    name="loss_logit",
+                    use_this=logit,
+                    weight=0.1,
+                )
+            ]
+        ),
     ],
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
 )
 
-optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2))
+optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2))
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_s_coco-ubody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_s_coco-ubody-256x192.py
index c90a0ea6a7693928565840b250063663e54cf3bb..890a2024edb7869358dc5d3fd4b42766c71f85c7 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_s_coco-ubody-256x192.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_s_coco-ubody-256x192.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/ubody/rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/ubody/rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py"]  # noqa: E501
 
 # model settings
 find_unused_parameters = False
@@ -12,37 +10,35 @@ logit = True
 # method details
 model = dict(
     _delete_=True,
-    type='DWPoseDistiller',
-    teacher_pretrained='https://download.openmmlab.com/mmpose/v1/projects/'
-    'rtmposev1/rtmpose-l_ucoco_256x192-95bb32f5_20230822.pth',  # noqa: E501
-    teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py',  # noqa: E501
-    student_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py',  # noqa: E501
+    type="DWPoseDistiller",
+    teacher_pretrained="https://download.openmmlab.com/mmpose/v1/projects/"
+    "rtmposev1/rtmpose-l_ucoco_256x192-95bb32f5_20230822.pth",  # noqa: E501
+    teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py",  # noqa: E501
+    student_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py",  # noqa: E501
     distill_cfg=[
-        dict(methods=[
-            dict(
-                type='FeaLoss',
-                name='loss_fea',
-                use_this=fea,
-                student_channels=512,
-                teacher_channels=1024,
-                alpha_fea=0.00007,
-            )
-        ]),
-        dict(methods=[
-            dict(
-                type='KDLoss',
-                name='loss_logit',
-                use_this=logit,
-                weight=0.1,
-            )
-        ]),
+        dict(
+            methods=[
+                dict(
+                    type="FeaLoss",
+                    name="loss_fea",
+                    use_this=fea,
+                    student_channels=512,
+                    teacher_channels=1024,
+                    alpha_fea=0.00007,
+                )
+            ]
+        ),
+        dict(
+            methods=[
+                dict(
+                    type="KDLoss",
+                    name="loss_logit",
+                    use_this=logit,
+                    weight=0.1,
+                )
+            ]
+        ),
     ],
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
 )
 
-optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2))
+optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2))
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_t_coco-ubody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_t_coco-ubody-256x192.py
index 01618f146a0a150a7fea67e4c0313087ae688312..c8d3b5ab3ada30649f356a6ff15e04572e9028e5 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_t_coco-ubody-256x192.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_l_dis_t_coco-ubody-256x192.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/ubody/rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/ubody/rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py"]  # noqa: E501
 
 # model settings
 find_unused_parameters = False
@@ -12,37 +10,35 @@ logit = True
 # method details
 model = dict(
     _delete_=True,
-    type='DWPoseDistiller',
-    teacher_pretrained='https://download.openmmlab.com/mmpose/v1/projects/'
-    'rtmposev1/rtmpose-l_ucoco_256x192-95bb32f5_20230822.pth',  # noqa: E501
-    teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py',  # noqa: E501
-    student_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py',  # noqa: E501
+    type="DWPoseDistiller",
+    teacher_pretrained="https://download.openmmlab.com/mmpose/v1/projects/"
+    "rtmposev1/rtmpose-l_ucoco_256x192-95bb32f5_20230822.pth",  # noqa: E501
+    teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py",  # noqa: E501
+    student_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py",  # noqa: E501
     distill_cfg=[
-        dict(methods=[
-            dict(
-                type='FeaLoss',
-                name='loss_fea',
-                use_this=fea,
-                student_channels=384,
-                teacher_channels=1024,
-                alpha_fea=0.00007,
-            )
-        ]),
-        dict(methods=[
-            dict(
-                type='KDLoss',
-                name='loss_logit',
-                use_this=logit,
-                weight=0.1,
-            )
-        ]),
+        dict(
+            methods=[
+                dict(
+                    type="FeaLoss",
+                    name="loss_fea",
+                    use_this=fea,
+                    student_channels=384,
+                    teacher_channels=1024,
+                    alpha_fea=0.00007,
+                )
+            ]
+        ),
+        dict(
+            methods=[
+                dict(
+                    type="KDLoss",
+                    name="loss_logit",
+                    use_this=logit,
+                    weight=0.1,
+                )
+            ]
+        ),
     ],
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
 )
 
-optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2))
+optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2))
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_x_dis_l_coco-ubody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_x_dis_l_coco-ubody-256x192.py
index 85a287324b647c6b19dde1486093b68940df72ff..af064a9c3d37804af5a6e454b6c9c69e9602540d 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_x_dis_l_coco-ubody-256x192.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/dwpose_x_dis_l_coco-ubody-256x192.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/ubody/rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/ubody/rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py"]  # noqa: E501
 
 # model settings
 find_unused_parameters = False
@@ -12,37 +10,35 @@ logit = True
 # method details
 model = dict(
     _delete_=True,
-    type='DWPoseDistiller',
-    teacher_pretrained='https://download.openmmlab.com/mmpose/v1/projects/'
-    'rtmposev1/rtmpose-x_ucoco_256x192-05f5bcb7_20230822.pth',  # noqa: E501
-    teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-x_8xb64-270e_coco-ubody-wholebody-256x192.py',  # noqa: E501
-    student_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py',  # noqa: E501
+    type="DWPoseDistiller",
+    teacher_pretrained="https://download.openmmlab.com/mmpose/v1/projects/"
+    "rtmposev1/rtmpose-x_ucoco_256x192-05f5bcb7_20230822.pth",  # noqa: E501
+    teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-x_8xb64-270e_coco-ubody-wholebody-256x192.py",  # noqa: E501
+    student_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py",  # noqa: E501
     distill_cfg=[
-        dict(methods=[
-            dict(
-                type='FeaLoss',
-                name='loss_fea',
-                use_this=fea,
-                student_channels=1024,
-                teacher_channels=1280,
-                alpha_fea=0.00007,
-            )
-        ]),
-        dict(methods=[
-            dict(
-                type='KDLoss',
-                name='loss_logit',
-                use_this=logit,
-                weight=0.1,
-            )
-        ]),
+        dict(
+            methods=[
+                dict(
+                    type="FeaLoss",
+                    name="loss_fea",
+                    use_this=fea,
+                    student_channels=1024,
+                    teacher_channels=1280,
+                    alpha_fea=0.00007,
+                )
+            ]
+        ),
+        dict(
+            methods=[
+                dict(
+                    type="KDLoss",
+                    name="loss_logit",
+                    use_this=logit,
+                    weight=0.1,
+                )
+            ]
+        ),
     ],
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
 )
 
-optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2))
+optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2))
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/rtmpose_x_dis_l_coco-ubody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/rtmpose_x_dis_l_coco-ubody-384x288.py
index acde64a03a6b1f09689766eae75548c09f9b26a7..a80ee0302e716344650f3d2572dd87792bad2902 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/rtmpose_x_dis_l_coco-ubody-384x288.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s1_dis/rtmpose_x_dis_l_coco-ubody-384x288.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/ubody/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/ubody/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py"]  # noqa: E501
 
 # model settings
 find_unused_parameters = False
@@ -12,37 +10,35 @@ logit = True
 # method details
 model = dict(
     _delete_=True,
-    type='DWPoseDistiller',
-    teacher_pretrained='https://download.openmmlab.com/mmpose/v1/projects/'
-    'rtmposev1/rtmpose-x_ucoco_384x288-f5b50679_20230822.pth',  # noqa: E501
-    teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-x_8xb32-270e_coco-ubody-wholebody-384x288.py',  # noqa: E501
-    student_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py',  # noqa: E501
+    type="DWPoseDistiller",
+    teacher_pretrained="https://download.openmmlab.com/mmpose/v1/projects/"
+    "rtmposev1/rtmpose-x_ucoco_384x288-f5b50679_20230822.pth",  # noqa: E501
+    teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-x_8xb32-270e_coco-ubody-wholebody-384x288.py",  # noqa: E501
+    student_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py",  # noqa: E501
     distill_cfg=[
-        dict(methods=[
-            dict(
-                type='FeaLoss',
-                name='loss_fea',
-                use_this=fea,
-                student_channels=1024,
-                teacher_channels=1280,
-                alpha_fea=0.00007,
-            )
-        ]),
-        dict(methods=[
-            dict(
-                type='KDLoss',
-                name='loss_logit',
-                use_this=logit,
-                weight=0.1,
-            )
-        ]),
+        dict(
+            methods=[
+                dict(
+                    type="FeaLoss",
+                    name="loss_fea",
+                    use_this=fea,
+                    student_channels=1024,
+                    teacher_channels=1280,
+                    alpha_fea=0.00007,
+                )
+            ]
+        ),
+        dict(
+            methods=[
+                dict(
+                    type="KDLoss",
+                    name="loss_logit",
+                    use_this=logit,
+                    weight=0.1,
+                )
+            ]
+        ),
     ],
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
 )
 
-optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2))
+optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2))
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_l-ll_coco-ubody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_l-ll_coco-ubody-256x192.py
index e3f456a2b9d76d430ae0d894b62c8de6436b6827..2ddb324c6b8d636d932c88ec0a0c259900cc36c9 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_l-ll_coco-ubody-256x192.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_l-ll_coco-ubody-256x192.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/ubody/rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/ubody/rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py"]  # noqa: E501
 
 # model settings
 find_unused_parameters = True
@@ -16,30 +14,25 @@ train_cfg = dict(max_epochs=60, val_interval=10)
 # method details
 model = dict(
     _delete_=True,
-    type='DWPoseDistiller',
+    type="DWPoseDistiller",
     two_dis=second_dis,
-    teacher_pretrained='work_dirs/'
-    'dwpose_x_dis_l_coco-ubody-256x192/dw-x-l_ucoco_256.pth',  # noqa: E501
-    teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py',  # noqa: E501
-    student_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/'
-    'rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py',  # noqa: E501
+    teacher_pretrained="work_dirs/" "dwpose_x_dis_l_coco-ubody-256x192/dw-x-l_ucoco_256.pth",  # noqa: E501
+    teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py",  # noqa: E501
+    student_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py",  # noqa: E501
     distill_cfg=[
-        dict(methods=[
-            dict(
-                type='KDLoss',
-                name='loss_logit',
-                use_this=logit,
-                weight=1,
-            )
-        ]),
+        dict(
+            methods=[
+                dict(
+                    type="KDLoss",
+                    name="loss_logit",
+                    use_this=logit,
+                    weight=1,
+                )
+            ]
+        ),
     ],
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     train_cfg=train_cfg,
 )
 
-optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2))
+optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2))
diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_l-ll_coco-ubody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_l-ll_coco-ubody-384x288.py
index 3815fad1e2558c7f44b63ad2170021007287e6cb..91c89b612d73287082650c8b47df1d4ea81dbd4a 100644
--- a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_l-ll_coco-ubody-384x288.py
+++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_l-ll_coco-ubody-384x288.py
@@ -1,6 +1,4 @@
-_base_ = [
-    '../../../rtmpose/ubody/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py'  # noqa: E501
-]
+_base_ = ["../../../rtmpose/ubody/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py"] # noqa: E501 # model settings find_unused_parameters = True @@ -16,30 +14,25 @@ train_cfg = dict(max_epochs=60, val_interval=10) # method details model = dict( _delete_=True, - type='DWPoseDistiller', + type="DWPoseDistiller", two_dis=second_dis, - teacher_pretrained='work_dirs/' - 'dwpose_x_dis_l_coco-ubody-384x288/dw-x-l_ucoco_384.pth', # noqa: E501 - teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/' - 'rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py', # noqa: E501 - student_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/' - 'rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py', # noqa: E501 + teacher_pretrained="work_dirs/" "dwpose_x_dis_l_coco-ubody-384x288/dw-x-l_ucoco_384.pth", # noqa: E501 + teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py", # noqa: E501 + student_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py", # noqa: E501 distill_cfg=[ - dict(methods=[ - dict( - type='KDLoss', - name='loss_logit', - use_this=logit, - weight=1, - ) - ]), + dict( + methods=[ + dict( + type="KDLoss", + name="loss_logit", + use_this=logit, + weight=1, + ) + ] + ), ], - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), train_cfg=train_cfg, ) -optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2)) +optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2)) diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_m-mm_coco-ubody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_m-mm_coco-ubody-256x192.py index 1e6834ffca3593604ad2b550ad7c0c8e5481553d..0bab5a9c126423959d0cf898d146c26479202ef5 100644 --- a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_m-mm_coco-ubody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_m-mm_coco-ubody-256x192.py @@ -1,6 +1,4 @@ -_base_ = [ - '../../../rtmpose/ubody/rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py' # noqa: E501 -] +_base_ = ["../../../rtmpose/ubody/rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py"] # noqa: E501 # model settings find_unused_parameters = True @@ -16,30 +14,25 @@ train_cfg = dict(max_epochs=60, val_interval=10) # method details model = dict( _delete_=True, - type='DWPoseDistiller', + type="DWPoseDistiller", two_dis=second_dis, - teacher_pretrained='work_dirs/' - 'dwpose_l_dis_m_coco-ubody-256x192/dw-l-m_ucoco_256.pth', # noqa: E501 - teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/' - 'rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py', # noqa: E501 - student_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/' - 'rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py', # noqa: E501 + teacher_pretrained="work_dirs/" "dwpose_l_dis_m_coco-ubody-256x192/dw-l-m_ucoco_256.pth", # noqa: E501 + teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py", # noqa: E501 + student_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py", # noqa: E501 distill_cfg=[ - dict(methods=[ - dict( - type='KDLoss', - name='loss_logit', - use_this=logit, - weight=1, - ) - ]), + dict( + methods=[ + 
dict( + type="KDLoss", + name="loss_logit", + use_this=logit, + weight=1, + ) + ] + ), ], - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), train_cfg=train_cfg, ) -optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2)) +optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2)) diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_s-ss_coco-ubody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_s-ss_coco-ubody-256x192.py index 24a4a94642af4e6858c369787fd22b7d530cba51..d62cead9c9fe63b1b5532a7a19dbd2ccc7a1b493 100644 --- a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_s-ss_coco-ubody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_s-ss_coco-ubody-256x192.py @@ -1,6 +1,4 @@ -_base_ = [ - '../../../rtmpose/ubody/rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py' # noqa: E501 -] +_base_ = ["../../../rtmpose/ubody/rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py"] # noqa: E501 # model settings find_unused_parameters = True @@ -16,30 +14,25 @@ train_cfg = dict(max_epochs=60, val_interval=10) # method details model = dict( _delete_=True, - type='DWPoseDistiller', + type="DWPoseDistiller", two_dis=second_dis, - teacher_pretrained='work_dirs/' - 'dwpose_l_dis_s_coco-ubody-256x192/dw-l-s_ucoco_256.pth', # noqa: E501 - teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/' - 'rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py', # noqa: E501 - student_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/' - 'rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py', # noqa: E501 + teacher_pretrained="work_dirs/" "dwpose_l_dis_s_coco-ubody-256x192/dw-l-s_ucoco_256.pth", # noqa: E501 + teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py", # noqa: E501 + student_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py", # noqa: E501 distill_cfg=[ - dict(methods=[ - dict( - type='KDLoss', - name='loss_logit', - use_this=logit, - weight=1, - ) - ]), + dict( + methods=[ + dict( + type="KDLoss", + name="loss_logit", + use_this=logit, + weight=1, + ) + ] + ), ], - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), train_cfg=train_cfg, ) -optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2)) +optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2)) diff --git a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_t-tt_coco-ubody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_t-tt_coco-ubody-256x192.py index c7c322ece2662b943d399afb0854fa6766478e24..eb93e70e96c091d363d32723baabc3c3f12f7a3e 100644 --- a/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_t-tt_coco-ubody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/dwpose/ubody/s2_dis/dwpose_t-tt_coco-ubody-256x192.py @@ -1,6 +1,4 @@ -_base_ = [ - '../../../rtmpose/ubody/rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py' # noqa: E501 -] +_base_ = 
["../../../rtmpose/ubody/rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py"] # noqa: E501 # model settings find_unused_parameters = True @@ -16,30 +14,25 @@ train_cfg = dict(max_epochs=60, val_interval=10) # method details model = dict( _delete_=True, - type='DWPoseDistiller', + type="DWPoseDistiller", two_dis=second_dis, - teacher_pretrained='work_dirs/' - 'dwpose_l_dis_t_coco-ubody-256x192/dw-l-t_ucoco_256.pth', # noqa: E501 - teacher_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/' - 'rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py', # noqa: E501 - student_cfg='configs/wholebody_2d_keypoint/rtmpose/ubody/' - 'rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py', # noqa: E501 + teacher_pretrained="work_dirs/" "dwpose_l_dis_t_coco-ubody-256x192/dw-l-t_ucoco_256.pth", # noqa: E501 + teacher_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py", # noqa: E501 + student_cfg="configs/wholebody_2d_keypoint/rtmpose/ubody/" "rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py", # noqa: E501 distill_cfg=[ - dict(methods=[ - dict( - type='KDLoss', - name='loss_logit', - use_this=logit, - weight=1, - ) - ]), + dict( + methods=[ + dict( + type="KDLoss", + name="loss_logit", + use_this=logit, + weight=1, + ) + ] + ), ], - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), train_cfg=train_cfg, ) -optim_wrapper = dict(clip_grad=dict(max_norm=1., norm_type=2)) +optim_wrapper = dict(clip_grad=dict(max_norm=1.0, norm_type=2)) diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-l_8xb1024-270e_cocktail14-256x192.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-l_8xb1024-270e_cocktail14-256x192.py index 59351d5f4a1920834a1217b7279b70512f151a00..1727e6d23a9e66fc869d1f8796006bce6b72b3eb 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-l_8xb1024-270e_cocktail14-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-l_8xb1024-270e_cocktail14-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 133 @@ -16,67 +16,54 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.1), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.1), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=8192) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - 
use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='CSPNeXt', - arch='P5', + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., + deepen_factor=1.0, + widen_factor=1.0, channel_attention=True, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="BN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/rtmpose-l_simcc-ucoco_dw-ucoco_270e-256x192-4d6dfc62_20230728.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/rtmpose-l_simcc-ucoco_dw-ucoco_270e-256x192-4d6dfc62_20230728.pth", # noqa + ), + ), neck=dict( - type='CSPNeXtPAFPN', + type="CSPNeXtPAFPN", in_channels=[256, 512, 1024], out_channels=None, out_indices=( @@ -85,112 +72,90 @@ model = dict( ), num_csp_blocks=2, expand_ratio=0.5, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), head=dict( - type='RTMWHead', + type="RTMWHead", in_channels=1024, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), loss=dict( - type='KLDiscretLoss', + type="KLDiscretLoss", use_target_weight=True, - beta=1., + beta=1.0, label_softmax=True, - label_beta=10., + label_beta=10.0, mask=list(range(23, 91)), mask_weight=0.5, ), - decoder=codec), - test_cfg=dict(flip_test=True)) + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), - dict( - type='Albumentation', + 
type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # mapping -aic_coco133 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), - (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)] +aic_coco133 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)] -crowdpose_coco133 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16)] +crowdpose_coco133 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), (7, 12), (8, 13), (9, 14), (10, 15), (11, 16)] mpii_coco133 = [ (0, 16), @@ -222,11 +187,9 @@ jhmdb_coco133 = [ (14, 15), ] -halpe_coco133 = [(i, i) - for i in range(17)] + [(20, 17), (21, 20), (22, 18), (23, 21), - (24, 19), - (25, 22)] + [(i, i - 3) - for i in range(26, 136)] +halpe_coco133 = ( + [(i, i) for i in range(17)] + [(20, 17), (21, 20), (22, 18), (23, 21), (24, 19), (25, 22)] + [(i, i - 3) for i in range(26, 136)] +) posetrack_coco133 = [ (0, 0), @@ -246,246 +209,215 @@ posetrack_coco133 = [ (16, 16), ] -humanart_coco133 = [(i, i) for i in range(17)] + [(17, 99), (18, 120), - (19, 17), (20, 20)] +humanart_coco133 = [(i, i) for i in range(17)] + [(17, 99), (18, 120), (19, 17), (20, 20)] # train datasets dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - 
data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_coco133) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_coco133)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_coco133) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_coco133)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_coco133) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_coco133)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_coco133) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_coco133)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_coco133) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_coco133)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_coco133) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_coco133)], ) dataset_humanart = dict( - 
type='HumanArt21Dataset', + type="HumanArt21Dataset", data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart.json', - filter_cfg=dict(scenes=['real_human']), - data_prefix=dict(img='pose/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=humanart_coco133) - ]) + ann_file="HumanArt/annotations/training_humanart.json", + filter_cfg=dict(scenes=["real_human"]), + data_prefix=dict(img="pose/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=humanart_coco133)], +) ubody_scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] ubody_datasets = [] for scene in ubody_scenes: each = dict( - type='UBody2dDataset', + type="UBody2dDataset", data_root=data_root, data_mode=data_mode, - ann_file=f'Ubody/annotations/{scene}/train_annotations.json', - data_prefix=dict(img='pose/UBody/images/'), + ann_file=f"Ubody/annotations/{scene}/train_annotations.json", + data_prefix=dict(img="pose/UBody/images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) ubody_datasets.append(each) dataset_ubody = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/ubody2d.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/ubody2d.py"), datasets=ubody_datasets, pipeline=[], test_mode=False, ) face_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale', padding=1.25), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[1.5, 2.0], - rotate_factor=0), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale", padding=1.25), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[1.5, 2.0], rotate_factor=0), ] -wflw_coco133 = [(i * 2, 23 + i) - for i in range(17)] + [(33 + i, 40 + i) for i in range(5)] + [ - (42 + i, 45 + i) for i in range(5) - ] + [(51 + i, 50 + i) - for i in range(9)] + [(60, 59), (61, 60), (63, 61), - (64, 62), (65, 63), (67, 64), - (68, 65), (69, 66), (71, 67), - (72, 68), (73, 69), - (75, 70)] + [(76 + i, 71 + i) - for i in range(20)] +wflw_coco133 = ( + [(i * 2, 23 + i) for i in range(17)] + + [(33 + i, 40 + i) for i in range(5)] + + [(42 + i, 45 + i) for i in range(5)] + + [(51 + i, 50 + i) for i in range(9)] + + [(60, 59), (61, 60), (63, 61), (64, 62), (65, 63), (67, 64), (68, 65), (69, 66), (71, 67), (72, 68), (73, 69), (75, 70)] + + [(76 + i, 71 + i) for i in range(20)] +) dataset_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=wflw_coco133), *face_pipeline - ], + ann_file="wflw/annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="pose/WFLW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=wflw_coco133), *face_pipeline], ) mapping_300w_coco133 = [(i, 23 + i) for i in range(68)] dataset_300w = dict( - type='Face300WDataset', + 
type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_train.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mapping_300w_coco133), *face_pipeline - ], + ann_file="300w/annotations/face_landmarks_300w_train.json", + data_prefix=dict(img="pose/300w/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mapping_300w_coco133), *face_pipeline], ) -cofw_coco133 = [(0, 40), (2, 44), (4, 42), (1, 49), (3, 45), (6, 47), (8, 59), - (10, 62), (9, 68), (11, 65), (18, 54), (19, 58), (20, 53), - (21, 56), (22, 71), (23, 77), (24, 74), (25, 85), (26, 89), - (27, 80), (28, 31)] +cofw_coco133 = [ + (0, 40), + (2, 44), + (4, 42), + (1, 49), + (3, 45), + (6, 47), + (8, 59), + (10, 62), + (9, 68), + (11, 65), + (18, 54), + (19, 58), + (20, 53), + (21, 56), + (22, 71), + (23, 77), + (24, 74), + (25, 85), + (26, 89), + (27, 80), + (28, 31), +] dataset_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_train.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=cofw_coco133), *face_pipeline - ], + ann_file="cofw/annotations/cofw_train.json", + data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=cofw_coco133), *face_pipeline], ) -lapa_coco133 = [(i * 2, 23 + i) for i in range(17)] + [ - (33 + i, 40 + i) for i in range(5) -] + [(42 + i, 45 + i) for i in range(5)] + [ - (51 + i, 50 + i) for i in range(4) -] + [(58 + i, 54 + i) for i in range(5)] + [(66, 59), (67, 60), (69, 61), - (70, 62), (71, 63), (73, 64), - (75, 65), (76, 66), (78, 67), - (79, 68), (80, 69), - (82, 70)] + [(84 + i, 71 + i) - for i in range(20)] +lapa_coco133 = ( + [(i * 2, 23 + i) for i in range(17)] + + [(33 + i, 40 + i) for i in range(5)] + + [(42 + i, 45 + i) for i in range(5)] + + [(51 + i, 50 + i) for i in range(4)] + + [(58 + i, 54 + i) for i in range(5)] + + [(66, 59), (67, 60), (69, 61), (70, 62), (71, 63), (73, 64), (75, 65), (76, 66), (78, 67), (79, 68), (80, 69), (82, 70)] + + [(84 + i, 71 + i) for i in range(20)] +) dataset_lapa = dict( - type='LapaDataset', + type="LapaDataset", data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_trainval.json', - data_prefix=dict(img='pose/LaPa/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=lapa_coco133), *face_pipeline - ], + ann_file="LaPa/annotations/lapa_trainval.json", + data_prefix=dict(img="pose/LaPa/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=lapa_coco133), *face_pipeline], ) dataset_wb = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[dataset_coco, dataset_halpe, dataset_ubody], pipeline=[], test_mode=False, ) dataset_body = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[ dataset_aic, dataset_crowdpose, @@ -499,8 +431,8 @@ dataset_body = dict( ) dataset_face = dict( - type='CombinedDataset', - 
metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[ dataset_wflw, dataset_300w, @@ -512,45 +444,59 @@ dataset_face = dict( ) hand_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[1.5, 2.0], - rotate_factor=0), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[1.5, 2.0], rotate_factor=0), ] -interhand_left = [(21, 95), (22, 94), (23, 93), (24, 92), (25, 99), (26, 98), - (27, 97), (28, 96), (29, 103), (30, 102), (31, 101), - (32, 100), (33, 107), (34, 106), (35, 105), (36, 104), - (37, 111), (38, 110), (39, 109), (40, 108), (41, 91)] +interhand_left = [ + (21, 95), + (22, 94), + (23, 93), + (24, 92), + (25, 99), + (26, 98), + (27, 97), + (28, 96), + (29, 103), + (30, 102), + (31, 101), + (32, 100), + (33, 107), + (34, 106), + (35, 105), + (36, 104), + (37, 111), + (38, 110), + (39, 109), + (40, 108), + (41, 91), +] interhand_right = [(i - 21, j + 21) for i, j in interhand_left] interhand_coco133 = interhand_right + interhand_left dataset_interhand2d = dict( - type='InterHand2DDoubleDataset', + type="InterHand2DDoubleDataset", data_root=data_root, data_mode=data_mode, - ann_file='interhand26m/annotations/all/InterHand2.6M_train_data.json', - camera_param_file='interhand26m/annotations/all/' - 'InterHand2.6M_train_camera.json', - joint_file='interhand26m/annotations/all/' - 'InterHand2.6M_train_joint_3d.json', - data_prefix=dict(img='interhand2.6m/images/train/'), + ann_file="interhand26m/annotations/all/InterHand2.6M_train_data.json", + camera_param_file="interhand26m/annotations/all/" "InterHand2.6M_train_camera.json", + joint_file="interhand26m/annotations/all/" "InterHand2.6M_train_joint_3d.json", + data_prefix=dict(img="interhand2.6m/images/train/"), sample_interval=10, pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=num_keypoints, mapping=interhand_coco133, - ), *hand_pipeline + ), + *hand_pipeline, ], ) dataset_hand = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[dataset_interhand2d], pipeline=[], test_mode=False, @@ -564,52 +510,42 @@ train_dataloader = dict( num_workers=4, pin_memory=False, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='data/detection/coco/val2017/'), + type="CocoWholeBodyDataset", + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + 
data_prefix=dict(img="data/detection/coco/val2017/"), pipeline=val_pipeline, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - test_mode=True)) + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + test_mode=True, + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-l_8xb320-270e_cocktail14-384x288.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-l_8xb320-270e_cocktail14-384x288.py index a687f89ef62fa5b673c273d6702617619d7a3482..a676570361c54af2c9e9d488e6b652ba089524fe 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-l_8xb320-270e_cocktail14-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-l_8xb320-270e_cocktail14-384x288.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 133 @@ -16,29 +16,25 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.1), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.1), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 150 to 300 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size @@ -46,39 +42,37 @@ auto_scale_lr = dict(base_batch_size=2560) # codec settings codec = dict( - type='SimCCLabel', + type="SimCCLabel", input_size=input_size, - sigma=(6., 6.93), + sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False, - decode_visibility=True) + decode_visibility=True, +) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + 
type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='CSPNeXt', - arch='P5', + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., + deepen_factor=1.0, + widen_factor=1.0, channel_attention=True, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="BN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/rtmpose-l_simcc-ucoco_dw-ucoco_270e-256x192-4d6dfc62_20230728.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/rtmpose-l_simcc-ucoco_dw-ucoco_270e-256x192-4d6dfc62_20230728.pth", # noqa + ), + ), neck=dict( - type='CSPNeXtPAFPN', + type="CSPNeXtPAFPN", in_channels=[256, 512, 1024], out_channels=None, out_indices=( @@ -87,112 +81,90 @@ model = dict( ), num_csp_blocks=2, expand_ratio=0.5, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), head=dict( - type='RTMWHead', + type="RTMWHead", in_channels=1024, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), loss=dict( - type='KLDiscretLoss', + type="KLDiscretLoss", use_target_weight=True, - beta=1., + beta=1.0, label_softmax=True, - label_beta=10., + label_beta=10.0, mask=list(range(23, 91)), mask_weight=0.5, ), - decoder=codec), - test_cfg=dict(flip_test=True)) + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + 
dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # mapping -aic_coco133 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), - (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)] +aic_coco133 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)] -crowdpose_coco133 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16)] +crowdpose_coco133 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), (7, 12), (8, 13), (9, 14), (10, 15), (11, 16)] mpii_coco133 = [ (0, 16), @@ -224,11 +196,9 @@ jhmdb_coco133 = [ (14, 15), ] -halpe_coco133 = [(i, i) - for i in range(17)] + [(20, 17), (21, 20), (22, 18), (23, 21), - (24, 19), - (25, 22)] + [(i, i - 3) - for i in range(26, 136)] +halpe_coco133 = ( + [(i, i) for i in range(17)] + [(20, 17), (21, 20), (22, 18), (23, 21), (24, 19), (25, 22)] + [(i, i - 3) for i in range(26, 136)] +) posetrack_coco133 = [ (0, 0), @@ -248,246 +218,215 @@ posetrack_coco133 = [ (16, 16), ] -humanart_coco133 = [(i, i) for i in range(17)] + [(17, 99), (18, 120), - (19, 17), (20, 20)] +humanart_coco133 = [(i, i) for i in range(17)] + [(17, 99), (18, 120), (19, 17), (20, 20)] # train datasets dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - 
'_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_coco133) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_coco133)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_coco133) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_coco133)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_coco133) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_coco133)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_coco133) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_coco133)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_coco133) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_coco133)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_coco133) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_coco133)], ) dataset_humanart = dict( - type='HumanArt21Dataset', + type="HumanArt21Dataset", data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart.json', - filter_cfg=dict(scenes=['real_human']), - data_prefix=dict(img='pose/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=humanart_coco133) - ]) + ann_file="HumanArt/annotations/training_humanart.json", + 
filter_cfg=dict(scenes=["real_human"]), + data_prefix=dict(img="pose/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=humanart_coco133)], +) ubody_scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] ubody_datasets = [] for scene in ubody_scenes: each = dict( - type='UBody2dDataset', + type="UBody2dDataset", data_root=data_root, data_mode=data_mode, - ann_file=f'Ubody/annotations/{scene}/train_annotations.json', - data_prefix=dict(img='pose/UBody/images/'), + ann_file=f"Ubody/annotations/{scene}/train_annotations.json", + data_prefix=dict(img="pose/UBody/images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) ubody_datasets.append(each) dataset_ubody = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/ubody2d.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/ubody2d.py"), datasets=ubody_datasets, pipeline=[], test_mode=False, ) face_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale', padding=1.25), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[1.5, 2.0], - rotate_factor=0), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale", padding=1.25), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[1.5, 2.0], rotate_factor=0), ] -wflw_coco133 = [(i * 2, 23 + i) - for i in range(17)] + [(33 + i, 40 + i) for i in range(5)] + [ - (42 + i, 45 + i) for i in range(5) - ] + [(51 + i, 50 + i) - for i in range(9)] + [(60, 59), (61, 60), (63, 61), - (64, 62), (65, 63), (67, 64), - (68, 65), (69, 66), (71, 67), - (72, 68), (73, 69), - (75, 70)] + [(76 + i, 71 + i) - for i in range(20)] +wflw_coco133 = ( + [(i * 2, 23 + i) for i in range(17)] + + [(33 + i, 40 + i) for i in range(5)] + + [(42 + i, 45 + i) for i in range(5)] + + [(51 + i, 50 + i) for i in range(9)] + + [(60, 59), (61, 60), (63, 61), (64, 62), (65, 63), (67, 64), (68, 65), (69, 66), (71, 67), (72, 68), (73, 69), (75, 70)] + + [(76 + i, 71 + i) for i in range(20)] +) dataset_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=wflw_coco133), *face_pipeline - ], + ann_file="wflw/annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="pose/WFLW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=wflw_coco133), *face_pipeline], ) mapping_300w_coco133 = [(i, 23 + i) for i in range(68)] dataset_300w = dict( - type='Face300WDataset', + type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_train.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mapping_300w_coco133), *face_pipeline - ], + ann_file="300w/annotations/face_landmarks_300w_train.json", + data_prefix=dict(img="pose/300w/images/"), + 
pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mapping_300w_coco133), *face_pipeline], ) -cofw_coco133 = [(0, 40), (2, 44), (4, 42), (1, 49), (3, 45), (6, 47), (8, 59), - (10, 62), (9, 68), (11, 65), (18, 54), (19, 58), (20, 53), - (21, 56), (22, 71), (23, 77), (24, 74), (25, 85), (26, 89), - (27, 80), (28, 31)] +cofw_coco133 = [ + (0, 40), + (2, 44), + (4, 42), + (1, 49), + (3, 45), + (6, 47), + (8, 59), + (10, 62), + (9, 68), + (11, 65), + (18, 54), + (19, 58), + (20, 53), + (21, 56), + (22, 71), + (23, 77), + (24, 74), + (25, 85), + (26, 89), + (27, 80), + (28, 31), +] dataset_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_train.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=cofw_coco133), *face_pipeline - ], + ann_file="cofw/annotations/cofw_train.json", + data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=cofw_coco133), *face_pipeline], ) -lapa_coco133 = [(i * 2, 23 + i) for i in range(17)] + [ - (33 + i, 40 + i) for i in range(5) -] + [(42 + i, 45 + i) for i in range(5)] + [ - (51 + i, 50 + i) for i in range(4) -] + [(58 + i, 54 + i) for i in range(5)] + [(66, 59), (67, 60), (69, 61), - (70, 62), (71, 63), (73, 64), - (75, 65), (76, 66), (78, 67), - (79, 68), (80, 69), - (82, 70)] + [(84 + i, 71 + i) - for i in range(20)] +lapa_coco133 = ( + [(i * 2, 23 + i) for i in range(17)] + + [(33 + i, 40 + i) for i in range(5)] + + [(42 + i, 45 + i) for i in range(5)] + + [(51 + i, 50 + i) for i in range(4)] + + [(58 + i, 54 + i) for i in range(5)] + + [(66, 59), (67, 60), (69, 61), (70, 62), (71, 63), (73, 64), (75, 65), (76, 66), (78, 67), (79, 68), (80, 69), (82, 70)] + + [(84 + i, 71 + i) for i in range(20)] +) dataset_lapa = dict( - type='LapaDataset', + type="LapaDataset", data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_trainval.json', - data_prefix=dict(img='pose/LaPa/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=lapa_coco133), *face_pipeline - ], + ann_file="LaPa/annotations/lapa_trainval.json", + data_prefix=dict(img="pose/LaPa/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=lapa_coco133), *face_pipeline], ) dataset_wb = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[dataset_coco, dataset_halpe, dataset_ubody], pipeline=[], test_mode=False, ) dataset_body = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[ dataset_aic, dataset_crowdpose, @@ -501,8 +440,8 @@ dataset_body = dict( ) dataset_face = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[ dataset_wflw, dataset_300w, @@ -514,45 +453,59 @@ dataset_face = dict( ) hand_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[1.5, 2.0], - 
rotate_factor=0), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[1.5, 2.0], rotate_factor=0), ] -interhand_left = [(21, 95), (22, 94), (23, 93), (24, 92), (25, 99), (26, 98), - (27, 97), (28, 96), (29, 103), (30, 102), (31, 101), - (32, 100), (33, 107), (34, 106), (35, 105), (36, 104), - (37, 111), (38, 110), (39, 109), (40, 108), (41, 91)] +interhand_left = [ + (21, 95), + (22, 94), + (23, 93), + (24, 92), + (25, 99), + (26, 98), + (27, 97), + (28, 96), + (29, 103), + (30, 102), + (31, 101), + (32, 100), + (33, 107), + (34, 106), + (35, 105), + (36, 104), + (37, 111), + (38, 110), + (39, 109), + (40, 108), + (41, 91), +] interhand_right = [(i - 21, j + 21) for i, j in interhand_left] interhand_coco133 = interhand_right + interhand_left dataset_interhand2d = dict( - type='InterHand2DDoubleDataset', + type="InterHand2DDoubleDataset", data_root=data_root, data_mode=data_mode, - ann_file='interhand26m/annotations/all/InterHand2.6M_train_data.json', - camera_param_file='interhand26m/annotations/all/' - 'InterHand2.6M_train_camera.json', - joint_file='interhand26m/annotations/all/' - 'InterHand2.6M_train_joint_3d.json', - data_prefix=dict(img='interhand2.6m/images/train/'), + ann_file="interhand26m/annotations/all/InterHand2.6M_train_data.json", + camera_param_file="interhand26m/annotations/all/" "InterHand2.6M_train_camera.json", + joint_file="interhand26m/annotations/all/" "InterHand2.6M_train_joint_3d.json", + data_prefix=dict(img="interhand2.6m/images/train/"), sample_interval=10, pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=num_keypoints, mapping=interhand_coco133, - ), *hand_pipeline + ), + *hand_pipeline, ], ) dataset_hand = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[dataset_interhand2d], pipeline=[], test_mode=False, @@ -566,52 +519,42 @@ train_dataloader = dict( num_workers=4, pin_memory=False, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='data/detection/coco/val2017/'), + type="CocoWholeBodyDataset", + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="data/detection/coco/val2017/"), pipeline=val_pipeline, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - test_mode=True)) + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + test_mode=True, + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) 
+default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-m_8xb1024-270e_cocktail14-256x192.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-m_8xb1024-270e_cocktail14-256x192.py index fc9d90e5cd61a313c940ec9846a3d784f417dc69..1671dfda03225d4a2fb33bc685b64c1e8334ebb3 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-m_8xb1024-270e_cocktail14-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-m_8xb1024-270e_cocktail14-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 133 @@ -16,67 +16,54 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=8192) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='CSPNeXt', - arch='P5', + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, channel_attention=True, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="BN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - 
checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/rtmpose-m_simcc-ucoco_dw-ucoco_270e-256x192-c8b76419_20230728.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/rtmpose-m_simcc-ucoco_dw-ucoco_270e-256x192-c8b76419_20230728.pth", # noqa + ), + ), neck=dict( - type='CSPNeXtPAFPN', + type="CSPNeXtPAFPN", in_channels=[192, 384, 768], out_channels=None, out_indices=( @@ -85,112 +72,90 @@ model = dict( ), num_csp_blocks=2, expand_ratio=0.5, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), head=dict( - type='RTMWHead', + type="RTMWHead", in_channels=768, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), loss=dict( - type='KLDiscretLoss', + type="KLDiscretLoss", use_target_weight=True, - beta=1., + beta=1.0, label_softmax=True, - label_beta=10., + label_beta=10.0, mask=list(range(23, 91)), mask_weight=0.5, ), - decoder=codec), - test_cfg=dict(flip_test=True)) + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", 
backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # mapping -aic_coco133 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), - (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)] +aic_coco133 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)] -crowdpose_coco133 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16)] +crowdpose_coco133 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), (7, 12), (8, 13), (9, 14), (10, 15), (11, 16)] mpii_coco133 = [ (0, 16), @@ -222,11 +187,9 @@ jhmdb_coco133 = [ (14, 15), ] -halpe_coco133 = [(i, i) - for i in range(17)] + [(20, 17), (21, 20), (22, 18), (23, 21), - (24, 19), - (25, 22)] + [(i, i - 3) - for i in range(26, 136)] +halpe_coco133 = ( + [(i, i) for i in range(17)] + [(20, 17), (21, 20), (22, 18), (23, 21), (24, 19), (25, 22)] + [(i, i - 3) for i in range(26, 136)] +) posetrack_coco133 = [ (0, 0), @@ -246,246 +209,215 @@ posetrack_coco133 = [ (16, 16), ] -humanart_coco133 = [(i, i) for i in range(17)] + [(17, 99), (18, 120), - (19, 17), (20, 20)] +humanart_coco133 = [(i, i) for i in range(17)] + [(17, 99), (18, 120), (19, 17), (20, 20)] # train datasets dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_coco133) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_coco133)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, 
data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_coco133) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_coco133)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_coco133) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_coco133)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_coco133) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_coco133)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_coco133) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_coco133)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_coco133) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_coco133)], ) dataset_humanart = dict( - type='HumanArt21Dataset', + type="HumanArt21Dataset", data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart.json', - filter_cfg=dict(scenes=['real_human']), - data_prefix=dict(img='pose/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=humanart_coco133) - ]) + ann_file="HumanArt/annotations/training_humanart.json", + filter_cfg=dict(scenes=["real_human"]), + data_prefix=dict(img="pose/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=humanart_coco133)], +) ubody_scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + 
"Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] ubody_datasets = [] for scene in ubody_scenes: each = dict( - type='UBody2dDataset', + type="UBody2dDataset", data_root=data_root, data_mode=data_mode, - ann_file=f'Ubody/annotations/{scene}/train_annotations.json', - data_prefix=dict(img='pose/UBody/images/'), + ann_file=f"Ubody/annotations/{scene}/train_annotations.json", + data_prefix=dict(img="pose/UBody/images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) ubody_datasets.append(each) dataset_ubody = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/ubody2d.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/ubody2d.py"), datasets=ubody_datasets, pipeline=[], test_mode=False, ) face_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale', padding=1.25), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[1.5, 2.0], - rotate_factor=0), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale", padding=1.25), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[1.5, 2.0], rotate_factor=0), ] -wflw_coco133 = [(i * 2, 23 + i) - for i in range(17)] + [(33 + i, 40 + i) for i in range(5)] + [ - (42 + i, 45 + i) for i in range(5) - ] + [(51 + i, 50 + i) - for i in range(9)] + [(60, 59), (61, 60), (63, 61), - (64, 62), (65, 63), (67, 64), - (68, 65), (69, 66), (71, 67), - (72, 68), (73, 69), - (75, 70)] + [(76 + i, 71 + i) - for i in range(20)] +wflw_coco133 = ( + [(i * 2, 23 + i) for i in range(17)] + + [(33 + i, 40 + i) for i in range(5)] + + [(42 + i, 45 + i) for i in range(5)] + + [(51 + i, 50 + i) for i in range(9)] + + [(60, 59), (61, 60), (63, 61), (64, 62), (65, 63), (67, 64), (68, 65), (69, 66), (71, 67), (72, 68), (73, 69), (75, 70)] + + [(76 + i, 71 + i) for i in range(20)] +) dataset_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=wflw_coco133), *face_pipeline - ], + ann_file="wflw/annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="pose/WFLW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=wflw_coco133), *face_pipeline], ) mapping_300w_coco133 = [(i, 23 + i) for i in range(68)] dataset_300w = dict( - type='Face300WDataset', + type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_train.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mapping_300w_coco133), *face_pipeline - ], + ann_file="300w/annotations/face_landmarks_300w_train.json", + data_prefix=dict(img="pose/300w/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mapping_300w_coco133), *face_pipeline], ) -cofw_coco133 = [(0, 40), (2, 44), (4, 42), (1, 49), (3, 45), (6, 47), (8, 59), - (10, 62), (9, 68), (11, 65), (18, 54), (19, 58), (20, 53), - (21, 56), (22, 71), (23, 77), (24, 74), (25, 85), (26, 89), - (27, 80), (28, 31)] +cofw_coco133 = [ + (0, 40), + (2, 44), + (4, 42), + (1, 49), + (3, 45), + (6, 47), + (8, 59), + (10, 62), + (9, 68), + (11, 65), + (18, 54), + (19, 58), + (20, 53), + (21, 56), 
+ (22, 71), + (23, 77), + (24, 74), + (25, 85), + (26, 89), + (27, 80), + (28, 31), +] dataset_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_train.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=cofw_coco133), *face_pipeline - ], + ann_file="cofw/annotations/cofw_train.json", + data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=cofw_coco133), *face_pipeline], ) -lapa_coco133 = [(i * 2, 23 + i) for i in range(17)] + [ - (33 + i, 40 + i) for i in range(5) -] + [(42 + i, 45 + i) for i in range(5)] + [ - (51 + i, 50 + i) for i in range(4) -] + [(58 + i, 54 + i) for i in range(5)] + [(66, 59), (67, 60), (69, 61), - (70, 62), (71, 63), (73, 64), - (75, 65), (76, 66), (78, 67), - (79, 68), (80, 69), - (82, 70)] + [(84 + i, 71 + i) - for i in range(20)] +lapa_coco133 = ( + [(i * 2, 23 + i) for i in range(17)] + + [(33 + i, 40 + i) for i in range(5)] + + [(42 + i, 45 + i) for i in range(5)] + + [(51 + i, 50 + i) for i in range(4)] + + [(58 + i, 54 + i) for i in range(5)] + + [(66, 59), (67, 60), (69, 61), (70, 62), (71, 63), (73, 64), (75, 65), (76, 66), (78, 67), (79, 68), (80, 69), (82, 70)] + + [(84 + i, 71 + i) for i in range(20)] +) dataset_lapa = dict( - type='LapaDataset', + type="LapaDataset", data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_trainval.json', - data_prefix=dict(img='pose/LaPa/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=lapa_coco133), *face_pipeline - ], + ann_file="LaPa/annotations/lapa_trainval.json", + data_prefix=dict(img="pose/LaPa/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=lapa_coco133), *face_pipeline], ) dataset_wb = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[dataset_coco, dataset_halpe, dataset_ubody], pipeline=[], test_mode=False, ) dataset_body = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[ dataset_aic, dataset_crowdpose, @@ -499,8 +431,8 @@ dataset_body = dict( ) dataset_face = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[ dataset_wflw, dataset_300w, @@ -512,45 +444,59 @@ dataset_face = dict( ) hand_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[1.5, 2.0], - rotate_factor=0), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[1.5, 2.0], rotate_factor=0), ] -interhand_left = [(21, 95), (22, 94), (23, 93), (24, 92), (25, 99), (26, 98), - (27, 97), (28, 96), (29, 103), (30, 102), (31, 101), - (32, 100), (33, 107), (34, 106), (35, 105), (36, 104), - (37, 111), (38, 110), (39, 109), (40, 108), (41, 91)] +interhand_left = [ + (21, 95), + (22, 94), + (23, 93), + (24, 92), + (25, 99), + 
(26, 98), + (27, 97), + (28, 96), + (29, 103), + (30, 102), + (31, 101), + (32, 100), + (33, 107), + (34, 106), + (35, 105), + (36, 104), + (37, 111), + (38, 110), + (39, 109), + (40, 108), + (41, 91), +] interhand_right = [(i - 21, j + 21) for i, j in interhand_left] interhand_coco133 = interhand_right + interhand_left dataset_interhand2d = dict( - type='InterHand2DDoubleDataset', + type="InterHand2DDoubleDataset", data_root=data_root, data_mode=data_mode, - ann_file='interhand26m/annotations/all/InterHand2.6M_train_data.json', - camera_param_file='interhand26m/annotations/all/' - 'InterHand2.6M_train_camera.json', - joint_file='interhand26m/annotations/all/' - 'InterHand2.6M_train_joint_3d.json', - data_prefix=dict(img='interhand2.6m/images/train/'), + ann_file="interhand26m/annotations/all/InterHand2.6M_train_data.json", + camera_param_file="interhand26m/annotations/all/" "InterHand2.6M_train_camera.json", + joint_file="interhand26m/annotations/all/" "InterHand2.6M_train_joint_3d.json", + data_prefix=dict(img="interhand2.6m/images/train/"), sample_interval=10, pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=num_keypoints, mapping=interhand_coco133, - ), *hand_pipeline + ), + *hand_pipeline, ], ) dataset_hand = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[dataset_interhand2d], pipeline=[], test_mode=False, @@ -564,52 +510,42 @@ train_dataloader = dict( num_workers=4, pin_memory=False, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='data/detection/coco/val2017/'), + type="CocoWholeBodyDataset", + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="data/detection/coco/val2017/"), pipeline=val_pipeline, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - test_mode=True)) + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + test_mode=True, + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - 
stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-x_8xb320-270e_cocktail14-384x288.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-x_8xb320-270e_cocktail14-384x288.py index 115dc9408b7cda685a387fb058f37298e67f28fe..f64e9a106231374bfa88d56a5ca67125cb453116 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-x_8xb320-270e_cocktail14-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-x_8xb320-270e_cocktail14-384x288.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 133 @@ -16,29 +16,25 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.1), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.1), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 150 to 300 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size @@ -46,39 +42,37 @@ auto_scale_lr = dict(base_batch_size=2560) # codec settings codec = dict( - type='SimCCLabel', + type="SimCCLabel", input_size=input_size, - sigma=(6., 6.93), + sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False, - decode_visibility=True) + decode_visibility=True, +) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='CSPNeXt', - arch='P5', + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=1.33, widen_factor=1.25, channel_attention=True, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="BN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/' - 'wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_simcc-ucoco_pt-aic-coco_270e-384x288-f5b50679_20230822.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/" + "wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_simcc-ucoco_pt-aic-coco_270e-384x288-f5b50679_20230822.pth", # noqa + ), + ), neck=dict( - type='CSPNeXtPAFPN', + type="CSPNeXtPAFPN", in_channels=[320, 640, 1280], out_channels=None, out_indices=( @@ -87,112 
+81,90 @@ model = dict( ), num_csp_blocks=2, expand_ratio=0.5, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), head=dict( - type='RTMWHead', + type="RTMWHead", in_channels=1280, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), loss=dict( - type='KLDiscretLoss', + type="KLDiscretLoss", use_target_weight=True, - beta=1., + beta=1.0, label_softmax=True, - label_beta=10., + label_beta=10.0, mask=list(range(23, 91)), mask_weight=0.5, ), - decoder=codec), - test_cfg=dict(flip_test=True)) + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', 
input_size=codec['input_size']), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # mapping -aic_coco133 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), - (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)] +aic_coco133 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)] -crowdpose_coco133 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16)] +crowdpose_coco133 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), (7, 12), (8, 13), (9, 14), (10, 15), (11, 16)] mpii_coco133 = [ (0, 16), @@ -224,11 +196,9 @@ jhmdb_coco133 = [ (14, 15), ] -halpe_coco133 = [(i, i) - for i in range(17)] + [(20, 17), (21, 20), (22, 18), (23, 21), - (24, 19), - (25, 22)] + [(i, i - 3) - for i in range(26, 136)] +halpe_coco133 = ( + [(i, i) for i in range(17)] + [(20, 17), (21, 20), (22, 18), (23, 21), (24, 19), (25, 22)] + [(i, i - 3) for i in range(26, 136)] +) posetrack_coco133 = [ (0, 0), @@ -248,246 +218,215 @@ posetrack_coco133 = [ (16, 16), ] -humanart_coco133 = [(i, i) for i in range(17)] + [(17, 99), (18, 120), - (19, 17), (20, 20)] +humanart_coco133 = [(i, i) for i in range(17)] + [(17, 99), (18, 120), (19, 17), (20, 20)] # train datasets dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_coco133) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_coco133)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_coco133) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_coco133)], ) dataset_mpii = dict( - 
type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_coco133) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mpii_coco133)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_coco133) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_coco133)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_coco133) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_coco133)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_coco133) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_coco133)], ) dataset_humanart = dict( - type='HumanArt21Dataset', + type="HumanArt21Dataset", data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart.json', - filter_cfg=dict(scenes=['real_human']), - data_prefix=dict(img='pose/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=humanart_coco133) - ]) + ann_file="HumanArt/annotations/training_humanart.json", + filter_cfg=dict(scenes=["real_human"]), + data_prefix=dict(img="pose/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=humanart_coco133)], +) ubody_scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] ubody_datasets = [] for scene in ubody_scenes: each = dict( - type='UBody2dDataset', + type="UBody2dDataset", data_root=data_root, data_mode=data_mode, - ann_file=f'Ubody/annotations/{scene}/train_annotations.json', - data_prefix=dict(img='pose/UBody/images/'), + ann_file=f"Ubody/annotations/{scene}/train_annotations.json", + data_prefix=dict(img="pose/UBody/images/"), pipeline=[], - 
sample_interval=10) + sample_interval=10, + ) ubody_datasets.append(each) dataset_ubody = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/ubody2d.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/ubody2d.py"), datasets=ubody_datasets, pipeline=[], test_mode=False, ) face_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale', padding=1.25), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[1.5, 2.0], - rotate_factor=0), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale", padding=1.25), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[1.5, 2.0], rotate_factor=0), ] -wflw_coco133 = [(i * 2, 23 + i) - for i in range(17)] + [(33 + i, 40 + i) for i in range(5)] + [ - (42 + i, 45 + i) for i in range(5) - ] + [(51 + i, 50 + i) - for i in range(9)] + [(60, 59), (61, 60), (63, 61), - (64, 62), (65, 63), (67, 64), - (68, 65), (69, 66), (71, 67), - (72, 68), (73, 69), - (75, 70)] + [(76 + i, 71 + i) - for i in range(20)] +wflw_coco133 = ( + [(i * 2, 23 + i) for i in range(17)] + + [(33 + i, 40 + i) for i in range(5)] + + [(42 + i, 45 + i) for i in range(5)] + + [(51 + i, 50 + i) for i in range(9)] + + [(60, 59), (61, 60), (63, 61), (64, 62), (65, 63), (67, 64), (68, 65), (69, 66), (71, 67), (72, 68), (73, 69), (75, 70)] + + [(76 + i, 71 + i) for i in range(20)] +) dataset_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=wflw_coco133), *face_pipeline - ], + ann_file="wflw/annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="pose/WFLW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=wflw_coco133), *face_pipeline], ) mapping_300w_coco133 = [(i, 23 + i) for i in range(68)] dataset_300w = dict( - type='Face300WDataset', + type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_train.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mapping_300w_coco133), *face_pipeline - ], + ann_file="300w/annotations/face_landmarks_300w_train.json", + data_prefix=dict(img="pose/300w/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mapping_300w_coco133), *face_pipeline], ) -cofw_coco133 = [(0, 40), (2, 44), (4, 42), (1, 49), (3, 45), (6, 47), (8, 59), - (10, 62), (9, 68), (11, 65), (18, 54), (19, 58), (20, 53), - (21, 56), (22, 71), (23, 77), (24, 74), (25, 85), (26, 89), - (27, 80), (28, 31)] +cofw_coco133 = [ + (0, 40), + (2, 44), + (4, 42), + (1, 49), + (3, 45), + (6, 47), + (8, 59), + (10, 62), + (9, 68), + (11, 65), + (18, 54), + (19, 58), + (20, 53), + (21, 56), + (22, 71), + (23, 77), + (24, 74), + (25, 85), + (26, 89), + (27, 80), + (28, 31), +] dataset_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_train.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=cofw_coco133), *face_pipeline - ], + ann_file="cofw/annotations/cofw_train.json", + 
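    # COFW labels 29 facial landmarks; cofw_coco133 above maps only the ones with a COCO-WholeBody counterpart.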
data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=cofw_coco133), *face_pipeline], ) -lapa_coco133 = [(i * 2, 23 + i) for i in range(17)] + [ - (33 + i, 40 + i) for i in range(5) -] + [(42 + i, 45 + i) for i in range(5)] + [ - (51 + i, 50 + i) for i in range(4) -] + [(58 + i, 54 + i) for i in range(5)] + [(66, 59), (67, 60), (69, 61), - (70, 62), (71, 63), (73, 64), - (75, 65), (76, 66), (78, 67), - (79, 68), (80, 69), - (82, 70)] + [(84 + i, 71 + i) - for i in range(20)] +lapa_coco133 = ( + [(i * 2, 23 + i) for i in range(17)] + + [(33 + i, 40 + i) for i in range(5)] + + [(42 + i, 45 + i) for i in range(5)] + + [(51 + i, 50 + i) for i in range(4)] + + [(58 + i, 54 + i) for i in range(5)] + + [(66, 59), (67, 60), (69, 61), (70, 62), (71, 63), (73, 64), (75, 65), (76, 66), (78, 67), (79, 68), (80, 69), (82, 70)] + + [(84 + i, 71 + i) for i in range(20)] +) dataset_lapa = dict( - type='LapaDataset', + type="LapaDataset", data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_trainval.json', - data_prefix=dict(img='pose/LaPa/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=lapa_coco133), *face_pipeline - ], + ann_file="LaPa/annotations/lapa_trainval.json", + data_prefix=dict(img="pose/LaPa/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=lapa_coco133), *face_pipeline], ) dataset_wb = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[dataset_coco, dataset_halpe, dataset_ubody], pipeline=[], test_mode=False, ) dataset_body = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[ dataset_aic, dataset_crowdpose, @@ -501,8 +440,8 @@ dataset_body = dict( ) dataset_face = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[ dataset_wflw, dataset_300w, @@ -514,45 +453,59 @@ dataset_face = dict( ) hand_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[1.5, 2.0], - rotate_factor=0), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[1.5, 2.0], rotate_factor=0), ] -interhand_left = [(21, 95), (22, 94), (23, 93), (24, 92), (25, 99), (26, 98), - (27, 97), (28, 96), (29, 103), (30, 102), (31, 101), - (32, 100), (33, 107), (34, 106), (35, 105), (36, 104), - (37, 111), (38, 110), (39, 109), (40, 108), (41, 91)] +interhand_left = [ + (21, 95), + (22, 94), + (23, 93), + (24, 92), + (25, 99), + (26, 98), + (27, 97), + (28, 96), + (29, 103), + (30, 102), + (31, 101), + (32, 100), + (33, 107), + (34, 106), + (35, 105), + (36, 104), + (37, 111), + (38, 110), + (39, 109), + (40, 108), + (41, 91), +] interhand_right = [(i - 21, j + 21) for i, j in interhand_left] interhand_coco133 = interhand_right + interhand_left dataset_interhand2d = dict( - type='InterHand2DDoubleDataset', + type="InterHand2DDoubleDataset", data_root=data_root, data_mode=data_mode, 
- ann_file='interhand26m/annotations/all/InterHand2.6M_train_data.json', - camera_param_file='interhand26m/annotations/all/' - 'InterHand2.6M_train_camera.json', - joint_file='interhand26m/annotations/all/' - 'InterHand2.6M_train_joint_3d.json', - data_prefix=dict(img='interhand2.6m/images/train/'), + ann_file="interhand26m/annotations/all/InterHand2.6M_train_data.json", + camera_param_file="interhand26m/annotations/all/" "InterHand2.6M_train_camera.json", + joint_file="interhand26m/annotations/all/" "InterHand2.6M_train_joint_3d.json", + data_prefix=dict(img="interhand2.6m/images/train/"), sample_interval=10, pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=num_keypoints, mapping=interhand_coco133, - ), *hand_pipeline + ), + *hand_pipeline, ], ) dataset_hand = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[dataset_interhand2d], pipeline=[], test_mode=False, @@ -566,52 +519,42 @@ train_dataloader = dict( num_workers=4, pin_memory=False, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='data/detection/coco/val2017/'), + type="CocoWholeBodyDataset", + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="data/detection/coco/val2017/"), pipeline=val_pipeline, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - test_mode=True)) + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + test_mode=True, + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-x_8xb704-270e_cocktail14-256x192.py 
b/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-x_8xb704-270e_cocktail14-256x192.py index 750ad46d3d1c6982837fa75ca3083245a492a9bd..18cc430888dc8f0f0812b50498fbcbac0889dcb5 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-x_8xb704-270e_cocktail14-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/cocktail14/rtmw-x_8xb704-270e_cocktail14-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 133 @@ -16,67 +16,54 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.1), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.1), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=5632) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='CSPNeXt', - arch='P5', + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=1.33, widen_factor=1.25, channel_attention=True, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="BN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/' - 'wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_simcc-ucoco_pt-aic-coco_270e-256x192-05f5bcb7_20230822.pth' # noqa - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/" + "wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_simcc-ucoco_pt-aic-coco_270e-256x192-05f5bcb7_20230822.pth", # noqa + ), + ), neck=dict( - type='CSPNeXtPAFPN', + type="CSPNeXtPAFPN", in_channels=[320, 640, 1280], out_channels=None, out_indices=( @@ -85,112 +72,90 @@ model = dict( ), num_csp_blocks=2, expand_ratio=0.5, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU', inplace=True)), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU", inplace=True), + ), head=dict( - type='RTMWHead', + type="RTMWHead", in_channels=1280, out_channels=num_keypoints, input_size=input_size, in_featuremap_size=tuple([s // 32 for s in input_size]), - simcc_split_ratio=codec['simcc_split_ratio'], + 
simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), loss=dict( - type='KLDiscretLoss', + type="KLDiscretLoss", use_target_weight=True, - beta=1., + beta=1.0, label_softmax=True, - label_beta=10., + label_beta=10.0, mask=list(range(23, 91)), mask_weight=0.5, ), - decoder=codec), - test_cfg=dict(flip_test=True)) + decoder=codec, + ), + test_cfg=dict(flip_test=True), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PhotometricDistortion"), dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PhotometricDistortion'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - 
dict(type='MedianBlur', p=0.1), - ]), - dict( - type='GenerateTarget', - encoder=codec, - use_dataset_keypoint_weights=True), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + ], + ), + dict(type="GenerateTarget", encoder=codec, use_dataset_keypoint_weights=True), + dict(type="PackPoseInputs"), ] # mapping -aic_coco133 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), - (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)] +aic_coco133 = [(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9), (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)] -crowdpose_coco133 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), - (7, 12), (8, 13), (9, 14), (10, 15), (11, 16)] +crowdpose_coco133 = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, 11), (7, 12), (8, 13), (9, 14), (10, 15), (11, 16)] mpii_coco133 = [ (0, 16), @@ -222,11 +187,9 @@ jhmdb_coco133 = [ (14, 15), ] -halpe_coco133 = [(i, i) - for i in range(17)] + [(20, 17), (21, 20), (22, 18), (23, 21), - (24, 19), - (25, 22)] + [(i, i - 3) - for i in range(26, 136)] +halpe_coco133 = ( + [(i, i) for i in range(17)] + [(20, 17), (21, 20), (22, 18), (23, 21), (24, 19), (25, 22)] + [(i, i - 3) for i in range(26, 136)] +) posetrack_coco133 = [ (0, 0), @@ -246,246 +209,215 @@ posetrack_coco133 = [ (16, 16), ] -humanart_coco133 = [(i, i) for i in range(17)] + [(17, 99), (18, 120), - (19, 17), (20, 20)] +humanart_coco133 = [(i, i) for i in range(17)] + [(17, 99), (18, 120), (19, 17), (20, 20)] # train datasets dataset_coco = dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='coco/annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='detection/coco/train2017/'), + ann_file="coco/annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="detection/coco/train2017/"), pipeline=[], ) dataset_aic = dict( - type='AicDataset', + type="AicDataset", data_root=data_root, data_mode=data_mode, - ann_file='aic/annotations/aic_train.json', - data_prefix=dict(img='pose/ai_challenge/ai_challenger_keypoint' - '_train_20170902/keypoint_train_images_20170902/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=aic_coco133) - ], + ann_file="aic/annotations/aic_train.json", + data_prefix=dict(img="pose/ai_challenge/ai_challenger_keypoint" "_train_20170902/keypoint_train_images_20170902/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=aic_coco133)], ) dataset_crowdpose = dict( - type='CrowdPoseDataset', + type="CrowdPoseDataset", data_root=data_root, data_mode=data_mode, - ann_file='crowdpose/annotations/mmpose_crowdpose_trainval.json', - data_prefix=dict(img='pose/CrowdPose/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=crowdpose_coco133) - ], + ann_file="crowdpose/annotations/mmpose_crowdpose_trainval.json", + data_prefix=dict(img="pose/CrowdPose/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=crowdpose_coco133)], ) dataset_mpii = dict( - type='MpiiDataset', + type="MpiiDataset", data_root=data_root, data_mode=data_mode, - ann_file='mpii/annotations/mpii_train.json', - data_prefix=dict(img='pose/MPI/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mpii_coco133) - ], + ann_file="mpii/annotations/mpii_train.json", + data_prefix=dict(img="pose/MPI/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, 
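    # KeypointConverter copies each source joint onto its COCO-WholeBody index; joints without a mapping keep zero target weight.
    # e.g. the pair (0, 16) in mpii_coco133 routes MPII joint 0 (right ankle) to COCO-WholeBody keypoint 16.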
mapping=mpii_coco133)], ) dataset_jhmdb = dict( - type='JhmdbDataset', + type="JhmdbDataset", data_root=data_root, data_mode=data_mode, - ann_file='jhmdb/annotations/Sub1_train.json', - data_prefix=dict(img='pose/JHMDB/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=jhmdb_coco133) - ], + ann_file="jhmdb/annotations/Sub1_train.json", + data_prefix=dict(img="pose/JHMDB/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=jhmdb_coco133)], ) dataset_halpe = dict( - type='HalpeDataset', + type="HalpeDataset", data_root=data_root, data_mode=data_mode, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict(img='pose/Halpe/hico_20160224_det/images/train2015'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=halpe_coco133) - ], + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=halpe_coco133)], ) dataset_posetrack = dict( - type='PoseTrack18Dataset', + type="PoseTrack18Dataset", data_root=data_root, data_mode=data_mode, - ann_file='posetrack18/annotations/posetrack18_train.json', - data_prefix=dict(img='pose/PoseChallenge2018/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=posetrack_coco133) - ], + ann_file="posetrack18/annotations/posetrack18_train.json", + data_prefix=dict(img="pose/PoseChallenge2018/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=posetrack_coco133)], ) dataset_humanart = dict( - type='HumanArt21Dataset', + type="HumanArt21Dataset", data_root=data_root, data_mode=data_mode, - ann_file='HumanArt/annotations/training_humanart.json', - filter_cfg=dict(scenes=['real_human']), - data_prefix=dict(img='pose/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=humanart_coco133) - ]) + ann_file="HumanArt/annotations/training_humanart.json", + filter_cfg=dict(scenes=["real_human"]), + data_prefix=dict(img="pose/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=humanart_coco133)], +) ubody_scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] ubody_datasets = [] for scene in ubody_scenes: each = dict( - type='UBody2dDataset', + type="UBody2dDataset", data_root=data_root, data_mode=data_mode, - ann_file=f'Ubody/annotations/{scene}/train_annotations.json', - data_prefix=dict(img='pose/UBody/images/'), + ann_file=f"Ubody/annotations/{scene}/train_annotations.json", + data_prefix=dict(img="pose/UBody/images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) ubody_datasets.append(each) dataset_ubody = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/ubody2d.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/ubody2d.py"), datasets=ubody_datasets, pipeline=[], test_mode=False, ) face_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale', padding=1.25), - 
dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[1.5, 2.0], - rotate_factor=0), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale", padding=1.25), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[1.5, 2.0], rotate_factor=0), ] -wflw_coco133 = [(i * 2, 23 + i) - for i in range(17)] + [(33 + i, 40 + i) for i in range(5)] + [ - (42 + i, 45 + i) for i in range(5) - ] + [(51 + i, 50 + i) - for i in range(9)] + [(60, 59), (61, 60), (63, 61), - (64, 62), (65, 63), (67, 64), - (68, 65), (69, 66), (71, 67), - (72, 68), (73, 69), - (75, 70)] + [(76 + i, 71 + i) - for i in range(20)] +wflw_coco133 = ( + [(i * 2, 23 + i) for i in range(17)] + + [(33 + i, 40 + i) for i in range(5)] + + [(42 + i, 45 + i) for i in range(5)] + + [(51 + i, 50 + i) for i in range(9)] + + [(60, 59), (61, 60), (63, 61), (64, 62), (65, 63), (67, 64), (68, 65), (69, 66), (71, 67), (72, 68), (73, 69), (75, 70)] + + [(76 + i, 71 + i) for i in range(20)] +) dataset_wflw = dict( - type='WFLWDataset', + type="WFLWDataset", data_root=data_root, data_mode=data_mode, - ann_file='wflw/annotations/face_landmarks_wflw_train.json', - data_prefix=dict(img='pose/WFLW/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=wflw_coco133), *face_pipeline - ], + ann_file="wflw/annotations/face_landmarks_wflw_train.json", + data_prefix=dict(img="pose/WFLW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=wflw_coco133), *face_pipeline], ) mapping_300w_coco133 = [(i, 23 + i) for i in range(68)] dataset_300w = dict( - type='Face300WDataset', + type="Face300WDataset", data_root=data_root, data_mode=data_mode, - ann_file='300w/annotations/face_landmarks_300w_train.json', - data_prefix=dict(img='pose/300w/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=mapping_300w_coco133), *face_pipeline - ], + ann_file="300w/annotations/face_landmarks_300w_train.json", + data_prefix=dict(img="pose/300w/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=mapping_300w_coco133), *face_pipeline], ) -cofw_coco133 = [(0, 40), (2, 44), (4, 42), (1, 49), (3, 45), (6, 47), (8, 59), - (10, 62), (9, 68), (11, 65), (18, 54), (19, 58), (20, 53), - (21, 56), (22, 71), (23, 77), (24, 74), (25, 85), (26, 89), - (27, 80), (28, 31)] +cofw_coco133 = [ + (0, 40), + (2, 44), + (4, 42), + (1, 49), + (3, 45), + (6, 47), + (8, 59), + (10, 62), + (9, 68), + (11, 65), + (18, 54), + (19, 58), + (20, 53), + (21, 56), + (22, 71), + (23, 77), + (24, 74), + (25, 85), + (26, 89), + (27, 80), + (28, 31), +] dataset_cofw = dict( - type='COFWDataset', + type="COFWDataset", data_root=data_root, data_mode=data_mode, - ann_file='cofw/annotations/cofw_train.json', - data_prefix=dict(img='pose/COFW/images/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=cofw_coco133), *face_pipeline - ], + ann_file="cofw/annotations/cofw_train.json", + data_prefix=dict(img="pose/COFW/images/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=cofw_coco133), *face_pipeline], ) -lapa_coco133 = [(i * 2, 23 + i) for i in range(17)] + [ - (33 + i, 40 + i) for i in range(5) -] + [(42 + i, 45 + i) for i in range(5)] + [ - (51 + i, 50 + i) for i in range(4) -] + [(58 + i, 54 + i) for i in range(5)] + [(66, 59), (67, 60), (69, 61), - (70, 62), (71, 63), (73, 64), - (75, 65), (76, 66), (78, 67), - (79, 68), 
(80, 69), - (82, 70)] + [(84 + i, 71 + i) - for i in range(20)] +lapa_coco133 = ( + [(i * 2, 23 + i) for i in range(17)] + + [(33 + i, 40 + i) for i in range(5)] + + [(42 + i, 45 + i) for i in range(5)] + + [(51 + i, 50 + i) for i in range(4)] + + [(58 + i, 54 + i) for i in range(5)] + + [(66, 59), (67, 60), (69, 61), (70, 62), (71, 63), (73, 64), (75, 65), (76, 66), (78, 67), (79, 68), (80, 69), (82, 70)] + + [(84 + i, 71 + i) for i in range(20)] +) dataset_lapa = dict( - type='LapaDataset', + type="LapaDataset", data_root=data_root, data_mode=data_mode, - ann_file='LaPa/annotations/lapa_trainval.json', - data_prefix=dict(img='pose/LaPa/'), - pipeline=[ - dict( - type='KeypointConverter', - num_keypoints=num_keypoints, - mapping=lapa_coco133), *face_pipeline - ], + ann_file="LaPa/annotations/lapa_trainval.json", + data_prefix=dict(img="pose/LaPa/"), + pipeline=[dict(type="KeypointConverter", num_keypoints=num_keypoints, mapping=lapa_coco133), *face_pipeline], ) dataset_wb = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[dataset_coco, dataset_halpe, dataset_ubody], pipeline=[], test_mode=False, ) dataset_body = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[ dataset_aic, dataset_crowdpose, @@ -499,8 +431,8 @@ dataset_body = dict( ) dataset_face = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[ dataset_wflw, dataset_300w, @@ -512,45 +444,59 @@ dataset_face = dict( ) hand_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[1.5, 2.0], - rotate_factor=0), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[1.5, 2.0], rotate_factor=0), ] -interhand_left = [(21, 95), (22, 94), (23, 93), (24, 92), (25, 99), (26, 98), - (27, 97), (28, 96), (29, 103), (30, 102), (31, 101), - (32, 100), (33, 107), (34, 106), (35, 105), (36, 104), - (37, 111), (38, 110), (39, 109), (40, 108), (41, 91)] +interhand_left = [ + (21, 95), + (22, 94), + (23, 93), + (24, 92), + (25, 99), + (26, 98), + (27, 97), + (28, 96), + (29, 103), + (30, 102), + (31, 101), + (32, 100), + (33, 107), + (34, 106), + (35, 105), + (36, 104), + (37, 111), + (38, 110), + (39, 109), + (40, 108), + (41, 91), +] interhand_right = [(i - 21, j + 21) for i, j in interhand_left] interhand_coco133 = interhand_right + interhand_left dataset_interhand2d = dict( - type='InterHand2DDoubleDataset', + type="InterHand2DDoubleDataset", data_root=data_root, data_mode=data_mode, - ann_file='interhand26m/annotations/all/InterHand2.6M_train_data.json', - camera_param_file='interhand26m/annotations/all/' - 'InterHand2.6M_train_camera.json', - joint_file='interhand26m/annotations/all/' - 'InterHand2.6M_train_joint_3d.json', - data_prefix=dict(img='interhand2.6m/images/train/'), + ann_file="interhand26m/annotations/all/InterHand2.6M_train_data.json", + camera_param_file="interhand26m/annotations/all/" "InterHand2.6M_train_camera.json", + 
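    # note: the adjacent string literals are concatenated by the Python parser; black preserves the pre-existing split instead of joining them.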
joint_file="interhand26m/annotations/all/" "InterHand2.6M_train_joint_3d.json", + data_prefix=dict(img="interhand2.6m/images/train/"), sample_interval=10, pipeline=[ dict( - type='KeypointConverter', + type="KeypointConverter", num_keypoints=num_keypoints, mapping=interhand_coco133, - ), *hand_pipeline + ), + *hand_pipeline, ], ) dataset_hand = dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=[dataset_interhand2d], pipeline=[], test_mode=False, @@ -564,52 +510,42 @@ train_dataloader = dict( num_workers=4, pin_memory=False, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='data/detection/coco/val2017/'), + type="CocoWholeBodyDataset", + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="data/detection/coco/val2017/"), pipeline=val_pipeline, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - test_mode=True)) + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + test_mode=True, + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-l_8xb32-270e_coco-wholebody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-l_8xb32-270e_coco-wholebody-384x288.py index 39a6ff79d784df9518a1f457d129c6a89cfc97ca..ae8cc10ec677480d05b1b08887e5ea5e1703b7b1 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-l_8xb32-270e_coco-wholebody-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-l_8xb32-270e_coco-wholebody-384x288.py @@ -1,4 +1,4 @@ -_base_ = 
['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 270 @@ -10,97 +10,78 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(288, 384), - sigma=(6., 6.93), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(288, 384), sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth' # noqa: E501 - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth", # noqa: E501 + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1024, out_channels=133, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + 
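    # flip_test averages predictions for the input and its horizontal flip at inference time.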
flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -110,68 +91,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + 
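    # GenerateTarget encodes keypoints into the codec's SimCC targets; PackPoseInputs bundles image and targets into a data sample.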
dict(type="PackPoseInputs"), ] # data loaders @@ -179,54 +141,43 @@ train_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-l_8xb64-270e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-l_8xb64-270e_coco-wholebody-256x192.py index 9f32f25777af9d6bc8b668f61bfab76b29d9eea0..33e7284a2d713fe09a45f54849193f950a3b70e0 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-l_8xb64-270e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-l_8xb64-270e_coco-wholebody-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 270 @@ -10,97 +10,78 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, 
end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth' # noqa: E501 - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth", # noqa: E501 + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1024, out_channels=133, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -110,68 +91,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - 
type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -179,54 +141,43 @@ train_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, 
drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-m_8xb64-270e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-m_8xb64-270e_coco-wholebody-256x192.py index 8c8c92d5f792a7516a603d326bb8e138bfe212b6..caa847303719809d6b2c167da9ea7ae7ef0cfbd3 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-m_8xb64-270e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-m_8xb64-270e_coco-wholebody-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 270 @@ -10,97 +10,78 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", 
input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa: E501 - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa: E501 + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=133, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -110,68 +91,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - 
max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -179,54 +141,43 @@ train_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # 
hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-x_8xb32-270e_coco-wholebody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-x_8xb32-270e_coco-wholebody-384x288.py index 55b11c419ae49b9f8e8e9a579ff89057c6b0ba0f..533c577eb3d261d9a290598d249ae3f13ee2c90e 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-x_8xb32-270e_coco-wholebody-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/coco-wholebody/rtmpose-x_8xb32-270e_coco-wholebody-384x288.py @@ -1,4 +1,4 @@ -_base_ = ['mmpose::_base_/default_runtime.py'] +_base_ = ["mmpose::_base_/default_runtime.py"] # common setting num_keypoints = 133 @@ -16,163 +16,125 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(6., 6.93), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=1.33, widen_factor=1.25, - 
out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-x_udp-body7_210e-384x288-d28b58e6_20230529.pth' # noqa: E501 - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-x_udp-body7_210e-384x288-d28b58e6_20230529.pth", # noqa: E501 + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1280, out_channels=num_keypoints, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + 
dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -180,54 +142,43 @@ train_dataloader = dict( batch_size=train_batch_size, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators 
-val_evaluator = dict(
-    type='CocoWholeBodyMetric',
-    ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json')
+val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py
index 203766402c5189559095290f49b1c376d444a63e..8fec1cda675836876c8eaba4e4c3761a7f7c66e2 100644
--- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py
+++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 270
@@ -12,113 +12,107 @@ randomness = dict(seed=21)
 
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
     dict(  # use cosine lr from 135 to 270 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(288, 384),
-    sigma=(6., 6.93),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(288, 384), sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
-        deepen_factor=1.,
-        widen_factor=1.,
-        out_indices=(4, ),
+        deepen_factor=1.0,
+        widen_factor=1.0,
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
-        act_cfg=dict(type='SiLU'),
+        norm_cfg=dict(type="SyncBN"),
+        act_cfg=dict(type="SiLU"),
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-            checkpoint='https://download.openmmlab.com/mmpose/v1/projects/'
-            'rtmpose/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth'  # noqa: E501
-        )),
+            type="Pretrained",
+            prefix="backbone.",
+            checkpoint="https://download.openmmlab.com/mmpose/v1/projects/"
+            "rtmpose/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth",  # noqa: E501
+        ),
+    ),
     head=dict(
-        type='RTMCCHead',
+        type="RTMCCHead",
         in_channels=1024,
         out_channels=133,
-
input_size=codec['input_size'], + input_size=codec["input_size"], in_featuremap_size=(9, 12), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'UBody2dDataset' -data_mode = 'topdown' -data_root = 'data/UBody/' +dataset_type = "UBody2dDataset" +data_mode = "topdown" +data_root = "data/UBody/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] train_datasets = [ dict( - type='CocoWholeBodyDataset', - data_root='data/coco/', + type="CocoWholeBodyDataset", + data_root="data/coco/", data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), - pipeline=[]) + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + ) ] for scene in scenes: @@ -126,76 +120,58 @@ for scene in scenes: type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file=f'annotations/{scene}/train_annotations.json', - data_prefix=dict(img='images/'), + ann_file=f"annotations/{scene}/train_annotations.json", + data_prefix=dict(img="images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) train_datasets.append(train_dataset) # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, 
min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -203,54 +179,43 @@ train_dataloader = dict( batch_size=train_batch_size, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', + type="CocoWholeBodyDataset", data_root=data_root, data_mode=data_mode, - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - 
type='EMAHook',
-        ema_type='ExpMomentumEMA',
-        momentum=0.0002,
-        update_buffers=True,
-        priority=49),
-    dict(
-        type='mmdet.PipelineSwitchHook',
-        switch_epoch=max_epochs - stage2_num_epochs,
-        switch_pipeline=train_pipeline_stage2)
+    dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49),
+    dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2),
 ]
 
 # evaluators
-val_evaluator = dict(
-    type='CocoWholeBodyMetric',
-    ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json')
+val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json")
 test_evaluator = val_evaluator
diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py
index 66c42ad8a80a48ee4784bcffb769ebbb157545f3..0291e1a05725d975139cecda8ec34ff886300970 100644
--- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py
+++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-l_8xb64-270e_coco-ubody-wholebody-256x192.py
@@ -1,4 +1,4 @@
-_base_ = ['../../../_base_/default_runtime.py']
+_base_ = ["../../../_base_/default_runtime.py"]
 
 # runtime
 max_epochs = 270
@@ -12,113 +12,107 @@ randomness = dict(seed=21)
 
 # optimizer
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
-    paramwise_cfg=dict(
-        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))
+    type="OptimWrapper",
+    optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05),
+    paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True),
+)
 
 # learning rate
 param_scheduler = [
-    dict(
-        type='LinearLR',
-        start_factor=1.0e-5,
-        by_epoch=False,
-        begin=0,
-        end=1000),
+    dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000),
    dict(  # use cosine lr from 135 to 270 epoch
-        type='CosineAnnealingLR',
+        type="CosineAnnealingLR",
         eta_min=base_lr * 0.05,
         begin=max_epochs // 2,
         end=max_epochs,
         T_max=max_epochs // 2,
         by_epoch=True,
-        convert_to_iter_based=True),
+        convert_to_iter_based=True,
+    ),
 ]
 
 # automatically scaling LR based on the actual training batch size
 auto_scale_lr = dict(base_batch_size=512)
 
 # codec settings
-codec = dict(
-    type='SimCCLabel',
-    input_size=(192, 256),
-    sigma=(4.9, 5.66),
-    simcc_split_ratio=2.0,
-    normalize=False,
-    use_dark=False)
+codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False)
 
 # model settings
 model = dict(
-    type='TopdownPoseEstimator',
-    data_preprocessor=dict(
-        type='PoseDataPreprocessor',
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        bgr_to_rgb=True),
+    type="TopdownPoseEstimator",
+    data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True),
     backbone=dict(
-        _scope_='mmdet',
-        type='CSPNeXt',
-        arch='P5',
+        _scope_="mmdet",
+        type="CSPNeXt",
+        arch="P5",
         expand_ratio=0.5,
-        deepen_factor=1.,
-        widen_factor=1.,
-        out_indices=(4, ),
+        deepen_factor=1.0,
+        widen_factor=1.0,
+        out_indices=(4,),
         channel_attention=True,
-        norm_cfg=dict(type='SyncBN'),
-        act_cfg=dict(type='SiLU'),
+        norm_cfg=dict(type="SyncBN"),
+        act_cfg=dict(type="SiLU"),
         init_cfg=dict(
-            type='Pretrained',
-            prefix='backbone.',
-
checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmpose/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth' # noqa: E501 - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmpose/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth", # noqa: E501 + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1024, out_channels=133, - input_size=codec['input_size'], + input_size=codec["input_size"], in_featuremap_size=(6, 8), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'UBody2dDataset' -data_mode = 'topdown' -data_root = 'data/UBody/' +dataset_type = "UBody2dDataset" +data_mode = "topdown" +data_root = "data/UBody/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] train_datasets = [ dict( - type='CocoWholeBodyDataset', - data_root='data/coco/', + type="CocoWholeBodyDataset", + data_root="data/coco/", data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), - pipeline=[]) + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + ) ] for scene in scenes: @@ -126,76 +120,58 @@ for scene in scenes: type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file=f'annotations/{scene}/train_annotations.json', - data_prefix=dict(img='images/'), + ann_file=f"annotations/{scene}/train_annotations.json", + data_prefix=dict(img="images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) train_datasets.append(train_dataset) # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", 
transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -203,54 +179,43 @@ train_dataloader = dict( batch_size=train_batch_size, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', + type="CocoWholeBodyDataset", data_root=data_root, data_mode=data_mode, - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + 
bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py index 0856fbbe9bf361df1e28d9690d0ea05f5c70ebc8..c0998ab401a9549d7cccfb2bb54e65b70d97926d 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-m_8xb64-270e_coco-ubody-wholebody-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 270 @@ -12,113 +12,107 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 150 to 300 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], 
bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmpose/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth' # noqa: E501 - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmpose/cspnext-m_udp-aic-coco_210e-256x192-f2f7d6f6_20230130.pth", # noqa: E501 + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=768, out_channels=133, - input_size=codec['input_size'], + input_size=codec["input_size"], in_featuremap_size=(6, 8), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'UBody2dDataset' -data_mode = 'topdown' -data_root = 'data/UBody/' +dataset_type = "UBody2dDataset" +data_mode = "topdown" +data_root = "data/UBody/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] train_datasets = [ dict( - type='CocoWholeBodyDataset', - data_root='data/coco/', + type="CocoWholeBodyDataset", + data_root="data/coco/", data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), - pipeline=[]) + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + ) ] for scene in scenes: @@ -126,76 +120,58 @@ for scene in scenes: type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file=f'annotations/{scene}/train_annotations.json', - data_prefix=dict(img='images/'), + ann_file=f"annotations/{scene}/train_annotations.json", + data_prefix=dict(img="images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) train_datasets.append(train_dataset) # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - 
dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -203,54 +179,43 @@ train_dataloader = dict( batch_size=train_batch_size, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", 
shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', + type="CocoWholeBodyDataset", data_root=data_root, data_mode=data_mode, - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + bbox_file="data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py index 66562ee8671b2d79e0a20d9903dcb1aa41aad3e2..c1105a3a7c6d8d9cdd9fe6379cc0be392b4736ab 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-s_8xb64-270e_coco-ubody-wholebody-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 270 @@ -12,113 +12,107 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 150 to 300 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 
5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.33, widen_factor=0.5, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmpose/cspnext-s_udp-aic-coco_210e-256x192-92f5a029_20230130.pth' # noqa: E501 - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmpose/cspnext-s_udp-aic-coco_210e-256x192-92f5a029_20230130.pth", # noqa: E501 + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=512, out_channels=133, - input_size=codec['input_size'], + input_size=codec["input_size"], in_featuremap_size=(6, 8), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'UBody2dDataset' -data_mode = 'topdown' -data_root = 'data/UBody/' +dataset_type = "UBody2dDataset" +data_mode = "topdown" +data_root = "data/UBody/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] train_datasets = [ dict( - type='CocoWholeBodyDataset', - data_root='data/coco/', + type="CocoWholeBodyDataset", + data_root="data/coco/", data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), - pipeline=[]) + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + ) ] for scene in scenes: @@ -126,76 +120,58 @@ for scene in scenes: type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file=f'annotations/{scene}/train_annotations.json', - data_prefix=dict(img='images/'), + ann_file=f"annotations/{scene}/train_annotations.json", + data_prefix=dict(img="images/"), pipeline=[], - sample_interval=10) + 
sample_interval=10, + ) train_datasets.append(train_dataset) # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -203,54 +179,43 @@ train_dataloader = dict( batch_size=train_batch_size, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - 
metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', + type="CocoWholeBodyDataset", data_root=data_root, data_mode=data_mode, - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + bbox_file="data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py index beb10b16f315961e2ce7b8dd6506bd3717ea7023..a2489afc178dd3495ab19cb0a8b8c645c33c06a5 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-t_8xb64-270e_coco-ubody-wholebody-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 270 @@ -12,113 +12,107 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ - dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( # use cosine lr from 150 to 300 epoch - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 
2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.167, widen_factor=0.375, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmpose/cspnext-tiny_udp-aic-coco_210e-256x192-cbed682d_20230130.pth' # noqa: E501 - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmpose/cspnext-tiny_udp-aic-coco_210e-256x192-cbed682d_20230130.pth", # noqa: E501 + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=384, out_channels=133, - input_size=codec['input_size'], + input_size=codec["input_size"], in_featuremap_size=(6, 8), - simcc_split_ratio=codec['simcc_split_ratio'], + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'UBody2dDataset' -data_mode = 'topdown' -data_root = 'data/UBody/' +dataset_type = "UBody2dDataset" +data_mode = "topdown" +data_root = "data/UBody/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] train_datasets = [ dict( - type='CocoWholeBodyDataset', - data_root='data/coco/', + type="CocoWholeBodyDataset", + data_root="data/coco/", data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), - pipeline=[]) 
+ ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + ) ] for scene in scenes: @@ -126,76 +120,58 @@ for scene in scenes: type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file=f'annotations/{scene}/train_annotations.json', - data_prefix=dict(img='images/'), + ann_file=f"annotations/{scene}/train_annotations.json", + data_prefix=dict(img="images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) train_datasets.append(train_dataset) # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, 
min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -203,54 +179,43 @@ train_dataloader = dict( batch_size=train_batch_size, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', + type="CocoWholeBodyDataset", data_root=data_root, data_mode=data_mode, - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + bbox_file="data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_8xb32-270e_coco-ubody-wholebody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_8xb32-270e_coco-ubody-wholebody-384x288.py index 695f64089720ce87e765785d04815110930f00bd..89583aad656f8d60e67cfe45023a868c855918af 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_8xb32-270e_coco-ubody-wholebody-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_8xb32-270e_coco-ubody-wholebody-384x288.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 133 @@ -16,113 +16,107 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, 
bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=input_size, - sigma=(6., 6.93), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=input_size, sigma=(6.0, 6.93), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=1.33, widen_factor=1.25, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-x_udp-body7_210e-384x288-d28b58e6_20230529.pth' # noqa: E501 - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-x_udp-body7_210e-384x288-d28b58e6_20230529.pth", # noqa: E501 + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1280, out_channels=num_keypoints, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'UBody2dDataset' -data_mode = 'topdown' -data_root = 'data/UBody/' +dataset_type = "UBody2dDataset" +data_mode = "topdown" +data_root = "data/UBody/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 
'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] train_datasets = [ dict( - type='CocoWholeBodyDataset', - data_root='data/coco/', + type="CocoWholeBodyDataset", + data_root="data/coco/", data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), - pipeline=[]) + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + ) ] for scene in scenes: @@ -130,76 +124,58 @@ for scene in scenes: type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file=f'annotations/{scene}/train_annotations.json', - data_prefix=dict(img='images/'), + ann_file=f"annotations/{scene}/train_annotations.json", + data_prefix=dict(img="images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) train_datasets.append(train_dataset) # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - 
dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -207,54 +183,43 @@ train_dataloader = dict( batch_size=train_batch_size, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', + type="CocoWholeBodyDataset", data_root=data_root, data_mode=data_mode, - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_8xb64-270e_coco-ubody-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_8xb64-270e_coco-ubody-wholebody-256x192.py index 30f1015394dffdbd8d0c313375e50c1fb472da07..b2ef9bac93b8a40a9c0381ffee5ebe4be163433d 100644 --- a/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_8xb64-270e_coco-ubody-wholebody-256x192.py +++ 
b/mmpose/configs/wholebody_2d_keypoint/rtmpose/ubody/rtmpose-x_8xb64-270e_coco-ubody-wholebody-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # common setting num_keypoints = 133 @@ -16,113 +16,107 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), clip_grad=dict(max_norm=35, norm_type=2), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='SimCCLabel', - input_size=(192, 256), - sigma=(4.9, 5.66), - simcc_split_ratio=2.0, - normalize=False, - use_dark=False) +codec = dict(type="SimCCLabel", input_size=(192, 256), sigma=(4.9, 5.66), simcc_split_ratio=2.0, normalize=False, use_dark=False) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=1.33, widen_factor=1.25, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmpose/v1/projects/' - 'rtmposev1/cspnext-x_udp-body7_210e-384x288-d28b58e6_20230529.pth' # noqa: E501 - )), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmpose/v1/projects/" + "rtmposev1/cspnext-x_udp-body7_210e-384x288-d28b58e6_20230529.pth", # noqa: E501 + ), + ), head=dict( - type='RTMCCHead', + type="RTMCCHead", in_channels=1280, out_channels=num_keypoints, - input_size=codec['input_size'], - in_featuremap_size=tuple([s // 32 for s in codec['input_size']]), - simcc_split_ratio=codec['simcc_split_ratio'], + input_size=codec["input_size"], + in_featuremap_size=tuple([s // 32 for s in codec["input_size"]]), + simcc_split_ratio=codec["simcc_split_ratio"], final_layer_kernel_size=7, gau_cfg=dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - use_rel_bias=False, - pos_enc=False), - loss=dict( - type='KLDiscretLoss', - use_target_weight=True, - beta=10., - label_softmax=True), - decoder=codec), - test_cfg=dict(flip_test=True, )) + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, 
pos_enc=False + ), + loss=dict(type="KLDiscretLoss", use_target_weight=True, beta=10.0, label_softmax=True), + decoder=codec, + ), + test_cfg=dict( + flip_test=True, + ), +) # base dataset settings -dataset_type = 'UBody2dDataset' -data_mode = 'topdown' -data_root = 'data/UBody/' +dataset_type = "UBody2dDataset" +data_mode = "topdown" +data_root = "data/UBody/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] train_datasets = [ dict( - type='CocoWholeBodyDataset', - data_root='data/coco/', + type="CocoWholeBodyDataset", + data_root="data/coco/", data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), - pipeline=[]) + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + ) ] for scene in scenes: @@ -130,76 +124,58 @@ for scene in scenes: type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file=f'annotations/{scene}/train_annotations.json', - data_prefix=dict(img='images/'), + ann_file=f"annotations/{scene}/train_annotations.json", + data_prefix=dict(img="images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) train_datasets.append(train_dataset) # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - 
dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.5, 1.5], rotate_factor=90), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.5, 1.5], - rotate_factor=90), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), - dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -207,54 +183,43 @@ train_dataloader = dict( batch_size=train_batch_size, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=val_batch_size, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', + type="CocoWholeBodyDataset", data_root=data_root, data_mode=data_mode, - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - data_prefix=dict(img='coco/val2017/'), + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + bbox_file="data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json", + data_prefix=dict(img="coco/val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = 
dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/cspnext-l_udp_8xb64-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/cspnext-l_udp_8xb64-210e_coco-wholebody-256x192.py index 7182e7a3ed0f235cad12e512008689606ddb8d5c..2ad94f63d9a8ff2716110b3a0acb2c63430a40c6 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/cspnext-l_udp_8xb64-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/cspnext-l_udp_8xb64-210e_coco-wholebody-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,79 +10,70 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, - deepen_factor=1., - widen_factor=1., - out_indices=(4, ), + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-l_8xb256-rsb-a1-600e_in1k-6a760974.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=1024, - out_channels=133, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=1024, out_channels=133, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=False, - 
flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -92,68 +83,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + 
dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -161,52 +133,42 @@ train_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/cspnext-m_udp_8xb64-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/cspnext-m_udp_8xb64-210e_coco-wholebody-256x192.py index 05fae649b8fe7d698255255531e878d954734edd..2189bb9ce0c68ce9fcb750d32a1317a3e4ee77aa 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/cspnext-m_udp_8xb64-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/cspnext-m_udp_8xb64-210e_coco-wholebody-256x192.py @@ -1,4 +1,4 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime max_epochs = 210 @@ -10,79 +10,70 @@ randomness = dict(seed=21) # optimizer optim_wrapper = dict( - type='OptimWrapper', - optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05), - paramwise_cfg=dict( - norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True)) + type="OptimWrapper", + optimizer=dict(type="AdamW", lr=base_lr, weight_decay=0.05), + paramwise_cfg=dict(norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True), +) # learning rate param_scheduler = [ + dict(type="LinearLR", start_factor=1.0e-5, by_epoch=False, begin=0, end=1000), dict( - type='LinearLR', - start_factor=1.0e-5, - by_epoch=False, - begin=0, - end=1000), - dict( - 
type='CosineAnnealingLR', + type="CosineAnnealingLR", eta_min=base_lr * 0.05, begin=max_epochs // 2, end=max_epochs, T_max=max_epochs // 2, by_epoch=True, - convert_to_iter_based=True), + convert_to_iter_based=True, + ), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # codec settings -codec = dict( - type='UDPHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - _scope_='mmdet', - type='CSPNeXt', - arch='P5', + _scope_="mmdet", + type="CSPNeXt", + arch="P5", expand_ratio=0.5, deepen_factor=0.67, widen_factor=0.75, - out_indices=(4, ), + out_indices=(4,), channel_attention=True, - norm_cfg=dict(type='SyncBN'), - act_cfg=dict(type='SiLU'), + norm_cfg=dict(type="SyncBN"), + act_cfg=dict(type="SiLU"), init_cfg=dict( - type='Pretrained', - prefix='backbone.', - checkpoint='https://download.openmmlab.com/mmdetection/v3.0/' - 'rtmdet/cspnext_rsb_pretrain/' - 'cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth')), + type="Pretrained", + prefix="backbone.", + checkpoint="https://download.openmmlab.com/mmdetection/v3.0/" + "rtmdet/cspnext_rsb_pretrain/" + "cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth", + ), + ), head=dict( - type='HeatmapHead', - in_channels=768, - out_channels=133, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=768, out_channels=133, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=False, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=False, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" -backend_args = dict(backend='local') +backend_args = dict(backend="local") # backend_args = dict( # backend='petrel', # path_mapping=dict({ @@ -92,68 +83,49 @@ backend_args = dict(backend='local') # pipelines train_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', scale_factor=[0.6, 1.4], rotate_factor=80), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", scale_factor=[0.6, 1.4], rotate_factor=80), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=1.0), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + 
dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=1.0), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImage', backend_args=backend_args), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - shift_factor=0., - scale_factor=[0.75, 1.25], - rotate_factor=60), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='mmdet.YOLOXHSVRandomAug'), + dict(type="LoadImage", backend_args=backend_args), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", shift_factor=0.0, scale_factor=[0.75, 1.25], rotate_factor=60), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="mmdet.YOLOXHSVRandomAug"), dict( - type='Albumentation', + type="Albumentation", transforms=[ - dict(type='Blur', p=0.1), - dict(type='MedianBlur', p=0.1), - dict( - type='CoarseDropout', - max_holes=1, - max_height=0.4, - max_width=0.4, - min_holes=1, - min_height=0.2, - min_width=0.2, - p=0.5), - ]), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="Blur", p=0.1), + dict(type="MedianBlur", p=0.1), + dict(type="CoarseDropout", max_holes=1, max_height=0.4, max_width=0.4, min_holes=1, min_height=0.2, min_width=0.2, p=0.5), + ], + ), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] # data loaders @@ -161,52 +133,42 @@ train_dataloader = dict( batch_size=64, num_workers=10, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=10, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader # hooks -default_hooks = dict( - checkpoint=dict( - save_best='coco-wholebody/AP', rule='greater', max_keep_ckpts=1)) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater", max_keep_ckpts=1)) custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - 
type='mmdet.PipelineSwitchHook', - switch_epoch=max_epochs - stage2_num_epochs, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="mmdet.PipelineSwitchHook", switch_epoch=max_epochs - stage2_num_epochs, switch_pipeline=train_pipeline_stage2), ] # evaluators -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_8xb64-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_8xb64-210e_coco-wholebody-256x192.py index 2595e3fc13e6913a01af45fa8d7b9c6377511ddb..a9b0a4f697ec69c81590cf7df490c8ff44f73d65 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_8xb64-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_8xb64-210e_coco-wholebody-256x192.py @@ -1,114 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + 
stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=133, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -116,35 +84,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 
'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_8xb64-210e_coco-wholebody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_8xb64-210e_coco-wholebody-384x288.py index 727fa9472ec9c446cb572e6c4fcd49976bf3916b..5b019a4d2bb64d15444e7d16c01d09b1a7814071 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_8xb64-210e_coco-wholebody-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_8xb64-210e_coco-wholebody-384x288.py @@ -1,114 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + 
init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=133, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -116,35 +84,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_dark-8xb64-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_dark-8xb64-210e_coco-wholebody-256x192.py index 
ffee1d1383e4757b79ed0ea4461c69d7b4247b15..264c0ccf36a273eea8be1015c8fc30b7bc5e5976 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_dark-8xb64-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w32_dark-8xb64-210e_coco-wholebody-256x192.py @@ -1,118 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=133, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), 
test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -120,35 +84,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_8xb32-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_8xb32-210e_coco-wholebody-256x192.py index 892b4b7936123840c3192e87491123e5f11b3f7f..8c4ec167578d4addeced4e6dcfe8d064023f2f45 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_8xb32-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_8xb32-210e_coco-wholebody-256x192.py @@ -1,114 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = 
["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=133, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - 
dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -116,35 +84,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_8xb32-210e_coco-wholebody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_8xb32-210e_coco-wholebody-384x288.py index d587dbc45bf2f90a3912e263d63d0dc64205298a..7b4e44181d07210c68a4b394a4865612b03f959c 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_8xb32-210e_coco-wholebody-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_8xb32-210e_coco-wholebody-384x288.py @@ -1,114 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 
200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=133, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", 
encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -116,35 +84,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_dark-8xb32-210e_coco-wholebody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_dark-8xb32-210e_coco-wholebody-384x288.py index 63175b99ea3e604fb87e1e45ef921aee2e7a1b16..8f3d211b70dfbc6e359a1a7397b971954b371ce7 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_dark-8xb32-210e_coco-wholebody-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_hrnet-w48_dark-8xb32-210e_coco-wholebody-384x288.py @@ -1,118 +1,82 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = 
dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(288, 384), - heatmap_size=(72, 96), - sigma=3, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(48, 96)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(48, 96, 192)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(48, 96, 192, 384))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w48-8ef0771d.pth'), + stage1=dict(num_modules=1, num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(48, 96)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(48, 96, 192)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w48-8ef0771d.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=48, out_channels=133, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -120,35 +84,34 
@@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res101_8xb32-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res101_8xb32-210e_coco-wholebody-256x192.py index c0d8187ab47b54d445d3f125da596f381c494309..3310ba88142224d6e1f54e8321a9659e1666abab 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res101_8xb32-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res101_8xb32-210e_coco-wholebody-256x192.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], 
- bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=133, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=133, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,35 +73,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff 
--git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res101_8xb32-210e_coco-wholebody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res101_8xb32-210e_coco-wholebody-384x288.py index 42e98575fba714ab65f3f19f226b5c06c2898a93..0f3016d01ce388331db0fd2b6f325c8d17d02422 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res101_8xb32-210e_coco-wholebody-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res101_8xb32-210e_coco-wholebody-384x288.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=256) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=101, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=133, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=133, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + 
dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,35 +73,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res152_8xb32-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res152_8xb32-210e_coco-wholebody-256x192.py index 10c16eb71f9ac28ea6746e85d51ed526dd035abe..54420cd2316de4824f9c2baa34e5217fcae73733 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res152_8xb32-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res152_8xb32-210e_coco-wholebody-256x192.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - 
checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=133, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=133, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,35 +73,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - 
bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res152_8xb32-210e_coco-wholebody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res152_8xb32-210e_coco-wholebody-384x288.py index 43ec5fb67c23df4e5e3d1c93072c41e0d08b88a6..7a9d987a172e224fdedd32c17365f0e9d57e46c4 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res152_8xb32-210e_coco-wholebody-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res152_8xb32-210e_coco-wholebody-384x288.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=152, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet152'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet152"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=133, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=133, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # 
pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,35 +73,34 @@ train_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res50_8xb64-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res50_8xb64-210e_coco-wholebody-256x192.py index e568c78b175bf3cc3364235c671d04944d84c53f..416d38b50c159fe34e466005fcf1362189d47bc5 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res50_8xb64-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res50_8xb64-210e_coco-wholebody-256x192.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, 
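The `8xb32` / `8xb64` tokens in these file names encode the intended launch setup (8 GPUs, per-GPU batch 32 or 64), and `auto_scale_lr` rescales the configured `lr=5e-4` linearly against `base_batch_size=512` when auto LR scaling is switched on at launch. The arithmetic, as a standalone sketch:

```python
# Linear LR scaling as applied by MMEngine when auto_scale_lr is enabled:
# the configured lr is multiplied by actual_batch / base_batch_size.
base_lr = 5e-4
base_batch_size = 512                          # from auto_scale_lr in these configs

for gpus, batch in [(8, 32), (8, 64)]:         # "8xb32" / "8xb64" in the file names
    actual = gpus * batch
    print(f"{gpus}x b{batch}: effective lr = {base_lr * actual / base_batch_size:.1e}")
# 8x b32: effective lr = 2.5e-04
# 8x b64: effective lr = 5.0e-04
```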
end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=133, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=133, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,35 +73,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = 
dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res50_8xb64-210e_coco-wholebody-384x288.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res50_8xb64-210e_coco-wholebody-384x288.py index 6869d17ba998b7918133eefcf98fc3344e729a26..cb8ffbc6014d897f9f82d0d600f85495054e6e70 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res50_8xb64-210e_coco-wholebody-384x288.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_res50_8xb64-210e_coco-wholebody-384x288.py @@ -1,85 +1,71 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(288, 384), heatmap_size=(72, 96), sigma=3) +codec = dict(type="MSRAHeatmap", input_size=(288, 384), heatmap_size=(72, 96), sigma=3) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ResNet', + type="ResNet", depth=50, - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'), + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), ), head=dict( - type='HeatmapHead', - in_channels=2048, - out_channels=133, - loss=dict(type='KeypointMSELoss', 
use_target_weight=True), - decoder=codec), + type="HeatmapHead", in_channels=2048, out_channels=133, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -87,35 +73,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-mbv3_8xb64-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-mbv3_8xb64-210e_coco-wholebody-256x192.py index cad9c539bef73cce6fc8e48e9d91489ea9f72270..f9924a0154f8081e7e15ae410b9b237ba9b50085 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-mbv3_8xb64-210e_coco-wholebody-256x192.py +++ 
b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-mbv3_8xb64-210e_coco-wholebody-256x192.py @@ -1,86 +1,73 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ViPNAS_MobileNetV3'), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ViPNAS_MobileNetV3"), head=dict( - type='ViPNASHead', + type="ViPNASHead", in_channels=160, out_channels=133, deconv_out_channels=(160, 160, 160), deconv_num_groups=(160, 160, 160), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # 
data loaders @@ -88,35 +75,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-mbv3_dark-8xb64-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-mbv3_dark-8xb64-210e_coco-wholebody-256x192.py index d34ea50db64b6a2716469ebf872045b9308fc413..d22604d0380315fc1ea110622ba1030157ca3e92 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-mbv3_dark-8xb64-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-mbv3_dark-8xb64-210e_coco-wholebody-256x192.py @@ -1,90 +1,73 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - 
data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), - backbone=dict(type='ViPNAS_MobileNetV3'), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), + backbone=dict(type="ViPNAS_MobileNetV3"), head=dict( - type='ViPNASHead', + type="ViPNASHead", in_channels=160, out_channels=133, deconv_out_channels=(160, 160, 160), deconv_num_groups=(160, 160, 160), - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -92,35 +75,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = 
dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-res50_8xb64-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-res50_8xb64-210e_coco-wholebody-256x192.py index 822e4c698a54a82a62fd30f6cc891f814d024930..439a2ffed0c39b8acda777004ffa407d4ced5198 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-res50_8xb64-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-res50_8xb64-210e_coco-wholebody-256x192.py @@ -1,87 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ViPNAS_ResNet', + type="ViPNAS_ResNet", depth=50, ), head=dict( - type='ViPNASHead', - in_channels=608, - out_channels=133, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="ViPNASHead", in_channels=608, out_channels=133, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + 
dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -89,35 +72,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-res50_dark-8xb64-210e_coco-wholebody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-res50_dark-8xb64-210e_coco-wholebody-256x192.py index 15b152fe96d3d60806c3461a04a0a4b5c66b3c96..7cd1180da4c867deda67c4bd6469a57c087e0206 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-res50_dark-8xb64-210e_coco-wholebody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/coco-wholebody/td-hm_vipnas-res50_dark-8xb64-210e_coco-wholebody-256x192.py @@ -1,91 +1,70 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically 
scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', - input_size=(192, 256), - heatmap_size=(48, 64), - sigma=2, - unbiased=True) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2, unbiased=True) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='ViPNAS_ResNet', + type="ViPNAS_ResNet", depth=50, ), head=dict( - type='ViPNASHead', - in_channels=608, - out_channels=133, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + type="ViPNASHead", in_channels=608, out_channels=133, loss=dict(type="KeypointMSELoss", use_target_weight=True), decoder=codec + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'CocoWholeBodyDataset' -data_mode = 'topdown' -data_root = 'data/coco/' +dataset_type = "CocoWholeBodyDataset" +data_mode = "topdown" +data_root = "data/coco/" # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict( - type='RandomBBoxTransform', - rotate_factor=60, - scale_factor=(0.75, 1.25)), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform", rotate_factor=60, scale_factor=(0.75, 1.25)), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -93,35 +72,34 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), pipeline=train_pipeline, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file='annotations/coco_wholebody_val_v1.0.json', - 
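The `_dark` variants in this group appear to differ from their plain counterparts mainly in the codec's `unbiased=True`, which switches `MSRAHeatmap` to DarkPose-style unbiased encoding and decoding. A 1D toy illustration of the decoding idea (sub-pixel peak via a Taylor expansion of the log-heatmap), not the actual MMPose decoder:

```python
import numpy as np

# Toy 1D sketch of unbiased (DarkPose-style) decoding: refine the integer
# argmax with x_hat = m - f'(m) / f''(m) on the log-heatmap, which recovers
# the sub-pixel mean of a Gaussian peak exactly.
mu, sigma = 10.3, 2.0
x = np.arange(32, dtype=np.float64)
h = np.exp(-((x - mu) ** 2) / (2 * sigma**2))

m = int(h.argmax())                                 # coarse integer peak: 10
logh = np.log(np.maximum(h, 1e-10))
d1 = (logh[m + 1] - logh[m - 1]) / 2                # first derivative (central diff)
d2 = logh[m + 1] - 2 * logh[m] + logh[m - 1]        # second derivative
print(m - d1 / d2)                                  # ~10.3, the true sub-pixel peak
```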
data_prefix=dict(img='val2017/'), + ann_file="annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="val2017/"), test_mode=True, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", pipeline=val_pipeline, - )) + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file=data_root + 'annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file=data_root + "annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/ubody2d/td-hm_hrnet-w32_8xb64-210e_ubody-256x192.py b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/ubody2d/td-hm_hrnet-w32_8xb64-210e_ubody-256x192.py index 055484d0097a1f1538cc67de6062f19067c84a7c..8098952d4a4382518c6202178132b60d1f59fd61 100644 --- a/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/ubody2d/td-hm_hrnet-w32_8xb64-210e_ubody-256x192.py +++ b/mmpose/configs/wholebody_2d_keypoint/topdown_heatmap/ubody2d/td-hm_hrnet-w32_8xb64-210e_ubody-256x192.py @@ -1,112 +1,93 @@ -_base_ = ['../../../_base_/default_runtime.py'] +_base_ = ["../../../_base_/default_runtime.py"] # runtime train_cfg = dict(max_epochs=210, val_interval=10) # optimizer -optim_wrapper = dict(optimizer=dict( - type='Adam', - lr=5e-4, -)) +optim_wrapper = dict( + optimizer=dict( + type="Adam", + lr=5e-4, + ) +) # learning policy param_scheduler = [ - dict( - type='LinearLR', begin=0, end=500, start_factor=0.001, - by_epoch=False), # warm-up - dict( - type='MultiStepLR', - begin=0, - end=210, - milestones=[170, 200], - gamma=0.1, - by_epoch=True) + dict(type="LinearLR", begin=0, end=500, start_factor=0.001, by_epoch=False), # warm-up + dict(type="MultiStepLR", begin=0, end=210, milestones=[170, 200], gamma=0.1, by_epoch=True), ] # automatically scaling LR based on the actual training batch size auto_scale_lr = dict(base_batch_size=512) # hooks -default_hooks = dict( - checkpoint=dict(save_best='coco-wholebody/AP', rule='greater')) +default_hooks = dict(checkpoint=dict(save_best="coco-wholebody/AP", rule="greater")) # codec settings -codec = dict( - type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2) +codec = dict(type="MSRAHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2) # model settings model = dict( - type='TopdownPoseEstimator', - data_preprocessor=dict( - type='PoseDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True), + type="TopdownPoseEstimator", + data_preprocessor=dict(type="PoseDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True), backbone=dict( - type='HRNet', + type="HRNet", in_channels=3, extra=dict( - stage1=dict( - num_modules=1, - num_branches=1, - block='BOTTLENECK', - num_blocks=(4, ), - num_channels=(64, )), - stage2=dict( - num_modules=1, - num_branches=2, - block='BASIC', - num_blocks=(4, 4), - num_channels=(32, 64)), - stage3=dict( - num_modules=4, - num_branches=3, - block='BASIC', - num_blocks=(4, 4, 4), - num_channels=(32, 64, 128)), - stage4=dict( - num_modules=3, - num_branches=4, - block='BASIC', - num_blocks=(4, 4, 4, 4), - num_channels=(32, 64, 128, 256))), - init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmpose/' - 'pretrain_models/hrnet_w32-36af842e.pth'), + stage1=dict(num_modules=1, 
num_branches=1, block="BOTTLENECK", num_blocks=(4,), num_channels=(64,)), + stage2=dict(num_modules=1, num_branches=2, block="BASIC", num_blocks=(4, 4), num_channels=(32, 64)), + stage3=dict(num_modules=4, num_branches=3, block="BASIC", num_blocks=(4, 4, 4), num_channels=(32, 64, 128)), + stage4=dict(num_modules=3, num_branches=4, block="BASIC", num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)), + ), + init_cfg=dict(type="Pretrained", checkpoint="https://download.openmmlab.com/mmpose/" "pretrain_models/hrnet_w32-36af842e.pth"), ), head=dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=32, out_channels=133, deconv_out_channels=None, - loss=dict(type='KeypointMSELoss', use_target_weight=True), - decoder=codec), + loss=dict(type="KeypointMSELoss", use_target_weight=True), + decoder=codec, + ), test_cfg=dict( flip_test=True, - flip_mode='heatmap', + flip_mode="heatmap", shift_heatmap=True, - )) + ), +) # base dataset settings -dataset_type = 'UBody2dDataset' -data_mode = 'topdown' -data_root = 'data/UBody/' +dataset_type = "UBody2dDataset" +data_mode = "topdown" +data_root = "data/UBody/" scenes = [ - 'Magic_show', 'Entertainment', 'ConductMusic', 'Online_class', 'TalkShow', - 'Speech', 'Fitness', 'Interview', 'Olympic', 'TVShow', 'Singing', - 'SignLanguage', 'Movie', 'LiveVlog', 'VideoConference' + "Magic_show", + "Entertainment", + "ConductMusic", + "Online_class", + "TalkShow", + "Speech", + "Fitness", + "Interview", + "Olympic", + "TVShow", + "Singing", + "SignLanguage", + "Movie", + "LiveVlog", + "VideoConference", ] train_datasets = [ dict( - type='CocoWholeBodyDataset', - data_root='data/coco/', + type="CocoWholeBodyDataset", + data_root="data/coco/", data_mode=data_mode, - ann_file='annotations/coco_wholebody_train_v1.0.json', - data_prefix=dict(img='train2017/'), - pipeline=[]) + ann_file="annotations/coco_wholebody_train_v1.0.json", + data_prefix=dict(img="train2017/"), + pipeline=[], + ) ] for scene in scenes: @@ -114,28 +95,29 @@ for scene in scenes: type=dataset_type, data_root=data_root, data_mode=data_mode, - ann_file=f'annotations/{scene}/train_annotations.json', - data_prefix=dict(img='images/'), + ann_file=f"annotations/{scene}/train_annotations.json", + data_prefix=dict(img="images/"), pipeline=[], - sample_interval=10) + sample_interval=10, + ) train_datasets.append(train_dataset) # pipelines train_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='RandomFlip', direction='horizontal'), - dict(type='RandomHalfBody'), - dict(type='RandomBBoxTransform'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='GenerateTarget', encoder=codec), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="RandomFlip", direction="horizontal"), + dict(type="RandomHalfBody"), + dict(type="RandomBBoxTransform"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="GenerateTarget", encoder=codec), + dict(type="PackPoseInputs"), ] val_pipeline = [ - dict(type='LoadImage'), - dict(type='GetBBoxCenterScale'), - dict(type='TopdownAffine', input_size=codec['input_size']), - dict(type='PackPoseInputs') + dict(type="LoadImage"), + dict(type="GetBBoxCenterScale"), + dict(type="TopdownAffine", input_size=codec["input_size"]), + dict(type="PackPoseInputs"), ] # data loaders @@ -143,31 +125,31 @@ train_dataloader = dict( batch_size=64, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + 
sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='CombinedDataset', - metainfo=dict(from_file='configs/_base_/datasets/coco_wholebody.py'), + type="CombinedDataset", + metainfo=dict(from_file="configs/_base_/datasets/coco_wholebody.py"), datasets=train_datasets, pipeline=train_pipeline, test_mode=False, - )) + ), +) val_dataloader = dict( batch_size=32, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False, round_up=False), + sampler=dict(type="DefaultSampler", shuffle=False, round_up=False), dataset=dict( - type='CocoWholeBodyDataset', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json', - data_prefix=dict(img='data/coco/val2017/'), + type="CocoWholeBodyDataset", + ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json", + data_prefix=dict(img="data/coco/val2017/"), pipeline=val_pipeline, - bbox_file='data/coco/person_detection_results/' - 'COCO_val2017_detections_AP_H_56_person.json', - test_mode=True)) + bbox_file="data/coco/person_detection_results/" "COCO_val2017_detections_AP_H_56_person.json", + test_mode=True, + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoWholeBodyMetric', - ann_file='data/coco/annotations/coco_wholebody_val_v1.0.json') +val_evaluator = dict(type="CocoWholeBodyMetric", ann_file="data/coco/annotations/coco_wholebody_val_v1.0.json") test_evaluator = val_evaluator diff --git a/mmpose/datasets/__init__.py b/mmpose/datasets/__init__.py index b90a12db4937ffca9ff103b1e5a0c7604de52e0b..0a3010a51b35ef6316b5f8d30443af8148e1bf1f 100644 --- a/mmpose/datasets/__init__.py +++ b/mmpose/datasets/__init__.py @@ -5,4 +5,4 @@ from .datasets import * # noqa from .samplers import MultiSourceSampler from .transforms import * # noqa -__all__ = ['build_dataset', 'CombinedDataset', 'MultiSourceSampler'] +__all__ = ["build_dataset", "CombinedDataset", "MultiSourceSampler"] diff --git a/mmpose/datasets/builder.py b/mmpose/datasets/builder.py index 2e5a236ff49b70b86149d318cbccdfd5af5a6450..841e21ca831c074247be78f85c6951fffd2ab3da 100644 --- a/mmpose/datasets/builder.py +++ b/mmpose/datasets/builder.py @@ -10,9 +10,10 @@ from mmengine.dataset import ConcatDataset, RepeatDataset from mmpose.registry import DATASETS -if platform.system() != 'Windows': +if platform.system() != "Windows": # https://github.com/pytorch/pytorch/issues/973 import resource + rlimit = resource.getrlimit(resource.RLIMIT_NOFILE) base_soft_limit = rlimit[0] hard_limit = rlimit[1] @@ -21,32 +22,32 @@ if platform.system() != 'Windows': def _concat_dataset(cfg, default_args=None): - types = cfg['type'] - ann_files = cfg['ann_file'] - img_prefixes = cfg.get('img_prefix', None) - dataset_infos = cfg.get('dataset_info', None) + types = cfg["type"] + ann_files = cfg["ann_file"] + img_prefixes = cfg.get("img_prefix", None) + dataset_infos = cfg.get("dataset_info", None) - num_joints = cfg['data_cfg'].get('num_joints', None) - dataset_channel = cfg['data_cfg'].get('dataset_channel', None) + num_joints = cfg["data_cfg"].get("num_joints", None) + dataset_channel = cfg["data_cfg"].get("dataset_channel", None) datasets = [] num_dset = len(ann_files) for i in range(num_dset): cfg_copy = copy.deepcopy(cfg) - cfg_copy['ann_file'] = ann_files[i] + cfg_copy["ann_file"] = ann_files[i] if isinstance(types, (list, tuple)): - cfg_copy['type'] = types[i] + cfg_copy["type"] = types[i] if isinstance(img_prefixes, (list, tuple)): - cfg_copy['img_prefix'] = img_prefixes[i] + cfg_copy["img_prefix"] = img_prefixes[i] if 
isinstance(dataset_infos, (list, tuple)): - cfg_copy['dataset_info'] = dataset_infos[i] + cfg_copy["dataset_info"] = dataset_infos[i] if isinstance(num_joints, (list, tuple)): - cfg_copy['data_cfg']['num_joints'] = num_joints[i] + cfg_copy["data_cfg"]["num_joints"] = num_joints[i] if is_seq_of(dataset_channel, list): - cfg_copy['data_cfg']['dataset_channel'] = dataset_channel[i] + cfg_copy["data_cfg"]["dataset_channel"] = dataset_channel[i] datasets.append(build_dataset(cfg_copy, default_args)) @@ -67,13 +68,11 @@ def build_dataset(cfg, default_args=None): if isinstance(cfg, (list, tuple)): dataset = ConcatDataset([build_dataset(c, default_args) for c in cfg]) - elif cfg['type'] == 'ConcatDataset': - dataset = ConcatDataset( - [build_dataset(c, default_args) for c in cfg['datasets']]) - elif cfg['type'] == 'RepeatDataset': - dataset = RepeatDataset( - build_dataset(cfg['dataset'], default_args), cfg['times']) - elif isinstance(cfg.get('ann_file'), (list, tuple)): + elif cfg["type"] == "ConcatDataset": + dataset = ConcatDataset([build_dataset(c, default_args) for c in cfg["datasets"]]) + elif cfg["type"] == "RepeatDataset": + dataset = RepeatDataset(build_dataset(cfg["dataset"], default_args), cfg["times"]) + elif isinstance(cfg.get("ann_file"), (list, tuple)): dataset = _concat_dataset(cfg, default_args) else: dataset = build_from_cfg(cfg, DATASETS, default_args) diff --git a/mmpose/datasets/dataset_wrappers.py b/mmpose/datasets/dataset_wrappers.py index 5f1bd31afe496bffea3d146f534a87026cdc4bef..a7ff0b3de3877f8f34ba9182b67e9d6008f63672 100644 --- a/mmpose/datasets/dataset_wrappers.py +++ b/mmpose/datasets/dataset_wrappers.py @@ -1,13 +1,14 @@ # Copyright (c) OpenMMLab. All rights reserved. from copy import deepcopy -from typing import Any, Callable, List, Optional, Tuple, Union, Dict +from typing import Any, Callable, Dict, List, Optional, Tuple, Union import numpy as np from mmengine.dataset import BaseDataset from mmengine.registry import build_from_cfg from mmpose.registry import DATASETS + from .datasets.utils import parse_pose_metainfo @@ -23,14 +24,16 @@ class CombinedDataset(BaseDataset): factors for each dataset. 
Defaults to None """ - def __init__(self, - metainfo: dict, - datasets: list, - pipeline: List[Union[dict, Callable]] = [], - sample_ratio_factor: Optional[List[float]] = None, - dataset_ratio_factor: Optional[List[float]] = None, - keypoints_mapping: Optional[List[Dict]] = None, - **kwargs): + def __init__( + self, + metainfo: dict, + datasets: list, + pipeline: List[Union[dict, Callable]] = [], + sample_ratio_factor: Optional[List[float]] = None, + dataset_ratio_factor: Optional[List[float]] = None, + keypoints_mapping: Optional[List[Dict]] = None, + **kwargs, + ): self.datasets = [] self.resample = sample_ratio_factor is not None @@ -40,8 +43,7 @@ class CombinedDataset(BaseDataset): if self.keypoints_mapping is not None: self.num_joints = 0 for mapping in self.keypoints_mapping: - self.num_joints = max(self.num_joints, max(mapping.values()) +1) - + self.num_joints = max(self.num_joints, max(mapping.values()) + 1) for cfg in datasets: dataset = build_from_cfg(cfg, DATASETS) @@ -62,16 +64,14 @@ class CombinedDataset(BaseDataset): self._lens = [len(dataset) for dataset in self.datasets] if self.resample: - assert len(sample_ratio_factor) == len(datasets), f'the length ' \ - f'of `sample_ratio_factor` {len(sample_ratio_factor)} does ' \ - f'not match the length of `datasets` {len(datasets)}' - assert min(sample_ratio_factor) >= 0.0, 'the ratio values in ' \ - '`sample_ratio_factor` should not be negative.' + assert len(sample_ratio_factor) == len(datasets), ( + f"the length " + f"of `sample_ratio_factor` {len(sample_ratio_factor)} does " + f"not match the length of `datasets` {len(datasets)}" + ) + assert min(sample_ratio_factor) >= 0.0, "the ratio values in " "`sample_ratio_factor` should not be negative." self._lens_ori = self._lens - self._lens = [ - round(l * sample_ratio_factor[i]) - for i, l in enumerate(self._lens_ori) - ] + self._lens = [round(l * sample_ratio_factor[i]) for i, l in enumerate(self._lens_ori)] self._len = sum(self._lens) @@ -102,9 +102,7 @@ class CombinedDataset(BaseDataset): the sub-dataset """ if index >= len(self) or index < -len(self): - raise ValueError( - f'index({index}) is out of bounds for dataset with ' - f'length({len(self)}).') + raise ValueError(f"index({index}) is out of bounds for dataset with " f"length({len(self)}).") if index < 0: index = index + len(self) @@ -115,8 +113,7 @@ class CombinedDataset(BaseDataset): subset_index += 1 if self.resample: - gap = (self._lens_ori[subset_index] - - 1e-4) / self._lens[subset_index] + gap = (self._lens_ori[subset_index] - 1e-4) / self._lens[subset_index] index = round(gap * index + np.random.rand() * gap - 0.5) return subset_index, index @@ -137,7 +134,7 @@ class CombinedDataset(BaseDataset): # the assignment of 'dataset' should not be performed within the # `get_data_info` function. Otherwise, it can lead to the mixed # data augmentation process getting stuck. 
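The resampling above exposes each sub-dataset at a scaled virtual length and jitters virtual indices back into the original index range; the `gap` expression is what keeps the mapping roughly uniform. A standalone sketch of that arithmetic with made-up lengths:

```python
import numpy as np

# Sketch of CombinedDataset resampling: with sample_ratio_factor=0.5 a
# 1000-sample subset is exposed as 500 virtual samples, and each virtual
# index is jittered back into the original range (mirrors the `gap` logic).
rng = np.random.default_rng(0)
len_ori, factor = 1000, 0.5
len_new = round(len_ori * factor)                   # 500
gap = (len_ori - 1e-4) / len_new                    # ~2.0 originals per virtual sample

virtual = np.arange(len_new)
mapped = np.round(gap * virtual + rng.random(len_new) * gap - 0.5).astype(int)
print(mapped.min(), mapped.max())                   # stays within [0, len_ori)
```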
- data_info['dataset'] = self + data_info["dataset"] = self return self.pipeline(data_info) @@ -153,14 +150,11 @@ class CombinedDataset(BaseDataset): # Get data sample processed by ``subset.pipeline`` data_info = self.datasets[subset_idx][sample_idx] - if 'dataset' in data_info: - data_info.pop('dataset') + if "dataset" in data_info: + data_info.pop("dataset") # Add metainfo items that are required in the pipeline and the model - metainfo_keys = [ - 'upper_body_ids', 'lower_body_ids', 'flip_pairs', - 'dataset_keypoint_weights', 'flip_indices' - ] + metainfo_keys = ["upper_body_ids", "lower_body_ids", "flip_pairs", "dataset_keypoint_weights", "flip_indices"] for key in metainfo_keys: data_info[key] = deepcopy(self._metainfo[key]) @@ -168,25 +162,24 @@ class CombinedDataset(BaseDataset): # Map keypoints based on the dataset keypoint mapping if self.keypoints_mapping is not None: mapping = self.keypoints_mapping[subset_idx] - - keypoints = data_info['keypoints'] + + keypoints = data_info["keypoints"] N, K, D = keypoints.shape - keypoints_visibility = data_info.get('keypoints_visibility', np.zeros((N, K))) - keypoints_visible = data_info.get('keypoints_visible', np.zeros((N, K))) - + keypoints_visibility = data_info.get("keypoints_visibility", np.zeros((N, K))) + keypoints_visible = data_info.get("keypoints_visible", np.zeros((N, K))) + mapped_keypoints = np.zeros((N, self.num_joints, 2)) mapped_visibility = np.zeros((N, self.num_joints)) mapped_visible = np.zeros((N, self.num_joints)) - map_idx = np.stack( - [list(mapping.keys()), list(mapping.values())], axis=1) - mapped_keypoints[:, map_idx[:, 1], :] = data_info['keypoints'][:, map_idx[:, 0], :] + map_idx = np.stack([list(mapping.keys()), list(mapping.values())], axis=1) + mapped_keypoints[:, map_idx[:, 1], :] = data_info["keypoints"][:, map_idx[:, 0], :] mapped_visibility[:, map_idx[:, 1]] = keypoints_visibility[:, map_idx[:, 0]] mapped_visible[:, map_idx[:, 1]] = keypoints_visible[:, map_idx[:, 0]] - data_info['keypoints'] = mapped_keypoints.reshape((N, self.num_joints, 2) ) - data_info['keypoints_visibility'] = mapped_visibility.reshape((N, self.num_joints)) - data_info['keypoints_visible'] = mapped_visible.reshape((N, self.num_joints)) + data_info["keypoints"] = mapped_keypoints.reshape((N, self.num_joints, 2)) + data_info["keypoints_visibility"] = mapped_visibility.reshape((N, self.num_joints)) + data_info["keypoints_visible"] = mapped_visible.reshape((N, self.num_joints)) # print('data_info', data_info) diff --git a/mmpose/datasets/datasets/animal/__init__.py b/mmpose/datasets/datasets/animal/__init__.py index 669f08cddd0ca10756867af160c55303c8a8ac20..dc6e895bb1ea9d754ccf39bdfa2203dbef8007dd 100644 --- a/mmpose/datasets/datasets/animal/__init__.py +++ b/mmpose/datasets/datasets/animal/__init__.py @@ -10,7 +10,13 @@ from .macaque_dataset import MacaqueDataset from .zebra_dataset import ZebraDataset __all__ = [ - 'AnimalPoseDataset', 'AP10KDataset', 'Horse10Dataset', 'MacaqueDataset', - 'FlyDataset', 'LocustDataset', 'ZebraDataset', 'ATRWDataset', - 'AnimalKingdomDataset' + "AnimalPoseDataset", + "AP10KDataset", + "Horse10Dataset", + "MacaqueDataset", + "FlyDataset", + "LocustDataset", + "ZebraDataset", + "ATRWDataset", + "AnimalKingdomDataset", ] diff --git a/mmpose/datasets/datasets/animal/animalkingdom_dataset.py b/mmpose/datasets/datasets/animal/animalkingdom_dataset.py index 35ccb8b67a5b607e91f5120b2bc6c21e3d3eba39..d805c955d95c91f11a9e22ad504f87e1967e5e60 100644 --- a/mmpose/datasets/datasets/animal/animalkingdom_dataset.py +++ 
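The new `keypoints_mapping` path above scatters each sub-dataset's keypoints into a unified joint layout via a `{source_index: target_index}` dict. A small NumPy sketch with a made-up 3-to-5 joint mapping (the dict and sizes are illustrative only):

```python
import numpy as np

# Remap a 3-keypoint source skeleton into a 5-joint unified layout using a
# {source_idx: target_idx} dict, mirroring the map_idx scatter above.
mapping = {0: 0, 1: 3, 2: 4}                        # made up for illustration
num_joints = max(mapping.values()) + 1              # 5, as computed in __init__
src = np.arange(6, dtype=np.float64).reshape(1, 3, 2)   # N=1, K=3, D=2

map_idx = np.stack([list(mapping.keys()), list(mapping.values())], axis=1)
dst = np.zeros((1, num_joints, 2))
dst[:, map_idx[:, 1], :] = src[:, map_idx[:, 0], :]      # unmapped joints stay zero
print(dst[0])   # rows 0, 3, 4 filled; rows 1, 2 remain (0, 0)
```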
b/mmpose/datasets/datasets/animal/animalkingdom_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -83,4 +84,4 @@ class AnimalKingdomDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/ak.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/ak.py") diff --git a/mmpose/datasets/datasets/animal/animalpose_dataset.py b/mmpose/datasets/datasets/animal/animalpose_dataset.py index 0279cf9de0907626f2a6686170dc5e99aafa2d9d..5259e75f32af423f743b2bc2ac562a87daee631a 100644 --- a/mmpose/datasets/datasets/animal/animalpose_dataset.py +++ b/mmpose/datasets/datasets/animal/animalpose_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -72,4 +73,4 @@ class AnimalPoseDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/animalpose.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/animalpose.py") diff --git a/mmpose/datasets/datasets/animal/ap10k_dataset.py b/mmpose/datasets/datasets/animal/ap10k_dataset.py index de1efbc67f7be55c57532684174442a3f865d5fd..1d539397469b6f217f39514cafc16c784e455eda 100644 --- a/mmpose/datasets/datasets/animal/ap10k_dataset.py +++ b/mmpose/datasets/datasets/animal/ap10k_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -70,4 +71,4 @@ class AP10KDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/ap10k.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/ap10k.py") diff --git a/mmpose/datasets/datasets/animal/atrw_dataset.py b/mmpose/datasets/datasets/animal/atrw_dataset.py index de5b1a09a0510969ea0a6d57c15e5bd13104b99b..ca0b27a40f52b0c205860c337d1bccf2c60ae517 100644 --- a/mmpose/datasets/datasets/animal/atrw_dataset.py +++ b/mmpose/datasets/datasets/animal/atrw_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -68,4 +69,4 @@ class ATRWDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/atrw.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/atrw.py") diff --git a/mmpose/datasets/datasets/animal/fly_dataset.py b/mmpose/datasets/datasets/animal/fly_dataset.py index b614d9b9f77b1e2eb7f067ea6cfb21d788857554..cc2310774682b96ce39c1fbb371e8000e906254a 100644 --- a/mmpose/datasets/datasets/animal/fly_dataset.py +++ b/mmpose/datasets/datasets/animal/fly_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -85,4 +86,4 @@ class FlyDataset(BaseCocoStyleDataset): image. Default: 1000. 
""" - METAINFO: dict = dict(from_file='configs/_base_/datasets/fly.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/fly.py") diff --git a/mmpose/datasets/datasets/animal/horse10_dataset.py b/mmpose/datasets/datasets/animal/horse10_dataset.py index 0c25dba6a705045b731bddd176bf20a46c285764..3974346da3327deae429fb03cb30d1f61a443350 100644 --- a/mmpose/datasets/datasets/animal/horse10_dataset.py +++ b/mmpose/datasets/datasets/animal/horse10_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -74,4 +75,4 @@ class Horse10Dataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/horse10.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/horse10.py") diff --git a/mmpose/datasets/datasets/animal/locust_dataset.py b/mmpose/datasets/datasets/animal/locust_dataset.py index 3ada76034db8e9cbc25d68ccd9a430ea62394c74..316afdd79582fad589eefd71c1653760066918f1 100644 --- a/mmpose/datasets/datasets/animal/locust_dataset.py +++ b/mmpose/datasets/datasets/animal/locust_dataset.py @@ -5,6 +5,7 @@ from typing import Optional import numpy as np from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -93,7 +94,7 @@ class LocustDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/locust.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/locust.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw Locust annotation of an instance. @@ -110,31 +111,30 @@ class LocustDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] - img_path = osp.join(self.data_prefix['img'], img['file_name']) + img_path = osp.join(self.data_prefix["img"], img["file_name"]) # get bbox in shape [1, 4], formatted as xywh # use the entire image which is 160x160 bbox = np.array([0, 0, 160, 160], dtype=np.float32).reshape(1, 4) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) data_info = { - 'img_id': ann['image_id'], - 'img_path': img_path, - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': ann['num_keypoints'], - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], - 'id': ann['id'], + "img_id": ann["image_id"], + "img_path": img_path, + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": ann["num_keypoints"], + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], + "id": ann["id"], } return data_info diff --git a/mmpose/datasets/datasets/animal/macaque_dataset.py b/mmpose/datasets/datasets/animal/macaque_dataset.py index 08da981a1a2299efaadaf727b3960e769999fc35..0aa8ad1f408064d3cdd54125f9b4fe1a7846c9f3 100644 --- a/mmpose/datasets/datasets/animal/macaque_dataset.py +++ b/mmpose/datasets/datasets/animal/macaque_dataset.py @@ -1,6 +1,7 @@ # Copyright (c) OpenMMLab. All rights reserved. 
from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -71,4 +72,4 @@ class MacaqueDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/macaque.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/macaque.py") diff --git a/mmpose/datasets/datasets/animal/zebra_dataset.py b/mmpose/datasets/datasets/animal/zebra_dataset.py index b399a8479bcf18b8b33115b4cd703563e1a846d3..85f7ce00cd2167d426bfb923e7063e7c016ff893 100644 --- a/mmpose/datasets/datasets/animal/zebra_dataset.py +++ b/mmpose/datasets/datasets/animal/zebra_dataset.py @@ -5,6 +5,7 @@ from typing import Optional import numpy as np from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -67,7 +68,7 @@ class ZebraDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/zebra.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/zebra.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw Zebra annotation of an instance. @@ -84,33 +85,32 @@ class ZebraDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] - img_path = osp.join(self.data_prefix['img'], img['file_name']) + img_path = osp.join(self.data_prefix["img"], img["file_name"]) # get bbox in shape [1, 4], formatted as xywh # use the entire image which is 160x160 bbox = np.array([0, 0, 160, 160], dtype=np.float32).reshape(1, 4) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) - num_keypoints = ann['num_keypoints'] + num_keypoints = ann["num_keypoints"] data_info = { - 'img_id': ann['image_id'], - 'img_path': img_path, - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], - 'id': ann['id'], + "img_id": ann["image_id"], + "img_path": img_path, + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], + "id": ann["id"], } return data_info diff --git a/mmpose/datasets/datasets/base/__init__.py b/mmpose/datasets/datasets/base/__init__.py index 810440530e4d091f55aea349b6b2a4f8d3ba593b..4a45d4165cc23acdc3f6187d2ae4f153d0ea1cd2 100644 --- a/mmpose/datasets/datasets/base/__init__.py +++ b/mmpose/datasets/datasets/base/__init__.py @@ -2,4 +2,4 @@ from .base_coco_style_dataset import BaseCocoStyleDataset from .base_mocap_dataset import BaseMocapDataset -__all__ = ['BaseCocoStyleDataset', 'BaseMocapDataset'] +__all__ = ["BaseCocoStyleDataset", "BaseMocapDataset"] diff --git a/mmpose/datasets/datasets/base/base_coco_style_dataset.py b/mmpose/datasets/datasets/base/base_coco_style_dataset.py index 61fad40e1d31458798e4ac04c2fa0d7e7eb4d31f..7784fc0a648bcff5ebf0c0fe3d608380f8179be6 100644 --- a/mmpose/datasets/datasets/base/base_coco_style_dataset.py +++ b/mmpose/datasets/datasets/base/base_coco_style_dataset.py @@ -10,11 +10,12 @@ from mmengine.dataset import BaseDataset, force_full_init from 
mmengine.fileio import exists, get_local_path, load from mmengine.logging import MessageHub from mmengine.utils import is_list_of -from xtcocotools.coco import COCO from mmpose.registry import DATASETS from mmpose.structures.bbox import bbox_xywh2xyxy from mmpose.structures.keypoint import find_min_padding_exact +from xtcocotools.coco import COCO + from ..utils import parse_pose_metainfo @@ -64,40 +65,40 @@ class BaseCocoStyleDataset(BaseDataset): METAINFO: dict = dict() - def __init__(self, - ann_file: str = '', - bbox_file: Optional[str] = None, - data_mode: str = 'topdown', - metainfo: Optional[dict] = None, - data_root: Optional[str] = None, - data_prefix: dict = dict(img=''), - filter_cfg: Optional[dict] = None, - indices: Optional[Union[int, Sequence[int]]] = None, - serialize_data: bool = True, - pipeline: List[Union[dict, Callable]] = [], - test_mode: bool = False, - lazy_init: bool = False, - max_refetch: int = 1000, - sample_interval: int = 1): - - if data_mode not in {'topdown', 'bottomup'}: - raise ValueError( - f'{self.__class__.__name__} got invalid data_mode: ' - f'{data_mode}. Should be "topdown" or "bottomup".') + def __init__( + self, + ann_file: str = "", + bbox_file: Optional[str] = None, + data_mode: str = "topdown", + metainfo: Optional[dict] = None, + data_root: Optional[str] = None, + data_prefix: dict = dict(img=""), + filter_cfg: Optional[dict] = None, + indices: Optional[Union[int, Sequence[int]]] = None, + serialize_data: bool = True, + pipeline: List[Union[dict, Callable]] = [], + test_mode: bool = False, + lazy_init: bool = False, + max_refetch: int = 1000, + sample_interval: int = 1, + ): + + if data_mode not in {"topdown", "bottomup"}: + raise ValueError(f"{self.__class__.__name__} got invalid data_mode: " f'{data_mode}. Should be "topdown" or "bottomup".') self.data_mode = data_mode if bbox_file: - if self.data_mode != 'topdown': + if self.data_mode != "topdown": raise ValueError( - f'{self.__class__.__name__} is set to {self.data_mode}: ' + f"{self.__class__.__name__} is set to {self.data_mode}: " 'mode, while "bbox_file" is only ' - 'supported in topdown mode.') + "supported in topdown mode." + ) if not test_mode: raise ValueError( - f'{self.__class__.__name__} has `test_mode==False` ' - 'while "bbox_file" is only ' - 'supported when `test_mode==True`.') + f"{self.__class__.__name__} has `test_mode==False` " 'while "bbox_file" is only ' "supported when `test_mode==True`." 
+ ) self.bbox_file = bbox_file self.sample_interval = sample_interval @@ -112,14 +113,14 @@ class BaseCocoStyleDataset(BaseDataset): pipeline=pipeline, test_mode=test_mode, lazy_init=lazy_init, - max_refetch=max_refetch) + max_refetch=max_refetch, + ) if self.test_mode: # save the ann_file into MessageHub for CocoMetric message = MessageHub.get_current_instance() - dataset_name = self.metainfo['dataset_name'] - message.update_info_dict( - {f'{dataset_name}_ann_file': self.ann_file}) + dataset_name = self.metainfo["dataset_name"] + message.update_info_dict({f"{dataset_name}_ann_file": self.ann_file}) @classmethod def _load_metainfo(cls, metainfo: dict = None) -> dict: @@ -136,8 +137,7 @@ class BaseCocoStyleDataset(BaseDataset): metainfo = deepcopy(cls.METAINFO) if not isinstance(metainfo, dict): - raise TypeError( - f'metainfo should be a dict, but got {type(metainfo)}') + raise TypeError(f"metainfo should be a dict, but got {type(metainfo)}") # parse pose metainfo if it has been assigned if metainfo: @@ -166,7 +166,7 @@ class BaseCocoStyleDataset(BaseDataset): # Note: The 'dataset' assignment should not occur within the # `get_data_info` function, as doing so may cause the mixed image # transformations to stall or hang. - data_info['dataset'] = self + data_info["dataset"] = self return self.pipeline(data_info) @@ -183,14 +183,17 @@ class BaseCocoStyleDataset(BaseDataset): # Add metainfo items that are required in the pipeline and the model metainfo_keys = [ - 'dataset_name', 'upper_body_ids', 'lower_body_ids', 'flip_pairs', - 'dataset_keypoint_weights', 'flip_indices', 'skeleton_links' + "dataset_name", + "upper_body_ids", + "lower_body_ids", + "flip_pairs", + "dataset_keypoint_weights", + "flip_indices", + "skeleton_links", ] for key in metainfo_keys: - assert key not in data_info, ( - f'"{key}" is a reserved key for `metainfo`, but already ' - 'exists in the `data_info`.') + assert key not in data_info, f'"{key}" is a reserved key for `metainfo`, but already ' "exists in the `data_info`." data_info[key] = deepcopy(self._metainfo[key]) @@ -205,27 +208,24 @@ class BaseCocoStyleDataset(BaseDataset): else: instance_list, image_list = self._load_annotations() - if self.data_mode == 'topdown': + if self.data_mode == "topdown": data_list = self._get_topdown_data_infos(instance_list) else: - data_list = self._get_bottomup_data_infos( - instance_list, image_list) + data_list = self._get_bottomup_data_infos(instance_list, image_list) return data_list def _load_annotations(self) -> Tuple[List[dict], List[dict]]: """Load data from annotations in COCO format.""" - assert exists(self.ann_file), ( - f'Annotation file `{self.ann_file}`does not exist') + assert exists(self.ann_file), f"Annotation file `{self.ann_file}` does not exist" with get_local_path(self.ann_file) as local_path: self.coco = COCO(local_path) # set the metainfo about categories, which is a list of dict # and each dict contains the 'id', 'name', etc.
about this category - if 'categories' in self.coco.dataset: - self._metainfo['CLASSES'] = self.coco.loadCats( - self.coco.getCatIds()) + if "categories" in self.coco.dataset: + self._metainfo["CLASSES"] = self.coco.loadCats(self.coco.getCatIds()) instance_list = [] image_list = [] @@ -234,19 +234,18 @@ class BaseCocoStyleDataset(BaseDataset): if img_id % self.sample_interval != 0: continue img = self.coco.loadImgs(img_id)[0] - img.update({ - 'img_id': - img_id, - 'img_path': - osp.join(self.data_prefix['img'], img['file_name']), - }) + img.update( + { + "img_id": img_id, + "img_path": osp.join(self.data_prefix["img"], img["file_name"]), + } + ) image_list.append(img) ann_ids = self.coco.getAnnIds(imgIds=img_id) for ann in self.coco.loadAnns(ann_ids): - instance_info = self.parse_data_info( - dict(raw_ann_info=ann, raw_img_info=img)) + instance_info = self.parse_data_info(dict(raw_ann_info=ann, raw_img_info=img)) # skip invalid instance annotation. if not instance_info: @@ -270,17 +269,17 @@ class BaseCocoStyleDataset(BaseDataset): dict | None: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] # filter invalid instance - if 'bbox' not in ann or 'keypoints' not in ann: + if "bbox" not in ann or "keypoints" not in ann: return None - img_w, img_h = img['width'], img['height'] + img_w, img_h = img["width"], img["height"] # get bbox in shape [1, 4], formatted as xywh - x, y, w, h = ann['bbox'] + x, y, w, h = ann["bbox"] x1 = np.clip(x, 0, img_w - 1) y1 = np.clip(y, 0, img_h - 1) x2 = np.clip(x + w, 0, img_w - 1) @@ -289,54 +288,53 @@ class BaseCocoStyleDataset(BaseDataset): bbox = np.array([x1, y1, x2, y2], dtype=np.float32).reshape(1, 4) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] # keypoints_annotated = (_keypoints[..., 2] > 0).astype(np.float32) keypoints_visibility = (_keypoints[..., 2] == 2).astype(np.float32) keypoints_visible = np.minimum(1, _keypoints[..., 2]) - if 'num_keypoints' in ann: - num_keypoints = ann['num_keypoints'] + if "num_keypoints" in ann: + num_keypoints = ann["num_keypoints"] else: num_keypoints = np.count_nonzero(keypoints.max(axis=2)) - if 'area' in ann: - area = np.array(ann['area'], dtype=np.float32) + if "area" in ann: + area = np.array(ann["area"], dtype=np.float32) else: area = np.clip((x2 - x1) * (y2 - y1) * 0.53, a_min=1.0, a_max=None) area = np.array(area, dtype=np.float32) - - id_similarity = np.array([ann.get('identity_similarity', 0.0)]) - identified = np.array([ann.get('identified', 0)]) - pad_to_contain = ann.get('pad_to_contain', None) + + id_similarity = np.array([ann.get("identity_similarity", 0.0)]) + identified = np.array([ann.get("identified", 0)]) + pad_to_contain = ann.get("pad_to_contain", None) if pad_to_contain is None: pad_to_contain = find_min_padding_exact(bbox, _keypoints.reshape(-1, 3)) data_info = { - 'img_id': ann['image_id'], - 'img_path': img['img_path'], - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'keypoints_visibility': keypoints_visibility, - 'pad_to_contain': pad_to_contain, - 'area': area, - 'iscrowd': ann.get('iscrowd', 0), - 'segmentation': ann.get('segmentation', None), - 
'id': ann['id'], - 'id_similarity': id_similarity, - 'identified': identified, - 'category_id': np.array(ann['category_id']), + "img_id": ann["image_id"], + "img_path": img["img_path"], + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "keypoints_visibility": keypoints_visibility, + "pad_to_contain": pad_to_contain, + "area": area, + "iscrowd": ann.get("iscrowd", 0), + "segmentation": ann.get("segmentation", None), + "id": ann["id"], + "id_similarity": id_similarity, + "identified": identified, + "category_id": np.array(ann["category_id"]), # store the raw annotation of the instance # it is useful for evaluation without providing ann_file - 'raw_ann_info': copy.deepcopy(ann), - 'source_dataset': self.metainfo['dataset_name'], + "raw_ann_info": copy.deepcopy(ann), + "source_dataset": self.metainfo["dataset_name"], } - if 'crowdIndex' in img: - data_info['crowd_index'] = img['crowdIndex'] + if "crowdIndex" in img: + data_info["crowd_index"] = img["crowdIndex"] return data_info @@ -345,20 +343,20 @@ class BaseCocoStyleDataset(BaseDataset): """Check a data info is an instance with valid bbox and keypoint annotations.""" # crowd annotation - if 'iscrowd' in data_info and data_info['iscrowd']: + if "iscrowd" in data_info and data_info["iscrowd"]: return False # invalid keypoints - if 'num_keypoints' in data_info and data_info['num_keypoints'] == 0: + if "num_keypoints" in data_info and data_info["num_keypoints"] == 0: return False # invalid bbox - if 'bbox' in data_info: - bbox = data_info['bbox'][0] + if "bbox" in data_info: + bbox = data_info["bbox"][0] w, h = bbox[2:4] - bbox[:2] if w <= 0 or h <= 0: return False # invalid keypoints - if 'keypoints' in data_info: - if np.max(data_info['keypoints']) <= 0: + if "keypoints" in data_info: + if np.max(data_info["keypoints"]) <= 0: return False return True @@ -369,8 +367,7 @@ class BaseCocoStyleDataset(BaseDataset): return data_list_tp - def _get_bottomup_data_infos(self, instance_list: List[Dict], - image_list: List[Dict]) -> List[Dict]: + def _get_bottomup_data_infos(self, instance_list: List[Dict], image_list: List[Dict]) -> List[Dict]: """Organize the data list in bottom-up mode.""" # bottom-up data list @@ -379,16 +376,15 @@ class BaseCocoStyleDataset(BaseDataset): used_img_ids = set() # group instances by img_id - for img_id, data_infos in groupby(instance_list, - lambda x: x['img_id']): + for img_id, data_infos in groupby(instance_list, lambda x: x["img_id"]): used_img_ids.add(img_id) data_infos = list(data_infos) # image data - img_path = data_infos[0]['img_path'] + img_path = data_infos[0]["img_path"] data_info_bu = { - 'img_id': img_id, - 'img_path': img_path, + "img_id": img_id, + "img_path": img_path, } for key in data_infos[0].keys(): @@ -407,23 +403,22 @@ class BaseCocoStyleDataset(BaseDataset): # The segmentation annotation of invalid objects will be used # to generate valid region mask in the pipeline. 
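The `parse_data_info` hunk above packs the essential COCO parsing into a few array operations: clip the `xywh` bbox to the image, split the flat `(x, y, v)` keypoint triplets, and derive visibility flags (v == 2 marks a visible keypoint, v >= 1 an annotated one). A standalone sketch of just that parsing, with an invented helper name and a toy annotation:

```python
import numpy as np

def parse_coco_keypoints(ann: dict, img_w: int, img_h: int):
    # Mirrors the core of BaseCocoStyleDataset.parse_data_info above
    # (illustrative helper, not part of the patch).
    x, y, w, h = ann["bbox"]
    # clip xywh -> xyxy against the image bounds
    x1, y1 = np.clip(x, 0, img_w - 1), np.clip(y, 0, img_h - 1)
    x2, y2 = np.clip(x + w, 0, img_w - 1), np.clip(y + h, 0, img_h - 1)
    bbox = np.array([x1, y1, x2, y2], dtype=np.float32).reshape(1, 4)

    # flat (x, y, v) triplets -> [1, K, 2] coordinates plus flags
    kpts = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3)
    keypoints = kpts[..., :2]
    visibility = (kpts[..., 2] == 2).astype(np.float32)  # truly visible
    annotated = np.minimum(1, kpts[..., 2])              # labelled at all
    return bbox, keypoints, visibility, annotated

ann = {"bbox": [10, 20, 500, 500], "keypoints": [30, 40, 2, 0, 0, 0, 80, 90, 1]}
bbox, kp, vis, annotated = parse_coco_keypoints(ann, img_w=400, img_h=300)
print(bbox)       # [[ 10.  20. 399. 299.]] -- clipped to the image
print(annotated)  # [[1. 0. 1.]]; vis would be [[1. 0. 0.]]
```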
invalid_segs = [] - for data_info_invalid in filterfalse(self._is_valid_instance, - data_infos): - if 'segmentation' in data_info_invalid: - invalid_segs.append(data_info_invalid['segmentation']) - data_info_bu['invalid_segs'] = invalid_segs + for data_info_invalid in filterfalse(self._is_valid_instance, data_infos): + if "segmentation" in data_info_invalid: + invalid_segs.append(data_info_invalid["segmentation"]) + data_info_bu["invalid_segs"] = invalid_segs data_list_bu.append(data_info_bu) # add images without instance for evaluation if self.test_mode: for img_info in image_list: - if img_info['img_id'] not in used_img_ids: + if img_info["img_id"] not in used_img_ids: data_info_bu = { - 'img_id': img_info['img_id'], - 'img_path': img_info['img_path'], - 'id': list(), - 'raw_ann_info': None, + "img_id": img_info["img_id"], + "img_path": img_info["img_path"], + "id": list(), + "raw_ann_info": None, } data_list_bu.append(data_info_bu) @@ -432,58 +427,54 @@ class BaseCocoStyleDataset(BaseDataset): def _load_detection_results(self) -> List[dict]: """Load data from detection results with dummy keypoint annotations.""" - assert exists(self.ann_file), ( - f'Annotation file `{self.ann_file}` does not exist') - assert exists( - self.bbox_file), (f'Bbox file `{self.bbox_file}` does not exist') + assert exists(self.ann_file), f"Annotation file `{self.ann_file}` does not exist" + assert exists(self.bbox_file), f"Bbox file `{self.bbox_file}` does not exist" # load detection results det_results = load(self.bbox_file) - assert is_list_of( - det_results, - dict), (f'BBox file `{self.bbox_file}` should be a list of dict, ' - f'but got {type(det_results)}') + assert is_list_of(det_results, dict), f"BBox file `{self.bbox_file}` should be a list of dict, " f"but got {type(det_results)}" # load coco annotations to build image id-to-name index with get_local_path(self.ann_file) as local_path: self.coco = COCO(local_path) # set the metainfo about categories, which is a list of dict # and each dict contains the 'id', 'name', etc. 
about this category - self._metainfo['CLASSES'] = self.coco.loadCats(self.coco.getCatIds()) + self._metainfo["CLASSES"] = self.coco.loadCats(self.coco.getCatIds()) - num_keypoints = self.metainfo['num_keypoints'] + num_keypoints = self.metainfo["num_keypoints"] data_list = [] id_ = 0 for det in det_results: # remove non-human instances - if det['category_id'] != 1: + if det["category_id"] != 1: continue - img = self.coco.loadImgs(det['image_id'])[0] + img = self.coco.loadImgs(det["image_id"])[0] - img_path = osp.join(self.data_prefix['img'], img['file_name']) - bbox_xywh = np.array( - det['bbox'][:4], dtype=np.float32).reshape(1, 4) + img_path = osp.join(self.data_prefix["img"], img["file_name"]) + bbox_xywh = np.array(det["bbox"][:4], dtype=np.float32).reshape(1, 4) bbox = bbox_xywh2xyxy(bbox_xywh) - bbox_score = np.array(det['score'], dtype=np.float32).reshape(1) + bbox_score = np.array(det["score"], dtype=np.float32).reshape(1) # use dummy keypoint location and visibility keypoints = np.zeros((1, num_keypoints, 2), dtype=np.float32) keypoints_visible = np.ones((1, num_keypoints), dtype=np.float32) # If segmentation in the detection results, save it for later use - segmentation = det.get('segmentation', None) - - data_list.append({ - 'img_id': det['image_id'], - 'img_path': img_path, - 'img_shape': (img['height'], img['width']), - 'bbox': bbox, - 'bbox_score': bbox_score, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'id': id_, - 'segmentation': segmentation, - }) + segmentation = det.get("segmentation", None) + + data_list.append( + { + "img_id": det["image_id"], + "img_path": img_path, + "img_shape": (img["height"], img["width"]), + "bbox": bbox, + "bbox_score": bbox_score, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "id": id_, + "segmentation": segmentation, + } + ) id_ += 1 @@ -503,16 +494,16 @@ class BaseCocoStyleDataset(BaseDataset): return data_list # filter out annotations with a bbox_score below the threshold - if 'bbox_score_thr' in self.filter_cfg: + if "bbox_score_thr" in self.filter_cfg: - if self.data_mode != 'topdown': + if self.data_mode != "topdown": raise ValueError( - f'{self.__class__.__name__} is set to {self.data_mode} ' + f"{self.__class__.__name__} is set to {self.data_mode} " 'mode, while "bbox_score_thr" is only supported in ' - 'topdown mode.') + "topdown mode." 
+ ) - thr = self.filter_cfg['bbox_score_thr'] - data_list = list( - filterfalse(lambda ann: ann['bbox_score'] < thr, data_list)) + thr = self.filter_cfg["bbox_score_thr"] + data_list = list(filterfalse(lambda ann: ann["bbox_score"] < thr, data_list)) return data_list diff --git a/mmpose/datasets/datasets/base/base_mocap_dataset.py b/mmpose/datasets/datasets/base/base_mocap_dataset.py index f9cea2987c647a1111bb60f94329e80961c8d0b2..7d00f8c0449da9c0c58716d35071254c45bcaabf 100644 --- a/mmpose/datasets/datasets/base/base_mocap_dataset.py +++ b/mmpose/datasets/datasets/base/base_mocap_dataset.py @@ -14,6 +14,7 @@ from mmengine.logging import print_log from mmengine.utils import is_abs from mmpose.registry import DATASETS + from ..utils import parse_pose_metainfo @@ -65,47 +66,43 @@ class BaseMocapDataset(BaseDataset): METAINFO: dict = dict() - def __init__(self, - ann_file: str = '', - seq_len: int = 1, - multiple_target: int = 0, - causal: bool = True, - subset_frac: float = 1.0, - camera_param_file: Optional[str] = None, - data_mode: str = 'topdown', - metainfo: Optional[dict] = None, - data_root: Optional[str] = None, - data_prefix: dict = dict(img=''), - filter_cfg: Optional[dict] = None, - indices: Optional[Union[int, Sequence[int]]] = None, - serialize_data: bool = True, - pipeline: List[Union[dict, Callable]] = [], - test_mode: bool = False, - lazy_init: bool = False, - max_refetch: int = 1000): - - if data_mode not in {'topdown', 'bottomup'}: - raise ValueError( - f'{self.__class__.__name__} got invalid data_mode: ' - f'{data_mode}. Should be "topdown" or "bottomup".') + def __init__( + self, + ann_file: str = "", + seq_len: int = 1, + multiple_target: int = 0, + causal: bool = True, + subset_frac: float = 1.0, + camera_param_file: Optional[str] = None, + data_mode: str = "topdown", + metainfo: Optional[dict] = None, + data_root: Optional[str] = None, + data_prefix: dict = dict(img=""), + filter_cfg: Optional[dict] = None, + indices: Optional[Union[int, Sequence[int]]] = None, + serialize_data: bool = True, + pipeline: List[Union[dict, Callable]] = [], + test_mode: bool = False, + lazy_init: bool = False, + max_refetch: int = 1000, + ): + + if data_mode not in {"topdown", "bottomup"}: + raise ValueError(f"{self.__class__.__name__} got invalid data_mode: " f'{data_mode}. Should be "topdown" or "bottomup".') self.data_mode = data_mode _ann_file = ann_file if not is_abs(_ann_file): _ann_file = osp.join(data_root, _ann_file) - assert exists(_ann_file), ( - f'Annotation file `{_ann_file}` does not exist.') + assert exists(_ann_file), f"Annotation file `{_ann_file}` does not exist." self._load_ann_file(_ann_file) self.camera_param_file = camera_param_file if self.camera_param_file: if not is_abs(self.camera_param_file): - self.camera_param_file = osp.join(data_root, - self.camera_param_file) - assert exists(self.camera_param_file), ( - f'Camera parameters file `{self.camera_param_file}` does not ' - 'exist.') + self.camera_param_file = osp.join(data_root, self.camera_param_file) + assert exists(self.camera_param_file), f"Camera parameters file `{self.camera_param_file}` does not " "exist." self.camera_param = load(self.camera_param_file) self.seq_len = seq_len @@ -113,12 +110,9 @@ class BaseMocapDataset(BaseDataset): self.multiple_target = multiple_target if self.multiple_target: - assert (self.seq_len == 1), ( - 'Multi-target data sample only supports seq_len=1.') + assert self.seq_len == 1, "Multi-target data sample only supports seq_len=1." 
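The `bbox_score_thr` branch of `filter_data` a few hunks above reads more clearly with toy data: `itertools.filterfalse` drops every annotation whose predicate is true, i.e. every low-score detection. A minimal sketch with made-up scores:

```python
from itertools import filterfalse

data_list = [
    {"id": 0, "bbox_score": 0.9},
    {"id": 1, "bbox_score": 0.2},
    {"id": 2, "bbox_score": 0.6},
]
thr = 0.5
# Keep items where the predicate is False, i.e. score >= thr.
kept = list(filterfalse(lambda ann: ann["bbox_score"] < thr, data_list))
print([d["id"] for d in kept])  # [0, 2]
```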
- assert 0 < subset_frac <= 1, ( - f'Unsupported `subset_frac` {subset_frac}. Supported range ' - 'is (0, 1].') + assert 0 < subset_frac <= 1, f"Unsupported `subset_frac` {subset_frac}. Supported range " "is (0, 1]." self.subset_frac = subset_frac self.sequence_indices = self.get_sequence_indices() @@ -134,7 +128,8 @@ class BaseMocapDataset(BaseDataset): pipeline=pipeline, test_mode=test_mode, lazy_init=lazy_init, - max_refetch=max_refetch) + max_refetch=max_refetch, + ) def _load_ann_file(self, ann_file: str) -> dict: """Load annotation file to get image information. @@ -164,8 +159,7 @@ class BaseMocapDataset(BaseDataset): metainfo = deepcopy(cls.METAINFO) if not isinstance(metainfo, dict): - raise TypeError( - f'metainfo should be a dict, but got {type(metainfo)}') + raise TypeError(f"metainfo should be a dict, but got {type(metainfo)}") # parse pose metainfo if it has been assigned if metainfo: @@ -202,15 +196,10 @@ class BaseMocapDataset(BaseDataset): data_info = super().get_data_info(idx) # Add metainfo items that are required in the pipeline and the model - metainfo_keys = [ - 'upper_body_ids', 'lower_body_ids', 'flip_pairs', - 'dataset_keypoint_weights', 'flip_indices', 'skeleton_links' - ] + metainfo_keys = ["upper_body_ids", "lower_body_ids", "flip_pairs", "dataset_keypoint_weights", "flip_indices", "skeleton_links"] for key in metainfo_keys: - assert key not in data_info, ( - f'"{key}" is a reserved key for `metainfo`, but already ' - 'exists in the `data_info`.') + assert key not in data_info, f'"{key}" is a reserved key for `metainfo`, but already ' "exists in the `data_info`." data_info[key] = deepcopy(self._metainfo[key]) @@ -222,34 +211,29 @@ class BaseMocapDataset(BaseDataset): instance_list, image_list = self._load_annotations() - if self.data_mode == 'topdown': + if self.data_mode == "topdown": data_list = self._get_topdown_data_infos(instance_list) else: - data_list = self._get_bottomup_data_infos(instance_list, - image_list) + data_list = self._get_bottomup_data_infos(instance_list, image_list) return data_list def get_img_info(self, img_idx, img_name): try: - with get_local_path(osp.join(self.data_prefix['img'], - img_name)) as local_path: + with get_local_path(osp.join(self.data_prefix["img"], img_name)) as local_path: im = cv2.imread(local_path) h, w, _ = im.shape except: # noqa: E722 - print_log( - f'Failed to read image {img_name}.', - logger='current', - level=logging.DEBUG) + print_log(f"Failed to read image {img_name}.", logger="current", level=logging.DEBUG) return None img = { - 'file_name': img_name, - 'height': h, - 'width': w, - 'id': img_idx, - 'img_id': img_idx, - 'img_path': osp.join(self.data_prefix['img'], img_name), + "file_name": img_name, + "height": h, + "width": w, + "id": img_idx, + "img_id": img_idx, + "img_path": osp.join(self.data_prefix["img"], img_name), } return img @@ -267,47 +251,44 @@ class BaseMocapDataset(BaseDataset): """ sequence_indices = [] if self.seq_len == 1: - num_imgs = len(self.ann_data['imgname']) + num_imgs = len(self.ann_data["imgname"]) sequence_indices = [[idx] for idx in range(num_imgs)] else: - raise NotImplementedError('Multi-frame data sample unsupported!') + raise NotImplementedError("Multi-frame data sample unsupported!") if self.multiple_target > 0: sequence_indices_merged = [] for i in range(0, len(sequence_indices), self.multiple_target): if i + self.multiple_target > len(sequence_indices): break - sequence_indices_merged.append( - list( - itertools.chain.from_iterable( - sequence_indices[i:i + 
self.multiple_target]))) + sequence_indices_merged.append(list(itertools.chain.from_iterable(sequence_indices[i : i + self.multiple_target]))) sequence_indices = sequence_indices_merged return sequence_indices def _load_annotations(self) -> Tuple[List[dict], List[dict]]: """Load data from annotations in COCO format.""" - num_keypoints = self.metainfo['num_keypoints'] + num_keypoints = self.metainfo["num_keypoints"] - img_names = self.ann_data['imgname'] + img_names = self.ann_data["imgname"] num_imgs = len(img_names) - if 'S' in self.ann_data.keys(): - kpts_3d = self.ann_data['S'] + if "S" in self.ann_data.keys(): + kpts_3d = self.ann_data["S"] else: kpts_3d = np.zeros((num_imgs, num_keypoints, 4), dtype=np.float32) - if 'part' in self.ann_data.keys(): - kpts_2d = self.ann_data['part'] + if "part" in self.ann_data.keys(): + kpts_2d = self.ann_data["part"] else: kpts_2d = np.zeros((num_imgs, num_keypoints, 3), dtype=np.float32) - if 'center' in self.ann_data.keys(): - centers = self.ann_data['center'] + if "center" in self.ann_data.keys(): + centers = self.ann_data["center"] else: centers = np.zeros((num_imgs, 2), dtype=np.float32) - if 'scale' in self.ann_data.keys(): - scales = self.ann_data['scale'].astype(np.float32) + if "scale" in self.ann_data.keys(): + scales = self.ann_data["scale"].astype(np.float32) else: scales = np.zeros(num_imgs, dtype=np.float32) @@ -319,9 +300,7 @@ class BaseMocapDataset(BaseDataset): if self.multiple_target: expected_num_frames = self.multiple_target - assert len(frame_ids) == (expected_num_frames), ( - f'Expected `frame_ids` == {expected_num_frames}, but ' - f'got {len(frame_ids)} ') + assert len(frame_ids) == (expected_num_frames), f"Expected `frame_ids` == {expected_num_frames}, but " f"got {len(frame_ids)} " _img_names = img_names[frame_ids] @@ -338,30 +317,30 @@ class BaseMocapDataset(BaseDataset): target_idx = list(range(self.multiple_target)) instance_info = { - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'keypoints_3d': keypoints_3d, - 'keypoints_3d_visible': keypoints_3d_visible, - 'scale': scales[idx], - 'center': centers[idx].astype(np.float32).reshape(1, -1), - 'id': idx, - 'category_id': 1, - 'iscrowd': 0, - 'img_paths': list(_img_names), - 'img_ids': frame_ids, - 'lifting_target': keypoints_3d[target_idx], - 'lifting_target_visible': keypoints_3d_visible[target_idx], - 'target_img_path': _img_names[target_idx], + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "keypoints_3d": keypoints_3d, + "keypoints_3d_visible": keypoints_3d_visible, + "scale": scales[idx], + "center": centers[idx].astype(np.float32).reshape(1, -1), + "id": idx, + "category_id": 1, + "iscrowd": 0, + "img_paths": list(_img_names), + "img_ids": frame_ids, + "lifting_target": keypoints_3d[target_idx], + "lifting_target_visible": keypoints_3d_visible[target_idx], + "target_img_path": _img_names[target_idx], } if self.camera_param_file: _cam_param = self.get_camera_param(_img_names[0]) - instance_info['camera_param'] = _cam_param + instance_info["camera_param"] = _cam_param instance_list.append(instance_info) - if self.data_mode == 'bottomup': + if self.data_mode == "bottomup": for idx, imgname in enumerate(img_names): img_info = self.get_img_info(idx, imgname) image_list.append(img_info) @@ -380,14 +359,14 @@ class BaseMocapDataset(BaseDataset): """Check a data info is an instance with valid bbox and keypoint annotations.""" # crowd annotation - if 'iscrowd' in 
data_info and data_info['iscrowd']: + if "iscrowd" in data_info and data_info["iscrowd"]: return False # invalid keypoints - if 'num_keypoints' in data_info and data_info['num_keypoints'] == 0: + if "num_keypoints" in data_info and data_info["num_keypoints"] == 0: return False # invalid keypoints - if 'keypoints' in data_info: - if np.max(data_info['keypoints']) <= 0: + if "keypoints" in data_info: + if np.max(data_info["keypoints"]) <= 0: return False return True @@ -398,8 +377,7 @@ class BaseMocapDataset(BaseDataset): return data_list_tp - def _get_bottomup_data_infos(self, instance_list: List[Dict], - image_list: List[Dict]) -> List[Dict]: + def _get_bottomup_data_infos(self, instance_list: List[Dict], image_list: List[Dict]) -> List[Dict]: """Organize the data list in bottom-up mode.""" # bottom-up data list @@ -408,17 +386,16 @@ class BaseMocapDataset(BaseDataset): used_img_ids = set() # group instances by img_id - for img_ids, data_infos in groupby(instance_list, - lambda x: x['img_ids']): + for img_ids, data_infos in groupby(instance_list, lambda x: x["img_ids"]): for img_id in img_ids: used_img_ids.add(img_id) data_infos = list(data_infos) # image data - img_paths = data_infos[0]['img_paths'] + img_paths = data_infos[0]["img_paths"] data_info_bu = { - 'img_ids': img_ids, - 'img_paths': img_paths, + "img_ids": img_ids, + "img_paths": img_paths, } for key in data_infos[0].keys(): @@ -431,22 +408,21 @@ class BaseMocapDataset(BaseDataset): # The segmentation annotation of invalid objects will be used # to generate valid region mask in the pipeline. invalid_segs = [] - for data_info_invalid in filterfalse(self._is_valid_instance, - data_infos): - if 'segmentation' in data_info_invalid: - invalid_segs.append(data_info_invalid['segmentation']) - data_info_bu['invalid_segs'] = invalid_segs + for data_info_invalid in filterfalse(self._is_valid_instance, data_infos): + if "segmentation" in data_info_invalid: + invalid_segs.append(data_info_invalid["segmentation"]) + data_info_bu["invalid_segs"] = invalid_segs data_list_bu.append(data_info_bu) # add images without instance for evaluation if self.test_mode: for img_info in image_list: - if img_info['img_id'] not in used_img_ids: + if img_info["img_id"] not in used_img_ids: data_info_bu = { - 'img_ids': [img_info['img_id']], - 'img_path': [img_info['img_path']], - 'id': list(), + "img_ids": [img_info["img_id"]], + "img_path": [img_info["img_path"]], + "id": list(), } data_list_bu.append(data_info_bu) diff --git a/mmpose/datasets/datasets/body/__init__.py b/mmpose/datasets/datasets/body/__init__.py index f2d29b9cd457501f5a2f2101088e304bc0cb096a..ca7e384ade1524a24483da1ebaa2325f1f053df5 100644 --- a/mmpose/datasets/datasets/body/__init__.py +++ b/mmpose/datasets/datasets/body/__init__.py @@ -15,8 +15,18 @@ from .posetrack18_dataset import PoseTrack18Dataset from .posetrack18_video_dataset import PoseTrack18VideoDataset __all__ = [ - 'CocoDataset', 'MpiiDataset', 'MpiiTrbDataset', 'AicDataset', - 'CrowdPoseDataset', 'OCHumanDataset', 'MhpDataset', 'PoseTrack18Dataset', - 'JhmdbDataset', 'PoseTrack18VideoDataset', 'HumanArtDataset', - 'HumanArt21Dataset', 'ExlposeDataset', 'CocoCropDataset' + "CocoDataset", + "MpiiDataset", + "MpiiTrbDataset", + "AicDataset", + "CrowdPoseDataset", + "OCHumanDataset", + "MhpDataset", + "PoseTrack18Dataset", + "JhmdbDataset", + "PoseTrack18VideoDataset", + "HumanArtDataset", + "HumanArt21Dataset", + "ExlposeDataset", + "CocoCropDataset", ] diff --git a/mmpose/datasets/datasets/body/aic_dataset.py 
b/mmpose/datasets/datasets/body/aic_dataset.py index b9c7cccc76fb47b53cd73f3152878e051b442199..027dc97c17fbc88e9e5d0e3d4ec8869d7258f2e2 100644 --- a/mmpose/datasets/datasets/body/aic_dataset.py +++ b/mmpose/datasets/datasets/body/aic_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -67,4 +68,4 @@ class AicDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/aic.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/aic.py") diff --git a/mmpose/datasets/datasets/body/coco_dataset.py b/mmpose/datasets/datasets/body/coco_dataset.py index 7cc971f91f70ba28de1b9ae520d10a2f491eb32b..c7585171092925267ddf426ba6c182d5b34e9157 100644 --- a/mmpose/datasets/datasets/body/coco_dataset.py +++ b/mmpose/datasets/datasets/body/coco_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -69,4 +70,4 @@ class CocoDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/coco.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/coco.py") diff --git a/mmpose/datasets/datasets/body/cococrop_dataset.py b/mmpose/datasets/datasets/body/cococrop_dataset.py index d7e9dfc36c95a1a7d402269cb27e4754ceb3b825..f6786bd62e35826857c04c8f89b878a7d3cdf367 100644 --- a/mmpose/datasets/datasets/body/cococrop_dataset.py +++ b/mmpose/datasets/datasets/body/cococrop_dataset.py @@ -1,5 +1,6 @@ -# Copyright (c) OpenMMLab. All rights reserved. +# Copyright (c) Miroslav Purkrabek, ProbPose. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -69,4 +70,4 @@ class CocoCropDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/coco_crop.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/coco_crop.py") diff --git a/mmpose/datasets/datasets/body/crowdpose_dataset.py b/mmpose/datasets/datasets/body/crowdpose_dataset.py index 4218708ff27b37dce7992d73695193442207b6d9..975563e74a037035f801313597641011eb0deb4b 100644 --- a/mmpose/datasets/datasets/body/crowdpose_dataset.py +++ b/mmpose/datasets/datasets/body/crowdpose_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -67,4 +68,4 @@ class CrowdPoseDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/crowdpose.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/crowdpose.py") diff --git a/mmpose/datasets/datasets/body/exlpose_dataset.py b/mmpose/datasets/datasets/body/exlpose_dataset.py index ad29f5d751ea9147d417188333a08ac793d5821e..8f8aafc876c42725ec806e7deeefb5038c194faa 100644 --- a/mmpose/datasets/datasets/body/exlpose_dataset.py +++ b/mmpose/datasets/datasets/body/exlpose_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -66,4 +67,4 @@ class ExlposeDataset(BaseCocoStyleDataset): image. Default: 1000. 
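Most body-dataset diffs in this stretch only touch the `METAINFO` line, which points at the config file that is expanded into joint names, flip pairs, and skeleton links. For orientation, a sketch of how such a dataset is typically built through the registry; the annotation and image paths are placeholders, and the config keys mirror the `__init__` signatures shown above:

```python
from mmpose.registry import DATASETS

cfg = dict(
    type="CocoDataset",                                     # registered name
    ann_file="annotations/person_keypoints_val2017.json",   # placeholder path
    data_root="data/coco/",                                 # placeholder path
    data_prefix=dict(img="val2017/"),
    data_mode="topdown",
    pipeline=[],
    test_mode=True,
)
dataset = DATASETS.build(cfg)
# METAINFO's `from_file` config is parsed into dataset.metainfo
print(dataset.metainfo["dataset_name"], len(dataset))
```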
""" - METAINFO: dict = dict(from_file='configs/_base_/datasets/exlpose.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/exlpose.py") diff --git a/mmpose/datasets/datasets/body/humanart21_dataset.py b/mmpose/datasets/datasets/body/humanart21_dataset.py index e4b5695261289d3a9ea3e5006cd95c7dd8ec6172..dda480dcfc8a163ca517690f26f17505d423df9b 100644 --- a/mmpose/datasets/datasets/body/humanart21_dataset.py +++ b/mmpose/datasets/datasets/body/humanart21_dataset.py @@ -5,6 +5,7 @@ from typing import Optional import numpy as np from mmpose.registry import DATASETS + from .humanart_dataset import HumanArtDataset @@ -79,7 +80,7 @@ class HumanArt21Dataset(HumanArtDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/humanart21.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/humanart21.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw COCO annotation of an instance. @@ -96,17 +97,17 @@ class HumanArt21Dataset(HumanArtDataset): dict | None: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] # filter invalid instance - if 'bbox' not in ann or 'keypoints' not in ann: + if "bbox" not in ann or "keypoints" not in ann: return None - img_w, img_h = img['width'], img['height'] + img_w, img_h = img["width"], img["height"] # get bbox in shape [1, 4], formatted as xywh - x, y, w, h = ann['bbox'] + x, y, w, h = ann["bbox"] x1 = np.clip(x, 0, img_w - 1) y1 = np.clip(y, 0, img_h - 1) x2 = np.clip(x + w, 0, img_w - 1) @@ -115,34 +116,33 @@ class HumanArt21Dataset(HumanArtDataset): bbox = np.array([x1, y1, x2, y2], dtype=np.float32).reshape(1, 4) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints_21'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints_21"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) - if 'num_keypoints' in ann: - num_keypoints = ann['num_keypoints'] + if "num_keypoints" in ann: + num_keypoints = ann["num_keypoints"] else: num_keypoints = np.count_nonzero(keypoints.max(axis=2)) data_info = { - 'img_id': ann['image_id'], - 'img_path': img['img_path'], - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann.get('iscrowd', 0), - 'segmentation': ann.get('segmentation', None), - 'id': ann['id'], - 'category_id': ann['category_id'], + "img_id": ann["image_id"], + "img_path": img["img_path"], + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann.get("iscrowd", 0), + "segmentation": ann.get("segmentation", None), + "id": ann["id"], + "category_id": ann["category_id"], # store the raw annotation of the instance # it is useful for evaluation without providing ann_file - 'raw_ann_info': copy.deepcopy(ann), + "raw_ann_info": copy.deepcopy(ann), } - if 'crowdIndex' in img: - data_info['crowd_index'] = img['crowdIndex'] + if "crowdIndex" in img: + data_info["crowd_index"] = img["crowdIndex"] return data_info diff --git a/mmpose/datasets/datasets/body/humanart_dataset.py b/mmpose/datasets/datasets/body/humanart_dataset.py index 
6f8aa2943d60ed668a2b93cc3d093f2ee929b6f1..c9ab5c96a6f6ffefea35c4abf0ad40676c935ad2 100644 --- a/mmpose/datasets/datasets/body/humanart_dataset.py +++ b/mmpose/datasets/datasets/body/humanart_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -70,4 +71,4 @@ class HumanArtDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/humanart.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/humanart.py") diff --git a/mmpose/datasets/datasets/body/jhmdb_dataset.py b/mmpose/datasets/datasets/body/jhmdb_dataset.py index 940a4cd4dc8f407cf483aeda2c4c02f48d32b92f..241452f1080dca7a8d4d9ea0d4e90bf98ee1ebd6 100644 --- a/mmpose/datasets/datasets/body/jhmdb_dataset.py +++ b/mmpose/datasets/datasets/body/jhmdb_dataset.py @@ -5,6 +5,7 @@ from typing import Optional import numpy as np from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -73,7 +74,7 @@ class JhmdbDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/jhmdb.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/jhmdb.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw COCO annotation of an instance. @@ -90,14 +91,14 @@ class JhmdbDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] - img_path = osp.join(self.data_prefix['img'], img['file_name']) - img_w, img_h = img['width'], img['height'] + img_path = osp.join(self.data_prefix["img"], img["file_name"]) + img_w, img_h = img["width"], img["height"] # get bbox in shape [1, 4], formatted as xywh - x, y, w, h = ann['bbox'] + x, y, w, h = ann["bbox"] # JHMDB uses matlab format, index is 1-based, # we should first convert to 0-based index x -= 1 @@ -110,8 +111,7 @@ class JhmdbDataset(BaseCocoStyleDataset): bbox = np.array([x1, y1, x2, y2], dtype=np.float32).reshape(1, 4) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) # JHMDB uses matlab format, index is 1-based, # we should first convert to 0-based index keypoints = _keypoints[..., :2] - 1 @@ -119,21 +119,21 @@ class JhmdbDataset(BaseCocoStyleDataset): num_keypoints = np.count_nonzero(keypoints.max(axis=2)) area = np.clip((x2 - x1) * (y2 - y1) * 0.53, a_min=1.0, a_max=None) - category_id = ann.get('category_id', [1] * len(keypoints)) + category_id = ann.get("category_id", [1] * len(keypoints)) data_info = { - 'img_id': ann['image_id'], - 'img_path': img_path, - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'area': np.array(area, dtype=np.float32), - 'iscrowd': ann.get('iscrowd', 0), - 'segmentation': ann.get('segmentation', None), - 'id': ann['id'], - 'category_id': category_id, + "img_id": ann["image_id"], + "img_path": img_path, + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "area": np.array(area, dtype=np.float32), + "iscrowd": ann.get("iscrowd", 0), + "segmentation": 
ann.get("segmentation", None), + "id": ann["id"], + "category_id": category_id, } return data_info diff --git a/mmpose/datasets/datasets/body/mhp_dataset.py b/mmpose/datasets/datasets/body/mhp_dataset.py index 55d33602536383898c8b65ca48994d33c1616bea..3ed707cf34187a4b7e41c70b40e7aaaf35edacca 100644 --- a/mmpose/datasets/datasets/body/mhp_dataset.py +++ b/mmpose/datasets/datasets/body/mhp_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -69,4 +70,4 @@ class MhpDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/mhp.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/mhp.py") diff --git a/mmpose/datasets/datasets/body/mpii_dataset.py b/mmpose/datasets/datasets/body/mpii_dataset.py index 904d94854b49ea7eda06155195b199d725906e9e..5e1df5efaac32f8f0b72f77ea29dd231e78e47cf 100644 --- a/mmpose/datasets/datasets/body/mpii_dataset.py +++ b/mmpose/datasets/datasets/body/mpii_dataset.py @@ -9,6 +9,7 @@ from scipy.io import loadmat from mmpose.registry import DATASETS from mmpose.structures.bbox import bbox_cs2xyxy + from ..base import BaseCocoStyleDataset @@ -80,43 +81,43 @@ class MpiiDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/mpii.py') - - def __init__(self, - ann_file: str = '', - bbox_file: Optional[str] = None, - headbox_file: Optional[str] = None, - data_mode: str = 'topdown', - metainfo: Optional[dict] = None, - data_root: Optional[str] = None, - data_prefix: dict = dict(img=''), - filter_cfg: Optional[dict] = None, - indices: Optional[Union[int, Sequence[int]]] = None, - serialize_data: bool = True, - pipeline: List[Union[dict, Callable]] = [], - test_mode: bool = False, - lazy_init: bool = False, - max_refetch: int = 1000): + METAINFO: dict = dict(from_file="configs/_base_/datasets/mpii.py") + + def __init__( + self, + ann_file: str = "", + bbox_file: Optional[str] = None, + headbox_file: Optional[str] = None, + data_mode: str = "topdown", + metainfo: Optional[dict] = None, + data_root: Optional[str] = None, + data_prefix: dict = dict(img=""), + filter_cfg: Optional[dict] = None, + indices: Optional[Union[int, Sequence[int]]] = None, + serialize_data: bool = True, + pipeline: List[Union[dict, Callable]] = [], + test_mode: bool = False, + lazy_init: bool = False, + max_refetch: int = 1000, + ): if headbox_file: - if data_mode != 'topdown': + if data_mode != "topdown": raise ValueError( - f'{self.__class__.__name__} is set to {data_mode}: ' - 'mode, while "headbox_file" is only ' - 'supported in topdown mode.') + f"{self.__class__.__name__} is set to {data_mode}: " 'mode, while "headbox_file" is only ' "supported in topdown mode." + ) if not test_mode: raise ValueError( - f'{self.__class__.__name__} has `test_mode==False` ' - 'while "headbox_file" is only ' - 'supported when `test_mode==True`.') + f"{self.__class__.__name__} has `test_mode==False` " 'while "headbox_file" is only ' "supported when `test_mode==True`." + ) headbox_file_type = headbox_file[-3:] - allow_headbox_file_type = ['mat'] + allow_headbox_file_type = ["mat"] if headbox_file_type not in allow_headbox_file_type: raise KeyError( - f'The head boxes file type {headbox_file_type} is not ' - f'supported. Should be `mat` but got {headbox_file_type}.') + f"The head boxes file type {headbox_file_type} is not " f"supported. Should be `mat` but got {headbox_file_type}." 
+ ) self.headbox_file = headbox_file super().__init__( @@ -132,26 +133,24 @@ class MpiiDataset(BaseCocoStyleDataset): pipeline=pipeline, test_mode=test_mode, lazy_init=lazy_init, - max_refetch=max_refetch) + max_refetch=max_refetch, + ) def _load_annotations(self) -> Tuple[List[dict], List[dict]]: """Load data from annotations in MPII format.""" - assert exists(self.ann_file), ( - f'Annotation file `{self.ann_file}` does not exist') + assert exists(self.ann_file), f"Annotation file `{self.ann_file}` does not exist" with get_local_path(self.ann_file) as local_path: with open(local_path) as anno_file: self.anns = json.load(anno_file) if self.headbox_file: - assert exists(self.headbox_file), ( - f'Headbox file `{self.headbox_file}` does not exist') + assert exists(self.headbox_file), f"Headbox file `{self.headbox_file}` does not exist" with get_local_path(self.headbox_file) as local_path: self.headbox_dict = loadmat(local_path) - headboxes_src = np.transpose(self.headbox_dict['headboxes_src'], - [2, 0, 1]) + headboxes_src = np.transpose(self.headbox_dict["headboxes_src"], [2, 0, 1]) SC_BIAS = 0.6 instance_list = [] @@ -160,16 +159,15 @@ class MpiiDataset(BaseCocoStyleDataset): ann_id = 0 # mpii bbox scales are normalized with factor 200. - pixel_std = 200. + pixel_std = 200.0 for idx, ann in enumerate(self.anns): - center = np.array(ann['center'], dtype=np.float32) - scale = np.array([ann['scale'], ann['scale']], - dtype=np.float32) * pixel_std + center = np.array(ann["center"], dtype=np.float32) + scale = np.array([ann["scale"], ann["scale"]], dtype=np.float32) * pixel_std # Adjust center/scale slightly to avoid cropping limbs if center[0] != -1: - center[1] = center[1] + 15. / pixel_std * scale[1] + center[1] = center[1] + 15.0 / pixel_std * scale[1] # MPII uses matlab format, index is 1-based, # we should first convert to 0-based index @@ -181,49 +179,50 @@ class MpiiDataset(BaseCocoStyleDataset): bbox = bbox_cs2xyxy(center, scale) # load keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - keypoints = np.array( - ann['joints'], dtype=np.float32).reshape(1, -1, 2) - keypoints_visible = np.array(ann['joints_vis']).reshape(1, -1) + keypoints = np.array(ann["joints"], dtype=np.float32).reshape(1, -1, 2) + keypoints_visible = np.array(ann["joints_vis"]).reshape(1, -1) x1, y1, x2, y2 = np.split(bbox, axis=1, indices_or_sections=4) area = np.clip((x2 - x1) * (y2 - y1) * 0.53, a_min=1.0, a_max=None) area = area[..., 0].astype(np.float32) - category_id = ann.get('category_id', [1] * len(bbox)) + category_id = ann.get("category_id", [1] * len(bbox)) - segmentation = ann.get('segmentation', None) + segmentation = ann.get("segmentation", None) instance_info = { - 'id': ann_id, - 'img_id': int(ann['image'].split('.')[0]), - 'img_path': osp.join(self.data_prefix['img'], ann['image']), - 'bbox_center': center, - 'bbox_scale': scale, - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'keypoints_visibility': keypoints_visible, - 'area': area, - 'category_id': category_id, + "id": ann_id, + "img_id": int(ann["image"].split(".")[0]), + "img_path": osp.join(self.data_prefix["img"], ann["image"]), + "bbox_center": center, + "bbox_scale": scale, + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "keypoints_visibility": keypoints_visible, + "area": area, + "category_id": category_id, } if segmentation is not None: - 
instance_info['segmentation'] = segmentation + instance_info["segmentation"] = segmentation if self.headbox_file: # calculate the diagonal length of head box as norm_factor headbox = headboxes_src[idx] head_size = np.linalg.norm(headbox[1] - headbox[0], axis=0) head_size *= SC_BIAS - instance_info['head_size'] = head_size.reshape(1, -1) - - if instance_info['img_id'] not in used_img_ids: - used_img_ids.add(instance_info['img_id']) - image_list.append({ - 'img_id': instance_info['img_id'], - 'img_path': instance_info['img_path'], - }) + instance_info["head_size"] = head_size.reshape(1, -1) + + if instance_info["img_id"] not in used_img_ids: + used_img_ids.add(instance_info["img_id"]) + image_list.append( + { + "img_id": instance_info["img_id"], + "img_path": instance_info["img_path"], + } + ) instance_list.append(instance_info) ann_id = ann_id + 1 diff --git a/mmpose/datasets/datasets/body/mpii_trb_dataset.py b/mmpose/datasets/datasets/body/mpii_trb_dataset.py index 36f76166a91ea35f512972cb26f0a62e9cf78b9d..e3690281ef55df7996411d7131ca76dfb0327863 100644 --- a/mmpose/datasets/datasets/body/mpii_trb_dataset.py +++ b/mmpose/datasets/datasets/body/mpii_trb_dataset.py @@ -8,6 +8,7 @@ from mmengine.fileio import exists, get_local_path from mmpose.registry import DATASETS from mmpose.structures.bbox import bbox_cs2xyxy + from ..base import BaseCocoStyleDataset @@ -101,71 +102,68 @@ class MpiiTrbDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/mpii_trb.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/mpii_trb.py") def _load_annotations(self) -> Tuple[List[dict], List[dict]]: """Load data from annotations in MPII-TRB format.""" - assert exists(self.ann_file), ( - f'Annotation file `{self.ann_file}` does not exist') + assert exists(self.ann_file), f"Annotation file `{self.ann_file}` does not exist" with get_local_path(self.ann_file) as local_path: with open(local_path) as anno_file: self.data = json.load(anno_file) - imgid2info = {img['id']: img for img in self.data['images']} + imgid2info = {img["id"]: img for img in self.data["images"]} instance_list = [] image_list = [] used_img_ids = set() # mpii-trb bbox scales are normalized with factor 200. - pixel_std = 200. 
+ pixel_std = 200.0 - for ann in self.data['annotations']: - img_id = ann['image_id'] + for ann in self.data["annotations"]: + img_id = ann["image_id"] # center, scale in shape [1, 2] and bbox in [1, 4] - center = np.array([ann['center']], dtype=np.float32) - scale = np.array([[ann['scale'], ann['scale']]], - dtype=np.float32) * pixel_std + center = np.array([ann["center"]], dtype=np.float32) + scale = np.array([[ann["scale"], ann["scale"]]], dtype=np.float32) * pixel_std bbox = bbox_cs2xyxy(center, scale) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) - img_path = osp.join(self.data_prefix['img'], - imgid2info[img_id]['file_name']) + img_path = osp.join(self.data_prefix["img"], imgid2info[img_id]["file_name"]) instance_info = { - 'id': ann['id'], - 'img_id': img_id, - 'img_path': img_path, - 'bbox_center': center, - 'bbox_scale': scale, - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': ann['num_joints'], - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], + "id": ann["id"], + "img_id": img_id, + "img_path": img_path, + "bbox_center": center, + "bbox_scale": scale, + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": ann["num_joints"], + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], } # val set - if 'headbox' in ann: - instance_info['headbox'] = np.array( - ann['headbox'], dtype=np.float32) + if "headbox" in ann: + instance_info["headbox"] = np.array(ann["headbox"], dtype=np.float32) instance_list.append(instance_info) - if instance_info['img_id'] not in used_img_ids: - used_img_ids.add(instance_info['img_id']) - image_list.append({ - 'img_id': instance_info['img_id'], - 'img_path': instance_info['img_path'], - }) - - instance_list = sorted(instance_list, key=lambda x: x['id']) + if instance_info["img_id"] not in used_img_ids: + used_img_ids.add(instance_info["img_id"]) + image_list.append( + { + "img_id": instance_info["img_id"], + "img_path": instance_info["img_path"], + } + ) + + instance_list = sorted(instance_list, key=lambda x: x["id"]) return instance_list, image_list diff --git a/mmpose/datasets/datasets/body/ochuman_dataset.py b/mmpose/datasets/datasets/body/ochuman_dataset.py index 695d090ea998dd530e0f65f902916107e77c4f6d..14b5e6cc9a9cceb529a6d3fcbdc098f043b3e4b2 100644 --- a/mmpose/datasets/datasets/body/ochuman_dataset.py +++ b/mmpose/datasets/datasets/body/ochuman_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -75,4 +76,4 @@ class OCHumanDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/ochuman.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/ochuman.py") diff --git a/mmpose/datasets/datasets/body/posetrack18_dataset.py b/mmpose/datasets/datasets/body/posetrack18_dataset.py index b8110c107f6869085ed795c8f1f0338d2c6ed21d..3b65ace6dd84533b6900203f4b628f6f59419c28 100644 --- a/mmpose/datasets/datasets/body/posetrack18_dataset.py +++ b/mmpose/datasets/datasets/body/posetrack18_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. 
from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -69,4 +70,4 @@ class PoseTrack18Dataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/posetrack18.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/posetrack18.py") diff --git a/mmpose/datasets/datasets/body/posetrack18_video_dataset.py b/mmpose/datasets/datasets/body/posetrack18_video_dataset.py index f862d9bc5aa039123633663fc2277b9a61c87fc8..cfde3f148f4c542f730f3064d70521903badae63 100644 --- a/mmpose/datasets/datasets/body/posetrack18_video_dataset.py +++ b/mmpose/datasets/datasets/body/posetrack18_video_dataset.py @@ -5,10 +5,11 @@ from typing import Callable, List, Optional, Sequence, Union import numpy as np from mmengine.fileio import exists, get_local_path, load from mmengine.utils import is_list_of -from xtcocotools.coco import COCO from mmpose.registry import DATASETS from mmpose.structures.bbox import bbox_xywh2xyxy +from xtcocotools.coco import COCO + from ..base import BaseCocoStyleDataset @@ -101,81 +102,74 @@ class PoseTrack18VideoDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/posetrack18.py') - - def __init__(self, - ann_file: str = '', - bbox_file: Optional[str] = None, - data_mode: str = 'topdown', - frame_weights: List[Union[int, float]] = [0.0, 1.0], - frame_sampler_mode: str = 'random', - frame_range: Optional[Union[int, List[int]]] = None, - num_sampled_frame: Optional[int] = None, - frame_indices: Optional[Sequence[int]] = None, - ph_fill_len: int = 6, - metainfo: Optional[dict] = None, - data_root: Optional[str] = None, - data_prefix: dict = dict(img=''), - filter_cfg: Optional[dict] = None, - indices: Optional[Union[int, Sequence[int]]] = None, - serialize_data: bool = True, - pipeline: List[Union[dict, Callable]] = [], - test_mode: bool = False, - lazy_init: bool = False, - max_refetch: int = 1000): - assert sum(frame_weights) == 1, 'Invalid `frame_weights`: should sum'\ - f' to 1.0, but got {frame_weights}.' + METAINFO: dict = dict(from_file="configs/_base_/datasets/posetrack18.py") + + def __init__( + self, + ann_file: str = "", + bbox_file: Optional[str] = None, + data_mode: str = "topdown", + frame_weights: List[Union[int, float]] = [0.0, 1.0], + frame_sampler_mode: str = "random", + frame_range: Optional[Union[int, List[int]]] = None, + num_sampled_frame: Optional[int] = None, + frame_indices: Optional[Sequence[int]] = None, + ph_fill_len: int = 6, + metainfo: Optional[dict] = None, + data_root: Optional[str] = None, + data_prefix: dict = dict(img=""), + filter_cfg: Optional[dict] = None, + indices: Optional[Union[int, Sequence[int]]] = None, + serialize_data: bool = True, + pipeline: List[Union[dict, Callable]] = [], + test_mode: bool = False, + lazy_init: bool = False, + max_refetch: int = 1000, + ): + assert sum(frame_weights) == 1, "Invalid `frame_weights`: should sum" f" to 1.0, but got {frame_weights}." for weight in frame_weights: - assert weight >= 0, 'frame_weight can not be a negative value.' + assert weight >= 0, "frame_weight can not be a negative value." self.frame_weights = np.array(frame_weights) - if frame_sampler_mode not in {'fixed', 'random'}: + if frame_sampler_mode not in {"fixed", "random"}: raise ValueError( - f'{self.__class__.__name__} got invalid frame_sampler_mode: ' - f'{frame_sampler_mode}. 
Should be `"fixed"` or `"random"`.') + f"{self.__class__.__name__} got invalid frame_sampler_mode: " f'{frame_sampler_mode}. Should be `"fixed"` or `"random"`.' + ) self.frame_sampler_mode = frame_sampler_mode - if frame_sampler_mode == 'random': - assert frame_range is not None, \ - '`frame_sampler_mode` is set as `random`, ' \ - 'please specify the `frame_range`.' + if frame_sampler_mode == "random": + assert frame_range is not None, "`frame_sampler_mode` is set as `random`, " "please specify the `frame_range`." if isinstance(frame_range, int): - assert frame_range >= 0, \ - 'frame_range can not be a negative value.' + assert frame_range >= 0, "frame_range can not be a negative value." self.frame_range = [-frame_range, frame_range] elif isinstance(frame_range, Sequence): - assert len(frame_range) == 2, 'The length must be 2.' - assert frame_range[0] <= 0 and frame_range[ - 1] >= 0 and frame_range[1] > frame_range[ - 0], 'Invalid `frame_range`' + assert len(frame_range) == 2, "The length must be 2." + assert frame_range[0] <= 0 and frame_range[1] >= 0 and frame_range[1] > frame_range[0], "Invalid `frame_range`" for i in frame_range: - assert isinstance(i, int), 'Each element must be int.' + assert isinstance(i, int), "Each element must be int." self.frame_range = frame_range else: - raise TypeError( - f'The type of `frame_range` must be int or Sequence, ' - f'but got {type(frame_range)}.') - - assert num_sampled_frame is not None, \ - '`frame_sampler_mode` is set as `random`, please specify ' \ - '`num_sampled_frame`, e.g. the number of sampled frames.' - - assert len(frame_weights) == num_sampled_frame + 1, \ - f'the length of frame_weights({len(frame_weights)}) '\ - f'does not match the number of sampled adjacent '\ - f'frames({num_sampled_frame})' + raise TypeError(f"The type of `frame_range` must be int or Sequence, " f"but got {type(frame_range)}.") + + assert num_sampled_frame is not None, ( + "`frame_sampler_mode` is set as `random`, please specify " "`num_sampled_frame`, e.g. the number of sampled frames." + ) + + assert len(frame_weights) == num_sampled_frame + 1, ( + f"the length of frame_weights({len(frame_weights)}) " + f"does not match the number of sampled adjacent " + f"frames({num_sampled_frame})" + ) self.frame_indices = None self.num_sampled_frame = num_sampled_frame - if frame_sampler_mode == 'fixed': - assert frame_indices is not None, \ - '`frame_sampler_mode` is set as `fixed`, ' \ - 'please specify the `frame_indices`.' - assert len(frame_weights) == len(frame_indices), \ - f'the length of frame_weights({len(frame_weights)}) does not '\ - f'match the length of frame_indices({len(frame_indices)}).' + if frame_sampler_mode == "fixed": + assert frame_indices is not None, "`frame_sampler_mode` is set as `fixed`, " "please specify the `frame_indices`." + assert len(frame_weights) == len(frame_indices), ( + f"the length of frame_weights({len(frame_weights)}) does not " f"match the length of frame_indices({len(frame_indices)})." + ) frame_indices.sort() self.frame_indices = frame_indices self.frame_range = None @@ -196,7 +190,8 @@ class PoseTrack18VideoDataset(BaseCocoStyleDataset): pipeline=pipeline, test_mode=test_mode, lazy_init=lazy_init, - max_refetch=max_refetch) + max_refetch=max_refetch, + ) def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw annotation of an instance. 
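As a reading aid for the `__init__` hunks above: the constructor only validates and normalizes the sampler configuration, while the actual index selection happens later in `parse_data_info`. Below is a minimal sketch of how the two modes resolve to support-frame offsets; `pick_support_offsets` is a hypothetical standalone helper, not part of this patch.

```python
import numpy as np

def pick_support_offsets(mode, frame_range=None, num_sampled_frame=None, frame_indices=None):
    # Mirrors the validated config above: "random" draws offsets uniformly
    # from [low, high]; "fixed" uses the user-supplied, sorted offsets.
    if mode == "random":
        low, high = frame_range  # an int r is normalized to [-r, r] in __init__
        return np.random.randint(low, high + 1, num_sampled_frame)
    return sorted(frame_indices)

# frame_weights carries one entry per frame: the center frame plus each support frame.
print(pick_support_offsets("random", frame_range=[-2, 2], num_sampled_frame=1))
```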
@@ -213,18 +208,17 @@ class PoseTrack18VideoDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] # filter invalid instance - if 'bbox' not in ann or 'keypoints' not in ann or max( - ann['keypoints']) == 0: + if "bbox" not in ann or "keypoints" not in ann or max(ann["keypoints"]) == 0: return None - img_w, img_h = img['width'], img['height'] + img_w, img_h = img["width"], img["height"] # get the bbox of the center frame # get bbox in shape [1, 4], formatted as xywh - x, y, w, h = ann['bbox'] + x, y, w, h = ann["bbox"] x1 = np.clip(x, 0, img_w - 1) y1 = np.clip(y, 0, img_h - 1) x2 = np.clip(x + w, 0, img_w - 1) @@ -234,27 +228,26 @@ class PoseTrack18VideoDataset(BaseCocoStyleDataset): # get the keypoints of the center frame # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) # deal with multiple image paths img_paths: list = [] # get the image path of the center frame - center_img_path = osp.join(self.data_prefix['img'], img['file_name']) + center_img_path = osp.join(self.data_prefix["img"], img["file_name"]) # append the center image path first img_paths.append(center_img_path) # select the frame indices - if self.frame_sampler_mode == 'fixed': + if self.frame_sampler_mode == "fixed": indices = self.frame_indices else: # self.frame_sampler_mode == 'random': low, high = self.frame_range indices = np.random.randint(low, high + 1, self.num_sampled_frame) - nframes = int(img['nframes']) - file_name = img['file_name'] + nframes = int(img["nframes"]) + file_name = img["file_name"] ref_idx = int(osp.splitext(osp.basename(file_name))[0]) for idx in indices: @@ -265,38 +258,34 @@ class PoseTrack18VideoDataset(BaseCocoStyleDataset): # clip the frame index to make sure that it does not exceed # the boundings of frame indices support_idx = np.clip(support_idx, 0, nframes - 1) - sup_img_path = osp.join( - osp.dirname(center_img_path), - str(support_idx).zfill(self.ph_fill_len) + '.jpg') + sup_img_path = osp.join(osp.dirname(center_img_path), str(support_idx).zfill(self.ph_fill_len) + ".jpg") img_paths.append(sup_img_path) data_info = { - 'img_id': int(img['frame_id']), - 'img_path': img_paths, - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': ann['num_keypoints'], - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'frame_weights': self.frame_weights, - 'id': ann['id'], + "img_id": int(img["frame_id"]), + "img_path": img_paths, + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": ann["num_keypoints"], + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "frame_weights": self.frame_weights, + "id": ann["id"], } return data_info def _load_detection_results(self) -> List[dict]: """Load data from detection results with dummy keypoint annotations.""" - assert exists(self.ann_file), ( - f'Annotation file `{self.ann_file}` does not exist') - assert exists( - self.bbox_file), (f'Bbox file `{self.bbox_file}` does not exist') + assert exists(self.ann_file), f"Annotation file `{self.ann_file}` does not exist" + assert exists(self.bbox_file), f"Bbox file `{self.bbox_file}` does not exist" # load detection 
results det_results = load(self.bbox_file) assert is_list_of(det_results, dict), ( - f'annotation file `{self.bbox_file}` should be a list of dicts, ' - f'but got type {type(det_results)}') + f"annotation file `{self.bbox_file}` should be a list of dicts, " f"but got type {type(det_results)}" + ) # load coco annotations to build image id-to-name index with get_local_path(self.ann_file) as local_path: @@ -307,61 +296,59 @@ class PoseTrack18VideoDataset(BaseCocoStyleDataset): # mapping image id to name id2name = {} for img_id, image in self.coco.imgs.items(): - file_name = image['file_name'] + file_name = image["file_name"] id2name[img_id] = file_name name2id[file_name] = img_id - num_keypoints = self.metainfo['num_keypoints'] + num_keypoints = self.metainfo["num_keypoints"] data_list = [] id_ = 0 for det in det_results: # remove non-human instances - if det['category_id'] != 1: + if det["category_id"] != 1: continue # get the predicted bbox and bbox_score - bbox_xywh = np.array( - det['bbox'][:4], dtype=np.float32).reshape(1, 4) + bbox_xywh = np.array(det["bbox"][:4], dtype=np.float32).reshape(1, 4) bbox = bbox_xywh2xyxy(bbox_xywh) - bbox_score = np.array(det['score'], dtype=np.float32).reshape(1) + bbox_score = np.array(det["score"], dtype=np.float32).reshape(1) # use dummy keypoint location and visibility keypoints = np.zeros((1, num_keypoints, 2), dtype=np.float32) keypoints_visible = np.ones((1, num_keypoints), dtype=np.float32) # deal with different bbox file formats - if 'nframes' in det: - nframes = int(det['nframes']) + if "nframes" in det: + nframes = int(det["nframes"]) else: - if 'image_name' in det: - img_id = name2id[det['image_name']] + if "image_name" in det: + img_id = name2id[det["image_name"]] else: - img_id = det['image_id'] + img_id = det["image_id"] img_ann = self.coco.loadImgs(img_id)[0] - nframes = int(img_ann['nframes']) + nframes = int(img_ann["nframes"]) # deal with multiple image paths img_paths: list = [] - if 'image_name' in det: - image_name = det['image_name'] + if "image_name" in det: + image_name = det["image_name"] else: - image_name = id2name[det['image_id']] + image_name = id2name[det["image_id"]] # get the image path of the center frame - center_img_path = osp.join(self.data_prefix['img'], image_name) + center_img_path = osp.join(self.data_prefix["img"], image_name) # append the center image path first img_paths.append(center_img_path) # "images/val/012834_mpii_test/000000.jpg" -->> "000000.jpg" - center_image_name = image_name.split('/')[-1] - ref_idx = int(center_image_name.replace('.jpg', '')) + center_image_name = image_name.split("/")[-1] + ref_idx = int(center_image_name.replace(".jpg", "")) # select the frame indices - if self.frame_sampler_mode == 'fixed': + if self.frame_sampler_mode == "fixed": indices = self.frame_indices else: # self.frame_sampler_mode == 'random': low, high = self.frame_range - indices = np.random.randint(low, high + 1, - self.num_sampled_frame) + indices = np.random.randint(low, high + 1, self.num_sampled_frame) for idx in indices: if self.test_mode and idx == 0: @@ -371,22 +358,22 @@ class PoseTrack18VideoDataset(BaseCocoStyleDataset): # clip the frame index to make sure that it does not exceed # the boundings of frame indices support_idx = np.clip(support_idx, 0, nframes - 1) - sup_img_path = center_img_path.replace( - center_image_name, - str(support_idx).zfill(self.ph_fill_len) + '.jpg') + sup_img_path = center_img_path.replace(center_image_name, str(support_idx).zfill(self.ph_fill_len) + ".jpg") 
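The path arithmetic in the hunks above is easy to miss in diff form: the support-frame index is the center-frame index plus a sampled offset, clamped to the video length, then zero-padded back into a filename. A small self-contained sketch of that logic (the function name is hypothetical):

```python
import os.path as osp
import numpy as np

def support_frame_path(center_img_path, ref_idx, offset, nframes, ph_fill_len=6):
    # Clamp so the support frame never falls outside [0, nframes - 1],
    # then rebuild the zero-padded frame filename next to the center frame.
    support_idx = int(np.clip(ref_idx + offset, 0, nframes - 1))
    return osp.join(osp.dirname(center_img_path), str(support_idx).zfill(ph_fill_len) + ".jpg")

# -> "images/val/012834_mpii_test/000003.jpg"
print(support_frame_path("images/val/012834_mpii_test/000001.jpg", 1, 2, 100))
```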
img_paths.append(sup_img_path) - data_list.append({ - 'img_id': det['image_id'], - 'img_path': img_paths, - 'frame_weights': self.frame_weights, - 'bbox': bbox, - 'bbox_score': bbox_score, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'id': id_, - }) + data_list.append( + { + "img_id": det["image_id"], + "img_path": img_paths, + "frame_weights": self.frame_weights, + "bbox": bbox, + "bbox_score": bbox_score, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "id": id_, + } + ) id_ += 1 diff --git a/mmpose/datasets/datasets/body3d/__init__.py b/mmpose/datasets/datasets/body3d/__init__.py index d5afeca578a7c937cfcfe89302e62d03dcaab05d..6ecd24d6f29d59e856dea7b614db97e851f45a37 100644 --- a/mmpose/datasets/datasets/body3d/__init__.py +++ b/mmpose/datasets/datasets/body3d/__init__.py @@ -1,4 +1,4 @@ # Copyright (c) OpenMMLab. All rights reserved. from .h36m_dataset import Human36mDataset -__all__ = ['Human36mDataset'] +__all__ = ["Human36mDataset"] diff --git a/mmpose/datasets/datasets/body3d/h36m_dataset.py b/mmpose/datasets/datasets/body3d/h36m_dataset.py index 397738c2769731cdbde612a214522e32b2721e3c..b8c3f110940b1bd43b2a7f92e6b5a98a3f8ecdb5 100644 --- a/mmpose/datasets/datasets/body3d/h36m_dataset.py +++ b/mmpose/datasets/datasets/body3d/h36m_dataset.py @@ -104,44 +104,45 @@ class Human36mDataset(BaseMocapDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/h36m.py') - SUPPORTED_keypoint_2d_src = {'gt', 'detection', 'pipeline'} - - def __init__(self, - ann_file: str = '', - seq_len: int = 1, - seq_step: int = 1, - multiple_target: int = 0, - multiple_target_step: int = 0, - pad_video_seq: bool = False, - causal: bool = True, - subset_frac: float = 1.0, - keypoint_2d_src: str = 'gt', - keypoint_2d_det_file: Optional[str] = None, - factor_file: Optional[str] = None, - camera_param_file: Optional[str] = None, - data_mode: str = 'topdown', - metainfo: Optional[dict] = None, - data_root: Optional[str] = None, - data_prefix: dict = dict(img=''), - filter_cfg: Optional[dict] = None, - indices: Optional[Union[int, Sequence[int]]] = None, - serialize_data: bool = True, - pipeline: List[Union[dict, Callable]] = [], - test_mode: bool = False, - lazy_init: bool = False, - max_refetch: int = 1000): + METAINFO: dict = dict(from_file="configs/_base_/datasets/h36m.py") + SUPPORTED_keypoint_2d_src = {"gt", "detection", "pipeline"} + + def __init__( + self, + ann_file: str = "", + seq_len: int = 1, + seq_step: int = 1, + multiple_target: int = 0, + multiple_target_step: int = 0, + pad_video_seq: bool = False, + causal: bool = True, + subset_frac: float = 1.0, + keypoint_2d_src: str = "gt", + keypoint_2d_det_file: Optional[str] = None, + factor_file: Optional[str] = None, + camera_param_file: Optional[str] = None, + data_mode: str = "topdown", + metainfo: Optional[dict] = None, + data_root: Optional[str] = None, + data_prefix: dict = dict(img=""), + filter_cfg: Optional[dict] = None, + indices: Optional[Union[int, Sequence[int]]] = None, + serialize_data: bool = True, + pipeline: List[Union[dict, Callable]] = [], + test_mode: bool = False, + lazy_init: bool = False, + max_refetch: int = 1000, + ): # check keypoint_2d_src self.keypoint_2d_src = keypoint_2d_src if self.keypoint_2d_src not in self.SUPPORTED_keypoint_2d_src: raise ValueError( - f'Unsupported `keypoint_2d_src` "{self.keypoint_2d_src}". ' - f'Supported options are {self.SUPPORTED_keypoint_2d_src}') + f'Unsupported `keypoint_2d_src` "{self.keypoint_2d_src}". 
' f"Supported options are {self.SUPPORTED_keypoint_2d_src}" + ) if keypoint_2d_det_file: if not is_abs(keypoint_2d_det_file): - self.keypoint_2d_det_file = osp.join(data_root, - keypoint_2d_det_file) + self.keypoint_2d_det_file = osp.join(data_root, keypoint_2d_det_file) else: self.keypoint_2d_det_file = keypoint_2d_det_file @@ -151,8 +152,7 @@ class Human36mDataset(BaseMocapDataset): if factor_file: if not is_abs(factor_file): factor_file = osp.join(data_root, factor_file) - assert exists(factor_file), (f'`factor_file`: {factor_file}' - 'does not exist.') + assert exists(factor_file), f"`factor_file`: {factor_file}" "does not exist." self.factor_file = factor_file if multiple_target > 0 and multiple_target_step == 0: @@ -176,14 +176,15 @@ class Human36mDataset(BaseMocapDataset): pipeline=pipeline, test_mode=test_mode, lazy_init=lazy_init, - max_refetch=max_refetch) + max_refetch=max_refetch, + ) def get_sequence_indices(self) -> List[List[int]]: """Split original videos into sequences and build frame indices. This method overrides the default one in the base class. """ - imgnames = self.ann_data['imgname'] + imgnames = self.ann_data["imgname"] video_frames = defaultdict(list) for idx, imgname in enumerate(imgnames): subj, action, camera = self._parse_h36m_imgname(imgname) @@ -197,11 +198,9 @@ class Human36mDataset(BaseMocapDataset): if self.multiple_target: for _, _indices in sorted(video_frames.items()): n_frame = len(_indices) - seqs_from_video = [ - _indices[i:(i + self.multiple_target):_step] - for i in range(0, n_frame, self.multiple_target_step) - ][:(n_frame + self.multiple_target_step - - self.multiple_target) // self.multiple_target_step] + seqs_from_video = [_indices[i : (i + self.multiple_target) : _step] for i in range(0, n_frame, self.multiple_target_step)][ + : (n_frame + self.multiple_target_step - self.multiple_target) // self.multiple_target_step + ] sequence_indices.extend(seqs_from_video) else: @@ -219,19 +218,12 @@ class Human36mDataset(BaseMocapDataset): frames_right = frames_left for i in range(n_frame): pad_left = max(0, frames_left - i // _step) - pad_right = max( - 0, frames_right - (n_frame - 1 - i) // _step) + pad_right = max(0, frames_right - (n_frame - 1 - i) // _step) start = max(i % _step, i - frames_left * _step) - end = min(n_frame - (n_frame - 1 - i) % _step, - i + frames_right * _step + 1) - sequence_indices.append([_indices[0]] * pad_left + - _indices[start:end:_step] + - [_indices[-1]] * pad_right) + end = min(n_frame - (n_frame - 1 - i) % _step, i + frames_right * _step + 1) + sequence_indices.append([_indices[0]] * pad_left + _indices[start:end:_step] + [_indices[-1]] * pad_right) else: - seqs_from_video = [ - _indices[i:(i + _len):_step] - for i in range(0, n_frame - _len + 1) - ] + seqs_from_video = [_indices[i : (i + _len) : _step] for i in range(0, n_frame - _len + 1)] sequence_indices.extend(seqs_from_video) # reduce dataset size if needed @@ -247,45 +239,38 @@ class Human36mDataset(BaseMocapDataset): instance_list, image_list = super()._load_annotations() h36m_data = self.ann_data - kpts_3d = h36m_data['S'] - - if self.keypoint_2d_src == 'detection': - assert exists(self.keypoint_2d_det_file), ( - f'`keypoint_2d_det_file`: `{self.keypoint_2d_det_file}`' - 'does not exist.') - kpts_2d = self._load_keypoint_2d_detection( - self.keypoint_2d_det_file) + kpts_3d = h36m_data["S"] + + if self.keypoint_2d_src == "detection": + assert exists(self.keypoint_2d_det_file), f"`keypoint_2d_det_file`: `{self.keypoint_2d_det_file}`" "does not exist." 
+ kpts_2d = self._load_keypoint_2d_detection(self.keypoint_2d_det_file) assert kpts_2d.shape[0] == kpts_3d.shape[0], ( - f'Number of `kpts_2d` ({kpts_2d.shape[0]}) does not match ' - f'number of `kpts_3d` ({kpts_3d.shape[0]}).') + f"Number of `kpts_2d` ({kpts_2d.shape[0]}) does not match " f"number of `kpts_3d` ({kpts_3d.shape[0]})." + ) assert kpts_2d.shape[2] == 3, ( - f'Expect `kpts_2d.shape[2]` == 3, but got ' - f'{kpts_2d.shape[2]}. Please check the format of ' - f'{self.keypoint_2d_det_file}') + f"Expect `kpts_2d.shape[2]` == 3, but got " + f"{kpts_2d.shape[2]}. Please check the format of " + f"{self.keypoint_2d_det_file}" + ) for idx, frame_ids in enumerate(self.sequence_indices): kpt_2d = kpts_2d[frame_ids].astype(np.float32) keypoints = kpt_2d[..., :2] keypoints_visible = kpt_2d[..., 2] - instance_list[idx].update({ - 'keypoints': - keypoints, - 'keypoints_visible': - keypoints_visible - }) + instance_list[idx].update({"keypoints": keypoints, "keypoints_visible": keypoints_visible}) if self.factor_file: with get_local_path(self.factor_file) as local_path: factors = np.load(local_path).astype(np.float32) else: - factors = np.zeros((kpts_3d.shape[0], ), dtype=np.float32) + factors = np.zeros((kpts_3d.shape[0],), dtype=np.float32) assert factors.shape[0] == kpts_3d.shape[0], ( - f'Number of `factors` ({factors.shape[0]}) does not match ' - f'number of `kpts_3d` ({kpts_3d.shape[0]}).') + f"Number of `factors` ({factors.shape[0]}) does not match " f"number of `kpts_3d` ({kpts_3d.shape[0]})." + ) for idx, frame_ids in enumerate(self.sequence_indices): factor = factors[frame_ids].astype(np.float32) - instance_list[idx].update({'factor': factor}) + instance_list[idx].update({"factor": factor}) return instance_list, image_list @@ -296,19 +281,19 @@ class Human36mDataset(BaseMocapDataset): A typical h36m image filename is like: S1_Directions_1.54138969_000001.jpg """ - subj, rest = osp.basename(imgname).split('_', 1) - action, rest = rest.split('.', 1) - camera, rest = rest.split('_', 1) + subj, rest = osp.basename(imgname).split("_", 1) + action, rest = rest.split(".", 1) + camera, rest = rest.split("_", 1) return subj, action, camera def get_camera_param(self, imgname) -> dict: """Get camera parameters of a frame by its image name.""" - assert hasattr(self, 'camera_param') + assert hasattr(self, "camera_param") subj, _, camera = self._parse_h36m_imgname(imgname) return self.camera_param[(subj, camera)] def _load_keypoint_2d_detection(self, det_file): - """"Load 2D joint detection results from file.""" + """Load 2D joint detection results from file.""" with get_local_path(det_file) as local_path: kpts_2d = np.load(local_path).astype(np.float32) diff --git a/mmpose/datasets/datasets/face/__init__.py b/mmpose/datasets/datasets/face/__init__.py index 1b78d87502f660342d7a9822070f6cd4b47eb3be..757aec20cd82c37f226e42c4b4f8717f7aa1c0b4 100644 --- a/mmpose/datasets/datasets/face/__init__.py +++ b/mmpose/datasets/datasets/face/__init__.py @@ -7,7 +7,4 @@ from .face_300wlp_dataset import Face300WLPDataset from .lapa_dataset import LapaDataset from .wflw_dataset import WFLWDataset -__all__ = [ - 'Face300WDataset', 'WFLWDataset', 'AFLWDataset', 'COFWDataset', - 'CocoWholeBodyFaceDataset', 'LapaDataset', 'Face300WLPDataset' -] +__all__ = ["Face300WDataset", "WFLWDataset", "AFLWDataset", "COFWDataset", "CocoWholeBodyFaceDataset", "LapaDataset", "Face300WLPDataset"] diff --git a/mmpose/datasets/datasets/face/aflw_dataset.py b/mmpose/datasets/datasets/face/aflw_dataset.py index 
deda0974bb58ba52371f727e788342b5502987a5..9be91381075c4b2c039e24d60d3ccb43d0ade38e 100644 --- a/mmpose/datasets/datasets/face/aflw_dataset.py +++ b/mmpose/datasets/datasets/face/aflw_dataset.py @@ -6,6 +6,7 @@ import numpy as np from mmpose.registry import DATASETS from mmpose.structures.bbox import bbox_cs2xyxy + from ..base import BaseCocoStyleDataset @@ -60,7 +61,7 @@ class AFLWDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/aflw.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/aflw.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw Face AFLW annotation of an instance. @@ -77,46 +78,43 @@ class AFLWDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] - img_path = osp.join(self.data_prefix['img'], img['file_name']) + img_path = osp.join(self.data_prefix["img"], img["file_name"]) # aflw bbox scales are normalized with factor 200. - pixel_std = 200. + pixel_std = 200.0 # center, scale in shape [1, 2] and bbox in [1, 4] - center = np.array([ann['center']], dtype=np.float32) - scale = np.array([[ann['scale'], ann['scale']]], - dtype=np.float32) * pixel_std + center = np.array([ann["center"]], dtype=np.float32) + scale = np.array([[ann["scale"], ann["scale"]]], dtype=np.float32) * pixel_std bbox = bbox_cs2xyxy(center, scale) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) - num_keypoints = ann['num_keypoints'] + num_keypoints = ann["num_keypoints"] data_info = { - 'img_id': ann['image_id'], - 'img_path': img_path, - 'bbox': bbox, - 'bbox_center': center, - 'bbox_scale': scale, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], - 'id': ann['id'], + "img_id": ann["image_id"], + "img_path": img_path, + "bbox": bbox, + "bbox_center": center, + "bbox_scale": scale, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], + "id": ann["id"], } if self.test_mode: # 'box_size' is used as normalization factor - assert 'box_size' in ann, '"box_size" is missing in annotation, '\ - 'which is required for evaluation.' - data_info['box_size'] = ann['box_size'] + assert "box_size" in ann, '"box_size" is missing in annotation, ' "which is required for evaluation." + data_info["box_size"] = ann["box_size"] return data_info diff --git a/mmpose/datasets/datasets/face/coco_wholebody_face_dataset.py b/mmpose/datasets/datasets/face/coco_wholebody_face_dataset.py index bc2c5be386012a341879a3910dcf72e5672e5d6f..89357acfd5d6cb2afe1d33ff2e4cdd36269652f3 100644 --- a/mmpose/datasets/datasets/face/coco_wholebody_face_dataset.py +++ b/mmpose/datasets/datasets/face/coco_wholebody_face_dataset.py @@ -5,6 +5,7 @@ from typing import Optional import numpy as np from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -56,8 +57,7 @@ class CocoWholeBodyFaceDataset(BaseCocoStyleDataset): image. Default: 1000. 
""" - METAINFO: dict = dict( - from_file='configs/_base_/datasets/coco_wholebody_face.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/coco_wholebody_face.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw CocoWholeBody Face annotation of an instance. @@ -74,18 +74,18 @@ class CocoWholeBodyFaceDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] # filter invalid instance - if not ann['face_valid'] or max(ann['face_kpts']) <= 0: + if not ann["face_valid"] or max(ann["face_kpts"]) <= 0: return None - img_path = osp.join(self.data_prefix['img'], img['file_name']) - img_w, img_h = img['width'], img['height'] + img_path = osp.join(self.data_prefix["img"], img["file_name"]) + img_w, img_h = img["width"], img["height"] # get bbox in shape [1, 4], formatted as xywh - x, y, w, h = ann['face_box'] + x, y, w, h = ann["face_box"] x1 = np.clip(x, 0, img_w - 1) y1 = np.clip(y, 0, img_h - 1) x2 = np.clip(x + w, 0, img_w - 1) @@ -94,22 +94,21 @@ class CocoWholeBodyFaceDataset(BaseCocoStyleDataset): bbox = np.array([x1, y1, x2, y2], dtype=np.float32).reshape(1, 4) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['face_kpts'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["face_kpts"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) num_keypoints = np.count_nonzero(keypoints.max(axis=2)) data_info = { - 'img_id': ann['image_id'], - 'img_path': img_path, - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], - 'id': ann['id'], + "img_id": ann["image_id"], + "img_path": img_path, + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], + "id": ann["id"], } return data_info diff --git a/mmpose/datasets/datasets/face/cofw_dataset.py b/mmpose/datasets/datasets/face/cofw_dataset.py index 5ec2a37efd8b7fc125ebd87df88bc9c99cd86250..da934a739a6b44d013aaf375317ae5c3d2f48345 100644 --- a/mmpose/datasets/datasets/face/cofw_dataset.py +++ b/mmpose/datasets/datasets/face/cofw_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -50,4 +51,4 @@ class COFWDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/cofw.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/cofw.py") diff --git a/mmpose/datasets/datasets/face/face_300w_dataset.py b/mmpose/datasets/datasets/face/face_300w_dataset.py index c70e892b4f707dc5990566b760e0a2566eb4a53f..c2304390af4358a64fb57b4bde8eec3dca2f1d5f 100644 --- a/mmpose/datasets/datasets/face/face_300w_dataset.py +++ b/mmpose/datasets/datasets/face/face_300w_dataset.py @@ -6,6 +6,7 @@ import numpy as np from mmpose.registry import DATASETS from mmpose.structures.bbox import bbox_cs2xyxy + from ..base import BaseCocoStyleDataset @@ -57,7 +58,7 @@ class Face300WDataset(BaseCocoStyleDataset): image. Default: 1000. 
""" - METAINFO: dict = dict(from_file='configs/_base_/datasets/300w.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/300w.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw Face300W annotation of an instance. @@ -74,39 +75,37 @@ class Face300WDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] - img_path = osp.join(self.data_prefix['img'], img['file_name']) + img_path = osp.join(self.data_prefix["img"], img["file_name"]) # 300w bbox scales are normalized with factor 200. - pixel_std = 200. + pixel_std = 200.0 # center, scale in shape [1, 2] and bbox in [1, 4] - center = np.array([ann['center']], dtype=np.float32) - scale = np.array([[ann['scale'], ann['scale']]], - dtype=np.float32) * pixel_std + center = np.array([ann["center"]], dtype=np.float32) + scale = np.array([[ann["scale"], ann["scale"]]], dtype=np.float32) * pixel_std bbox = bbox_cs2xyxy(center, scale) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) - num_keypoints = ann['num_keypoints'] + num_keypoints = ann["num_keypoints"] data_info = { - 'img_id': ann['image_id'], - 'img_path': img_path, - 'bbox': bbox, - 'bbox_center': center, - 'bbox_scale': scale, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], - 'id': ann['id'], + "img_id": ann["image_id"], + "img_path": img_path, + "bbox": bbox, + "bbox_center": center, + "bbox_scale": scale, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], + "id": ann["id"], } return data_info diff --git a/mmpose/datasets/datasets/face/face_300wlp_dataset.py b/mmpose/datasets/datasets/face/face_300wlp_dataset.py index 215df09a532146740eb60c822e77f438e04d100e..684e956f8935d3fbf2e6a5ae8baeb427b69b9ebf 100644 --- a/mmpose/datasets/datasets/face/face_300wlp_dataset.py +++ b/mmpose/datasets/datasets/face/face_300wlp_dataset.py @@ -1,6 +1,7 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -52,4 +53,4 @@ class Face300WLPDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/300wlp.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/300wlp.py") diff --git a/mmpose/datasets/datasets/face/lapa_dataset.py b/mmpose/datasets/datasets/face/lapa_dataset.py index 1a5bdc4ec08cebe690ae1f5f2a659e9c087634ec..45278f61693e02f093302b7f9ac87bb818670c94 100644 --- a/mmpose/datasets/datasets/face/lapa_dataset.py +++ b/mmpose/datasets/datasets/face/lapa_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -51,4 +52,4 @@ class LapaDataset(BaseCocoStyleDataset): image. Default: 1000. 
""" - METAINFO: dict = dict(from_file='configs/_base_/datasets/lapa.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/lapa.py") diff --git a/mmpose/datasets/datasets/face/wflw_dataset.py b/mmpose/datasets/datasets/face/wflw_dataset.py index 9c1c23053ce87fc92a234334e637e7a8e0402a9e..01b97ec5fc80c2d2470b5507e1005944b797cf98 100644 --- a/mmpose/datasets/datasets/face/wflw_dataset.py +++ b/mmpose/datasets/datasets/face/wflw_dataset.py @@ -6,6 +6,7 @@ import numpy as np from mmpose.registry import DATASETS from mmpose.structures.bbox import bbox_cs2xyxy + from ..base import BaseCocoStyleDataset @@ -57,7 +58,7 @@ class WFLWDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/wflw.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/wflw.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw Face WFLW annotation of an instance. @@ -74,39 +75,37 @@ class WFLWDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] - img_path = osp.join(self.data_prefix['img'], img['file_name']) + img_path = osp.join(self.data_prefix["img"], img["file_name"]) # wflw bbox scales are normalized with factor 200. - pixel_std = 200. + pixel_std = 200.0 # center, scale in shape [1, 2] and bbox in [1, 4] - center = np.array([ann['center']], dtype=np.float32) - scale = np.array([[ann['scale'], ann['scale']]], - dtype=np.float32) * pixel_std + center = np.array([ann["center"]], dtype=np.float32) + scale = np.array([[ann["scale"], ann["scale"]]], dtype=np.float32) * pixel_std bbox = bbox_cs2xyxy(center, scale) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) - num_keypoints = ann['num_keypoints'] + num_keypoints = ann["num_keypoints"] data_info = { - 'img_id': ann['image_id'], - 'img_path': img_path, - 'bbox': bbox, - 'bbox_center': center, - 'bbox_scale': scale, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], - 'id': ann['id'], + "img_id": ann["image_id"], + "img_path": img_path, + "bbox": bbox, + "bbox_center": center, + "bbox_scale": scale, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], + "id": ann["id"], } return data_info diff --git a/mmpose/datasets/datasets/fashion/__init__.py b/mmpose/datasets/datasets/fashion/__init__.py index 8be25dede3d16dfb7754c794d86d7f236e8f647b..4257bf8e38eef668b7fff72d24f67af267ec4bfd 100644 --- a/mmpose/datasets/datasets/fashion/__init__.py +++ b/mmpose/datasets/datasets/fashion/__init__.py @@ -2,4 +2,4 @@ from .deepfashion2_dataset import DeepFashion2Dataset from .deepfashion_dataset import DeepFashionDataset -__all__ = ['DeepFashionDataset', 'DeepFashion2Dataset'] +__all__ = ["DeepFashionDataset", "DeepFashion2Dataset"] diff --git a/mmpose/datasets/datasets/fashion/deepfashion2_dataset.py b/mmpose/datasets/datasets/fashion/deepfashion2_dataset.py index 
c3cde9bf97be254927aa6a06f46bdcc225f14283..84c5c73d985a9742124eba3149d244e9124df992 100644 --- a/mmpose/datasets/datasets/fashion/deepfashion2_dataset.py +++ b/mmpose/datasets/datasets/fashion/deepfashion2_dataset.py @@ -1,10 +1,11 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset -@DATASETS.register_module(name='DeepFashion2Dataset') +@DATASETS.register_module(name="DeepFashion2Dataset") class DeepFashion2Dataset(BaseCocoStyleDataset): """DeepFashion2 dataset for fashion landmark detection.""" - METAINFO: dict = dict(from_file='configs/_base_/datasets/deepfashion2.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/deepfashion2.py") diff --git a/mmpose/datasets/datasets/fashion/deepfashion_dataset.py b/mmpose/datasets/datasets/fashion/deepfashion_dataset.py index a0aa4937323e41333d48a82a11862e68ffc697f0..96e96ce1e25fe3520bedff37f07e7de4593a534f 100644 --- a/mmpose/datasets/datasets/fashion/deepfashion_dataset.py +++ b/mmpose/datasets/datasets/fashion/deepfashion_dataset.py @@ -2,6 +2,7 @@ from typing import Callable, List, Optional, Sequence, Union from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -82,21 +83,23 @@ class DeepFashionDataset(BaseCocoStyleDataset): image. Default: 1000. """ - def __init__(self, - ann_file: str = '', - subset: str = '', - bbox_file: Optional[str] = None, - data_mode: str = 'topdown', - metainfo: Optional[dict] = None, - data_root: Optional[str] = None, - data_prefix: dict = dict(img=''), - filter_cfg: Optional[dict] = None, - indices: Optional[Union[int, Sequence[int]]] = None, - serialize_data: bool = True, - pipeline: List[Union[dict, Callable]] = [], - test_mode: bool = False, - lazy_init: bool = False, - max_refetch: int = 1000): + def __init__( + self, + ann_file: str = "", + subset: str = "", + bbox_file: Optional[str] = None, + data_mode: str = "topdown", + metainfo: Optional[dict] = None, + data_root: Optional[str] = None, + data_prefix: dict = dict(img=""), + filter_cfg: Optional[dict] = None, + indices: Optional[Union[int, Sequence[int]]] = None, + serialize_data: bool = True, + pipeline: List[Union[dict, Callable]] = [], + test_mode: bool = False, + lazy_init: bool = False, + max_refetch: int = 1000, + ): self._check_subset_and_metainfo(subset) super().__init__( @@ -112,26 +115,22 @@ class DeepFashionDataset(BaseCocoStyleDataset): pipeline=pipeline, test_mode=test_mode, lazy_init=lazy_init, - max_refetch=max_refetch) + max_refetch=max_refetch, + ) @classmethod - def _check_subset_and_metainfo(cls, subset: str = '') -> None: + def _check_subset_and_metainfo(cls, subset: str = "") -> None: """Check the subset of body and set the corresponding metainfo. Args: subset(str): the subset of body: could be ``'full'``, ``'upper'`` or ``'lower'``. Default: '', which means ``'full'``. 
""" - if subset == '' or subset == 'full': - cls.METAINFO = dict( - from_file='configs/_base_/datasets/deepfashion_full.py') - elif subset == 'upper': - cls.METAINFO = dict( - from_file='configs/_base_/datasets/deepfashion_upper.py') - elif subset == 'lower': - cls.METAINFO = dict( - from_file='configs/_base_/datasets/deepfashion_lower.py') + if subset == "" or subset == "full": + cls.METAINFO = dict(from_file="configs/_base_/datasets/deepfashion_full.py") + elif subset == "upper": + cls.METAINFO = dict(from_file="configs/_base_/datasets/deepfashion_upper.py") + elif subset == "lower": + cls.METAINFO = dict(from_file="configs/_base_/datasets/deepfashion_lower.py") else: - raise ValueError( - f'{cls.__class__.__name__} got invalid subset: ' - f'{subset}. Should be "full", "lower" or "upper".') + raise ValueError(f"{cls.__class__.__name__} got invalid subset: " f'{subset}. Should be "full", "lower" or "upper".') diff --git a/mmpose/datasets/datasets/hand/__init__.py b/mmpose/datasets/datasets/hand/__init__.py index 72f9bc14f19a4499b9b098c4d5313acabb9e45ee..525337aeb4b970d4faf5b96210bd011ffec99ba2 100644 --- a/mmpose/datasets/datasets/hand/__init__.py +++ b/mmpose/datasets/datasets/hand/__init__.py @@ -7,6 +7,10 @@ from .panoptic_hand2d_dataset import PanopticHand2DDataset from .rhd2d_dataset import Rhd2DDataset __all__ = [ - 'OneHand10KDataset', 'FreiHandDataset', 'PanopticHand2DDataset', - 'Rhd2DDataset', 'CocoWholeBodyHandDataset', 'InterHand2DDoubleDataset' + "OneHand10KDataset", + "FreiHandDataset", + "PanopticHand2DDataset", + "Rhd2DDataset", + "CocoWholeBodyHandDataset", + "InterHand2DDoubleDataset", ] diff --git a/mmpose/datasets/datasets/hand/coco_wholebody_hand_dataset.py b/mmpose/datasets/datasets/hand/coco_wholebody_hand_dataset.py index 15ac669d40b012a0d19cbb5d2931b40709199a50..9ab9a104cb5d9ae23877eb3da832b16fab031096 100644 --- a/mmpose/datasets/datasets/hand/coco_wholebody_hand_dataset.py +++ b/mmpose/datasets/datasets/hand/coco_wholebody_hand_dataset.py @@ -4,10 +4,11 @@ from typing import List, Tuple import numpy as np from mmengine.fileio import exists, get_local_path -from xtcocotools.coco import COCO from mmpose.registry import DATASETS from mmpose.structures.bbox import bbox_xywh2xyxy +from xtcocotools.coco import COCO + from ..base import BaseCocoStyleDataset @@ -81,14 +82,12 @@ class CocoWholeBodyHandDataset(BaseCocoStyleDataset): image. Default: 1000. 
""" - METAINFO: dict = dict( - from_file='configs/_base_/datasets/coco_wholebody_hand.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/coco_wholebody_hand.py") def _load_annotations(self) -> Tuple[List[dict], List[dict]]: """Load data from annotations in COCO format.""" - assert exists(self.ann_file), ( - f'Annotation file `{self.ann_file}` does not exist') + assert exists(self.ann_file), f"Annotation file `{self.ann_file}` does not exist" with get_local_path(self.ann_file) as local_path: self.coco = COCO(local_path) @@ -99,51 +98,46 @@ class CocoWholeBodyHandDataset(BaseCocoStyleDataset): for img_id in self.coco.getImgIds(): img = self.coco.loadImgs(img_id)[0] - img.update({ - 'img_id': - img_id, - 'img_path': - osp.join(self.data_prefix['img'], img['file_name']), - }) + img.update( + { + "img_id": img_id, + "img_path": osp.join(self.data_prefix["img"], img["file_name"]), + } + ) image_list.append(img) ann_ids = self.coco.getAnnIds(imgIds=img_id, iscrowd=False) anns = self.coco.loadAnns(ann_ids) for ann in anns: - for type in ['left', 'right']: + for type in ["left", "right"]: # filter invalid hand annotations, there might be two # valid instances (left and right hand) in one image - if ann[f'{type}hand_valid'] and max( - ann[f'{type}hand_kpts']) > 0: + if ann[f"{type}hand_valid"] and max(ann[f"{type}hand_kpts"]) > 0: - bbox_xywh = np.array( - ann[f'{type}hand_box'], - dtype=np.float32).reshape(1, 4) + bbox_xywh = np.array(ann[f"{type}hand_box"], dtype=np.float32).reshape(1, 4) bbox = bbox_xywh2xyxy(bbox_xywh) - _keypoints = np.array( - ann[f'{type}hand_kpts'], - dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann[f"{type}hand_kpts"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) num_keypoints = np.count_nonzero(keypoints.max(axis=2)) instance_info = { - 'img_id': ann['image_id'], - 'img_path': img['img_path'], - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], - 'segmentation': ann['segmentation'], - 'id': id, + "img_id": ann["image_id"], + "img_path": img["img_path"], + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], + "segmentation": ann["segmentation"], + "id": id, } instance_list.append(instance_info) id = id + 1 - instance_list = sorted(instance_list, key=lambda x: x['id']) + instance_list = sorted(instance_list, key=lambda x: x["id"]) return instance_list, image_list diff --git a/mmpose/datasets/datasets/hand/freihand_dataset.py b/mmpose/datasets/datasets/hand/freihand_dataset.py index 8f0e23cdd577d12e6d20656fde59f7da58a45150..ebee047942bedbba151440dc69c4137df9291d1c 100644 --- a/mmpose/datasets/datasets/hand/freihand_dataset.py +++ b/mmpose/datasets/datasets/hand/freihand_dataset.py @@ -5,6 +5,7 @@ from typing import Optional import numpy as np from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -79,7 +80,7 @@ class FreiHandDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/freihand2d.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/freihand2d.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw COCO annotation of an instance. 
@@ -96,33 +97,32 @@ class FreiHandDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] - img_path = osp.join(self.data_prefix['img'], img['file_name']) + img_path = osp.join(self.data_prefix["img"], img["file_name"]) # use the entire image which is 224x224 bbox = np.array([0, 0, 224, 224], dtype=np.float32).reshape(1, 4) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) num_keypoints = np.count_nonzero(keypoints.max(axis=2)) data_info = { - 'img_id': ann['image_id'], - 'img_path': img_path, - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], - 'segmentation': ann['segmentation'], - 'id': ann['id'], + "img_id": ann["image_id"], + "img_path": img_path, + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], + "segmentation": ann["segmentation"], + "id": ann["id"], } return data_info diff --git a/mmpose/datasets/datasets/hand/interhand2d_double_dataset.py b/mmpose/datasets/datasets/hand/interhand2d_double_dataset.py index e8841e6f54e93c90758e3e9133db8cc24994d134..bdfc1b7f02376675fc3bf9fe658622984fd9b8fd 100644 --- a/mmpose/datasets/datasets/hand/interhand2d_double_dataset.py +++ b/mmpose/datasets/datasets/hand/interhand2d_double_dataset.py @@ -7,12 +7,12 @@ from typing import Callable, List, Optional, Sequence, Tuple, Union import numpy as np from mmengine.fileio import exists, get_local_path from mmengine.utils import is_abs -from xtcocotools.coco import COCO from mmpose.codecs.utils import camera_to_pixel from mmpose.datasets.datasets import BaseCocoStyleDataset from mmpose.registry import DATASETS from mmpose.structures.bbox import bbox_xywh2xyxy +from xtcocotools.coco import COCO @DATASETS.register_module() @@ -113,42 +113,44 @@ class InterHand2DDoubleDataset(BaseCocoStyleDataset): Default: 1. 
""" - METAINFO: dict = dict(from_file='configs/_base_/datasets/interhand3d.py') - - def __init__(self, - ann_file: str = '', - camera_param_file: str = '', - joint_file: str = '', - use_gt_root_depth: bool = True, - rootnet_result_file: Optional[str] = None, - data_mode: str = 'topdown', - metainfo: Optional[dict] = None, - data_root: Optional[str] = None, - data_prefix: dict = dict(img=''), - filter_cfg: Optional[dict] = None, - indices: Optional[Union[int, Sequence[int]]] = None, - serialize_data: bool = True, - pipeline: List[Union[dict, Callable]] = [], - test_mode: bool = False, - lazy_init: bool = False, - max_refetch: int = 1000, - sample_interval: int = 1): + METAINFO: dict = dict(from_file="configs/_base_/datasets/interhand3d.py") + + def __init__( + self, + ann_file: str = "", + camera_param_file: str = "", + joint_file: str = "", + use_gt_root_depth: bool = True, + rootnet_result_file: Optional[str] = None, + data_mode: str = "topdown", + metainfo: Optional[dict] = None, + data_root: Optional[str] = None, + data_prefix: dict = dict(img=""), + filter_cfg: Optional[dict] = None, + indices: Optional[Union[int, Sequence[int]]] = None, + serialize_data: bool = True, + pipeline: List[Union[dict, Callable]] = [], + test_mode: bool = False, + lazy_init: bool = False, + max_refetch: int = 1000, + sample_interval: int = 1, + ): _ann_file = ann_file if data_root is not None and not is_abs(_ann_file): _ann_file = osp.join(data_root, _ann_file) - assert exists(_ann_file), 'Annotation file does not exist.' + assert exists(_ann_file), "Annotation file does not exist." self.ann_file = _ann_file _camera_param_file = camera_param_file if data_root is not None and not is_abs(_camera_param_file): _camera_param_file = osp.join(data_root, _camera_param_file) - assert exists(_camera_param_file), 'Camera file does not exist.' + assert exists(_camera_param_file), "Camera file does not exist." self.camera_param_file = _camera_param_file _joint_file = joint_file if data_root is not None and not is_abs(_joint_file): _joint_file = osp.join(data_root, _joint_file) - assert exists(_joint_file), 'Joint file does not exist.' + assert exists(_joint_file), "Joint file does not exist." self.joint_file = _joint_file self.use_gt_root_depth = use_gt_root_depth @@ -156,10 +158,8 @@ class InterHand2DDoubleDataset(BaseCocoStyleDataset): assert rootnet_result_file is not None _rootnet_result_file = rootnet_result_file if data_root is not None and not is_abs(_rootnet_result_file): - _rootnet_result_file = osp.join(data_root, - _rootnet_result_file) - assert exists( - _rootnet_result_file), 'Rootnet result file does not exist.' + _rootnet_result_file = osp.join(data_root, _rootnet_result_file) + assert exists(_rootnet_result_file), "Rootnet result file does not exist." self.rootnet_result_file = _rootnet_result_file super().__init__( @@ -175,26 +175,26 @@ class InterHand2DDoubleDataset(BaseCocoStyleDataset): test_mode=test_mode, lazy_init=lazy_init, max_refetch=max_refetch, - sample_interval=sample_interval) + sample_interval=sample_interval, + ) def _load_annotations(self) -> Tuple[List[dict], List[dict]]: """Load data from annotations in COCO format.""" - assert exists(self.ann_file), 'Annotation file does not exist' + assert exists(self.ann_file), "Annotation file does not exist" with get_local_path(self.ann_file) as local_path: self.coco = COCO(local_path) # set the metainfo about categories, which is a list of dict # and each dict contains the 'id', 'name', etc. 
about this category - if 'categories' in self.coco.dataset: - self._metainfo['CLASSES'] = self.coco.loadCats( - self.coco.getCatIds()) + if "categories" in self.coco.dataset: + self._metainfo["CLASSES"] = self.coco.loadCats(self.coco.getCatIds()) with get_local_path(self.camera_param_file) as local_path: - with open(local_path, 'r') as f: + with open(local_path, "r") as f: self.cameras = json.load(f) with get_local_path(self.joint_file) as local_path: - with open(local_path, 'r') as f: + with open(local_path, "r") as f: self.joints = json.load(f) instance_list = [] @@ -204,19 +204,18 @@ class InterHand2DDoubleDataset(BaseCocoStyleDataset): if idx % self.sample_interval != 0: continue img = self.coco.loadImgs(img_id)[0] - img.update({ - 'img_id': - img_id, - 'img_path': - osp.join(self.data_prefix['img'], img['file_name']), - }) + img.update( + { + "img_id": img_id, + "img_path": osp.join(self.data_prefix["img"], img["file_name"]), + } + ) image_list.append(img) ann_ids = self.coco.getAnnIds(imgIds=img_id) ann = self.coco.loadAnns(ann_ids)[0] - instance_info = self.parse_data_info( - dict(raw_ann_info=ann, raw_img_info=img)) + instance_info = self.parse_data_info(dict(raw_ann_info=ann, raw_img_info=img)) # skip invalid instance annotation. if not instance_info: @@ -240,47 +239,36 @@ class InterHand2DDoubleDataset(BaseCocoStyleDataset): dict | None: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] if not self.use_gt_root_depth: rootnet_result = {} with get_local_path(self.rootnet_result_file) as local_path: rootnet_annot = json.load(local_path) for i in range(len(rootnet_annot)): - rootnet_result[str( - rootnet_annot[i]['annot_id'])] = rootnet_annot[i] - - num_keypoints = self.metainfo['num_keypoints'] - - capture_id = str(img['capture']) - camera_name = img['camera'] - frame_idx = str(img['frame_idx']) - camera_pos = np.array( - self.cameras[capture_id]['campos'][camera_name], dtype=np.float32) - camera_rot = np.array( - self.cameras[capture_id]['camrot'][camera_name], dtype=np.float32) - focal = np.array( - self.cameras[capture_id]['focal'][camera_name], dtype=np.float32) - principal_pt = np.array( - self.cameras[capture_id]['princpt'][camera_name], dtype=np.float32) - joint_world = np.array( - self.joints[capture_id][frame_idx]['world_coord'], - dtype=np.float32) - joint_valid = np.array(ann['joint_valid'], dtype=np.float32).flatten() - - keypoints_cam = np.dot( - camera_rot, - joint_world.transpose(1, 0) - - camera_pos.reshape(3, 1)).transpose(1, 0) + rootnet_result[str(rootnet_annot[i]["annot_id"])] = rootnet_annot[i] + + num_keypoints = self.metainfo["num_keypoints"] + + capture_id = str(img["capture"]) + camera_name = img["camera"] + frame_idx = str(img["frame_idx"]) + camera_pos = np.array(self.cameras[capture_id]["campos"][camera_name], dtype=np.float32) + camera_rot = np.array(self.cameras[capture_id]["camrot"][camera_name], dtype=np.float32) + focal = np.array(self.cameras[capture_id]["focal"][camera_name], dtype=np.float32) + principal_pt = np.array(self.cameras[capture_id]["princpt"][camera_name], dtype=np.float32) + joint_world = np.array(self.joints[capture_id][frame_idx]["world_coord"], dtype=np.float32) + joint_valid = np.array(ann["joint_valid"], dtype=np.float32).flatten() + + keypoints_cam = np.dot(camera_rot, joint_world.transpose(1, 0) - camera_pos.reshape(3, 1)).transpose(1, 0) if self.use_gt_root_depth: - bbox_xywh = 
np.array(ann['bbox'], dtype=np.float32).reshape(1, 4) + bbox_xywh = np.array(ann["bbox"], dtype=np.float32).reshape(1, 4) else: - rootnet_ann_data = rootnet_result[str(ann['id'])] - bbox_xywh = np.array( - rootnet_ann_data['bbox'], dtype=np.float32).reshape(1, 4) + rootnet_ann_data = rootnet_result[str(ann["id"])] + bbox_xywh = np.array(rootnet_ann_data["bbox"], dtype=np.float32).reshape(1, 4) bbox = bbox_xywh2xyxy(bbox_xywh) @@ -292,51 +280,40 @@ class InterHand2DDoubleDataset(BaseCocoStyleDataset): joint_valid[:20] *= joint_valid[20] joint_valid[21:] *= joint_valid[41] - joints_3d_visible = np.minimum(1, - joint_valid.reshape(-1, - 1)).reshape(1, -1) - keypoints_img = camera_to_pixel( - keypoints_cam, - focal[0], - focal[1], - principal_pt[0], - principal_pt[1], - shift=True)[..., :2] - joints_3d = np.zeros((keypoints_cam.shape[-2], 3), - dtype=np.float32).reshape(1, -1, 3) + joints_3d_visible = np.minimum(1, joint_valid.reshape(-1, 1)).reshape(1, -1) + keypoints_img = camera_to_pixel(keypoints_cam, focal[0], focal[1], principal_pt[0], principal_pt[1], shift=True)[..., :2] + joints_3d = np.zeros((keypoints_cam.shape[-2], 3), dtype=np.float32).reshape(1, -1, 3) joints_3d[..., :2] = keypoints_img - joints_3d[..., :21, - 2] = keypoints_cam[..., :21, 2] - keypoints_cam[..., 20, 2] - joints_3d[..., 21:, - 2] = keypoints_cam[..., 21:, 2] - keypoints_cam[..., 41, 2] + joints_3d[..., :21, 2] = keypoints_cam[..., :21, 2] - keypoints_cam[..., 20, 2] + joints_3d[..., 21:, 2] = keypoints_cam[..., 21:, 2] - keypoints_cam[..., 41, 2] data_info = { - 'img_id': ann['image_id'], - 'img_path': img['img_path'], - 'keypoints': joints_3d[:, :, :2], - 'keypoints_visible': joints_3d_visible, - 'hand_type': self.encode_handtype(ann['hand_type']), - 'hand_type_valid': np.array([ann['hand_type_valid']]), - 'dataset': self.metainfo['dataset_name'], - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'iscrowd': ann.get('iscrowd', False), - 'id': ann['id'], + "img_id": ann["image_id"], + "img_path": img["img_path"], + "keypoints": joints_3d[:, :, :2], + "keypoints_visible": joints_3d_visible, + "hand_type": self.encode_handtype(ann["hand_type"]), + "hand_type_valid": np.array([ann["hand_type_valid"]]), + "dataset": self.metainfo["dataset_name"], + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "iscrowd": ann.get("iscrowd", False), + "id": ann["id"], # store the raw annotation of the instance # it is useful for evaluation without providing ann_file - 'raw_ann_info': copy.deepcopy(ann), + "raw_ann_info": copy.deepcopy(ann), } return data_info @staticmethod def encode_handtype(hand_type): - if hand_type == 'right': + if hand_type == "right": return np.array([[1, 0]], dtype=np.float32) - elif hand_type == 'left': + elif hand_type == "left": return np.array([[0, 1]], dtype=np.float32) - elif hand_type == 'interacting': + elif hand_type == "interacting": return np.array([[1, 1]], dtype=np.float32) else: - assert 0, f'Not support hand type: {hand_type}' + assert 0, f"Unsupported hand type: {hand_type}" diff --git a/mmpose/datasets/datasets/hand/onehand10k_dataset.py b/mmpose/datasets/datasets/hand/onehand10k_dataset.py index 3519ace560ef70ce680955bfa82d52c1a11b6b3e..bc4308eebd31b7ca5047acdf8ade52bec4bd6a53 100644 --- a/mmpose/datasets/datasets/hand/onehand10k_dataset.py +++ b/mmpose/datasets/datasets/hand/onehand10k_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved.
from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -74,4 +75,4 @@ class OneHand10KDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/onehand10k.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/onehand10k.py") diff --git a/mmpose/datasets/datasets/hand/panoptic_hand2d_dataset.py b/mmpose/datasets/datasets/hand/panoptic_hand2d_dataset.py index 26d364840ebe5756687a72a4de52b0213ffdcea2..8935acab0756a44d9eca069005203b79997f7895 100644 --- a/mmpose/datasets/datasets/hand/panoptic_hand2d_dataset.py +++ b/mmpose/datasets/datasets/hand/panoptic_hand2d_dataset.py @@ -5,6 +5,7 @@ from typing import Optional import numpy as np from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -79,8 +80,7 @@ class PanopticHand2DDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict( - from_file='configs/_base_/datasets/panoptic_hand2d.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/panoptic_hand2d.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw COCO annotation of an instance. @@ -97,14 +97,14 @@ class PanopticHand2DDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] - img_path = osp.join(self.data_prefix['img'], img['file_name']) - img_w, img_h = img['width'], img['height'] + img_path = osp.join(self.data_prefix["img"], img["file_name"]) + img_w, img_h = img["width"], img["height"] # get bbox in shape [1, 4], formatted as xywh - x, y, w, h = ann['bbox'] + x, y, w, h = ann["bbox"] x1 = np.clip(x, 0, img_w - 1) y1 = np.clip(y, 0, img_h - 1) x2 = np.clip(x + w, 0, img_w - 1) @@ -113,25 +113,24 @@ class PanopticHand2DDataset(BaseCocoStyleDataset): bbox = np.array([x1, y1, x2, y2], dtype=np.float32).reshape(1, 4) # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] - _keypoints = np.array( - ann['keypoints'], dtype=np.float32).reshape(1, -1, 3) + _keypoints = np.array(ann["keypoints"], dtype=np.float32).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2]) num_keypoints = np.count_nonzero(keypoints.max(axis=2)) data_info = { - 'img_id': ann['image_id'], - 'img_path': img_path, - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], - 'segmentation': ann['segmentation'], - 'head_size': ann['head_size'], - 'id': ann['id'], + "img_id": ann["image_id"], + "img_path": img_path, + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], + "segmentation": ann["segmentation"], + "head_size": ann["head_size"], + "id": ann["id"], } return data_info diff --git a/mmpose/datasets/datasets/hand/rhd2d_dataset.py b/mmpose/datasets/datasets/hand/rhd2d_dataset.py index ebc4301590a5f8c8d474b0ef37de4d03309ad0b9..af22998b18551969e554fb46c646ff6860bd3742 100644 --- a/mmpose/datasets/datasets/hand/rhd2d_dataset.py +++ b/mmpose/datasets/datasets/hand/rhd2d_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. 
from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -74,4 +75,4 @@ class Rhd2DDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/rhd2d.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/rhd2d.py") diff --git a/mmpose/datasets/datasets/hand3d/__init__.py b/mmpose/datasets/datasets/hand3d/__init__.py index 20d4049ef81cd91f3594e221e9e685774b6a2032..454bf201bf16dfc1c4447bbbbe4f5fdb864eaef5 100644 --- a/mmpose/datasets/datasets/hand3d/__init__.py +++ b/mmpose/datasets/datasets/hand3d/__init__.py @@ -1,4 +1,4 @@ # Copyright (c) OpenMMLab. All rights reserved. from .interhand_3d_dataset import InterHand3DDataset -__all__ = ['InterHand3DDataset'] +__all__ = ["InterHand3DDataset"] diff --git a/mmpose/datasets/datasets/hand3d/interhand_3d_dataset.py b/mmpose/datasets/datasets/hand3d/interhand_3d_dataset.py index 13d0bd26b3801742ec442e5b0146fec42b774e26..0741ddcf260123d5075c4f8ad1df64dd42c5b84a 100644 --- a/mmpose/datasets/datasets/hand3d/interhand_3d_dataset.py +++ b/mmpose/datasets/datasets/hand3d/interhand_3d_dataset.py @@ -7,12 +7,12 @@ from typing import Callable, List, Optional, Sequence, Tuple, Union import numpy as np from mmengine.fileio import exists, get_local_path from mmengine.utils import is_abs -from xtcocotools.coco import COCO from mmpose.codecs.utils import camera_to_pixel from mmpose.datasets.datasets import BaseCocoStyleDataset from mmpose.registry import DATASETS from mmpose.structures.bbox import bbox_xywh2xyxy +from xtcocotools.coco import COCO @DATASETS.register_module() @@ -111,42 +111,44 @@ class InterHand3DDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/interhand3d.py') - - def __init__(self, - ann_file: str = '', - camera_param_file: str = '', - joint_file: str = '', - use_gt_root_depth: bool = True, - rootnet_result_file: Optional[str] = None, - data_mode: str = 'topdown', - metainfo: Optional[dict] = None, - data_root: Optional[str] = None, - data_prefix: dict = dict(img=''), - filter_cfg: Optional[dict] = None, - indices: Optional[Union[int, Sequence[int]]] = None, - serialize_data: bool = True, - pipeline: List[Union[dict, Callable]] = [], - test_mode: bool = False, - lazy_init: bool = False, - max_refetch: int = 1000): + METAINFO: dict = dict(from_file="configs/_base_/datasets/interhand3d.py") + + def __init__( + self, + ann_file: str = "", + camera_param_file: str = "", + joint_file: str = "", + use_gt_root_depth: bool = True, + rootnet_result_file: Optional[str] = None, + data_mode: str = "topdown", + metainfo: Optional[dict] = None, + data_root: Optional[str] = None, + data_prefix: dict = dict(img=""), + filter_cfg: Optional[dict] = None, + indices: Optional[Union[int, Sequence[int]]] = None, + serialize_data: bool = True, + pipeline: List[Union[dict, Callable]] = [], + test_mode: bool = False, + lazy_init: bool = False, + max_refetch: int = 1000, + ): _ann_file = ann_file if not is_abs(_ann_file): _ann_file = osp.join(data_root, _ann_file) - assert exists(_ann_file), 'Annotation file does not exist.' + assert exists(_ann_file), "Annotation file does not exist." self.ann_file = _ann_file _camera_param_file = camera_param_file if not is_abs(_camera_param_file): _camera_param_file = osp.join(data_root, _camera_param_file) - assert exists(_camera_param_file), 'Camera file does not exist.' + assert exists(_camera_param_file), "Camera file does not exist." 
self.camera_param_file = _camera_param_file _joint_file = joint_file if not is_abs(_joint_file): _joint_file = osp.join(data_root, _joint_file) - assert exists(_joint_file), 'Joint file does not exist.' + assert exists(_joint_file), "Joint file does not exist." self.joint_file = _joint_file self.use_gt_root_depth = use_gt_root_depth @@ -154,10 +156,8 @@ class InterHand3DDataset(BaseCocoStyleDataset): assert rootnet_result_file is not None _rootnet_result_file = rootnet_result_file if not is_abs(_rootnet_result_file): - _rootnet_result_file = osp.join(data_root, - _rootnet_result_file) - assert exists( - _rootnet_result_file), 'Rootnet result file does not exist.' + _rootnet_result_file = osp.join(data_root, _rootnet_result_file) + assert exists(_rootnet_result_file), "Rootnet result file does not exist." self.rootnet_result_file = _rootnet_result_file super().__init__( @@ -172,26 +172,26 @@ class InterHand3DDataset(BaseCocoStyleDataset): pipeline=pipeline, test_mode=test_mode, lazy_init=lazy_init, - max_refetch=max_refetch) + max_refetch=max_refetch, + ) def _load_annotations(self) -> Tuple[List[dict], List[dict]]: """Load data from annotations in COCO format.""" - assert exists(self.ann_file), 'Annotation file does not exist' + assert exists(self.ann_file), "Annotation file does not exist" with get_local_path(self.ann_file) as local_path: self.coco = COCO(local_path) # set the metainfo about categories, which is a list of dict # and each dict contains the 'id', 'name', etc. about this category - if 'categories' in self.coco.dataset: - self._metainfo['CLASSES'] = self.coco.loadCats( - self.coco.getCatIds()) + if "categories" in self.coco.dataset: + self._metainfo["CLASSES"] = self.coco.loadCats(self.coco.getCatIds()) with get_local_path(self.camera_param_file) as local_path: - with open(local_path, 'r') as f: + with open(local_path, "r") as f: self.cameras = json.load(f) with get_local_path(self.joint_file) as local_path: - with open(local_path, 'r') as f: + with open(local_path, "r") as f: self.joints = json.load(f) instance_list = [] @@ -199,19 +199,18 @@ class InterHand3DDataset(BaseCocoStyleDataset): for idx, img_id in enumerate(self.coco.getImgIds()): img = self.coco.loadImgs(img_id)[0] - img.update({ - 'img_id': - img_id, - 'img_path': - osp.join(self.data_prefix['img'], img['file_name']), - }) + img.update( + { + "img_id": img_id, + "img_path": osp.join(self.data_prefix["img"], img["file_name"]), + } + ) image_list.append(img) ann_ids = self.coco.getAnnIds(imgIds=img_id) ann = self.coco.loadAnns(ann_ids)[0] - instance_info = self.parse_data_info( - dict(raw_ann_info=ann, raw_img_info=img)) + instance_info = self.parse_data_info(dict(raw_ann_info=ann, raw_img_info=img)) # skip invalid instance annotation. 
if not instance_info: @@ -235,48 +234,37 @@ class InterHand3DDataset(BaseCocoStyleDataset): dict | None: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] if not self.use_gt_root_depth: rootnet_result = {} with get_local_path(self.rootnet_result_file) as local_path: rootnet_annot = json.load(local_path) for i in range(len(rootnet_annot)): - rootnet_result[str( - rootnet_annot[i]['annot_id'])] = rootnet_annot[i] - - num_keypoints = self.metainfo['num_keypoints'] - - capture_id = str(img['capture']) - camera_name = img['camera'] - frame_idx = str(img['frame_idx']) - camera_pos = np.array( - self.cameras[capture_id]['campos'][camera_name], dtype=np.float32) - camera_rot = np.array( - self.cameras[capture_id]['camrot'][camera_name], dtype=np.float32) - focal = np.array( - self.cameras[capture_id]['focal'][camera_name], dtype=np.float32) - principal_pt = np.array( - self.cameras[capture_id]['princpt'][camera_name], dtype=np.float32) - joint_world = np.array( - self.joints[capture_id][frame_idx]['world_coord'], - dtype=np.float32) - joint_valid = np.array(ann['joint_valid'], dtype=np.float32).flatten() - - keypoints_cam = np.dot( - camera_rot, - joint_world.transpose(1, 0) - - camera_pos.reshape(3, 1)).transpose(1, 0) + rootnet_result[str(rootnet_annot[i]["annot_id"])] = rootnet_annot[i] + + num_keypoints = self.metainfo["num_keypoints"] + + capture_id = str(img["capture"]) + camera_name = img["camera"] + frame_idx = str(img["frame_idx"]) + camera_pos = np.array(self.cameras[capture_id]["campos"][camera_name], dtype=np.float32) + camera_rot = np.array(self.cameras[capture_id]["camrot"][camera_name], dtype=np.float32) + focal = np.array(self.cameras[capture_id]["focal"][camera_name], dtype=np.float32) + principal_pt = np.array(self.cameras[capture_id]["princpt"][camera_name], dtype=np.float32) + joint_world = np.array(self.joints[capture_id][frame_idx]["world_coord"], dtype=np.float32) + joint_valid = np.array(ann["joint_valid"], dtype=np.float32).flatten() + + keypoints_cam = np.dot(camera_rot, joint_world.transpose(1, 0) - camera_pos.reshape(3, 1)).transpose(1, 0) if self.use_gt_root_depth: - bbox_xywh = np.array(ann['bbox'], dtype=np.float32).reshape(1, 4) + bbox_xywh = np.array(ann["bbox"], dtype=np.float32).reshape(1, 4) abs_depth = [keypoints_cam[20, 2], keypoints_cam[41, 2]] else: - rootnet_ann_data = rootnet_result[str(ann['id'])] - bbox_xywh = np.array( - rootnet_ann_data['bbox'], dtype=np.float32).reshape(1, 4) - abs_depth = rootnet_ann_data['abs_depth'] + rootnet_ann_data = rootnet_result[str(ann["id"])] + bbox_xywh = np.array(rootnet_ann_data["bbox"], dtype=np.float32).reshape(1, 4) + abs_depth = rootnet_ann_data["abs_depth"] bbox = bbox_xywh2xyxy(bbox_xywh) # 41: 'l_wrist', left hand root @@ -290,58 +278,47 @@ class InterHand3DDataset(BaseCocoStyleDataset): joint_valid[:20] *= joint_valid[20] joint_valid[21:] *= joint_valid[41] - joints_3d_visible = np.minimum(1, - joint_valid.reshape(-1, - 1)).reshape(1, -1) - keypoints_img = camera_to_pixel( - keypoints_cam, - focal[0], - focal[1], - principal_pt[0], - principal_pt[1], - shift=True)[..., :2] - joints_3d = np.zeros((keypoints_cam.shape[-2], 3), - dtype=np.float32).reshape(1, -1, 3) + joints_3d_visible = np.minimum(1, joint_valid.reshape(-1, 1)).reshape(1, -1) + keypoints_img = camera_to_pixel(keypoints_cam, focal[0], focal[1], principal_pt[0], principal_pt[1], shift=True)[..., :2] + joints_3d = 
np.zeros((keypoints_cam.shape[-2], 3), dtype=np.float32).reshape(1, -1, 3) joints_3d[..., :2] = keypoints_img - joints_3d[..., :21, - 2] = keypoints_cam[..., :21, 2] - keypoints_cam[..., 20, 2] - joints_3d[..., 21:, - 2] = keypoints_cam[..., 21:, 2] - keypoints_cam[..., 41, 2] + joints_3d[..., :21, 2] = keypoints_cam[..., :21, 2] - keypoints_cam[..., 20, 2] + joints_3d[..., 21:, 2] = keypoints_cam[..., 21:, 2] - keypoints_cam[..., 41, 2] data_info = { - 'img_id': ann['image_id'], - 'img_path': img['img_path'], - 'rotation': 0, - 'keypoints': joints_3d, - 'keypoints_cam': keypoints_cam.reshape(1, -1, 3), - 'keypoints_visible': joints_3d_visible, - 'hand_type': self.encode_handtype(ann['hand_type']), - 'hand_type_valid': np.array([ann['hand_type_valid']]), - 'rel_root_depth': rel_root_depth, - 'rel_root_valid': rel_root_valid, - 'abs_depth': abs_depth, - 'focal': focal, - 'principal_pt': principal_pt, - 'dataset': self.metainfo['dataset_name'], - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'iscrowd': ann.get('iscrowd', False), - 'id': ann['id'], + "img_id": ann["image_id"], + "img_path": img["img_path"], + "rotation": 0, + "keypoints": joints_3d, + "keypoints_cam": keypoints_cam.reshape(1, -1, 3), + "keypoints_visible": joints_3d_visible, + "hand_type": self.encode_handtype(ann["hand_type"]), + "hand_type_valid": np.array([ann["hand_type_valid"]]), + "rel_root_depth": rel_root_depth, + "rel_root_valid": rel_root_valid, + "abs_depth": abs_depth, + "focal": focal, + "principal_pt": principal_pt, + "dataset": self.metainfo["dataset_name"], + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "iscrowd": ann.get("iscrowd", False), + "id": ann["id"], # store the raw annotation of the instance # it is useful for evaluation without providing ann_file - 'raw_ann_info': copy.deepcopy(ann), + "raw_ann_info": copy.deepcopy(ann), } return data_info @staticmethod def encode_handtype(hand_type): - if hand_type == 'right': + if hand_type == "right": return np.array([[1, 0]], dtype=np.float32) - elif hand_type == 'left': + elif hand_type == "left": return np.array([[0, 1]], dtype=np.float32) - elif hand_type == 'interacting': + elif hand_type == "interacting": return np.array([[1, 1]], dtype=np.float32) else: - assert 0, f'Not support hand type: {hand_type}' + assert 0, f"Unsupported hand type: {hand_type}" diff --git a/mmpose/datasets/datasets/utils.py b/mmpose/datasets/datasets/utils.py index 7433a168b9ef8d9c267095301abfcbf8422886f5..d945adff34c4c5d5edc4829951dd969b196a6d24 100644 --- a/mmpose/datasets/datasets/utils.py +++ b/mmpose/datasets/datasets/utils.py @@ -89,35 +89,35 @@ def parse_pose_metainfo(metainfo: dict): - "sigmas" (numpy.ndarray): Same as the ``"sigmas"`` in the input """ - if 'from_file' in metainfo: - cfg_file = metainfo['from_file'] + if "from_file" in metainfo: + cfg_file = metainfo["from_file"] if not osp.isfile(cfg_file): # Search configs in 'mmpose/.mim/configs/' in case that mmpose # is installed in non-editable mode. import mmpose + mmpose_path = osp.dirname(mmpose.__file__) - _cfg_file = osp.join(mmpose_path, '.mim', 'configs', '_base_', - 'datasets', osp.basename(cfg_file)) + _cfg_file = osp.join(mmpose_path, ".mim", "configs", "_base_", "datasets", osp.basename(cfg_file)) if osp.isfile(_cfg_file): warnings.warn( f'The metainfo config file "{cfg_file}" does not exist. ' f'A matched config file "{_cfg_file}" will be used ' - 'instead.') + "instead."
+ ) cfg_file = _cfg_file else: - raise FileNotFoundError( - f'The metainfo config file "{cfg_file}" does not exist.') + raise FileNotFoundError(f'The metainfo config file "{cfg_file}" does not exist.') # TODO: remove the nested structure of dataset_info # metainfo = Config.fromfile(metainfo['from_file']) metainfo = Config.fromfile(cfg_file).dataset_info # check data integrity - assert 'dataset_name' in metainfo - assert 'keypoint_info' in metainfo - assert 'skeleton_info' in metainfo - assert 'joint_weights' in metainfo - assert 'sigmas' in metainfo + assert "dataset_name" in metainfo + assert "keypoint_info" in metainfo + assert "skeleton_info" in metainfo + assert "joint_weights" in metainfo + assert "sigmas" in metainfo # parse metainfo parsed = dict( @@ -137,47 +137,46 @@ def parse_pose_metainfo(metainfo: dict): sigmas=None, ) - parsed['dataset_name'] = metainfo['dataset_name'] + parsed["dataset_name"] = metainfo["dataset_name"] # parse keypoint information - parsed['num_keypoints'] = len(metainfo['keypoint_info']) - - for kpt_id, kpt in metainfo['keypoint_info'].items(): - kpt_name = kpt['name'] - parsed['keypoint_id2name'][kpt_id] = kpt_name - parsed['keypoint_name2id'][kpt_name] = kpt_id - parsed['keypoint_colors'].append(kpt.get('color', [255, 128, 0])) - - kpt_type = kpt.get('type', '') - if kpt_type == 'upper': - parsed['upper_body_ids'].append(kpt_id) - elif kpt_type == 'lower': - parsed['lower_body_ids'].append(kpt_id) - - swap_kpt = kpt.get('swap', '') - if swap_kpt == kpt_name or swap_kpt == '': - parsed['flip_indices'].append(kpt_name) + parsed["num_keypoints"] = len(metainfo["keypoint_info"]) + + for kpt_id, kpt in metainfo["keypoint_info"].items(): + kpt_name = kpt["name"] + parsed["keypoint_id2name"][kpt_id] = kpt_name + parsed["keypoint_name2id"][kpt_name] = kpt_id + parsed["keypoint_colors"].append(kpt.get("color", [255, 128, 0])) + + kpt_type = kpt.get("type", "") + if kpt_type == "upper": + parsed["upper_body_ids"].append(kpt_id) + elif kpt_type == "lower": + parsed["lower_body_ids"].append(kpt_id) + + swap_kpt = kpt.get("swap", "") + if swap_kpt == kpt_name or swap_kpt == "": + parsed["flip_indices"].append(kpt_name) else: - parsed['flip_indices'].append(swap_kpt) + parsed["flip_indices"].append(swap_kpt) pair = (swap_kpt, kpt_name) - if pair not in parsed['flip_pairs']: - parsed['flip_pairs'].append(pair) + if pair not in parsed["flip_pairs"]: + parsed["flip_pairs"].append(pair) # parse skeleton information - parsed['num_skeleton_links'] = len(metainfo['skeleton_info']) - for _, sk in metainfo['skeleton_info'].items(): - parsed['skeleton_links'].append(sk['link']) - parsed['skeleton_link_colors'].append(sk.get('color', [96, 96, 255])) + parsed["num_skeleton_links"] = len(metainfo["skeleton_info"]) + for _, sk in metainfo["skeleton_info"].items(): + parsed["skeleton_links"].append(sk["link"]) + parsed["skeleton_link_colors"].append(sk.get("color", [96, 96, 255])) # parse extra information - parsed['dataset_keypoint_weights'] = np.array( - metainfo['joint_weights'], dtype=np.float32) - parsed['sigmas'] = np.array(metainfo['sigmas'], dtype=np.float32) + parsed["dataset_keypoint_weights"] = np.array(metainfo["joint_weights"], dtype=np.float32) + parsed["sigmas"] = np.array(metainfo["sigmas"], dtype=np.float32) - if 'stats_info' in metainfo: - parsed['stats_info'] = {} - for name, val in metainfo['stats_info'].items(): - parsed['stats_info'][name] = np.array(val, dtype=np.float32) + if "stats_info" in metainfo: + parsed["stats_info"] = {} + for name, val in 
metainfo["stats_info"].items(): + parsed["stats_info"][name] = np.array(val, dtype=np.float32) # formatting def _map(src, mapping: dict): @@ -187,16 +186,11 @@ def parse_pose_metainfo(metainfo: dict): else: return mapping[src] - parsed['flip_pairs'] = _map( - parsed['flip_pairs'], mapping=parsed['keypoint_name2id']) - parsed['flip_indices'] = _map( - parsed['flip_indices'], mapping=parsed['keypoint_name2id']) - parsed['skeleton_links'] = _map( - parsed['skeleton_links'], mapping=parsed['keypoint_name2id']) - - parsed['keypoint_colors'] = np.array( - parsed['keypoint_colors'], dtype=np.uint8) - parsed['skeleton_link_colors'] = np.array( - parsed['skeleton_link_colors'], dtype=np.uint8) + parsed["flip_pairs"] = _map(parsed["flip_pairs"], mapping=parsed["keypoint_name2id"]) + parsed["flip_indices"] = _map(parsed["flip_indices"], mapping=parsed["keypoint_name2id"]) + parsed["skeleton_links"] = _map(parsed["skeleton_links"], mapping=parsed["keypoint_name2id"]) + + parsed["keypoint_colors"] = np.array(parsed["keypoint_colors"], dtype=np.uint8) + parsed["skeleton_link_colors"] = np.array(parsed["skeleton_link_colors"], dtype=np.uint8) return parsed diff --git a/mmpose/datasets/datasets/wholebody/__init__.py b/mmpose/datasets/datasets/wholebody/__init__.py index b3934fc225e301251e356f9c2d8880d982ec6dc9..965c720dcceb433c50ef05cb6672915f19af4998 100644 --- a/mmpose/datasets/datasets/wholebody/__init__.py +++ b/mmpose/datasets/datasets/wholebody/__init__.py @@ -3,4 +3,4 @@ from .coco_wholebody_dataset import CocoWholeBodyDataset from .halpe_dataset import HalpeDataset from .ubody2d_dataset import UBody2dDataset -__all__ = ['CocoWholeBodyDataset', 'HalpeDataset', 'UBody2dDataset'] +__all__ = ["CocoWholeBodyDataset", "HalpeDataset", "UBody2dDataset"] diff --git a/mmpose/datasets/datasets/wholebody/coco_wholebody_dataset.py b/mmpose/datasets/datasets/wholebody/coco_wholebody_dataset.py index 9c8b88c20fb7471a3cbb0e904ac023a0b300fcc1..f790b2cb7650a611fa216a7c896411b9d6f4b507 100644 --- a/mmpose/datasets/datasets/wholebody/coco_wholebody_dataset.py +++ b/mmpose/datasets/datasets/wholebody/coco_wholebody_dataset.py @@ -6,6 +6,7 @@ from typing import Optional import numpy as np from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -64,8 +65,7 @@ class CocoWholeBodyDataset(BaseCocoStyleDataset): image. Default: 1000. """ - METAINFO: dict = dict( - from_file='configs/_base_/datasets/coco_wholebody.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/coco_wholebody.py") def parse_data_info(self, raw_data_info: dict) -> Optional[dict]: """Parse raw COCO annotation of an instance. 
@@ -82,14 +82,14 @@ class CocoWholeBodyDataset(BaseCocoStyleDataset): dict: Parsed instance annotation """ - ann = raw_data_info['raw_ann_info'] - img = raw_data_info['raw_img_info'] + ann = raw_data_info["raw_ann_info"] + img = raw_data_info["raw_img_info"] - img_path = osp.join(self.data_prefix['img'], img['file_name']) - img_w, img_h = img['width'], img['height'] + img_path = osp.join(self.data_prefix["img"], img["file_name"]) + img_w, img_h = img["width"], img["height"] # get bbox in shape [1, 4], formatted as xywh - x, y, w, h = ann['bbox'] + x, y, w, h = ann["bbox"] x1 = np.clip(x, 0, img_w - 1) y1 = np.clip(y, 0, img_h - 1) x2 = np.clip(x + w, 0, img_w - 1) @@ -99,36 +99,36 @@ class CocoWholeBodyDataset(BaseCocoStyleDataset): # keypoints in shape [1, K, 2] and keypoints_visible in [1, K] # COCO-Wholebody: consisting of body, foot, face and hand keypoints - _keypoints = np.array(ann['keypoints'] + ann['foot_kpts'] + - ann['face_kpts'] + ann['lefthand_kpts'] + - ann['righthand_kpts']).reshape(1, -1, 3) + _keypoints = np.array( + ann["keypoints"] + ann["foot_kpts"] + ann["face_kpts"] + ann["lefthand_kpts"] + ann["righthand_kpts"] + ).reshape(1, -1, 3) keypoints = _keypoints[..., :2] keypoints_visible = np.minimum(1, _keypoints[..., 2] > 0) - if 'area' in ann: - area = np.array(ann['area'], dtype=np.float32) + if "area" in ann: + area = np.array(ann["area"], dtype=np.float32) else: area = np.clip((x2 - x1) * (y2 - y1) * 0.53, a_min=1.0, a_max=None) area = np.array(area, dtype=np.float32) - num_keypoints = ann['num_keypoints'] + num_keypoints = ann["num_keypoints"] data_info = { - 'img_id': ann['image_id'], - 'img_path': img_path, - 'bbox': bbox, - 'bbox_score': np.ones(1, dtype=np.float32), - 'num_keypoints': num_keypoints, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'iscrowd': ann['iscrowd'], - 'segmentation': ann['segmentation'], - 'area': area, - 'id': ann['id'], - 'category_id': ann['category_id'], + "img_id": ann["image_id"], + "img_path": img_path, + "bbox": bbox, + "bbox_score": np.ones(1, dtype=np.float32), + "num_keypoints": num_keypoints, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "iscrowd": ann["iscrowd"], + "segmentation": ann["segmentation"], + "area": area, + "id": ann["id"], + "category_id": ann["category_id"], # store the raw annotation of the instance # it is useful for evaluation without providing ann_file - 'raw_ann_info': copy.deepcopy(ann), + "raw_ann_info": copy.deepcopy(ann), } return data_info diff --git a/mmpose/datasets/datasets/wholebody/halpe_dataset.py b/mmpose/datasets/datasets/wholebody/halpe_dataset.py index 0699f3b7023b200ee42e3cfe7f475a51123ef190..b79504144125b3faa9908eddf1a91c4a6436ac97 100644 --- a/mmpose/datasets/datasets/wholebody/halpe_dataset.py +++ b/mmpose/datasets/datasets/wholebody/halpe_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from ..base import BaseCocoStyleDataset @@ -56,4 +57,4 @@ class HalpeDataset(BaseCocoStyleDataset): image. Default: 1000. 
""" - METAINFO: dict = dict(from_file='configs/_base_/datasets/halpe.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/halpe.py") diff --git a/mmpose/datasets/datasets/wholebody/ubody2d_dataset.py b/mmpose/datasets/datasets/wholebody/ubody2d_dataset.py index 9a0cb1711a18f9abcf534367db9b12f585b82281..63f51da13050ed22cba539b5a4f088d78105d8b1 100644 --- a/mmpose/datasets/datasets/wholebody/ubody2d_dataset.py +++ b/mmpose/datasets/datasets/wholebody/ubody2d_dataset.py @@ -1,5 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. from mmpose.registry import DATASETS + from .coco_wholebody_dataset import CocoWholeBodyDataset @@ -60,4 +61,4 @@ class UBody2dDataset(CocoWholeBodyDataset): Default: 1. """ - METAINFO: dict = dict(from_file='configs/_base_/datasets/ubody2d.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/ubody2d.py") diff --git a/mmpose/datasets/datasets/wholebody3d/__init__.py b/mmpose/datasets/datasets/wholebody3d/__init__.py index 19e1fe2f6cd449cf691e56fec776d604804badda..05c56f7faf6df06ca9d6a196e3e6f2ebbff770c3 100644 --- a/mmpose/datasets/datasets/wholebody3d/__init__.py +++ b/mmpose/datasets/datasets/wholebody3d/__init__.py @@ -2,4 +2,4 @@ from .h3wb_dataset import H36MWholeBodyDataset from .ubody3d_dataset import UBody3dDataset -__all__ = ['UBody3dDataset', 'H36MWholeBodyDataset'] +__all__ = ["UBody3dDataset", "H36MWholeBodyDataset"] diff --git a/mmpose/datasets/datasets/wholebody3d/h3wb_dataset.py b/mmpose/datasets/datasets/wholebody3d/h3wb_dataset.py index 95e40db4b406742386bdbe02dd40ea5c3edda282..3d070d9a3ed03ad20a94b9a9bb2accadf4e4ff38 100644 --- a/mmpose/datasets/datasets/wholebody3d/h3wb_dataset.py +++ b/mmpose/datasets/datasets/wholebody3d/h3wb_dataset.py @@ -5,12 +5,13 @@ import numpy as np from mmengine.fileio import get_local_path from mmpose.registry import DATASETS + from ..body3d import Human36mDataset @DATASETS.register_module() class H36MWholeBodyDataset(Human36mDataset): - METAINFO: dict = dict(from_file='configs/_base_/datasets/h3wb.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/h3wb.py") """Human3.6M 3D WholeBody Dataset. "H3WB: Human3.6M 3D WholeBody Dataset and Benchmark", ICCV'2023. 
@@ -92,11 +93,11 @@ class H36MWholeBodyDataset(Human36mDataset): def __init__(self, test_mode: bool = False, **kwargs): - self.camera_order_id = ['54138969', '55011271', '58860488', '60457274'] + self.camera_order_id = ["54138969", "55011271", "58860488", "60457274"] if not test_mode: - self.subjects = ['S1', 'S5', 'S6'] + self.subjects = ["S1", "S5", "S6"] else: - self.subjects = ['S7'] + self.subjects = ["S7"] super().__init__(test_mode=test_mode, **kwargs) @@ -104,8 +105,8 @@ class H36MWholeBodyDataset(Human36mDataset): with get_local_path(ann_file) as local_path: data = np.load(local_path, allow_pickle=True) - self.ann_data = data['train_data'].item() - self.camera_data = data['metadata'].item() + self.ann_data = data["train_data"].item() + self.camera_data = data["metadata"].item() def get_sequence_indices(self) -> List[List[int]]: return [] @@ -122,28 +123,22 @@ class H36MWholeBodyDataset(Human36mDataset): for cam in self.camera_order_id: if cam not in self.ann_data[subject][act]: continue - keypoints_2d = self.ann_data[subject][act][cam]['pose_2d'] - keypoints_3d = self.ann_data[subject][act][cam][ - 'camera_3d'] + keypoints_2d = self.ann_data[subject][act][cam]["pose_2d"] + keypoints_3d = self.ann_data[subject][act][cam]["camera_3d"] num_keypoints = keypoints_2d.shape[1] camera_param = self.camera_data[subject][cam] camera_param = { - 'K': camera_param['K'][0, :2, ...], - 'R': camera_param['R'][0], - 'T': camera_param['T'].reshape(3, 1), - 'Distortion': camera_param['Distortion'][0] + "K": camera_param["K"][0, :2, ...], + "R": camera_param["R"][0], + "T": camera_param["T"].reshape(3, 1), + "Distortion": camera_param["Distortion"][0], } seq_step = 1 _len = (self.seq_len - 1) * seq_step + 1 - _indices = list( - range(len(self.ann_data[subject][act]['frame_id']))) - seq_indices = [ - _indices[i:(i + _len):seq_step] - for i in list(range(0, - len(_indices) - _len + 1)) - ] + _indices = list(range(len(self.ann_data[subject][act]["frame_id"]))) + seq_indices = [_indices[i : (i + _len) : seq_step] for i in list(range(0, len(_indices) - _len + 1))] for idx, frame_ids in enumerate(seq_indices): expected_num_frames = self.seq_len @@ -151,60 +146,39 @@ class H36MWholeBodyDataset(Human36mDataset): expected_num_frames = self.multiple_target assert len(frame_ids) == (expected_num_frames), ( - f'Expected `frame_ids` == {expected_num_frames}, but ' # noqa - f'got {len(frame_ids)} ') + f"Expected `frame_ids` == {expected_num_frames}, but " # noqa + f"got {len(frame_ids)} " + ) _kpts_2d = keypoints_2d[frame_ids] _kpts_3d = keypoints_3d[frame_ids] - target_idx = [-1] if self.causal else [ - int(self.seq_len) // 2 - ] + target_idx = [-1] if self.causal else [int(self.seq_len) // 2] if self.multiple_target > 0: target_idx = list(range(self.multiple_target)) instance_info = { - 'num_keypoints': - num_keypoints, - 'keypoints': - _kpts_2d, - 'keypoints_3d': - _kpts_3d / 1000, - 'keypoints_visible': - np.ones_like(_kpts_2d[..., 0], dtype=np.float32), - 'keypoints_3d_visible': - np.ones_like(_kpts_2d[..., 0], dtype=np.float32), - 'scale': - np.zeros((1, 1), dtype=np.float32), - 'center': - np.zeros((1, 2), dtype=np.float32), - 'factor': - np.zeros((1, 1), dtype=np.float32), - 'id': - instance_id, - 'category_id': - 1, - 'iscrowd': - 0, - 'camera_param': - camera_param, - 'img_paths': [ - f'{subject}/{act}/{cam}/{i:06d}.jpg' - for i in frame_ids - ], - 'img_ids': - frame_ids, - 'lifting_target': - _kpts_3d[target_idx] / 1000, - 'lifting_target_visible': - np.ones_like(_kpts_2d[..., 0], - 
dtype=np.float32)[target_idx], + "num_keypoints": num_keypoints, + "keypoints": _kpts_2d, + "keypoints_3d": _kpts_3d / 1000, + "keypoints_visible": np.ones_like(_kpts_2d[..., 0], dtype=np.float32), + "keypoints_3d_visible": np.ones_like(_kpts_2d[..., 0], dtype=np.float32), + "scale": np.zeros((1, 1), dtype=np.float32), + "center": np.zeros((1, 2), dtype=np.float32), + "factor": np.zeros((1, 1), dtype=np.float32), + "id": instance_id, + "category_id": 1, + "iscrowd": 0, + "camera_param": camera_param, + "img_paths": [f"{subject}/{act}/{cam}/{i:06d}.jpg" for i in frame_ids], + "img_ids": frame_ids, + "lifting_target": _kpts_3d[target_idx] / 1000, + "lifting_target_visible": np.ones_like(_kpts_2d[..., 0], dtype=np.float32)[target_idx], } instance_list.append(instance_info) - if self.data_mode == 'bottomup': - for idx, img_name in enumerate( - instance_info['img_paths']): + if self.data_mode == "bottomup": + for idx, img_name in enumerate(instance_info["img_paths"]): img_info = self.get_img_info(idx, img_name) image_list.append(img_info) diff --git a/mmpose/datasets/datasets/wholebody3d/ubody3d_dataset.py b/mmpose/datasets/datasets/wholebody3d/ubody3d_dataset.py index 85b8d893e7bd131d6a9ccf179771964f843e89e5..e4a47256b962f1f7ea130e657c1f4cd8be705f52 100644 --- a/mmpose/datasets/datasets/wholebody3d/ubody3d_dataset.py +++ b/mmpose/datasets/datasets/wholebody3d/ubody3d_dataset.py @@ -5,10 +5,10 @@ from typing import List, Tuple import numpy as np from mmengine.fileio import get_local_path -from xtcocotools.coco import COCO from mmpose.datasets.datasets import BaseMocapDataset from mmpose.registry import DATASETS +from xtcocotools.coco import COCO @DATASETS.register_module() @@ -69,12 +69,7 @@ class UBody3dDataset(BaseMocapDataset): image. Default: 1000. """ - def __init__(self, - multiple_target: int = 0, - multiple_target_step: int = 0, - seq_step: int = 1, - pad_video_seq: bool = False, - **kwargs): + def __init__(self, multiple_target: int = 0, multiple_target_step: int = 0, seq_step: int = 1, pad_video_seq: bool = False, **kwargs): self.seq_step = seq_step self.pad_video_seq = pad_video_seq @@ -84,7 +79,7 @@ class UBody3dDataset(BaseMocapDataset): super().__init__(multiple_target=multiple_target, **kwargs) - METAINFO: dict = dict(from_file='configs/_base_/datasets/ubody3d.py') + METAINFO: dict = dict(from_file="configs/_base_/datasets/ubody3d.py") def _load_ann_file(self, ann_file: str) -> dict: """Load annotation file.""" @@ -96,7 +91,7 @@ class UBody3dDataset(BaseMocapDataset): img_ids = self.ann_data.getImgIds() for img_id in img_ids: img_info = self.ann_data.loadImgs(img_id)[0] - subj, _, _ = self._parse_image_name(img_info['file_name']) + subj, _, _ = self._parse_image_name(img_info["file_name"]) video_frames[subj].append(img_id) sequence_indices = [] @@ -107,11 +102,9 @@ class UBody3dDataset(BaseMocapDataset): for _, _img_ids in sorted(video_frames.items()): n_frame = len(_img_ids) _ann_ids = self.ann_data.getAnnIds(imgIds=_img_ids) - seqs_from_video = [ - _ann_ids[i:(i + self.multiple_target):_step] - for i in range(0, n_frame, self.multiple_target_step) - ][:(n_frame + self.multiple_target_step - - self.multiple_target) // self.multiple_target_step] + seqs_from_video = [_ann_ids[i : (i + self.multiple_target) : _step] for i in range(0, n_frame, self.multiple_target_step)][ + : (n_frame + self.multiple_target_step - self.multiple_target) // self.multiple_target_step + ] sequence_indices.extend(seqs_from_video) else: for _, _img_ids in sorted(video_frames.items()): @@ -128,19 +121,12 
@@ class UBody3dDataset(BaseMocapDataset): frames_right = frames_left for i in range(n_frame): pad_left = max(0, frames_left - i // _step) - pad_right = max( - 0, frames_right - (n_frame - 1 - i) // _step) + pad_right = max(0, frames_right - (n_frame - 1 - i) // _step) start = max(i % _step, i - frames_left * _step) - end = min(n_frame - (n_frame - 1 - i) % _step, - i + frames_right * _step + 1) - sequence_indices.append([_ann_ids[0]] * pad_left + - _ann_ids[start:end:_step] + - [_ann_ids[-1]] * pad_right) + end = min(n_frame - (n_frame - 1 - i) % _step, i + frames_right * _step + 1) + sequence_indices.append([_ann_ids[0]] * pad_left + _ann_ids[start:end:_step] + [_ann_ids[-1]] * pad_right) else: - seqs_from_video = [ - _ann_ids[i:(i + _len):_step] - for i in range(0, n_frame - _len + 1, _step) - ] + seqs_from_video = [_ann_ids[i : (i + _len) : _step] for i in range(0, n_frame - _len + 1, _step)] sequence_indices.extend(seqs_from_video) # reduce dataset size if needed @@ -161,15 +147,14 @@ class UBody3dDataset(BaseMocapDataset): Returns: tuple[str, int]: Video name and frame index. """ - trim, file_name = image_path.split('/')[-2:] - frame_id, suffix = file_name.split('.') + trim, file_name = image_path.split("/")[-2:] + frame_id, suffix = file_name.split(".") return trim, frame_id, suffix def _load_annotations(self): """Load data from annotations in COCO format.""" - num_keypoints = self.metainfo['num_keypoints'] - self._metainfo['CLASSES'] = self.ann_data.loadCats( - self.ann_data.getCatIds()) + num_keypoints = self.metainfo["num_keypoints"] + self._metainfo["CLASSES"] = self.ann_data.loadCats(self.ann_data.getCatIds()) instance_list = [] image_list = [] @@ -179,69 +164,65 @@ class UBody3dDataset(BaseMocapDataset): if self.multiple_target: expected_num_frames = self.multiple_target - assert len(_ann_ids) == (expected_num_frames), ( - f'Expected `frame_ids` == {expected_num_frames}, but ' - f'got {len(_ann_ids)} ') + assert len(_ann_ids) == (expected_num_frames), f"Expected `frame_ids` == {expected_num_frames}, but " f"got {len(_ann_ids)} " anns = self.ann_data.loadAnns(_ann_ids) img_ids = [] kpts = np.zeros((len(anns), num_keypoints, 2), dtype=np.float32) kpts_3d = np.zeros((len(anns), num_keypoints, 3), dtype=np.float32) - keypoints_visible = np.zeros((len(anns), num_keypoints, 1), - dtype=np.float32) + keypoints_visible = np.zeros((len(anns), num_keypoints, 1), dtype=np.float32) for j, ann in enumerate(anns): - img_ids.append(ann['image_id']) - kpts[j] = np.array(ann['keypoints'], dtype=np.float32) - kpts_3d[j] = np.array(ann['keypoints_3d'], dtype=np.float32) - keypoints_visible[j] = np.array( - ann['keypoints_valid'], dtype=np.float32) + img_ids.append(ann["image_id"]) + kpts[j] = np.array(ann["keypoints"], dtype=np.float32) + kpts_3d[j] = np.array(ann["keypoints_3d"], dtype=np.float32) + keypoints_visible[j] = np.array(ann["keypoints_valid"], dtype=np.float32) imgs = self.ann_data.loadImgs(img_ids) keypoints_visible = keypoints_visible.squeeze(-1) scales = np.zeros(len(imgs), dtype=np.float32) centers = np.zeros((len(imgs), 2), dtype=np.float32) - img_paths = np.array([img['file_name'] for img in imgs]) - factors = np.zeros((kpts_3d.shape[0], ), dtype=np.float32) + img_paths = np.array([img["file_name"] for img in imgs]) + factors = np.zeros((kpts_3d.shape[0],), dtype=np.float32) target_idx = [-1] if self.causal else [int(self.seq_len // 2)] if self.multiple_target: target_idx = list(range(self.multiple_target)) - cam_param = anns[-1]['camera_param'] - if 'w' not in cam_param or 'h' 
not in cam_param: - cam_param['w'] = 1000 - cam_param['h'] = 1000 + cam_param = anns[-1]["camera_param"] + if "w" not in cam_param or "h" not in cam_param: + cam_param["w"] = 1000 + cam_param["h"] = 1000 instance_info = { - 'num_keypoints': num_keypoints, - 'keypoints': kpts, - 'keypoints_3d': kpts_3d, - 'keypoints_visible': keypoints_visible, - 'scale': scales, - 'center': centers, - 'id': i, - 'category_id': 1, - 'iscrowd': 0, - 'img_paths': list(img_paths), - 'img_ids': [img['id'] for img in imgs], - 'lifting_target': kpts_3d[target_idx], - 'lifting_target_visible': keypoints_visible[target_idx], - 'target_img_paths': img_paths[target_idx], - 'camera_param': cam_param, - 'factor': factors, - 'target_idx': target_idx, + "num_keypoints": num_keypoints, + "keypoints": kpts, + "keypoints_3d": kpts_3d, + "keypoints_visible": keypoints_visible, + "scale": scales, + "center": centers, + "id": i, + "category_id": 1, + "iscrowd": 0, + "img_paths": list(img_paths), + "img_ids": [img["id"] for img in imgs], + "lifting_target": kpts_3d[target_idx], + "lifting_target_visible": keypoints_visible[target_idx], + "target_img_paths": img_paths[target_idx], + "camera_param": cam_param, + "factor": factors, + "target_idx": target_idx, } instance_list.append(instance_info) for img_id in self.ann_data.getImgIds(): img = self.ann_data.loadImgs(img_id)[0] - img.update({ - 'img_id': - img_id, - 'img_path': - osp.join(self.data_prefix['img'], img['file_name']), - }) + img.update( + { + "img_id": img_id, + "img_path": osp.join(self.data_prefix["img"], img["file_name"]), + } + ) image_list.append(img) return instance_list, image_list diff --git a/mmpose/datasets/samplers.py b/mmpose/datasets/samplers.py index d6bb34287a8c6b43552601eeb9b2e7c9a4fa90df..4d0cc27d810d4c99817b146f2d4f3f2fe5952c7c 100644 --- a/mmpose/datasets/samplers.py +++ b/mmpose/datasets/samplers.py @@ -29,24 +29,24 @@ class MultiSourceSampler(Sampler): Defaults to ``None`` """ - def __init__(self, - dataset: Sized, - batch_size: int, - source_ratio: List[Union[int, float]], - shuffle: bool = True, - round_up: bool = True, - seed: Optional[int] = None) -> None: - - assert isinstance(dataset, CombinedDataset),\ - f'The dataset must be CombinedDataset, but get {dataset}' - assert isinstance(batch_size, int) and batch_size > 0, \ - 'batch_size must be a positive integer value, ' \ - f'but got batch_size={batch_size}' - assert isinstance(source_ratio, list), \ - f'source_ratio must be a list, but got source_ratio={source_ratio}' - assert len(source_ratio) == len(dataset._lens), \ - 'The length of source_ratio must be equal to ' \ - f'the number of datasets, but got source_ratio={source_ratio}' + def __init__( + self, + dataset: Sized, + batch_size: int, + source_ratio: List[Union[int, float]], + shuffle: bool = True, + round_up: bool = True, + seed: Optional[int] = None, + ) -> None: + + assert isinstance(dataset, CombinedDataset), f"The dataset must be CombinedDataset, but got {dataset}" + assert isinstance(batch_size, int) and batch_size > 0, ( + "batch_size must be a positive integer value, " f"but got batch_size={batch_size}" + ) + assert isinstance(source_ratio, list), f"source_ratio must be a list, but got source_ratio={source_ratio}" + assert len(source_ratio) == len(dataset._lens), ( + "The length of source_ratio must be equal to " f"the number of datasets, but got source_ratio={source_ratio}" + ) rank, world_size = get_dist_info() self.rank = rank @@ -57,22 +57,17 @@ class MultiSourceSampler(Sampler): self.batch_size = batch_size
self.source_ratio = source_ratio self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / world_size)) - self.num_per_source = [ - int(batch_size * sr / sum(source_ratio)) for sr in source_ratio - ] + self.num_per_source = [int(batch_size * sr / sum(source_ratio)) for sr in source_ratio] self.num_per_source[0] = batch_size - sum(self.num_per_source[1:]) - assert sum(self.num_per_source) == batch_size, \ - 'The sum of num_per_source must be equal to ' \ - f'batch_size, but get {self.num_per_source}' + assert sum(self.num_per_source) == batch_size, ( + "The sum of num_per_source must be equal to " f"batch_size, but got {self.num_per_source}" + ) self.seed = sync_random_seed() if seed is None else seed self.shuffle = shuffle self.round_up = round_up - self.source2inds = { - source: self._indices_of_rank(len(ds)) - for source, ds in enumerate(dataset.datasets) - } + self.source2inds = {source: self._indices_of_rank(len(ds)) for source, ds in enumerate(dataset.datasets)} def _infinite_indices(self, sample_size: int) -> Iterator[int]: """Infinitely yield a sequence of indices.""" @@ -86,9 +81,7 @@ class MultiSourceSampler(Sampler): def _indices_of_rank(self, sample_size: int) -> Iterator[int]: """Slice the infinite indices by rank.""" - yield from itertools.islice( - self._infinite_indices(sample_size), self.rank, None, - self.world_size) + yield from itertools.islice(self._infinite_indices(sample_size), self.rank, None, self.world_size) def __iter__(self) -> Iterator[int]: batch_buffer = [] diff --git a/mmpose/datasets/transforms/__init__.py b/mmpose/datasets/transforms/__init__.py index 56780d4e6a69ce9883e9295922d40b9767949732..cd25508943cf2b40641472020f2d60d81c91385c 100644 --- a/mmpose/datasets/transforms/__init__.py +++ b/mmpose/datasets/transforms/__init__.py @@ -1,12 +1,23 @@ # Copyright (c) OpenMMLab. All rights reserved.
-from .bottomup_transforms import (BottomupGetHeatmapMask, BottomupRandomAffine, - BottomupRandomChoiceResize, - BottomupRandomCrop, BottomupResize) -from .common_transforms import (Albumentation, FilterAnnotations, - GenerateTarget, GetBBoxCenterScale, - PhotometricDistortion, RandomBBoxTransform, - RandomFlip, RandomHalfBody, YOLOXHSVRandomAug, - RandomPatchesBlackout) +from .bottomup_transforms import ( + BottomupGetHeatmapMask, + BottomupRandomAffine, + BottomupRandomChoiceResize, + BottomupRandomCrop, + BottomupResize, +) +from .common_transforms import ( + Albumentation, + FilterAnnotations, + GenerateTarget, + GetBBoxCenterScale, + PhotometricDistortion, + RandomBBoxTransform, + RandomFlip, + RandomHalfBody, + RandomPatchesBlackout, + YOLOXHSVRandomAug, +) from .converting import KeypointConverter, SingleHandConverter from .formatting import PackPoseInputs from .hand_transforms import HandRandomFlip @@ -16,12 +27,28 @@ from .pose3d_transforms import RandomFlipAroundRoot from .topdown_transforms import TopdownAffine __all__ = [ - 'GetBBoxCenterScale', 'RandomBBoxTransform', 'RandomFlip', - 'RandomHalfBody', 'TopdownAffine', 'Albumentation', - 'PhotometricDistortion', 'PackPoseInputs', 'LoadImage', - 'BottomupGetHeatmapMask', 'BottomupRandomAffine', 'BottomupResize', - 'GenerateTarget', 'KeypointConverter', 'RandomFlipAroundRoot', - 'FilterAnnotations', 'YOLOXHSVRandomAug', 'YOLOXMixUp', 'Mosaic', - 'BottomupRandomCrop', 'BottomupRandomChoiceResize', 'HandRandomFlip', - 'SingleHandConverter', 'RandomPatchesBlackout' + "GetBBoxCenterScale", + "RandomBBoxTransform", + "RandomFlip", + "RandomHalfBody", + "TopdownAffine", + "Albumentation", + "PhotometricDistortion", + "PackPoseInputs", + "LoadImage", + "BottomupGetHeatmapMask", + "BottomupRandomAffine", + "BottomupResize", + "GenerateTarget", + "KeypointConverter", + "RandomFlipAroundRoot", + "FilterAnnotations", + "YOLOXHSVRandomAug", + "YOLOXMixUp", + "Mosaic", + "BottomupRandomCrop", + "BottomupRandomChoiceResize", + "HandRandomFlip", + "SingleHandConverter", + "RandomPatchesBlackout", ] diff --git a/mmpose/datasets/transforms/bottomup_transforms.py b/mmpose/datasets/transforms/bottomup_transforms.py index c27afd042a502569603baddb6d07e6f4265ab678..a254030bdbd97bdadb98386e40fd65e405c2b001 100644 --- a/mmpose/datasets/transforms/bottomup_transforms.py +++ b/mmpose/datasets/transforms/bottomup_transforms.py @@ -4,17 +4,22 @@ from typing import Dict, List, Optional, Sequence, Tuple, Union import cv2 import numpy as np -import xtcocotools.mask as cocomask from mmcv.image import imflip_, imresize from mmcv.image.geometric import imrescale from mmcv.transforms import BaseTransform from mmcv.transforms.utils import cache_randomness from scipy.stats import truncnorm +import xtcocotools.mask as cocomask from mmpose.registry import TRANSFORMS -from mmpose.structures.bbox import (bbox_clip_border, bbox_corner2xyxy, - bbox_xyxy2corner, get_pers_warp_matrix, - get_udp_warp_matrix, get_warp_matrix) +from mmpose.structures.bbox import ( + bbox_clip_border, + bbox_corner2xyxy, + bbox_xyxy2corner, + get_pers_warp_matrix, + get_udp_warp_matrix, + get_warp_matrix, +) from mmpose.structures.keypoint import keypoint_clip_border @@ -40,8 +45,7 @@ class BottomupGetHeatmapMask(BaseTransform): super().__init__() self.get_invalid = get_invalid - def _segs_to_mask(self, segs: list, img_shape: Tuple[int, - int]) -> np.ndarray: + def _segs_to_mask(self, segs: list, img_shape: Tuple[int, int]) -> np.ndarray: """Calculate mask from object segmentations. 
Args: @@ -90,9 +94,9 @@ class BottomupGetHeatmapMask(BaseTransform): dict: Result dict with images distorted. """ - invalid_segs = results.get('invalid_segs', []) - img_shape = results['img_shape'] # (img_h, img_w) - input_size = results['input_size'] + invalid_segs = results.get("invalid_segs", []) + img_shape = results["img_shape"] # (img_h, img_w) + input_size = results["input_size"] mask = self._segs_to_mask(invalid_segs, img_shape) if not self.get_invalid: @@ -102,42 +106,39 @@ class BottomupGetHeatmapMask(BaseTransform): # Apply an affine transform to the mask if the image has been # transformed - if 'warp_mat' in results: - warp_mat = results['warp_mat'] + if "warp_mat" in results: + warp_mat = results["warp_mat"] mask = mask.astype(np.float32) - mask = cv2.warpAffine( - mask, warp_mat, input_size, flags=cv2.INTER_LINEAR) + mask = cv2.warpAffine(mask, warp_mat, input_size, flags=cv2.INTER_LINEAR) # Flip the mask if the image has been flipped - if results.get('flip', False): - flip_dir = results['flip_direction'] + if results.get("flip", False): + flip_dir = results["flip_direction"] if flip_dir is not None: mask = imflip_(mask, flip_dir) # Resize the mask to the same size of heatmaps - if 'heatmaps' in results: - heatmaps = results['heatmaps'] + if "heatmaps" in results: + heatmaps = results["heatmaps"] if isinstance(heatmaps, list): # Multi-level heatmaps heatmap_mask = [] - for hm in results['heatmaps']: + for hm in results["heatmaps"]: h, w = hm.shape[1:3] - _mask = imresize( - mask, size=(w, h), interpolation='bilinear') + _mask = imresize(mask, size=(w, h), interpolation="bilinear") heatmap_mask.append(_mask) else: h, w = heatmaps.shape[1:3] - heatmap_mask = imresize( - mask, size=(w, h), interpolation='bilinear') + heatmap_mask = imresize(mask, size=(w, h), interpolation="bilinear") else: heatmap_mask = mask # Binarize the mask(s) if isinstance(heatmap_mask, list): - results['heatmap_mask'] = [hm > 0.5 for hm in heatmap_mask] + results["heatmap_mask"] = [hm > 0.5 for hm in heatmap_mask] else: - results['heatmap_mask'] = heatmap_mask > 0.5 + results["heatmap_mask"] = heatmap_mask > 0.5 return results @@ -187,29 +188,31 @@ class BottomupRandomAffine(BaseTransform): .. 
    _`UDP (CVPR 2020)`: https://arxiv.org/abs/1911.07524
    """

-    def __init__(self,
-                 input_size: Optional[Tuple[int, int]] = None,
-                 shift_factor: float = 0.2,
-                 shift_prob: float = 1.,
-                 scale_factor: Tuple[float, float] = (0.75, 1.5),
-                 scale_prob: float = 1.,
-                 scale_type: str = 'short',
-                 rotate_factor: float = 30.,
-                 rotate_prob: float = 1,
-                 shear_factor: float = 2.0,
-                 shear_prob: float = 1.0,
-                 use_udp: bool = False,
-                 pad_val: Union[float, Tuple[float]] = 0,
-                 border: Tuple[int, int] = (0, 0),
-                 distribution='trunc_norm',
-                 transform_mode='affine',
-                 bbox_keep_corner: bool = True,
-                 clip_border: bool = False) -> None:
+    def __init__(
+        self,
+        input_size: Optional[Tuple[int, int]] = None,
+        shift_factor: float = 0.2,
+        shift_prob: float = 1.0,
+        scale_factor: Tuple[float, float] = (0.75, 1.5),
+        scale_prob: float = 1.0,
+        scale_type: str = "short",
+        rotate_factor: float = 30.0,
+        rotate_prob: float = 1,
+        shear_factor: float = 2.0,
+        shear_prob: float = 1.0,
+        use_udp: bool = False,
+        pad_val: Union[float, Tuple[float]] = 0,
+        border: Tuple[int, int] = (0, 0),
+        distribution="trunc_norm",
+        transform_mode="affine",
+        bbox_keep_corner: bool = True,
+        clip_border: bool = False,
+    ) -> None:
         super().__init__()

-        assert transform_mode in ('affine', 'affine_udp', 'perspective'), \
-            f'the argument transform_mode should be either \'affine\', ' \
-            f'\'affine_udp\' or \'perspective\', but got \'{transform_mode}\''
+        assert transform_mode in ("affine", "affine_udp", "perspective"), (
+            f"the argument transform_mode should be either 'affine', " f"'affine_udp' or 'perspective', but got '{transform_mode}'"
+        )

         self.input_size = input_size
         self.shift_factor = shift_factor
@@ -232,26 +235,20 @@ class BottomupRandomAffine(BaseTransform):
         if isinstance(pad_val, (int, float)):
             pad_val = (pad_val, pad_val, pad_val)

-        if 'affine' in transform_mode:
-            self._transform = partial(
-                cv2.warpAffine, flags=cv2.INTER_LINEAR, borderValue=pad_val)
+        if "affine" in transform_mode:
+            self._transform = partial(cv2.warpAffine, flags=cv2.INTER_LINEAR, borderValue=pad_val)
         else:
             self._transform = partial(cv2.warpPerspective, borderValue=pad_val)

-    def _random(self,
-                low: float = -1.,
-                high: float = 1.,
-                size: tuple = ()) -> np.ndarray:
-        if self.distribution == 'trunc_norm':
+    def _random(self, low: float = -1.0, high: float = 1.0, size: tuple = ()) -> np.ndarray:
+        if self.distribution == "trunc_norm":
            """Sample from a truncated normal distribution."""
            return truncnorm.rvs(low, high, size=size).astype(np.float32)
-        elif self.distribution == 'uniform':
+        elif self.distribution == "uniform":
            x = np.random.rand(*size)
            return x * (high - low) + low
        else:
-            raise ValueError(f'the argument `distribution` should be either'
-                             f'\'trunc_norn\' or \'uniform\', but got '
-                             f'{self.distribution}.')
+            raise ValueError(f"the argument `distribution` should be either " f"'trunc_norm' or 'uniform', but got " f"{self.distribution}.")

    def _fix_aspect_ratio(self, scale: np.ndarray, aspect_ratio: float):
        """Extend the scale to match the given aspect ratio.
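The `_fix_aspect_ratio` helper whose body follows in the next hunk extends a sampled `(w, h)` scale so it matches a target aspect ratio. A standalone restatement with one worked value, assuming the `'short'` scale type:

```python
import numpy as np

def fix_aspect_ratio(scale, aspect_ratio, scale_type="short"):
    """Standalone mirror of the _fix_aspect_ratio branch logic."""
    w, h = scale
    if w > h * aspect_ratio:  # box is wider than the target ratio
        _w, _h = (w, w / aspect_ratio) if scale_type == "long" else (h * aspect_ratio, h)
    else:                     # box is taller than (or equal to) the target ratio
        _w, _h = (w, w / aspect_ratio) if scale_type == "short" else (h * aspect_ratio, h)
    return np.array([_w, _h], dtype=scale.dtype)

# A square 100x100 scale at target ratio 4:3 with 'short' shrinks the height:
print(fix_aspect_ratio(np.array([100.0, 100.0]), aspect_ratio=4 / 3))  # -> [100.  75.]
```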
@@ -265,19 +262,19 @@ class BottomupRandomAffine(BaseTransform): """ w, h = scale if w > h * aspect_ratio: - if self.scale_type == 'long': + if self.scale_type == "long": _w, _h = w, w / aspect_ratio - elif self.scale_type == 'short': + elif self.scale_type == "short": _w, _h = h * aspect_ratio, h else: - raise ValueError(f'Unknown scale type: {self.scale_type}') + raise ValueError(f"Unknown scale type: {self.scale_type}") else: - if self.scale_type == 'short': + if self.scale_type == "short": _w, _h = w, w / aspect_ratio - elif self.scale_type == 'long': + elif self.scale_type == "long": _w, _h = h * aspect_ratio, h else: - raise ValueError(f'Unknown scale type: {self.scale_type}') + raise ValueError(f"Unknown scale type: {self.scale_type}") return np.array([_w, _h], dtype=scale.dtype) @cache_randomness @@ -292,15 +289,14 @@ class BottomupRandomAffine(BaseTransform): """ # get offset if np.random.rand() < self.shift_prob: - offset = self._random(size=(2, )) * self.shift_factor + offset = self._random(size=(2,)) * self.shift_factor else: - offset = np.zeros((2, ), dtype=np.float32) + offset = np.zeros((2,), dtype=np.float32) # get scale if np.random.rand() < self.scale_prob: scale_min, scale_max = self.scale_factor - scale = scale_min + (scale_max - scale_min) * ( - self._random(size=(1, )) + 1) / 2 + scale = scale_min + (scale_max - scale_min) * (self._random(size=(1,)) + 1) / 2 else: scale = np.ones(1, dtype=np.float32) @@ -311,11 +307,10 @@ class BottomupRandomAffine(BaseTransform): rotate = 0 # get shear - if 'perspective' in self.transform_mode and np.random.rand( - ) < self.shear_prob: - shear = self._random(size=(2, )) * self.shear_factor + if "perspective" in self.transform_mode and np.random.rand() < self.shear_prob: + shear = self._random(size=(2,)) * self.shear_factor else: - shear = np.zeros((2, ), dtype=np.float32) + shear = np.zeros((2,), dtype=np.float32) return offset, scale, rotate, shear @@ -333,62 +328,46 @@ class BottomupRandomAffine(BaseTransform): dict: Result dict with images distorted. 
""" - img_h, img_w = results['img_shape'][:2] + img_h, img_w = results["img_shape"][:2] w, h = self.input_size offset_rate, scale_rate, rotate, shear = self._get_transform_params() - if 'affine' in self.transform_mode: + if "affine" in self.transform_mode: offset = offset_rate * [img_w, img_h] scale = scale_rate * [img_w, img_h] # adjust the scale to match the target aspect ratio scale = self._fix_aspect_ratio(scale, aspect_ratio=w / h) - if self.transform_mode == 'affine_udp': - center = np.array([(img_w - 1.0) / 2, (img_h - 1.0) / 2], - dtype=np.float32) - warp_mat = get_udp_warp_matrix( - center=center + offset, - scale=scale, - rot=rotate, - output_size=(w, h)) + if self.transform_mode == "affine_udp": + center = np.array([(img_w - 1.0) / 2, (img_h - 1.0) / 2], dtype=np.float32) + warp_mat = get_udp_warp_matrix(center=center + offset, scale=scale, rot=rotate, output_size=(w, h)) else: center = np.array([img_w / 2, img_h / 2], dtype=np.float32) - warp_mat = get_warp_matrix( - center=center + offset, - scale=scale, - rot=rotate, - output_size=(w, h)) + warp_mat = get_warp_matrix(center=center + offset, scale=scale, rot=rotate, output_size=(w, h)) else: offset = offset_rate * [w, h] center = np.array([w / 2, h / 2], dtype=np.float32) - warp_mat = get_pers_warp_matrix( - center=center, - translate=offset, - scale=scale_rate[0], - rot=rotate, - shear=shear) + warp_mat = get_pers_warp_matrix(center=center, translate=offset, scale=scale_rate[0], rot=rotate, shear=shear) # warp image and keypoints - results['img'] = self._transform(results['img'], warp_mat, - (int(w), int(h))) + results["img"] = self._transform(results["img"], warp_mat, (int(w), int(h))) - if 'keypoints' in results: + if "keypoints" in results: # Only transform (x, y) coordinates - kpts = cv2.transform(results['keypoints'], warp_mat) + kpts = cv2.transform(results["keypoints"], warp_mat) if kpts.shape[-1] == 3: kpts = kpts[..., :2] / kpts[..., 2:3] - results['keypoints'] = kpts + results["keypoints"] = kpts if self.clip_border: - results['keypoints'], results[ - 'keypoints_visible'] = keypoint_clip_border( - results['keypoints'], results['keypoints_visible'], - (w, h)) + results["keypoints"], results["keypoints_visible"] = keypoint_clip_border( + results["keypoints"], results["keypoints_visible"], (w, h) + ) - if 'bbox' in results: - bbox = bbox_xyxy2corner(results['bbox']) + if "bbox" in results: + bbox = bbox_xyxy2corner(results["bbox"]) bbox = cv2.transform(bbox, warp_mat) if bbox.shape[-1] == 3: bbox = bbox[..., :2] / bbox[..., 2:3] @@ -396,17 +375,17 @@ class BottomupRandomAffine(BaseTransform): bbox = bbox_corner2xyxy(bbox) if self.clip_border: bbox = bbox_clip_border(bbox, (w, h)) - results['bbox'] = bbox + results["bbox"] = bbox - if 'area' in results: + if "area" in results: warp_mat_for_area = warp_mat if warp_mat.shape[0] == 2: aux_row = np.array([[0.0, 0.0, 1.0]], dtype=warp_mat.dtype) warp_mat_for_area = np.concatenate((warp_mat, aux_row)) - results['area'] *= np.linalg.det(warp_mat_for_area) + results["area"] *= np.linalg.det(warp_mat_for_area) - results['input_size'] = self.input_size - results['warp_mat'] = warp_mat + results["input_size"] = self.input_size + results["warp_mat"] = warp_mat return results @@ -463,13 +442,15 @@ class BottomupResize(BaseTransform): .. 
    _`UDP (CVPR 2020)`: https://arxiv.org/abs/1911.07524
    """

-    def __init__(self,
-                 input_size: Tuple[int, int],
-                 aug_scales: Optional[List[float]] = None,
-                 size_factor: int = 32,
-                 resize_mode: str = 'fit',
-                 pad_val: tuple = (0, 0, 0),
-                 use_udp: bool = False):
+    def __init__(
+        self,
+        input_size: Tuple[int, int],
+        aug_scales: Optional[List[float]] = None,
+        size_factor: int = 32,
+        resize_mode: str = "fit",
+        pad_val: tuple = (0, 0, 0),
+        use_udp: bool = False,
+    ):
         super().__init__()

         self.input_size = input_size
@@ -484,8 +465,7 @@
         """Ceil the given size (tuple of [w, h]) to a multiple of the base."""
         return tuple(int(np.ceil(s / base) * base) for s in size)

-    def _get_input_size(self, img_size: Tuple[int, int],
-                        input_size: Tuple[int, int]) -> Tuple:
+    def _get_input_size(self, img_size: Tuple[int, int], input_size: Tuple[int, int]) -> Tuple:
         """Calculate the actual input size (which the original image will be
         resized to) and the padded input size (which the resized image will be
         padded to, or which is the size of the model input).
@@ -504,34 +484,32 @@
         img_w, img_h = img_size
         ratio = img_w / img_h

-        if self.resize_mode == 'fit':
-            padded_input_size = self._ceil_to_multiple(input_size,
-                                                       self.size_factor)
+        if self.resize_mode == "fit":
+            padded_input_size = self._ceil_to_multiple(input_size, self.size_factor)
             if padded_input_size != input_size:
                 raise ValueError(
-                    'When ``resize_mode==\'fit\', the input size (height and'
-                    ' width) should be mulitples of the size_factor('
-                    f'{self.size_factor}) at all scales. Got invalid input '
-                    f'size {input_size}.')
+                    "When ``resize_mode=='fit'``, the input size (height and"
+                    " width) should be multiples of the size_factor("
+                    f"{self.size_factor}) at all scales. Got invalid input "
+                    f"size {input_size}."
+                )

             pad_w, pad_h = padded_input_size
             rsz_w = min(pad_w, pad_h * ratio)
             rsz_h = min(pad_h, pad_w / ratio)
             actual_input_size = (rsz_w, rsz_h)
-        elif self.resize_mode == 'expand':
-            _padded_input_size = self._ceil_to_multiple(
-                input_size, self.size_factor)
+        elif self.resize_mode == "expand":
+            _padded_input_size = self._ceil_to_multiple(input_size, self.size_factor)
             pad_w, pad_h = _padded_input_size
             rsz_w = max(pad_w, pad_h * ratio)
             rsz_h = max(pad_h, pad_w / ratio)

             actual_input_size = (rsz_w, rsz_h)
-            padded_input_size = self._ceil_to_multiple(actual_input_size,
-                                                       self.size_factor)
+            padded_input_size = self._ceil_to_multiple(actual_input_size, self.size_factor)
         else:
-            raise ValueError(f'Invalid resize mode {self.resize_mode}')
+            raise ValueError(f"Invalid resize mode {self.resize_mode}")

         return actual_input_size, padded_input_size
@@ -549,8 +527,8 @@
        dict: Result dict with images distorted.
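To make the `fit`/`expand` arithmetic above concrete, here is the same computation worked through for a 640x480 image and a 512x512 model input (plain numbers only, no package code):

```python
import numpy as np

def ceil_to_multiple(size, base=32):
    return tuple(int(np.ceil(s / base) * base) for s in size)

img_w, img_h = 640, 480            # 4:3 image
pad_w, pad_h = 512, 512            # already a multiple of 32
ratio = img_w / img_h

# 'fit': resize to fit *inside* input_size, keeping aspect ratio.
fit_actual = (min(pad_w, pad_h * ratio), min(pad_h, pad_w / ratio))
print(fit_actual)                    # (512, 384.0): width-limited, padded to 512x512

# 'expand': resize to *cover* input_size, then re-derive the padded size by
# ceiling the covering size to a multiple of size_factor.
exp_actual = (max(pad_w, pad_h * ratio), max(pad_h, pad_w / ratio))
print(exp_actual)                    # (~682.67, 512)
print(ceil_to_multiple(exp_actual))  # (704, 512): final padded model input
```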
""" - img = results['img'] - img_h, img_w = results['ori_shape'] + img = results["img"] + img_h, img_w = results["ori_shape"] w, h = self.input_size input_sizes = [(w, h)] @@ -560,53 +538,37 @@ class BottomupResize(BaseTransform): imgs = [] for i, (_w, _h) in enumerate(input_sizes): - actual_input_size, padded_input_size = self._get_input_size( - img_size=(img_w, img_h), input_size=(_w, _h)) + actual_input_size, padded_input_size = self._get_input_size(img_size=(img_w, img_h), input_size=(_w, _h)) if self.use_udp: - center = np.array([(img_w - 1.0) / 2, (img_h - 1.0) / 2], - dtype=np.float32) + center = np.array([(img_w - 1.0) / 2, (img_h - 1.0) / 2], dtype=np.float32) scale = np.array([img_w, img_h], dtype=np.float32) - warp_mat = get_udp_warp_matrix( - center=center, - scale=scale, - rot=0, - output_size=actual_input_size) + warp_mat = get_udp_warp_matrix(center=center, scale=scale, rot=0, output_size=actual_input_size) else: center = np.array([img_w / 2, img_h / 2], dtype=np.float32) - scale = np.array([ - img_w * padded_input_size[0] / actual_input_size[0], - img_h * padded_input_size[1] / actual_input_size[1] - ], - dtype=np.float32) - warp_mat = get_warp_matrix( - center=center, - scale=scale, - rot=0, - output_size=padded_input_size) - - _img = cv2.warpAffine( - img, - warp_mat, - padded_input_size, - flags=cv2.INTER_LINEAR, - borderValue=self.pad_val) + scale = np.array( + [img_w * padded_input_size[0] / actual_input_size[0], img_h * padded_input_size[1] / actual_input_size[1]], + dtype=np.float32, + ) + warp_mat = get_warp_matrix(center=center, scale=scale, rot=0, output_size=padded_input_size) + + _img = cv2.warpAffine(img, warp_mat, padded_input_size, flags=cv2.INTER_LINEAR, borderValue=self.pad_val) imgs.append(_img) # Store the transform information w.r.t. the main input size if i == 0: - results['img_shape'] = padded_input_size[::-1] - results['input_center'] = center - results['input_scale'] = scale - results['input_size'] = padded_input_size + results["img_shape"] = padded_input_size[::-1] + results["input_center"] = center + results["input_scale"] = scale + results["input_size"] = padded_input_size if self.aug_scales: - results['img'] = imgs - results['aug_scales'] = self.aug_scales + results["img"] = imgs + results["aug_scales"] = self.aug_scales else: - results['img'] = imgs[0] - results['aug_scale'] = None + results["img"] = imgs[0] + results["aug_scale"] = None return results @@ -672,21 +634,20 @@ class BottomupRandomCrop(BaseTransform): ``allow_negative_crop`` is set to False, skip this image. 
""" - def __init__(self, - crop_size: tuple, - crop_type: str = 'absolute', - allow_negative_crop: bool = False, - recompute_bbox: bool = False, - bbox_clip_border: bool = True) -> None: - if crop_type not in [ - 'relative_range', 'relative', 'absolute', 'absolute_range' - ]: - raise ValueError(f'Invalid crop_type {crop_type}.') - if crop_type in ['absolute', 'absolute_range']: + def __init__( + self, + crop_size: tuple, + crop_type: str = "absolute", + allow_negative_crop: bool = False, + recompute_bbox: bool = False, + bbox_clip_border: bool = True, + ) -> None: + if crop_type not in ["relative_range", "relative", "absolute", "absolute_range"]: + raise ValueError(f"Invalid crop_type {crop_type}.") + if crop_type in ["absolute", "absolute_range"]: assert crop_size[0] > 0 and crop_size[1] > 0 - assert isinstance(crop_size[0], int) and isinstance( - crop_size[1], int) - if crop_type == 'absolute_range': + assert isinstance(crop_size[0], int) and isinstance(crop_size[1], int) + if crop_type == "absolute_range": assert crop_size[0] <= crop_size[1] else: assert 0 < crop_size[0] <= 1 and 0 < crop_size[1] <= 1 @@ -696,8 +657,7 @@ class BottomupRandomCrop(BaseTransform): self.bbox_clip_border = bbox_clip_border self.recompute_bbox = recompute_bbox - def _crop_data(self, results: dict, crop_size: Tuple[int, int], - allow_negative_crop: bool) -> Union[dict, None]: + def _crop_data(self, results: dict, crop_size: Tuple[int, int], allow_negative_crop: bool) -> Union[dict, None]: """Function to randomly crop images, bounding boxes, masks, semantic segmentation maps. @@ -714,7 +674,7 @@ class BottomupRandomCrop(BaseTransform): be returned when there is no valid bbox after cropping. """ assert crop_size[0] > 0 and crop_size[1] > 0 - img = results['img'] + img = results["img"] margin_h = max(img.shape[0] - crop_size[0], 0) margin_w = max(img.shape[1] - crop_size[1], 0) offset_h, offset_w = self._rand_offset((margin_h, margin_w)) @@ -722,83 +682,71 @@ class BottomupRandomCrop(BaseTransform): crop_x1, crop_x2 = offset_w, offset_w + crop_size[1] # Record the warp matrix for the RandomCrop - warp_mat = np.array([[1, 0, -offset_w], [0, 1, -offset_h], [0, 0, 1]], - dtype=np.float32) - if results.get('warp_mat', None) is None: - results['warp_mat'] = warp_mat + warp_mat = np.array([[1, 0, -offset_w], [0, 1, -offset_h], [0, 0, 1]], dtype=np.float32) + if results.get("warp_mat", None) is None: + results["warp_mat"] = warp_mat else: - results['warp_mat'] = warp_mat @ results['warp_mat'] + results["warp_mat"] = warp_mat @ results["warp_mat"] # crop the image img = img[crop_y1:crop_y2, crop_x1:crop_x2, ...] 
img_shape = img.shape - results['img'] = img - results['img_shape'] = img_shape[:2] + results["img"] = img + results["img_shape"] = img_shape[:2] # crop bboxes accordingly and clip to the image boundary - if results.get('bbox', None) is not None: + if results.get("bbox", None) is not None: distances = (-offset_w, -offset_h) - bboxes = results['bbox'] + bboxes = results["bbox"] bboxes = bboxes + np.tile(np.asarray(distances), 2) if self.bbox_clip_border: bboxes[..., 0::2] = bboxes[..., 0::2].clip(0, img_shape[1]) bboxes[..., 1::2] = bboxes[..., 1::2].clip(0, img_shape[0]) - valid_inds = (bboxes[..., 0] < img_shape[1]) & \ - (bboxes[..., 1] < img_shape[0]) & \ - (bboxes[..., 2] > 0) & \ - (bboxes[..., 3] > 0) + valid_inds = (bboxes[..., 0] < img_shape[1]) & (bboxes[..., 1] < img_shape[0]) & (bboxes[..., 2] > 0) & (bboxes[..., 3] > 0) # If the crop does not contain any gt-bbox area and # allow_negative_crop is False, skip this image. - if (not valid_inds.any() and not allow_negative_crop): + if not valid_inds.any() and not allow_negative_crop: return None - results['bbox'] = bboxes[valid_inds] - meta_keys = [ - 'bbox_score', 'id', 'category_id', 'raw_ann_info', 'iscrowd' - ] + results["bbox"] = bboxes[valid_inds] + meta_keys = ["bbox_score", "id", "category_id", "raw_ann_info", "iscrowd"] for key in meta_keys: if results.get(key): if isinstance(results[key], list): - results[key] = np.asarray( - results[key])[valid_inds].tolist() + results[key] = np.asarray(results[key])[valid_inds].tolist() else: results[key] = results[key][valid_inds] - if results.get('keypoints', None) is not None: - keypoints = results['keypoints'] + if results.get("keypoints", None) is not None: + keypoints = results["keypoints"] distances = np.asarray(distances).reshape(1, 1, 2) keypoints = keypoints + distances if self.bbox_clip_border: keypoints_outside_x = keypoints[:, :, 0] < 0 keypoints_outside_y = keypoints[:, :, 1] < 0 keypoints_outside_width = keypoints[:, :, 0] > img_shape[1] - keypoints_outside_height = keypoints[:, :, - 1] > img_shape[0] + keypoints_outside_height = keypoints[:, :, 1] > img_shape[0] kpt_outside = np.logical_or.reduce( - (keypoints_outside_x, keypoints_outside_y, - keypoints_outside_width, keypoints_outside_height)) + (keypoints_outside_x, keypoints_outside_y, keypoints_outside_width, keypoints_outside_height) + ) - results['keypoints_visible'][kpt_outside] *= 0 + results["keypoints_visible"][kpt_outside] *= 0 keypoints[:, :, 0] = keypoints[:, :, 0].clip(0, img_shape[1]) keypoints[:, :, 1] = keypoints[:, :, 1].clip(0, img_shape[0]) - results['keypoints'] = keypoints[valid_inds] - results['keypoints_visible'] = results['keypoints_visible'][ - valid_inds] + results["keypoints"] = keypoints[valid_inds] + results["keypoints_visible"] = results["keypoints_visible"][valid_inds] - if results.get('segmentation', None) is not None: - results['segmentation'] = results['segmentation'][ - crop_y1:crop_y2, crop_x1:crop_x2] + if results.get("segmentation", None) is not None: + results["segmentation"] = results["segmentation"][crop_y1:crop_y2, crop_x1:crop_x2] - if results.get('masks', None) is not None: - results['masks'] = results['masks'][valid_inds.nonzero( - )[0]].crop(np.asarray([crop_x1, crop_y1, crop_x2, crop_y2])) + if results.get("masks", None) is not None: + results["masks"] = results["masks"][valid_inds.nonzero()[0]].crop(np.asarray([crop_x1, crop_y1, crop_x2, crop_y2])) if self.recompute_bbox: - results['bbox'] = results['masks'].get_bboxes( - type(results['bbox'])) + results["bbox"] = 
results["masks"].get_bboxes(type(results["bbox"])) return results @@ -831,17 +779,13 @@ class BottomupRandomCrop(BaseTransform): crop_size (Tuple[int, int]): (crop_h, crop_w) in absolute pixels. """ h, w = image_size - if self.crop_type == 'absolute': + if self.crop_type == "absolute": return min(self.crop_size[1], h), min(self.crop_size[0], w) - elif self.crop_type == 'absolute_range': - crop_h = np.random.randint( - min(h, self.crop_size[0]), - min(h, self.crop_size[1]) + 1) - crop_w = np.random.randint( - min(w, self.crop_size[0]), - min(w, self.crop_size[1]) + 1) + elif self.crop_type == "absolute_range": + crop_h = np.random.randint(min(h, self.crop_size[0]), min(h, self.crop_size[1]) + 1) + crop_w = np.random.randint(min(w, self.crop_size[0]), min(w, self.crop_size[1]) + 1) return crop_h, crop_w - elif self.crop_type == 'relative': + elif self.crop_type == "relative": crop_w, crop_h = self.crop_size return int(h * crop_h + 0.5), int(w * crop_w + 0.5) else: @@ -862,7 +806,7 @@ class BottomupRandomCrop(BaseTransform): key in result dict is updated according to crop size. None will be returned when there is no valid bbox after cropping. """ - image_size = results['img'].shape[:2] + image_size = results["img"].shape[:2] crop_size = self._get_crop_size(image_size) results = self._crop_data(results, crop_size, self.allow_negative_crop) return results @@ -913,7 +857,7 @@ class BottomupRandomChoiceResize(BaseTransform): scales: Sequence[Union[int, Tuple]], keep_ratio: bool = False, clip_object_border: bool = True, - backend: str = 'cv2', + backend: str = "cv2", **resize_kwargs, ) -> None: super().__init__() @@ -945,61 +889,45 @@ class BottomupRandomChoiceResize(BaseTransform): if self.keep_ratio: - img, scale_factor = imrescale( - results['img'], - self.scale, - interpolation='bilinear', - return_scale=True, - backend=self.backend) + img, scale_factor = imrescale(results["img"], self.scale, interpolation="bilinear", return_scale=True, backend=self.backend) # the w_scale and h_scale has minor difference # a real fix should be done in the mmcv.imrescale in the future new_h, new_w = img.shape[:2] - h, w = results['img'].shape[:2] + h, w = results["img"].shape[:2] w_scale = new_w / w h_scale = new_h / h else: - img, w_scale, h_scale = imresize( - results['img'], - self.scale, - interpolation='bilinear', - return_scale=True, - backend=self.backend) - - results['img'] = img - results['img_shape'] = img.shape[:2] - results['scale_factor'] = (w_scale, h_scale) - results['input_size'] = img.shape[:2] - w, h = results['ori_shape'] + img, w_scale, h_scale = imresize(results["img"], self.scale, interpolation="bilinear", return_scale=True, backend=self.backend) + + results["img"] = img + results["img_shape"] = img.shape[:2] + results["scale_factor"] = (w_scale, h_scale) + results["input_size"] = img.shape[:2] + w, h = results["ori_shape"] center = np.array([w / 2, h / 2], dtype=np.float32) scale = np.array([w, h], dtype=np.float32) - results['input_center'] = center - results['input_scale'] = scale + results["input_center"] = center + results["input_scale"] = scale def _resize_bboxes(self, results: dict) -> None: """Resize bounding boxes with ``self.scale``.""" - if results.get('bbox', None) is not None: - bboxes = results['bbox'] * np.tile( - np.array(results['scale_factor']), 2) + if results.get("bbox", None) is not None: + bboxes = results["bbox"] * np.tile(np.array(results["scale_factor"]), 2) if self.clip_object_border: - bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, - results['img_shape'][1]) - 
bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, - results['img_shape'][0]) - results['bbox'] = bboxes + bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, results["img_shape"][1]) + bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, results["img_shape"][0]) + results["bbox"] = bboxes def _resize_keypoints(self, results: dict) -> None: """Resize keypoints with ``self.scale``.""" - if results.get('keypoints', None) is not None: - keypoints = results['keypoints'] + if results.get("keypoints", None) is not None: + keypoints = results["keypoints"] - keypoints[:, :, :2] = keypoints[:, :, :2] * np.array( - results['scale_factor']) + keypoints[:, :, :2] = keypoints[:, :, :2] * np.array(results["scale_factor"]) if self.clip_object_border: - keypoints[:, :, 0] = np.clip(keypoints[:, :, 0], 0, - results['img_shape'][1]) - keypoints[:, :, 1] = np.clip(keypoints[:, :, 1], 0, - results['img_shape'][0]) - results['keypoints'] = keypoints + keypoints[:, :, 0] = np.clip(keypoints[:, :, 0], 0, results["img_shape"][1]) + keypoints[:, :, 1] = np.clip(keypoints[:, :, 1], 0, results["img_shape"][0]) + results["keypoints"] = keypoints def transform(self, results: dict) -> dict: """Apply resize transforms on results from a list of scales. @@ -1020,5 +948,5 @@ class BottomupRandomChoiceResize(BaseTransform): self._resize_bboxes(results) self._resize_keypoints(results) - results['scale_idx'] = scale_idx + results["scale_idx"] = scale_idx return results diff --git a/mmpose/datasets/transforms/common_transforms.py b/mmpose/datasets/transforms/common_transforms.py index c469c0ea4bea634f27bc837528eeddbf1e6f02fd..6c3f5af1da90e328e078f08c48712f9697d0fcca 100644 --- a/mmpose/datasets/transforms/common_transforms.py +++ b/mmpose/datasets/transforms/common_transforms.py @@ -1,4 +1,6 @@ -# Copyright (c) OpenMMLab. All rights reserved. +# Copyright (c) OpenMMLab and Miroslav Purkrabek, ProbPose. +# Edited by MP based on the original code in mmpose. +# All rights reserved. import warnings from copy import deepcopy from typing import Dict, List, Optional, Sequence, Tuple, Union @@ -12,17 +14,16 @@ from mmcv.transforms import BaseTransform from mmcv.transforms.utils import avoid_cache_randomness, cache_randomness from mmengine import is_list_of from mmengine.dist import get_dist_info -from scipy.stats import truncnorm +from pycocotools import mask as Mask from scipy.ndimage import distance_transform_edt +from scipy.stats import truncnorm from mmpose.codecs import * # noqa: F401, F403 from mmpose.registry import KEYPOINT_CODECS, TRANSFORMS -from mmpose.structures.bbox import bbox_xyxy2cs, flip_bbox, bbox_cs2xyxy +from mmpose.structures.bbox import bbox_cs2xyxy, bbox_xyxy2cs, flip_bbox from mmpose.structures.keypoint import flip_keypoints from mmpose.utils.typing import MultiConfig -from pycocotools import mask as Mask - try: import albumentations except ImportError: @@ -68,23 +69,22 @@ class GetBBoxCenterScale(BaseTransform): Returns: dict: The result dict. """ - + # Save the original bbox wrt. input - results['bbox_xyxy_wrt_input'] = results['bbox'] - - if 'bbox_center' in results and 'bbox_scale' in results: + results["bbox_xyxy_wrt_input"] = results["bbox"] + + if "bbox_center" in results and "bbox_scale" in results: rank, _ = get_dist_info() if rank == 0: - warnings.warn('Use the existing "bbox_center" and "bbox_scale"' - '. The padding will still be applied.') - results['bbox_scale'] = results['bbox_scale'] * self.padding + warnings.warn('Use the existing "bbox_center" and "bbox_scale"' ". 
The padding will still be applied.")
+            results["bbox_scale"] = results["bbox_scale"] * self.padding
         else:
-            bbox = results['bbox']
+            bbox = results["bbox"]
             center, scale = bbox_xyxy2cs(bbox, padding=self.padding)
-            results['bbox_center'] = center
-            results['bbox_scale'] = scale
+            results["bbox_center"] = center
+            results["bbox_scale"] = scale

         return results
@@ -94,7 +94,7 @@ class GetBBoxCenterScale(BaseTransform):
        Returns:
            str: Formatted string.
        """
-        repr_str = self.__class__.__name__ + f'(padding={self.padding})'
+        repr_str = self.__class__.__name__ + f"(padding={self.padding})"
        return repr_str
@@ -119,7 +119,8 @@ class MaskBackground(BaseTransform):
            `bbox_scale`. Defaults to 1.25
    """

-    def __init__(self,
+    def __init__(
+        self,
        continue_on_failure: bool = True,
        prob: float = 1.0,
        alpha: float = 1.0,
@@ -127,22 +128,24 @@
        erode_amount: float = 0.5,
        dilate_prob: float = 0.0,
        dilate_amount: float = 0.5,
+        patches_computation_method: str = "voronoi",
+        context_size: float = 1.25,
    ) -> None:
-
+
         super().__init__()
-
-        assert 0 <= alpha <= 1, 'alpha should be in [0, 1]'
-        assert 0 <= prob <= 1, 'prob should be in [0, 1]'
+
+        assert 0 <= alpha <= 1, "alpha should be in [0, 1]"
+        assert 0 <= prob <= 1, "prob should be in [0, 1]"
         self.continue_on_failure = continue_on_failure
         self.alpha = alpha
         self.prob = prob
-
-        assert 0 <= erode_prob <= 1, 'erode_prob should be in [0, 1]'
-        assert 0 <= dilate_prob <= 1, 'dilate_prob should be in [0, 1]'
-        assert 0 < erode_amount < 1, 'erode_amount should be in [0, 1]'
-        assert 0 < dilate_amount < 1, 'dilate_amount should be in [0, 1]'
-        assert erode_prob + dilate_prob <= 1, 'erode_prob + dilate_prob should be less than or equal to 1'
+
+        assert 0 <= erode_prob <= 1, "erode_prob should be in [0, 1]"
+        assert 0 <= dilate_prob <= 1, "dilate_prob should be in [0, 1]"
+        assert 0 < erode_amount < 1, "erode_amount should be in (0, 1)"
+        assert 0 < dilate_amount < 1, "dilate_amount should be in (0, 1)"
+        assert erode_prob + dilate_prob <= 1, "erode_prob + dilate_prob should be less than or equal to 1"
         self.noise_prob = erode_prob + dilate_prob
         if self.noise_prob > 0:
             self.erode_prob = erode_prob / (self.noise_prob)
@@ -153,12 +156,14 @@
         self.erode_amount = erode_amount
         self.dilate_amount = dilate_amount
+        self.patches_computation_method = patches_computation_method
+        self.context_size = context_size

     def _perturb_by_dilation(self, mask: np.ndarray) -> np.ndarray:
         """Perturb the mask to simulate real-world detector."""
         mask_shape = mask.shape
-        mask_area = (mask>0).sum()
+        mask_area = (mask > 0).sum()

         # Close the mask to erase small holes
         k = max(mask_area // 1000, 5)
@@ -169,14 +174,14 @@
         k = max(mask_area // 3000, 5)
         kernel = np.ones((k, k), np.uint8)
         mask = cv2.dilate(mask, kernel, iterations=1)
-
+
         return mask.reshape(mask_shape)

     def _perturb_by_erosion(self, mask: np.ndarray) -> np.ndarray:
         """Perturb the mask to simulate real-world detector."""
         mask_shape = mask.shape
-        mask_area = (mask>0).sum()
+        mask_area = (mask > 0).sum()

         # Close the mask to erase small holes
         k = max(mask_area // 1000, 5)
@@ -187,13 +192,13 @@
         k = max(mask_area // 3000, 5)
         kernel = np.ones((k, k), np.uint8)
         mask = cv2.erode(mask, kernel, iterations=1)
-
+
         return mask.reshape(mask_shape)

     @cache_randomness
     def _perturb_by_patches(self, mask: np.ndarray, amount: float, num_patches: int = 10) -> np.ndarray:
         mask_shape = mask.shape
-
+
        # Generate 10
random seeds uniformly distributed in the mask mask_idx = np.where(mask.flatten() > 0)[0] seeds = np.random.choice(mask_idx, num_patches, replace=False) @@ -232,21 +237,21 @@ class MaskBackground(BaseTransform): return mask # Erode and dilate the mask to increase smoothness - kernel = np.ones((5, 5), np.uint8) + kernel = np.ones((5, 5), np.uint8) mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel) mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel) - + increase_mask = np.random.choice([False, True], p=[self.erode_prob, self.dilate_prob]) - + if increase_mask: if self._coin_flip(): try: mask = self._perturb_by_patches( - mask=1-mask, + mask=1 - mask, amount=self.dilate_amount, num_patches=50, ) - mask = 1-mask + mask = 1 - mask except ValueError: pass else: @@ -266,7 +271,7 @@ class MaskBackground(BaseTransform): else: mask = self._perturb_by_erosion(mask) - mask = (mask>0).astype(np.uint8) + mask = (mask > 0).astype(np.uint8) return mask.reshape(mask_shape) @cache_randomness @@ -285,17 +290,21 @@ class MaskBackground(BaseTransform): dict: The result dict. """ - # Try to load the mask from the results - mask = results.get('segmentation', None) + # Try to load the mask from the results + mask = results.get("segmentation", None) # print("\nMaskBackground: ", mask is not None) if mask is None and not self.continue_on_failure: - raise ValueError('No mask found in the results and self.continue_on_failure is set to False.') + raise ValueError("No mask found in the results and self.continue_on_failure is set to False.") if mask is not None and self._do_masking(): # Convert mask from polygons to binary mask - try: - mask_rle = Mask.frPyObjects(mask, results['img_shape'][0], results['img_shape'][1]) + try: + try: + mask_rle = Mask.frPyObjects(mask, results["img_shape"][0], results["img_shape"][1]) + mask_rle = Mask.merge(mask_rle) + except ValueError: + mask_rle = mask except IndexError: # breakpoint() # print("Mask shape:", mask.shape) @@ -304,11 +313,8 @@ class MaskBackground(BaseTransform): # print("Image shape:", results['img_shape']) return results - - - mask_rle = Mask.merge(mask_rle) - img = results['img'].copy() - masked_image = results['img'].copy() + img = results["img"].copy() + masked_image = results["img"].copy() mask = Mask.decode(mask_rle).reshape((img.shape[0], img.shape[1], 1)) binary_mask = (mask > 0).astype(np.uint8) @@ -318,7 +324,7 @@ class MaskBackground(BaseTransform): binary_mask = self._perturb_mask(binary_mask) masked_image = masked_image * binary_mask - results['img'] = cv2.addWeighted(img, 1 - self.alpha, masked_image, self.alpha, 0) + results["img"] = cv2.addWeighted(img, 1 - self.alpha, masked_image, self.alpha, 0) # hash_id = abs(hash(555)) # cv2.imwrite("tmp_visualization/_perturbed_mask_{:d}.jpg".format(hash_id), mask * 255) @@ -328,8 +334,8 @@ class MaskBackground(BaseTransform): # Save the mask as a binary mask # Save the image - img = results['img'] - + img = results["img"] + # img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR) # cv2.imwrite("tmp_visualization/masked_image_{:d}.jpg".format(abs(hash(555))), img) @@ -341,7 +347,7 @@ class MaskBackground(BaseTransform): Returns: str: Formatted string. """ - repr_str = self.__class__.__name__ + f'(continue_on_failure={self.continue_on_failure})' + repr_str = self.__class__.__name__ + f"(continue_on_failure={self.continue_on_failure})" return repr_str @@ -390,28 +396,30 @@ class RandomFlip(BaseTransform): to ``'horizontal'``. 
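Stepping back: the decode-and-blend core of `MaskBackground.transform` above, stripped of the perturbation and error handling, is roughly the following. A sketch assuming COCO polygon input; `apply_mask_background` is an illustrative name, not package API:

```python
import cv2
import numpy as np
from pycocotools import mask as Mask

def apply_mask_background(img, segmentation, img_shape, alpha=1.0):
    """Sketch of MaskBackground's decode-and-blend step (no perturbation)."""
    # COCO polygons -> RLE -> binary HxWx1 mask
    rles = Mask.frPyObjects(segmentation, img_shape[0], img_shape[1])
    rle = Mask.merge(rles)
    binary = (Mask.decode(rle) > 0).astype(np.uint8)[..., None]
    masked = img * binary                       # background zeroed out
    # alpha=1.0 removes the background entirely; smaller values only dim it
    return cv2.addWeighted(img, 1 - alpha, masked, alpha, 0)
```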
""" - def __init__(self, - prob: Union[float, List[float]] = 0.5, - direction: Union[str, List[str]] = 'horizontal') -> None: + def __init__(self, prob: Union[float, List[float]] = 0.5, direction: Union[str, List[str]] = "horizontal") -> None: if isinstance(prob, list): assert is_list_of(prob, float) assert 0 <= sum(prob) <= 1 elif isinstance(prob, float): assert 0 <= prob <= 1 else: - raise ValueError(f'probs must be float or list of float, but \ - got `{type(prob)}`.') + raise ValueError( + f"probs must be float or list of float, but \ + got `{type(prob)}`." + ) self.prob = prob - valid_directions = ['horizontal', 'vertical', 'diagonal'] + valid_directions = ["horizontal", "vertical", "diagonal"] if isinstance(direction, str): assert direction in valid_directions elif isinstance(direction, list): assert is_list_of(direction, str) assert set(direction).issubset(set(valid_directions)) else: - raise ValueError(f'direction must be either str or list of str, \ - but got `{type(direction)}`.') + raise ValueError( + f"direction must be either str or list of str, \ + but got `{type(direction)}`." + ) self.direction = direction if isinstance(prob, list): @@ -420,8 +428,7 @@ class RandomFlip(BaseTransform): @cache_randomness def _choose_direction(self) -> str: """Choose the flip direction according to `prob` and `direction`""" - if isinstance(self.direction, - List) and not isinstance(self.direction, str): + if isinstance(self.direction, List) and not isinstance(self.direction, str): # None means non-flip direction_list: list = list(self.direction) + [None] elif isinstance(self.direction, str): @@ -432,7 +439,7 @@ class RandomFlip(BaseTransform): non_prob: float = 1 - sum(self.prob) prob_list = self.prob + [non_prob] elif isinstance(self.prob, float): - non_prob = 1. 
- self.prob + non_prob = 1.0 - self.prob # exclude non-flip single_ratio = self.prob / (len(direction_list) - 1) prob_list = [single_ratio] * (len(direction_list) - 1) + [non_prob] @@ -456,66 +463,55 @@ class RandomFlip(BaseTransform): flip_dir = self._choose_direction() if flip_dir is None: - results['flip'] = False - results['flip_direction'] = None + results["flip"] = False + results["flip_direction"] = None else: - results['flip'] = True - results['flip_direction'] = flip_dir + results["flip"] = True + results["flip_direction"] = flip_dir - h, w = results.get('input_size', results['img_shape']) + h, w = results.get("input_size", results["img_shape"]) # flip image and mask - if isinstance(results['img'], list): - results['img'] = [ - imflip(img, direction=flip_dir) for img in results['img'] - ] + if isinstance(results["img"], list): + results["img"] = [imflip(img, direction=flip_dir) for img in results["img"]] else: - results['img'] = imflip(results['img'], direction=flip_dir) + results["img"] = imflip(results["img"], direction=flip_dir) - if 'img_mask' in results: - results['img_mask'] = imflip( - results['img_mask'], direction=flip_dir) + if "img_mask" in results: + results["img_mask"] = imflip(results["img_mask"], direction=flip_dir) # flip bboxes - if results.get('bbox', None) is not None: - results['bbox'] = flip_bbox( - results['bbox'], - image_size=(w, h), - bbox_format='xyxy', - direction=flip_dir) - + if results.get("bbox", None) is not None: + results["bbox"] = flip_bbox(results["bbox"], image_size=(w, h), bbox_format="xyxy", direction=flip_dir) + # flip bboxes - if results.get('bbox_xyxy_wrt_input', None) is not None: - results['bbox_xyxy_wrt_input'] = flip_bbox( - results['bbox_xyxy_wrt_input'], - image_size=(w, h), - bbox_format='xyxy', - direction=flip_dir) + if results.get("bbox_xyxy_wrt_input", None) is not None: + results["bbox_xyxy_wrt_input"] = flip_bbox( + results["bbox_xyxy_wrt_input"], image_size=(w, h), bbox_format="xyxy", direction=flip_dir + ) - if results.get('bbox_center', None) is not None: - results['bbox_center'] = flip_bbox( - results['bbox_center'], - image_size=(w, h), - bbox_format='center', - direction=flip_dir) + if results.get("bbox_center", None) is not None: + results["bbox_center"] = flip_bbox(results["bbox_center"], image_size=(w, h), bbox_format="center", direction=flip_dir) # flip keypoints - if results.get('keypoints', None) is not None: + if results.get("keypoints", None) is not None: keypoints, keypoints_visible = flip_keypoints( - results['keypoints'], - results.get('keypoints_visible', None), + results["keypoints"], + results.get("keypoints_visible", None), image_size=(w, h), - flip_indices=results['flip_indices'], - direction=flip_dir) + flip_indices=results["flip_indices"], + direction=flip_dir, + ) _, keypoints_visibility = flip_keypoints( - results['keypoints'], - results.get('keypoints_visibility', None), + results["keypoints"], + results.get("keypoints_visibility", None), image_size=(w, h), - flip_indices=results['flip_indices'], - direction=flip_dir) + flip_indices=results["flip_indices"], + direction=flip_dir, + ) - results['keypoints'] = keypoints - results['keypoints_visible'] = keypoints_visible - results['keypoints_visibility'] = keypoints_visibility + results["keypoints"] = keypoints + results["keypoints_visible"] = keypoints_visible + results["keypoints_visibility"] = keypoints_visibility return results @@ -526,8 +522,8 @@ class RandomFlip(BaseTransform): str: Formatted string. 
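The direction sampling in `_choose_direction` above splits a scalar flip probability evenly across the listed directions and assigns the remainder to "no flip". A standalone check of the resulting weights:

```python
import numpy as np

# RandomFlip with prob=0.6 and direction=['horizontal', 'vertical']:
prob, directions = 0.6, ["horizontal", "vertical"]
direction_list = directions + [None]        # None means no flip
non_prob = 1.0 - prob
single_ratio = prob / (len(direction_list) - 1)
prob_list = [single_ratio] * (len(direction_list) - 1) + [non_prob]
print(prob_list)                            # [0.3, 0.3, 0.4]

flip_dir = np.random.choice(direction_list, p=prob_list)
```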
""" repr_str = self.__class__.__name__ - repr_str += f'(prob={self.prob}, ' - repr_str += f'direction={self.direction})' + repr_str += f"(prob={self.prob}, " + repr_str += f"direction={self.direction})" return repr_str @@ -561,13 +557,15 @@ class RandomHalfBody(BaseTransform): keypoint number meets the requirement. Defaults to 0.3 """ - def __init__(self, - min_total_keypoints: int = 9, - min_upper_keypoints: int = 2, - min_lower_keypoints: int = 3, - padding: float = 1.5, - prob: float = 0.3, - upper_prioritized_prob: float = 0.7) -> None: + def __init__( + self, + min_total_keypoints: int = 9, + min_upper_keypoints: int = 2, + min_lower_keypoints: int = 3, + padding: float = 1.5, + prob: float = 0.3, + upper_prioritized_prob: float = 0.7, + ) -> None: super().__init__() self.min_total_keypoints = min_total_keypoints self.min_upper_keypoints = min_upper_keypoints @@ -576,9 +574,7 @@ class RandomHalfBody(BaseTransform): self.prob = prob self.upper_prioritized_prob = upper_prioritized_prob - def _get_half_body_bbox(self, keypoints: np.ndarray, - half_body_ids: List[int] - ) -> Tuple[np.ndarray, np.ndarray]: + def _get_half_body_bbox(self, keypoints: np.ndarray, half_body_ids: List[int]) -> Tuple[np.ndarray, np.ndarray]: """Get half-body bbox center and scale of a single instance. Args: @@ -601,11 +597,13 @@ class RandomHalfBody(BaseTransform): scale = np.array([w, h], dtype=center.dtype) * self.padding return center, scale - - def _get_half_body_exact_bbox(self, keypoints: np.ndarray, - half_body_ids: List[int], - bbox: np.ndarray, - ) -> np.ndarray: + + def _get_half_body_exact_bbox( + self, + keypoints: np.ndarray, + half_body_ids: List[int], + bbox: np.ndarray, + ) -> np.ndarray: """Get half-body bbox center and scale of a single instance. Args: @@ -639,10 +637,9 @@ class RandomHalfBody(BaseTransform): return np.array([x1, y1, x2, y2]) @cache_randomness - def _random_select_half_body(self, keypoints_visible: np.ndarray, - upper_body_ids: List[int], - lower_body_ids: List[int] - ) -> List[Optional[List[int]]]: + def _random_select_half_body( + self, keypoints_visible: np.ndarray, upper_body_ids: List[int], lower_body_ids: List[int] + ) -> List[Optional[List[int]]]: """Randomly determine whether applying half-body transform and get the half-body keyponit indices of each instances. @@ -675,16 +672,14 @@ class RandomHalfBody(BaseTransform): num_lower = len(lower_valid_ids) prefer_upper = np.random.rand() < self.upper_prioritized_prob - if (num_upper < self.min_upper_keypoints - and num_lower < self.min_lower_keypoints): + if num_upper < self.min_upper_keypoints and num_lower < self.min_lower_keypoints: indices = None elif num_lower < self.min_lower_keypoints: indices = upper_valid_ids elif num_upper < self.min_upper_keypoints: indices = lower_valid_ids else: - indices = ( - upper_valid_ids if prefer_upper else lower_valid_ids) + indices = upper_valid_ids if prefer_upper else lower_valid_ids half_body_ids.append(indices) @@ -702,9 +697,10 @@ class RandomHalfBody(BaseTransform): dict: The result dict. 
""" half_body_ids = self._random_select_half_body( - keypoints_visible=results['keypoints_visible'], - upper_body_ids=results['upper_body_ids'], - lower_body_ids=results['lower_body_ids']) + keypoints_visible=results["keypoints_visible"], + upper_body_ids=results["upper_body_ids"], + lower_body_ids=results["lower_body_ids"], + ) bbox_center = [] bbox_scale = [] @@ -713,21 +709,19 @@ class RandomHalfBody(BaseTransform): for i, indices in enumerate(half_body_ids): if indices is None: - bbox_center.append(results['bbox_center'][i]) - bbox_scale.append(results['bbox_scale'][i]) - bbox_xyxy_wrt_input.append(results['bbox_xyxy_wrt_input'][i]) + bbox_center.append(results["bbox_center"][i]) + bbox_scale.append(results["bbox_scale"][i]) + bbox_xyxy_wrt_input.append(results["bbox_xyxy_wrt_input"][i]) else: - _center, _scale = self._get_half_body_bbox( - results['keypoints'][i], indices) + _center, _scale = self._get_half_body_bbox(results["keypoints"][i], indices) bbox_center.append(_center) bbox_scale.append(_scale) - exact_bbox = self._get_half_body_exact_bbox( - results['keypoints'][i], indices, results['bbox_xyxy_wrt_input'][i]) + exact_bbox = self._get_half_body_exact_bbox(results["keypoints"][i], indices, results["bbox_xyxy_wrt_input"][i]) bbox_xyxy_wrt_input.append(exact_bbox) - results['bbox_center'] = np.stack(bbox_center) - results['bbox_scale'] = np.stack(bbox_scale) - results['bbox_xyxy_wrt_input'] = np.stack(bbox_xyxy_wrt_input) + results["bbox_center"] = np.stack(bbox_center) + results["bbox_scale"] = np.stack(bbox_scale) + results["bbox_xyxy_wrt_input"] = np.stack(bbox_xyxy_wrt_input) return results def __repr__(self) -> str: @@ -737,12 +731,12 @@ class RandomHalfBody(BaseTransform): str: Formatted string. """ repr_str = self.__class__.__name__ - repr_str += f'(min_total_keypoints={self.min_total_keypoints}, ' - repr_str += f'min_upper_keypoints={self.min_upper_keypoints}, ' - repr_str += f'min_lower_keypoints={self.min_lower_keypoints}, ' - repr_str += f'padding={self.padding}, ' - repr_str += f'prob={self.prob}, ' - repr_str += f'upper_prioritized_prob={self.upper_prioritized_prob})' + repr_str += f"(min_total_keypoints={self.min_total_keypoints}, " + repr_str += f"min_upper_keypoints={self.min_upper_keypoints}, " + repr_str += f"min_lower_keypoints={self.min_lower_keypoints}, " + repr_str += f"padding={self.padding}, " + repr_str += f"prob={self.prob}, " + repr_str += f"upper_prioritized_prob={self.upper_prioritized_prob})" return repr_str @@ -769,10 +763,7 @@ class RandomPatchesBlackout(BaseTransform): prob (float): The probability to apply black patches. 
        Defaults to 0.8
    """

-    def __init__(self,
-                 grid_size: Tuple[int, int] = (8, 6),
-                 mask_ratio: float = 0.3,
-                 prob: float = 0.8) -> None:
+    def __init__(self, grid_size: Tuple[int, int] = (8, 6), mask_ratio: float = 0.3, prob: float = 0.8) -> None:
         super().__init__()
         self.grid_size = grid_size
         self.mask_ratio = mask_ratio
@@ -783,21 +774,16 @@
         black_patches = np.zeros((grid_h, grid_w), dtype=bool)

         if np.random.rand() < self.prob:
-
+
             # Split image into grid
             num_patches = int(self.grid_size[0] * self.grid_size[1])

             # Randomly choose patches to blackout
-            black_patches = np.random.choice(
-                [0, 1],
-                num_patches,
-                p=[1 - self.mask_ratio, self.mask_ratio]
-            )
+            black_patches = np.random.choice([0, 1], num_patches, p=[1 - self.mask_ratio, self.mask_ratio])
             black_patches = black_patches.reshape(grid_h, grid_w).astype(bool)

         return black_patches
-
     def transform(self, results: Dict) -> Optional[dict]:
        """The transform function of :class:`RandomPatchesBlackout`.
@@ -810,13 +796,13 @@
            dict: The result dict.
Defaults to 0.0 """ - def __init__(self, - mask_ratio_range: tuple[float, float] = (0.1, 0.3), - prob: float = 0.8, - texture_prob: float = 0.0, - context_size:float = 1.25) -> None: + def __init__( + self, mask_ratio_range: tuple[float, float] = (0.1, 0.3), prob: float = 0.8, texture_prob: float = 0.0, context_size: float = 1.25 + ) -> None: super().__init__() self.mask_ratio_range = mask_ratio_range self.prob = prob @@ -911,7 +889,7 @@ class RandomEdgesBlackout(BaseTransform): y0 = np.maximum(y0, 0).astype(int) x1 = np.minimum(x1, w).astype(int) y1 = np.minimum(y1, h).astype(int) - + # Set default values x = 0 y = 0 @@ -920,35 +898,31 @@ class RandomEdgesBlackout(BaseTransform): is_textured = False if np.random.rand() < self.prob: - + # Generate random rectangle to keep - rh, rw = np.random.uniform( - 1-self.mask_ratio_range[1], - 1-self.mask_ratio_range[0], - 2 - ) - dh = int((y1-y0) * rh) - dw = int((x1-x0) * rw) - x_end = x1-dw if x1-dw > x0 else x0+1 - y_end = y1-dh if y1-dh > y0 else y0+1 + rh, rw = np.random.uniform(1 - self.mask_ratio_range[1], 1 - self.mask_ratio_range[0], 2) + dh = int((y1 - y0) * rh) + dw = int((x1 - x0) * rw) + x_end = x1 - dw if x1 - dw > x0 else x0 + 1 + y_end = y1 - dh if y1 - dh > y0 else y0 + 1 try: x = np.random.randint(x0, x_end) y = np.random.randint(y0, y_end) except ValueError: - print(x, x0, dw, x1, x1-dw, x_end) - print(y, y0, dh, y1, y1-dh, y_end) + print(x, x0, dw, x1, x1 - dw, x_end) + print(y, y0, dh, y1, y1 - dh, y_end) raise ValueError # Set all pixel outside of the rectangle to black - mask[y:y+dh, x:x+dw] = True - + mask[y : y + dh, x : x + dw] = True + # Invert the mask. True means blackout mask = ~mask # Add texture is_textured = np.random.rand() < self.texture_prob - return mask, (x, y, dw+x, dh+y), is_textured + return mask, (x, y, dw + x, dh + y), is_textured def _get_random_color(self) -> np.ndarray: """Get random color. 
@@ -960,10 +934,7 @@ class RandomEdgesBlackout(BaseTransform): s = np.random.uniform(0.75, 1) l = np.random.uniform(0.3, 0.7) hls_color = np.array([h, l, s]) - rgb_color = cv2.cvtColor( - np.array([[hls_color]], dtype=np.float32), - cv2.COLOR_HLS2RGB - ).squeeze() * 255 + rgb_color = cv2.cvtColor(np.array([[hls_color]], dtype=np.float32), cv2.COLOR_HLS2RGB).squeeze() * 255 color = rgb_color.astype(np.uint8) return color.tolist() @@ -977,15 +948,17 @@ class RandomEdgesBlackout(BaseTransform): Returns: np.array: texture """ - mode = np.random.choice([ - 'lines', - 'squares', - 'circles', - # 'noise', - # 'uniform', - ]) - - if mode == 'lines': + mode = np.random.choice( + [ + "lines", + "squares", + "circles", + # 'noise', + # 'uniform', + ] + ) + + if mode == "lines": texture = np.zeros((h, w, 3), dtype=np.uint8) texture[:, :, :] = self._get_random_color() num_lines = np.random.randint(1, 20) @@ -995,7 +968,7 @@ class RandomEdgesBlackout(BaseTransform): line_width = np.random.randint(1, 10) color = self._get_random_color() cv2.line(texture, (x1, y1), (x2, y2), color, line_width) - elif mode == 'squares': + elif mode == "squares": texture = np.zeros((h, w, 3), dtype=np.uint8) texture[:, :, :] = self._get_random_color() num_squares = np.random.randint(1, 20) @@ -1004,7 +977,7 @@ class RandomEdgesBlackout(BaseTransform): x2, y2 = np.random.randint(0, w), np.random.randint(0, h) color = self._get_random_color() cv2.rectangle(texture, (x1, y1), (x2, y2), color, -1) - elif mode == 'circles': + elif mode == "circles": texture = np.zeros((h, w, 3), dtype=np.uint8) texture[:, :, :] = self._get_random_color() num_circles = np.random.randint(1, 20) @@ -1013,9 +986,9 @@ class RandomEdgesBlackout(BaseTransform): r = np.random.randint(1, min(w, h) // 2) color = self._get_random_color() cv2.circle(texture, (x, y), r, color, -1) - elif mode == 'noise': + elif mode == "noise": texture = np.random.randint(0, 256, (h, w, 3), dtype=np.uint8) - elif mode == 'uniform': + elif mode == "uniform": texture = np.zeros((h, w, 3), dtype=np.uint8) texture[:, :, :] = self._get_random_color() @@ -1033,15 +1006,15 @@ class RandomEdgesBlackout(BaseTransform): dict: The result dict. 
""" - img = results['img'] - + img = results["img"] + if "transformed_keypoints" in results: - kpts = results['transformed_keypoints'].squeeze() + kpts = results["transformed_keypoints"].squeeze() else: - kpts = results['keypoints'].squeeze() + kpts = results["keypoints"].squeeze() # Generate random mask - mask, (x1, y1, x2, y2), is_textured = self._get_random_mask(img.shape[1], img.shape[0], results['bbox_xyxy_wrt_input'].flatten()) + mask, (x1, y1, x2, y2), is_textured = self._get_random_mask(img.shape[1], img.shape[0], results["bbox_xyxy_wrt_input"].flatten()) # breakpoint() # print("img shape", img.shape) # print("results", results.keys()) @@ -1054,28 +1027,23 @@ class RandomEdgesBlackout(BaseTransform): else: # Set all pixel outside of the rectangle to black img[mask, :] = 0 - results['img'] = img + results["img"] = img # Set keypoints outside of the rectangle to invisible - in_rect = ( - (kpts[:, 0] >= x1) & - (kpts[:, 0] < x2) & - (kpts[:, 1] >= y1) & - (kpts[:, 1] < y2) - ) - results['keypoints_visibility'][:, ~in_rect] = 0 + in_rect = (kpts[:, 0] >= x1) & (kpts[:, 0] < x2) & (kpts[:, 1] >= y1) & (kpts[:, 1] < y2) + results["keypoints_visibility"][:, ~in_rect] = 0 # Create new entry describing keypoints in the 'cropped' area - results['keypoints_in_image'] = in_rect.squeeze().astype(int) + results["keypoints_in_image"] = in_rect.squeeze().astype(int) # Crop the bbox_xyxy_wrt_input according to the blackout area - if 'bbox_xyxy_wrt_input' in results: - bbox_xyxy = results['bbox_xyxy_wrt_input'].flatten() + if "bbox_xyxy_wrt_input" in results: + bbox_xyxy = results["bbox_xyxy_wrt_input"].flatten() bbox_xyxy[0] = np.maximum(bbox_xyxy[0], x1) bbox_xyxy[1] = np.maximum(bbox_xyxy[1], y1) bbox_xyxy[2] = np.minimum(bbox_xyxy[2], x2) bbox_xyxy[3] = np.minimum(bbox_xyxy[3], y2) - results['bbox_xyxy_wrt_input'] = bbox_xyxy.reshape(-1, 4) + results["bbox_xyxy_wrt_input"] = bbox_xyxy.reshape(-1, 4) return results @@ -1086,9 +1054,9 @@ class RandomEdgesBlackout(BaseTransform): str: Formatted string. 
""" repr_str = self.__class__.__name__ - repr_str += f'(mask_ratio_range={self.mask_ratio_range}, ' - repr_str += f'prob={self.prob}), ' - repr_str += f'texture_prob={self.texture_prob})' + repr_str += f"(mask_ratio_range={self.mask_ratio_range}, " + repr_str += f"prob={self.prob}), " + repr_str += f"texture_prob={self.texture_prob})" return repr_str @@ -1127,13 +1095,15 @@ class RandomBBoxTransform(BaseTransform): to 0.6 """ - def __init__(self, - shift_factor: float = 0.16, - shift_prob: float = 0.3, - scale_factor: Tuple[float, float] = (0.5, 1.5), - scale_prob: float = 1.0, - rotate_factor: float = 80.0, - rotate_prob: float = 0.6) -> None: + def __init__( + self, + shift_factor: float = 0.16, + shift_prob: float = 0.3, + scale_factor: Tuple[float, float] = (0.5, 1.5), + scale_prob: float = 1.0, + rotate_factor: float = 80.0, + rotate_prob: float = 0.6, + ) -> None: super().__init__() self.shift_factor = shift_factor @@ -1144,9 +1114,7 @@ class RandomBBoxTransform(BaseTransform): self.rotate_prob = rotate_prob @staticmethod - def _truncnorm(low: float = -1., - high: float = 1., - size: tuple = ()) -> np.ndarray: + def _truncnorm(low: float = -1.0, high: float = 1.0, size: tuple = ()) -> np.ndarray: """Sample from a truncated normal distribution.""" return truncnorm.rvs(low, high, size=size).astype(np.float32) @@ -1170,21 +1138,18 @@ class RandomBBoxTransform(BaseTransform): # Get shift parameters offset = offset_v * self.shift_factor - offset = np.where( - np.random.rand(num_bboxes, 1) < self.shift_prob, offset, 0.) + offset = np.where(np.random.rand(num_bboxes, 1) < self.shift_prob, offset, 0.0) # Get scaling parameters scale_min, scale_max = self.scale_factor mu = (scale_max + scale_min) * 0.5 sigma = (scale_max - scale_min) * 0.5 scale = scale_v * sigma + mu - scale = np.where( - np.random.rand(num_bboxes, 1) < self.scale_prob, scale, 1.) + scale = np.where(np.random.rand(num_bboxes, 1) < self.scale_prob, scale, 1.0) # Get rotation parameters rotate = rotate_v * self.rotate_factor - rotate = np.where( - np.random.rand(num_bboxes) < self.rotate_prob, rotate, 0.) + rotate = np.where(np.random.rand(num_bboxes) < self.rotate_prob, rotate, 0.0) return offset, scale, rotate @@ -1199,21 +1164,21 @@ class RandomBBoxTransform(BaseTransform): Returns: dict: The result dict. """ - bbox_scale = results['bbox_scale'] + bbox_scale = results["bbox_scale"] num_bboxes = bbox_scale.shape[0] offset, scale, rotate = self._get_transform_params(num_bboxes) - results['bbox_center'] = results['bbox_center'] + offset * bbox_scale - results['bbox_scale'] = results['bbox_scale'] * scale - results['bbox_rotation'] = rotate + results["bbox_center"] = results["bbox_center"] + offset * bbox_scale + results["bbox_scale"] = results["bbox_scale"] * scale + results["bbox_rotation"] = rotate - bbox_xyxy_wrt_input = results.get('bbox_xyxy_wrt_input', None) + bbox_xyxy_wrt_input = results.get("bbox_xyxy_wrt_input", None) if bbox_xyxy_wrt_input is not None: _c, _s = bbox_xyxy2cs(bbox_xyxy_wrt_input, padding=1.0) _c = _c + offset * _s _s = _s * scale - results['bbox_xyxy_wrt_input'] = bbox_cs2xyxy(_c, _s).flatten() + results["bbox_xyxy_wrt_input"] = bbox_cs2xyxy(_c, _s).flatten() return results @@ -1224,12 +1189,12 @@ class RandomBBoxTransform(BaseTransform): str: Formatted string. 
""" repr_str = self.__class__.__name__ - repr_str += f'(shift_prob={self.shift_prob}, ' - repr_str += f'shift_factor={self.shift_factor}, ' - repr_str += f'scale_prob={self.scale_prob}, ' - repr_str += f'scale_factor={self.scale_factor}, ' - repr_str += f'rotate_prob={self.rotate_prob}, ' - repr_str += f'rotate_factor={self.rotate_factor})' + repr_str += f"(shift_prob={self.shift_prob}, " + repr_str += f"shift_factor={self.shift_factor}, " + repr_str += f"scale_prob={self.scale_prob}, " + repr_str += f"scale_factor={self.scale_factor}, " + repr_str += f"rotate_prob={self.rotate_prob}, " + repr_str += f"rotate_factor={self.rotate_factor})" return repr_str @@ -1280,20 +1245,17 @@ class Albumentation(BaseTransform): Defaults to None, which will use {'img': 'image'}. """ - def __init__(self, - transforms: List[dict], - keymap: Optional[dict] = None) -> None: + def __init__(self, transforms: List[dict], keymap: Optional[dict] = None) -> None: if albumentations is None: - raise RuntimeError('albumentations is not installed') + raise RuntimeError("albumentations is not installed") self.transforms = transforms - self.aug = albumentations.Compose( - [self.albu_builder(t) for t in self.transforms]) + self.aug = albumentations.Compose([self.albu_builder(t) for t in self.transforms]) if not keymap: self.keymap_to_albu = { - 'img': 'image', + "img": "image", } else: self.keymap_to_albu = keymap @@ -1310,30 +1272,24 @@ class Albumentation(BaseTransform): albumentations.BasicTransform: The constructed transform object """ - assert isinstance(cfg, dict) and 'type' in cfg + assert isinstance(cfg, dict) and "type" in cfg args = cfg.copy() - obj_type = args.pop('type') + obj_type = args.pop("type") if mmengine.is_str(obj_type): if albumentations is None: - raise RuntimeError('albumentations is not installed') + raise RuntimeError("albumentations is not installed") rank, _ = get_dist_info() - if rank == 0 and not hasattr( - albumentations.augmentations.transforms, obj_type): - warnings.warn( - f'{obj_type} is not pixel-level transformations. ' - 'Please use with caution.') + if rank == 0 and not hasattr(albumentations.augmentations.transforms, obj_type): + warnings.warn(f"{obj_type} is not pixel-level transformations. " "Please use with caution.") obj_cls = getattr(albumentations, obj_type) elif isinstance(obj_type, type): obj_cls = obj_type else: - raise TypeError(f'type must be a str, but got {type(obj_type)}') + raise TypeError(f"type must be a str, but got {type(obj_type)}") - if 'transforms' in args: - args['transforms'] = [ - self.albu_builder(transform) - for transform in args['transforms'] - ] + if "transforms" in args: + args["transforms"] = [self.albu_builder(transform) for transform in args["transforms"]] return obj_cls(**args) @@ -1352,8 +1308,7 @@ class Albumentation(BaseTransform): # map result dict to albumentations format results_albu = {} for k, v in self.keymap_to_albu.items(): - assert k in results, \ - f'The `{k}` is required to perform albumentations transforms' + assert k in results, f"The `{k}` is required to perform albumentations transforms" results_albu[v] = results[k] # Apply albumentations transforms @@ -1371,7 +1326,7 @@ class Albumentation(BaseTransform): Returns: str: Formatted string. """ - repr_str = self.__class__.__name__ + f'(transforms={self.transforms})' + repr_str = self.__class__.__name__ + f"(transforms={self.transforms})" return repr_str @@ -1405,11 +1360,13 @@ class PhotometricDistortion(BaseTransform): hue_delta (int): delta of hue. 
""" - def __init__(self, - brightness_delta: int = 32, - contrast_range: Sequence[Number] = (0.5, 1.5), - saturation_range: Sequence[Number] = (0.5, 1.5), - hue_delta: int = 18) -> None: + def __init__( + self, + brightness_delta: int = 32, + contrast_range: Sequence[Number] = (0.5, 1.5), + saturation_range: Sequence[Number] = (0.5, 1.5), + hue_delta: int = 18, + ) -> None: self.brightness_delta = brightness_delta self.contrast_lower, self.contrast_upper = contrast_range self.saturation_lower, self.saturation_upper = saturation_range @@ -1437,29 +1394,32 @@ class PhotometricDistortion(BaseTransform): # the beta in `self._convert` to be added to image array # in brightness distortion - brightness_beta = np.random.uniform(-self.brightness_delta, - self.brightness_delta) + brightness_beta = np.random.uniform(-self.brightness_delta, self.brightness_delta) # the alpha in `self._convert` to be multiplied to image array # in contrast distortion - contrast_alpha = np.random.uniform(self.contrast_lower, - self.contrast_upper) + contrast_alpha = np.random.uniform(self.contrast_lower, self.contrast_upper) # the alpha in `self._convert` to be multiplied to image array # in saturation distortion to hsv-formatted img - saturation_alpha = np.random.uniform(self.saturation_lower, - self.saturation_upper) + saturation_alpha = np.random.uniform(self.saturation_lower, self.saturation_upper) # delta of hue to add to image array in hue distortion hue_delta = np.random.randint(-self.hue_delta, self.hue_delta) # the random permutation of channel order swap_channel_order = np.random.permutation(3) - return (contrast_mode, brightness_flag, contrast_flag, hsv_mode, - swap_flag, brightness_beta, contrast_alpha, saturation_alpha, - hue_delta, swap_channel_order) + return ( + contrast_mode, + brightness_flag, + contrast_flag, + hsv_mode, + swap_flag, + brightness_beta, + contrast_alpha, + saturation_alpha, + hue_delta, + swap_channel_order, + ) - def _convert(self, - img: np.ndarray, - alpha: float = 1, - beta: float = 0) -> np.ndarray: + def _convert(self, img: np.ndarray, alpha: float = 1, beta: float = 0) -> np.ndarray: """Multiple with alpha and add beta with clip. Args: @@ -1488,12 +1448,21 @@ class PhotometricDistortion(BaseTransform): dict: Result dict with images distorted. 
""" - assert 'img' in results, '`img` is not found in results' - img = results['img'] - - (contrast_mode, brightness_flag, contrast_flag, hsv_mode, swap_flag, - brightness_beta, contrast_alpha, saturation_alpha, hue_delta, - swap_channel_order) = self._random_flags() + assert "img" in results, "`img` is not found in results" + img = results["img"] + + ( + contrast_mode, + brightness_flag, + contrast_flag, + hsv_mode, + swap_flag, + brightness_beta, + contrast_alpha, + saturation_alpha, + hue_delta, + swap_channel_order, + ) = self._random_flags() # random brightness distortion if brightness_flag: @@ -1510,8 +1479,7 @@ class PhotometricDistortion(BaseTransform): img = mmcv.bgr2hsv(img) if hsv_mode == 1 or hsv_mode == 3: # apply saturation distortion to hsv-formatted img - img[:, :, 1] = self._convert( - img[:, :, 1], alpha=saturation_alpha) + img[:, :, 1] = self._convert(img[:, :, 1], alpha=saturation_alpha) if hsv_mode == 2 or hsv_mode == 3: # apply hue distortion to hsv-formatted img img[:, :, 0] = img[:, :, 0].astype(int) + hue_delta @@ -1525,7 +1493,7 @@ class PhotometricDistortion(BaseTransform): if swap_flag: img = img[..., swap_channel_order] - results['img'] = img + results["img"] = img return results def __repr__(self) -> str: @@ -1535,12 +1503,14 @@ class PhotometricDistortion(BaseTransform): str: Formatted string. """ repr_str = self.__class__.__name__ - repr_str += (f'(brightness_delta={self.brightness_delta}, ' - f'contrast_range=({self.contrast_lower}, ' - f'{self.contrast_upper}), ' - f'saturation_range=({self.saturation_lower}, ' - f'{self.saturation_upper}), ' - f'hue_delta={self.hue_delta})') + repr_str += ( + f"(brightness_delta={self.brightness_delta}, " + f"contrast_range=({self.contrast_lower}, " + f"{self.contrast_upper}), " + f"saturation_range=({self.saturation_lower}, " + f"{self.saturation_upper}), " + f"hue_delta={self.hue_delta})" + ) return repr_str @@ -1580,33 +1550,29 @@ class GenerateTarget(BaseTransform): effect. Defaults to ``None`` """ - def __init__(self, - encoder: MultiConfig, - target_type: Optional[str] = None, - multilevel: bool = False, - use_dataset_keypoint_weights: bool = False) -> None: + def __init__( + self, encoder: MultiConfig, target_type: Optional[str] = None, multilevel: bool = False, use_dataset_keypoint_weights: bool = False + ) -> None: super().__init__() if target_type is not None: rank, _ = get_dist_info() if rank == 0: warnings.warn( - 'The argument `target_type` is deprecated in' - ' GenerateTarget. The target type and encoded ' - 'keys will be determined by encoder(s).', - DeprecationWarning) + "The argument `target_type` is deprecated in" + " GenerateTarget. The target type and encoded " + "keys will be determined by encoder(s).", + DeprecationWarning, + ) self.encoder_cfg = deepcopy(encoder) self.multilevel = multilevel self.use_dataset_keypoint_weights = use_dataset_keypoint_weights if isinstance(self.encoder_cfg, list): - self.encoder = [ - KEYPOINT_CODECS.build(cfg) for cfg in self.encoder_cfg - ] + self.encoder = [KEYPOINT_CODECS.build(cfg) for cfg in self.encoder_cfg] else: - assert not self.multilevel, ( - 'Need multiple encoder configs if ``multilevel==True``') + assert not self.multilevel, "Need multiple encoder configs if ``multilevel==True``" self.encoder = KEYPOINT_CODECS.build(self.encoder_cfg) def transform(self, results: Dict) -> Optional[dict]: @@ -1615,53 +1581,45 @@ class GenerateTarget(BaseTransform): See ``transform()`` method of :class:`BaseTransform` for details. 
""" - if results.get('transformed_keypoints', None) is not None: + if results.get("transformed_keypoints", None) is not None: # use keypoints transformed by TopdownAffine - keypoints = results['transformed_keypoints'] - elif results.get('keypoints', None) is not None: + keypoints = results["transformed_keypoints"] + elif results.get("keypoints", None) is not None: # use original keypoints - keypoints = results['keypoints'] + keypoints = results["keypoints"] else: - raise ValueError( - 'GenerateTarget requires \'transformed_keypoints\' or' - ' \'keypoints\' in the results.') + raise ValueError("GenerateTarget requires 'transformed_keypoints' or" " 'keypoints' in the results.") - keypoints_visible = results['keypoints_visible'] + keypoints_visible = results["keypoints_visible"] if keypoints_visible.ndim == 3 and keypoints_visible.shape[2] == 2: - keypoints_visible, keypoints_visible_weights = \ - keypoints_visible[..., 0], keypoints_visible[..., 1] - results['keypoints_visible'] = keypoints_visible - results['keypoints_visible_weights'] = keypoints_visible_weights - - id_similarity = results.get('id_similarity', np.array([0])) - keypoints_visibility = results.get("keypoints_visibility", None) - + keypoints_visible, keypoints_visible_weights = keypoints_visible[..., 0], keypoints_visible[..., 1] + results["keypoints_visible"] = keypoints_visible + results["keypoints_visible_weights"] = keypoints_visible_weights + + id_similarity = results.get("id_similarity", np.array([0])) + keypoints_visibility = results.get("keypoints_visibility", None) + # Encoded items from the encoder(s) will be updated into the results. # Please refer to the document of the specific codec for details about # encoded items. if not isinstance(self.encoder, list): # For single encoding, the encoded items will be directly added # into results. 
- auxiliary_encode_kwargs = { - key: results[key] - for key in self.encoder.auxiliary_encode_keys - } + auxiliary_encode_kwargs = {key: results[key] for key in self.encoder.auxiliary_encode_keys} encoded = self.encoder.encode( keypoints=keypoints, keypoints_visible=keypoints_visible, keypoints_visibility=keypoints_visibility, id_similarity=id_similarity, - **auxiliary_encode_kwargs) + **auxiliary_encode_kwargs, + ) if self.encoder.field_mapping_table: - encoded[ - 'field_mapping_table'] = self.encoder.field_mapping_table + encoded["field_mapping_table"] = self.encoder.field_mapping_table if self.encoder.instance_mapping_table: - encoded['instance_mapping_table'] = \ - self.encoder.instance_mapping_table + encoded["instance_mapping_table"] = self.encoder.instance_mapping_table if self.encoder.label_mapping_table: - encoded[ - 'label_mapping_table'] = self.encoder.label_mapping_table + encoded["label_mapping_table"] = self.encoder.label_mapping_table else: encoded_list = [] @@ -1669,17 +1627,16 @@ class GenerateTarget(BaseTransform): _instance_mapping_table = dict() _label_mapping_table = dict() for _encoder in self.encoder: - auxiliary_encode_kwargs = { - key: results[key] - for key in _encoder.auxiliary_encode_keys - } + auxiliary_encode_kwargs = {key: results[key] for key in _encoder.auxiliary_encode_keys} encoded_list.append( _encoder.encode( keypoints=keypoints, keypoints_visible=keypoints_visible, keypoints_visibility=keypoints_visibility, id_similarity=id_similarity, - **auxiliary_encode_kwargs)) + **auxiliary_encode_kwargs, + ) + ) _field_mapping_table.update(_encoder.field_mapping_table) _instance_mapping_table.update(_encoder.instance_mapping_table) @@ -1690,16 +1647,10 @@ class GenerateTarget(BaseTransform): # should have the same keys. keys = encoded_list[0].keys() - if not all(_encoded.keys() == keys - for _encoded in encoded_list): - raise ValueError( - 'Encoded items from all encoders must have the same ' - 'keys if ``multilevel==True``.') + if not all(_encoded.keys() == keys for _encoded in encoded_list): + raise ValueError("Encoded items from all encoders must have the same " "keys if ``multilevel==True``.") - encoded = { - k: [_encoded[k] for _encoded in encoded_list] - for k in keys - } + encoded = {k: [_encoded[k] for _encoded in encoded_list] for k in keys} else: # For combined encoding, the encoded items from different @@ -1712,33 +1663,31 @@ class GenerateTarget(BaseTransform): for _encoded in encoded_list: for key, value in _encoded.items(): - if key == 'keypoint_weights': + if key == "keypoint_weights": keypoint_weights.append(value) elif key not in encoded: encoded[key] = value else: raise ValueError( - f'Overlapping item "{key}" from multiple ' - 'encoders, which is not supported when ' - '``multilevel==False``') + f'Overlapping item "{key}" from multiple ' "encoders, which is not supported when " "``multilevel==False``" + ) if keypoint_weights: - encoded['keypoint_weights'] = keypoint_weights + encoded["keypoint_weights"] = keypoint_weights if _field_mapping_table: - encoded['field_mapping_table'] = _field_mapping_table + encoded["field_mapping_table"] = _field_mapping_table if _instance_mapping_table: - encoded['instance_mapping_table'] = _instance_mapping_table + encoded["instance_mapping_table"] = _instance_mapping_table if _label_mapping_table: - encoded['label_mapping_table'] = _label_mapping_table + encoded["label_mapping_table"] = _label_mapping_table - if self.use_dataset_keypoint_weights and 'keypoint_weights' in encoded: - if 
isinstance(encoded['keypoint_weights'], list): - for w in encoded['keypoint_weights']: - w = w * results['dataset_keypoint_weights'] + if self.use_dataset_keypoint_weights and "keypoint_weights" in encoded: + if isinstance(encoded["keypoint_weights"], list): + for w in encoded["keypoint_weights"]: + w = w * results["dataset_keypoint_weights"] else: - encoded['keypoint_weights'] = encoded[ - 'keypoint_weights'] * results['dataset_keypoint_weights'] + encoded["keypoint_weights"] = encoded["keypoint_weights"] * results["dataset_keypoint_weights"] results.update(encoded) @@ -1751,9 +1700,8 @@ class GenerateTarget(BaseTransform): str: Formatted string. """ repr_str = self.__class__.__name__ - repr_str += (f'(encoder={str(self.encoder_cfg)}, ') - repr_str += ('use_dataset_keypoint_weights=' - f'{self.use_dataset_keypoint_weights})') + repr_str += f"(encoder={str(self.encoder_cfg)}, " + repr_str += "use_dataset_keypoint_weights=" f"{self.use_dataset_keypoint_weights})" return repr_str @@ -1777,19 +1725,14 @@ class YOLOXHSVRandomAug(BaseTransform): value_delta (int): delat of value. Defaults to 30. """ - def __init__(self, - hue_delta: int = 5, - saturation_delta: int = 30, - value_delta: int = 30) -> None: + def __init__(self, hue_delta: int = 5, saturation_delta: int = 30, value_delta: int = 30) -> None: self.hue_delta = hue_delta self.saturation_delta = saturation_delta self.value_delta = value_delta @cache_randomness def _get_hsv_gains(self): - hsv_gains = np.random.uniform(-1, 1, 3) * [ - self.hue_delta, self.saturation_delta, self.value_delta - ] + hsv_gains = np.random.uniform(-1, 1, 3) * [self.hue_delta, self.saturation_delta, self.value_delta] # random selection of h, s, v hsv_gains *= np.random.randint(0, 2, 3) # prevent overflow @@ -1797,7 +1740,7 @@ class YOLOXHSVRandomAug(BaseTransform): return hsv_gains def transform(self, results: dict) -> dict: - img = results['img'] + img = results["img"] hsv_gains = self._get_hsv_gains() img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.int16) @@ -1806,14 +1749,14 @@ class YOLOXHSVRandomAug(BaseTransform): img_hsv[..., 2] = np.clip(img_hsv[..., 2] + hsv_gains[2], 0, 255) cv2.cvtColor(img_hsv.astype(img.dtype), cv2.COLOR_HSV2BGR, dst=img) - results['img'] = img + results["img"] = img return results def __repr__(self): repr_str = self.__class__.__name__ - repr_str += f'(hue_delta={self.hue_delta}, ' - repr_str += f'saturation_delta={self.saturation_delta}, ' - repr_str += f'value_delta={self.value_delta})' + repr_str += f"(hue_delta={self.hue_delta}, " + repr_str += f"saturation_delta={self.saturation_delta}, " + repr_str += f"value_delta={self.value_delta})" return repr_str @@ -1858,14 +1801,16 @@ class FilterAnnotations(BaseTransform): becomes an empty bbox after filtering. Defaults to True. """ - def __init__(self, - min_gt_bbox_wh: Tuple[int, int] = (1, 1), - min_gt_area: int = 1, - min_kpt_vis: int = 1, - by_box: bool = False, - by_area: bool = False, - by_kpt: bool = True, - keep_empty: bool = True) -> None: + def __init__( + self, + min_gt_bbox_wh: Tuple[int, int] = (1, 1), + min_gt_area: int = 1, + min_kpt_vis: int = 1, + by_box: bool = False, + by_area: bool = False, + by_kpt: bool = True, + keep_empty: bool = True, + ) -> None: assert by_box or by_kpt or by_area self.min_gt_bbox_wh = min_gt_bbox_wh @@ -1885,22 +1830,20 @@ class FilterAnnotations(BaseTransform): Returns: dict: Updated result dict. 
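One thing worth flagging from the `use_dataset_keypoint_weights` hunk earlier in `GenerateTarget` (present on both sides of the diff): in the list branch, `w = w * results["dataset_keypoint_weights"]` rebinds the loop variable rather than updating the list, so multi-level weights appear to pass through unscaled, while the single-tensor branch does reassign. If scaling is the intent, an in-place sketch could look like this (hypothetical fix, not the repository's code):

```python
import numpy as np

def apply_dataset_weights(encoded: dict, dataset_weights: np.ndarray) -> dict:
    """Scale encoded keypoint weights by per-dataset weights, writing them back."""
    w = encoded.get("keypoint_weights")
    if isinstance(w, list):  # multi-level targets: rebuild the list
        encoded["keypoint_weights"] = [lvl * dataset_weights for lvl in w]
    elif w is not None:
        encoded["keypoint_weights"] = w * dataset_weights
    return encoded
```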
""" - assert 'keypoints' in results - kpts = results['keypoints'] + assert "keypoints" in results + kpts = results["keypoints"] if kpts.shape[0] == 0: return results tests = [] - if self.by_box and 'bbox' in results: - bbox = results['bbox'] - tests.append( - ((bbox[..., 2] - bbox[..., 0] > self.min_gt_bbox_wh[0]) & - (bbox[..., 3] - bbox[..., 1] > self.min_gt_bbox_wh[1]))) - if self.by_area and 'area' in results: - area = results['area'] + if self.by_box and "bbox" in results: + bbox = results["bbox"] + tests.append(((bbox[..., 2] - bbox[..., 0] > self.min_gt_bbox_wh[0]) & (bbox[..., 3] - bbox[..., 1] > self.min_gt_bbox_wh[1]))) + if self.by_area and "area" in results: + area = results["area"] tests.append(area >= self.min_gt_area) if self.by_kpt: - kpts_vis = results['keypoints_visible'] + kpts_vis = results["keypoints_visible"] if kpts_vis.ndim == 3: kpts_vis = kpts_vis[..., 0] tests.append(kpts_vis.sum(axis=1) >= self.min_kpt_vis) @@ -1913,8 +1856,7 @@ class FilterAnnotations(BaseTransform): if self.keep_empty: return None - keys = ('bbox', 'bbox_score', 'category_id', 'keypoints', - 'keypoints_visible', 'area') + keys = ("bbox", "bbox_score", "category_id", "keypoints", "keypoints_visible", "area") for key in keys: if key in results: results[key] = results[key][keep] @@ -1922,14 +1864,16 @@ class FilterAnnotations(BaseTransform): return results def __repr__(self): - return (f'{self.__class__.__name__}(' - f'min_gt_bbox_wh={self.min_gt_bbox_wh}, ' - f'min_gt_area={self.min_gt_area}, ' - f'min_kpt_vis={self.min_kpt_vis}, ' - f'by_box={self.by_box}, ' - f'by_area={self.by_area}, ' - f'by_kpt={self.by_kpt}, ' - f'keep_empty={self.keep_empty})') + return ( + f"{self.__class__.__name__}(" + f"min_gt_bbox_wh={self.min_gt_bbox_wh}, " + f"min_gt_area={self.min_gt_area}, " + f"min_kpt_vis={self.min_kpt_vis}, " + f"by_box={self.by_box}, " + f"by_area={self.by_area}, " + f"by_kpt={self.by_kpt}, " + f"keep_empty={self.keep_empty})" + ) def compute_paddings(bbox, bbox_s, kpts): @@ -1940,16 +1884,26 @@ def compute_paddings(bbox, bbox_s, kpts): kpts = kpts.reshape(-1, 2) else: kpts = kpts.reshape(-1, 3) - + x0, y0, x1, y1 = bbox - x_bbox_distances = np.max(np.stack([ - np.clip(x0 - kpts[:, 0], a_min=0, a_max=None), - np.clip(kpts[:, 0] - x1, a_min=0, a_max=None), - ]), axis=0) - y_bbox_distances = np.max(np.stack([ - np.clip(y0 - kpts[:, 1], a_min=0, a_max=None), - np.clip(kpts[:, 1] - y1, a_min=0, a_max=None), - ]), axis=0) + x_bbox_distances = np.max( + np.stack( + [ + np.clip(x0 - kpts[:, 0], a_min=0, a_max=None), + np.clip(kpts[:, 0] - x1, a_min=0, a_max=None), + ] + ), + axis=0, + ) + y_bbox_distances = np.max( + np.stack( + [ + np.clip(y0 - kpts[:, 1], a_min=0, a_max=None), + np.clip(kpts[:, 1] - y1, a_min=0, a_max=None), + ] + ), + axis=0, + ) padding_x = 2 * x_bbox_distances / bbox_s[0] padding_y = 2 * y_bbox_distances / bbox_s[1] @@ -1957,4 +1911,4 @@ def compute_paddings(bbox, bbox_s, kpts): padding = np.maximum(x_bbox_distances, y_bbox_distances) - return padding.flatten() \ No newline at end of file + return padding.flatten() diff --git a/mmpose/datasets/transforms/converting.py b/mmpose/datasets/transforms/converting.py index b7e214733fa4ac842afe2c6efe8f5917d961889a..637a530ef1942d89c9cd23ea50187473bf4f7e1b 100644 --- a/mmpose/datasets/transforms/converting.py +++ b/mmpose/datasets/transforms/converting.py @@ -57,9 +57,7 @@ class KeypointConverter(BaseTransform): >>> results = self(results) """ - def __init__(self, num_keypoints: int, - mapping: Union[List[Tuple[int, int]], 
List[Tuple[Tuple, - int]]]): + def __init__(self, num_keypoints: int, mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, int]]]): self.num_keypoints = num_keypoints self.mapping = mapping if len(mapping): @@ -71,8 +69,7 @@ class KeypointConverter(BaseTransform): interpolation = False for x in source_index: if isinstance(x, (list, tuple)): - assert len(x) == 2, 'source_index should be a list/tuple of ' \ - 'length 2' + assert len(x) == 2, "source_index should be a list/tuple of " "length 2" src1.append(x[0]) src2.append(x[1]) interpolation = True @@ -91,23 +88,21 @@ class KeypointConverter(BaseTransform): def transform(self, results: dict) -> dict: """Transforms the keypoint results to match the target keypoints.""" - num_instances = results['keypoints'].shape[0] + num_instances = results["keypoints"].shape[0] - if 'keypoints_visible' not in results: - results['keypoints_visible'] = np.ones( - (num_instances, results['keypoints'].shape[1])) + if "keypoints_visible" not in results: + results["keypoints_visible"] = np.ones((num_instances, results["keypoints"].shape[1])) - if len(results['keypoints_visible'].shape) > 2: - results['keypoints_visible'] = results['keypoints_visible'][:, :, - 0] + if len(results["keypoints_visible"].shape) > 2: + results["keypoints_visible"] = results["keypoints_visible"][:, :, 0] # Initialize output arrays keypoints = np.zeros((num_instances, self.num_keypoints, 3)) keypoints_visible = np.zeros((num_instances, self.num_keypoints)) - key = 'keypoints_3d' if 'keypoints_3d' in results else 'keypoints' + key = "keypoints_3d" if "keypoints_3d" in results else "keypoints" c = results[key].shape[-1] - flip_indices = results.get('flip_indices', None) + flip_indices = results.get("flip_indices", None) # Create a mask to weight visibility loss keypoints_visible_weights = keypoints_visible.copy() @@ -115,37 +110,29 @@ class KeypointConverter(BaseTransform): # Interpolate keypoints if pairs of source indexes provided if self.interpolation: - keypoints[:, self.target_index, :c] = 0.5 * ( - results[key][:, self.source_index] + - results[key][:, self.source_index2]) - keypoints_visible[:, self.target_index] = results[ - 'keypoints_visible'][:, self.source_index] * results[ - 'keypoints_visible'][:, self.source_index2] + keypoints[:, self.target_index, :c] = 0.5 * (results[key][:, self.source_index] + results[key][:, self.source_index2]) + keypoints_visible[:, self.target_index] = ( + results["keypoints_visible"][:, self.source_index] * results["keypoints_visible"][:, self.source_index2] + ) # Flip keypoints if flip_indices provided if flip_indices is not None: - for i, (x1, x2) in enumerate( - zip(self.source_index, self.source_index2)): + for i, (x1, x2) in enumerate(zip(self.source_index, self.source_index2)): idx = flip_indices[x1] if x1 == x2 else i flip_indices[i] = idx if idx < self.num_keypoints else i - flip_indices = flip_indices[:len(self.source_index)] + flip_indices = flip_indices[: len(self.source_index)] # Otherwise just copy from the source index else: - keypoints[:, - self.target_index, :c] = results[key][:, - self.source_index] - keypoints_visible[:, self.target_index] = results[ - 'keypoints_visible'][:, self.source_index] + keypoints[:, self.target_index, :c] = results[key][:, self.source_index] + keypoints_visible[:, self.target_index] = results["keypoints_visible"][:, self.source_index] # Update the results dict - results['keypoints'] = keypoints[..., :2] - results['keypoints_visible'] = np.stack( - [keypoints_visible, keypoints_visible_weights], 
axis=2) - if 'keypoints_3d' in results: - results['keypoints_3d'] = keypoints - results['lifting_target'] = keypoints[results['target_idx']] - results['lifting_target_visible'] = keypoints_visible[ - results['target_idx']] - results['flip_indices'] = flip_indices + results["keypoints"] = keypoints[..., :2] + results["keypoints_visible"] = np.stack([keypoints_visible, keypoints_visible_weights], axis=2) + if "keypoints_3d" in results: + results["keypoints_3d"] = keypoints + results["lifting_target"] = keypoints[results["target_idx"]] + results["lifting_target_visible"] = keypoints_visible[results["target_idx"]] + results["flip_indices"] = flip_indices return results @@ -156,8 +143,7 @@ class KeypointConverter(BaseTransform): str: Formatted string. """ repr_str = self.__class__.__name__ - repr_str += f'(num_keypoints={self.num_keypoints}, '\ - f'mapping={self.mapping})' + repr_str += f"(num_keypoints={self.num_keypoints}, " f"mapping={self.mapping})" return repr_str @@ -201,22 +187,20 @@ class SingleHandConverter(BaseTransform): >>> results = self(results) """ - def __init__(self, num_keypoints: int, - left_hand_mapping: Union[List[Tuple[int, int]], - List[Tuple[Tuple, int]]], - right_hand_mapping: Union[List[Tuple[int, int]], - List[Tuple[Tuple, int]]]): + def __init__( + self, + num_keypoints: int, + left_hand_mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, int]]], + right_hand_mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, int]]], + ): self.num_keypoints = num_keypoints - self.left_hand_converter = KeypointConverter(num_keypoints, - left_hand_mapping) - self.right_hand_converter = KeypointConverter(num_keypoints, - right_hand_mapping) + self.left_hand_converter = KeypointConverter(num_keypoints, left_hand_mapping) + self.right_hand_converter = KeypointConverter(num_keypoints, right_hand_mapping) def transform(self, results: dict) -> dict: """Transforms the keypoint results to match the target keypoints.""" - assert 'hand_type' in results, ( - 'hand_type should be provided in results') - hand_type = results['hand_type'] + assert "hand_type" in results, "hand_type should be provided in results" + hand_type = results["hand_type"] if np.sum(hand_type - [[0, 1]]) <= 1e-6: # left hand @@ -224,7 +208,7 @@ class SingleHandConverter(BaseTransform): elif np.sum(hand_type - [[1, 0]]) <= 1e-6: results = self.right_hand_converter(results) else: - raise ValueError('hand_type should be left or right') + raise ValueError("hand_type should be left or right") return results @@ -235,7 +219,9 @@ class SingleHandConverter(BaseTransform): str: Formatted string. 
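`KeypointConverter` above remaps source keypoints to a target layout and, when a mapping entry pairs two source indices, averages them. A toy numpy rendering of both cases (the indices are illustrative, not a real skeleton):

```python
import numpy as np

src = np.array([[[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]])  # (1, 3, 2)
mapping = [(0, 0), ((1, 2), 1)]  # copy kpt 0; target 1 = midpoint of kpts 1, 2

tgt = np.zeros((1, 2, 2))
for source, target in mapping:
    if isinstance(source, tuple):  # interpolate a pair of source indices
        tgt[:, target] = 0.5 * (src[:, source[0]] + src[:, source[1]])
    else:
        tgt[:, target] = src[:, source]
print(tgt[0, 1])  # [1. 1.]
```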
""" repr_str = self.__class__.__name__ - repr_str += f'(num_keypoints={self.num_keypoints}, '\ - f'left_hand_converter={self.left_hand_converter}, '\ - f'right_hand_converter={self.right_hand_converter})' + repr_str += ( + f"(num_keypoints={self.num_keypoints}, " + f"left_hand_converter={self.left_hand_converter}, " + f"right_hand_converter={self.right_hand_converter})" + ) return repr_str diff --git a/mmpose/datasets/transforms/formatting.py b/mmpose/datasets/transforms/formatting.py index 833a3bad5b3a3ee6024e1d21bc852ba45185c6f4..9ec40b2b391a9d62a75c0083d8d89dbc05a9e339 100644 --- a/mmpose/datasets/transforms/formatting.py +++ b/mmpose/datasets/transforms/formatting.py @@ -11,8 +11,7 @@ from mmpose.registry import TRANSFORMS from mmpose.structures import MultilevelPixelData, PoseDataSample -def image_to_tensor(img: Union[np.ndarray, - Sequence[np.ndarray]]) -> torch.torch.Tensor: +def image_to_tensor(img: Union[np.ndarray, Sequence[np.ndarray]]) -> torch.torch.Tensor: """Translate image or sequence of images to tensor. Multiple image tensors will be stacked. @@ -37,8 +36,7 @@ def image_to_tensor(img: Union[np.ndarray, return tensor -def keypoints_to_tensor(keypoints: Union[np.ndarray, Sequence[np.ndarray]] - ) -> torch.torch.Tensor: +def keypoints_to_tensor(keypoints: Union[np.ndarray, Sequence[np.ndarray]]) -> torch.torch.Tensor: """Translate keypoints or sequence of keypoints to tensor. Multiple keypoints tensors will be stacked. @@ -54,8 +52,7 @@ def keypoints_to_tensor(keypoints: Union[np.ndarray, Sequence[np.ndarray]] tensor = torch.from_numpy(keypoints).contiguous() else: assert is_seq_of(keypoints, np.ndarray) - tensor = torch.stack( - [keypoints_to_tensor(_keypoints) for _keypoints in keypoints]) + tensor = torch.stack([keypoints_to_tensor(_keypoints) for _keypoints in keypoints]) return tensor @@ -105,63 +102,77 @@ class PackPoseInputs(BaseTransform): # items in `instance_mapping_table` will be directly packed into # PoseDataSample.gt_instances without converting to Tensor instance_mapping_table = dict( - bbox='bboxes', - bbox_score='bbox_scores', - keypoints='keypoints', - keypoints_cam='keypoints_cam', - keypoints_visible='keypoints_visible', - keypoints_visibility='keypoints_visibility', + bbox="bboxes", + bbox_score="bbox_scores", + keypoints="keypoints", + keypoints_cam="keypoints_cam", + keypoints_visible="keypoints_visible", + keypoints_visibility="keypoints_visibility", # In CocoMetric, the area of predicted instances will be calculated # using gt_instances.bbox_scales. To unsure correspondence with # previous version, this key is preserved here. - bbox_scale='bbox_scales', + bbox_scale="bbox_scales", # `head_size` is used for computing MpiiPCKAccuracy metric, # namely, PCKh - head_size='head_size', + head_size="head_size", # `in_image` is used for training in/out probability prediction # and as a mask for some losses - in_image='in_image', + in_image="in_image", # `annotated` is used as weight for some losses. Different from # both `keypoints_visible` and `keypoint_weights`, `annotated` is # a binary mask indicating whether the keypoint is annotated. 
# annotated='annotated', - keypoints_scaled='keypoints_scaled', - heatmap_keypoints='heatmap_keypoints', - keypoints_in_image='keypoints_in_image', - bbox_mask='bbox_mask', - identification_similarity='identification_similarity', - identified='identified', - out_heatmaps='out_heatmaps', - out_kpt_weights='out_kpt_weights', - bbox_xyxy_wrt_input='bbox_xyxy_wrt_input', + keypoints_scaled="keypoints_scaled", + heatmap_keypoints="heatmap_keypoints", + keypoints_in_image="keypoints_in_image", + bbox_mask="bbox_mask", + identification_similarity="identification_similarity", + identified="identified", + out_heatmaps="out_heatmaps", + out_kpt_weights="out_kpt_weights", + bbox_xyxy_wrt_input="bbox_xyxy_wrt_input", ) # items in `field_mapping_table` will be packed into # PoseDataSample.gt_fields and converted to Tensor. These items will be # used for computing losses field_mapping_table = dict( - heatmaps='heatmaps', - instance_heatmaps='instance_heatmaps', - heatmap_mask='heatmap_mask', - heatmap_weights='heatmap_weights', - displacements='displacements', - displacement_weights='displacement_weights') + heatmaps="heatmaps", + instance_heatmaps="instance_heatmaps", + heatmap_mask="heatmap_mask", + heatmap_weights="heatmap_weights", + displacements="displacements", + displacement_weights="displacement_weights", + ) # items in `label_mapping_table` will be packed into # PoseDataSample.gt_instance_labels and converted to Tensor. These items # will be used for computing losses label_mapping_table = dict( - keypoint_labels='keypoint_labels', - keypoint_weights='keypoint_weights', - keypoints_visible_weights='keypoints_visible_weights') - - def __init__(self, - meta_keys=('id', 'img_id', 'img_path', 'category_id', - 'crowd_index', 'ori_shape', 'img_shape', - 'input_size', 'input_center', 'input_scale', - 'flip', 'flip_direction', 'flip_indices', - 'raw_ann_info', 'dataset_name'), - pack_transformed=False): + keypoint_labels="keypoint_labels", keypoint_weights="keypoint_weights", keypoints_visible_weights="keypoints_visible_weights" + ) + + def __init__( + self, + meta_keys=( + "id", + "img_id", + "img_path", + "category_id", + "crowd_index", + "ori_shape", + "img_shape", + "input_size", + "input_center", + "input_scale", + "flip", + "flip_direction", + "flip_indices", + "raw_ann_info", + "dataset_name", + ), + pack_transformed=False, + ): self.meta_keys = meta_keys self.pack_transformed = pack_transformed @@ -180,48 +191,43 @@ class PackPoseInputs(BaseTransform): """ # print("\n\nPacking results") # Pack image(s) for 2d pose estimation - if 'img' in results: - img = results['img'] + if "img" in results: + img = results["img"] inputs_tensor = image_to_tensor(img) # Pack keypoints for 3d pose-lifting - elif 'lifting_target' in results and 'keypoints' in results: - if 'keypoint_labels' in results: - keypoints = results['keypoint_labels'] + elif "lifting_target" in results and "keypoints" in results: + if "keypoint_labels" in results: + keypoints = results["keypoint_labels"] else: - keypoints = results['keypoints'] + keypoints = results["keypoints"] inputs_tensor = keypoints_to_tensor(keypoints) if "in_image" in results: - if 'keypoints_in_image' not in results: + if "keypoints_in_image" not in results: # If `keypoints_in_image` is not provided, use `keypoints_visible` as # default value. 
('keypoints_visible' = annotated) - results['keypoints_in_image'] = results['in_image'] - results['keypoints_in_image'] = ( - results['keypoints_in_image'] & - results['in_image']) + results["keypoints_in_image"] = results["in_image"] + results["keypoints_in_image"] = results["keypoints_in_image"] & results["in_image"] data_sample = PoseDataSample() # pack instance data gt_instances = InstanceData() - _instance_mapping_table = results.get('instance_mapping_table', - self.instance_mapping_table) + _instance_mapping_table = results.get("instance_mapping_table", self.instance_mapping_table) for key, packed_key in _instance_mapping_table.items(): if key in results: gt_instances.set_field(results[key], packed_key) # pack `transformed_keypoints` for visualizing data transform # and augmentation results - if self.pack_transformed and 'transformed_keypoints' in results: - gt_instances.set_field(results['transformed_keypoints'], - 'transformed_keypoints') + if self.pack_transformed and "transformed_keypoints" in results: + gt_instances.set_field(results["transformed_keypoints"], "transformed_keypoints") data_sample.gt_instances = gt_instances # pack instance labels gt_instance_labels = InstanceData() - _label_mapping_table = results.get('label_mapping_table', - self.label_mapping_table) + _label_mapping_table = results.get("label_mapping_table", self.label_mapping_table) for key, packed_key in _label_mapping_table.items(): if key in results: if isinstance(results[key], list): @@ -238,24 +244,19 @@ class PackPoseInputs(BaseTransform): # pack fields gt_fields = None - _field_mapping_table = results.get('field_mapping_table', - self.field_mapping_table) + _field_mapping_table = results.get("field_mapping_table", self.field_mapping_table) for key, packed_key in _field_mapping_table.items(): if key in results: if isinstance(results[key], list): if gt_fields is None: gt_fields = MultilevelPixelData() else: - assert isinstance( - gt_fields, MultilevelPixelData - ), 'Got mixed single-level and multi-level pixel data.' + assert isinstance(gt_fields, MultilevelPixelData), "Got mixed single-level and multi-level pixel data." else: if gt_fields is None: gt_fields = PixelData() else: - assert isinstance( - gt_fields, PixelData - ), 'Got mixed single-level and multi-level pixel data.' + assert isinstance(gt_fields, PixelData), "Got mixed single-level and multi-level pixel data." gt_fields.set_field(results[key], packed_key) @@ -266,8 +267,8 @@ class PackPoseInputs(BaseTransform): data_sample.set_metainfo(img_meta) packed_results = dict() - packed_results['inputs'] = inputs_tensor - packed_results['data_samples'] = data_sample + packed_results["inputs"] = inputs_tensor + packed_results["data_samples"] = data_sample # print("\n\nPacked results done") @@ -285,6 +286,6 @@ class PackPoseInputs(BaseTransform): str: Formatted string. 
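The packing step above is table-driven: any key present in `results` is copied onto the data sample under its packed name, and per-sample overrides of the class-level tables are honored via `results.get(...)`. Stripped of the mmengine structures, the mechanism reduces to this sketch:

```python
def pack_by_table(results: dict, mapping_table: dict) -> dict:
    """Copy results[key] -> packed[packed_key] for every key that exists."""
    return {
        packed_key: results[key]
        for key, packed_key in mapping_table.items()
        if key in results
    }

packed = pack_by_table(
    {"bbox": [[0, 0, 10, 10]], "keypoints": [[[1, 2]]]},
    {"bbox": "bboxes", "keypoints": "keypoints", "area": "areas"},
)
print(sorted(packed))  # ['bboxes', 'keypoints']
```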
""" repr_str = self.__class__.__name__ - repr_str += f'(meta_keys={self.meta_keys}, ' - repr_str += f'pack_transformed={self.pack_transformed})' + repr_str += f"(meta_keys={self.meta_keys}, " + repr_str += f"pack_transformed={self.pack_transformed})" return repr_str diff --git a/mmpose/datasets/transforms/hand_transforms.py b/mmpose/datasets/transforms/hand_transforms.py index cd43f860e57c7f72ea292aeff9da70085741674e..804e022ad0a1f76caecfe58ef347189434960d4f 100644 --- a/mmpose/datasets/transforms/hand_transforms.py +++ b/mmpose/datasets/transforms/hand_transforms.py @@ -3,6 +3,7 @@ from typing import List, Union from mmpose.codecs import * # noqa: F401, F403 from mmpose.registry import TRANSFORMS + from .common_transforms import RandomFlip @@ -39,7 +40,7 @@ class HandRandomFlip(RandomFlip): """ def __init__(self, prob: Union[float, List[float]] = 0.5) -> None: - super().__init__(prob=prob, direction='horizontal') + super().__init__(prob=prob, direction="horizontal") def transform(self, results: dict) -> dict: """The transform function of :class:`HandRandomFlip`. @@ -56,12 +57,12 @@ class HandRandomFlip(RandomFlip): results = super().transform(results) # flip hand type and root depth - hand_type = results['hand_type'] - rel_root_depth = results['rel_root_depth'] - flipped = results['flip'] + hand_type = results["hand_type"] + rel_root_depth = results["rel_root_depth"] + flipped = results["flip"] if flipped: hand_type[..., [0, 1]] = hand_type[..., [1, 0]] rel_root_depth = -rel_root_depth - results['hand_type'] = hand_type - results['rel_root_depth'] = rel_root_depth + results["hand_type"] = hand_type + results["rel_root_depth"] = rel_root_depth return results diff --git a/mmpose/datasets/transforms/loading.py b/mmpose/datasets/transforms/loading.py index 5542001a3b4307e8804bc4c155a150931ec1a149..4c010fdf2565ce4d82a8b10183c3ba720664e289 100644 --- a/mmpose/datasets/transforms/loading.py +++ b/mmpose/datasets/transforms/loading.py @@ -5,7 +5,6 @@ import numpy as np from mmcv.transforms import LoadImageFromFile from mmpose.registry import TRANSFORMS - from mmpose.structures.keypoint import fix_bbox_aspect_ratio @@ -55,51 +54,48 @@ class LoadImage(LoadImageFromFile): dict: The result dict. 
""" try: - if 'img' not in results: + if "img" not in results: # Load image from file by :meth:`LoadImageFromFile.transform` results = super().transform(results) else: - img = results['img'] + img = results["img"] assert isinstance(img, np.ndarray) if self.to_float32: img = img.astype(np.float32) - if 'img_path' not in results: - results['img_path'] = None - results['img_shape'] = img.shape[:2] - results['ori_shape'] = img.shape[:2] + if "img_path" not in results: + results["img_path"] = None + results["img_shape"] = img.shape[:2] + results["ori_shape"] = img.shape[:2] if self.pad_to_aspect_ratio: # Pad image with zeros to ensure activation map is not cut off - abox_xyxy = fix_bbox_aspect_ratio( - results['bbox'], aspect_ratio=3/4, padding=1.25, bbox_format='xyxy').flatten() - - x_pad = np.array([max(0, -abox_xyxy[0]), max(0, abox_xyxy[2] - results['img_shape'][1])], dtype=int) - y_pad = np.array([max(0, -abox_xyxy[1]), max(0, abox_xyxy[3] - results['img_shape'][0])], dtype=int) - - img = results['img'] - img = np.pad(img, ((y_pad[0], y_pad[1]), (x_pad[0], x_pad[1]), (0, 0)), mode='constant', constant_values=255) - results['img'] = img - + abox_xyxy = fix_bbox_aspect_ratio(results["bbox"], aspect_ratio=3 / 4, padding=1.25, bbox_format="xyxy").flatten() + + x_pad = np.array([max(0, -abox_xyxy[0]), max(0, abox_xyxy[2] - results["img_shape"][1])], dtype=int) + y_pad = np.array([max(0, -abox_xyxy[1]), max(0, abox_xyxy[3] - results["img_shape"][0])], dtype=int) + + img = results["img"] + img = np.pad(img, ((y_pad[0], y_pad[1]), (x_pad[0], x_pad[1]), (0, 0)), mode="constant", constant_values=255) + results["img"] = img + # Update bbox - bbox = np.array(results['bbox']).flatten() + bbox = np.array(results["bbox"]).flatten() bbox[:2] += np.array([x_pad[0], y_pad[0]]) bbox[2:] += np.array([x_pad[0], y_pad[0]]) - results['bbox'] = bbox.reshape(np.array(results['bbox']).shape) + results["bbox"] = bbox.reshape(np.array(results["bbox"]).shape) # Update keypoints - kpts = np.array(results['keypoints']).reshape(-1, 2) + kpts = np.array(results["keypoints"]).reshape(-1, 2) kpts[:, :2] += np.array([x_pad[0], y_pad[0]]) - results['keypoints'] = kpts.reshape(np.array(results['keypoints']).shape) + results["keypoints"] = kpts.reshape(np.array(results["keypoints"]).shape) # Update img_shape and ori_shape - results['img_shape'] = img.shape[:2] - results['ori_shape'] = img.shape[:2] + results["img_shape"] = img.shape[:2] + results["ori_shape"] = img.shape[:2] except Exception as e: - e = type(e)( - f'`{str(e)}` occurs when loading `{results["img_path"]}`.' - 'Please check whether the file exists.') + e = type(e)(f'`{str(e)}` occurs when loading `{results["img_path"]}`.' 
"Please check whether the file exists.") raise e return results diff --git a/mmpose/datasets/transforms/mix_img_transforms.py b/mmpose/datasets/transforms/mix_img_transforms.py index 84d03ea5a2f1a993cb7f870de9d8bf288c0e0211..063abfa1a4ee204b7256f810b9eca4c26faa3cb3 100644 --- a/mmpose/datasets/transforms/mix_img_transforms.py +++ b/mmpose/datasets/transforms/mix_img_transforms.py @@ -11,8 +11,7 @@ from mmengine.dataset.base_dataset import Compose from numpy import random from mmpose.registry import TRANSFORMS -from mmpose.structures import (bbox_clip_border, flip_bbox, flip_keypoints, - keypoint_clip_border) +from mmpose.structures import bbox_clip_border, flip_bbox, flip_keypoints, keypoint_clip_border class MixImageTransform(BaseTransform, metaclass=ABCMeta): @@ -25,9 +24,7 @@ class MixImageTransform(BaseTransform, metaclass=ABCMeta): Defaults to 1.0. """ - def __init__(self, - pre_transform: Optional[Sequence[str]] = None, - prob: float = 1.0): + def __init__(self, pre_transform: Optional[Sequence[str]] = None, prob: float = 1.0): self.prob = prob @@ -45,15 +42,15 @@ class MixImageTransform(BaseTransform, metaclass=ABCMeta): if random.uniform(0, 1) < self.prob: - dataset = results.pop('dataset', None) + dataset = results.pop("dataset", None) - results['mixed_data_list'] = self._get_mixed_data_list(dataset) + results["mixed_data_list"] = self._get_mixed_data_list(dataset) results = self.apply_mix(results) - if 'mixed_data_list' in results: - results.pop('mixed_data_list') + if "mixed_data_list" in results: + results.pop("mixed_data_list") - results['dataset'] = dataset + results["dataset"] = dataset return results @@ -66,19 +63,15 @@ class MixImageTransform(BaseTransform, metaclass=ABCMeta): Returns: List[dict]: A list of dictionaries containing mixed data samples. 
""" - indexes = [ - random.randint(0, len(dataset)) for _ in range(self.num_aux_image) - ] + indexes = [random.randint(0, len(dataset)) for _ in range(self.num_aux_image)] - mixed_data_list = [ - copy.deepcopy(dataset.get_data_info(index)) for index in indexes - ] + mixed_data_list = [copy.deepcopy(dataset.get_data_info(index)) for index in indexes] if self.pre_transform is not None: for i, data in enumerate(mixed_data_list): - data.update({'dataset': dataset}) + data.update({"dataset": dataset}) _results = self.pre_transform(data) - _results.pop('dataset') + _results.pop("dataset") mixed_data_list[i] = _results return mixed_data_list @@ -168,29 +161,26 @@ class Mosaic(MixImageTransform): def apply_mix(self, results: dict) -> dict: """Apply mosaic augmentation to the input data.""" - assert 'mixed_data_list' in results - mixed_data_list = results.pop('mixed_data_list') + assert "mixed_data_list" in results + mixed_data_list = results.pop("mixed_data_list") assert len(mixed_data_list) == self.num_aux_image img, annos = self._create_mosaic_image(results, mixed_data_list) - bboxes = annos['bboxes'] - kpts = annos['keypoints'] - kpts_vis = annos['keypoints_visible'] - - bboxes = bbox_clip_border(bboxes, (2 * self.img_scale[0], - 2 * self.img_scale[1])) - kpts, kpts_vis = keypoint_clip_border(kpts, kpts_vis, - (2 * self.img_scale[0], - 2 * self.img_scale[1])) - - results['img'] = img - results['img_shape'] = img.shape - results['bbox'] = bboxes - results['category_id'] = annos['category_id'] - results['bbox_score'] = annos['bbox_scores'] - results['keypoints'] = kpts - results['keypoints_visible'] = kpts_vis - results['area'] = annos['area'] + bboxes = annos["bboxes"] + kpts = annos["keypoints"] + kpts_vis = annos["keypoints_visible"] + + bboxes = bbox_clip_border(bboxes, (2 * self.img_scale[0], 2 * self.img_scale[1])) + kpts, kpts_vis = keypoint_clip_border(kpts, kpts_vis, (2 * self.img_scale[0], 2 * self.img_scale[1])) + + results["img"] = img + results["img_shape"] = img.shape + results["bbox"] = bboxes + results["category_id"] = annos["category_id"] + results["bbox_score"] = annos["bbox_scores"] + results["keypoints"] = kpts + results["keypoints_visible"] = kpts_vis + results["area"] = annos["area"] return results @@ -200,28 +190,23 @@ class Mosaic(MixImageTransform): # init mosaic image img_scale_w, img_scale_h = self.img_scale - mosaic_img = np.full((int(img_scale_h * 2), int(img_scale_w * 2), 3), - self.pad_val, - dtype=results['img'].dtype) + mosaic_img = np.full((int(img_scale_h * 2), int(img_scale_w * 2), 3), self.pad_val, dtype=results["img"].dtype) # calculate mosaic center - center = (int(random.uniform(*self.center_range) * img_scale_w), - int(random.uniform(*self.center_range) * img_scale_h)) + center = (int(random.uniform(*self.center_range) * img_scale_w), int(random.uniform(*self.center_range) * img_scale_h)) annos = defaultdict(list) - locs = ('top_left', 'top_right', 'bottom_left', 'bottom_right') + locs = ("top_left", "top_right", "bottom_left", "bottom_right") for loc, data in zip(locs, (results, *mixed_data_list)): # process image - img = data['img'] + img = data["img"] h, w = img.shape[:2] scale_ratio = min(img_scale_h / h, img_scale_w / w) - img = mmcv.imresize(img, - (int(w * scale_ratio), int(h * scale_ratio))) + img = mmcv.imresize(img, (int(w * scale_ratio), int(h * scale_ratio))) # paste - paste_coord, crop_coord = self._mosaic_combine( - loc, center, img.shape[:2][::-1]) + paste_coord, crop_coord = self._mosaic_combine(loc, center, img.shape[:2][::-1]) x1_p, y1_p, 
x2_p, y2_p = paste_coord x1_c, y1_c, x2_c, y2_c = crop_coord @@ -231,31 +216,31 @@ class Mosaic(MixImageTransform): padh = y1_p - y1_c # merge annotations - if 'bbox' in data: - bboxes = data['bbox'] + if "bbox" in data: + bboxes = data["bbox"] # rescale & translate bboxes *= scale_ratio bboxes[..., ::2] += padw bboxes[..., 1::2] += padh - annos['bboxes'].append(bboxes) - annos['bbox_scores'].append(data['bbox_score']) - annos['category_id'].append(data['category_id']) + annos["bboxes"].append(bboxes) + annos["bbox_scores"].append(data["bbox_score"]) + annos["category_id"].append(data["category_id"]) - if 'keypoints' in data: - kpts = data['keypoints'] + if "keypoints" in data: + kpts = data["keypoints"] # rescale & translate kpts *= scale_ratio kpts[..., 0] += padw kpts[..., 1] += padh - annos['keypoints'].append(kpts) - annos['keypoints_visible'].append(data['keypoints_visible']) + annos["keypoints"].append(kpts) + annos["keypoints_visible"].append(data["keypoints_visible"]) - if 'area' in data: - annos['area'].append(data['area'] * scale_ratio**2) + if "area" in data: + annos["area"].append(data["area"] * scale_ratio**2) for key in annos: annos[key] = np.concatenate(annos[key]) @@ -267,36 +252,33 @@ class Mosaic(MixImageTransform): """Determine the overall coordinates of the mosaic image and the specific coordinates of the cropped sub-image.""" - assert loc in ('top_left', 'top_right', 'bottom_left', 'bottom_right') + assert loc in ("top_left", "top_right", "bottom_left", "bottom_right") x1, y1, x2, y2 = 0, 0, 0, 0 cx, cy = center w, h = img_shape - if loc == 'top_left': + if loc == "top_left": x1, y1, x2, y2 = max(cx - w, 0), max(cy - h, 0), cx, cy crop_coord = w - (x2 - x1), h - (y2 - y1), w, h - elif loc == 'top_right': - x1, y1, x2, y2 = cx, max(cy - h, 0), min(cx + w, - self.img_scale[0] * 2), cy + elif loc == "top_right": + x1, y1, x2, y2 = cx, max(cy - h, 0), min(cx + w, self.img_scale[0] * 2), cy crop_coord = 0, h - (y2 - y1), min(w, x2 - x1), h - elif loc == 'bottom_left': - x1, y1, x2, y2 = max(cx - w, - 0), cy, cx, min(self.img_scale[1] * 2, cy + h) + elif loc == "bottom_left": + x1, y1, x2, y2 = max(cx - w, 0), cy, cx, min(self.img_scale[1] * 2, cy + h) crop_coord = w - (x2 - x1), 0, w, min(y2 - y1, h) else: - x1, y1, x2, y2 = cx, cy, min(cx + w, self.img_scale[0] * - 2), min(self.img_scale[1] * 2, cy + h) + x1, y1, x2, y2 = cx, cy, min(cx + w, self.img_scale[0] * 2), min(self.img_scale[1] * 2, cy + h) crop_coord = 0, 0, min(w, x2 - x1), min(y2 - y1, h) return (x1, y1, x2, y2), crop_coord def __repr__(self) -> str: repr_str = self.__class__.__name__ - repr_str += f'(img_scale={self.img_scale}, ' - repr_str += f'center_range={self.center_range}, ' - repr_str += f'pad_val={self.pad_val}, ' - repr_str += f'prob={self.prob})' + repr_str += f"(img_scale={self.img_scale}, " + repr_str += f"center_range={self.center_range}, " + repr_str += f"pad_val={self.pad_val}, " + repr_str += f"prob={self.prob})" return repr_str @@ -361,16 +343,19 @@ class YOLOXMixUp(MixImageTransform): prob (float): Probability of applying the mixup transformation. Defaults to 1.0. 
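`_mosaic_combine` above computes, per quadrant, where the resized tile is pasted on the 2x canvas and which sub-region of the tile survives. For the `top_left` case alone, the geometry reduces to:

```python
def mosaic_top_left(center, img_wh):
    """Paste/crop coords for the top-left tile of a mosaic canvas."""
    cx, cy = center
    w, h = img_wh
    # paste region on the canvas, clipped at the canvas origin
    x1, y1, x2, y2 = max(cx - w, 0), max(cy - h, 0), cx, cy
    # matching crop taken from the tile's bottom-right corner
    crop = (w - (x2 - x1), h - (y2 - y1), w, h)
    return (x1, y1, x2, y2), crop

print(mosaic_top_left(center=(320, 320), img_wh=(400, 300)))
# ((0, 20, 320, 320), (80, 0, 400, 300))
```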
""" + num_aux_image = 1 - def __init__(self, - img_scale: Tuple[int, int] = (640, 640), - ratio_range: Tuple[float, float] = (0.5, 1.5), - flip_ratio: float = 0.5, - pad_val: float = 114.0, - bbox_clip_border: bool = True, - pre_transform: Sequence[dict] = None, - prob: float = 1.0): + def __init__( + self, + img_scale: Tuple[int, int] = (640, 640), + ratio_range: Tuple[float, float] = (0.5, 1.5), + flip_ratio: float = 0.5, + pad_val: float = 114.0, + bbox_clip_border: bool = True, + pre_transform: Sequence[dict] = None, + prob: float = 1.0, + ): assert isinstance(img_scale, tuple) super().__init__(pre_transform=pre_transform, prob=prob) self.img_scale = img_scale @@ -382,30 +367,30 @@ class YOLOXMixUp(MixImageTransform): def apply_mix(self, results: dict) -> dict: """YOLOX MixUp transform function.""" - assert 'mixed_data_list' in results - mixed_data_list = results.pop('mixed_data_list') + assert "mixed_data_list" in results + mixed_data_list = results.pop("mixed_data_list") assert len(mixed_data_list) == self.num_aux_image - if mixed_data_list[0]['keypoints'].shape[0] == 0: + if mixed_data_list[0]["keypoints"].shape[0] == 0: return results img, annos = self._create_mixup_image(results, mixed_data_list) - bboxes = annos['bboxes'] - kpts = annos['keypoints'] - kpts_vis = annos['keypoints_visible'] + bboxes = annos["bboxes"] + kpts = annos["keypoints"] + kpts_vis = annos["keypoints_visible"] h, w = img.shape[:2] bboxes = bbox_clip_border(bboxes, (w, h)) kpts, kpts_vis = keypoint_clip_border(kpts, kpts_vis, (w, h)) - results['img'] = img.astype(np.uint8) - results['img_shape'] = img.shape - results['bbox'] = bboxes - results['category_id'] = annos['category_id'] - results['bbox_score'] = annos['bbox_scores'] - results['keypoints'] = kpts - results['keypoints_visible'] = kpts_vis - results['area'] = annos['area'] + results["img"] = img.astype(np.uint8) + results["img_shape"] = img.shape + results["bbox"] = bboxes + results["category_id"] = annos["category_id"] + results["bbox_score"] = annos["bbox_scores"] + results["keypoints"] = kpts + results["keypoints_visible"] = kpts_vis + results["area"] = annos["area"] return results @@ -414,27 +399,23 @@ class YOLOXMixUp(MixImageTransform): two input images.""" aux_results = mixed_data_list[0] - aux_img = aux_results['img'] + aux_img = aux_results["img"] # init mixup image - out_img = np.ones((self.img_scale[1], self.img_scale[0], 3), - dtype=aux_img.dtype) * self.pad_val + out_img = np.ones((self.img_scale[1], self.img_scale[0], 3), dtype=aux_img.dtype) * self.pad_val annos = defaultdict(list) # Calculate scale ratio and resize aux_img - scale_ratio = min(self.img_scale[1] / aux_img.shape[0], - self.img_scale[0] / aux_img.shape[1]) - aux_img = mmcv.imresize(aux_img, (int(aux_img.shape[1] * scale_ratio), - int(aux_img.shape[0] * scale_ratio))) + scale_ratio = min(self.img_scale[1] / aux_img.shape[0], self.img_scale[0] / aux_img.shape[1]) + aux_img = mmcv.imresize(aux_img, (int(aux_img.shape[1] * scale_ratio), int(aux_img.shape[0] * scale_ratio))) # Set the resized aux_img in the top-left of out_img - out_img[:aux_img.shape[0], :aux_img.shape[1]] = aux_img + out_img[: aux_img.shape[0], : aux_img.shape[1]] = aux_img # random rescale jit_factor = random.uniform(*self.ratio_range) scale_ratio *= jit_factor - out_img = mmcv.imresize(out_img, (int(out_img.shape[1] * jit_factor), - int(out_img.shape[0] * jit_factor))) + out_img = mmcv.imresize(out_img, (int(out_img.shape[1] * jit_factor), int(out_img.shape[0] * jit_factor))) # random flip is_filp = 
random.uniform(0, 1) > self.flip_ratio @@ -442,7 +423,7 @@ class YOLOXMixUp(MixImageTransform): out_img = out_img[:, ::-1, :] # random crop - ori_img = results['img'] + ori_img = results["img"] aux_h, aux_w = out_img.shape[:2] h, w = ori_img.shape[:2] padded_img = np.ones((max(aux_h, h), max(aux_w, w), 3)) * self.pad_val @@ -451,41 +432,34 @@ class YOLOXMixUp(MixImageTransform): dy = random.randint(0, max(0, padded_img.shape[0] - h) + 1) dx = random.randint(0, max(0, padded_img.shape[1] - w) + 1) - padded_cropped_img = padded_img[dy:dy + h, dx:dx + w] + padded_cropped_img = padded_img[dy : dy + h, dx : dx + w] # mix up mixup_img = 0.5 * ori_img + 0.5 * padded_cropped_img # merge annotations # bboxes - bboxes = aux_results['bbox'].copy() + bboxes = aux_results["bbox"].copy() bboxes *= scale_ratio bboxes = bbox_clip_border(bboxes, (aux_w, aux_h)) if is_filp: - bboxes = flip_bbox(bboxes, [aux_w, aux_h], 'xyxy') + bboxes = flip_bbox(bboxes, [aux_w, aux_h], "xyxy") bboxes[..., ::2] -= dx bboxes[..., 1::2] -= dy - annos['bboxes'] = [results['bbox'], bboxes] - annos['bbox_scores'] = [ - results['bbox_score'], aux_results['bbox_score'] - ] - annos['category_id'] = [ - results['category_id'], aux_results['category_id'] - ] + annos["bboxes"] = [results["bbox"], bboxes] + annos["bbox_scores"] = [results["bbox_score"], aux_results["bbox_score"]] + annos["category_id"] = [results["category_id"], aux_results["category_id"]] # keypoints - kpts = aux_results['keypoints'] * scale_ratio - kpts, kpts_vis = keypoint_clip_border(kpts, - aux_results['keypoints_visible'], - (aux_w, aux_h)) + kpts = aux_results["keypoints"] * scale_ratio + kpts, kpts_vis = keypoint_clip_border(kpts, aux_results["keypoints_visible"], (aux_w, aux_h)) if is_filp: - kpts, kpts_vis = flip_keypoints(kpts, kpts_vis, (aux_w, aux_h), - aux_results['flip_indices']) + kpts, kpts_vis = flip_keypoints(kpts, kpts_vis, (aux_w, aux_h), aux_results["flip_indices"]) kpts[..., 0] -= dx kpts[..., 1] -= dy - annos['keypoints'] = [results['keypoints'], kpts] - annos['keypoints_visible'] = [results['keypoints_visible'], kpts_vis] - annos['area'] = [results['area'], aux_results['area'] * scale_ratio**2] + annos["keypoints"] = [results["keypoints"], kpts] + annos["keypoints_visible"] = [results["keypoints_visible"], kpts_vis] + annos["area"] = [results["area"], aux_results["area"] * scale_ratio**2] for key in annos: annos[key] = np.concatenate(annos[key]) @@ -494,8 +468,8 @@ class YOLOXMixUp(MixImageTransform): def __repr__(self) -> str: repr_str = self.__class__.__name__ - repr_str += f'(img_scale={self.img_scale}, ' - repr_str += f'ratio_range={self.ratio_range}, ' - repr_str += f'flip_ratio={self.flip_ratio}, ' - repr_str += f'pad_val={self.pad_val})' + repr_str += f"(img_scale={self.img_scale}, " + repr_str += f"ratio_range={self.ratio_range}, " + repr_str += f"flip_ratio={self.flip_ratio}, " + repr_str += f"pad_val={self.pad_val})" return repr_str diff --git a/mmpose/datasets/transforms/pose3d_transforms.py b/mmpose/datasets/transforms/pose3d_transforms.py index 9dec8db64ba114dae1d86f3a21b709327787097b..aba286e21873c863730023934ddd4177940c2e15 100644 --- a/mmpose/datasets/transforms/pose3d_transforms.py +++ b/mmpose/datasets/transforms/pose3d_transforms.py @@ -43,12 +43,9 @@ class RandomFlipAroundRoot(BaseTransform): - camera_param (optional) """ - def __init__(self, - keypoints_flip_cfg: dict, - target_flip_cfg: dict, - flip_prob: float = 0.5, - flip_camera: bool = False, - flip_label: bool = False): + def __init__( + self, keypoints_flip_cfg: 
dict, target_flip_cfg: dict, flip_prob: float = 0.5, flip_camera: bool = False, flip_label: bool = False + ): self.keypoints_flip_cfg = keypoints_flip_cfg self.target_flip_cfg = target_flip_cfg self.flip_prob = flip_prob @@ -69,72 +66,70 @@ class RandomFlipAroundRoot(BaseTransform): if np.random.rand() <= self.flip_prob: if self.flip_label: - assert 'keypoint_labels' in results - assert 'lifting_target_label' in results - keypoints_key = 'keypoint_labels' - keypoints_visible_key = 'keypoint_labels_visible' - target_key = 'lifting_target_label' + assert "keypoint_labels" in results + assert "lifting_target_label" in results + keypoints_key = "keypoint_labels" + keypoints_visible_key = "keypoint_labels_visible" + target_key = "lifting_target_label" else: - assert 'keypoints' in results - assert 'lifting_target' in results - keypoints_key = 'keypoints' - keypoints_visible_key = 'keypoints_visible' - target_key = 'lifting_target' + assert "keypoints" in results + assert "lifting_target" in results + keypoints_key = "keypoints" + keypoints_visible_key = "keypoints_visible" + target_key = "lifting_target" keypoints = results[keypoints_key] if keypoints_visible_key in results: keypoints_visible = results[keypoints_visible_key] else: - keypoints_visible = np.ones( - keypoints.shape[:-1], dtype=np.float32) + keypoints_visible = np.ones(keypoints.shape[:-1], dtype=np.float32) lifting_target = results[target_key] - if 'lifting_target_visible' in results: - lifting_target_visible = results['lifting_target_visible'] + if "lifting_target_visible" in results: + lifting_target_visible = results["lifting_target_visible"] else: - lifting_target_visible = np.ones( - lifting_target.shape[:-1], dtype=np.float32) + lifting_target_visible = np.ones(lifting_target.shape[:-1], dtype=np.float32) - if 'flip_indices' not in results: + if "flip_indices" not in results: flip_indices = list(range(self.num_keypoints)) else: - flip_indices = results['flip_indices'] + flip_indices = results["flip_indices"] # flip joint coordinates - _camera_param = deepcopy(results['camera_param']) + _camera_param = deepcopy(results["camera_param"]) keypoints, keypoints_visible = flip_keypoints_custom_center( keypoints, keypoints_visible, flip_indices, - center_mode=self.keypoints_flip_cfg.get( - 'center_mode', 'static'), - center_x=self.keypoints_flip_cfg.get('center_x', 0.5), - center_index=self.keypoints_flip_cfg.get('center_index', 0)) + center_mode=self.keypoints_flip_cfg.get("center_mode", "static"), + center_x=self.keypoints_flip_cfg.get("center_x", 0.5), + center_index=self.keypoints_flip_cfg.get("center_index", 0), + ) lifting_target, lifting_target_visible = flip_keypoints_custom_center( # noqa lifting_target, lifting_target_visible, flip_indices, - center_mode=self.target_flip_cfg.get('center_mode', 'static'), - center_x=self.target_flip_cfg.get('center_x', 0.5), - center_index=self.target_flip_cfg.get('center_index', 0)) + center_mode=self.target_flip_cfg.get("center_mode", "static"), + center_x=self.target_flip_cfg.get("center_x", 0.5), + center_index=self.target_flip_cfg.get("center_index", 0), + ) results[keypoints_key] = keypoints results[keypoints_visible_key] = keypoints_visible results[target_key] = lifting_target - results['lifting_target_visible'] = lifting_target_visible + results["lifting_target_visible"] = lifting_target_visible # flip horizontal distortion coefficients if self.flip_camera: - assert 'camera_param' in results, \ - 'Camera parameters are missing.' 
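
Aside from formatting, `RandomFlipAroundRoot` behaves as before: with probability `flip_prob` it mirrors the 2D keypoints and the 3D lifting targets around a configurable center, optionally flipping the camera parameters as well. As a rough sketch of how such a transform is typically declared in a lifting pipeline, only the key names (`center_mode`, `center_x`, `center_index`) are taken from the `.get()` defaults visible above; the concrete values and the pipeline itself are illustrative assumptions, not part of this patch.

```python
# Hypothetical pipeline entry for RandomFlipAroundRoot; the values are
# placeholders, only the key names mirror the defaults read in transform().
train_pipeline = [
    dict(
        type="RandomFlipAroundRoot",
        # mirror 2D keypoints around the static vertical line x = 0.5
        keypoints_flip_cfg=dict(center_mode="static", center_x=0.5),
        # mirror 3D targets around the root joint (index 0)
        target_flip_cfg=dict(center_mode="root", center_index=0),
        flip_prob=0.5,
        flip_camera=False,  # True would also negate the camera c/p x-components
    ),
]
```
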
+ assert "camera_param" in results, "Camera parameters are missing." - assert 'c' in _camera_param - _camera_param['c'][0] *= -1 + assert "c" in _camera_param + _camera_param["c"][0] *= -1 - if 'p' in _camera_param: - _camera_param['p'][0] *= -1 + if "p" in _camera_param: + _camera_param["p"][0] *= -1 - results['camera_param'].update(_camera_param) + results["camera_param"].update(_camera_param) return results diff --git a/mmpose/datasets/transforms/topdown_transforms.py b/mmpose/datasets/transforms/topdown_transforms.py index 6b60bc0198199b7b9141f0d835888c20c29017cd..d3c3380b6bd6b557453a18d6ea7393bbefe5cbe2 100644 --- a/mmpose/datasets/transforms/topdown_transforms.py +++ b/mmpose/datasets/transforms/topdown_transforms.py @@ -7,7 +7,7 @@ from mmcv.transforms import BaseTransform from mmengine import is_seq_of from mmpose.registry import TRANSFORMS -from mmpose.structures.bbox import get_udp_warp_matrix, get_warp_matrix, bbox_cs2xyxy, bbox_xyxy2cs +from mmpose.structures.bbox import bbox_cs2xyxy, bbox_xyxy2cs, get_udp_warp_matrix, get_warp_matrix @TRANSFORMS.register_module() @@ -42,14 +42,10 @@ class TopdownAffine(BaseTransform): .. _`UDP (CVPR 2020)`: https://arxiv.org/abs/1911.07524 """ - def __init__(self, - input_size: Tuple[int, int], - input_padding: float = 1.25, - use_udp: bool = False) -> None: + def __init__(self, input_size: Tuple[int, int], input_padding: float = 1.25, use_udp: bool = False) -> None: super().__init__() - assert is_seq_of(input_size, int) and len(input_size) == 2, ( - f'Invalid input_size {input_size}') + assert is_seq_of(input_size, int) and len(input_size) == 2, f"Invalid input_size {input_size}" self.input_size = input_size self.use_udp = use_udp @@ -68,9 +64,7 @@ class TopdownAffine(BaseTransform): """ w, h = np.hsplit(bbox_scale, [1]) - bbox_scale = np.where(w > h * aspect_ratio, - np.hstack([w, w / aspect_ratio]), - np.hstack([h * aspect_ratio, h])) + bbox_scale = np.where(w > h * aspect_ratio, np.hstack([w, w / aspect_ratio]), np.hstack([h * aspect_ratio, h])) return bbox_scale def transform(self, results: Dict) -> Optional[dict]: @@ -87,79 +81,68 @@ class TopdownAffine(BaseTransform): w, h = self.input_size warp_size = (int(w), int(h)) - img_h, img_w = results['img'].shape[:2] + img_h, img_w = results["img"].shape[:2] - bbox_xyxy = results['bbox_xyxy_wrt_input'].flatten() + bbox_xyxy = results["bbox_xyxy_wrt_input"].flatten() bbox_xyxy[:2] = np.maximum(bbox_xyxy[:2], 0) bbox_xyxy[2:4] = np.minimum(bbox_xyxy[2:4], [img_w, img_h]) x0, y0, x1, y1 = bbox_xyxy[:4].astype(int) bbox_mask = np.zeros((img_h, img_w), dtype=np.uint8) bbox_mask[y0:y1, x0:x1] = 1 - # Take the bbox wrt the input - bbox_xyxy_wrt_input = results.get('bbox_xyxy_wrt_input', None) + bbox_xyxy_wrt_input = results.get("bbox_xyxy_wrt_input", None) if bbox_xyxy_wrt_input is not None: _c, _s = bbox_xyxy2cs(bbox_xyxy_wrt_input, padding=self.input_padding) - results['bbox_center'] = _c.reshape(1, 2) - results['bbox_scale'] = _s.reshape(1, 2) + results["bbox_center"] = _c.reshape(1, 2) + results["bbox_scale"] = _s.reshape(1, 2) # reshape bbox to fixed aspect ratio - results['bbox_scale'] = self._fix_aspect_ratio( - results['bbox_scale'], aspect_ratio=w / h) + results["bbox_scale"] = self._fix_aspect_ratio(results["bbox_scale"], aspect_ratio=w / h) # TODO: support multi-instance - assert results['bbox_center'].shape[0] == 1, ( - 'Top-down heatmap only supports single instance. 
Got invalid ' - f'shape of bbox_center {results["bbox_center"].shape}.') - - center = results['bbox_center'][0] - scale = results['bbox_scale'][0] - if 'bbox_rotation' in results: - rot = results['bbox_rotation'][0] + assert results["bbox_center"].shape[0] == 1, ( + "Top-down heatmap only supports single instance. Got invalid " f'shape of bbox_center {results["bbox_center"].shape}.' + ) + + center = results["bbox_center"][0] + scale = results["bbox_scale"][0] + if "bbox_rotation" in results: + rot = results["bbox_rotation"][0] else: - rot = 0. + rot = 0.0 if self.use_udp: - warp_mat = get_udp_warp_matrix( - center, scale, rot, output_size=(w, h)) + warp_mat = get_udp_warp_matrix(center, scale, rot, output_size=(w, h)) else: warp_mat = get_warp_matrix(center, scale, rot, output_size=(w, h)) - if isinstance(results['img'], list): - results['img'] = [ - cv2.warpAffine( - img, warp_mat, warp_size, flags=cv2.INTER_LINEAR) - for img in results['img'] - ] + if isinstance(results["img"], list): + results["img"] = [cv2.warpAffine(img, warp_mat, warp_size, flags=cv2.INTER_LINEAR) for img in results["img"]] else: - results['img'] = cv2.warpAffine( - results['img'], warp_mat, warp_size, flags=cv2.INTER_LINEAR) - bbox_mask = cv2.warpAffine( - bbox_mask, warp_mat, warp_size, flags=cv2.INTER_LINEAR) + results["img"] = cv2.warpAffine(results["img"], warp_mat, warp_size, flags=cv2.INTER_LINEAR) + bbox_mask = cv2.warpAffine(bbox_mask, warp_mat, warp_size, flags=cv2.INTER_LINEAR) bbox_mask = bbox_mask.reshape(1, h, w) - results['bbox_mask'] = bbox_mask + results["bbox_mask"] = bbox_mask - if results.get('keypoints', None) is not None: - if results.get('transformed_keypoints', None) is not None: - transformed_keypoints = results['transformed_keypoints'].copy() + if results.get("keypoints", None) is not None: + if results.get("transformed_keypoints", None) is not None: + transformed_keypoints = results["transformed_keypoints"].copy() else: - transformed_keypoints = results['keypoints'].copy() + transformed_keypoints = results["keypoints"].copy() # Only transform (x, y) coordinates - transformed_keypoints[..., :2] = cv2.transform( - transformed_keypoints[..., :2], warp_mat) - results['transformed_keypoints'] = transformed_keypoints + transformed_keypoints[..., :2] = cv2.transform(transformed_keypoints[..., :2], warp_mat) + results["transformed_keypoints"] = transformed_keypoints - if results.get('bbox_xyxy_wrt_input', None) is not None: - bbox_xyxy_wrt_input = results['bbox_xyxy_wrt_input'].copy() + if results.get("bbox_xyxy_wrt_input", None) is not None: + bbox_xyxy_wrt_input = results["bbox_xyxy_wrt_input"].copy() bbox_xyxy_wrt_input = bbox_xyxy_wrt_input.reshape(1, 2, 2) - bbox_xyxy_wrt_input = cv2.transform( - bbox_xyxy_wrt_input, warp_mat) - results['bbox_xyxy_wrt_input'] = bbox_xyxy_wrt_input.reshape(1, 4) + bbox_xyxy_wrt_input = cv2.transform(bbox_xyxy_wrt_input, warp_mat) + results["bbox_xyxy_wrt_input"] = bbox_xyxy_wrt_input.reshape(1, 4) - results['input_size'] = (w, h) - results['input_center'] = center - results['input_scale'] = scale + results["input_size"] = (w, h) + results["input_center"] = center + results["input_scale"] = scale return results @@ -170,6 +153,6 @@ class TopdownAffine(BaseTransform): str: Formatted string. 
""" repr_str = self.__class__.__name__ - repr_str += f'(input_size={self.input_size}, ' - repr_str += f'use_udp={self.use_udp})' + repr_str += f"(input_size={self.input_size}, " + repr_str += f"use_udp={self.use_udp})" return repr_str diff --git a/mmpose/demo/body3d_pose_lifter_demo.py b/mmpose/demo/body3d_pose_lifter_demo.py index dbb51a4b9d38320dc981dd978ccb894b2029044e..68d468804f2cb01a5cd4ef1ab23f453709464d88 100644 --- a/mmpose/demo/body3d_pose_lifter_demo.py +++ b/mmpose/demo/body3d_pose_lifter_demo.py @@ -13,19 +13,24 @@ import mmengine import numpy as np from mmengine.logging import print_log -from mmpose.apis import (_track_by_iou, _track_by_oks, - convert_keypoint_definition, extract_pose_sequence, - inference_pose_lifter_model, inference_topdown, - init_model) +from mmpose.apis import ( + _track_by_iou, + _track_by_oks, + convert_keypoint_definition, + extract_pose_sequence, + inference_pose_lifter_model, + inference_topdown, + init_model, +) from mmpose.models.pose_estimators import PoseLifter from mmpose.models.pose_estimators.topdown import TopdownPoseEstimator from mmpose.registry import VISUALIZERS -from mmpose.structures import (PoseDataSample, merge_data_samples, - split_instances) +from mmpose.structures import PoseDataSample, merge_data_samples, split_instances from mmpose.utils import adapt_mmdet_pipeline try: from mmdet.apis import inference_detector, init_detector + has_mmdet = True except (ImportError, ModuleNotFoundError): has_mmdet = False @@ -33,108 +38,79 @@ except (ImportError, ModuleNotFoundError): def parse_args(): parser = ArgumentParser() - parser.add_argument('det_config', help='Config file for detection') - parser.add_argument('det_checkpoint', help='Checkpoint file for detection') - parser.add_argument( - 'pose_estimator_config', - type=str, - default=None, - help='Config file for the 1st stage 2D pose estimator') - parser.add_argument( - 'pose_estimator_checkpoint', - type=str, - default=None, - help='Checkpoint file for the 1st stage 2D pose estimator') + parser.add_argument("det_config", help="Config file for detection") + parser.add_argument("det_checkpoint", help="Checkpoint file for detection") + parser.add_argument("pose_estimator_config", type=str, default=None, help="Config file for the 1st stage 2D pose estimator") + parser.add_argument("pose_estimator_checkpoint", type=str, default=None, help="Checkpoint file for the 1st stage 2D pose estimator") + parser.add_argument("pose_lifter_config", help="Config file for the 2nd stage pose lifter model") + parser.add_argument("pose_lifter_checkpoint", help="Checkpoint file for the 2nd stage pose lifter model") + parser.add_argument("--input", type=str, default="", help="Video path") + parser.add_argument("--show", action="store_true", default=False, help="Whether to show visualizations") parser.add_argument( - 'pose_lifter_config', - help='Config file for the 2nd stage pose lifter model') - parser.add_argument( - 'pose_lifter_checkpoint', - help='Checkpoint file for the 2nd stage pose lifter model') - parser.add_argument('--input', type=str, default='', help='Video path') - parser.add_argument( - '--show', - action='store_true', + "--disable-rebase-keypoint", + action="store_true", default=False, - help='Whether to show visualizations') + help="Whether to disable rebasing the predicted 3D pose so its " + "lowest keypoint has a height of 0 (landing on the ground). 
Rebase " + "is useful for visualization when the model do not predict the " + "global position of the 3D pose.", + ) parser.add_argument( - '--disable-rebase-keypoint', - action='store_true', + "--disable-norm-pose-2d", + action="store_true", default=False, - help='Whether to disable rebasing the predicted 3D pose so its ' - 'lowest keypoint has a height of 0 (landing on the ground). Rebase ' - 'is useful for visualization when the model do not predict the ' - 'global position of the 3D pose.') + help="Whether to scale the bbox (along with the 2D pose) to the " + "average bbox scale of the dataset, and move the bbox (along with the " + "2D pose) to the average bbox center of the dataset. This is useful " + "when bbox is small, especially in multi-person scenarios.", + ) parser.add_argument( - '--disable-norm-pose-2d', - action='store_true', - default=False, - help='Whether to scale the bbox (along with the 2D pose) to the ' - 'average bbox scale of the dataset, and move the bbox (along with the ' - '2D pose) to the average bbox center of the dataset. This is useful ' - 'when bbox is small, especially in multi-person scenarios.') - parser.add_argument( - '--num-instances', + "--num-instances", type=int, default=1, - help='The number of 3D poses to be visualized in every frame. If ' - 'less than 0, it will be set to the number of pose results in the ' - 'first frame.') - parser.add_argument( - '--output-root', - type=str, - default='', - help='Root of the output video file. ' - 'Default not saving the visualization video.') - parser.add_argument( - '--save-predictions', - action='store_true', - default=False, - help='Whether to save predicted results') - parser.add_argument( - '--device', default='cuda:0', help='Device used for inference') + help="The number of 3D poses to be visualized in every frame. If " + "less than 0, it will be set to the number of pose results in the " + "first frame.", + ) parser.add_argument( - '--det-cat-id', - type=int, - default=0, - help='Category id for bounding box detection model') - parser.add_argument( - '--bbox-thr', - type=float, - default=0.3, - help='Bounding box score threshold') - parser.add_argument('--kpt-thr', type=float, default=0.3) - parser.add_argument( - '--use-oks-tracking', action='store_true', help='Using OKS tracking') - parser.add_argument( - '--tracking-thr', type=float, default=0.3, help='Tracking threshold') - parser.add_argument( - '--show-interval', type=int, default=0, help='Sleep seconds per frame') - parser.add_argument( - '--thickness', - type=int, - default=1, - help='Link thickness for visualization') - parser.add_argument( - '--radius', - type=int, - default=3, - help='Keypoint radius for visualization') + "--output-root", type=str, default="", help="Root of the output video file. " "Default not saving the visualization video." 
+ ) + parser.add_argument("--save-predictions", action="store_true", default=False, help="Whether to save predicted results") + parser.add_argument("--device", default="cuda:0", help="Device used for inference") + parser.add_argument("--det-cat-id", type=int, default=0, help="Category id for bounding box detection model") + parser.add_argument("--bbox-thr", type=float, default=0.3, help="Bounding box score threshold") + parser.add_argument("--kpt-thr", type=float, default=0.3) + parser.add_argument("--use-oks-tracking", action="store_true", help="Using OKS tracking") + parser.add_argument("--tracking-thr", type=float, default=0.3, help="Tracking threshold") + parser.add_argument("--show-interval", type=int, default=0, help="Sleep seconds per frame") + parser.add_argument("--thickness", type=int, default=1, help="Link thickness for visualization") + parser.add_argument("--radius", type=int, default=3, help="Keypoint radius for visualization") parser.add_argument( - '--online', - action='store_true', + "--online", + action="store_true", default=False, - help='Inference mode. If set to True, can not use future frame' - 'information when using multi frames for inference in the 2D pose' - 'detection stage. Default: False.') + help="Inference mode. If set to True, can not use future frame" + "information when using multi frames for inference in the 2D pose" + "detection stage. Default: False.", + ) args = parser.parse_args() return args -def process_one_image(args, detector, frame, frame_idx, pose_estimator, - pose_est_results_last, pose_est_results_list, next_id, - pose_lifter, visualize_frame, visualizer): +def process_one_image( + args, + detector, + frame, + frame_idx, + pose_estimator, + pose_est_results_last, + pose_est_results_list, + next_id, + pose_lifter, + visualize_frame, + visualizer, +): """Visualize detected and predicted keypoints of one image. Pipeline of this function: @@ -206,7 +182,7 @@ def process_one_image(args, detector, frame, frame_idx, pose_estimator, next_id (int): The next track id to be used. """ pose_lift_dataset = pose_lifter.cfg.test_dataloader.dataset - pose_lift_dataset_name = pose_lifter.dataset_meta['dataset_name'] + pose_lift_dataset_name = pose_lifter.dataset_meta["dataset_name"] # First stage: conduct 2D pose detection in a Topdown manner # use detector to obtain person bounding boxes @@ -216,8 +192,7 @@ def process_one_image(args, detector, frame, frame_idx, pose_estimator, # filter out the person instances with category and bbox threshold # e.g. 
0 for person in COCO bboxes = pred_instance.bboxes - bboxes = bboxes[np.logical_and(pred_instance.labels == args.det_cat_id, - pred_instance.scores > args.bbox_thr)] + bboxes = bboxes[np.logical_and(pred_instance.labels == args.det_cat_id, pred_instance.scores > args.bbox_thr)] # estimate pose results for current image pose_est_results = inference_topdown(pose_estimator, frame, bboxes) @@ -227,7 +202,7 @@ def process_one_image(args, detector, frame, frame_idx, pose_estimator, else: _track = _track_by_iou - pose_det_dataset_name = pose_estimator.dataset_meta['dataset_name'] + pose_det_dataset_name = pose_estimator.dataset_meta["dataset_name"] pose_est_results_converted = [] # convert 2d pose estimation results into the format for pose-lifting @@ -236,10 +211,9 @@ def process_one_image(args, detector, frame, frame_idx, pose_estimator, pred_instances = data_sample.pred_instances.cpu().numpy() keypoints = pred_instances.keypoints # calculate area and bbox - if 'bboxes' in pred_instances: - areas = np.array([(bbox[2] - bbox[0]) * (bbox[3] - bbox[1]) - for bbox in pred_instances.bboxes]) - pose_est_results[i].pred_instances.set_field(areas, 'areas') + if "bboxes" in pred_instances: + areas = np.array([(bbox[2] - bbox[0]) * (bbox[3] - bbox[1]) for bbox in pred_instances.bboxes]) + pose_est_results[i].pred_instances.set_field(areas, "areas") else: areas, bboxes = [], [] for keypoint in keypoints: @@ -253,9 +227,7 @@ def process_one_image(args, detector, frame, frame_idx, pose_estimator, pose_est_results[i].pred_instances.bboxes = np.array(bboxes) # track id - track_id, pose_est_results_last, _ = _track(data_sample, - pose_est_results_last, - args.tracking_thr) + track_id, pose_est_results_last, _ = _track(data_sample, pose_est_results_last, args.tracking_thr) if track_id == -1: if np.count_nonzero(keypoints[:, :, 1]) >= 3: track_id = next_id @@ -264,27 +236,19 @@ def process_one_image(args, detector, frame, frame_idx, pose_estimator, # If the number of keypoints detected is small, # delete that person instance. 
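
(The `keypoints[:, :, 1] = -10` line that resumes below implements the deletion mentioned in the comment: instances are removed by pushing their keypoints far off-image rather than by dropping array rows.) Earlier in this function, detections are filtered by class and score with a single boolean mask; here is a minimal toy illustration of that `np.logical_and` filter, with made-up values that are not from the patch.

```python
import numpy as np

# Toy detector output: three boxes with class labels and scores.
bboxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 4, 4]], dtype=np.float32)
labels = np.array([0, 1, 0])   # e.g. 0 == "person" in COCO
scores = np.array([0.9, 0.8, 0.2])

# Keep person-class boxes above the score threshold, as in the demo above.
keep = np.logical_and(labels == 0, scores > 0.3)
print(bboxes[keep])  # only the first box passes both tests
```
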
keypoints[:, :, 1] = -10 - pose_est_results[i].pred_instances.set_field( - keypoints, 'keypoints') - pose_est_results[i].pred_instances.set_field( - pred_instances.bboxes * 0, 'bboxes') - pose_est_results[i].set_field(pred_instances, 'pred_instances') + pose_est_results[i].pred_instances.set_field(keypoints, "keypoints") + pose_est_results[i].pred_instances.set_field(pred_instances.bboxes * 0, "bboxes") + pose_est_results[i].set_field(pred_instances, "pred_instances") track_id = -1 - pose_est_results[i].set_field(track_id, 'track_id') + pose_est_results[i].set_field(track_id, "track_id") # convert keypoints for pose-lifting pose_est_result_converted = PoseDataSample() - pose_est_result_converted.set_field( - pose_est_results[i].pred_instances.clone(), 'pred_instances') - pose_est_result_converted.set_field( - pose_est_results[i].gt_instances.clone(), 'gt_instances') - keypoints = convert_keypoint_definition(keypoints, - pose_det_dataset_name, - pose_lift_dataset_name) - pose_est_result_converted.pred_instances.set_field( - keypoints, 'keypoints') - pose_est_result_converted.set_field(pose_est_results[i].track_id, - 'track_id') + pose_est_result_converted.set_field(pose_est_results[i].pred_instances.clone(), "pred_instances") + pose_est_result_converted.set_field(pose_est_results[i].gt_instances.clone(), "gt_instances") + keypoints = convert_keypoint_definition(keypoints, pose_det_dataset_name, pose_lift_dataset_name) + pose_est_result_converted.pred_instances.set_field(keypoints, "keypoints") + pose_est_result_converted.set_field(pose_est_results[i].track_id, "track_id") pose_est_results_converted.append(pose_est_result_converted) pose_est_results_list.append(pose_est_results_converted.copy()) @@ -294,29 +258,27 @@ def process_one_image(args, detector, frame, frame_idx, pose_estimator, pose_seq_2d = extract_pose_sequence( pose_est_results_list, frame_idx=frame_idx, - causal=pose_lift_dataset.get('causal', False), - seq_len=pose_lift_dataset.get('seq_len', 1), - step=pose_lift_dataset.get('seq_step', 1)) + causal=pose_lift_dataset.get("causal", False), + seq_len=pose_lift_dataset.get("seq_len", 1), + step=pose_lift_dataset.get("seq_step", 1), + ) # conduct 2D-to-3D pose lifting norm_pose_2d = not args.disable_norm_pose_2d pose_lift_results = inference_pose_lifter_model( - pose_lifter, - pose_seq_2d, - image_size=visualize_frame.shape[:2], - norm_pose_2d=norm_pose_2d) + pose_lifter, pose_seq_2d, image_size=visualize_frame.shape[:2], norm_pose_2d=norm_pose_2d + ) # post-processing for idx, pose_lift_result in enumerate(pose_lift_results): - pose_lift_result.track_id = pose_est_results[idx].get('track_id', 1e4) + pose_lift_result.track_id = pose_est_results[idx].get("track_id", 1e4) pred_instances = pose_lift_result.pred_instances keypoints = pred_instances.keypoints keypoint_scores = pred_instances.keypoint_scores if keypoint_scores.ndim == 3: keypoint_scores = np.squeeze(keypoint_scores, axis=1) - pose_lift_results[ - idx].pred_instances.keypoint_scores = keypoint_scores + pose_lift_results[idx].pred_instances.keypoint_scores = keypoint_scores if keypoints.ndim == 4: keypoints = np.squeeze(keypoints, axis=1) @@ -326,17 +288,15 @@ def process_one_image(args, detector, frame, frame_idx, pose_estimator, # rebase height (z-axis) if not args.disable_rebase_keypoint: - keypoints[..., 2] -= np.min( - keypoints[..., 2], axis=-1, keepdims=True) + keypoints[..., 2] -= np.min(keypoints[..., 2], axis=-1, keepdims=True) pose_lift_results[idx].pred_instances.keypoints = keypoints - pose_lift_results = 
sorted( - pose_lift_results, key=lambda x: x.get('track_id', 1e4)) + pose_lift_results = sorted(pose_lift_results, key=lambda x: x.get("track_id", 1e4)) pred_3d_data_samples = merge_data_samples(pose_lift_results) det_data_sample = merge_data_samples(pose_est_results) - pred_3d_instances = pred_3d_data_samples.get('pred_instances', None) + pred_3d_instances = pred_3d_data_samples.get("pred_instances", None) if args.num_instances < 0: args.num_instances = len(pose_lift_results) @@ -344,7 +304,7 @@ def process_one_image(args, detector, frame, frame_idx, pose_estimator, # Visualization if visualizer is not None: visualizer.add_datasample( - 'result', + "result", visualize_frame, data_sample=pred_3d_data_samples, det_data_sample=det_data_sample, @@ -355,47 +315,36 @@ def process_one_image(args, detector, frame, frame_idx, pose_estimator, draw_bbox=True, kpt_thr=args.kpt_thr, num_instances=args.num_instances, - wait_time=args.show_interval) + wait_time=args.show_interval, + ) return pose_est_results, pose_est_results_list, pred_3d_instances, next_id def main(): - assert has_mmdet, 'Please install mmdet to run the demo.' + assert has_mmdet, "Please install mmdet to run the demo." args = parse_args() - assert args.show or (args.output_root != '') - assert args.input != '' + assert args.show or (args.output_root != "") + assert args.input != "" assert args.det_config is not None assert args.det_checkpoint is not None - detector = init_detector( - args.det_config, args.det_checkpoint, device=args.device.lower()) + detector = init_detector(args.det_config, args.det_checkpoint, device=args.device.lower()) detector.cfg = adapt_mmdet_pipeline(detector.cfg) - pose_estimator = init_model( - args.pose_estimator_config, - args.pose_estimator_checkpoint, - device=args.device.lower()) + pose_estimator = init_model(args.pose_estimator_config, args.pose_estimator_checkpoint, device=args.device.lower()) - assert isinstance(pose_estimator, TopdownPoseEstimator), 'Only "TopDown"' \ - 'model is supported for the 1st stage (2D pose detection)' + assert isinstance(pose_estimator, TopdownPoseEstimator), 'Only "TopDown"' "model is supported for the 1st stage (2D pose detection)" - det_kpt_color = pose_estimator.dataset_meta.get('keypoint_colors', None) - det_dataset_skeleton = pose_estimator.dataset_meta.get( - 'skeleton_links', None) - det_dataset_link_color = pose_estimator.dataset_meta.get( - 'skeleton_link_colors', None) + det_kpt_color = pose_estimator.dataset_meta.get("keypoint_colors", None) + det_dataset_skeleton = pose_estimator.dataset_meta.get("skeleton_links", None) + det_dataset_link_color = pose_estimator.dataset_meta.get("skeleton_link_colors", None) - pose_lifter = init_model( - args.pose_lifter_config, - args.pose_lifter_checkpoint, - device=args.device.lower()) + pose_lifter = init_model(args.pose_lifter_config, args.pose_lifter_checkpoint, device=args.device.lower()) - assert isinstance(pose_lifter, PoseLifter), \ - 'Only "PoseLifter" model is supported for the 2nd stage ' \ - '(2D-to-3D lifting)' + assert isinstance(pose_lifter, PoseLifter), 'Only "PoseLifter" model is supported for the 2nd stage ' "(2D-to-3D lifting)" pose_lifter.cfg.visualizer.radius = args.radius pose_lifter.cfg.visualizer.line_width = args.thickness @@ -407,33 +356,31 @@ def main(): # the dataset_meta is loaded from the checkpoint visualizer.set_dataset_meta(pose_lifter.dataset_meta) - if args.input == 'webcam': - input_type = 'webcam' + if args.input == "webcam": + input_type = "webcam" else: - input_type = 
mimetypes.guess_type(args.input)[0].split('/')[0] + input_type = mimetypes.guess_type(args.input)[0].split("/")[0] - if args.output_root == '': + if args.output_root == "": save_output = False else: mmengine.mkdir_or_exist(args.output_root) - output_file = os.path.join(args.output_root, - os.path.basename(args.input)) - if args.input == 'webcam': - output_file += '.mp4' + output_file = os.path.join(args.output_root, os.path.basename(args.input)) + if args.input == "webcam": + output_file += ".mp4" save_output = True if args.save_predictions: - assert args.output_root != '' - args.pred_save_path = f'{args.output_root}/results_' \ - f'{os.path.splitext(os.path.basename(args.input))[0]}.json' + assert args.output_root != "" + args.pred_save_path = f"{args.output_root}/results_" f"{os.path.splitext(os.path.basename(args.input))[0]}.json" if save_output: - fourcc = cv2.VideoWriter_fourcc(*'mp4v') + fourcc = cv2.VideoWriter_fourcc(*"mp4v") pose_est_results_list = [] pred_instances_list = [] - if input_type == 'image': - frame = mmcv.imread(args.input, channel_order='rgb') + if input_type == "image": + frame = mmcv.imread(args.input, channel_order="rgb") _, _, pred_3d_instances, _ = process_one_image( args=args, detector=detector, @@ -445,7 +392,8 @@ def main(): next_id=0, pose_lifter=pose_lifter, visualize_frame=frame, - visualizer=visualizer) + visualizer=visualizer, + ) if args.save_predictions: # save prediction results @@ -455,16 +403,16 @@ def main(): frame_vis = visualizer.get_image() mmcv.imwrite(mmcv.rgb2bgr(frame_vis), output_file) - elif input_type in ['webcam', 'video']: + elif input_type in ["webcam", "video"]: next_id = 0 pose_est_results = [] - if args.input == 'webcam': + if args.input == "webcam": video = cv2.VideoCapture(0) else: video = cv2.VideoCapture(args.input) - (major_ver, minor_ver, subminor_ver) = (cv2.__version__).split('.') + major_ver, minor_ver, subminor_ver = (cv2.__version__).split(".") if int(major_ver) < 3: fps = video.get(cv2.cv.CV_CAP_PROP_FPS) else: @@ -484,35 +432,30 @@ def main(): # First stage: 2D pose detection # make person results for current image - (pose_est_results, pose_est_results_list, pred_3d_instances, - next_id) = process_one_image( - args=args, - detector=detector, - frame=frame, - frame_idx=frame_idx, - pose_estimator=pose_estimator, - pose_est_results_last=pose_est_results_last, - pose_est_results_list=pose_est_results_list, - next_id=next_id, - pose_lifter=pose_lifter, - visualize_frame=mmcv.bgr2rgb(frame), - visualizer=visualizer) + pose_est_results, pose_est_results_list, pred_3d_instances, next_id = process_one_image( + args=args, + detector=detector, + frame=frame, + frame_idx=frame_idx, + pose_estimator=pose_estimator, + pose_est_results_last=pose_est_results_last, + pose_est_results_list=pose_est_results_list, + next_id=next_id, + pose_lifter=pose_lifter, + visualize_frame=mmcv.bgr2rgb(frame), + visualizer=visualizer, + ) if args.save_predictions: # save prediction results - pred_instances_list.append( - dict( - frame_id=frame_idx, - instances=split_instances(pred_3d_instances))) + pred_instances_list.append(dict(frame_id=frame_idx, instances=split_instances(pred_3d_instances))) if save_output: frame_vis = visualizer.get_image() if video_writer is None: # the size of the image with visualization may vary # depending on the presence of heatmaps - video_writer = cv2.VideoWriter(output_file, fourcc, fps, - (frame_vis.shape[1], - frame_vis.shape[0])) + video_writer = cv2.VideoWriter(output_file, fourcc, fps, (frame_vis.shape[1], 
frame_vis.shape[0])) video_writer.write(mmcv.rgb2bgr(frame_vis)) @@ -528,26 +471,17 @@ def main(): video_writer.release() else: args.save_predictions = False - raise ValueError( - f'file {os.path.basename(args.input)} has invalid format.') + raise ValueError(f"file {os.path.basename(args.input)} has invalid format.") if args.save_predictions: - with open(args.pred_save_path, 'w') as f: - json.dump( - dict( - meta_info=pose_lifter.dataset_meta, - instance_info=pred_instances_list), - f, - indent='\t') - print(f'predictions have been saved at {args.pred_save_path}') + with open(args.pred_save_path, "w") as f: + json.dump(dict(meta_info=pose_lifter.dataset_meta, instance_info=pred_instances_list), f, indent="\t") + print(f"predictions have been saved at {args.pred_save_path}") if save_output: - input_type = input_type.replace('webcam', 'video') - print_log( - f'the output {input_type} has been saved at {output_file}', - logger='current', - level=logging.INFO) + input_type = input_type.replace("webcam", "video") + print_log(f"the output {input_type} has been saved at {output_file}", logger="current", level=logging.INFO) -if __name__ == '__main__': +if __name__ == "__main__": main() diff --git a/mmpose/demo/bottomup_demo.py b/mmpose/demo/bottomup_demo.py index b493e4c4a1abd4c9c93ccf6bb6b03b63e40dcaea..048509d27de16794c6c14542bf6d405e7e0b1f9e 100644 --- a/mmpose/demo/bottomup_demo.py +++ b/mmpose/demo/bottomup_demo.py @@ -17,11 +17,7 @@ from mmpose.registry import VISUALIZERS from mmpose.structures import split_instances -def process_one_image(args, - img, - pose_estimator, - visualizer=None, - show_interval=0): +def process_one_image(args, img, pose_estimator, visualizer=None, show_interval=0): """Visualize predicted keypoints (and heatmaps) of one image.""" # inference a single image @@ -30,13 +26,13 @@ def process_one_image(args, # show the results if isinstance(img, str): - img = mmcv.imread(img, channel_order='rgb') + img = mmcv.imread(img, channel_order="rgb") elif isinstance(img, np.ndarray): img = mmcv.bgr2rgb(img) if visualizer is not None: visualizer.add_datasample( - 'result', + "result", img, data_sample=results, draw_gt=False, @@ -45,79 +41,48 @@ def process_one_image(args, show_kpt_idx=args.show_kpt_idx, show=args.show, wait_time=show_interval, - kpt_thr=args.kpt_thr) + kpt_thr=args.kpt_thr, + ) return results.pred_instances def parse_args(): parser = ArgumentParser() - parser.add_argument('config', help='Config file') - parser.add_argument('checkpoint', help='Checkpoint file') + parser.add_argument("config", help="Config file") + parser.add_argument("checkpoint", help="Checkpoint file") + parser.add_argument("--input", type=str, default="", help="Image/Video file") + parser.add_argument("--show", action="store_true", default=False, help="whether to show img") parser.add_argument( - '--input', type=str, default='', help='Image/Video file') - parser.add_argument( - '--show', - action='store_true', - default=False, - help='whether to show img') - parser.add_argument( - '--output-root', - type=str, - default='', - help='root of the output img file. 
' - 'Default not saving the visualization images.') - parser.add_argument( - '--save-predictions', - action='store_true', - default=False, - help='whether to save predicted results') - parser.add_argument( - '--device', default='cuda:0', help='Device used for inference') - parser.add_argument( - '--draw-heatmap', - action='store_true', - help='Visualize the predicted heatmap') - parser.add_argument( - '--show-kpt-idx', - action='store_true', - default=False, - help='Whether to show the index of keypoints') - parser.add_argument( - '--kpt-thr', type=float, default=0.3, help='Keypoint score threshold') - parser.add_argument( - '--radius', - type=int, - default=3, - help='Keypoint radius for visualization') - parser.add_argument( - '--thickness', - type=int, - default=1, - help='Link thickness for visualization') - parser.add_argument( - '--show-interval', type=int, default=0, help='Sleep seconds per frame') + "--output-root", type=str, default="", help="root of the output img file. " "Default not saving the visualization images." + ) + parser.add_argument("--save-predictions", action="store_true", default=False, help="whether to save predicted results") + parser.add_argument("--device", default="cuda:0", help="Device used for inference") + parser.add_argument("--draw-heatmap", action="store_true", help="Visualize the predicted heatmap") + parser.add_argument("--show-kpt-idx", action="store_true", default=False, help="Whether to show the index of keypoints") + parser.add_argument("--kpt-thr", type=float, default=0.3, help="Keypoint score threshold") + parser.add_argument("--radius", type=int, default=3, help="Keypoint radius for visualization") + parser.add_argument("--thickness", type=int, default=1, help="Link thickness for visualization") + parser.add_argument("--show-interval", type=int, default=0, help="Sleep seconds per frame") args = parser.parse_args() return args def main(): args = parse_args() - assert args.show or (args.output_root != '') - assert args.input != '' + assert args.show or (args.output_root != "") + assert args.input != "" output_file = None if args.output_root: mmengine.mkdir_or_exist(args.output_root) - output_file = os.path.join(args.output_root, - os.path.basename(args.input)) - if args.input == 'webcam': - output_file += '.mp4' + output_file = os.path.join(args.output_root, os.path.basename(args.input)) + if args.input == "webcam": + output_file += ".mp4" if args.save_predictions: - assert args.output_root != '' - args.pred_save_path = f'{args.output_root}/results_' \ - f'{os.path.splitext(os.path.basename(args.input))[0]}.json' + assert args.output_root != "" + args.pred_save_path = f"{args.output_root}/results_" f"{os.path.splitext(os.path.basename(args.input))[0]}.json" # build the model from a config file and a checkpoint file if args.draw_heatmap: @@ -125,11 +90,7 @@ def main(): else: cfg_options = None - model = init_model( - args.config, - args.checkpoint, - device=args.device, - cfg_options=cfg_options) + model = init_model(args.config, args.checkpoint, device=args.device, cfg_options=cfg_options) # build visualizer model.cfg.visualizer.radius = args.radius @@ -137,15 +98,14 @@ def main(): visualizer = VISUALIZERS.build(model.cfg.visualizer) visualizer.set_dataset_meta(model.dataset_meta) - if args.input == 'webcam': - input_type = 'webcam' + if args.input == "webcam": + input_type = "webcam" else: - input_type = mimetypes.guess_type(args.input)[0].split('/')[0] + input_type = mimetypes.guess_type(args.input)[0].split("/")[0] - if input_type == 'image': + 
if input_type == "image": # inference - pred_instances = process_one_image( - args, args.input, model, visualizer, show_interval=0) + pred_instances = process_one_image(args, args.input, model, visualizer, show_interval=0) if args.save_predictions: pred_instances_list = split_instances(pred_instances) @@ -154,9 +114,9 @@ def main(): img_vis = visualizer.get_image() mmcv.imwrite(mmcv.rgb2bgr(img_vis), output_file) - elif input_type in ['webcam', 'video']: + elif input_type in ["webcam", "video"]: - if args.input == 'webcam': + if args.input == "webcam": cap = cv2.VideoCapture(0) else: cap = cv2.VideoCapture(args.input) @@ -172,29 +132,21 @@ def main(): if not success: break - pred_instances = process_one_image(args, frame, model, visualizer, - 0.001) + pred_instances = process_one_image(args, frame, model, visualizer, 0.001) if args.save_predictions: # save prediction results - pred_instances_list.append( - dict( - frame_id=frame_idx, - instances=split_instances(pred_instances))) + pred_instances_list.append(dict(frame_id=frame_idx, instances=split_instances(pred_instances))) # output videos if output_file: frame_vis = visualizer.get_image() if video_writer is None: - fourcc = cv2.VideoWriter_fourcc(*'mp4v') + fourcc = cv2.VideoWriter_fourcc(*"mp4v") # the size of the image with visualization may vary # depending on the presence of heatmaps - video_writer = cv2.VideoWriter( - output_file, - fourcc, - 25, # saved fps - (frame_vis.shape[1], frame_vis.shape[0])) + video_writer = cv2.VideoWriter(output_file, fourcc, 25, (frame_vis.shape[1], frame_vis.shape[0])) # saved fps video_writer.write(mmcv.rgb2bgr(frame_vis)) @@ -212,26 +164,17 @@ def main(): else: args.save_predictions = False - raise ValueError( - f'file {os.path.basename(args.input)} has invalid format.') + raise ValueError(f"file {os.path.basename(args.input)} has invalid format.") if args.save_predictions: - with open(args.pred_save_path, 'w') as f: - json.dump( - dict( - meta_info=model.dataset_meta, - instance_info=pred_instances_list), - f, - indent='\t') - print(f'predictions have been saved at {args.pred_save_path}') + with open(args.pred_save_path, "w") as f: + json.dump(dict(meta_info=model.dataset_meta, instance_info=pred_instances_list), f, indent="\t") + print(f"predictions have been saved at {args.pred_save_path}") if output_file: - input_type = input_type.replace('webcam', 'video') - print_log( - f'the output {input_type} has been saved at {output_file}', - logger='current', - level=logging.INFO) + input_type = input_type.replace("webcam", "video") + print_log(f"the output {input_type} has been saved at {output_file}", logger="current", level=logging.INFO) -if __name__ == '__main__': +if __name__ == "__main__": main() diff --git a/mmpose/demo/hand3d_internet_demo.py b/mmpose/demo/hand3d_internet_demo.py index 1cb10a820a46e38f01dcde3e0f36224784099d79..a460893df56882e7eb6df533d48675ffd8af49c3 100644 --- a/mmpose/demo/hand3d_internet_demo.py +++ b/mmpose/demo/hand3d_internet_demo.py @@ -14,63 +14,34 @@ from mmengine.logging import print_log from mmpose.apis import inference_topdown, init_model from mmpose.registry import VISUALIZERS -from mmpose.structures import (PoseDataSample, merge_data_samples, - split_instances) +from mmpose.structures import PoseDataSample, merge_data_samples, split_instances def parse_args(): parser = ArgumentParser() - parser.add_argument('config', help='Config file') - parser.add_argument('checkpoint', help='Checkpoint file') + parser.add_argument("config", help="Config file") + 
parser.add_argument("checkpoint", help="Checkpoint file") + parser.add_argument("--input", type=str, default="", help="Image/Video file") parser.add_argument( - '--input', type=str, default='', help='Image/Video file') + "--output-root", type=str, default="", help="root of the output img file. " "Default not saving the visualization images." + ) + parser.add_argument("--save-predictions", action="store_true", default=False, help="whether to save predicted results") parser.add_argument( - '--output-root', - type=str, - default='', - help='root of the output img file. ' - 'Default not saving the visualization images.') - parser.add_argument( - '--save-predictions', - action='store_true', - default=False, - help='whether to save predicted results') - parser.add_argument( - '--disable-rebase-keypoint', - action='store_true', + "--disable-rebase-keypoint", + action="store_true", default=False, - help='Whether to disable rebasing the predicted 3D pose so its ' - 'lowest keypoint has a height of 0 (landing on the ground). Rebase ' - 'is useful for visualization when the model do not predict the ' - 'global position of the 3D pose.') - parser.add_argument( - '--show', - action='store_true', - default=False, - help='whether to show result') - parser.add_argument('--device', default='cpu', help='Device for inference') - parser.add_argument( - '--kpt-thr', - type=float, - default=0.3, - help='Visualizing keypoint thresholds') - parser.add_argument( - '--show-kpt-idx', - action='store_true', - default=False, - help='Whether to show the index of keypoints') - parser.add_argument( - '--show-interval', type=int, default=0, help='Sleep seconds per frame') - parser.add_argument( - '--radius', - type=int, - default=3, - help='Keypoint radius for visualization') - parser.add_argument( - '--thickness', - type=int, - default=1, - help='Link thickness for visualization') + help="Whether to disable rebasing the predicted 3D pose so its " + "lowest keypoint has a height of 0 (landing on the ground). 
Rebase " + "is useful for visualization when the model do not predict the " + "global position of the 3D pose.", + ) + parser.add_argument("--show", action="store_true", default=False, help="whether to show result") + parser.add_argument("--device", default="cpu", help="Device for inference") + parser.add_argument("--kpt-thr", type=float, default=0.3, help="Visualizing keypoint thresholds") + parser.add_argument("--show-kpt-idx", action="store_true", default=False, help="Whether to show the index of keypoints") + parser.add_argument("--show-interval", type=int, default=0, help="Sleep seconds per frame") + parser.add_argument("--radius", type=int, default=3, help="Keypoint radius for visualization") + parser.add_argument("--thickness", type=int, default=1, help="Link thickness for visualization") args = parser.parse_args() return args @@ -105,7 +76,7 @@ def process_one_image(args, img, model, visualizer=None, show_interval=0): if scores.max() > 1: scores /= 255 - res_2d.pred_instances.set_field(keypoints[..., :2].copy(), 'keypoints') + res_2d.pred_instances.set_field(keypoints[..., :2].copy(), "keypoints") # rotate the keypoint to make z-axis correspondent to height # for better visualization @@ -115,8 +86,7 @@ def process_one_image(args, img, model, visualizer=None, show_interval=0): # rebase height (z-axis) if not args.disable_rebase_keypoint: valid = scores > 0 - keypoints[..., 2] -= np.min( - keypoints[valid, 2], axis=-1, keepdims=True) + keypoints[..., 2] -= np.min(keypoints[valid, 2], axis=-1, keepdims=True) pose_results[idx].pred_instances.keypoints = keypoints pose_results[idx].pred_instances.keypoint_scores = scores @@ -127,13 +97,13 @@ def process_one_image(args, img, model, visualizer=None, show_interval=0): # show the results if isinstance(img, str): - img = mmcv.imread(img, channel_order='rgb') + img = mmcv.imread(img, channel_order="rgb") elif isinstance(img, np.ndarray): img = mmcv.bgr2rgb(img) if visualizer is not None: visualizer.add_datasample( - 'result', + "result", img, data_sample=data_samples, det_data_sample=data_samples_2d, @@ -146,34 +116,32 @@ def process_one_image(args, img, model, visualizer=None, show_interval=0): axis_elev=15, show_kpt_idx=args.show_kpt_idx, show=args.show, - wait_time=show_interval) + wait_time=show_interval, + ) # if there is no instance detected, return None - return data_samples.get('pred_instances', None) + return data_samples.get("pred_instances", None) def main(): args = parse_args() - assert args.input != '' - assert args.show or (args.output_root != '') + assert args.input != "" + assert args.show or (args.output_root != "") output_file = None if args.output_root: mmengine.mkdir_or_exist(args.output_root) - output_file = os.path.join(args.output_root, - os.path.basename(args.input)) - if args.input == 'webcam': - output_file += '.mp4' + output_file = os.path.join(args.output_root, os.path.basename(args.input)) + if args.input == "webcam": + output_file += ".mp4" if args.save_predictions: - assert args.output_root != '' - args.pred_save_path = f'{args.output_root}/results_' \ - f'{os.path.splitext(os.path.basename(args.input))[0]}.json' + assert args.output_root != "" + args.pred_save_path = f"{args.output_root}/results_" f"{os.path.splitext(os.path.basename(args.input))[0]}.json" # build the model from a config file and a checkpoint file - model = init_model( - args.config, args.checkpoint, device=args.device.lower()) + model = init_model(args.config, args.checkpoint, device=args.device.lower()) # init visualizer 
model.cfg.visualizer.radius = args.radius @@ -182,12 +150,12 @@ def main(): visualizer = VISUALIZERS.build(model.cfg.visualizer) visualizer.set_dataset_meta(model.dataset_meta) - if args.input == 'webcam': - input_type = 'webcam' + if args.input == "webcam": + input_type = "webcam" else: - input_type = mimetypes.guess_type(args.input)[0].split('/')[0] + input_type = mimetypes.guess_type(args.input)[0].split("/")[0] - if input_type == 'image': + if input_type == "image": # inference pred_instances = process_one_image(args, args.input, model, visualizer) @@ -198,9 +166,9 @@ def main(): img_vis = visualizer.get_image() mmcv.imwrite(mmcv.rgb2bgr(img_vis), output_file) - elif input_type in ['webcam', 'video']: + elif input_type in ["webcam", "video"]: - if args.input == 'webcam': + if args.input == "webcam": cap = cv2.VideoCapture(0) else: cap = cv2.VideoCapture(args.input) @@ -217,29 +185,21 @@ def main(): break # topdown pose estimation - pred_instances = process_one_image(args, frame, model, visualizer, - 0.001) + pred_instances = process_one_image(args, frame, model, visualizer, 0.001) if args.save_predictions: # save prediction results - pred_instances_list.append( - dict( - frame_id=frame_idx, - instances=split_instances(pred_instances))) + pred_instances_list.append(dict(frame_id=frame_idx, instances=split_instances(pred_instances))) # output videos if output_file: frame_vis = visualizer.get_image() if video_writer is None: - fourcc = cv2.VideoWriter_fourcc(*'mp4v') + fourcc = cv2.VideoWriter_fourcc(*"mp4v") # the size of the image with visualization may vary # depending on the presence of heatmaps - video_writer = cv2.VideoWriter( - output_file, - fourcc, - 25, # saved fps - (frame_vis.shape[1], frame_vis.shape[0])) + video_writer = cv2.VideoWriter(output_file, fourcc, 25, (frame_vis.shape[1], frame_vis.shape[0])) # saved fps video_writer.write(mmcv.rgb2bgr(frame_vis)) @@ -257,29 +217,17 @@ def main(): else: args.save_predictions = False - raise ValueError( - f'file {os.path.basename(args.input)} has invalid format.') + raise ValueError(f"file {os.path.basename(args.input)} has invalid format.") if args.save_predictions: - with open(args.pred_save_path, 'w') as f: - json.dump( - dict( - meta_info=model.dataset_meta, - instance_info=pred_instances_list), - f, - indent='\t') - print_log( - f'predictions have been saved at {args.pred_save_path}', - logger='current', - level=logging.INFO) + with open(args.pred_save_path, "w") as f: + json.dump(dict(meta_info=model.dataset_meta, instance_info=pred_instances_list), f, indent="\t") + print_log(f"predictions have been saved at {args.pred_save_path}", logger="current", level=logging.INFO) if output_file is not None: - input_type = input_type.replace('webcam', 'video') - print_log( - f'the output {input_type} has been saved at {output_file}', - logger='current', - level=logging.INFO) + input_type = input_type.replace("webcam", "video") + print_log(f"the output {input_type} has been saved at {output_file}", logger="current", level=logging.INFO) -if __name__ == '__main__': +if __name__ == "__main__": main() diff --git a/mmpose/demo/image_demo.py b/mmpose/demo/image_demo.py index 6a408d17605fb5809968317c1357b12386f58b6f..cb1249f82151590cc1dfa3d4065ba3f5ac355430 100644 --- a/mmpose/demo/image_demo.py +++ b/mmpose/demo/image_demo.py @@ -12,49 +12,19 @@ from mmpose.structures import merge_data_samples def parse_args(): parser = ArgumentParser() - parser.add_argument('img', help='Image file') - parser.add_argument('config', help='Config file') - 
parser.add_argument('checkpoint', help='Checkpoint file') - parser.add_argument('--out-file', default=None, help='Path to output file') - parser.add_argument( - '--device', default='cuda:0', help='Device used for inference') - parser.add_argument( - '--draw-heatmap', - action='store_true', - help='Visualize the predicted heatmap') - parser.add_argument( - '--show-kpt-idx', - action='store_true', - default=False, - help='Whether to show the index of keypoints') - parser.add_argument( - '--skeleton-style', - default='mmpose', - type=str, - choices=['mmpose', 'openpose'], - help='Skeleton style selection') - parser.add_argument( - '--kpt-thr', - type=float, - default=0.3, - help='Visualizing keypoint thresholds') - parser.add_argument( - '--radius', - type=int, - default=3, - help='Keypoint radius for visualization') - parser.add_argument( - '--thickness', - type=int, - default=1, - help='Link thickness for visualization') - parser.add_argument( - '--alpha', type=float, default=0.8, help='The transparency of bboxes') - parser.add_argument( - '--show', - action='store_true', - default=False, - help='whether to show img') + parser.add_argument("img", help="Image file") + parser.add_argument("config", help="Config file") + parser.add_argument("checkpoint", help="Checkpoint file") + parser.add_argument("--out-file", default=None, help="Path to output file") + parser.add_argument("--device", default="cuda:0", help="Device used for inference") + parser.add_argument("--draw-heatmap", action="store_true", help="Visualize the predicted heatmap") + parser.add_argument("--show-kpt-idx", action="store_true", default=False, help="Whether to show the index of keypoints") + parser.add_argument("--skeleton-style", default="mmpose", type=str, choices=["mmpose", "openpose"], help="Skeleton style selection") + parser.add_argument("--kpt-thr", type=float, default=0.3, help="Visualizing keypoint thresholds") + parser.add_argument("--radius", type=int, default=3, help="Keypoint radius for visualization") + parser.add_argument("--thickness", type=int, default=1, help="Link thickness for visualization") + parser.add_argument("--alpha", type=float, default=0.8, help="The transparency of bboxes") + parser.add_argument("--show", action="store_true", default=False, help="whether to show img") args = parser.parse_args() return args @@ -68,11 +38,7 @@ def main(): else: cfg_options = None - model = init_model( - args.config, - args.checkpoint, - device=args.device, - cfg_options=cfg_options) + model = init_model(args.config, args.checkpoint, device=args.device, cfg_options=cfg_options) # init visualizer model.cfg.visualizer.radius = args.radius @@ -80,17 +46,16 @@ def main(): model.cfg.visualizer.line_width = args.thickness visualizer = VISUALIZERS.build(model.cfg.visualizer) - visualizer.set_dataset_meta( - model.dataset_meta, skeleton_style=args.skeleton_style) + visualizer.set_dataset_meta(model.dataset_meta, skeleton_style=args.skeleton_style) # inference a single image batch_results = inference_topdown(model, args.img) results = merge_data_samples(batch_results) # show the results - img = imread(args.img, channel_order='rgb') + img = imread(args.img, channel_order="rgb") visualizer.add_datasample( - 'result', + "result", img, data_sample=results, draw_gt=False, @@ -100,14 +65,12 @@ def main(): show_kpt_idx=args.show_kpt_idx, skeleton_style=args.skeleton_style, show=args.show, - out_file=args.out_file) + out_file=args.out_file, + ) if args.out_file is not None: - print_log( - f'the output image has been saved at 
{args.out_file}', - logger='current', - level=logging.INFO) + print_log(f"The output image has been saved at {args.out_file}", logger="current", level=logging.INFO) -if __name__ == '__main__': +if __name__ == "__main__": main() diff --git a/mmpose/demo/inferencer_demo.py b/mmpose/demo/inferencer_demo.py index d20c433f4e0f3dac55b2ce2d5f54fea058392407..4bd64c91f8e1f3788cc68fccf02f9cf94d03449b 100644 --- a/mmpose/demo/inferencer_demo.py +++ b/mmpose/demo/inferencer_demo.py @@ -13,70 +13,48 @@ POSE2D_SPECIFIC_ARGS = dict( def parse_args(): parser = ArgumentParser() - parser.add_argument( - 'inputs', - type=str, - nargs='?', - help='Input image/video path or folder path.') + parser.add_argument("inputs", type=str, nargs="?", help="Input image/video path or folder path.") # init args parser.add_argument( - '--pose2d', + "--pose2d", type=str, default=None, - help='Pretrained 2D pose estimation algorithm. It\'s the path to the ' - 'config file or the model name defined in metafile.') + help="Pretrained 2D pose estimation algorithm. It's the path to the config file or the model name defined in metafile.", + ) parser.add_argument( - '--pose2d-weights', + "--pose2d-weights", type=str, default=None, - help='Path to the custom checkpoint file of the selected pose model. ' + help="Path to the custom checkpoint file of the selected pose model. " 'If it is not specified and "pose2d" is a model name of metafile, ' - 'the weights will be loaded from metafile.') + "the weights will be loaded from metafile.", + ) parser.add_argument( - '--pose3d', + "--pose3d", type=str, default=None, - help='Pretrained 3D pose estimation algorithm. It\'s the path to the ' - 'config file or the model name defined in metafile.') + help="Pretrained 3D pose estimation algorithm. It's the path to the config file or the model name defined in metafile.", + ) parser.add_argument( - '--pose3d-weights', + "--pose3d-weights", type=str, default=None, - help='Path to the custom checkpoint file of the selected pose model. ' + help="Path to the custom checkpoint file of the selected pose model. " 'If it is not specified and "pose3d" is a model name of metafile, ' - 'the weights will be loaded from metafile.') - parser.add_argument( - '--det-model', - type=str, - default=None, - help='Config path or alias of detection model.') - parser.add_argument( - '--det-weights', - type=str, - default=None, - help='Path to the checkpoints of detection model.') - parser.add_argument( - '--det-cat-ids', - type=int, - nargs='+', - default=0, - help='Category id for detection model.') - parser.add_argument( - '--scope', - type=str, - default='mmpose', - help='Scope where modules are defined.') - parser.add_argument( - '--device', + "the weights will be loaded from metafile.", + ) + parser.add_argument("--det-model", type=str, default=None, help="Config path or alias of detection model.") + parser.add_argument("--det-weights", type=str, default=None, help="Path to the checkpoints of detection model.") + parser.add_argument("--det-cat-ids", type=int, nargs="+", default=0, help="Category id for detection model.") + parser.add_argument("--scope", type=str, default="mmpose", help="Scope where modules are defined.") + parser.add_argument( + "--device", type=str, default=None, - help='Device used for inference. ' - 'If not specified, the available device will be automatically used.') - parser.add_argument( - '--show-progress', - action='store_true', - help='Display the progress bar during inference.') + help="Device used for inference. If not specified, the available device will be automatically used.",
+ ) + parser.add_argument("--show-progress", action="store_true", help="Display the progress bar during inference.") # The default arguments for prediction filtering differ for top-down # and bottom-up models. We assign the default arguments according to the @@ -88,111 +66,72 @@ def parse_args(): break # call args - parser.add_argument( - '--show', - action='store_true', - help='Display the image/video in a popup window.') - parser.add_argument( - '--draw-bbox', - action='store_true', - help='Whether to draw the bounding boxes.') - parser.add_argument( - '--draw-heatmap', - action='store_true', + parser.add_argument("--show", action="store_true", help="Display the image/video in a popup window.") + parser.add_argument("--draw-bbox", action="store_true", help="Whether to draw the bounding boxes.") + parser.add_argument("--draw-heatmap", action="store_true", default=False, help="Whether to draw the predicted heatmaps.") + parser.add_argument("--bbox-thr", type=float, default=filter_args["bbox_thr"], help="Bounding box score threshold") + parser.add_argument("--nms-thr", type=float, default=filter_args["nms_thr"], help="IoU threshold for bounding box NMS") + parser.add_argument( + "--pose-based-nms", + type=lambda arg: arg.lower() in ("true", "yes", "t", "y", "1"), + default=filter_args["pose_based_nms"], + help="Whether to use pose-based NMS", + ) + parser.add_argument("--kpt-thr", type=float, default=0.3, help="Keypoint score threshold") + parser.add_argument("--tracking-thr", type=float, default=0.3, help="Tracking threshold") + parser.add_argument("--use-oks-tracking", action="store_true", help="Whether to use OKS as similarity in tracking") + parser.add_argument( + "--disable-norm-pose-2d", + action="store_true", + help="Whether to scale the bbox (along with the 2D pose) to the " + "average bbox scale of the dataset, and move the bbox (along with the " + "2D pose) to the average bbox center of the dataset. This is useful " + "when the bbox is small, especially in multi-person scenarios.", + ) + parser.add_argument( + "--disable-rebase-keypoint", + action="store_true", default=False, - help='Whether to draw the predicted heatmaps.') - parser.add_argument( - '--bbox-thr', - type=float, - default=filter_args['bbox_thr'], - help='Bounding box score threshold') - parser.add_argument( - '--nms-thr', - type=float, - default=filter_args['nms_thr'], - help='IoU threshold for bounding box NMS') - parser.add_argument( - '--pose-based-nms', - type=lambda arg: arg.lower() in ('true', 'yes', 't', 'y', '1'), - default=filter_args['pose_based_nms'], - help='Whether to use pose-based NMS') + help="Whether to disable rebasing the predicted 3D pose so its " + "lowest keypoint has a height of 0 (landing on the ground). Rebase " + "is useful for visualization when the model does not predict the " + "global position of the 3D pose.", + ) parser.add_argument( - '--kpt-thr', type=float, default=0.3, help='Keypoint score threshold') - parser.add_argument( - '--tracking-thr', type=float, default=0.3, help='Tracking threshold') - parser.add_argument( - '--use-oks-tracking', - action='store_true', - help='Whether to use OKS as similarity in tracking') - parser.add_argument( - '--disable-norm-pose-2d', - action='store_true', - help='Whether to scale the bbox (along with the 2D pose) to the ' - 'average bbox scale of the dataset, and move the bbox (along with the ' - '2D pose) to the average bbox center of the dataset. 
This is useful ' - 'when bbox is small, especially in multi-person scenarios.') - parser.add_argument( - '--disable-rebase-keypoint', - action='store_true', - default=False, - help='Whether to disable rebasing the predicted 3D pose so its ' - 'lowest keypoint has a height of 0 (landing on the ground). Rebase ' - 'is useful for visualization when the model do not predict the ' - 'global position of the 3D pose.') - parser.add_argument( - '--num-instances', + "--num-instances", type=int, default=1, - help='The number of 3D poses to be visualized in every frame. If ' - 'less than 0, it will be set to the number of pose results in the ' - 'first frame.') - parser.add_argument( - '--radius', - type=int, - default=3, - help='Keypoint radius for visualization.') - parser.add_argument( - '--thickness', - type=int, - default=1, - help='Link thickness for visualization.') - parser.add_argument( - '--skeleton-style', - default='mmpose', - type=str, - choices=['mmpose', 'openpose'], - help='Skeleton style selection') - parser.add_argument( - '--black-background', - action='store_true', - help='Plot predictions on a black image') - parser.add_argument( - '--vis-out-dir', - type=str, - default='', - help='Directory for saving visualized results.') - parser.add_argument( - '--pred-out-dir', - type=str, - default='', - help='Directory for saving inference results.') - parser.add_argument( - '--show-alias', - action='store_true', - help='Display all the available model aliases.') + help="The number of 3D poses to be visualized in every frame. If " + "less than 0, it will be set to the number of pose results in the " + "first frame.", + ) + parser.add_argument("--radius", type=int, default=3, help="Keypoint radius for visualization.") + parser.add_argument("--thickness", type=int, default=1, help="Link thickness for visualization.") + parser.add_argument("--skeleton-style", default="mmpose", type=str, choices=["mmpose", "openpose"], help="Skeleton style selection") + parser.add_argument("--black-background", action="store_true", help="Plot predictions on a black image") + parser.add_argument("--vis-out-dir", type=str, default="", help="Directory for saving visualized results.") + parser.add_argument("--pred-out-dir", type=str, default="", help="Directory for saving inference results.") + parser.add_argument("--show-alias", action="store_true", help="Display all the available model aliases.") call_args = vars(parser.parse_args()) init_kws = [ - 'pose2d', 'pose2d_weights', 'scope', 'device', 'det_model', - 'det_weights', 'det_cat_ids', 'pose3d', 'pose3d_weights', - 'show_progress' + "pose2d", + "pose2d_weights", + "scope", + "device", + "det_model", + "det_weights", + "det_cat_ids", + "pose3d", + "pose3d_weights", + "show_progress", ] init_args = {} for init_kw in init_kws: init_args[init_kw] = call_args.pop(init_kw) - display_alias = call_args.pop('show_alias') + display_alias = call_args.pop("show_alias") return init_args, call_args, display_alias @@ -204,13 +143,13 @@ def display_model_aliases(model_aliases: Dict[str, str]) -> None: max_alias_length = max(map(len, aliases)) print(f'{"ALIAS".ljust(max_alias_length+2)}MODEL_NAME') for alias in sorted(aliases): - print(f'{alias.ljust(max_alias_length+2)}{model_aliases[alias]}') + print(f"{alias.ljust(max_alias_length+2)}{model_aliases[alias]}") def main(): init_args, call_args, display_alias = parse_args() if display_alias: - model_alises = get_model_aliases(init_args['scope']) + model_alises = get_model_aliases(init_args["scope"]) 
display_model_aliases(model_alises) else: inferencer = MMPoseInferencer(**init_args) @@ -218,5 +157,5 @@ def main(): pass -if __name__ == '__main__': +if __name__ == "__main__": main() diff --git a/mmpose/demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_1class.py b/mmpose/demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_1class.py index 0ccb78cfcab59b58839f8165dbf157b4d34721d2..63b4e5b02aa020042f817fc00350ead81cc9472a 100644 --- a/mmpose/demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_1class.py +++ b/mmpose/demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_1class.py @@ -1,270 +1,203 @@ +# Copyright (c) OpenMMLab. All rights reserved. # runtime settings -default_scope = 'mmdet' +default_scope = "mmdet" default_hooks = dict( - timer=dict(type='IterTimerHook'), - logger=dict(type='LoggerHook', interval=50), - param_scheduler=dict(type='ParamSchedulerHook'), - checkpoint=dict(type='CheckpointHook', interval=1), - sampler_seed=dict(type='DistSamplerSeedHook'), - visualization=dict(type='DetVisualizationHook')) + timer=dict(type="IterTimerHook"), + logger=dict(type="LoggerHook", interval=50), + param_scheduler=dict(type="ParamSchedulerHook"), + checkpoint=dict(type="CheckpointHook", interval=1), + sampler_seed=dict(type="DistSamplerSeedHook"), + visualization=dict(type="DetVisualizationHook"), +) env_cfg = dict( cudnn_benchmark=False, - mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), - dist_cfg=dict(backend='nccl'), + mp_cfg=dict(mp_start_method="fork", opencv_num_threads=0), + dist_cfg=dict(backend="nccl"), ) -vis_backends = [dict(type='LocalVisBackend')] -visualizer = dict( - type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer') -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True) +vis_backends = [dict(type="LocalVisBackend")] +visualizer = dict(type="DetLocalVisualizer", vis_backends=vis_backends, name="visualizer") +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=True) -log_level = 'INFO' +log_level = "INFO" load_from = None resume = False # model settings model = dict( - type='CascadeRCNN', + type="CascadeRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_mask=True, - pad_size_divisor=32), + pad_size_divisor=32, + ), backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="open-mmlab://resnext101_64x4d"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], 
strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0), + ), roi_head=dict( - type='CascadeRoIHead', + type="CascadeRoIHead", num_stages=3, stage_loss_weights=[1, 0.5, 0.25], bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=[ dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.05, 0.05, 0.1, 0.1]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.033, 0.033, 0.067, 0.067]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) - ]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), + ], + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', 
iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=[ dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False), + debug=False, + ), dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.6, - neg_iou_thr=0.6, - min_pos_iou=0.6, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False), + debug=False, + ), dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.7, - min_pos_iou=0.7, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.7, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False) - ]), + debug=False, + ), + ], + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100))) + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), + ), +) # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - 
batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox', - format_only=False) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/instances_val2017.json", metric="bbox", format_only=False) test_evaluator = val_evaluator diff --git a/mmpose/demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_coco.py b/mmpose/demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_coco.py index f91bd0d105b9394c514ffb82d54117dba347680a..87579ca402d9025360dad80d7172eaf5e58af1c2 100644 --- a/mmpose/demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_coco.py +++ b/mmpose/demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_coco.py @@ -1,256 +1,192 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
checkpoint_config = dict(interval=1) # yapf:disable log_config = dict( interval=50, hooks=[ - dict(type='TextLoggerHook'), + dict(type="TextLoggerHook"), # dict(type='TensorboardLoggerHook') - ]) + ], +) # yapf:enable -dist_params = dict(backend='nccl') -log_level = 'INFO' +dist_params = dict(backend="nccl") +log_level = "INFO" load_from = None resume_from = None -workflow = [('train', 1)] +workflow = [("train", 1)] # optimizer -optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) +optimizer = dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001) optimizer_config = dict(grad_clip=None) # learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=0.001, - step=[16, 19]) +lr_config = dict(policy="step", warmup="linear", warmup_iters=500, warmup_ratio=0.001, step=[16, 19]) total_epochs = 20 # model settings model = dict( - type='CascadeRCNN', - pretrained='open-mmlab://resnext101_64x4d', + type="CascadeRCNN", + pretrained="open-mmlab://resnext101_64x4d", backbone=dict( - type='ResNeXt', + type="ResNeXt", depth=101, groups=64, base_width=4, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + norm_cfg=dict(type="BN", requires_grad=True), + style="pytorch", + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0), + ), roi_head=dict( - type='CascadeRoIHead', + type="CascadeRoIHead", num_stages=3, stage_loss_weights=[1, 0.5, 0.25], bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=[ dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared2FCBBoxHead', + 
type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.05, 0.05, 0.1, 0.1]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, - loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.033, 0.033, 0.067, 0.067]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=True, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) - ]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=1.0, loss_weight=1.0), + ), + ], + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=2000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=2000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=[ dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False), + debug=False, + ), dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.6, - neg_iou_thr=0.6, - min_pos_iou=0.6, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False), + debug=False, + ), dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.7, - min_pos_iou=0.7, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - 
num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.7, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False) - ]), + debug=False, + ), + ], + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100))) + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), + ), +) -dataset_type = 'CocoDataset' -data_root = 'data/coco' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +dataset_type = "CocoDataset" +data_root = "data/coco" +img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", img_scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", flip_ratio=0.5), + dict(type="Normalize", **img_norm_cfg), + dict(type="Pad", size_divisor=32), + dict(type="DefaultFormatBundle"), + dict(type="Collect", keys=["img", "gt_bboxes", "gt_labels"]), ] test_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1333, 800), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img']), - ]) + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", **img_norm_cfg), + dict(type="Pad", size_divisor=32), + dict(type="DefaultFormatBundle"), + dict(type="Collect", keys=["img"]), + ], + ), ] data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( type=dataset_type, - ann_file=f'{data_root}/annotations/instances_train2017.json', - img_prefix=f'{data_root}/train2017/', - pipeline=train_pipeline), + ann_file=f"{data_root}/annotations/instances_train2017.json", + img_prefix=f"{data_root}/train2017/", + pipeline=train_pipeline, + ), val=dict( type=dataset_type, - ann_file=f'{data_root}/annotations/instances_val2017.json', - img_prefix=f'{data_root}/val2017/', - pipeline=test_pipeline), + ann_file=f"{data_root}/annotations/instances_val2017.json", + img_prefix=f"{data_root}/val2017/", + pipeline=test_pipeline, + ), test=dict( type=dataset_type, - ann_file=f'{data_root}/annotations/instances_val2017.json', - img_prefix=f'{data_root}/val2017/', - pipeline=test_pipeline)) -evaluation = dict(interval=1, metric='bbox') + ann_file=f"{data_root}/annotations/instances_val2017.json", + img_prefix=f"{data_root}/val2017/", + pipeline=test_pipeline, + ), +) +evaluation = dict(interval=1, metric="bbox") diff --git 
a/mmpose/demo/mmdetection_cfg/faster_rcnn_r50_fpn_1class.py b/mmpose/demo/mmdetection_cfg/faster_rcnn_r50_fpn_1class.py index ee54f5b66bd216c485db0a56a68bf2793428d123..9e4da5292cd50b7010413aeca1d2744c57e216b9 100644 --- a/mmpose/demo/mmdetection_cfg/faster_rcnn_r50_fpn_1class.py +++ b/mmpose/demo/mmdetection_cfg/faster_rcnn_r50_fpn_1class.py @@ -1,182 +1,147 @@ +# Copyright (c) OpenMMLab. All rights reserved. checkpoint_config = dict(interval=1) # yapf:disable log_config = dict( interval=50, hooks=[ - dict(type='TextLoggerHook'), + dict(type="TextLoggerHook"), # dict(type='TensorboardLoggerHook') - ]) + ], +) # yapf:enable -dist_params = dict(backend='nccl') -log_level = 'INFO' +dist_params = dict(backend="nccl") +log_level = "INFO" load_from = None resume_from = None -workflow = [('train', 1)] +workflow = [("train", 1)] # optimizer -optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) +optimizer = dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001) optimizer_config = dict(grad_clip=None) # learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=0.001, - step=[8, 11]) +lr_config = dict(policy="step", warmup="linear", warmup_iters=500, warmup_ratio=0.001, step=[8, 11]) total_epochs = 12 model = dict( - type='FasterRCNN', - pretrained='torchvision://resnet50', + type="FasterRCNN", + pretrained="torchvision://resnet50", backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, 
loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100) + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), # soft-nms is also supported for rcnn testing # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05) - )) + ), +) -dataset_type = 'CocoDataset' -data_root = 'data/coco' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +dataset_type = "CocoDataset" +data_root = "data/coco" +img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", img_scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", flip_ratio=0.5), + dict(type="Normalize", **img_norm_cfg), + dict(type="Pad", size_divisor=32), + dict(type="DefaultFormatBundle"), + dict(type="Collect", keys=["img", "gt_bboxes", "gt_labels"]), ] test_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1333, 800), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', 
size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img']), - ]) + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", **img_norm_cfg), + dict(type="Pad", size_divisor=32), + dict(type="DefaultFormatBundle"), + dict(type="Collect", keys=["img"]), + ], + ), ] data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( type=dataset_type, - ann_file=f'{data_root}/annotations/instances_train2017.json', - img_prefix=f'{data_root}/train2017/', - pipeline=train_pipeline), + ann_file=f"{data_root}/annotations/instances_train2017.json", + img_prefix=f"{data_root}/train2017/", + pipeline=train_pipeline, + ), val=dict( type=dataset_type, - ann_file=f'{data_root}/annotations/instances_val2017.json', - img_prefix=f'{data_root}/val2017/', - pipeline=test_pipeline), + ann_file=f"{data_root}/annotations/instances_val2017.json", + img_prefix=f"{data_root}/val2017/", + pipeline=test_pipeline, + ), test=dict( type=dataset_type, - ann_file=f'{data_root}/annotations/instances_val2017.json', - img_prefix=f'{data_root}/val2017/', - pipeline=test_pipeline)) -evaluation = dict(interval=1, metric='bbox') + ann_file=f"{data_root}/annotations/instances_val2017.json", + img_prefix=f"{data_root}/val2017/", + pipeline=test_pipeline, + ), +) +evaluation = dict(interval=1, metric="bbox") diff --git a/mmpose/demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py b/mmpose/demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py index 5bceed65ba1359995caf82f614d6d1a7b86da460..e65b30ec1771ae6d79e0f95c044d7fd49cba0b3a 100644 --- a/mmpose/demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py +++ b/mmpose/demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py @@ -1,196 +1,155 @@ +# Copyright (c) OpenMMLab. All rights reserved. # runtime settings -default_scope = 'mmdet' +default_scope = "mmdet" default_hooks = dict( - timer=dict(type='IterTimerHook'), - logger=dict(type='LoggerHook', interval=50), - param_scheduler=dict(type='ParamSchedulerHook'), - checkpoint=dict(type='CheckpointHook', interval=1), - sampler_seed=dict(type='DistSamplerSeedHook'), - visualization=dict(type='DetVisualizationHook')) + timer=dict(type="IterTimerHook"), + logger=dict(type="LoggerHook", interval=50), + param_scheduler=dict(type="ParamSchedulerHook"), + checkpoint=dict(type="CheckpointHook", interval=1), + sampler_seed=dict(type="DistSamplerSeedHook"), + visualization=dict(type="DetVisualizationHook"), +) env_cfg = dict( cudnn_benchmark=False, - mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), - dist_cfg=dict(backend='nccl'), + mp_cfg=dict(mp_start_method="fork", opencv_num_threads=0), + dist_cfg=dict(backend="nccl"), ) -vis_backends = [dict(type='LocalVisBackend')] -visualizer = dict( - type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer') -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True) +vis_backends = [dict(type="LocalVisBackend")] +visualizer = dict(type="DetLocalVisualizer", vis_backends=vis_backends, name="visualizer") +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=True) -log_level = 'INFO' +log_level = "INFO" load_from = None resume = False # model settings model = dict( - type='FasterRCNN', + type="FasterRCNN", data_preprocessor=dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=32), + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=32 + ), backbone=dict( - 
type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0., 0., 0., 0.], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), + ), # model training and testing settings train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - 
neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100) + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), # soft-nms is also supported for rcnn testing # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05) - )) + ), +) # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', prob=0.5), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", prob=0.5), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=(1333, 800), keep_ratio=True), + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=(1333, 800), keep_ratio=True), # If you don't have a gt annotation, delete the pipeline - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=2, num_workers=2, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), - batch_sampler=dict(type='AspectRatioBatchSampler'), + sampler=dict(type="DefaultSampler", shuffle=True), + batch_sampler=dict(type="AspectRatioBatchSampler"), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline)) + pipeline=train_pipeline, + ), +) val_dataloader = dict( batch_size=1, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader -val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'annotations/instances_val2017.json', - metric='bbox', - format_only=False) +val_evaluator = dict(type="CocoMetric", ann_file=data_root + "annotations/instances_val2017.json", metric="bbox", format_only=False) test_evaluator = val_evaluator diff --git 
a/mmpose/demo/mmdetection_cfg/mask_rcnn_r50_fpn_2x_coco.py b/mmpose/demo/mmdetection_cfg/mask_rcnn_r50_fpn_2x_coco.py index 05d39fa9a87a0200f9b9d29cd19acd28c155d126..02d80df2697e6ad811bfe2bce6c792cce6a3560b 100644 --- a/mmpose/demo/mmdetection_cfg/mask_rcnn_r50_fpn_2x_coco.py +++ b/mmpose/demo/mmdetection_cfg/mask_rcnn_r50_fpn_2x_coco.py @@ -1,242 +1,187 @@ +# Copyright (c) OpenMMLab. All rights reserved. model = dict( - type='MaskRCNN', + type="MaskRCNN", backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0.0, 0.0, 0.0, 0.0], - target_stds=[1.0, 1.0, 1.0, 1.0]), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0.0, 0.0, 0.0, 0.0], - target_stds=[0.1, 0.1, 0.2, 0.2]), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), - loss_bbox=dict(type='L1Loss', loss_weight=1.0)), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="L1Loss", loss_weight=1.0), + ), mask_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), mask_head=dict( - type='FCNMaskHead', + type="FCNMaskHead", num_convs=4, in_channels=256, conv_out_channels=256, num_classes=80, - loss_mask=dict( - type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), + loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0), + ), + ), 
train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100, - mask_thr_binary=0.5))) -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5), + ), +) +dataset_type = "CocoDataset" +data_root = "data/coco/" +img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']) + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="Resize", img_scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", flip_ratio=0.5), + dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", size_divisor=32), + dict(type="DefaultFormatBundle"), + dict(type="Collect", keys=["img", "gt_bboxes", "gt_labels", "gt_masks"]), ] test_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1333, 800), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - 
dict(type='Collect', keys=['img']) - ]) + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", size_divisor=32), + dict(type="ImageToTensor", keys=["img"]), + dict(type="Collect", keys=["img"]), + ], + ), ] data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( - type='CocoDataset', - ann_file='data/coco/annotations/instances_train2017.json', - img_prefix='data/coco/train2017/', + type="CocoDataset", + ann_file="data/coco/annotations/instances_train2017.json", + img_prefix="data/coco/train2017/", pipeline=[ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True, with_mask=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict( - type='Collect', - keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']) - ]), + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True, with_mask=True), + dict(type="Resize", img_scale=(1333, 800), keep_ratio=True), + dict(type="RandomFlip", flip_ratio=0.5), + dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", size_divisor=32), + dict(type="DefaultFormatBundle"), + dict(type="Collect", keys=["img", "gt_bboxes", "gt_labels", "gt_masks"]), + ], + ), val=dict( - type='CocoDataset', - ann_file='data/coco/annotations/instances_val2017.json', - img_prefix='data/coco/val2017/', + type="CocoDataset", + ann_file="data/coco/annotations/instances_val2017.json", + img_prefix="data/coco/val2017/", pipeline=[ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1333, 800), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']) - ]) - ]), + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", size_divisor=32), + dict(type="ImageToTensor", keys=["img"]), + dict(type="Collect", keys=["img"]), + ], + ), + ], + ), test=dict( - type='CocoDataset', - ann_file='data/coco/annotations/instances_val2017.json', - img_prefix='data/coco/val2017/', + type="CocoDataset", + ann_file="data/coco/annotations/instances_val2017.json", + img_prefix="data/coco/val2017/", pipeline=[ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1333, 800), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']) - ]) - ])) -evaluation = dict(metric=['bbox', 'segm']) -optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + 
dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", size_divisor=32), + dict(type="ImageToTensor", keys=["img"]), + dict(type="Collect", keys=["img"]), + ], + ), + ], + ), +) +evaluation = dict(metric=["bbox", "segm"]) +optimizer = dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001) optimizer_config = dict(grad_clip=None) -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=0.001, - step=[16, 22]) -runner = dict(type='EpochBasedRunner', max_epochs=24) +lr_config = dict(policy="step", warmup="linear", warmup_iters=500, warmup_ratio=0.001, step=[16, 22]) +runner = dict(type="EpochBasedRunner", max_epochs=24) checkpoint_config = dict(interval=1) -log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) -custom_hooks = [dict(type='NumClassCheckHook')] -dist_params = dict(backend='nccl') -log_level = 'INFO' +log_config = dict(interval=50, hooks=[dict(type="TextLoggerHook")]) +custom_hooks = [dict(type="NumClassCheckHook")] +dist_params = dict(backend="nccl") +log_level = "INFO" load_from = None resume_from = None -workflow = [('train', 1)] +workflow = [("train", 1)] diff --git a/mmpose/demo/mmdetection_cfg/rtmdet_m_640-8xb32_coco-person.py b/mmpose/demo/mmdetection_cfg/rtmdet_m_640-8xb32_coco-person.py index 620de8dc8f038f7267bc566e04afd8b647ba75da..b4e24778a1bf075347d31e2a517efc6bd764cfd0 100644 --- a/mmpose/demo/mmdetection_cfg/rtmdet_m_640-8xb32_coco-person.py +++ b/mmpose/demo/mmdetection_cfg/rtmdet_m_640-8xb32_coco-person.py @@ -1,20 +1,17 @@ -_base_ = 'mmdet::rtmdet/rtmdet_m_8xb32-300e_coco.py' +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = "mmdet::rtmdet/rtmdet_m_8xb32-300e_coco.py" -checkpoint = 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth' # noqa +checkpoint = ( + "https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-m_8xb256-rsb-a1-600e_in1k-ecb3bbd9.pth" # noqa +) model = dict( - backbone=dict( - init_cfg=dict( - type='Pretrained', prefix='backbone.', checkpoint=checkpoint)), + backbone=dict(init_cfg=dict(type="Pretrained", prefix="backbone.", checkpoint=checkpoint)), bbox_head=dict(num_classes=1), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) -train_dataloader = dict(dataset=dict(metainfo=dict(classes=('person', )))) +train_dataloader = dict(dataset=dict(metainfo=dict(classes=("person",)))) -val_dataloader = dict(dataset=dict(metainfo=dict(classes=('person', )))) +val_dataloader = dict(dataset=dict(metainfo=dict(classes=("person",)))) test_dataloader = val_dataloader diff --git a/mmpose/demo/mmdetection_cfg/rtmdet_m_8xb32-300e_coco.py b/mmpose/demo/mmdetection_cfg/rtmdet_m_8xb32-300e_coco.py index 6d0d3dfef15f96f7c0ea188998c304031fa8c828..4e4fdb7fb7d96f396eb3a84e611e2fbea4e92f02 100644 --- a/mmpose/demo/mmdetection_cfg/rtmdet_m_8xb32-300e_coco.py +++ b/mmpose/demo/mmdetection_cfg/rtmdet_m_8xb32-300e_coco.py @@ -1 +1,2 @@ -_base_ = 'mmdet::rtmdet/rtmdet_m_8xb32-300e_coco.py' +# Copyright (c) OpenMMLab. All rights reserved. 
+_base_ = "mmdet::rtmdet/rtmdet_m_8xb32-300e_coco.py"
diff --git a/mmpose/demo/mmdetection_cfg/rtmdet_nano_320-8xb32_coco-person.py b/mmpose/demo/mmdetection_cfg/rtmdet_nano_320-8xb32_coco-person.py
index c2f1b64e4acb31ec24396c8421492d7d2fdd7aab..e61b588a396e21c1f95efff9e95e2fd10607ab5b 100644
--- a/mmpose/demo/mmdetection_cfg/rtmdet_nano_320-8xb32_coco-person.py
+++ b/mmpose/demo/mmdetection_cfg/rtmdet_nano_320-8xb32_coco-person.py
@@ -1,4 +1,5 @@
-_base_ = 'mmdet::rtmdet/rtmdet_l_8xb32-300e_coco.py'
+# Copyright (c) OpenMMLab. All rights reserved.
+_base_ = "mmdet::rtmdet/rtmdet_l_8xb32-300e_coco.py"
 input_shape = 320
@@ -14,91 +15,46 @@ model = dict(
         num_csp_blocks=1,
         use_depthwise=True,
     ),
-    bbox_head=dict(
-        in_channels=64,
-        feat_channels=64,
-        share_conv=False,
-        exp_on_reg=False,
-        use_depthwise=True,
-        num_classes=1),
-    test_cfg=dict(
-        nms_pre=1000,
-        min_bbox_size=0,
-        score_thr=0.05,
-        nms=dict(type='nms', iou_threshold=0.6),
-        max_per_img=100))
+    bbox_head=dict(in_channels=64, feat_channels=64, share_conv=False, exp_on_reg=False, use_depthwise=True, num_classes=1),
+    test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100),
+)
 train_pipeline = [
-    dict(type='LoadImageFromFile'),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='CachedMosaic',
-        img_scale=(input_shape, input_shape),
-        pad_val=114.0,
-        max_cached_images=20,
-        random_pop=False),
-    dict(
-        type='RandomResize',
-        scale=(input_shape * 2, input_shape * 2),
-        ratio_range=(0.5, 1.5),
-        keep_ratio=True),
-    dict(type='RandomCrop', crop_size=(input_shape, input_shape)),
-    dict(type='YOLOXHSVRandomAug'),
-    dict(type='RandomFlip', prob=0.5),
-    dict(
-        type='Pad',
-        size=(input_shape, input_shape),
-        pad_val=dict(img=(114, 114, 114))),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile"),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="CachedMosaic", img_scale=(input_shape, input_shape), pad_val=114.0, max_cached_images=20, random_pop=False),
+    dict(type="RandomResize", scale=(input_shape * 2, input_shape * 2), ratio_range=(0.5, 1.5), keep_ratio=True),
+    dict(type="RandomCrop", crop_size=(input_shape, input_shape)),
+    dict(type="YOLOXHSVRandomAug"),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="Pad", size=(input_shape, input_shape), pad_val=dict(img=(114, 114, 114))),
+    dict(type="PackDetInputs"),
 ]
 train_pipeline_stage2 = [
-    dict(type='LoadImageFromFile'),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(
-        type='RandomResize',
-        scale=(input_shape, input_shape),
-        ratio_range=(0.5, 1.5),
-        keep_ratio=True),
-    dict(type='RandomCrop', crop_size=(input_shape, input_shape)),
-    dict(type='YOLOXHSVRandomAug'),
-    dict(type='RandomFlip', prob=0.5),
-    dict(
-        type='Pad',
-        size=(input_shape, input_shape),
-        pad_val=dict(img=(114, 114, 114))),
-    dict(type='PackDetInputs')
+    dict(type="LoadImageFromFile"),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="RandomResize", scale=(input_shape, input_shape), ratio_range=(0.5, 1.5), keep_ratio=True),
+    dict(type="RandomCrop", crop_size=(input_shape, input_shape)),
+    dict(type="YOLOXHSVRandomAug"),
+    dict(type="RandomFlip", prob=0.5),
+    dict(type="Pad", size=(input_shape, input_shape), pad_val=dict(img=(114, 114, 114))),
+    dict(type="PackDetInputs"),
 ]
 test_pipeline = [
-    dict(type='LoadImageFromFile'),
-    dict(type='Resize', scale=(input_shape, input_shape), keep_ratio=True),
-    dict(
-        type='Pad',
-        size=(input_shape, input_shape),
-        pad_val=dict(img=(114, 114, 114))),
-
dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=(input_shape, input_shape), keep_ratio=True), + dict(type="Pad", size=(input_shape, input_shape), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -train_dataloader = dict( - dataset=dict(pipeline=train_pipeline, metainfo=dict(classes=('person', )))) +train_dataloader = dict(dataset=dict(pipeline=train_pipeline, metainfo=dict(classes=("person",)))) -val_dataloader = dict( - dataset=dict(pipeline=test_pipeline, metainfo=dict(classes=('person', )))) +val_dataloader = dict(dataset=dict(pipeline=test_pipeline, metainfo=dict(classes=("person",)))) test_dataloader = val_dataloader custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=280, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=280, switch_pipeline=train_pipeline_stage2), ] diff --git a/mmpose/demo/mmdetection_cfg/rtmdet_nano_320-8xb32_hand.py b/mmpose/demo/mmdetection_cfg/rtmdet_nano_320-8xb32_hand.py index 278cc0bfe82670f89a566dc7e79b362b5a23a3d9..0c00ed6c38d0f6528304fdaac8c4d009343dd726 100644 --- a/mmpose/demo/mmdetection_cfg/rtmdet_nano_320-8xb32_hand.py +++ b/mmpose/demo/mmdetection_cfg/rtmdet_nano_320-8xb32_hand.py @@ -1,4 +1,5 @@ -_base_ = 'mmdet::rtmdet/rtmdet_l_8xb32-300e_coco.py' +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = "mmdet::rtmdet/rtmdet_l_8xb32-300e_coco.py" input_shape = 320 @@ -14,134 +15,111 @@ model = dict( num_csp_blocks=1, use_depthwise=True, ), - bbox_head=dict( - in_channels=64, - feat_channels=64, - share_conv=False, - exp_on_reg=False, - use_depthwise=True, - num_classes=1), - test_cfg=dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.6), - max_per_img=100)) + bbox_head=dict(in_channels=64, feat_channels=64, share_conv=False, exp_on_reg=False, use_depthwise=True, num_classes=1), + test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type="nms", iou_threshold=0.6), max_per_img=100), +) # file_client_args = dict( # backend='petrel', # path_mapping=dict({'data/': 's3://openmmlab/datasets/'})) train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='CachedMosaic', - img_scale=(input_shape, input_shape), - pad_val=114.0, - max_cached_images=20, - random_pop=False), - dict( - type='RandomResize', - scale=(input_shape * 2, input_shape * 2), - ratio_range=(0.5, 1.5), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(input_shape, input_shape)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict( - type='Pad', - size=(input_shape, input_shape), - pad_val=dict(img=(114, 114, 114))), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="CachedMosaic", img_scale=(input_shape, input_shape), pad_val=114.0, max_cached_images=20, random_pop=False), + dict(type="RandomResize", scale=(input_shape * 2, input_shape * 2), ratio_range=(0.5, 1.5), keep_ratio=True), + dict(type="RandomCrop", crop_size=(input_shape, input_shape)), + dict(type="YOLOXHSVRandomAug"), + 
dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(input_shape, input_shape), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs"), ] train_pipeline_stage2 = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='RandomResize', - scale=(input_shape, input_shape), - ratio_range=(0.5, 1.5), - keep_ratio=True), - dict(type='RandomCrop', crop_size=(input_shape, input_shape)), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict( - type='Pad', - size=(input_shape, input_shape), - pad_val=dict(img=(114, 114, 114))), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="RandomResize", scale=(input_shape, input_shape), ratio_range=(0.5, 1.5), keep_ratio=True), + dict(type="RandomCrop", crop_size=(input_shape, input_shape)), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Pad", size=(input_shape, input_shape), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=(input_shape, input_shape), keep_ratio=True), - dict( - type='Pad', - size=(input_shape, input_shape), - pad_val=dict(img=(114, 114, 114))), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=(input_shape, input_shape), keep_ratio=True), + dict(type="Pad", size=(input_shape, input_shape), pad_val=dict(img=(114, 114, 114))), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] -data_mode = 'topdown' -data_root = 'data/' +data_mode = "topdown" +data_root = "data/" train_dataset = dict( _delete_=True, - type='ConcatDataset', + type="ConcatDataset", datasets=[ dict( - type='mmpose.OneHand10KDataset', + type="mmpose.OneHand10KDataset", data_root=data_root, data_mode=data_mode, pipeline=train_pipeline, - ann_file='onehand10k/annotations/onehand10k_train.json', - data_prefix=dict(img='pose/OneHand10K/')), + ann_file="onehand10k/annotations/onehand10k_train.json", + data_prefix=dict(img="pose/OneHand10K/"), + ), dict( - type='mmpose.FreiHandDataset', + type="mmpose.FreiHandDataset", data_root=data_root, data_mode=data_mode, pipeline=train_pipeline, - ann_file='freihand/annotations/freihand_train.json', - data_prefix=dict(img='pose/FreiHand/')), + ann_file="freihand/annotations/freihand_train.json", + data_prefix=dict(img="pose/FreiHand/"), + ), dict( - type='mmpose.Rhd2DDataset', + type="mmpose.Rhd2DDataset", data_root=data_root, data_mode=data_mode, pipeline=train_pipeline, - ann_file='rhd/annotations/rhd_train.json', - data_prefix=dict(img='pose/RHD/')), + ann_file="rhd/annotations/rhd_train.json", + data_prefix=dict(img="pose/RHD/"), + ), dict( - type='mmpose.HalpeHandDataset', + type="mmpose.HalpeHandDataset", data_root=data_root, data_mode=data_mode, pipeline=train_pipeline, - ann_file='halpe/annotations/halpe_train_v1.json', - data_prefix=dict( - img='pose/Halpe/hico_20160224_det/images/train2015/') # noqa - ) + ann_file="halpe/annotations/halpe_train_v1.json", + data_prefix=dict(img="pose/Halpe/hico_20160224_det/images/train2015/"), # noqa + ), ], ignore_keys=[ - 'CLASSES', 'dataset_keypoint_weights', 'dataset_name', 'flip_indices', - 'flip_pairs', 'keypoint_colors', 'keypoint_id2name', - 'keypoint_name2id', 'lower_body_ids', 'num_keypoints', - 'num_skeleton_links', 'sigmas', 
'skeleton_link_colors', - 'skeleton_links', 'upper_body_ids' + "CLASSES", + "dataset_keypoint_weights", + "dataset_name", + "flip_indices", + "flip_pairs", + "keypoint_colors", + "keypoint_id2name", + "keypoint_name2id", + "lower_body_ids", + "num_keypoints", + "num_skeleton_links", + "sigmas", + "skeleton_link_colors", + "skeleton_links", + "upper_body_ids", ], ) test_dataset = dict( _delete_=True, - type='mmpose.OneHand10KDataset', + type="mmpose.OneHand10KDataset", data_root=data_root, data_mode=data_mode, pipeline=test_pipeline, - ann_file='onehand10k/annotations/onehand10k_test.json', - data_prefix=dict(img='pose/OneHand10K/'), + ann_file="onehand10k/annotations/onehand10k_test.json", + data_prefix=dict(img="pose/OneHand10K/"), ) train_dataloader = dict(dataset=train_dataset) @@ -149,23 +127,13 @@ val_dataloader = dict(dataset=test_dataset) test_dataloader = val_dataloader custom_hooks = [ - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0002, - update_buffers=True, - priority=49), - dict( - type='PipelineSwitchHook', - switch_epoch=280, - switch_pipeline=train_pipeline_stage2) + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0002, update_buffers=True, priority=49), + dict(type="PipelineSwitchHook", switch_epoch=280, switch_pipeline=train_pipeline_stage2), ] val_evaluator = dict( - type='CocoMetric', - ann_file=data_root + 'onehand10k/annotations/onehand10k_test.json', - metric='bbox', - format_only=False) + type="CocoMetric", ann_file=data_root + "onehand10k/annotations/onehand10k_test.json", metric="bbox", format_only=False +) test_evaluator = val_evaluator train_cfg = dict(val_interval=1) diff --git a/mmpose/demo/mmdetection_cfg/rtmdet_tiny_8xb32-300e_coco.py b/mmpose/demo/mmdetection_cfg/rtmdet_tiny_8xb32-300e_coco.py index db26ca83388163047fcd45bcaede7d839bdb58f8..a9b3377837d077e23543a16b769816dab6493663 100644 --- a/mmpose/demo/mmdetection_cfg/rtmdet_tiny_8xb32-300e_coco.py +++ b/mmpose/demo/mmdetection_cfg/rtmdet_tiny_8xb32-300e_coco.py @@ -1 +1,2 @@ -_base_ = 'mmdet::rtmdet/rtmdet_tiny_8xb32-300e_coco.py' +# Copyright (c) OpenMMLab. All rights reserved. +_base_ = "mmdet::rtmdet/rtmdet_tiny_8xb32-300e_coco.py" diff --git a/mmpose/demo/mmdetection_cfg/ssdlite_mobilenetv2-scratch_8xb24-600e_coco.py b/mmpose/demo/mmdetection_cfg/ssdlite_mobilenetv2-scratch_8xb24-600e_coco.py index 05c6e9659c7d80eea468624247b8f98d7ad5b428..7bd3b1564983817078afcc1e00102b7ac2550a60 100644 --- a/mmpose/demo/mmdetection_cfg/ssdlite_mobilenetv2-scratch_8xb24-600e_coco.py +++ b/mmpose/demo/mmdetection_cfg/ssdlite_mobilenetv2-scratch_8xb24-600e_coco.py @@ -1,109 +1,83 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# model settings data_preprocessor = dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1) + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 +) model = dict( - type='SingleStageDetector', + type="SingleStageDetector", data_preprocessor=data_preprocessor, backbone=dict( - type='MobileNetV2', + type="MobileNetV2", out_indices=(4, 7), - norm_cfg=dict(type='BN', eps=0.001, momentum=0.03), - init_cfg=dict(type='TruncNormal', layer='Conv2d', std=0.03)), + norm_cfg=dict(type="BN", eps=0.001, momentum=0.03), + init_cfg=dict(type="TruncNormal", layer="Conv2d", std=0.03), + ), neck=dict( - type='SSDNeck', + type="SSDNeck", in_channels=(96, 1280), out_channels=(96, 1280, 512, 256, 256, 128), level_strides=(2, 2, 2, 2), level_paddings=(1, 1, 1, 1), l2_norm_scale=None, use_depthwise=True, - norm_cfg=dict(type='BN', eps=0.001, momentum=0.03), - act_cfg=dict(type='ReLU6'), - init_cfg=dict(type='TruncNormal', layer='Conv2d', std=0.03)), + norm_cfg=dict(type="BN", eps=0.001, momentum=0.03), + act_cfg=dict(type="ReLU6"), + init_cfg=dict(type="TruncNormal", layer="Conv2d", std=0.03), + ), bbox_head=dict( - type='SSDHead', + type="SSDHead", in_channels=(96, 1280, 512, 256, 256, 128), num_classes=80, use_depthwise=True, - norm_cfg=dict(type='BN', eps=0.001, momentum=0.03), - act_cfg=dict(type='ReLU6'), - init_cfg=dict(type='Normal', layer='Conv2d', std=0.001), - + norm_cfg=dict(type="BN", eps=0.001, momentum=0.03), + act_cfg=dict(type="ReLU6"), + init_cfg=dict(type="Normal", layer="Conv2d", std=0.001), # set anchor size manually instead of using the predefined # SSD300 setting. anchor_generator=dict( - type='SSDAnchorGenerator', + type="SSDAnchorGenerator", scale_major=False, strides=[16, 32, 64, 107, 160, 320], ratios=[[2, 3], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]], min_sizes=[48, 100, 150, 202, 253, 304], - max_sizes=[100, 150, 202, 253, 304, 320]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2])), + max_sizes=[100, 150, 202, 253, 304, 320], + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + ), # model training and testing settings train_cfg=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0., - ignore_iof_thr=-1, - gt_max_assign_all=False), - sampler=dict(type='PseudoSampler'), - smoothl1_beta=1., + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.0, ignore_iof_thr=-1, gt_max_assign_all=False), + sampler=dict(type="PseudoSampler"), + smoothl1_beta=1.0, allowed_border=-1, pos_weight=-1, neg_pos_ratio=3, - debug=False), - test_cfg=dict( - nms_pre=1000, - nms=dict(type='nms', iou_threshold=0.45), - min_bbox_size=0, - score_thr=0.02, - max_per_img=200)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, nms=dict(type="nms", iou_threshold=0.45), min_bbox_size=0, score_thr=0.02, max_per_img=200), +) env_cfg = dict(cudnn_benchmark=True) # dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' +dataset_type = "CocoDataset" +data_root = "data/coco/" input_size = 320 train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='Expand', - mean=data_preprocessor['mean'], - to_rgb=data_preprocessor['bgr_to_rgb'], - ratio_range=(1, 4)), - dict( - 
type='MinIoURandomCrop', - min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), - min_crop_size=0.3), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='RandomFlip', prob=0.5), - dict( - type='PhotoMetricDistortion', - brightness_delta=32, - contrast_range=(0.5, 1.5), - saturation_range=(0.5, 1.5), - hue_delta=18), - dict(type='PackDetInputs') + dict(type="LoadImageFromFile"), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="Expand", mean=data_preprocessor["mean"], to_rgb=data_preprocessor["bgr_to_rgb"], ratio_range=(1, 4)), + dict(type="MinIoURandomCrop", min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="RandomFlip", prob=0.5), + dict(type="PhotoMetricDistortion", brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18), + dict(type="PackDetInputs"), ] test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=24, @@ -111,26 +85,31 @@ train_dataloader = dict( batch_sampler=None, dataset=dict( _delete_=True, - type='RepeatDataset', + type="RepeatDataset", times=5, dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), filter_cfg=dict(filter_empty_gt=True, min_size=32), - pipeline=train_pipeline))) + pipeline=train_pipeline, + ), + ), +) val_dataloader = dict( batch_size=8, num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( type=dataset_type, data_root=data_root, - ann_file='annotations/instances_val2017.json', - data_prefix=dict(img='val2017/'), + ann_file="annotations/instances_val2017.json", + data_prefix=dict(img="val2017/"), test_mode=True, - pipeline=test_pipeline)) + pipeline=test_pipeline, + ), +) test_dataloader = val_dataloader diff --git a/mmpose/demo/mmdetection_cfg/ssdlite_mobilenetv2_scratch_600e_onehand.py b/mmpose/demo/mmdetection_cfg/ssdlite_mobilenetv2_scratch_600e_onehand.py index ebdd2e719cb29263f0902ad627fc5742a92fca72..807b8838c455c53330d02df3fb88d87ec77de743 100644 --- a/mmpose/demo/mmdetection_cfg/ssdlite_mobilenetv2_scratch_600e_onehand.py +++ b/mmpose/demo/mmdetection_cfg/ssdlite_mobilenetv2_scratch_600e_onehand.py @@ -1,105 +1,92 @@ +# Copyright (c) OpenMMLab. All rights reserved. 
# ========================================================= # from 'mmdetection/configs/_base_/default_runtime.py' # ========================================================= -default_scope = 'mmdet' +default_scope = "mmdet" checkpoint_config = dict(interval=1) # yapf:disable log_config = dict( interval=50, hooks=[ - dict(type='TextLoggerHook'), + dict(type="TextLoggerHook"), # dict(type='TensorboardLoggerHook') - ]) + ], +) # yapf:enable -custom_hooks = [dict(type='NumClassCheckHook')] +custom_hooks = [dict(type="NumClassCheckHook")] # ========================================================= # model settings data_preprocessor = dict( - type='DetDataPreprocessor', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - bgr_to_rgb=True, - pad_size_divisor=1) + type="DetDataPreprocessor", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], bgr_to_rgb=True, pad_size_divisor=1 +) model = dict( - type='SingleStageDetector', + type="SingleStageDetector", data_preprocessor=data_preprocessor, backbone=dict( - type='MobileNetV2', + type="MobileNetV2", out_indices=(4, 7), - norm_cfg=dict(type='BN', eps=0.001, momentum=0.03), - init_cfg=dict(type='TruncNormal', layer='Conv2d', std=0.03)), + norm_cfg=dict(type="BN", eps=0.001, momentum=0.03), + init_cfg=dict(type="TruncNormal", layer="Conv2d", std=0.03), + ), neck=dict( - type='SSDNeck', + type="SSDNeck", in_channels=(96, 1280), out_channels=(96, 1280, 512, 256, 256, 128), level_strides=(2, 2, 2, 2), level_paddings=(1, 1, 1, 1), l2_norm_scale=None, use_depthwise=True, - norm_cfg=dict(type='BN', eps=0.001, momentum=0.03), - act_cfg=dict(type='ReLU6'), - init_cfg=dict(type='TruncNormal', layer='Conv2d', std=0.03)), + norm_cfg=dict(type="BN", eps=0.001, momentum=0.03), + act_cfg=dict(type="ReLU6"), + init_cfg=dict(type="TruncNormal", layer="Conv2d", std=0.03), + ), bbox_head=dict( - type='SSDHead', + type="SSDHead", in_channels=(96, 1280, 512, 256, 256, 128), num_classes=1, use_depthwise=True, - norm_cfg=dict(type='BN', eps=0.001, momentum=0.03), - act_cfg=dict(type='ReLU6'), - init_cfg=dict(type='Normal', layer='Conv2d', std=0.001), - + norm_cfg=dict(type="BN", eps=0.001, momentum=0.03), + act_cfg=dict(type="ReLU6"), + init_cfg=dict(type="Normal", layer="Conv2d", std=0.001), # set anchor size manually instead of using the predefined # SSD300 setting. 
anchor_generator=dict( - type='SSDAnchorGenerator', + type="SSDAnchorGenerator", scale_major=False, strides=[16, 32, 64, 107, 160, 320], ratios=[[2, 3], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]], min_sizes=[48, 100, 150, 202, 253, 304], - max_sizes=[100, 150, 202, 253, 304, 320]), - bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[.0, .0, .0, .0], - target_stds=[0.1, 0.1, 0.2, 0.2])), + max_sizes=[100, 150, 202, 253, 304, 320], + ), + bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), + ), # model training and testing settings train_cfg=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0., - ignore_iof_thr=-1, - gt_max_assign_all=False), - sampler=dict(type='PseudoSampler'), - smoothl1_beta=1., + assigner=dict(type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.0, ignore_iof_thr=-1, gt_max_assign_all=False), + sampler=dict(type="PseudoSampler"), + smoothl1_beta=1.0, allowed_border=-1, pos_weight=-1, neg_pos_ratio=3, - debug=False), - test_cfg=dict( - nms_pre=1000, - nms=dict(type='nms', iou_threshold=0.45), - min_bbox_size=0, - score_thr=0.02, - max_per_img=200)) + debug=False, + ), + test_cfg=dict(nms_pre=1000, nms=dict(type="nms", iou_threshold=0.45), min_bbox_size=0, score_thr=0.02, max_per_img=200), +) cudnn_benchmark = True # dataset settings -file_client_args = dict(backend='disk') +file_client_args = dict(backend="disk") -dataset_type = 'CocoDataset' -data_root = 'data/onehand10k/' -classes = ('hand', ) +dataset_type = "CocoDataset" +data_root = "data/onehand10k/" +classes = ("hand",) input_size = 320 test_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='Resize', scale=(input_size, input_size), keep_ratio=False), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile"), + dict(type="Resize", scale=(input_size, input_size), keep_ratio=False), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] val_dataloader = dict( @@ -107,35 +94,25 @@ val_dataloader = dict( num_workers=2, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( - type=dataset_type, - data_root=data_root, - ann_file='annotations/onehand10k_test.json', - test_mode=True, - pipeline=test_pipeline)) + type=dataset_type, data_root=data_root, ann_file="annotations/onehand10k_test.json", test_mode=True, pipeline=test_pipeline + ), +) test_dataloader = val_dataloader # optimizer -optimizer = dict(type='SGD', lr=0.015, momentum=0.9, weight_decay=4.0e-5) +optimizer = dict(type="SGD", lr=0.015, momentum=0.9, weight_decay=4.0e-5) optimizer_config = dict(grad_clip=None) # learning policy -lr_config = dict( - policy='CosineAnnealing', - warmup='linear', - warmup_iters=500, - warmup_ratio=0.001, - min_lr=0) -runner = dict(type='EpochBasedRunner', max_epochs=120) +lr_config = dict(policy="CosineAnnealing", warmup="linear", warmup_iters=500, warmup_ratio=0.001, min_lr=0) +runner = dict(type="EpochBasedRunner", max_epochs=120) # Avoid evaluation and saving weights too frequently -evaluation = dict(interval=5, metric='bbox') +evaluation = dict(interval=5, metric="bbox") checkpoint_config = dict(interval=5) -custom_hooks = [ - 
dict(type='NumClassCheckHook'),
-    dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW')
-]
+custom_hooks = [dict(type="NumClassCheckHook"), dict(type="CheckInvalidLossHook", interval=50, priority="VERY_LOW")]
 log_config = dict(interval=5)
@@ -144,10 +121,9 @@ log_config = dict(interval=5)
 # base_batch_size = (8 GPUs) x (24 samples per GPU)
 auto_scale_lr = dict(base_batch_size=192)
-load_from = 'https://download.openmmlab.com/mmdetection/'
-'v2.0/ssd/ssdlite_mobilenetv2_scratch_600e_coco/'
-'ssdlite_mobilenetv2_scratch_600e_coco_20210629_110627-974d9307.pth'
+# NOTE: parentheses make the three fragments concatenate into a single URL;
+# as three separate statements, only the first fragment would be assigned.
+load_from = (
+    "https://download.openmmlab.com/mmdetection/"
+    "v2.0/ssd/ssdlite_mobilenetv2_scratch_600e_coco/"
+    "ssdlite_mobilenetv2_scratch_600e_coco_20210629_110627-974d9307.pth"
+)
-vis_backends = [dict(type='LocalVisBackend')]
-visualizer = dict(
-    type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')
+vis_backends = [dict(type="LocalVisBackend")]
+visualizer = dict(type="DetLocalVisualizer", vis_backends=vis_backends, name="visualizer")
diff --git a/mmpose/demo/mmdetection_cfg/yolov3_d53_320_273e_coco.py b/mmpose/demo/mmdetection_cfg/yolov3_d53_320_273e_coco.py
index d7e9cca1eb34f9935a9eaf74b4cae18d1efaa248..bb82b9f013dcdd5a57994ed7e9ebc188b42dd615 100644
--- a/mmpose/demo/mmdetection_cfg/yolov3_d53_320_273e_coco.py
+++ b/mmpose/demo/mmdetection_cfg/yolov3_d53_320_273e_coco.py
@@ -1,140 +1,109 @@
+# Copyright (c) OpenMMLab. All rights reserved.
 # model settings
 model = dict(
-    type='YOLOV3',
-    pretrained='open-mmlab://darknet53',
-    backbone=dict(type='Darknet', depth=53, out_indices=(3, 4, 5)),
-    neck=dict(
-        type='YOLOV3Neck',
-        num_scales=3,
-        in_channels=[1024, 512, 256],
-        out_channels=[512, 256, 128]),
+    type="YOLOV3",
+    pretrained="open-mmlab://darknet53",
+    backbone=dict(type="Darknet", depth=53, out_indices=(3, 4, 5)),
+    neck=dict(type="YOLOV3Neck", num_scales=3, in_channels=[1024, 512, 256], out_channels=[512, 256, 128]),
     bbox_head=dict(
-        type='YOLOV3Head',
+        type="YOLOV3Head",
         num_classes=80,
         in_channels=[512, 256, 128],
         out_channels=[1024, 512, 256],
         anchor_generator=dict(
-            type='YOLOAnchorGenerator',
-            base_sizes=[[(116, 90), (156, 198), (373, 326)],
-                        [(30, 61), (62, 45), (59, 119)],
-                        [(10, 13), (16, 30), (33, 23)]],
-            strides=[32, 16, 8]),
-        bbox_coder=dict(type='YOLOBBoxCoder'),
+            type="YOLOAnchorGenerator",
+            base_sizes=[[(116, 90), (156, 198), (373, 326)], [(30, 61), (62, 45), (59, 119)], [(10, 13), (16, 30), (33, 23)]],
+            strides=[32, 16, 8],
+        ),
+        bbox_coder=dict(type="YOLOBBoxCoder"),
         featmap_strides=[32, 16, 8],
-        loss_cls=dict(
-            type='CrossEntropyLoss',
-            use_sigmoid=True,
-            loss_weight=1.0,
-            reduction='sum'),
-        loss_conf=dict(
-            type='CrossEntropyLoss',
-            use_sigmoid=True,
-            loss_weight=1.0,
-            reduction='sum'),
-        loss_xy=dict(
-            type='CrossEntropyLoss',
-            use_sigmoid=True,
-            loss_weight=2.0,
-            reduction='sum'),
-        loss_wh=dict(type='MSELoss', loss_weight=2.0, reduction='sum')),
+        loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0, reduction="sum"),
+        loss_conf=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0, reduction="sum"),
+        loss_xy=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=2.0, reduction="sum"),
+        loss_wh=dict(type="MSELoss", loss_weight=2.0, reduction="sum"),
+    ),
     # training and testing settings
-    train_cfg=dict(
-        assigner=dict(
-            type='GridAssigner',
-            pos_iou_thr=0.5,
-            neg_iou_thr=0.5,
-            min_pos_iou=0)),
-    test_cfg=dict(
-        nms_pre=1000,
-        min_bbox_size=0,
-        score_thr=0.05,
-        conf_thr=0.005,
-        nms=dict(type='nms', iou_threshold=0.45),
-        max_per_img=100))
+    train_cfg=dict(assigner=dict(type="GridAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0)),
+    test_cfg=dict(nms_pre=1000, min_bbox_size=0, score_thr=0.05, conf_thr=0.005, nms=dict(type="nms", iou_threshold=0.45), max_per_img=100),
+)
 # dataset settings
-dataset_type = 'CocoDataset'
-data_root = 'data/coco'
-img_norm_cfg = dict(mean=[0, 0, 0], std=[255., 255., 255.], to_rgb=True)
+dataset_type = "CocoDataset"
+data_root = "data/coco"
+img_norm_cfg = dict(mean=[0, 0, 0], std=[255.0, 255.0, 255.0], to_rgb=True)
 train_pipeline = [
-    dict(type='LoadImageFromFile', to_float32=True),
-    dict(type='LoadAnnotations', with_bbox=True),
-    dict(type='PhotoMetricDistortion'),
-    dict(
-        type='Expand',
-        mean=img_norm_cfg['mean'],
-        to_rgb=img_norm_cfg['to_rgb'],
-        ratio_range=(1, 2)),
-    dict(
-        type='MinIoURandomCrop',
-        min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
-        min_crop_size=0.3),
-    dict(type='Resize', img_scale=(320, 320), keep_ratio=True),
-    dict(type='RandomFlip', flip_ratio=0.5),
-    dict(type='Normalize', **img_norm_cfg),
-    dict(type='Pad', size_divisor=32),
-    dict(type='DefaultFormatBundle'),
-    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
+    dict(type="LoadImageFromFile", to_float32=True),
+    dict(type="LoadAnnotations", with_bbox=True),
+    dict(type="PhotoMetricDistortion"),
+    dict(type="Expand", mean=img_norm_cfg["mean"], to_rgb=img_norm_cfg["to_rgb"], ratio_range=(1, 2)),
+    dict(type="MinIoURandomCrop", min_ious=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9), min_crop_size=0.3),
+    dict(type="Resize", img_scale=(320, 320), keep_ratio=True),
+    dict(type="RandomFlip", flip_ratio=0.5),
+    dict(type="Normalize", **img_norm_cfg),
+    dict(type="Pad", size_divisor=32),
+    dict(type="DefaultFormatBundle"),
+    dict(type="Collect", keys=["img", "gt_bboxes", "gt_labels"]),
 ]
 test_pipeline = [
-    dict(type='LoadImageFromFile'),
+    dict(type="LoadImageFromFile"),
     dict(
-        type='MultiScaleFlipAug',
+        type="MultiScaleFlipAug",
         img_scale=(320, 320),
         flip=False,
         transforms=[
-            dict(type='Resize', keep_ratio=True),
-            dict(type='RandomFlip'),
-            dict(type='Normalize', **img_norm_cfg),
-            dict(type='Pad', size_divisor=32),
-            dict(type='DefaultFormatBundle'),
-            dict(type='Collect', keys=['img'])
-        ])
+            dict(type="Resize", keep_ratio=True),
+            dict(type="RandomFlip"),
+            dict(type="Normalize", **img_norm_cfg),
+            dict(type="Pad", size_divisor=32),
+            dict(type="DefaultFormatBundle"),
+            dict(type="Collect", keys=["img"]),
+        ],
+    ),
 ]
 data = dict(
     samples_per_gpu=8,
     workers_per_gpu=4,
     train=dict(
         type=dataset_type,
-        ann_file=f'{data_root}/annotations/instances_train2017.json',
-        img_prefix=f'{data_root}/train2017/',
-        pipeline=train_pipeline),
+        ann_file=f"{data_root}/annotations/instances_train2017.json",
+        img_prefix=f"{data_root}/train2017/",
+        pipeline=train_pipeline,
+    ),
     val=dict(
         type=dataset_type,
-        ann_file=f'{data_root}/annotations/instances_val2017.json',
-        img_prefix=f'{data_root}/val2017/',
-        pipeline=test_pipeline),
+        ann_file=f"{data_root}/annotations/instances_val2017.json",
+        img_prefix=f"{data_root}/val2017/",
+        pipeline=test_pipeline,
+    ),
     test=dict(
         type=dataset_type,
-        ann_file=f'{data_root}/annotations/instances_val2017.json',
-        img_prefix=f'{data_root}/val2017/',
-        pipeline=test_pipeline))
+        ann_file=f"{data_root}/annotations/instances_val2017.json",
+        img_prefix=f"{data_root}/val2017/",
+        pipeline=test_pipeline,
+    ),
+)
 # optimizer
-optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0005)
+optimizer = dict(type="SGD", lr=0.001, momentum=0.9, weight_decay=0.0005)
 optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
 # learning policy
-lr_config = dict(
-    policy='step',
-    warmup='linear',
-    warmup_iters=2000,  # same as burn-in in darknet
-    warmup_ratio=0.1,
-    step=[218, 246])
+# warmup_iters=2000 is the same as burn-in in darknet
+lr_config = dict(policy="step", warmup="linear", warmup_iters=2000, warmup_ratio=0.1, step=[218, 246])
 # runtime settings
-runner = dict(type='EpochBasedRunner', max_epochs=273)
-evaluation = dict(interval=1, metric=['bbox'])
+runner = dict(type="EpochBasedRunner", max_epochs=273)
+evaluation = dict(interval=1, metric=["bbox"])
 checkpoint_config = dict(interval=1)
 # yapf:disable
 log_config = dict(
     interval=50,
     hooks=[
-        dict(type='TextLoggerHook'),
+        dict(type="TextLoggerHook"),
         # dict(type='TensorboardLoggerHook')
-    ])
+    ],
+)
 # yapf:enable
-custom_hooks = [dict(type='NumClassCheckHook')]
+custom_hooks = [dict(type="NumClassCheckHook")]
-dist_params = dict(backend='nccl')
-log_level = 'INFO'
+dist_params = dict(backend="nccl")
+log_level = "INFO"
 load_from = None
 resume_from = None
-workflow = [('train', 1)]
+workflow = [("train", 1)]
diff --git a/mmpose/demo/mmdetection_cfg/yolox-s_8xb8-300e_coco-face.py b/mmpose/demo/mmdetection_cfg/yolox-s_8xb8-300e_coco-face.py
index 16f891304ac8d6242a3e054fb18c60a9cb4a237c..fcf8fe0b6017463d59a046c188e93c2f0aa2a6f7 100644
--- a/mmpose/demo/mmdetection_cfg/yolox-s_8xb8-300e_coco-face.py
+++ b/mmpose/demo/mmdetection_cfg/yolox-s_8xb8-300e_coco-face.py
@@ -1,300 +1,208 @@
-train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=300, val_interval=10)
-val_cfg = dict(type='ValLoop')
-test_cfg = dict(type='TestLoop')
+# Copyright (c) OpenMMLab. All rights reserved.
+train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=300, val_interval=10)
+val_cfg = dict(type="ValLoop")
+test_cfg = dict(type="TestLoop")
 param_scheduler = [
-    dict(
-        type='mmdet.QuadraticWarmupLR',
-        by_epoch=True,
-        begin=0,
-        end=5,
-        convert_to_iter_based=True),
-    dict(
-        type='CosineAnnealingLR',
-        eta_min=0.0005,
-        begin=5,
-        T_max=285,
-        end=285,
-        by_epoch=True,
-        convert_to_iter_based=True),
-    dict(type='ConstantLR', by_epoch=True, factor=1, begin=285, end=300)
+    dict(type="mmdet.QuadraticWarmupLR", by_epoch=True, begin=0, end=5, convert_to_iter_based=True),
+    dict(type="CosineAnnealingLR", eta_min=0.0005, begin=5, T_max=285, end=285, by_epoch=True, convert_to_iter_based=True),
+    dict(type="ConstantLR", by_epoch=True, factor=1, begin=285, end=300),
 ]
 optim_wrapper = dict(
-    type='OptimWrapper',
-    optimizer=dict(
-        type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005, nesterov=True),
-    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0))
+    type="OptimWrapper",
+    optimizer=dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=0.0005, nesterov=True),
+    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0),
+)
 auto_scale_lr = dict(enable=False, base_batch_size=64)
-default_scope = 'mmdet'
+default_scope = "mmdet"
 default_hooks = dict(
-    timer=dict(type='IterTimerHook'),
-    logger=dict(type='LoggerHook', interval=50),
-    param_scheduler=dict(type='ParamSchedulerHook'),
-    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=3),
-    sampler_seed=dict(type='DistSamplerSeedHook'),
-    visualization=dict(type='DetVisualizationHook'))
-env_cfg = dict(
-    cudnn_benchmark=False,
-    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
-    dist_cfg=dict(backend='nccl'))
-vis_backends = [dict(type='LocalVisBackend')]
-visualizer = dict(
-    type='DetLocalVisualizer',
-
vis_backends=[dict(type='LocalVisBackend')], - name='visualizer') -log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True) -log_level = 'INFO' -load_from = 'https://download.openmmlab.com/mmdetection/' \ - 'v2.0/yolox/yolox_s_8x8_300e_coco/' \ - 'yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth' + timer=dict(type="IterTimerHook"), + logger=dict(type="LoggerHook", interval=50), + param_scheduler=dict(type="ParamSchedulerHook"), + checkpoint=dict(type="CheckpointHook", interval=10, max_keep_ckpts=3), + sampler_seed=dict(type="DistSamplerSeedHook"), + visualization=dict(type="DetVisualizationHook"), +) +env_cfg = dict(cudnn_benchmark=False, mp_cfg=dict(mp_start_method="fork", opencv_num_threads=0), dist_cfg=dict(backend="nccl")) +vis_backends = [dict(type="LocalVisBackend")] +visualizer = dict(type="DetLocalVisualizer", vis_backends=[dict(type="LocalVisBackend")], name="visualizer") +log_processor = dict(type="LogProcessor", window_size=50, by_epoch=True) +log_level = "INFO" +load_from = ( + "https://download.openmmlab.com/mmdetection/" "v2.0/yolox/yolox_s_8x8_300e_coco/" "yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth" +) resume = False img_scale = (640, 640) model = dict( - type='YOLOX', + type="YOLOX", data_preprocessor=dict( - type='DetDataPreprocessor', + type="DetDataPreprocessor", pad_size_divisor=32, - batch_augments=[ - dict( - type='BatchSyncRandomResize', - random_size_range=(480, 800), - size_divisor=32, - interval=10) - ]), + batch_augments=[dict(type="BatchSyncRandomResize", random_size_range=(480, 800), size_divisor=32, interval=10)], + ), backbone=dict( - type='CSPDarknet', + type="CSPDarknet", deepen_factor=0.33, widen_factor=0.5, out_indices=(2, 3, 4), use_depthwise=False, spp_kernal_sizes=(5, 9, 13), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), neck=dict( - type='YOLOXPAFPN', + type="YOLOXPAFPN", in_channels=[128, 256, 512], out_channels=128, num_csp_blocks=1, use_depthwise=False, - upsample_cfg=dict(scale_factor=2, mode='nearest'), - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')), + upsample_cfg=dict(scale_factor=2, mode="nearest"), + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ), bbox_head=dict( - type='YOLOXHead', + type="YOLOXHead", num_classes=1, in_channels=128, feat_channels=128, stacked_convs=2, strides=(8, 16, 32), use_depthwise=False, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - reduction='sum', - loss_weight=1.0), - loss_bbox=dict( - type='IoULoss', - mode='square', - eps=1e-16, - reduction='sum', - loss_weight=5.0), - loss_obj=dict( - type='CrossEntropyLoss', - use_sigmoid=True, - reduction='sum', - loss_weight=1.0), - loss_l1=dict(type='L1Loss', reduction='sum', loss_weight=1.0)), - train_cfg=dict(assigner=dict(type='SimOTAAssigner', center_radius=2.5)), - test_cfg=dict(score_thr=0.01, nms=dict(type='nms', iou_threshold=0.65))) -data_root = 'data/coco/' -dataset_type = 'CocoDataset' -backend_args = dict(backend='local') + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, reduction="sum", loss_weight=1.0), + loss_bbox=dict(type="IoULoss", mode="square", eps=1e-16, reduction="sum", loss_weight=5.0), + loss_obj=dict(type="CrossEntropyLoss", 
use_sigmoid=True, reduction="sum", loss_weight=1.0), + loss_l1=dict(type="L1Loss", reduction="sum", loss_weight=1.0), + ), + train_cfg=dict(assigner=dict(type="SimOTAAssigner", center_radius=2.5)), + test_cfg=dict(score_thr=0.01, nms=dict(type="nms", iou_threshold=0.65)), +) +data_root = "data/coco/" +dataset_type = "CocoDataset" +backend_args = dict(backend="local") train_pipeline = [ - dict(type='Mosaic', img_scale=(640, 640), pad_val=114.0), - dict( - type='RandomAffine', scaling_ratio_range=(0.1, 2), - border=(-320, -320)), - dict( - type='MixUp', - img_scale=(640, 640), - ratio_range=(0.8, 1.6), - pad_val=114.0), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Resize', scale=(640, 640), keep_ratio=True), - dict( - type='Pad', - pad_to_square=True, - pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False), - dict(type='PackDetInputs') + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0), + dict(type="RandomAffine", scaling_ratio_range=(0.1, 2), border=(-320, -320)), + dict(type="MixUp", img_scale=(640, 640), ratio_range=(0.8, 1.6), pad_val=114.0), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Resize", scale=(640, 640), keep_ratio=True), + dict(type="Pad", pad_to_square=True, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1), keep_empty=False), + dict(type="PackDetInputs"), ] train_dataset = dict( - type='MultiImageMixDataset', + type="MultiImageMixDataset", dataset=dict( - type='CocoDataset', - data_root='data/coco/', - ann_file='annotations/instances_train2017.json', - data_prefix=dict(img='train2017/'), - pipeline=[ - dict(type='LoadImageFromFile', backend_args=dict(backend='local')), - dict(type='LoadAnnotations', with_bbox=True) - ], - filter_cfg=dict(filter_empty_gt=False, min_size=32)), + type="CocoDataset", + data_root="data/coco/", + ann_file="annotations/instances_train2017.json", + data_prefix=dict(img="train2017/"), + pipeline=[dict(type="LoadImageFromFile", backend_args=dict(backend="local")), dict(type="LoadAnnotations", with_bbox=True)], + filter_cfg=dict(filter_empty_gt=False, min_size=32), + ), pipeline=[ - dict(type='Mosaic', img_scale=(640, 640), pad_val=114.0), - dict( - type='RandomAffine', - scaling_ratio_range=(0.1, 2), - border=(-320, -320)), - dict( - type='MixUp', - img_scale=(640, 640), - ratio_range=(0.8, 1.6), - pad_val=114.0), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Resize', scale=(640, 640), keep_ratio=True), - dict( - type='Pad', - pad_to_square=True, - pad_val=dict(img=(114.0, 114.0, 114.0))), - dict( - type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False), - dict(type='PackDetInputs') - ]) + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0), + dict(type="RandomAffine", scaling_ratio_range=(0.1, 2), border=(-320, -320)), + dict(type="MixUp", img_scale=(640, 640), ratio_range=(0.8, 1.6), pad_val=114.0), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Resize", scale=(640, 640), keep_ratio=True), + dict(type="Pad", pad_to_square=True, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1), keep_empty=False), + dict(type="PackDetInputs"), + ], +) test_pipeline = [ - dict(type='LoadImageFromFile', backend_args=dict(backend='local')), - dict(type='Resize', scale=(640, 640), keep_ratio=True), - dict( - type='Pad', - pad_to_square=True, - 
pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=dict(backend="local")), + dict(type="Resize", scale=(640, 640), keep_ratio=True), + dict(type="Pad", pad_to_square=True, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ] train_dataloader = dict( batch_size=8, num_workers=4, persistent_workers=True, - sampler=dict(type='DefaultSampler', shuffle=True), + sampler=dict(type="DefaultSampler", shuffle=True), dataset=dict( - type='MultiImageMixDataset', + type="MultiImageMixDataset", dataset=dict( - type='CocoDataset', - data_root='data/coco/', - ann_file='annotations/coco_face_train.json', - data_prefix=dict(img='train2017/'), - pipeline=[ - dict( - type='LoadImageFromFile', - backend_args=dict(backend='local')), - dict(type='LoadAnnotations', with_bbox=True) - ], + type="CocoDataset", + data_root="data/coco/", + ann_file="annotations/coco_face_train.json", + data_prefix=dict(img="train2017/"), + pipeline=[dict(type="LoadImageFromFile", backend_args=dict(backend="local")), dict(type="LoadAnnotations", with_bbox=True)], filter_cfg=dict(filter_empty_gt=False, min_size=32), - metainfo=dict(CLASSES=('person', ), PALETTE=(220, 20, 60))), + metainfo=dict(CLASSES=("person",), PALETTE=(220, 20, 60)), + ), pipeline=[ - dict(type='Mosaic', img_scale=(640, 640), pad_val=114.0), - dict( - type='RandomAffine', - scaling_ratio_range=(0.1, 2), - border=(-320, -320)), - dict( - type='MixUp', - img_scale=(640, 640), - ratio_range=(0.8, 1.6), - pad_val=114.0), - dict(type='YOLOXHSVRandomAug'), - dict(type='RandomFlip', prob=0.5), - dict(type='Resize', scale=(640, 640), keep_ratio=True), - dict( - type='Pad', - pad_to_square=True, - pad_val=dict(img=(114.0, 114.0, 114.0))), - dict( - type='FilterAnnotations', - min_gt_bbox_wh=(1, 1), - keep_empty=False), - dict(type='PackDetInputs') - ])) + dict(type="Mosaic", img_scale=(640, 640), pad_val=114.0), + dict(type="RandomAffine", scaling_ratio_range=(0.1, 2), border=(-320, -320)), + dict(type="MixUp", img_scale=(640, 640), ratio_range=(0.8, 1.6), pad_val=114.0), + dict(type="YOLOXHSVRandomAug"), + dict(type="RandomFlip", prob=0.5), + dict(type="Resize", scale=(640, 640), keep_ratio=True), + dict(type="Pad", pad_to_square=True, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="FilterAnnotations", min_gt_bbox_wh=(1, 1), keep_empty=False), + dict(type="PackDetInputs"), + ], + ), +) val_dataloader = dict( batch_size=8, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( - type='CocoDataset', - data_root='data/coco/', - ann_file='annotations/coco_face_val.json', - data_prefix=dict(img='val2017/'), + type="CocoDataset", + data_root="data/coco/", + ann_file="annotations/coco_face_val.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=[ - dict(type='LoadImageFromFile', backend_args=dict(backend='local')), - dict(type='Resize', scale=(640, 640), keep_ratio=True), - dict( - type='Pad', - pad_to_square=True, - pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 
'scale_factor')) + dict(type="LoadImageFromFile", backend_args=dict(backend="local")), + dict(type="Resize", scale=(640, 640), keep_ratio=True), + dict(type="Pad", pad_to_square=True, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ], - metainfo=dict(CLASSES=('person', ), PALETTE=(220, 20, 60)))) + metainfo=dict(CLASSES=("person",), PALETTE=(220, 20, 60)), + ), +) test_dataloader = dict( batch_size=8, num_workers=4, persistent_workers=True, drop_last=False, - sampler=dict(type='DefaultSampler', shuffle=False), + sampler=dict(type="DefaultSampler", shuffle=False), dataset=dict( - type='CocoDataset', - data_root='data/coco/', - ann_file='annotations/coco_face_val.json', - data_prefix=dict(img='val2017/'), + type="CocoDataset", + data_root="data/coco/", + ann_file="annotations/coco_face_val.json", + data_prefix=dict(img="val2017/"), test_mode=True, pipeline=[ - dict(type='LoadImageFromFile', backend_args=dict(backend='local')), - dict(type='Resize', scale=(640, 640), keep_ratio=True), - dict( - type='Pad', - pad_to_square=True, - pad_val=dict(img=(114.0, 114.0, 114.0))), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='PackDetInputs', - meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', - 'scale_factor')) + dict(type="LoadImageFromFile", backend_args=dict(backend="local")), + dict(type="Resize", scale=(640, 640), keep_ratio=True), + dict(type="Pad", pad_to_square=True, pad_val=dict(img=(114.0, 114.0, 114.0))), + dict(type="LoadAnnotations", with_bbox=True), + dict(type="PackDetInputs", meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor")), ], - metainfo=dict(CLASSES=('person', ), PALETTE=(220, 20, 60)))) -val_evaluator = dict( - type='CocoMetric', - ann_file='data/coco/annotations/coco_face_val.json', - metric='bbox') -test_evaluator = dict( - type='CocoMetric', - ann_file='data/coco/annotations/instances_val2017.json', - metric='bbox') + metainfo=dict(CLASSES=("person",), PALETTE=(220, 20, 60)), + ), +) +val_evaluator = dict(type="CocoMetric", ann_file="data/coco/annotations/coco_face_val.json", metric="bbox") +test_evaluator = dict(type="CocoMetric", ann_file="data/coco/annotations/instances_val2017.json", metric="bbox") max_epochs = 300 num_last_epochs = 15 interval = 10 base_lr = 0.01 custom_hooks = [ - dict(type='YOLOXModeSwitchHook', num_last_epochs=15, priority=48), - dict(type='SyncNormHook', priority=48), - dict( - type='EMAHook', - ema_type='ExpMomentumEMA', - momentum=0.0001, - strict_load=False, - update_buffers=True, - priority=49) + dict(type="YOLOXModeSwitchHook", num_last_epochs=15, priority=48), + dict(type="SyncNormHook", priority=48), + dict(type="EMAHook", ema_type="ExpMomentumEMA", momentum=0.0001, strict_load=False, update_buffers=True, priority=49), ] -metainfo = dict(CLASSES=('person', ), PALETTE=(220, 20, 60)) -launcher = 'pytorch' +metainfo = dict(CLASSES=("person",), PALETTE=(220, 20, 60)) +launcher = "pytorch" diff --git a/mmpose/demo/mmtracking_cfg/deepsort_faster-rcnn_fpn_4e_mot17-private-half.py b/mmpose/demo/mmtracking_cfg/deepsort_faster-rcnn_fpn_4e_mot17-private-half.py index 1d7fccf0cbe9929618274218274726eb28577273..a5ac742436c905f55c41ff6d367e484b7e5a84f1 100644 --- a/mmpose/demo/mmtracking_cfg/deepsort_faster-rcnn_fpn_4e_mot17-private-half.py +++ b/mmpose/demo/mmtracking_cfg/deepsort_faster-rcnn_fpn_4e_mot17-private-half.py @@ -1,321 +1,225 @@ +# Copyright (c) 
OpenMMLab. All rights reserved. model = dict( detector=dict( - type='FasterRCNN', + type="FasterRCNN", backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch', - init_cfg=dict( - type='Pretrained', checkpoint='torchvision://resnet50')), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet50"), + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0.0, 0.0, 0.0, 0.0], - target_stds=[1.0, 1.0, 1.0, 1.0], - clip_border=False), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict( - type='SmoothL1Loss', beta=0.1111111111111111, - loss_weight=1.0)), + type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0], clip_border=False + ), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=0.1111111111111111, loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict( - type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0.0, 0.0, 0.0, 0.0], - target_stds=[0.1, 0.1, 0.2, 0.2], - clip_border=False), + type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2], clip_border=False + ), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", loss_weight=1.0), + ), + ), train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), 
rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100)), + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), + ), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmtracking/' - 'mot/faster_rcnn/faster-rcnn_r50_fpn_4e_mot17-half-64ee2ed4.pth')), - type='DeepSORT', - motion=dict(type='KalmanFilter', center_only=False), + type="Pretrained", + checkpoint="https://download.openmmlab.com/mmtracking/" "mot/faster_rcnn/faster-rcnn_r50_fpn_4e_mot17-half-64ee2ed4.pth", + ), + ), + type="DeepSORT", + motion=dict(type="KalmanFilter", center_only=False), reid=dict( - type='BaseReID', - backbone=dict( - type='ResNet', - depth=50, - num_stages=4, - out_indices=(3, ), - style='pytorch'), - neck=dict(type='GlobalAveragePooling', kernel_size=(8, 4), stride=1), + type="BaseReID", + backbone=dict(type="ResNet", depth=50, num_stages=4, out_indices=(3,), style="pytorch"), + neck=dict(type="GlobalAveragePooling", kernel_size=(8, 4), stride=1), head=dict( - type='LinearReIDHead', + type="LinearReIDHead", num_fcs=1, in_channels=2048, fc_channels=1024, out_channels=128, num_classes=380, - loss=dict(type='CrossEntropyLoss', loss_weight=1.0), - loss_pairwise=dict( - type='TripletLoss', margin=0.3, loss_weight=1.0), - norm_cfg=dict(type='BN1d'), - act_cfg=dict(type='ReLU')), + loss=dict(type="CrossEntropyLoss", loss_weight=1.0), + loss_pairwise=dict(type="TripletLoss", margin=0.3, loss_weight=1.0), + norm_cfg=dict(type="BN1d"), + act_cfg=dict(type="ReLU"), + ), init_cfg=dict( - type='Pretrained', - checkpoint='https://download.openmmlab.com/mmtracking/' - 'mot/reid/tracktor_reid_r50_iter25245-a452f51f.pth')), + type="Pretrained", checkpoint="https://download.openmmlab.com/mmtracking/" "mot/reid/tracktor_reid_r50_iter25245-a452f51f.pth" + ), + ), tracker=dict( - type='SortTracker', + type="SortTracker", obj_score_thr=0.5, - reid=dict( - num_samples=10, - img_scale=(256, 128), - img_norm_cfg=None, - match_score_thr=2.0), + reid=dict(num_samples=10, img_scale=(256, 128), img_norm_cfg=None, match_score_thr=2.0), match_iou_thr=0.5, momentums=None, num_tentatives=2, - num_frames_retain=100)) -dataset_type = 'MOTChallengeDataset' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) + num_frames_retain=100, + ), +) +dataset_type = "MOTChallengeDataset" +img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ - dict(type='LoadMultiImagesFromFile', to_float32=True), - dict(type='SeqLoadAnnotations', with_bbox=True, with_track=True), - dict( - type='SeqResize', - img_scale=(1088, 1088), - share_params=True, - ratio_range=(0.8, 1.2), - 
keep_ratio=True, - bbox_clip_border=False), - dict(type='SeqPhotoMetricDistortion', share_params=True), - dict( - type='SeqRandomCrop', - share_params=False, - crop_size=(1088, 1088), - bbox_clip_border=False), - dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5), - dict( - type='SeqNormalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='SeqPad', size_divisor=32), - dict(type='MatchInstances', skip_nomatch=True), - dict( - type='VideoCollect', - keys=[ - 'img', 'gt_bboxes', 'gt_labels', 'gt_match_indices', - 'gt_instance_ids' - ]), - dict(type='SeqDefaultFormatBundle', ref_prefix='ref') + dict(type="LoadMultiImagesFromFile", to_float32=True), + dict(type="SeqLoadAnnotations", with_bbox=True, with_track=True), + dict(type="SeqResize", img_scale=(1088, 1088), share_params=True, ratio_range=(0.8, 1.2), keep_ratio=True, bbox_clip_border=False), + dict(type="SeqPhotoMetricDistortion", share_params=True), + dict(type="SeqRandomCrop", share_params=False, crop_size=(1088, 1088), bbox_clip_border=False), + dict(type="SeqRandomFlip", share_params=True, flip_ratio=0.5), + dict(type="SeqNormalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="SeqPad", size_divisor=32), + dict(type="MatchInstances", skip_nomatch=True), + dict(type="VideoCollect", keys=["img", "gt_bboxes", "gt_labels", "gt_match_indices", "gt_instance_ids"]), + dict(type="SeqDefaultFormatBundle", ref_prefix="ref"), ] test_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1088, 1088), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='VideoCollect', keys=['img']) - ]) + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", size_divisor=32), + dict(type="ImageToTensor", keys=["img"]), + dict(type="VideoCollect", keys=["img"]), + ], + ), ] -data_root = 'data/MOT17/' +data_root = "data/MOT17/" data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( - type='MOTChallengeDataset', + type="MOTChallengeDataset", visibility_thr=-1, - ann_file='data/MOT17/annotations/half-train_cocoformat.json', - img_prefix='data/MOT17/train', - ref_img_sampler=dict( - num_ref_imgs=1, - frame_range=10, - filter_key_img=True, - method='uniform'), + ann_file="data/MOT17/annotations/half-train_cocoformat.json", + img_prefix="data/MOT17/train", + ref_img_sampler=dict(num_ref_imgs=1, frame_range=10, filter_key_img=True, method="uniform"), pipeline=[ - dict(type='LoadMultiImagesFromFile', to_float32=True), - dict(type='SeqLoadAnnotations', with_bbox=True, with_track=True), - dict( - type='SeqResize', - img_scale=(1088, 1088), - share_params=True, - ratio_range=(0.8, 1.2), - keep_ratio=True, - bbox_clip_border=False), - dict(type='SeqPhotoMetricDistortion', share_params=True), - dict( - type='SeqRandomCrop', - share_params=False, - crop_size=(1088, 1088), - bbox_clip_border=False), - dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5), - dict( - type='SeqNormalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='SeqPad', 
size_divisor=32), - dict(type='MatchInstances', skip_nomatch=True), + dict(type="LoadMultiImagesFromFile", to_float32=True), + dict(type="SeqLoadAnnotations", with_bbox=True, with_track=True), dict( - type='VideoCollect', - keys=[ - 'img', 'gt_bboxes', 'gt_labels', 'gt_match_indices', - 'gt_instance_ids' - ]), - dict(type='SeqDefaultFormatBundle', ref_prefix='ref') - ]), + type="SeqResize", img_scale=(1088, 1088), share_params=True, ratio_range=(0.8, 1.2), keep_ratio=True, bbox_clip_border=False + ), + dict(type="SeqPhotoMetricDistortion", share_params=True), + dict(type="SeqRandomCrop", share_params=False, crop_size=(1088, 1088), bbox_clip_border=False), + dict(type="SeqRandomFlip", share_params=True, flip_ratio=0.5), + dict(type="SeqNormalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="SeqPad", size_divisor=32), + dict(type="MatchInstances", skip_nomatch=True), + dict(type="VideoCollect", keys=["img", "gt_bboxes", "gt_labels", "gt_match_indices", "gt_instance_ids"]), + dict(type="SeqDefaultFormatBundle", ref_prefix="ref"), + ], + ), val=dict( - type='MOTChallengeDataset', - ann_file='data/MOT17/annotations/half-val_cocoformat.json', - img_prefix='data/MOT17/train', + type="MOTChallengeDataset", + ann_file="data/MOT17/annotations/half-val_cocoformat.json", + img_prefix="data/MOT17/train", ref_img_sampler=None, pipeline=[ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1088, 1088), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='VideoCollect', keys=['img']) - ]) - ]), + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", size_divisor=32), + dict(type="ImageToTensor", keys=["img"]), + dict(type="VideoCollect", keys=["img"]), + ], + ), + ], + ), test=dict( - type='MOTChallengeDataset', - ann_file='data/MOT17/annotations/half-val_cocoformat.json', - img_prefix='data/MOT17/train', + type="MOTChallengeDataset", + ann_file="data/MOT17/annotations/half-val_cocoformat.json", + img_prefix="data/MOT17/train", ref_img_sampler=None, pipeline=[ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1088, 1088), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='VideoCollect', keys=['img']) - ]) - ])) -optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", size_divisor=32), + dict(type="ImageToTensor", keys=["img"]), + dict(type="VideoCollect", keys=["img"]), + ], + ), + ], + ), +) +optimizer = dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001) optimizer_config = dict(grad_clip=None) checkpoint_config = dict(interval=1) -log_config = dict(interval=50, 
hooks=[dict(type='TextLoggerHook')]) -dist_params = dict(backend='nccl') -log_level = 'INFO' +log_config = dict(interval=50, hooks=[dict(type="TextLoggerHook")]) +dist_params = dict(backend="nccl") +log_level = "INFO" load_from = None resume_from = None -workflow = [('train', 1)] -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=100, - warmup_ratio=0.01, - step=[3]) +workflow = [("train", 1)] +lr_config = dict(policy="step", warmup="linear", warmup_iters=100, warmup_ratio=0.01, step=[3]) total_epochs = 4 -evaluation = dict(metric=['bbox', 'track'], interval=1) -search_metrics = ['MOTA', 'IDF1', 'FN', 'FP', 'IDs', 'MT', 'ML'] +evaluation = dict(metric=["bbox", "track"], interval=1) +search_metrics = ["MOTA", "IDF1", "FN", "FP", "IDs", "MT", "ML"] diff --git a/mmpose/demo/mmtracking_cfg/tracktor_faster-rcnn_r50_fpn_4e_mot17-private.py b/mmpose/demo/mmtracking_cfg/tracktor_faster-rcnn_r50_fpn_4e_mot17-private.py index 9736269bd9ca1f950eadaa7a4933656db3130ca8..19d42206e44631c372be03e0693c54d053c8b4d7 100644 --- a/mmpose/demo/mmtracking_cfg/tracktor_faster-rcnn_r50_fpn_4e_mot17-private.py +++ b/mmpose/demo/mmtracking_cfg/tracktor_faster-rcnn_r50_fpn_4e_mot17-private.py @@ -1,325 +1,222 @@ +# Copyright (c) OpenMMLab. All rights reserved. model = dict( detector=dict( - type='FasterRCNN', - pretrained='torchvision://resnet50', + type="FasterRCNN", + pretrained="torchvision://resnet50", backbone=dict( - type='ResNet', + type="ResNet", depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, - norm_cfg=dict(type='BN', requires_grad=True), + norm_cfg=dict(type="BN", requires_grad=True), norm_eval=True, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - num_outs=5), + style="pytorch", + ), + neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5), rpn_head=dict( - type='RPNHead', + type="RPNHead", in_channels=256, feat_channels=256, - anchor_generator=dict( - type='AnchorGenerator', - scales=[8], - ratios=[0.5, 1.0, 2.0], - strides=[4, 8, 16, 32, 64]), + anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0.0, 0.0, 0.0, 0.0], - target_stds=[1.0, 1.0, 1.0, 1.0], - clip_border=False), - loss_cls=dict( - type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), - loss_bbox=dict( - type='SmoothL1Loss', beta=0.1111111111111111, - loss_weight=1.0)), + type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0], clip_border=False + ), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", beta=0.1111111111111111, loss_weight=1.0), + ), roi_head=dict( - type='StandardRoIHead', + type="StandardRoIHead", bbox_roi_extractor=dict( - type='SingleRoIExtractor', - roi_layer=dict( - type='RoIAlign', output_size=7, sampling_ratio=0), + type="SingleRoIExtractor", + roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0), out_channels=256, - featmap_strides=[4, 8, 16, 32]), + featmap_strides=[4, 8, 16, 32], + ), bbox_head=dict( - type='Shared2FCBBoxHead', + type="Shared2FCBBoxHead", in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=1, bbox_coder=dict( - type='DeltaXYWHBBoxCoder', - target_means=[0.0, 0.0, 0.0, 0.0], - target_stds=[0.1, 0.1, 0.2, 0.2], - clip_border=False), + type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 
0.2], clip_border=False + ), reg_class_agnostic=False, - loss_cls=dict( - type='CrossEntropyLoss', - use_sigmoid=False, - loss_weight=1.0), - loss_bbox=dict(type='SmoothL1Loss', loss_weight=1.0))), + loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0), + loss_bbox=dict(type="SmoothL1Loss", loss_weight=1.0), + ), + ), train_cfg=dict( rpn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.7, - neg_iou_thr=0.3, - min_pos_iou=0.3, - match_low_quality=True, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=256, - pos_fraction=0.5, - neg_pos_ub=-1, - add_gt_as_proposals=False), + type="MaxIoUAssigner", pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, - debug=False), - rpn_proposal=dict( - nms_pre=2000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), + debug=False, + ), + rpn_proposal=dict(nms_pre=2000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.5, - min_pos_iou=0.5, - match_low_quality=False, - ignore_iof_thr=-1), - sampler=dict( - type='RandomSampler', - num=512, - pos_fraction=0.25, - neg_pos_ub=-1, - add_gt_as_proposals=True), + type="MaxIoUAssigner", pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1 + ), + sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, - debug=False)), + debug=False, + ), + ), test_cfg=dict( - rpn=dict( - nms_pre=1000, - max_per_img=1000, - nms=dict(type='nms', iou_threshold=0.7), - min_bbox_size=0), - rcnn=dict( - score_thr=0.05, - nms=dict(type='nms', iou_threshold=0.5), - max_per_img=100))), - type='Tracktor', + rpn=dict(nms_pre=1000, max_per_img=1000, nms=dict(type="nms", iou_threshold=0.7), min_bbox_size=0), + rcnn=dict(score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5), max_per_img=100), + ), + ), + type="Tracktor", pretrains=dict( - detector='https://download.openmmlab.com/mmtracking/' - 'mot/faster_rcnn/faster-rcnn_r50_fpn_4e_mot17-ffa52ae7.pth', - reid='https://download.openmmlab.com/mmtracking/mot/' - 'reid/reid_r50_6e_mot17-4bf6b63d.pth'), + detector="https://download.openmmlab.com/mmtracking/" "mot/faster_rcnn/faster-rcnn_r50_fpn_4e_mot17-ffa52ae7.pth", + reid="https://download.openmmlab.com/mmtracking/mot/" "reid/reid_r50_6e_mot17-4bf6b63d.pth", + ), reid=dict( - type='BaseReID', - backbone=dict( - type='ResNet', - depth=50, - num_stages=4, - out_indices=(3, ), - style='pytorch'), - neck=dict(type='GlobalAveragePooling', kernel_size=(8, 4), stride=1), + type="BaseReID", + backbone=dict(type="ResNet", depth=50, num_stages=4, out_indices=(3,), style="pytorch"), + neck=dict(type="GlobalAveragePooling", kernel_size=(8, 4), stride=1), head=dict( - type='LinearReIDHead', + type="LinearReIDHead", num_fcs=1, in_channels=2048, fc_channels=1024, out_channels=128, num_classes=378, - loss=dict(type='CrossEntropyLoss', loss_weight=1.0), - loss_pairwise=dict( - type='TripletLoss', margin=0.3, loss_weight=1.0), - norm_cfg=dict(type='BN1d'), - act_cfg=dict(type='ReLU'))), - motion=dict( - type='CameraMotionCompensation', - warp_mode='cv2.MOTION_EUCLIDEAN', - num_iters=100, - stop_eps=1e-05), + loss=dict(type="CrossEntropyLoss", loss_weight=1.0), + 
loss_pairwise=dict(type="TripletLoss", margin=0.3, loss_weight=1.0), + norm_cfg=dict(type="BN1d"), + act_cfg=dict(type="ReLU"), + ), + ), + motion=dict(type="CameraMotionCompensation", warp_mode="cv2.MOTION_EUCLIDEAN", num_iters=100, stop_eps=1e-05), tracker=dict( - type='TracktorTracker', + type="TracktorTracker", obj_score_thr=0.5, - regression=dict( - obj_score_thr=0.5, - nms=dict(type='nms', iou_threshold=0.6), - match_iou_thr=0.3), - reid=dict( - num_samples=10, - img_scale=(256, 128), - img_norm_cfg=None, - match_score_thr=2.0, - match_iou_thr=0.2), + regression=dict(obj_score_thr=0.5, nms=dict(type="nms", iou_threshold=0.6), match_iou_thr=0.3), + reid=dict(num_samples=10, img_scale=(256, 128), img_norm_cfg=None, match_score_thr=2.0, match_iou_thr=0.2), momentums=None, - num_frames_retain=10)) -dataset_type = 'MOTChallengeDataset' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) + num_frames_retain=10, + ), +) +dataset_type = "MOTChallengeDataset" +img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ - dict(type='LoadMultiImagesFromFile', to_float32=True), - dict(type='SeqLoadAnnotations', with_bbox=True, with_track=True), - dict( - type='SeqResize', - img_scale=(1088, 1088), - share_params=True, - ratio_range=(0.8, 1.2), - keep_ratio=True, - bbox_clip_border=False), - dict(type='SeqPhotoMetricDistortion', share_params=True), - dict( - type='SeqRandomCrop', - share_params=False, - crop_size=(1088, 1088), - bbox_clip_border=False), - dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5), - dict( - type='SeqNormalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='SeqPad', size_divisor=32), - dict(type='MatchInstances', skip_nomatch=True), - dict( - type='VideoCollect', - keys=[ - 'img', 'gt_bboxes', 'gt_labels', 'gt_match_indices', - 'gt_instance_ids' - ]), - dict(type='SeqDefaultFormatBundle', ref_prefix='ref') + dict(type="LoadMultiImagesFromFile", to_float32=True), + dict(type="SeqLoadAnnotations", with_bbox=True, with_track=True), + dict(type="SeqResize", img_scale=(1088, 1088), share_params=True, ratio_range=(0.8, 1.2), keep_ratio=True, bbox_clip_border=False), + dict(type="SeqPhotoMetricDistortion", share_params=True), + dict(type="SeqRandomCrop", share_params=False, crop_size=(1088, 1088), bbox_clip_border=False), + dict(type="SeqRandomFlip", share_params=True, flip_ratio=0.5), + dict(type="SeqNormalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="SeqPad", size_divisor=32), + dict(type="MatchInstances", skip_nomatch=True), + dict(type="VideoCollect", keys=["img", "gt_bboxes", "gt_labels", "gt_match_indices", "gt_instance_ids"]), + dict(type="SeqDefaultFormatBundle", ref_prefix="ref"), ] test_pipeline = [ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1088, 1088), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='VideoCollect', keys=['img']) - ]) + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", 
size_divisor=32), + dict(type="ImageToTensor", keys=["img"]), + dict(type="VideoCollect", keys=["img"]), + ], + ), ] -data_root = 'data/MOT17/' +data_root = "data/MOT17/" data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( - type='MOTChallengeDataset', + type="MOTChallengeDataset", visibility_thr=-1, - ann_file='data/MOT17/annotations/train_cocoformat.json', - img_prefix='data/MOT17/train', - ref_img_sampler=dict( - num_ref_imgs=1, - frame_range=10, - filter_key_img=True, - method='uniform'), + ann_file="data/MOT17/annotations/train_cocoformat.json", + img_prefix="data/MOT17/train", + ref_img_sampler=dict(num_ref_imgs=1, frame_range=10, filter_key_img=True, method="uniform"), pipeline=[ - dict(type='LoadMultiImagesFromFile', to_float32=True), - dict(type='SeqLoadAnnotations', with_bbox=True, with_track=True), - dict( - type='SeqResize', - img_scale=(1088, 1088), - share_params=True, - ratio_range=(0.8, 1.2), - keep_ratio=True, - bbox_clip_border=False), - dict(type='SeqPhotoMetricDistortion', share_params=True), - dict( - type='SeqRandomCrop', - share_params=False, - crop_size=(1088, 1088), - bbox_clip_border=False), - dict(type='SeqRandomFlip', share_params=True, flip_ratio=0.5), - dict( - type='SeqNormalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='SeqPad', size_divisor=32), - dict(type='MatchInstances', skip_nomatch=True), + dict(type="LoadMultiImagesFromFile", to_float32=True), + dict(type="SeqLoadAnnotations", with_bbox=True, with_track=True), dict( - type='VideoCollect', - keys=[ - 'img', 'gt_bboxes', 'gt_labels', 'gt_match_indices', - 'gt_instance_ids' - ]), - dict(type='SeqDefaultFormatBundle', ref_prefix='ref') - ]), + type="SeqResize", img_scale=(1088, 1088), share_params=True, ratio_range=(0.8, 1.2), keep_ratio=True, bbox_clip_border=False + ), + dict(type="SeqPhotoMetricDistortion", share_params=True), + dict(type="SeqRandomCrop", share_params=False, crop_size=(1088, 1088), bbox_clip_border=False), + dict(type="SeqRandomFlip", share_params=True, flip_ratio=0.5), + dict(type="SeqNormalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="SeqPad", size_divisor=32), + dict(type="MatchInstances", skip_nomatch=True), + dict(type="VideoCollect", keys=["img", "gt_bboxes", "gt_labels", "gt_match_indices", "gt_instance_ids"]), + dict(type="SeqDefaultFormatBundle", ref_prefix="ref"), + ], + ), val=dict( - type='MOTChallengeDataset', - ann_file='data/MOT17/annotations/train_cocoformat.json', - img_prefix='data/MOT17/train', + type="MOTChallengeDataset", + ann_file="data/MOT17/annotations/train_cocoformat.json", + img_prefix="data/MOT17/train", ref_img_sampler=None, pipeline=[ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1088, 1088), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='VideoCollect', keys=['img']) - ]) - ]), + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", size_divisor=32), + dict(type="ImageToTensor", keys=["img"]), + dict(type="VideoCollect", keys=["img"]), + ], + ), + ], + ), test=dict( - 
type='MOTChallengeDataset', - ann_file='data/MOT17/annotations/train_cocoformat.json', - img_prefix='data/MOT17/train', + type="MOTChallengeDataset", + ann_file="data/MOT17/annotations/train_cocoformat.json", + img_prefix="data/MOT17/train", ref_img_sampler=None, pipeline=[ - dict(type='LoadImageFromFile'), + dict(type="LoadImageFromFile"), dict( - type='MultiScaleFlipAug', + type="MultiScaleFlipAug", img_scale=(1088, 1088), flip=False, transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict( - type='Normalize', - mean=[123.675, 116.28, 103.53], - std=[58.395, 57.12, 57.375], - to_rgb=True), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='VideoCollect', keys=['img']) - ]) - ])) -optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) + dict(type="Resize", keep_ratio=True), + dict(type="RandomFlip"), + dict(type="Normalize", mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), + dict(type="Pad", size_divisor=32), + dict(type="ImageToTensor", keys=["img"]), + dict(type="VideoCollect", keys=["img"]), + ], + ), + ], + ), +) +optimizer = dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001) optimizer_config = dict(grad_clip=None) checkpoint_config = dict(interval=1) -log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) -dist_params = dict(backend='nccl') -log_level = 'INFO' +log_config = dict(interval=50, hooks=[dict(type="TextLoggerHook")]) +dist_params = dict(backend="nccl") +log_level = "INFO" load_from = None resume_from = None -workflow = [('train', 1)] -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=100, - warmup_ratio=0.01, - step=[3]) +workflow = [("train", 1)] +lr_config = dict(policy="step", warmup="linear", warmup_iters=100, warmup_ratio=0.01, step=[3]) total_epochs = 4 -evaluation = dict(metric=['bbox', 'track'], interval=1) -search_metrics = ['MOTA', 'IDF1', 'FN', 'FP', 'IDs', 'MT', 'ML'] -test_set = 'train' +evaluation = dict(metric=["bbox", "track"], interval=1) +search_metrics = ["MOTA", "IDF1", "FN", "FP", "IDs", "MT", "ML"] +test_set = "train" diff --git a/mmpose/demo/topdown_demo_with_mmdet.py b/mmpose/demo/topdown_demo_with_mmdet.py index 4e39c362076b27b3dab23536894f2ce616989938..50e4847e6eeeb028771e29fa761653f7415eb416 100644 --- a/mmpose/demo/topdown_demo_with_mmdet.py +++ b/mmpose/demo/topdown_demo_with_mmdet.py @@ -12,8 +12,7 @@ import mmengine import numpy as np from mmengine.logging import print_log -from mmpose.apis import inference_topdown -from mmpose.apis import init_model as init_pose_estimator +from mmpose.apis import inference_topdown, init_model as init_pose_estimator from mmpose.evaluation.functional import nms from mmpose.registry import VISUALIZERS from mmpose.structures import merge_data_samples, split_instances @@ -21,26 +20,20 @@ from mmpose.utils import adapt_mmdet_pipeline try: from mmdet.apis import inference_detector, init_detector + has_mmdet = True except (ImportError, ModuleNotFoundError): has_mmdet = False -def process_one_image(args, - img, - detector, - pose_estimator, - visualizer=None, - show_interval=0): +def process_one_image(args, img, detector, pose_estimator, visualizer=None, show_interval=0): """Visualize predicted keypoints (and heatmaps) of one image.""" # predict bbox det_result = inference_detector(detector, img) pred_instance = det_result.pred_instances.cpu().numpy() - bboxes = np.concatenate( - (pred_instance.bboxes, pred_instance.scores[:, None]), axis=1) - bboxes = 
bboxes[np.logical_and(pred_instance.labels == args.det_cat_id, - pred_instance.scores > args.bbox_thr)] + bboxes = np.concatenate((pred_instance.bboxes, pred_instance.scores[:, None]), axis=1) + bboxes = bboxes[np.logical_and(pred_instance.labels == args.det_cat_id, pred_instance.scores > args.bbox_thr)] bboxes = bboxes[nms(bboxes, args.nms_thr), :4] # predict keypoints @@ -49,13 +42,13 @@ def process_one_image(args, # show the results if isinstance(img, str): - img = mmcv.imread(img, channel_order='rgb') + img = mmcv.imread(img, channel_order="rgb") elif isinstance(img, np.ndarray): img = mmcv.bgr2rgb(img) if visualizer is not None: visualizer.add_datasample( - 'result', + "result", img, data_sample=data_samples, draw_gt=False, @@ -65,10 +58,11 @@ def process_one_image(args, skeleton_style=args.skeleton_style, show=args.show, wait_time=show_interval, - kpt_thr=args.kpt_thr) + kpt_thr=args.kpt_thr, + ) # if there is no instance detected, return None - return data_samples.get('pred_instances', None) + return data_samples.get("pred_instances", None) def main(): @@ -77,108 +71,52 @@ def main(): Using mmdet to detect the human. """ parser = ArgumentParser() - parser.add_argument('det_config', help='Config file for detection') - parser.add_argument('det_checkpoint', help='Checkpoint file for detection') - parser.add_argument('pose_config', help='Config file for pose') - parser.add_argument('pose_checkpoint', help='Checkpoint file for pose') - parser.add_argument( - '--input', type=str, default='', help='Image/Video file') - parser.add_argument( - '--show', - action='store_true', - default=False, - help='whether to show img') - parser.add_argument( - '--output-root', - type=str, - default='', - help='root of the output img file. ' - 'Default not saving the visualization images.') - parser.add_argument( - '--save-predictions', - action='store_true', - default=False, - help='whether to save predicted results') - parser.add_argument( - '--device', default='cuda:0', help='Device used for inference') - parser.add_argument( - '--det-cat-id', - type=int, - default=0, - help='Category id for bounding box detection model') - parser.add_argument( - '--bbox-thr', - type=float, - default=0.3, - help='Bounding box score threshold') - parser.add_argument( - '--nms-thr', - type=float, - default=0.3, - help='IoU threshold for bounding box NMS') - parser.add_argument( - '--kpt-thr', - type=float, - default=0.3, - help='Visualizing keypoint thresholds') - parser.add_argument( - '--draw-heatmap', - action='store_true', - default=False, - help='Draw heatmap predicted by the model') - parser.add_argument( - '--show-kpt-idx', - action='store_true', - default=False, - help='Whether to show the index of keypoints') - parser.add_argument( - '--skeleton-style', - default='mmpose', - type=str, - choices=['mmpose', 'openpose'], - help='Skeleton style selection') - parser.add_argument( - '--radius', - type=int, - default=3, - help='Keypoint radius for visualization') - parser.add_argument( - '--thickness', - type=int, - default=1, - help='Link thickness for visualization') - parser.add_argument( - '--show-interval', type=int, default=0, help='Sleep seconds per frame') - parser.add_argument( - '--alpha', type=float, default=0.8, help='The transparency of bboxes') - parser.add_argument( - '--draw-bbox', action='store_true', help='Draw bboxes of instances') - - assert has_mmdet, 'Please install mmdet to run the demo.' 
+ parser.add_argument("det_config", help="Config file for detection") + parser.add_argument("det_checkpoint", help="Checkpoint file for detection") + parser.add_argument("pose_config", help="Config file for pose") + parser.add_argument("pose_checkpoint", help="Checkpoint file for pose") + parser.add_argument("--input", type=str, default="", help="Image/Video file") + parser.add_argument("--show", action="store_true", default=False, help="whether to show img") + parser.add_argument( + "--output-root", type=str, default="", help="root of the output img file. " "Default not saving the visualization images." + ) + parser.add_argument("--save-predictions", action="store_true", default=False, help="whether to save predicted results") + parser.add_argument("--device", default="cuda:0", help="Device used for inference") + parser.add_argument("--det-cat-id", type=int, default=0, help="Category id for bounding box detection model") + parser.add_argument("--bbox-thr", type=float, default=0.3, help="Bounding box score threshold") + parser.add_argument("--nms-thr", type=float, default=0.3, help="IoU threshold for bounding box NMS") + parser.add_argument("--kpt-thr", type=float, default=0.3, help="Visualizing keypoint thresholds") + parser.add_argument("--draw-heatmap", action="store_true", default=False, help="Draw heatmap predicted by the model") + parser.add_argument("--show-kpt-idx", action="store_true", default=False, help="Whether to show the index of keypoints") + parser.add_argument("--skeleton-style", default="mmpose", type=str, choices=["mmpose", "openpose"], help="Skeleton style selection") + parser.add_argument("--radius", type=int, default=3, help="Keypoint radius for visualization") + parser.add_argument("--thickness", type=int, default=1, help="Link thickness for visualization") + parser.add_argument("--show-interval", type=int, default=0, help="Sleep seconds per frame") + parser.add_argument("--alpha", type=float, default=0.8, help="The transparency of bboxes") + parser.add_argument("--draw-bbox", action="store_true", help="Draw bboxes of instances") + + assert has_mmdet, "Please install mmdet to run the demo." 
args = parser.parse_args() - assert args.show or (args.output_root != '') - assert args.input != '' + assert args.show or (args.output_root != "") + assert args.input != "" assert args.det_config is not None assert args.det_checkpoint is not None output_file = None if args.output_root: mmengine.mkdir_or_exist(args.output_root) - output_file = os.path.join(args.output_root, - os.path.basename(args.input)) - if args.input == 'webcam': - output_file += '.mp4' + output_file = os.path.join(args.output_root, os.path.basename(args.input)) + if args.input == "webcam": + output_file += ".mp4" if args.save_predictions: - assert args.output_root != '' - args.pred_save_path = f'{args.output_root}/results_' \ - f'{os.path.splitext(os.path.basename(args.input))[0]}.json' + assert args.output_root != "" + args.pred_save_path = f"{args.output_root}/results_" f"{os.path.splitext(os.path.basename(args.input))[0]}.json" # build detector - detector = init_detector( - args.det_config, args.det_checkpoint, device=args.device) + detector = init_detector(args.det_config, args.det_checkpoint, device=args.device) detector.cfg = adapt_mmdet_pipeline(detector.cfg) # build pose estimator @@ -186,8 +124,8 @@ def main(): args.pose_config, args.pose_checkpoint, device=args.device, - cfg_options=dict( - model=dict(test_cfg=dict(output_heatmaps=args.draw_heatmap)))) + cfg_options=dict(model=dict(test_cfg=dict(output_heatmaps=args.draw_heatmap))), + ) # build visualizer pose_estimator.cfg.visualizer.radius = args.radius @@ -196,19 +134,17 @@ def main(): visualizer = VISUALIZERS.build(pose_estimator.cfg.visualizer) # the dataset_meta is loaded from the checkpoint and # then pass to the model in init_pose_estimator - visualizer.set_dataset_meta( - pose_estimator.dataset_meta, skeleton_style=args.skeleton_style) + visualizer.set_dataset_meta(pose_estimator.dataset_meta, skeleton_style=args.skeleton_style) - if args.input == 'webcam': - input_type = 'webcam' + if args.input == "webcam": + input_type = "webcam" else: - input_type = mimetypes.guess_type(args.input)[0].split('/')[0] + input_type = mimetypes.guess_type(args.input)[0].split("/")[0] - if input_type == 'image': + if input_type == "image": # inference - pred_instances = process_one_image(args, args.input, detector, - pose_estimator, visualizer) + pred_instances = process_one_image(args, args.input, detector, pose_estimator, visualizer) if args.save_predictions: pred_instances_list = split_instances(pred_instances) @@ -217,9 +153,9 @@ def main(): img_vis = visualizer.get_image() mmcv.imwrite(mmcv.rgb2bgr(img_vis), output_file) - elif input_type in ['webcam', 'video']: + elif input_type in ["webcam", "video"]: - if args.input == 'webcam': + if args.input == "webcam": cap = cv2.VideoCapture(0) else: cap = cv2.VideoCapture(args.input) @@ -236,30 +172,21 @@ def main(): break # topdown pose estimation - pred_instances = process_one_image(args, frame, detector, - pose_estimator, visualizer, - 0.001) + pred_instances = process_one_image(args, frame, detector, pose_estimator, visualizer, 0.001) if args.save_predictions: # save prediction results - pred_instances_list.append( - dict( - frame_id=frame_idx, - instances=split_instances(pred_instances))) + pred_instances_list.append(dict(frame_id=frame_idx, instances=split_instances(pred_instances))) # output videos if output_file: frame_vis = visualizer.get_image() if video_writer is None: - fourcc = cv2.VideoWriter_fourcc(*'mp4v') + fourcc = cv2.VideoWriter_fourcc(*"mp4v") # the size of the image with visualization may vary # 
depending on the presence of heatmaps - video_writer = cv2.VideoWriter( - output_file, - fourcc, - 25, # saved fps - (frame_vis.shape[1], frame_vis.shape[0])) + video_writer = cv2.VideoWriter(output_file, fourcc, 25, (frame_vis.shape[1], frame_vis.shape[0])) # saved fps video_writer.write(mmcv.rgb2bgr(frame_vis)) @@ -277,26 +204,17 @@ def main(): else: args.save_predictions = False - raise ValueError( - f'file {os.path.basename(args.input)} has invalid format.') + raise ValueError(f"file {os.path.basename(args.input)} has invalid format.") if args.save_predictions: - with open(args.pred_save_path, 'w') as f: - json.dump( - dict( - meta_info=pose_estimator.dataset_meta, - instance_info=pred_instances_list), - f, - indent='\t') - print(f'predictions have been saved at {args.pred_save_path}') + with open(args.pred_save_path, "w") as f: + json.dump(dict(meta_info=pose_estimator.dataset_meta, instance_info=pred_instances_list), f, indent="\t") + print(f"predictions have been saved at {args.pred_save_path}") if output_file: - input_type = input_type.replace('webcam', 'video') - print_log( - f'the output {input_type} has been saved at {output_file}', - logger='current', - level=logging.INFO) + input_type = input_type.replace("webcam", "video") + print_log(f"the output {input_type} has been saved at {output_file}", logger="current", level=logging.INFO) -if __name__ == '__main__': +if __name__ == "__main__": main() diff --git a/mmpose/engine/hooks/__init__.py b/mmpose/engine/hooks/__init__.py index 2527a258bcf8888bae8b3c259d7a97b3fce541e4..324f54ccc8af8aa5f8649494b7429c7732815336 100644 --- a/mmpose/engine/hooks/__init__.py +++ b/mmpose/engine/hooks/__init__.py @@ -6,6 +6,10 @@ from .sync_norm_hook import SyncNormHook from .visualization_hook import PoseVisualizationHook __all__ = [ - 'PoseVisualizationHook', 'ExpMomentumEMA', 'BadCaseAnalysisHook', - 'YOLOXPoseModeSwitchHook', 'SyncNormHook', 'RTMOModeSwitchHook' + "PoseVisualizationHook", + "ExpMomentumEMA", + "BadCaseAnalysisHook", + "YOLOXPoseModeSwitchHook", + "SyncNormHook", + "RTMOModeSwitchHook", ] diff --git a/mmpose/engine/hooks/badcase_hook.py b/mmpose/engine/hooks/badcase_hook.py index a06ef5af53fc0eedd7546cd590a5c8bb848c1c9b..8079108d4db1eda762905462c2ac900c2ea95d9d 100644 --- a/mmpose/engine/hooks/badcase_hook.py +++ b/mmpose/engine/hooks/badcase_hook.py @@ -61,14 +61,14 @@ class BadCaseAnalysisHook(Hook): self, enable: bool = False, show: bool = False, - wait_time: float = 0., + wait_time: float = 0.0, interval: int = 50, kpt_thr: float = 0.3, out_dir: Optional[str] = None, backend_args: Optional[dict] = None, - metric_type: str = 'loss', - metric: ConfigDict = ConfigDict(type='KeypointMSELoss'), - metric_key: str = 'PCK', + metric_type: str = "loss", + metric: ConfigDict = ConfigDict(type="KeypointMSELoss"), + metric_key: str = "PCK", badcase_thr: float = 5, ): self._visualizer: Visualizer = Visualizer.get_current_instance() @@ -78,10 +78,12 @@ class BadCaseAnalysisHook(Hook): if self.show: # No need to think about vis backends. self._visualizer._vis_backends = {} - warnings.warn('The show is True, it means that only ' - 'the prediction results are visualized ' - 'without storing data, so vis_backends ' - 'needs to be excluded.') + warnings.warn( + "The show is True, it means that only " + "the prediction results are visualized " + "without storing data, so vis_backends " + "needs to be excluded." 
+ ) self.wait_time = wait_time self.enable = enable @@ -90,15 +92,14 @@ class BadCaseAnalysisHook(Hook): self.backend_args = backend_args self.metric_type = metric_type - if metric_type not in ['loss', 'accuracy']: + if metric_type not in ["loss", "accuracy"]: raise KeyError( - f'The badcase metric type {metric_type} is not supported by ' + f"The badcase metric type {metric_type} is not supported by " f"{self.__class__.__name__}. Should be one of 'loss', " - f"'accuracy', but got {metric_type}.") - self.metric = MODELS.build(metric) if metric_type == 'loss'\ - else METRICS.build(metric) - self.metric_name = metric.type if metric_type == 'loss'\ - else metric_key + f"'accuracy', but got {metric_type}." + ) + self.metric = MODELS.build(metric) if metric_type == "loss" else METRICS.build(metric) + self.metric_name = metric.type if metric_type == "loss" else metric_key self.metric_key = metric_key self.badcase_thr = badcase_thr self.results = [] @@ -115,14 +116,12 @@ class BadCaseAnalysisHook(Hook): is_badcase (bool): whether the sample is a badcase or not metric_value (float) """ - if self.metric_type == 'loss': + if self.metric_type == "loss": gts = data_sample.gt_instances.keypoints preds = data_sample.pred_instances.keypoints weights = data_sample.gt_instances.keypoints_visible with torch.no_grad(): - metric_value = self.metric( - torch.from_numpy(preds), torch.from_numpy(gts), - torch.from_numpy(weights)).item() + metric_value = self.metric(torch.from_numpy(preds), torch.from_numpy(gts), torch.from_numpy(weights)).item() is_badcase = metric_value >= self.badcase_thr else: self.metric.process([data_batch], [data_sample.to_dict()]) @@ -130,8 +129,7 @@ class BadCaseAnalysisHook(Hook): is_badcase = metric_value <= self.badcase_thr return is_badcase, metric_value - def after_test_iter(self, runner: Runner, batch_idx: int, data_batch: dict, - outputs: Sequence[PoseDataSample]) -> None: + def after_test_iter(self, runner: Runner, batch_idx: int, data_batch: dict, outputs: Sequence[PoseDataSample]) -> None: """Run after every testing iterations. 
Args: @@ -144,8 +142,7 @@ class BadCaseAnalysisHook(Hook): return if self.out_dir is not None: - self.out_dir = os.path.join(runner.work_dir, runner.timestamp, - self.out_dir) + self.out_dir = os.path.join(runner.work_dir, runner.timestamp, self.out_dir) mmengine.mkdir_or_exist(self.out_dir) self._visualizer.set_dataset_meta(runner.test_evaluator.dataset_meta) @@ -153,38 +150,33 @@ class BadCaseAnalysisHook(Hook): for data_sample in outputs: self._test_index += 1 - img_path = data_sample.get('img_path') + img_path = data_sample.get("img_path") img_bytes = fileio.get(img_path, backend_args=self.backend_args) - img = mmcv.imfrombytes(img_bytes, channel_order='rgb') + img = mmcv.imfrombytes(img_bytes, channel_order="rgb") data_sample = merge_data_samples([data_sample]) - is_badcase, metric_value = self.check_badcase( - data_batch, data_sample) + is_badcase, metric_value = self.check_badcase(data_batch, data_sample) if is_badcase: - img_name, postfix = os.path.basename(img_path).rsplit('.', 1) + img_name, postfix = os.path.basename(img_path).rsplit(".", 1) bboxes = data_sample.gt_instances.bboxes.astype(int).tolist() - bbox_info = 'bbox' + str(bboxes) + bbox_info = "bbox" + str(bboxes) metric_postfix = self.metric_name + str(round(metric_value, 2)) - self.results.append({ - 'img': img_name, - 'bbox': bboxes, - self.metric_name: metric_value - }) + self.results.append({"img": img_name, "bbox": bboxes, self.metric_name: metric_value}) - badcase_name = f'{img_name}_{bbox_info}_{metric_postfix}' + badcase_name = f"{img_name}_{bbox_info}_{metric_postfix}" out_file = None if self.out_dir is not None: - out_file = f'{badcase_name}.{postfix}' + out_file = f"{badcase_name}.{postfix}" out_file = os.path.join(self.out_dir, out_file) # draw gt keypoints in blue color - self._visualizer.kpt_color = 'blue' - self._visualizer.link_color = 'blue' + self._visualizer.kpt_color = "blue" + self._visualizer.link_color = "blue" img_gt_drawn = self._visualizer.add_datasample( - badcase_name if self.show else 'test_img', + badcase_name if self.show else "test_img", img, data_sample=data_sample, show=False, @@ -195,12 +187,13 @@ class BadCaseAnalysisHook(Hook): wait_time=self.wait_time, kpt_thr=self.kpt_thr, out_file=None, - step=self._test_index) + step=self._test_index, + ) # draw pred keypoints in red color - self._visualizer.kpt_color = 'red' - self._visualizer.link_color = 'red' + self._visualizer.kpt_color = "red" + self._visualizer.link_color = "red" self._visualizer.add_datasample( - badcase_name if self.show else 'test_img', + badcase_name if self.show else "test_img", img_gt_drawn, data_sample=data_sample, show=self.show, @@ -211,11 +204,10 @@ class BadCaseAnalysisHook(Hook): wait_time=self.wait_time, kpt_thr=self.kpt_thr, out_file=out_file, - step=self._test_index) + step=self._test_index, + ) - def after_test_epoch(self, - runner, - metrics: Optional[Dict[str, float]] = None) -> None: + def after_test_epoch(self, runner, metrics: Optional[Dict[str, float]] = None) -> None: """All subclasses should override this method, if they need any operations after each test epoch. 
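The badcase test reformatted above boils down to a directional threshold: with `metric_type="loss"` a sample is flagged when the metric is at or above `badcase_thr`, while accuracy-style metrics (such as PCK, which shrinks as predictions worsen) flag it when at or below. A minimal, self-contained sketch of that logic — the threshold and metric values here are illustrative, not defaults from the hook:

```python
# Minimal sketch of the threshold check inside BadCaseAnalysisHook.check_badcase.
def is_badcase(metric_type: str, metric_value: float, badcase_thr: float) -> bool:
    if metric_type == "loss":
        # loss-style metrics grow with error: high value => bad prediction
        return metric_value >= badcase_thr
    # accuracy-style metrics (e.g. PCK) shrink with error: low value => bad prediction
    return metric_value <= badcase_thr

assert is_badcase("loss", 7.3, badcase_thr=5.0)
assert is_badcase("accuracy", 0.2, badcase_thr=0.3)
assert not is_badcase("accuracy", 0.9, badcase_thr=0.3)
```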
@@ -229,11 +221,8 @@ class BadCaseAnalysisHook(Hook): return mmengine.mkdir_or_exist(self.out_dir) - out_file = os.path.join(self.out_dir, 'results.json') - with open(out_file, 'w') as f: + out_file = os.path.join(self.out_dir, "results.json") + with open(out_file, "w") as f: json.dump(self.results, f) - print_log( - f'the bad cases are saved under {self.out_dir}', - logger='current', - level=logging.INFO) + print_log(f"the bad cases are saved under {self.out_dir}", logger="current", level=logging.INFO) diff --git a/mmpose/engine/hooks/ema_hook.py b/mmpose/engine/hooks/ema_hook.py index fd1a689f96f49c33059ec1e4afbe7b01b85164f9..636378d16c347efb83f5a847c774a9fa1a506efe 100644 --- a/mmpose/engine/hooks/ema_hook.py +++ b/mmpose/engine/hooks/ema_hook.py @@ -37,24 +37,20 @@ class ExpMomentumEMA(ExponentialMovingAverage): False. """ - def __init__(self, - model: nn.Module, - momentum: float = 0.0002, - gamma: int = 2000, - interval=1, - device: Optional[torch.device] = None, - update_buffers: bool = False) -> None: - super().__init__( - model=model, - momentum=momentum, - interval=interval, - device=device, - update_buffers=update_buffers) - assert gamma > 0, f'gamma must be greater than 0, but got {gamma}' + def __init__( + self, + model: nn.Module, + momentum: float = 0.0002, + gamma: int = 2000, + interval=1, + device: Optional[torch.device] = None, + update_buffers: bool = False, + ) -> None: + super().__init__(model=model, momentum=momentum, interval=interval, device=device, update_buffers=update_buffers) + assert gamma > 0, f"gamma must be greater than 0, but got {gamma}" self.gamma = gamma - def avg_func(self, averaged_param: Tensor, source_param: Tensor, - steps: int) -> None: + def avg_func(self, averaged_param: Tensor, source_param: Tensor, steps: int) -> None: """Compute the moving average of the parameters using the exponential momentum strategy. @@ -64,6 +60,5 @@ class ExpMomentumEMA(ExponentialMovingAverage): steps (int): The number of times the parameters have been updated. """ - momentum = (1 - self.momentum) * math.exp( - -float(1 + steps) / self.gamma) + self.momentum + momentum = (1 - self.momentum) * math.exp(-float(1 + steps) / self.gamma) + self.momentum averaged_param.mul_(1 - momentum).add_(source_param, alpha=momentum) diff --git a/mmpose/engine/hooks/mode_switch_hooks.py b/mmpose/engine/hooks/mode_switch_hooks.py index 8990ecab678fe067cab64cf95e613f0439eddba1..96fca31184fc87ad5e5bbf617ac66f198bdc53be 100644 --- a/mmpose/engine/hooks/mode_switch_hooks.py +++ b/mmpose/engine/hooks/mode_switch_hooks.py @@ -31,17 +31,14 @@ class YOLOXPoseModeSwitchHook(Hook): during training. Defaults to None. 
""" - def __init__(self, - num_last_epochs: int = 20, - new_train_dataset: dict = None, - new_train_pipeline: Sequence[dict] = None): + def __init__(self, num_last_epochs: int = 20, new_train_dataset: dict = None, new_train_pipeline: Sequence[dict] = None): self.num_last_epochs = num_last_epochs self.new_train_dataset = new_train_dataset self.new_train_pipeline = new_train_pipeline def _modify_dataloader(self, runner: Runner): """Modify dataloader with new dataset and pipeline configurations.""" - runner.logger.info(f'New Pipeline: {self.new_train_pipeline}') + runner.logger.info(f"New Pipeline: {self.new_train_pipeline}") train_dataloader_cfg = copy.deepcopy(runner.cfg.train_dataloader) if self.new_train_dataset: @@ -51,7 +48,7 @@ class YOLOXPoseModeSwitchHook(Hook): new_train_dataloader = Runner.build_dataloader(train_dataloader_cfg) runner.train_loop.dataloader = new_train_dataloader - runner.logger.info('Recreated the dataloader!') + runner.logger.info("Recreated the dataloader!") def before_train_epoch(self, runner: Runner): """Close mosaic and mixup augmentation, switch to use L1 loss.""" @@ -62,7 +59,7 @@ class YOLOXPoseModeSwitchHook(Hook): if epoch + 1 == runner.max_epochs - self.num_last_epochs: self._modify_dataloader(runner) - runner.logger.info('Added additional reg loss now!') + runner.logger.info("Added additional reg loss now!") model.head.use_aux_loss = True @@ -104,5 +101,4 @@ class RTMOModeSwitchHook(Hook): if epoch in self.epoch_attributes: for key, value in self.epoch_attributes[epoch].items(): rsetattr(model.head, key, value) - runner.logger.info( - f'Change model.head.{key} to {rgetattr(model.head, key)}') + runner.logger.info(f"Change model.head.{key} to {rgetattr(model.head, key)}") diff --git a/mmpose/engine/hooks/sync_norm_hook.py b/mmpose/engine/hooks/sync_norm_hook.py index 053e4f92af37037a64309b2262ef4610d336b3f5..986902a92b337383abc2173e49948dd1084029a0 100644 --- a/mmpose/engine/hooks/sync_norm_hook.py +++ b/mmpose/engine/hooks/sync_norm_hook.py @@ -14,7 +14,7 @@ def get_norm_states(module: nn.Module) -> OrderedDict: for name, child in module.named_modules(): if isinstance(child, nn.modules.batchnorm._NormBase): for k, v in child.state_dict().items(): - async_norm_states['.'.join([name, k])] = v + async_norm_states[".".join([name, k])] = v return async_norm_states @@ -35,7 +35,7 @@ class SyncNormHook(Hook): return try: - norm_states = all_reduce_dict(norm_states, op='mean') + norm_states = all_reduce_dict(norm_states, op="mean") module.load_state_dict(norm_states, strict=True) except Exception as e: - runner.logger.warn(f'SyncNormHook failed: {str(e)}') + runner.logger.warn(f"SyncNormHook failed: {str(e)}") diff --git a/mmpose/engine/hooks/visualization_hook.py b/mmpose/engine/hooks/visualization_hook.py index 7de273698c2dc3cb29be2183e2b7d9eb11a7f298..6a4ed5078375d5788f9ad38c823eba1e5513a59b 100644 --- a/mmpose/engine/hooks/visualization_hook.py +++ b/mmpose/engine/hooks/visualization_hook.py @@ -3,11 +3,10 @@ import os import warnings from typing import Optional, Sequence -import numpy as np - import mmcv import mmengine import mmengine.fileio as fileio +import numpy as np from mmengine.hooks import Hook from mmengine.runner import Runner from mmengine.visualization import Visualizer @@ -55,7 +54,7 @@ class PoseVisualizationHook(Hook): interval: int = 50, kpt_thr: float = 0.3, show: bool = False, - wait_time: float = 0., + wait_time: float = 0.0, out_dir: Optional[str] = None, backend_args: Optional[dict] = None, ): @@ -66,10 +65,12 @@ class 
PoseVisualizationHook(Hook): if self.show: # No need to think about vis backends. self._visualizer._vis_backends = {} - warnings.warn('The show is True, it means that only ' - 'the prediction results are visualized ' - 'without storing data, so vis_backends ' - 'needs to be excluded.') + warnings.warn( + "The show is True, it means that only " + "the prediction results are visualized " + "without storing data, so vis_backends " + "needs to be excluded." + ) self.wait_time = wait_time self.enable = enable @@ -77,8 +78,7 @@ class PoseVisualizationHook(Hook): self._test_index = 0 self.backend_args = backend_args - def after_val_iter(self, runner: Runner, batch_idx: int, data_batch: dict, - outputs: Sequence[PoseDataSample]) -> None: + def after_val_iter(self, runner: Runner, batch_idx: int, data_batch: dict, outputs: Sequence[PoseDataSample]) -> None: """Run after every ``self.interval`` validation iterations. Args: @@ -97,9 +97,9 @@ class PoseVisualizationHook(Hook): total_curr_iter = runner.iter + batch_idx # Visualize only the first data - img_path = data_batch['data_samples'][0].get('img_path') + img_path = data_batch["data_samples"][0].get("img_path") img_bytes = fileio.get(img_path, backend_args=self.backend_args) - img = mmcv.imfrombytes(img_bytes, channel_order='rgb') + img = mmcv.imfrombytes(img_bytes, channel_order="rgb") data_sample = outputs[0] # revert the heatmap on the original image @@ -107,7 +107,7 @@ class PoseVisualizationHook(Hook): if total_curr_iter % self.interval == 0: self._visualizer.add_datasample( - os.path.basename(img_path) if self.show else 'val_img', + os.path.basename(img_path) if self.show else "val_img", img, data_sample=data_sample, draw_gt=False, @@ -116,10 +116,10 @@ class PoseVisualizationHook(Hook): show=self.show, wait_time=self.wait_time, kpt_thr=self.kpt_thr, - step=total_curr_iter) + step=total_curr_iter, + ) - def after_test_iter(self, runner: Runner, batch_idx: int, data_batch: dict, - outputs: Sequence[PoseDataSample]) -> None: + def after_test_iter(self, runner: Runner, batch_idx: int, data_batch: dict, outputs: Sequence[PoseDataSample]) -> None: """Run after every testing iterations. 
Args: @@ -132,8 +132,7 @@ class PoseVisualizationHook(Hook): return if self.out_dir is not None: - self.out_dir = os.path.join(runner.work_dir, runner.timestamp, - self.out_dir) + self.out_dir = os.path.join(runner.work_dir, runner.timestamp, self.out_dir) mmengine.mkdir_or_exist(self.out_dir) self._visualizer.set_dataset_meta(runner.test_evaluator.dataset_meta) @@ -141,32 +140,28 @@ class PoseVisualizationHook(Hook): for data_sample in outputs: self._test_index += 1 - img_path = data_sample.get('img_path') + img_path = data_sample.get("img_path") img_bytes = fileio.get(img_path, backend_args=self.backend_args) - img = mmcv.imfrombytes(img_bytes, channel_order='rgb') + img = mmcv.imfrombytes(img_bytes, channel_order="rgb") # img = pad_img_to_amap(img, data_sample) - + data_sample = merge_data_samples([data_sample]) # Resize image to heatmap size - if data_sample.get('_pred_heatmaps') is not None: + if data_sample.get("_pred_heatmaps") is not None: heatmap_size = data_sample._pred_heatmaps.shape img = mmcv.imresize(img, heatmap_size[::-1]) out_file = None if self.out_dir is not None: - out_file_name, postfix = os.path.basename(img_path).rsplit( - '.', 1) - index = len([ - fname for fname in os.listdir(self.out_dir) - if fname.startswith(out_file_name) - ]) - out_file = f'{out_file_name}_{index}.{postfix}' + out_file_name, postfix = os.path.basename(img_path).rsplit(".", 1) + index = len([fname for fname in os.listdir(self.out_dir) if fname.startswith(out_file_name)]) + out_file = f"{out_file_name}_{index}.{postfix}" out_file = os.path.join(self.out_dir, out_file) self._visualizer.add_datasample( - os.path.basename(img_path) if self.show else 'test_img', + os.path.basename(img_path) if self.show else "test_img", img, data_sample=data_sample, show=self.show, @@ -176,29 +171,27 @@ class PoseVisualizationHook(Hook): wait_time=self.wait_time, kpt_thr=self.kpt_thr, out_file=out_file, - step=self._test_index) + step=self._test_index, + ) def pad_img_to_amap(img, data_sample): bbox_xywh = None - if 'raw_ann_info' in data_sample: - bbox_xywh = data_sample.raw_ann_info['bbox'] - elif 'pred_instances' in data_sample: + if "raw_ann_info" in data_sample: + bbox_xywh = data_sample.raw_ann_info["bbox"] + elif "pred_instances" in data_sample: bbox_xywh = data_sample.pred_instances.bboxes.flatten() - + if bbox_xywh is None: return img - bbox_xyxy = np.array([ - bbox_xywh[0], bbox_xywh[1], - bbox_xywh[0] + bbox_xywh[2], bbox_xywh[1] + bbox_xywh[3] - ]) - abox_xyxy = fix_bbox_aspect_ratio(bbox_xyxy, aspect_ratio=3/4, padding=1.25, bbox_format='xyxy') + bbox_xyxy = np.array([bbox_xywh[0], bbox_xywh[1], bbox_xywh[0] + bbox_xywh[2], bbox_xywh[1] + bbox_xywh[3]]) + abox_xyxy = fix_bbox_aspect_ratio(bbox_xyxy, aspect_ratio=3 / 4, padding=1.25, bbox_format="xyxy") abox_xyxy = abox_xyxy.flatten() x_pad = np.array([max(0, -abox_xyxy[0]), max(0, abox_xyxy[2] - img.shape[1])], dtype=int) y_pad = np.array([max(0, -abox_xyxy[1]), max(0, abox_xyxy[3] - img.shape[0])], dtype=int) - img = np.pad(img, ((y_pad[0], y_pad[1]), (x_pad[0], x_pad[1]), (0, 0)), mode='constant') + img = np.pad(img, ((y_pad[0], y_pad[1]), (x_pad[0], x_pad[1]), (0, 0)), mode="constant") kpts = data_sample.pred_instances.keypoints[0].reshape(-1, 2) kpts[:, :2] += np.array([x_pad[0], y_pad[0]]) diff --git a/mmpose/engine/optim_wrappers/__init__.py b/mmpose/engine/optim_wrappers/__init__.py index 16174c500f9dfa9e67ffd0692d1afe9016afdb27..e4fee513f5c3e4fd55a1f0af736acbd3bc60ee51 100644 --- a/mmpose/engine/optim_wrappers/__init__.py +++ 
b/mmpose/engine/optim_wrappers/__init__.py @@ -2,6 +2,4 @@ from .force_default_constructor import ForceDefaultOptimWrapperConstructor from .layer_decay_optim_wrapper import LayerDecayOptimWrapperConstructor -__all__ = [ - 'LayerDecayOptimWrapperConstructor', 'ForceDefaultOptimWrapperConstructor' -] +__all__ = ["LayerDecayOptimWrapperConstructor", "ForceDefaultOptimWrapperConstructor"] diff --git a/mmpose/engine/optim_wrappers/force_default_constructor.py b/mmpose/engine/optim_wrappers/force_default_constructor.py index f45291a73b0c38b94ae8b00bd2b7927f8778b622..f00f8e00a55587f03ad492722b02ed6832ab46f3 100644 --- a/mmpose/engine/optim_wrappers/force_default_constructor.py +++ b/mmpose/engine/optim_wrappers/force_default_constructor.py @@ -129,11 +129,9 @@ class ForceDefaultOptimWrapperConstructor(DefaultOptimWrapperConstructor): >>> # model.cls_head is (0.01, 0.95). """ - def add_params(self, - params: List[dict], - module: nn.Module, - prefix: str = '', - is_dcn_module: Optional[Union[int, float]] = None) -> None: + def add_params( + self, params: List[dict], module: nn.Module, prefix: str = "", is_dcn_module: Optional[Union[int, float]] = None + ) -> None: """Add all parameters of module to the params list. The parameters of the given module will be added to the list of param @@ -149,35 +147,31 @@ class ForceDefaultOptimWrapperConstructor(DefaultOptimWrapperConstructor): control conv_offset layer's learning rate. Defaults to None. """ # get param-wise options - custom_keys = self.paramwise_cfg.get('custom_keys', {}) + custom_keys = self.paramwise_cfg.get("custom_keys", {}) # first sort with alphabet order and then sort with reversed len of str sorted_keys = sorted(sorted(custom_keys.keys()), key=len, reverse=True) - bias_lr_mult = self.paramwise_cfg.get('bias_lr_mult', None) - bias_decay_mult = self.paramwise_cfg.get('bias_decay_mult', None) - norm_decay_mult = self.paramwise_cfg.get('norm_decay_mult', None) - dwconv_decay_mult = self.paramwise_cfg.get('dwconv_decay_mult', None) - flat_decay_mult = self.paramwise_cfg.get('flat_decay_mult', None) - bypass_duplicate = self.paramwise_cfg.get('bypass_duplicate', False) - dcn_offset_lr_mult = self.paramwise_cfg.get('dcn_offset_lr_mult', None) - force_default_settings = self.paramwise_cfg.get( - 'force_default_settings', False) + bias_lr_mult = self.paramwise_cfg.get("bias_lr_mult", None) + bias_decay_mult = self.paramwise_cfg.get("bias_decay_mult", None) + norm_decay_mult = self.paramwise_cfg.get("norm_decay_mult", None) + dwconv_decay_mult = self.paramwise_cfg.get("dwconv_decay_mult", None) + flat_decay_mult = self.paramwise_cfg.get("flat_decay_mult", None) + bypass_duplicate = self.paramwise_cfg.get("bypass_duplicate", False) + dcn_offset_lr_mult = self.paramwise_cfg.get("dcn_offset_lr_mult", None) + force_default_settings = self.paramwise_cfg.get("force_default_settings", False) # special rules for norm layers and depth-wise conv layers - is_norm = isinstance(module, - (_BatchNorm, _InstanceNorm, GroupNorm, LayerNorm)) - is_dwconv = ( - isinstance(module, torch.nn.Conv2d) - and module.in_channels == module.groups) + is_norm = isinstance(module, (_BatchNorm, _InstanceNorm, GroupNorm, LayerNorm)) + is_dwconv = isinstance(module, torch.nn.Conv2d) and module.in_channels == module.groups for name, param in module.named_parameters(recurse=False): - param_group = {'params': [param]} + param_group = {"params": [param]} if bypass_duplicate and self._is_in(param_group, params): print_log( - f'{prefix} is duplicate. 
It is skipped since ' - f'bypass_duplicate={bypass_duplicate}', - logger='current', - level=logging.WARNING) + f"{prefix} is duplicate. It is skipped since " f"bypass_duplicate={bypass_duplicate}", + logger="current", + level=logging.WARNING, + ) continue if not param.requires_grad: params.append(param_group) @@ -186,13 +180,13 @@ class ForceDefaultOptimWrapperConstructor(DefaultOptimWrapperConstructor): # if the parameter match one of the custom keys, ignore other rules is_custom = False for key in sorted_keys: - if key in f'{prefix}.{name}': + if key in f"{prefix}.{name}": is_custom = True - lr_mult = custom_keys[key].get('lr_mult', 1.) - param_group['lr'] = self.base_lr * lr_mult + lr_mult = custom_keys[key].get("lr_mult", 1.0) + param_group["lr"] = self.base_lr * lr_mult if self.base_wd is not None: - decay_mult = custom_keys[key].get('decay_mult', 1.) - param_group['weight_decay'] = self.base_wd * decay_mult + decay_mult = custom_keys[key].get("decay_mult", 1.0) + param_group["weight_decay"] = self.base_wd * decay_mult # add custom settings to param_group for k, v in custom_keys[key].items(): param_group[k] = v @@ -201,55 +195,45 @@ class ForceDefaultOptimWrapperConstructor(DefaultOptimWrapperConstructor): if not is_custom or force_default_settings: # bias_lr_mult affects all bias parameters # except for norm.bias dcn.conv_offset.bias - if name == 'bias' and not ( - is_norm or is_dcn_module) and bias_lr_mult is not None: - param_group['lr'] = self.base_lr * bias_lr_mult - - if (prefix.find('conv_offset') != -1 and is_dcn_module - and dcn_offset_lr_mult is not None - and isinstance(module, torch.nn.Conv2d)): + if name == "bias" and not (is_norm or is_dcn_module) and bias_lr_mult is not None: + param_group["lr"] = self.base_lr * bias_lr_mult + + if ( + prefix.find("conv_offset") != -1 + and is_dcn_module + and dcn_offset_lr_mult is not None + and isinstance(module, torch.nn.Conv2d) + ): # deal with both dcn_offset's bias & weight - param_group['lr'] = self.base_lr * dcn_offset_lr_mult + param_group["lr"] = self.base_lr * dcn_offset_lr_mult # apply weight decay policies if self.base_wd is not None: # norm decay if is_norm and norm_decay_mult is not None: - param_group[ - 'weight_decay'] = self.base_wd * norm_decay_mult + param_group["weight_decay"] = self.base_wd * norm_decay_mult # bias lr and decay - elif (name == 'bias' and not is_dcn_module - and bias_decay_mult is not None): - param_group[ - 'weight_decay'] = self.base_wd * bias_decay_mult + elif name == "bias" and not is_dcn_module and bias_decay_mult is not None: + param_group["weight_decay"] = self.base_wd * bias_decay_mult # depth-wise conv elif is_dwconv and dwconv_decay_mult is not None: - param_group[ - 'weight_decay'] = self.base_wd * dwconv_decay_mult + param_group["weight_decay"] = self.base_wd * dwconv_decay_mult # flatten parameters except dcn offset - elif (param.ndim == 1 and not is_dcn_module - and flat_decay_mult is not None): - param_group[ - 'weight_decay'] = self.base_wd * flat_decay_mult + elif param.ndim == 1 and not is_dcn_module and flat_decay_mult is not None: + param_group["weight_decay"] = self.base_wd * flat_decay_mult params.append(param_group) for key, value in param_group.items(): - if key == 'params': + if key == "params": continue - full_name = f'{prefix}.{name}' if prefix else name - print_log( - f'paramwise_options -- {full_name}:{key}={value}', - logger='current') + full_name = f"{prefix}.{name}" if prefix else name + print_log(f"paramwise_options -- {full_name}:{key}={value}", logger="current") if 
mmcv_full_available(): from mmcv.ops import DeformConv2d, ModulatedDeformConv2d - is_dcn_module = isinstance(module, - (DeformConv2d, ModulatedDeformConv2d)) + + is_dcn_module = isinstance(module, (DeformConv2d, ModulatedDeformConv2d)) else: is_dcn_module = False for child_name, child_mod in module.named_children(): - child_prefix = f'{prefix}.{child_name}' if prefix else child_name - self.add_params( - params, - child_mod, - prefix=child_prefix, - is_dcn_module=is_dcn_module) + child_prefix = f"{prefix}.{child_name}" if prefix else child_name + self.add_params(params, child_mod, prefix=child_prefix, is_dcn_module=is_dcn_module) diff --git a/mmpose/engine/optim_wrappers/layer_decay_optim_wrapper.py b/mmpose/engine/optim_wrappers/layer_decay_optim_wrapper.py index 6513e5593d98e9aa77a2795529ddeb538b6099c3..e7d2c74fcd90dfb137144e0569db3d499602c397 100644 --- a/mmpose/engine/optim_wrappers/layer_decay_optim_wrapper.py +++ b/mmpose/engine/optim_wrappers/layer_decay_optim_wrapper.py @@ -5,13 +5,12 @@ from mmengine.registry import OPTIM_WRAPPER_CONSTRUCTORS def get_num_layer_for_vit(var_name, num_max_layer): - if var_name in ('backbone.cls_token', 'backbone.mask_token', - 'backbone.pos_embed'): + if var_name in ("backbone.cls_token", "backbone.mask_token", "backbone.pos_embed"): return 0 - elif var_name.startswith('backbone.patch_embed'): + elif var_name.startswith("backbone.patch_embed"): return 0 - elif var_name.startswith('backbone.layers'): - layer_id = int(var_name.split('.')[2]) + elif var_name.startswith("backbone.layers"): + layer_id = int(var_name.split(".")[2]) return layer_id + 1 else: return num_max_layer - 1 @@ -22,52 +21,51 @@ class LayerDecayOptimWrapperConstructor(DefaultOptimWrapperConstructor): def __init__(self, optim_wrapper_cfg, paramwise_cfg=None): super().__init__(optim_wrapper_cfg, paramwise_cfg=None) - self.layer_decay_rate = paramwise_cfg.get('layer_decay_rate', 0.5) + self.layer_decay_rate = paramwise_cfg.get("layer_decay_rate", 0.5) super().__init__(optim_wrapper_cfg, paramwise_cfg) - def add_params(self, params, module, prefix='', lr=None): + def add_params(self, params, module, prefix="", lr=None): parameter_groups = {} print(self.paramwise_cfg) - num_layers = self.paramwise_cfg.get('num_layers') + 2 - layer_decay_rate = self.paramwise_cfg.get('layer_decay_rate') + num_layers = self.paramwise_cfg.get("num_layers") + 2 + layer_decay_rate = self.paramwise_cfg.get("layer_decay_rate") weight_decay = self.base_wd for name, param in module.named_parameters(): if not param.requires_grad: continue # frozen weights - if (len(param.shape) == 1 or name.endswith('.bias') - or 'pos_embed' in name): - group_name = 'no_decay' - this_weight_decay = 0. 
+ if len(param.shape) == 1 or name.endswith(".bias") or "pos_embed" in name: + group_name = "no_decay" + this_weight_decay = 0.0 else: - group_name = 'decay' + group_name = "decay" this_weight_decay = weight_decay layer_id = get_num_layer_for_vit(name, num_layers) - group_name = 'layer_%d_%s' % (layer_id, group_name) + group_name = "layer_%d_%s" % (layer_id, group_name) if group_name not in parameter_groups: - scale = layer_decay_rate**(num_layers - layer_id - 1) + scale = layer_decay_rate ** (num_layers - layer_id - 1) parameter_groups[group_name] = { - 'weight_decay': this_weight_decay, - 'params': [], - 'param_names': [], - 'lr_scale': scale, - 'group_name': group_name, - 'lr': scale * self.base_lr, + "weight_decay": this_weight_decay, + "params": [], + "param_names": [], + "lr_scale": scale, + "group_name": group_name, + "lr": scale * self.base_lr, } - parameter_groups[group_name]['params'].append(param) - parameter_groups[group_name]['param_names'].append(name) + parameter_groups[group_name]["params"].append(param) + parameter_groups[group_name]["param_names"].append(name) rank, _ = get_dist_info() if rank == 0: to_display = {} for key in parameter_groups: to_display[key] = { - 'param_names': parameter_groups[key]['param_names'], - 'lr_scale': parameter_groups[key]['lr_scale'], - 'lr': parameter_groups[key]['lr'], - 'weight_decay': parameter_groups[key]['weight_decay'], + "param_names": parameter_groups[key]["param_names"], + "lr_scale": parameter_groups[key]["lr_scale"], + "lr": parameter_groups[key]["lr"], + "weight_decay": parameter_groups[key]["weight_decay"], } params.extend(parameter_groups.values()) diff --git a/mmpose/engine/schedulers/__init__.py b/mmpose/engine/schedulers/__init__.py index 8ea59930e8c465dc75c52106d0440656a5a9446a..5629ed579239bc5ec1f08bdedcb71a658d4700d8 100644 --- a/mmpose/engine/schedulers/__init__.py +++ b/mmpose/engine/schedulers/__init__.py @@ -1,9 +1,5 @@ # Copyright (c) OpenMMLab. All rights reserved. from .constant_lr import ConstantLR -from .quadratic_warmup import (QuadraticWarmupLR, QuadraticWarmupMomentum, - QuadraticWarmupParamScheduler) +from .quadratic_warmup import QuadraticWarmupLR, QuadraticWarmupMomentum, QuadraticWarmupParamScheduler -__all__ = [ - 'QuadraticWarmupParamScheduler', 'QuadraticWarmupMomentum', - 'QuadraticWarmupLR', 'ConstantLR' -] +__all__ = ["QuadraticWarmupParamScheduler", "QuadraticWarmupMomentum", "QuadraticWarmupLR", "ConstantLR"] diff --git a/mmpose/engine/schedulers/constant_lr.py b/mmpose/engine/schedulers/constant_lr.py index 3b96374542f6c85d5b1edaad77ef81cc031ae3ad..7cf7f5b05e3948ee0a6cdad79e2d3e97c1d393f6 100644 --- a/mmpose/engine/schedulers/constant_lr.py +++ b/mmpose/engine/schedulers/constant_lr.py @@ -1,6 +1,5 @@ # Copyright (c) OpenMMLab. All rights reserved. -from mmengine.optim.scheduler import \ - ConstantParamScheduler as MMENGINE_ConstantParamScheduler +from mmengine.optim.scheduler import ConstantParamScheduler as MMENGINE_ConstantParamScheduler from mmengine.optim.scheduler.lr_scheduler import LRSchedulerMixin from mmpose.registry import PARAM_SCHEDULERS @@ -34,26 +33,23 @@ class ConstantParamScheduler(MMENGINE_ConstantParamScheduler): Defaults to False. 
""" - def __init__(self, - optimizer, - param_name: str, - factor: float = 1.0 / 3, - begin: int = 0, - end: int = INF, - last_step: int = -1, - by_epoch: bool = True, - verbose: bool = False): + def __init__( + self, + optimizer, + param_name: str, + factor: float = 1.0 / 3, + begin: int = 0, + end: int = INF, + last_step: int = -1, + by_epoch: bool = True, + verbose: bool = False, + ): self.factor = factor self.total_iters = end - begin - 1 super(MMENGINE_ConstantParamScheduler, self).__init__( - optimizer, - param_name=param_name, - begin=begin, - end=end, - last_step=last_step, - by_epoch=by_epoch, - verbose=verbose) + optimizer, param_name=param_name, begin=begin, end=end, last_step=last_step, by_epoch=by_epoch, verbose=verbose + ) @PARAM_SCHEDULERS.register_module() diff --git a/mmpose/engine/schedulers/quadratic_warmup.py b/mmpose/engine/schedulers/quadratic_warmup.py index 10217972173ac9e764ea71966a1f2dd3a8b79a1d..0ae652ee3e8fe019c283603ecd4ec02f3c52c6d0 100644 --- a/mmpose/engine/schedulers/quadratic_warmup.py +++ b/mmpose/engine/schedulers/quadratic_warmup.py @@ -32,44 +32,34 @@ class QuadraticWarmupParamScheduler(_ParamScheduler): Defaults to False. """ - def __init__(self, - optimizer: Optimizer, - param_name: str, - begin: int = 0, - end: int = INF, - last_step: int = -1, - by_epoch: bool = True, - verbose: bool = False): + def __init__( + self, + optimizer: Optimizer, + param_name: str, + begin: int = 0, + end: int = INF, + last_step: int = -1, + by_epoch: bool = True, + verbose: bool = False, + ): if end >= INF: - raise ValueError('``end`` must be less than infinity,' - 'Please set ``end`` parameter of ' - '``QuadraticWarmupScheduler`` as the ' - 'number of warmup end.') + raise ValueError( + "``end`` must be less than infinity," + "Please set ``end`` parameter of " + "``QuadraticWarmupScheduler`` as the " + "number of warmup end." + ) self.total_iters = end - begin super().__init__( - optimizer=optimizer, - param_name=param_name, - begin=begin, - end=end, - last_step=last_step, - by_epoch=by_epoch, - verbose=verbose) + optimizer=optimizer, param_name=param_name, begin=begin, end=end, last_step=last_step, by_epoch=by_epoch, verbose=verbose + ) @classmethod - def build_iter_from_epoch(cls, - *args, - begin=0, - end=INF, - by_epoch=True, - epoch_length=None, - **kwargs): + def build_iter_from_epoch(cls, *args, begin=0, end=INF, by_epoch=True, epoch_length=None, **kwargs): """Build an iter-based instance of this scheduler from an epoch-based config.""" - assert by_epoch, 'Only epoch-based kwargs whose `by_epoch=True` can ' \ - 'be converted to iter-based.' - assert epoch_length is not None and epoch_length > 0, \ - f'`epoch_length` must be a positive integer, ' \ - f'but got {epoch_length}.' + assert by_epoch, "Only epoch-based kwargs whose `by_epoch=True` can " "be converted to iter-based." + assert epoch_length is not None and epoch_length > 0, f"`epoch_length` must be a positive integer, " f"but got {epoch_length}." 
by_epoch = False begin = begin * epoch_length if end != INF: @@ -79,16 +69,11 @@ class QuadraticWarmupParamScheduler(_ParamScheduler): def _get_value(self): """Compute value using chainable form of the scheduler.""" if self.last_step == 0: - return [ - base_value * (2 * self.last_step + 1) / self.total_iters**2 - for base_value in self.base_values - ] + return [base_value * (2 * self.last_step + 1) / self.total_iters**2 for base_value in self.base_values] return [ - group[self.param_name] + base_value * - (2 * self.last_step + 1) / self.total_iters**2 - for base_value, group in zip(self.base_values, - self.optimizer.param_groups) + group[self.param_name] + base_value * (2 * self.last_step + 1) / self.total_iters**2 + for base_value, group in zip(self.base_values, self.optimizer.param_groups) ] @@ -112,8 +97,7 @@ class QuadraticWarmupLR(LRSchedulerMixin, QuadraticWarmupParamScheduler): @PARAM_SCHEDULERS.register_module() -class QuadraticWarmupMomentum(MomentumSchedulerMixin, - QuadraticWarmupParamScheduler): +class QuadraticWarmupMomentum(MomentumSchedulerMixin, QuadraticWarmupParamScheduler): """Warm up the momentum value of each parameter group by quadratic formula. Args: diff --git a/mmpose/evaluation/evaluators/__init__.py b/mmpose/evaluation/evaluators/__init__.py index ae2d79d514dca929d4a0458acace3c6eaab6aea1..3b4fa4b97ef6d2d004fb54c2c90006a5d63bad6d 100644 --- a/mmpose/evaluation/evaluators/__init__.py +++ b/mmpose/evaluation/evaluators/__init__.py @@ -1,4 +1,4 @@ # Copyright (c) OpenMMLab. All rights reserved. from .mutli_dataset_evaluator import MultiDatasetEvaluator -__all__ = ['MultiDatasetEvaluator'] +__all__ = ["MultiDatasetEvaluator"] diff --git a/mmpose/evaluation/evaluators/mutli_dataset_evaluator.py b/mmpose/evaluation/evaluators/mutli_dataset_evaluator.py index bc47d2980c9d05a2e068d1860068d2f6ba213e1f..af4d242c4fe7bcf2589e8cf7a5430739e5f4cb12 100644 --- a/mmpose/evaluation/evaluators/mutli_dataset_evaluator.py +++ b/mmpose/evaluation/evaluators/mutli_dataset_evaluator.py @@ -25,18 +25,17 @@ class MultiDatasetEvaluator(Evaluator): datasets: Sequence[dict], ): - assert len(metrics) == len(datasets), 'the argument ' \ 'datasets should have same length as metrics' + assert len(metrics) == len(datasets), "the argument `datasets` should have the same length as `metrics`" super().__init__(metrics) # Initialize metrics for each dataset metrics_dict = dict() for dataset, metric in zip(datasets, self.metrics): - metainfo_file = DATASETS.module_dict[dataset['type']].METAINFO + metainfo_file = DATASETS.module_dict[dataset["type"]].METAINFO dataset_meta = parse_pose_metainfo(metainfo_file) metric.dataset_meta = dataset_meta - dataset_name = dataset_meta['dataset_name'] + dataset_name = dataset_meta["dataset_name"] metrics_dict[dataset_name] = metric self.metrics_dict = metrics_dict @@ -50,9 +49,7 @@ class MultiDatasetEvaluator(Evaluator): """Set the dataset meta info to the evaluator and it's metrics.""" self._dataset_meta = dataset_meta - def process(self, - data_samples: Sequence[BaseDataElement], - data_batch: Optional[Any] = None): + def process(self, data_samples: Sequence[BaseDataElement], data_batch: Optional[Any] = None): """Convert ``BaseDataSample`` to dict and invoke process method of each metric.
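Before the `process` hunk below: the evaluator's core idea is to bucket incoming samples by their `dataset_name` and hand each bucket to that dataset's metric. A minimal sketch of that routing, with hypothetical dataset names:

```python
from collections import defaultdict

# Samples carry a `dataset_name`; each bucket is later handed to the
# metric registered for that dataset (see `process` below).
data_samples = [
    {"dataset_name": "coco", "id": 1},
    {"dataset_name": "mpii", "id": 2},
    {"dataset_name": "coco", "id": 3},
]
buckets = defaultdict(list)
for sample in data_samples:
    buckets[sample.get("dataset_name", "coco")].append(sample)
print({k: len(v) for k, v in buckets.items()})  # {'coco': 2, 'mpii': 1}
```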
@@ -67,23 +64,18 @@ class MultiDatasetEvaluator(Evaluator): data_samples=defaultdict(list), ) - for inputs, data_ds, data_sample in zip(data_batch['inputs'], - data_batch['data_samples'], - data_samples): + for inputs, data_ds, data_sample in zip(data_batch["inputs"], data_batch["data_samples"], data_samples): if isinstance(data_sample, BaseDataElement): data_sample = data_sample.to_dict() assert isinstance(data_sample, dict) - dataset_name = data_sample.get('dataset_name', - self.dataset_meta['dataset_name']) + dataset_name = data_sample.get("dataset_name", self.dataset_meta["dataset_name"]) _data_samples[dataset_name].append(data_sample) - _data_batch['inputs'][dataset_name].append(inputs) - _data_batch['data_samples'][dataset_name].append(data_ds) + _data_batch["inputs"][dataset_name].append(inputs) + _data_batch["data_samples"][dataset_name].append(data_ds) for dataset_name, metric in self.metrics_dict.items(): if dataset_name in _data_samples: - data_batch = dict( - inputs=_data_batch['inputs'][dataset_name], - data_samples=_data_batch['data_samples'][dataset_name]) + data_batch = dict(inputs=_data_batch["inputs"][dataset_name], data_samples=_data_batch["data_samples"][dataset_name]) metric.process(data_batch, _data_samples[dataset_name]) else: continue diff --git a/mmpose/evaluation/functional/__init__.py b/mmpose/evaluation/functional/__init__.py index 239968f03aa4c67dd65f752c5945a35d20b31897..d6d6ca8e15f61d2c702697ae695e1220b04d7e15 100644 --- a/mmpose/evaluation/functional/__init__.py +++ b/mmpose/evaluation/functional/__init__.py @@ -1,15 +1,32 @@ # Copyright (c) OpenMMLab. All rights reserved. -from .keypoint_eval import (keypoint_auc, keypoint_epe, keypoint_mpjpe, - keypoint_nme, keypoint_pck_accuracy, - multilabel_classification_accuracy, - pose_pck_accuracy, simcc_pck_accuracy) +from .keypoint_eval import ( + keypoint_auc, + keypoint_epe, + keypoint_mpjpe, + keypoint_nme, + keypoint_pck_accuracy, + multilabel_classification_accuracy, + pose_pck_accuracy, + simcc_pck_accuracy, +) from .nms import nearby_joints_nms, nms, nms_torch, oks_nms, soft_oks_nms from .transforms import transform_ann, transform_pred, transform_sigmas __all__ = [ - 'keypoint_pck_accuracy', 'keypoint_auc', 'keypoint_nme', 'keypoint_epe', - 'pose_pck_accuracy', 'multilabel_classification_accuracy', - 'simcc_pck_accuracy', 'nms', 'oks_nms', 'soft_oks_nms', 'keypoint_mpjpe', - 'nms_torch', 'transform_ann', 'transform_sigmas', 'transform_pred', - 'nearby_joints_nms' + "keypoint_pck_accuracy", + "keypoint_auc", + "keypoint_nme", + "keypoint_epe", + "pose_pck_accuracy", + "multilabel_classification_accuracy", + "simcc_pck_accuracy", + "nms", + "oks_nms", + "soft_oks_nms", + "keypoint_mpjpe", + "nms_torch", + "transform_ann", + "transform_sigmas", + "transform_pred", + "nearby_joints_nms", ] diff --git a/mmpose/evaluation/functional/keypoint_eval.py b/mmpose/evaluation/functional/keypoint_eval.py index f5d5d0584b5ebe5da34abbe3ab99033b283956eb..01c7a7fccaead25b5dbc6f9060958be71e9e29cb 100644 --- a/mmpose/evaluation/functional/keypoint_eval.py +++ b/mmpose/evaluation/functional/keypoint_eval.py @@ -3,12 +3,12 @@ from typing import Optional, Tuple import numpy as np -from mmpose.codecs.utils import get_heatmap_maximum, get_simcc_maximum, get_heatmap_expected_value +from mmpose.codecs.utils import get_heatmap_expected_value, get_heatmap_maximum, get_simcc_maximum + from .mesh_eval import compute_similarity_transform -def _calc_distances(preds: np.ndarray, gts: np.ndarray, mask: np.ndarray, - norm_factor: np.ndarray) 
-> np.ndarray: +def _calc_distances(preds: np.ndarray, gts: np.ndarray, mask: np.ndarray, norm_factor: np.ndarray) -> np.ndarray: """Calculate the normalized distances between preds and target. Note: @@ -37,8 +37,7 @@ def _calc_distances(preds: np.ndarray, gts: np.ndarray, mask: np.ndarray, distances = np.full((N, K), -1, dtype=np.float32) # handle invalid values norm_factor[np.where(norm_factor <= 0)] = 1e6 - distances[_mask] = np.linalg.norm( - ((preds - gts) / norm_factor[:, None, :])[_mask], axis=-1) + distances[_mask] = np.linalg.norm(((preds - gts) / norm_factor[:, None, :])[_mask], axis=-1) return distances.T @@ -64,8 +63,7 @@ def _distance_acc(distances: np.ndarray, thr: float = 0.5) -> float: return -1 -def keypoint_pck_accuracy(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray, - thr: np.ndarray, norm_factor: np.ndarray) -> tuple: +def keypoint_pck_accuracy(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray, thr: np.ndarray, norm_factor: np.ndarray) -> tuple: """Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints for coordinates. @@ -103,11 +101,7 @@ def keypoint_pck_accuracy(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray, return acc, avg_acc, cnt -def keypoint_auc(pred: np.ndarray, - gt: np.ndarray, - mask: np.ndarray, - norm_factor: np.ndarray, - num_thrs: int = 20) -> float: +def keypoint_auc(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray, norm_factor: np.ndarray, num_thrs: int = 20) -> float: """Calculate the Area under curve (AUC) of keypoint PCK accuracy. Note: @@ -139,8 +133,7 @@ def keypoint_auc(pred: np.ndarray, return auc -def keypoint_nme(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray, - normalize_factor: np.ndarray) -> float: +def keypoint_nme(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray, normalize_factor: np.ndarray) -> float: """Calculate the normalized mean error (NME). Note: @@ -181,19 +174,19 @@ def keypoint_epe(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> float: float: Average end-point error. """ - distances = _calc_distances( - pred, gt, mask, - np.ones((pred.shape[0], pred.shape[2]), dtype=np.float32)) + distances = _calc_distances(pred, gt, mask, np.ones((pred.shape[0], pred.shape[2]), dtype=np.float32)) distance_valid = distances[distances != -1] return distance_valid.sum() / max(1, len(distance_valid)) -def pose_pck_accuracy(output: np.ndarray, - target: np.ndarray, - mask: np.ndarray, - thr: float = 0.05, - normalize: Optional[np.ndarray] = None, - method: str = 'argmax') -> tuple: +def pose_pck_accuracy( + output: np.ndarray, + target: np.ndarray, + mask: np.ndarray, + thr: float = 0.05, + normalize: Optional[np.ndarray] = None, + method: str = "argmax", +) -> tuple: """Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints from heatmaps. @@ -226,8 +219,8 @@ def pose_pck_accuracy(output: np.ndarray, - int: Number of valid keypoints. 
""" method = method.lower() - if method not in ['argmax', 'expected']: - raise ValueError(f'Invalid method: {method}') + if method not in ["argmax", "expected"]: + raise ValueError(f"Invalid method: {method}") N, K, H, W = output.shape if K == 0: @@ -235,7 +228,7 @@ def pose_pck_accuracy(output: np.ndarray, if normalize is None: normalize = np.tile(np.array([[H, W]]), (N, 1)) - if method == 'argmax': + if method == "argmax": pred, _ = get_heatmap_maximum(output) gt, _ = get_heatmap_maximum(target) else: @@ -244,12 +237,14 @@ def pose_pck_accuracy(output: np.ndarray, return keypoint_pck_accuracy(pred, gt, mask, thr, normalize) -def simcc_pck_accuracy(output: Tuple[np.ndarray, np.ndarray], - target: Tuple[np.ndarray, np.ndarray], - simcc_split_ratio: float, - mask: np.ndarray, - thr: float = 0.05, - normalize: Optional[np.ndarray] = None) -> tuple: +def simcc_pck_accuracy( + output: Tuple[np.ndarray, np.ndarray], + target: Tuple[np.ndarray, np.ndarray], + simcc_split_ratio: float, + mask: np.ndarray, + thr: float = 0.05, + normalize: Optional[np.ndarray] = None, +) -> tuple: """Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints from SimCC. @@ -297,10 +292,7 @@ def simcc_pck_accuracy(output: Tuple[np.ndarray, np.ndarray], return keypoint_pck_accuracy(pred_coords, gt_coords, mask, thr, normalize) -def multilabel_classification_accuracy(pred: np.ndarray, - gt: np.ndarray, - mask: np.ndarray, - thr: float = 0.5) -> float: +def multilabel_classification_accuracy(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray, thr: float = 0.5) -> float: """Get multi-label classification accuracy. Note: @@ -330,10 +322,7 @@ def multilabel_classification_accuracy(pred: np.ndarray, return acc -def keypoint_mpjpe(pred: np.ndarray, - gt: np.ndarray, - mask: np.ndarray, - alignment: str = 'none'): +def keypoint_mpjpe(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray, alignment: str = "none"): """Calculate the mean per-joint position error (MPJPE) and the error after rigid alignment with the ground truth (P-MPJPE). 
@@ -365,20 +354,17 @@ def keypoint_mpjpe(pred: np.ndarray, """ assert mask.any() - if alignment == 'none': + if alignment == "none": pass - elif alignment == 'procrustes': - pred = np.stack([ - compute_similarity_transform(pred_i, gt_i) - for pred_i, gt_i in zip(pred, gt) - ]) - elif alignment == 'scale': - pred_dot_pred = np.einsum('nkc,nkc->n', pred, pred) - pred_dot_gt = np.einsum('nkc,nkc->n', pred, gt) + elif alignment == "procrustes": + pred = np.stack([compute_similarity_transform(pred_i, gt_i) for pred_i, gt_i in zip(pred, gt)]) + elif alignment == "scale": + pred_dot_pred = np.einsum("nkc,nkc->n", pred, pred) + pred_dot_gt = np.einsum("nkc,nkc->n", pred, gt) scale_factor = pred_dot_gt / pred_dot_pred pred = pred * scale_factor[:, None, None] else: - raise ValueError(f'Invalid value for alignment: {alignment}') + raise ValueError(f"Invalid value for alignment: {alignment}") error = np.linalg.norm(pred - gt, ord=2, axis=-1)[mask].mean() return error diff --git a/mmpose/evaluation/functional/nms.py b/mmpose/evaluation/functional/nms.py index f7dd2279c74cb74ef943a02bff3998f4d03f744d..fc151b37db379a2c356ce9aa1bd076703352b442 100644 --- a/mmpose/evaluation/functional/nms.py +++ b/mmpose/evaluation/functional/nms.py @@ -55,12 +55,9 @@ def nms(dets: np.ndarray, thr: float) -> List[int]: return keep -def oks_iou(g: np.ndarray, - d: np.ndarray, - a_g: float, - a_d: np.ndarray, - sigmas: Optional[np.ndarray] = None, - vis_thr: Optional[float] = None) -> np.ndarray: +def oks_iou( + g: np.ndarray, d: np.ndarray, a_g: float, a_d: np.ndarray, sigmas: Optional[np.ndarray] = None, vis_thr: Optional[float] = None +) -> np.ndarray: """Calculate oks ious. Note: @@ -89,11 +86,8 @@ def oks_iou(g: np.ndarray, np.ndarray: The oks ious. """ if sigmas is None: - sigmas = np.array([ - .26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, - .87, .87, .89, .89 - ]) / 10.0 - vars = (sigmas * 2)**2 + sigmas = np.array([0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72, 0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89]) / 10.0 + vars = (sigmas * 2) ** 2 xg = g[0::3] yg = g[1::3] vg = g[2::3] @@ -112,11 +106,9 @@ def oks_iou(g: np.ndarray, return ious -def oks_nms(kpts_db: List[dict], - thr: float, - sigmas: Optional[np.ndarray] = None, - vis_thr: Optional[float] = None, - score_per_joint: bool = False): +def oks_nms( + kpts_db: List[dict], thr: float, sigmas: Optional[np.ndarray] = None, vis_thr: Optional[float] = None, score_per_joint: bool = False +): """OKS NMS implementations. 
Args: @@ -140,12 +132,12 @@ def oks_nms(kpts_db: List[dict], return [] if score_per_joint: - scores = np.array([k['score'].mean() for k in kpts_db]) + scores = np.array([k["score"].mean() for k in kpts_db]) else: - scores = np.array([k['score'] for k in kpts_db]) + scores = np.array([k["score"] for k in kpts_db]) - kpts = np.array([k['keypoints'].flatten() for k in kpts_db]) - areas = np.array([k['area'] for k in kpts_db]) + kpts = np.array([k["keypoints"].flatten() for k in kpts_db]) + areas = np.array([k["area"] for k in kpts_db]) order = scores.argsort()[::-1] @@ -154,8 +146,7 @@ def oks_nms(kpts_db: List[dict], i = order[0] keep.append(i) - oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]], - sigmas, vis_thr) + oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]], sigmas, vis_thr) inds = np.where(oks_ovr <= thr)[0] order = order[inds + 1] @@ -165,10 +156,7 @@ def oks_nms(kpts_db: List[dict], return keep -def _rescore(overlap: np.ndarray, - scores: np.ndarray, - thr: float, - type: str = 'gaussian'): +def _rescore(overlap: np.ndarray, scores: np.ndarray, thr: float, type: str = "gaussian"): """Rescoring mechanism gaussian or linear. Args: @@ -182,23 +170,25 @@ def _rescore(overlap: np.ndarray, np.ndarray: indexes to keep """ assert len(overlap) == len(scores) - assert type in ['gaussian', 'linear'] + assert type in ["gaussian", "linear"] - if type == 'linear': + if type == "linear": inds = np.where(overlap >= thr)[0] scores[inds] = scores[inds] * (1 - overlap[inds]) else: - scores = scores * np.exp(-overlap**2 / thr) + scores = scores * np.exp(-(overlap**2) / thr) return scores -def soft_oks_nms(kpts_db: List[dict], - thr: float, - max_dets: int = 20, - sigmas: Optional[np.ndarray] = None, - vis_thr: Optional[float] = None, - score_per_joint: bool = False): +def soft_oks_nms( + kpts_db: List[dict], + thr: float, + max_dets: int = 20, + sigmas: Optional[np.ndarray] = None, + vis_thr: Optional[float] = None, + score_per_joint: bool = False, +): """Soft OKS NMS implementations. Args: @@ -223,12 +213,12 @@ def soft_oks_nms(kpts_db: List[dict], return [] if score_per_joint: - scores = np.array([k['score'].mean() for k in kpts_db]) + scores = np.array([k["score"].mean() for k in kpts_db]) else: - scores = np.array([k['score'] for k in kpts_db]) + scores = np.array([k["score"] for k in kpts_db]) - kpts = np.array([k['keypoints'].flatten() for k in kpts_db]) - areas = np.array([k['area'] for k in kpts_db]) + kpts = np.array([k["keypoints"].flatten() for k in kpts_db]) + areas = np.array([k["area"] for k in kpts_db]) order = scores.argsort()[::-1] scores = scores[order] @@ -238,8 +228,7 @@ def soft_oks_nms(kpts_db: List[dict], while len(order) > 0 and keep_cnt < max_dets: i = order[0] - oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]], - sigmas, vis_thr) + oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]], sigmas, vis_thr) order = order[1:] scores = _rescore(oks_ovr, scores[1:], thr) @@ -282,22 +271,21 @@ def nearby_joints_nms( np.ndarray: indexes to keep. """ - assert dist_thr > 0, '`dist_thr` must be greater than 0.' + assert dist_thr > 0, "`dist_thr` must be greater than 0." 
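For context on what `oks_iou` above computes (and what `oks_nms` / `soft_oks_nms` threshold on): per-keypoint squared distances are scaled by the COCO sigmas and the mean instance area, pushed through a Gaussian, and averaged. A simplified standalone sketch that assumes all keypoints are visible:

```python
import numpy as np

# Simplified OKS mirroring the formula in `oks_iou`: Gaussian falloff per
# keypoint, scaled by the COCO sigmas and the mean object area.
SIGMAS = np.array([0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72,
                   0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89]) / 10.0

def oks(g_xy: np.ndarray, d_xy: np.ndarray, area_g: float, area_d: float) -> float:
    """g_xy, d_xy: (17, 2) keypoint coordinates of GT and detection."""
    var = (SIGMAS * 2) ** 2
    d2 = np.sum((g_xy - d_xy) ** 2, axis=-1)
    e = d2 / var / ((area_g + area_d) / 2 + np.spacing(1)) / 2
    return float(np.mean(np.exp(-e)))

gt = np.random.rand(17, 2) * 100
print(oks(gt, gt, 1e4, 1e4))        # identical poses -> 1.0
print(oks(gt, gt + 5.0, 1e4, 1e4))  # shifted pose -> < 1.0
```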
if len(kpts_db) == 0: return [] if score_per_joint: - scores = np.array([k['score'].mean() for k in kpts_db]) + scores = np.array([k["score"].mean() for k in kpts_db]) else: - scores = np.array([k['score'] for k in kpts_db]) + scores = np.array([k["score"] for k in kpts_db]) - kpts = np.array([k['keypoints'] for k in kpts_db]) + kpts = np.array([k["keypoints"] for k in kpts_db]) num_people, num_joints, _ = kpts.shape if num_nearby_joints_thr is None: num_nearby_joints_thr = num_joints // 2 - assert num_nearby_joints_thr < num_joints, '`num_nearby_joints_thr` must '\ - 'be less than the number of joints.' + assert num_nearby_joints_thr < num_joints, "`num_nearby_joints_thr` must " "be less than the number of joints." # compute distance threshold pose_area = kpts.max(axis=1) - kpts.min(axis=1) @@ -326,17 +314,13 @@ def nearby_joints_nms( # limit the number of output instances if max_dets > 0 and len(keep_pose_inds) > max_dets: - sub_inds = np.argsort(scores[keep_pose_inds])[-1:-max_dets - 1:-1] + sub_inds = np.argsort(scores[keep_pose_inds])[-1 : -max_dets - 1 : -1] keep_pose_inds = [keep_pose_inds[i] for i in sub_inds] return keep_pose_inds -def nms_torch(bboxes: Tensor, - scores: Tensor, - threshold: float = 0.65, - iou_calculator=bbox_overlaps, - return_group: bool = False): +def nms_torch(bboxes: Tensor, scores: Tensor, threshold: float = 0.65, iou_calculator=bbox_overlaps, return_group: bool = False): """Perform Non-Maximum Suppression (NMS) on a set of bounding boxes using their corresponding scores. diff --git a/mmpose/evaluation/functional/transforms.py b/mmpose/evaluation/functional/transforms.py index 56873b389cc145ceaad7f1307399f901a2ed0157..77df0ff2b647e6b1bb71accca90ec210aa62c7d2 100644 --- a/mmpose/evaluation/functional/transforms.py +++ b/mmpose/evaluation/functional/transforms.py @@ -4,9 +4,7 @@ from typing import List, Tuple, Union import numpy as np -def transform_sigmas(sigmas: Union[List, np.ndarray], num_keypoints: int, - mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, - int]]]): +def transform_sigmas(sigmas: Union[List, np.ndarray], num_keypoints: int, mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, int]]]): """Transforms the sigmas based on the mapping.""" if len(mapping): source_index, target_index = map(list, zip(*mapping)) @@ -27,9 +25,7 @@ def transform_sigmas(sigmas: Union[List, np.ndarray], num_keypoints: int, return new_sigmas -def transform_ann(ann_info: Union[dict, list], num_keypoints: int, - mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, - int]]]): +def transform_ann(ann_info: Union[dict, list], num_keypoints: int, mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, int]]]): """Transforms COCO-format annotations based on the mapping.""" if len(mapping): source_index, target_index = map(list, zip(*mapping)) @@ -42,17 +38,17 @@ def transform_ann(ann_info: Union[dict, list], num_keypoints: int, list_input = False for each in ann_info: - if 'keypoints' in each: - keypoints = np.array(each['keypoints']) + if "keypoints" in each: + keypoints = np.array(each["keypoints"]) C = 3 # COCO-format: x, y, score keypoints = keypoints.reshape(-1, C) new_keypoints = np.zeros((num_keypoints, C), dtype=keypoints.dtype) new_keypoints[target_index] = keypoints[source_index] - each['keypoints'] = new_keypoints.reshape(-1).tolist() + each["keypoints"] = new_keypoints.reshape(-1).tolist() - if 'num_keypoints' in each: - each['num_keypoints'] = num_keypoints + if "num_keypoints" in each: + each["num_keypoints"] = num_keypoints if not list_input: ann_info 
= ann_info[0] @@ -60,9 +56,7 @@ def transform_ann(ann_info: Union[dict, list], num_keypoints: int, return ann_info -def transform_pred(pred_info: Union[dict, list], num_keypoints: int, - mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, - int]]]): +def transform_pred(pred_info: Union[dict, list], num_keypoints: int, mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, int]]]): """Transforms predictions based on the mapping.""" if len(mapping): source_index, target_index = map(list, zip(*mapping)) @@ -75,23 +69,21 @@ def transform_pred(pred_info: Union[dict, list], num_keypoints: int, list_input = False for each in pred_info: - if 'keypoints' in each: - keypoints = np.array(each['keypoints']) + if "keypoints" in each: + keypoints = np.array(each["keypoints"]) N, _, C = keypoints.shape - new_keypoints = np.zeros((N, num_keypoints, C), - dtype=keypoints.dtype) + new_keypoints = np.zeros((N, num_keypoints, C), dtype=keypoints.dtype) new_keypoints[:, target_index] = keypoints[:, source_index] - each['keypoints'] = new_keypoints + each["keypoints"] = new_keypoints - keypoint_scores = np.array(each['keypoint_scores']) - new_scores = np.zeros((N, num_keypoints), - dtype=keypoint_scores.dtype) + keypoint_scores = np.array(each["keypoint_scores"]) + new_scores = np.zeros((N, num_keypoints), dtype=keypoint_scores.dtype) new_scores[:, target_index] = keypoint_scores[:, source_index] - each['keypoint_scores'] = new_scores + each["keypoint_scores"] = new_scores - if 'num_keypoints' in each: - each['num_keypoints'] = num_keypoints + if "num_keypoints" in each: + each["num_keypoints"] = num_keypoints if not list_input: pred_info = pred_info[0] diff --git a/mmpose/evaluation/metrics/__init__.py b/mmpose/evaluation/metrics/__init__.py index 9e82356a49f9cfa5136ed0478dc9dba3281fc837..da14d30cd2c40a5c6251eecd031fd5728e9b07c9 100644 --- a/mmpose/evaluation/metrics/__init__.py +++ b/mmpose/evaluation/metrics/__init__.py @@ -2,15 +2,24 @@ from .coco_metric import CocoMetric from .coco_wholebody_metric import CocoWholeBodyMetric from .hand_metric import InterHandMetric -from .keypoint_2d_metrics import (AUC, EPE, NME, JhmdbPCKAccuracy, - MpiiPCKAccuracy, PCKAccuracy) +from .keypoint_2d_metrics import AUC, EPE, NME, JhmdbPCKAccuracy, MpiiPCKAccuracy, PCKAccuracy from .keypoint_3d_metrics import MPJPE from .keypoint_partition_metric import KeypointPartitionMetric from .posetrack18_metric import PoseTrack18Metric from .simple_keypoint_3d_metrics import SimpleMPJPE __all__ = [ - 'CocoMetric', 'PCKAccuracy', 'MpiiPCKAccuracy', 'JhmdbPCKAccuracy', 'AUC', - 'EPE', 'NME', 'PoseTrack18Metric', 'CocoWholeBodyMetric', - 'KeypointPartitionMetric', 'MPJPE', 'InterHandMetric', 'SimpleMPJPE' + "CocoMetric", + "PCKAccuracy", + "MpiiPCKAccuracy", + "JhmdbPCKAccuracy", + "AUC", + "EPE", + "NME", + "PoseTrack18Metric", + "CocoWholeBodyMetric", + "KeypointPartitionMetric", + "MPJPE", + "InterHandMetric", + "SimpleMPJPE", ] diff --git a/mmpose/evaluation/metrics/coco_metric.py b/mmpose/evaluation/metrics/coco_metric.py index 440693e30167a11cc52bfcf1219e95c19213f4a3..cc5d075b2b8031e3e2068bd199dcc7b6af1ac4ef 100644 --- a/mmpose/evaluation/metrics/coco_metric.py +++ b/mmpose/evaluation/metrics/coco_metric.py @@ -1,32 +1,29 @@ # Copyright (c) OpenMMLab. All rights reserved. 
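Stepping back to the `transform_*` helpers above: `mapping` is a list of `(source_index, target_index)` pairs used to scatter keypoints from one skeleton layout into another, with unmapped target slots left at zero. A tiny illustration with a hypothetical 3-to-4 keypoint mapping:

```python
import numpy as np

# Hypothetical remapping in the style of transform_ann / transform_pred:
# mapping pairs scatter source keypoints into the target layout.
mapping = [(0, 0), (1, 2), (2, 1)]         # swap keypoints 1 and 2
source_index, target_index = map(list, zip(*mapping))

keypoints = np.array([[10.0, 10.0, 1.0],
                      [20.0, 20.0, 1.0],
                      [30.0, 30.0, 1.0]])  # (K, 3): x, y, score
num_keypoints = 4                          # target layout has one extra slot
new_keypoints = np.zeros((num_keypoints, 3), dtype=keypoints.dtype)
new_keypoints[target_index] = keypoints[source_index]
print(new_keypoints)  # rows 1 and 2 swapped, row 3 left as zeros
```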
import datetime +import os import os.path as osp import tempfile +import traceback from collections import OrderedDict, defaultdict from typing import Dict, Optional, Sequence -import traceback +import cv2 +import matplotlib.pyplot as plt import numpy as np +from matplotlib import rc from mmengine.evaluator import BaseMetric from mmengine.fileio import dump, get_local_path, load from mmengine.logging import MessageHub, MMLogger, print_log -from xtcocotools.coco import COCO -from xtcocotools.cocoeval import COCOeval from mmpose.registry import METRICS from mmpose.structures.bbox import bbox_xyxy2xywh from mmpose.structures.keypoint import find_min_padding_exact, fix_bbox_aspect_ratio -from ..functional import (oks_nms, soft_oks_nms, transform_ann, transform_pred, - transform_sigmas) - -import cv2 -import os - -import matplotlib.pyplot as plt -from matplotlib import rc - +from xtcocotools.coco import COCO +from xtcocotools.cocoeval import COCOeval from xtcocotools.mask import _mask as maskUtils +from ..functional import oks_nms, soft_oks_nms, transform_ann, transform_pred, transform_sigmas + @METRICS.register_module() class CocoMetric(BaseMetric): @@ -101,28 +98,31 @@ class CocoMetric(BaseMetric): If prefix is not provided in the argument, ``self.default_prefix`` will be used instead. Defaults to ``None`` """ - default_prefix: Optional[str] = 'coco' - - def __init__(self, - ann_file: Optional[str] = None, - use_area: bool = True, - iou_type: str = 'keypoints', - score_mode: str = 'bbox_keypoint', - score_thresh_type: str = 'score', - keypoint_score_thr: float = 0.2, - nms_mode: str = 'oks_nms', - nms_thr: float = 0.9, - format_only: bool = False, - pred_converter: Dict = None, - gt_converter: Dict = None, - outfile_prefix: Optional[str] = None, - collect_device: str = 'cpu', - prefix: Optional[str] = None, - extended: list = [False], - match_by_bbox: list = [False], - ignore_border_points: list = [False], - ignore_stats: list = [], - padding: float = 1.25) -> None: + + default_prefix: Optional[str] = "coco" + + def __init__( + self, + ann_file: Optional[str] = None, + use_area: bool = True, + iou_type: str = "keypoints", + score_mode: str = "bbox_keypoint", + score_thresh_type: str = "score", + keypoint_score_thr: float = 0.2, + nms_mode: str = "oks_nms", + nms_thr: float = 0.9, + format_only: bool = False, + pred_converter: Dict = None, + gt_converter: Dict = None, + outfile_prefix: Optional[str] = None, + collect_device: str = "cpu", + prefix: Optional[str] = None, + extended: list = [False], + match_by_bbox: list = [False], + ignore_border_points: list = [False], + ignore_stats: list = [], + padding: float = 1.25, + ) -> None: super().__init__(collect_device=collect_device, prefix=prefix) self.ann_file = ann_file # initialize coco helper with the annotation json file @@ -136,37 +136,31 @@ class CocoMetric(BaseMetric): self.use_area = use_area self.iou_type = iou_type - allowed_score_modes = ['bbox', 'bbox_keypoint', 'bbox_rle', 'keypoint'] + allowed_score_modes = ["bbox", "bbox_keypoint", "bbox_rle", "keypoint"] if score_mode not in allowed_score_modes: - raise ValueError( - "`score_mode` should be one of 'bbox', 'bbox_keypoint', " - f"'bbox_rle', but got {score_mode}") + raise ValueError("`score_mode` should be one of 'bbox', 'bbox_keypoint', 'bbox_rle' " f"or 'keypoint', but got {score_mode}") self.score_mode = score_mode self.keypoint_score_thr = keypoint_score_thr - if score_thresh_type not in ['score', 'prob']: - raise ValueError( - "'score_thresh_type' should be one of 'score' or 'prob'" - ) + if
score_thresh_type not in ["score", "prob"]: + raise ValueError("'score_thresh_type' should be one of 'score' or 'prob'") self.score_thresh_type = score_thresh_type - allowed_nms_modes = ['oks_nms', 'soft_oks_nms', 'none'] + allowed_nms_modes = ["oks_nms", "soft_oks_nms", "none"] if nms_mode not in allowed_nms_modes: - raise ValueError( - "`nms_mode` should be one of 'oks_nms', 'soft_oks_nms', " - f"'none', but got {nms_mode}") + raise ValueError("`nms_mode` should be one of 'oks_nms', 'soft_oks_nms', " f"'none', but got {nms_mode}") self.nms_mode = nms_mode self.nms_thr = nms_thr if format_only: - assert outfile_prefix is not None, '`outfile_prefix` can not be '\ - 'None when `format_only` is True, otherwise the result file '\ - 'will be saved to a temp directory which will be cleaned up '\ - 'in the end.' + assert outfile_prefix is not None, ( + "`outfile_prefix` can not be " + "None when `format_only` is True, otherwise the result file " + "will be saved to a temp directory which will be cleaned up " + "in the end." + ) elif ann_file is not None: # do evaluation only if the ground truth annotations exist - assert 'annotations' in load(ann_file), \ - 'Ground truth annotations are required for evaluation '\ - 'when `format_only` is False.' + assert "annotations" in load(ann_file), "Ground truth annotations are required for evaluation " "when `format_only` is False." self.format_only = format_only self.outfile_prefix = outfile_prefix @@ -178,10 +172,8 @@ class CocoMetric(BaseMetric): extended = extended * len_params if len(match_by_bbox) == 1 and len_params > 1: match_by_bbox = match_by_bbox * len_params - assert len(extended) == len(match_by_bbox), \ - 'The length of `extended` and `match_by_bbox` should be the same.' - assert len(extended) >= 1, \ - 'The length of `extended` and `match_by_bbox` should be at least 1.' + assert len(extended) == len(match_by_bbox), "The length of `extended` and `match_by_bbox` should be the same." + assert len(extended) >= 1, "The length of `extended` and `match_by_bbox` should be at least 1." 
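To make the `extended` / `match_by_bbox` handling above concrete: both are per-evaluation-pass lists, and a single-element list is broadcast to the number of passes before the length assert. A hypothetical MMEngine-style evaluator config (field names follow the `__init__` signature above; the values are illustrative only, not recommended settings):

```python
# Hypothetical config sketch: one extended/match_by_bbox flag per
# evaluation pass; single-element lists are broadcast to all passes.
val_evaluator = dict(
    type="CocoMetric",
    ann_file="data/coco/annotations/person_keypoints_val2017.json",
    score_mode="bbox_keypoint",
    nms_mode="oks_nms",
    extended=[False, True],   # a standard COCO pass plus an extended pass
    match_by_bbox=[False],    # broadcast to both passes
    padding=1.25,
)
```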
self.extended = extended self.match_by_bbox = match_by_bbox self.ignore_border_points = ignore_border_points @@ -202,42 +194,42 @@ class CocoMetric(BaseMetric): def dataset_meta(self, dataset_meta: dict) -> None: """Set the dataset meta info to the metric.""" if self.gt_converter is not None: - dataset_meta['sigmas'] = transform_sigmas( - dataset_meta['sigmas'], self.gt_converter['num_keypoints'], - self.gt_converter['mapping']) - dataset_meta['num_keypoints'] = len(dataset_meta['sigmas']) + dataset_meta["sigmas"] = transform_sigmas( + dataset_meta["sigmas"], self.gt_converter["num_keypoints"], self.gt_converter["mapping"] + ) + dataset_meta["num_keypoints"] = len(dataset_meta["sigmas"]) self._dataset_meta = dataset_meta if self.coco is None: message = MessageHub.get_current_instance() - ann_file = message.get_info( - f"{dataset_meta['dataset_name']}_ann_file", None) + ann_file = message.get_info(f"{dataset_meta['dataset_name']}_ann_file", None) if ann_file is not None: with get_local_path(ann_file) as local_path: self.coco = COCO(local_path) print_log( - f'CocoMetric for dataset ' + f"CocoMetric for dataset " f"{dataset_meta['dataset_name']} has successfully " - f'loaded the annotation file from {ann_file}', 'current') + f"loaded the annotation file from {ann_file}", + "current", + ) def _compute_min_padding_in_coco(self): """Compute the minimum padding in COCO format.""" if self.coco is None: return - + for _, ann in self.coco.anns.items(): - if 'pad_to_contain' in ann.keys(): + if "pad_to_contain" in ann.keys(): continue - kpts = np.array(ann['keypoints']).reshape(-1, 3) - bbox = np.array(ann['bbox']).flatten() + kpts = np.array(ann["keypoints"]).reshape(-1, 3) + bbox = np.array(ann["bbox"]).flatten() min_padding = find_min_padding_exact(bbox, kpts) - ann['pad_to_contain'] = min_padding + ann["pad_to_contain"] = min_padding return - def process(self, data_batch: Sequence[dict], - data_samples: Sequence[dict]) -> None: + def process(self, data_batch: Sequence[dict], data_samples: Sequence[dict]) -> None: """Process one batch of data samples and predictions. The processed results should be stored in ``self.results``, which will be used to compute the metrics when all batches have been processed. @@ -254,40 +246,38 @@ class CocoMetric(BaseMetric): """ self.results_len = len(self.results) for data_sample in data_samples: - if 'pred_instances' not in data_sample: - raise ValueError( - '`pred_instances` are required to process the ' - f'predictions results in {self.__class__.__name__}. ') + if "pred_instances" not in data_sample: + raise ValueError("`pred_instances` is required to process the " f"prediction results in {self.__class__.__name__}. 
") # keypoints.shape: [N, K, 2], # N: number of instances, K: number of keypoints # for topdown-style output, N is usually 1, while for # bottomup-style output, N is the number of instances in the image - keypoints = data_sample['pred_instances']['keypoints'] + keypoints = data_sample["pred_instances"]["keypoints"] N, K, _ = keypoints.shape # [N, K], the scores for all keypoints of all instances - keypoint_scores = data_sample['pred_instances']['keypoint_scores'] + keypoint_scores = data_sample["pred_instances"]["keypoint_scores"] assert keypoint_scores.shape == keypoints.shape[:2] - - if 'keypoints_visible' in data_sample['pred_instances']: - keypoints_visible = data_sample['pred_instances']['keypoints_visible'] + + if "keypoints_visible" in data_sample["pred_instances"]: + keypoints_visible = data_sample["pred_instances"]["keypoints_visible"] else: keypoints_visible = keypoint_scores.copy() - - if 'keypoints_probs' in data_sample['pred_instances']: - keypoints_probs = data_sample['pred_instances']['keypoints_probs'] + + if "keypoints_probs" in data_sample["pred_instances"]: + keypoints_probs = data_sample["pred_instances"]["keypoints_probs"] # keypoints_probs = keypoint_scores.copy() else: self.has_probability = False keypoints_probs = keypoint_scores.copy() - if 'keypoints_oks' in data_sample['pred_instances']: - keypoints_oks = data_sample['pred_instances']['keypoints_oks'] + if "keypoints_oks" in data_sample["pred_instances"]: + keypoints_oks = data_sample["pred_instances"]["keypoints_oks"] else: keypoints_oks = keypoint_scores.copy() - if 'keypoints_error' in data_sample['pred_instances']: - keypoints_error = data_sample['pred_instances']['keypoints_error'] + if "keypoints_error" in data_sample["pred_instances"]: + keypoints_error = data_sample["pred_instances"]["keypoints_error"] else: keypoints_error = keypoint_scores.copy() @@ -301,80 +291,72 @@ class CocoMetric(BaseMetric): keypoints_error = keypoints_error[:, :17] elif K != 17: - raise ValueError('The number of keypoints should be 17 or 21, ' - f'but got {K}.') + raise ValueError("The number of keypoints should be 17 or 21, " f"but got {K}.") - assert keypoints.shape[1] == 17, f'Number of keypoints should be 17 but got {keypoints.shape}' - assert keypoint_scores.shape[1] == 17, f'Number of keypoint scores should be 17 but got {keypoint_scores.shape}' - assert keypoints_visible.shape[1] == 17, f'Number of visible keypoints should be 17 but got {keypoints_visible.shape}' - assert keypoints_probs.shape[1] == 17, f'Number of keypoint probs should be 17 but got {keypoints_probs.shape}' - assert keypoints_oks.shape[1] == 17, f'Number of keypoint oks should be 17 but got {keypoints_oks.shape}' - assert keypoints_error.shape[1] == 17, f'Number of keypoint error should be 17 but got {keypoints_error.shape}' - assert heatmaps.shape[1] == 17, f'Number of heatmaps should be 17 but got {heatmaps.shape}' + assert keypoints.shape[1] == 17, f"Number of keypoints should be 17 but got {keypoints.shape}" + assert keypoint_scores.shape[1] == 17, f"Number of keypoint scores should be 17 but got {keypoint_scores.shape}" + assert keypoints_visible.shape[1] == 17, f"Number of visible keypoints should be 17 but got {keypoints_visible.shape}" + assert keypoints_probs.shape[1] == 17, f"Number of keypoint probs should be 17 but got {keypoints_probs.shape}" + assert keypoints_oks.shape[1] == 17, f"Number of keypoint oks should be 17 but got {keypoints_oks.shape}" + assert keypoints_error.shape[1] == 17, f"Number of keypoint error should be 17 but got 
{keypoints_error.shape}" + assert heatmaps.shape[1] == 17, f"Number of heatmaps should be 17 but got {heatmaps.shape}" # parse prediction results pred = dict() - pred['id'] = data_sample['id'] - pred['img_id'] = data_sample['img_id'] - - pred['keypoints'] = keypoints - pred['keypoint_scores'] = keypoint_scores - pred['keypoints_visible'] = keypoints_visible - pred['keypoint_probs'] = keypoints_probs - pred['keypoint_oks'] = keypoints_oks - pred['keypoint_error'] = keypoints_error - pred['category_id'] = data_sample.get('category_id', 1) - if 'bboxes' in data_sample['pred_instances']: - pred['bbox'] = bbox_xyxy2xywh( - data_sample['pred_instances']['bboxes']) - - if 'bbox_scores' in data_sample['pred_instances']: + pred["id"] = data_sample["id"] + pred["img_id"] = data_sample["img_id"] + + pred["keypoints"] = keypoints + pred["keypoint_scores"] = keypoint_scores + pred["keypoints_visible"] = keypoints_visible + pred["keypoint_probs"] = keypoints_probs + pred["keypoint_oks"] = keypoints_oks + pred["keypoint_error"] = keypoints_error + pred["category_id"] = data_sample.get("category_id", 1) + if "bboxes" in data_sample["pred_instances"]: + pred["bbox"] = bbox_xyxy2xywh(data_sample["pred_instances"]["bboxes"]) + + if "bbox_scores" in data_sample["pred_instances"]: # some one-stage models will predict bboxes and scores # together with keypoints - bbox_scores = data_sample['pred_instances']['bbox_scores'] - elif ('bbox_scores' not in data_sample['gt_instances'] - or len(data_sample['gt_instances']['bbox_scores']) != - len(keypoints)): + bbox_scores = data_sample["pred_instances"]["bbox_scores"] + elif "bbox_scores" not in data_sample["gt_instances"] or len(data_sample["gt_instances"]["bbox_scores"]) != len(keypoints): # bottom-up models might output different number of # instances from annotation bbox_scores = np.ones(len(keypoints)) else: # top-down models use detected bboxes, the scores of which # are contained in the gt_instances - bbox_scores = data_sample['gt_instances']['bbox_scores'] - pred['bbox_scores'] = bbox_scores + bbox_scores = data_sample["gt_instances"]["bbox_scores"] + pred["bbox_scores"] = bbox_scores # get area information - if 'bbox_scales' in data_sample['gt_instances']: - pred['areas'] = np.prod( - data_sample['gt_instances']['bbox_scales'], axis=1) + if "bbox_scales" in data_sample["gt_instances"]: + pred["areas"] = np.prod(data_sample["gt_instances"]["bbox_scales"], axis=1) # parse gt gt = dict() if self.coco is None: - gt['width'] = data_sample['ori_shape'][1] - gt['height'] = data_sample['ori_shape'][0] - gt['img_id'] = data_sample['img_id'] - if self.iou_type == 'keypoints_crowd': - assert 'crowd_index' in data_sample, \ '`crowd_index` is required when `self.iou_type` is ' \ '`keypoints_crowd`' - gt['crowd_index'] = data_sample['crowd_index'] - assert 'raw_ann_info' in data_sample, \ 'The row ground truth annotations are required for ' \ 'evaluation when `ann_file` is not provided' - anns = data_sample['raw_ann_info'] - gt['raw_ann_info'] = anns if isinstance(anns, list) else [anns] + gt["width"] = data_sample["ori_shape"][1] + gt["height"] = data_sample["ori_shape"][0] + gt["img_id"] = data_sample["img_id"] + if self.iou_type == "keypoints_crowd": + assert "crowd_index" in data_sample, "`crowd_index` is required when `self.iou_type` is " "`keypoints_crowd`" + gt["crowd_index"] = data_sample["crowd_index"] + assert "raw_ann_info" in data_sample, ( "The raw ground truth annotations are required for " "evaluation when `ann_file` is not provided" ) + anns =
data_sample["raw_ann_info"] + gt["raw_ann_info"] = anns if isinstance(anns, list) else [anns] # add converted result to the results list self.results.append((pred, gt)) processed_len = len(self.results) - self.results_len if processed_len != len(data_samples): - print(f'Warning: {processed_len} samples are processed, ') - print(f'but {len(data_samples)} samples are provided.') - - def gt_to_coco_json(self, gt_dicts: Sequence[dict], - outfile_prefix: str) -> str: + print(f"Warning: {processed_len} samples are processed, ") + print(f"but {len(data_samples)} samples are provided.") + + def gt_to_coco_json(self, gt_dicts: Sequence[dict], outfile_prefix: str) -> str: """Convert ground truth to coco format json file. Args: @@ -419,58 +401,53 @@ class CocoMetric(BaseMetric): for gt_dict in gt_dicts: # filter duplicate image_info - if gt_dict['img_id'] not in img_ids: + if gt_dict["img_id"] not in img_ids: image_info = dict( - id=gt_dict['img_id'], - width=gt_dict['width'], - height=gt_dict['height'], + id=gt_dict["img_id"], + width=gt_dict["width"], + height=gt_dict["height"], ) - if self.iou_type == 'keypoints_crowd': - image_info['crowdIndex'] = gt_dict['crowd_index'] + if self.iou_type == "keypoints_crowd": + image_info["crowdIndex"] = gt_dict["crowd_index"] image_infos.append(image_info) - img_ids.append(gt_dict['img_id']) + img_ids.append(gt_dict["img_id"]) # filter duplicate annotations - for ann in gt_dict['raw_ann_info']: + for ann in gt_dict["raw_ann_info"]: if ann is None: # during evaluation on bottom-up datasets, some images # do not have instance annotation continue annotation = dict( - id=ann['id'], - image_id=ann['image_id'], - category_id=ann['category_id'], - bbox=ann['bbox'], - keypoints=ann['keypoints'], - iscrowd=ann['iscrowd'], + id=ann["id"], + image_id=ann["image_id"], + category_id=ann["category_id"], + bbox=ann["bbox"], + keypoints=ann["keypoints"], + iscrowd=ann["iscrowd"], ) if self.use_area: - assert 'area' in ann, \ - '`area` is required when `self.use_area` is `True`' - annotation['area'] = ann['area'] + assert "area" in ann, "`area` is required when `self.use_area` is `True`" + annotation["area"] = ann["area"] - if self.iou_type == 'keypoints_crowd': - assert 'num_keypoints' in ann, \ - '`num_keypoints` is required when `self.iou_type` ' \ - 'is `keypoints_crowd`' - annotation['num_keypoints'] = ann['num_keypoints'] + if self.iou_type == "keypoints_crowd": + assert "num_keypoints" in ann, "`num_keypoints` is required when `self.iou_type` " "is `keypoints_crowd`" + annotation["num_keypoints"] = ann["num_keypoints"] annotations.append(annotation) - ann_ids.append(ann['id']) + ann_ids.append(ann["id"]) - info = dict( - date_created=str(datetime.datetime.now()), - description='Coco json file converted by mmpose CocoMetric.') + info = dict(date_created=str(datetime.datetime.now()), description="Coco json file converted by mmpose CocoMetric.") coco_json = dict( info=info, images=image_infos, - categories=self.dataset_meta['CLASSES'], + categories=self.dataset_meta["CLASSES"], licenses=None, annotations=annotations, ) - converted_json_path = f'{outfile_prefix}.gt.json' + converted_json_path = f"{outfile_prefix}.gt.json" dump(coco_json, converted_json_path, sort_keys=True, indent=4) return converted_json_path @@ -492,117 +469,101 @@ class CocoMetric(BaseMetric): tmp_dir = None if self.outfile_prefix is None: tmp_dir = tempfile.TemporaryDirectory() - outfile_prefix = osp.join(tmp_dir.name, 'results') + outfile_prefix = osp.join(tmp_dir.name, "results") else: outfile_prefix = 
self.outfile_prefix

         if self.coco is None:
             # use converted gt json file to initialize coco helper
-            logger.info('Converting ground truth to coco format...')
-            coco_json_path = self.gt_to_coco_json(
-                gt_dicts=gts, outfile_prefix=outfile_prefix)
+            logger.info("Converting ground truth to coco format...")
+            coco_json_path = self.gt_to_coco_json(gt_dicts=gts, outfile_prefix=outfile_prefix)
             self.coco = COCO(coco_json_path)

         if self.gt_converter is not None:
             for id_, ann in self.coco.anns.items():
-                self.coco.anns[id_] = transform_ann(
-                    ann, self.gt_converter['num_keypoints'],
-                    self.gt_converter['mapping'])
+                self.coco.anns[id_] = transform_ann(ann, self.gt_converter["num_keypoints"], self.gt_converter["mapping"])

         kpts = defaultdict(list)

         # group the preds by img_id
         for pred in preds:
-            img_id = pred['img_id']
+            img_id = pred["img_id"]

             if self.pred_converter is not None:
-                pred = transform_pred(pred,
-                                      self.pred_converter['num_keypoints'],
-                                      self.pred_converter['mapping'])
+                pred = transform_pred(pred, self.pred_converter["num_keypoints"], self.pred_converter["mapping"])

-            for idx, keypoints in enumerate(pred['keypoints']):
-                instance = {
-                    'id': pred['id'],
-                    'img_id': pred['img_id'],
-                    'category_id': pred['category_id'],
-                    'keypoints': keypoints,
-                    'keypoint_scores': pred['keypoint_scores'][idx],
-                    'bbox_score': pred['bbox_scores'][idx],
-                    'keypoints_visible': pred['keypoints_visible'][idx],
-                    'keypoint_probs': pred['keypoint_probs'][idx],
-                    'keypoint_oks': pred['keypoint_oks'][idx],
-                    'keypoint_error': pred['keypoint_error'][idx],
+            for idx, keypoints in enumerate(pred["keypoints"]):
+                instance = {
+                    "id": pred["id"],
+                    "img_id": pred["img_id"],
+                    "category_id": pred["category_id"],
+                    "keypoints": keypoints,
+                    "keypoint_scores": pred["keypoint_scores"][idx],
+                    "bbox_score": pred["bbox_scores"][idx],
+                    "keypoints_visible": pred["keypoints_visible"][idx],
+                    "keypoint_probs": pred["keypoint_probs"][idx],
+                    "keypoint_oks": pred["keypoint_oks"][idx],
+                    "keypoint_error": pred["keypoint_error"][idx],
                 }

-                if 'bbox' in pred:
-                    instance['bbox'] = pred['bbox'][idx]
-                    diagonal = np.sqrt(
-                        instance['bbox'][2]**2 + instance['bbox'][3]**2)
-                if 'areas' in pred:
-                    instance['area'] = pred['areas'][idx]
-                    diagonal = np.sqrt(instance['area'])
+                if "bbox" in pred:
+                    instance["bbox"] = pred["bbox"][idx]
+                    diagonal = np.sqrt(instance["bbox"][2] ** 2 + instance["bbox"][3] ** 2)
+                if "areas" in pred:
+                    instance["area"] = pred["areas"][idx]
+                    diagonal = np.sqrt(instance["area"])
                 else:
                     # use keypoint to calculate bbox and get area
-                    area = (
-                        np.max(keypoints[:, 0]) - np.min(keypoints[:, 0])) * (
-                            np.max(keypoints[:, 1]) - np.min(keypoints[:, 1]))
-                    instance['area'] = area
+                    area = (np.max(keypoints[:, 0]) - np.min(keypoints[:, 0])) * (np.max(keypoints[:, 1]) - np.min(keypoints[:, 1]))
+                    instance["area"] = area
                     diagonal = np.sqrt(area)

                 kpts[img_id].append(instance)

         # sort keypoint results according to id and remove duplicate ones
-        kpts = self._sort_and_unique_bboxes(kpts, key='id')
+        kpts = self._sort_and_unique_bboxes(kpts, key="id")

         # score the prediction results according to `score_mode`
         # and perform NMS according to `nms_mode`
         valid_kpts = defaultdict(list)
         if self.pred_converter is not None:
-            num_keypoints = self.pred_converter['num_keypoints']
+            num_keypoints = self.pred_converter["num_keypoints"]
         else:
-            num_keypoints = self.dataset_meta['num_keypoints']
+            num_keypoints = self.dataset_meta["num_keypoints"]
         for img_id, instances in kpts.items():
             for instance in instances:
                 # concatenate the 
keypoint coordinates and scores - instance['keypoints'] = np.concatenate([ - instance['keypoints'], instance['keypoint_probs'][:, None] - ], - axis=-1) - if self.score_mode == 'bbox': - instance['score'] = instance['bbox_score'] - elif self.score_mode == 'keypoint': - instance['score'] = np.mean(instance['keypoint_scores']) + instance["keypoints"] = np.concatenate([instance["keypoints"], instance["keypoint_probs"][:, None]], axis=-1) + if self.score_mode == "bbox": + instance["score"] = instance["bbox_score"] + elif self.score_mode == "keypoint": + instance["score"] = np.mean(instance["keypoint_scores"]) else: - bbox_score = instance['bbox_score'] - if self.score_mode == 'bbox_rle': - keypoint_scores = instance['keypoint_scores'] - instance['score'] = float(bbox_score + - np.mean(keypoint_scores) + - np.max(keypoint_scores)) + bbox_score = instance["bbox_score"] + if self.score_mode == "bbox_rle": + keypoint_scores = instance["keypoint_scores"] + instance["score"] = float(bbox_score + np.mean(keypoint_scores) + np.max(keypoint_scores)) else: # self.score_mode == 'bbox_keypoint': mean_kpt_score = 0 valid_num = 0 for kpt_idx in range(num_keypoints): - kpt_score = instance['keypoint_scores'][kpt_idx] - kpt_prob = instance['keypoint_probs'][kpt_idx] - kpt_thresh = kpt_score if self.score_thresh_type == 'score' else kpt_prob + kpt_score = instance["keypoint_scores"][kpt_idx] + kpt_prob = instance["keypoint_probs"][kpt_idx] + kpt_thresh = kpt_score if self.score_thresh_type == "score" else kpt_prob if kpt_thresh > self.keypoint_score_thr: mean_kpt_score += kpt_score valid_num += 1 if valid_num != 0: mean_kpt_score /= valid_num - instance['score'] = bbox_score * mean_kpt_score + instance["score"] = bbox_score * mean_kpt_score # perform nms - if self.nms_mode == 'none': + if self.nms_mode == "none": valid_kpts[img_id] = instances else: - nms = oks_nms if self.nms_mode == 'oks_nms' else soft_oks_nms - keep = nms( - instances, - self.nms_thr, - sigmas=self.dataset_meta['sigmas']) + nms = oks_nms if self.nms_mode == "oks_nms" else soft_oks_nms + keep = nms(instances, self.nms_thr, sigmas=self.dataset_meta["sigmas"]) valid_kpts[img_id] = [instances[_keep] for _keep in keep] # convert results to coco style and dump into a json file @@ -610,14 +571,13 @@ class CocoMetric(BaseMetric): # only format the results without doing quantitative evaluation if self.format_only: - logger.info('results are saved in ' - f'{osp.dirname(outfile_prefix)}') + logger.info("results are saved in " f"{osp.dirname(outfile_prefix)}") return {} eval_results = OrderedDict() - + # mAP evaluation results - logger.info(f'Evaluating {self.__class__.__name__}...') + logger.info(f"Evaluating {self.__class__.__name__}...") self.prob_thr = 0.51 # Localization evaluation results @@ -625,15 +585,13 @@ class CocoMetric(BaseMetric): name_value = OrderedDict(info_str) eval_results.update(name_value) - - logger.info('Number of values per dataset: {}'.format(len(eval_results))) + logger.info("Number of values per dataset: {}".format(len(eval_results))) if tmp_dir is not None: tmp_dir.cleanup() return eval_results - def results2json(self, keypoints: Dict[int, list], - outfile_prefix: str) -> str: + def results2json(self, keypoints: Dict[int, list], outfile_prefix: str) -> str: """Dump the keypoint detection results to a COCO style json file. 
Args: @@ -650,29 +608,28 @@ class CocoMetric(BaseMetric): cat_results = [] for _, img_kpts in keypoints.items(): - _keypoints = np.array( - [img_kpt['keypoints'] for img_kpt in img_kpts]) - num_keypoints = self.dataset_meta['num_keypoints'] + _keypoints = np.array([img_kpt["keypoints"] for img_kpt in img_kpts]) + num_keypoints = self.dataset_meta["num_keypoints"] # collect all the person keypoints in current image _keypoints = _keypoints.reshape(-1, num_keypoints * 3) result = [] for img_kpt, keypoint in zip(img_kpts, _keypoints): res = { - 'image_id': img_kpt['img_id'], - 'category_id': img_kpt['category_id'], - 'keypoints': keypoint.tolist(), - 'score': float(img_kpt['score']), + "image_id": img_kpt["img_id"], + "category_id": img_kpt["category_id"], + "keypoints": keypoint.tolist(), + "score": float(img_kpt["score"]), } - if 'bbox' in img_kpt: - res['bbox'] = img_kpt['bbox'].tolist() - if 'keypoint_probs' in img_kpt: - res['probs'] = img_kpt['keypoint_probs'].tolist() + if "bbox" in img_kpt: + res["bbox"] = img_kpt["bbox"].tolist() + if "keypoint_probs" in img_kpt: + res["probs"] = img_kpt["keypoint_probs"].tolist() result.append(res) cat_results.extend(result) - res_file = f'{outfile_prefix}.keypoints.json' + res_file = f"{outfile_prefix}.keypoints.json" dump(cat_results, res_file, sort_keys=True, indent=4) def _do_python_keypoint_eval(self, outfile_prefix: str) -> list: @@ -687,14 +644,12 @@ class CocoMetric(BaseMetric): list: a list of tuples. Each tuple contains the evaluation stats name and corresponding stats value. """ - res_file = f'{outfile_prefix}.keypoints.json' + res_file = f"{outfile_prefix}.keypoints.json" coco_det = self.coco.loadRes(res_file) - sigmas = self.dataset_meta['sigmas'] + sigmas = self.dataset_meta["sigmas"] info_str = [] - for extended_oks, match_by_bbox, ignore_border_points in zip( - self.extended, self.match_by_bbox, self.ignore_border_points - ): + for extended_oks, match_by_bbox, ignore_border_points in zip(self.extended, self.match_by_bbox, self.ignore_border_points): prefix = "" suffix = "" if match_by_bbox: @@ -705,10 +660,8 @@ class CocoMetric(BaseMetric): suffix = suffix + "_NoBrd" conf_thr = self.prob_thr - print("+"*80) - print("COCO Eval params: Bbox {:5s}, ExOKS {:5s}".format( - str(match_by_bbox), str(extended_oks) - ), end="") + print("+" * 80) + print("COCO Eval params: Bbox {:5s}, ExOKS {:5s}".format(str(match_by_bbox), str(extended_oks)), end="") if extended_oks: print(" with conf_thr: {:.2f} (has probability: {})".format(conf_thr, self.has_probability), end="") print() @@ -723,7 +676,7 @@ class CocoMetric(BaseMetric): match_by_bbox=match_by_bbox, confidence_thr=conf_thr, padding=self.padding, - ignore_near_bbox=ignore_border_points + ignore_near_bbox=ignore_border_points, ) coco_eval.params.useSegm = None coco_eval.evaluate() @@ -733,31 +686,23 @@ class CocoMetric(BaseMetric): try: stats_names = coco_eval.stats_names except AttributeError: - if self.iou_type == 'keypoints_crowd': - stats_names = [ - 'AP', 'AP .5', 'AP .75', 'AR', 'AR .5', 'AR .75', - 'AP(E)', 'AP(M)', 'AP(H)' - ] + if self.iou_type == "keypoints_crowd": + stats_names = ["AP", "AP .5", "AP .75", "AR", "AR .5", "AR .75", "AP(E)", "AP(M)", "AP(H)"] else: - stats_names = [ - 'AP', 'AP .5', 'AP .75', 'AP (M)', 'AP (L)', 'AR', - 'AR .5', 'AR .75', 'AR (M)', 'AR (L)' - ] + stats_names = ["AP", "AP .5", "AP .75", "AP (M)", "AP (L)", "AR", "AR .5", "AR .75", "AR (M)", "AR (L)"] i_str = list(zip(stats_names, coco_eval.stats)) ignore_stats = self.ignore_stats # if match_by_bbox or 
extended_oks: # ignore_stats.extend(['AP (M)', 'AP (L)', 'AR (M)', 'AR (L)', 'AR']) i_str = [(k, v) for k, v in i_str if k not in self.ignore_stats] - i_str = [(f'{prefix}{k}', v) for k, v in i_str] - i_str = [(f'{k}{suffix}', v) for k, v in i_str] + i_str = [(f"{prefix}{k}", v) for k, v in i_str] + i_str = [(f"{k}{suffix}", v) for k, v in i_str] info_str.extend(i_str) return info_str - def _sort_and_unique_bboxes(self, - kpts: Dict[int, list], - key: str = 'id') -> Dict[int, list]: + def _sort_and_unique_bboxes(self, kpts: Dict[int, list], key: str = "id") -> Dict[int, list]: """Sort keypoint detection results in each image and remove the duplicate ones. Usually performed in multi-batch testing. diff --git a/mmpose/evaluation/metrics/coco_wholebody_metric.py b/mmpose/evaluation/metrics/coco_wholebody_metric.py index 74dc52c2ad1db6ca4d296ed2b620bcf7290f93c2..f3264ff93cd8497321bf388924199fd95d21ac15 100644 --- a/mmpose/evaluation/metrics/coco_wholebody_metric.py +++ b/mmpose/evaluation/metrics/coco_wholebody_metric.py @@ -4,9 +4,10 @@ from typing import Dict, Optional, Sequence import numpy as np from mmengine.fileio import dump -from xtcocotools.cocoeval import COCOeval from mmpose.registry import METRICS +from xtcocotools.cocoeval import COCOeval + from .coco_metric import CocoMetric @@ -71,15 +72,15 @@ class CocoWholeBodyMetric(CocoMetric): If not specified, a temp file will be created. Defaults to ``None`` **kwargs: Keyword parameters passed to :class:`mmeval.BaseMetric` """ - default_prefix: Optional[str] = 'coco-wholebody' + + default_prefix: Optional[str] = "coco-wholebody" body_num = 17 foot_num = 6 face_num = 68 left_hand_num = 21 right_hand_num = 21 - def gt_to_coco_json(self, gt_dicts: Sequence[dict], - outfile_prefix: str) -> str: + def gt_to_coco_json(self, gt_dicts: Sequence[dict], outfile_prefix: str) -> str: """Convert ground truth to coco format json file. 
Args: @@ -125,56 +126,52 @@ class CocoWholeBodyMetric(CocoMetric): for gt_dict in gt_dicts: # filter duplicate image_info - if gt_dict['img_id'] not in img_ids: + if gt_dict["img_id"] not in img_ids: image_info = dict( - id=gt_dict['img_id'], - width=gt_dict['width'], - height=gt_dict['height'], + id=gt_dict["img_id"], + width=gt_dict["width"], + height=gt_dict["height"], ) - if self.iou_type == 'keypoints_crowd': - image_info['crowdIndex'] = gt_dict['crowd_index'] + if self.iou_type == "keypoints_crowd": + image_info["crowdIndex"] = gt_dict["crowd_index"] image_infos.append(image_info) - img_ids.append(gt_dict['img_id']) + img_ids.append(gt_dict["img_id"]) # filter duplicate annotations - for ann in gt_dict['raw_ann_info']: + for ann in gt_dict["raw_ann_info"]: annotation = dict( - id=ann['id'], - image_id=ann['image_id'], - category_id=ann['category_id'], - bbox=ann['bbox'], - keypoints=ann['keypoints'], - foot_kpts=ann['foot_kpts'], - face_kpts=ann['face_kpts'], - lefthand_kpts=ann['lefthand_kpts'], - righthand_kpts=ann['righthand_kpts'], - iscrowd=ann['iscrowd'], + id=ann["id"], + image_id=ann["image_id"], + category_id=ann["category_id"], + bbox=ann["bbox"], + keypoints=ann["keypoints"], + foot_kpts=ann["foot_kpts"], + face_kpts=ann["face_kpts"], + lefthand_kpts=ann["lefthand_kpts"], + righthand_kpts=ann["righthand_kpts"], + iscrowd=ann["iscrowd"], ) if self.use_area: - assert 'area' in ann, \ - '`area` is required when `self.use_area` is `True`' - annotation['area'] = ann['area'] + assert "area" in ann, "`area` is required when `self.use_area` is `True`" + annotation["area"] = ann["area"] annotations.append(annotation) - ann_ids.append(ann['id']) + ann_ids.append(ann["id"]) - info = dict( - date_created=str(datetime.datetime.now()), - description='Coco json file converted by mmpose CocoMetric.') + info = dict(date_created=str(datetime.datetime.now()), description="Coco json file converted by mmpose CocoMetric.") coco_json: dict = dict( info=info, images=image_infos, - categories=self.dataset_meta['CLASSES'], + categories=self.dataset_meta["CLASSES"], licenses=None, annotations=annotations, ) - converted_json_path = f'{outfile_prefix}.gt.json' + converted_json_path = f"{outfile_prefix}.gt.json" dump(coco_json, converted_json_path, sort_keys=True, indent=4) return converted_json_path - def results2json(self, keypoints: Dict[int, list], - outfile_prefix: str) -> str: + def results2json(self, keypoints: Dict[int, list], outfile_prefix: str) -> str: """Dump the keypoint detection results to a COCO style json file. 
Args: @@ -191,32 +188,31 @@ class CocoWholeBodyMetric(CocoMetric): cat_id = 1 cat_results = [] - cuts = np.cumsum([ - 0, self.body_num, self.foot_num, self.face_num, self.left_hand_num, - self.right_hand_num - ]) * 3 + cuts = np.cumsum([0, self.body_num, self.foot_num, self.face_num, self.left_hand_num, self.right_hand_num]) * 3 for _, img_kpts in keypoints.items(): - _keypoints = np.array( - [img_kpt['keypoints'] for img_kpt in img_kpts]) - num_keypoints = self.dataset_meta['num_keypoints'] + _keypoints = np.array([img_kpt["keypoints"] for img_kpt in img_kpts]) + num_keypoints = self.dataset_meta["num_keypoints"] # collect all the person keypoints in current image _keypoints = _keypoints.reshape(-1, num_keypoints * 3) - result = [{ - 'image_id': img_kpt['img_id'], - 'category_id': cat_id, - 'keypoints': _keypoint[cuts[0]:cuts[1]].tolist(), - 'foot_kpts': _keypoint[cuts[1]:cuts[2]].tolist(), - 'face_kpts': _keypoint[cuts[2]:cuts[3]].tolist(), - 'lefthand_kpts': _keypoint[cuts[3]:cuts[4]].tolist(), - 'righthand_kpts': _keypoint[cuts[4]:cuts[5]].tolist(), - 'score': float(img_kpt['score']), - } for img_kpt, _keypoint in zip(img_kpts, _keypoints)] + result = [ + { + "image_id": img_kpt["img_id"], + "category_id": cat_id, + "keypoints": _keypoint[cuts[0] : cuts[1]].tolist(), + "foot_kpts": _keypoint[cuts[1] : cuts[2]].tolist(), + "face_kpts": _keypoint[cuts[2] : cuts[3]].tolist(), + "lefthand_kpts": _keypoint[cuts[3] : cuts[4]].tolist(), + "righthand_kpts": _keypoint[cuts[4] : cuts[5]].tolist(), + "score": float(img_kpt["score"]), + } + for img_kpt, _keypoint in zip(img_kpts, _keypoints) + ] cat_results.extend(result) - res_file = f'{outfile_prefix}.keypoints.json' + res_file = f"{outfile_prefix}.keypoints.json" dump(cat_results, res_file, sort_keys=True, indent=4) def _do_python_keypoint_eval(self, outfile_prefix: str) -> list: @@ -231,85 +227,49 @@ class CocoWholeBodyMetric(CocoMetric): list: a list of tuples. Each tuple contains the evaluation stats name and corresponding stats value. 
""" - res_file = f'{outfile_prefix}.keypoints.json' + res_file = f"{outfile_prefix}.keypoints.json" coco_det = self.coco.loadRes(res_file) - sigmas = self.dataset_meta['sigmas'] - - cuts = np.cumsum([ - 0, self.body_num, self.foot_num, self.face_num, self.left_hand_num, - self.right_hand_num - ]) - - coco_eval = COCOeval( - self.coco, - coco_det, - 'keypoints_body', - sigmas[cuts[0]:cuts[1]], - use_area=self.use_area) + sigmas = self.dataset_meta["sigmas"] + + cuts = np.cumsum([0, self.body_num, self.foot_num, self.face_num, self.left_hand_num, self.right_hand_num]) + + coco_eval = COCOeval(self.coco, coco_det, "keypoints_body", sigmas[cuts[0] : cuts[1]], use_area=self.use_area) coco_eval.params.useSegm = None coco_eval.evaluate() coco_eval.accumulate() coco_eval.summarize() - coco_eval = COCOeval( - self.coco, - coco_det, - 'keypoints_foot', - sigmas[cuts[1]:cuts[2]], - use_area=self.use_area) + coco_eval = COCOeval(self.coco, coco_det, "keypoints_foot", sigmas[cuts[1] : cuts[2]], use_area=self.use_area) coco_eval.params.useSegm = None coco_eval.evaluate() coco_eval.accumulate() coco_eval.summarize() - coco_eval = COCOeval( - self.coco, - coco_det, - 'keypoints_face', - sigmas[cuts[2]:cuts[3]], - use_area=self.use_area) + coco_eval = COCOeval(self.coco, coco_det, "keypoints_face", sigmas[cuts[2] : cuts[3]], use_area=self.use_area) coco_eval.params.useSegm = None coco_eval.evaluate() coco_eval.accumulate() coco_eval.summarize() - coco_eval = COCOeval( - self.coco, - coco_det, - 'keypoints_lefthand', - sigmas[cuts[3]:cuts[4]], - use_area=self.use_area) + coco_eval = COCOeval(self.coco, coco_det, "keypoints_lefthand", sigmas[cuts[3] : cuts[4]], use_area=self.use_area) coco_eval.params.useSegm = None coco_eval.evaluate() coco_eval.accumulate() coco_eval.summarize() - coco_eval = COCOeval( - self.coco, - coco_det, - 'keypoints_righthand', - sigmas[cuts[4]:cuts[5]], - use_area=self.use_area) + coco_eval = COCOeval(self.coco, coco_det, "keypoints_righthand", sigmas[cuts[4] : cuts[5]], use_area=self.use_area) coco_eval.params.useSegm = None coco_eval.evaluate() coco_eval.accumulate() coco_eval.summarize() - coco_eval = COCOeval( - self.coco, - coco_det, - 'keypoints_wholebody', - sigmas, - use_area=self.use_area) + coco_eval = COCOeval(self.coco, coco_det, "keypoints_wholebody", sigmas, use_area=self.use_area) coco_eval.params.useSegm = None coco_eval.evaluate() coco_eval.accumulate() coco_eval.summarize() - stats_names = [ - 'AP', 'AP .5', 'AP .75', 'AP (M)', 'AP (L)', 'AR', 'AR .5', - 'AR .75', 'AR (M)', 'AR (L)' - ] + stats_names = ["AP", "AP .5", "AP .75", "AP (M)", "AP (L)", "AR", "AR .5", "AR .75", "AR (M)", "AR (L)"] info_str = list(zip(stats_names, coco_eval.stats)) diff --git a/mmpose/evaluation/metrics/hand_metric.py b/mmpose/evaluation/metrics/hand_metric.py index 004e168a7d195f2c93a1292f6c96880e82300318..48e9491362204a328d2ca3f3dc41d3ce8258b0d6 100644 --- a/mmpose/evaluation/metrics/hand_metric.py +++ b/mmpose/evaluation/metrics/hand_metric.py @@ -7,28 +7,26 @@ from mmengine.logging import MMLogger from mmpose.codecs.utils import pixel_to_camera from mmpose.registry import METRICS + from ..functional import keypoint_epe @METRICS.register_module() class InterHandMetric(BaseMetric): - METRICS = {'MPJPE', 'MRRPE', 'HandednessAcc'} + METRICS = {"MPJPE", "MRRPE", "HandednessAcc"} - def __init__(self, - modes: List[str] = ['MPJPE', 'MRRPE', 'HandednessAcc'], - collect_device: str = 'cpu', - prefix: Optional[str] = None) -> None: + def __init__( + self, modes: List[str] = ["MPJPE", "MRRPE", 
"HandednessAcc"], collect_device: str = "cpu", prefix: Optional[str] = None + ) -> None: super().__init__(collect_device=collect_device, prefix=prefix) for mode in modes: if mode not in self.METRICS: - raise ValueError("`mode` should be 'MPJPE', 'MRRPE', or " - f"'HandednessAcc', but got '{mode}'.") + raise ValueError("`mode` should be 'MPJPE', 'MRRPE', or " f"'HandednessAcc', but got '{mode}'.") self.modes = modes - def process(self, data_batch: Sequence[dict], - data_samples: Sequence[dict]) -> None: + def process(self, data_batch: Sequence[dict], data_samples: Sequence[dict]) -> None: """Process one batch of data samples and predictions. The processed results should be stored in ``self.results``, which will be used to compute the metrics when all batches have been processed. @@ -41,42 +39,40 @@ class InterHandMetric(BaseMetric): """ for data_sample in data_samples: # predicted keypoints coordinates, [1, K, D] - pred_coords = data_sample['pred_instances']['keypoints'] + pred_coords = data_sample["pred_instances"]["keypoints"] _, K, _ = pred_coords.shape pred_coords_cam = pred_coords.copy() # ground truth data_info - gt = data_sample['gt_instances'] + gt = data_sample["gt_instances"] # ground truth keypoints coordinates, [1, K, D] - gt_coords = gt['keypoints_cam'] + gt_coords = gt["keypoints_cam"] keypoints_cam = gt_coords.copy() # ground truth keypoints_visible, [1, K, 1] - mask = gt['keypoints_visible'].astype(bool).reshape(1, -1) + mask = gt["keypoints_visible"].astype(bool).reshape(1, -1) - pred_hand_type = data_sample['pred_instances']['hand_type'] - gt_hand_type = data_sample['hand_type'] - if pred_hand_type is None and 'HandednessAcc' in self.modes: - raise KeyError('metric HandednessAcc is not supported') + pred_hand_type = data_sample["pred_instances"]["hand_type"] + gt_hand_type = data_sample["hand_type"] + if pred_hand_type is None and "HandednessAcc" in self.modes: + raise KeyError("metric HandednessAcc is not supported") - pred_root_depth = data_sample['pred_instances']['rel_root_depth'] - if pred_root_depth is None and 'MRRPE' in self.modes: - raise KeyError('metric MRRPE is not supported') + pred_root_depth = data_sample["pred_instances"]["rel_root_depth"] + if pred_root_depth is None and "MRRPE" in self.modes: + raise KeyError("metric MRRPE is not supported") - abs_depth = data_sample['abs_depth'] - focal = data_sample['focal'] - principal_pt = data_sample['principal_pt'] + abs_depth = data_sample["abs_depth"] + focal = data_sample["focal"] + principal_pt = data_sample["principal_pt"] result = {} - if 'MPJPE' in self.modes: + if "MPJPE" in self.modes: keypoints_cam[..., :21, :] -= keypoints_cam[..., 20, :] keypoints_cam[..., 21:, :] -= keypoints_cam[..., 41, :] pred_coords_cam[..., :21, 2] += abs_depth[0] pred_coords_cam[..., 21:, 2] += abs_depth[1] - pred_coords_cam = pixel_to_camera(pred_coords_cam, focal[0], - focal[1], principal_pt[0], - principal_pt[1]) + pred_coords_cam = pixel_to_camera(pred_coords_cam, focal[0], focal[1], principal_pt[0], principal_pt[1]) pred_coords_cam[..., :21, :] -= pred_coords_cam[..., 20, :] pred_coords_cam[..., 21:, :] -= pred_coords_cam[..., 41, :] @@ -88,57 +84,43 @@ class InterHandMetric(BaseMetric): single_mask = mask interacting_mask = np.zeros((1, K), dtype=bool) - result['pred_coords'] = pred_coords_cam - result['gt_coords'] = keypoints_cam - result['mask'] = mask - result['single_mask'] = single_mask - result['interacting_mask'] = interacting_mask - - if 'HandednessAcc' in self.modes: - hand_type_mask = data_sample['hand_type_valid'] 
> 0 - result['pred_hand_type'] = pred_hand_type - result['gt_hand_type'] = gt_hand_type - result['hand_type_mask'] = hand_type_mask - - if 'MRRPE' in self.modes: - keypoints_visible = gt['keypoints_visible'] - if gt_hand_type.all() and keypoints_visible[ - ..., 20] and keypoints_visible[..., 41]: + result["pred_coords"] = pred_coords_cam + result["gt_coords"] = keypoints_cam + result["mask"] = mask + result["single_mask"] = single_mask + result["interacting_mask"] = interacting_mask + + if "HandednessAcc" in self.modes: + hand_type_mask = data_sample["hand_type_valid"] > 0 + result["pred_hand_type"] = pred_hand_type + result["gt_hand_type"] = gt_hand_type + result["hand_type_mask"] = hand_type_mask + + if "MRRPE" in self.modes: + keypoints_visible = gt["keypoints_visible"] + if gt_hand_type.all() and keypoints_visible[..., 20] and keypoints_visible[..., 41]: rel_root_mask = np.array([True]) - pred_left_root_coords = np.array( - pred_coords[..., 41, :], dtype=np.float32) - pred_left_root_coords[..., - 2] += abs_depth[0] + pred_root_depth - pred_left_root_coords = pixel_to_camera( - pred_left_root_coords, focal[0], focal[1], - principal_pt[0], principal_pt[1]) + pred_left_root_coords = np.array(pred_coords[..., 41, :], dtype=np.float32) + pred_left_root_coords[..., 2] += abs_depth[0] + pred_root_depth + pred_left_root_coords = pixel_to_camera(pred_left_root_coords, focal[0], focal[1], principal_pt[0], principal_pt[1]) - pred_right_root_coords = np.array( - pred_coords[..., 20, :], dtype=np.float32) + pred_right_root_coords = np.array(pred_coords[..., 20, :], dtype=np.float32) pred_right_root_coords[..., 2] += abs_depth[0] - pred_right_root_coords = pixel_to_camera( - pred_right_root_coords, focal[0], focal[1], - principal_pt[0], principal_pt[1]) - pred_rel_root_coords = pred_left_root_coords - \ - pred_right_root_coords - pred_rel_root_coords = np.expand_dims( - pred_rel_root_coords, axis=0) - gt_rel_root_coords = gt_coords[..., - 41, :] - gt_coords[..., - 20, :] - gt_rel_root_coords = np.expand_dims( - gt_rel_root_coords, axis=0) + pred_right_root_coords = pixel_to_camera(pred_right_root_coords, focal[0], focal[1], principal_pt[0], principal_pt[1]) + pred_rel_root_coords = pred_left_root_coords - pred_right_root_coords + pred_rel_root_coords = np.expand_dims(pred_rel_root_coords, axis=0) + gt_rel_root_coords = gt_coords[..., 41, :] - gt_coords[..., 20, :] + gt_rel_root_coords = np.expand_dims(gt_rel_root_coords, axis=0) else: rel_root_mask = np.array([False]) pred_rel_root_coords = np.array([[0, 0, 0]]) - pred_rel_root_coords = pred_rel_root_coords.reshape( - 1, 1, 3) + pred_rel_root_coords = pred_rel_root_coords.reshape(1, 1, 3) gt_rel_root_coords = np.array([[0, 0, 0]]).reshape(1, 1, 3) - result['pred_rel_root_coords'] = pred_rel_root_coords - result['gt_rel_root_coords'] = gt_rel_root_coords - result['rel_root_mask'] = rel_root_mask + result["pred_rel_root_coords"] = pred_rel_root_coords + result["gt_rel_root_coords"] = gt_rel_root_coords + result["rel_root_mask"] = rel_root_mask self.results.append(result) @@ -156,45 +138,32 @@ class InterHandMetric(BaseMetric): metrics = dict() - logger.info(f'Evaluating {self.__class__.__name__}...') + logger.info(f"Evaluating {self.__class__.__name__}...") - if 'MPJPE' in self.modes: + if "MPJPE" in self.modes: # pred_coords: [N, K, D] - pred_coords = np.concatenate( - [result['pred_coords'] for result in results]) + pred_coords = np.concatenate([result["pred_coords"] for result in results]) # gt_coords: [N, K, D] - gt_coords = np.concatenate( - 
[result['gt_coords'] for result in results]) + gt_coords = np.concatenate([result["gt_coords"] for result in results]) # mask: [N, K] - mask = np.concatenate([result['mask'] for result in results]) - single_mask = np.concatenate( - [result['single_mask'] for result in results]) - interacting_mask = np.concatenate( - [result['interacting_mask'] for result in results]) - - metrics['MPJPE_all'] = keypoint_epe(pred_coords, gt_coords, mask) - metrics['MPJPE_single'] = keypoint_epe(pred_coords, gt_coords, - single_mask) - metrics['MPJPE_interacting'] = keypoint_epe( - pred_coords, gt_coords, interacting_mask) - - if 'HandednessAcc' in self.modes: - pred_hand_type = np.concatenate( - [result['pred_hand_type'] for result in results]) - gt_hand_type = np.concatenate( - [result['gt_hand_type'] for result in results]) - hand_type_mask = np.concatenate( - [result['hand_type_mask'] for result in results]) + mask = np.concatenate([result["mask"] for result in results]) + single_mask = np.concatenate([result["single_mask"] for result in results]) + interacting_mask = np.concatenate([result["interacting_mask"] for result in results]) + + metrics["MPJPE_all"] = keypoint_epe(pred_coords, gt_coords, mask) + metrics["MPJPE_single"] = keypoint_epe(pred_coords, gt_coords, single_mask) + metrics["MPJPE_interacting"] = keypoint_epe(pred_coords, gt_coords, interacting_mask) + + if "HandednessAcc" in self.modes: + pred_hand_type = np.concatenate([result["pred_hand_type"] for result in results]) + gt_hand_type = np.concatenate([result["gt_hand_type"] for result in results]) + hand_type_mask = np.concatenate([result["hand_type_mask"] for result in results]) acc = (pred_hand_type == gt_hand_type).all(axis=-1) - metrics['HandednessAcc'] = np.mean(acc[hand_type_mask]) - - if 'MRRPE' in self.modes: - pred_rel_root_coords = np.concatenate( - [result['pred_rel_root_coords'] for result in results]) - gt_rel_root_coords = np.concatenate( - [result['gt_rel_root_coords'] for result in results]) - rel_root_mask = np.array( - [result['rel_root_mask'] for result in results]) - metrics['MRRPE'] = keypoint_epe(pred_rel_root_coords, - gt_rel_root_coords, rel_root_mask) + metrics["HandednessAcc"] = np.mean(acc[hand_type_mask]) + + if "MRRPE" in self.modes: + pred_rel_root_coords = np.concatenate([result["pred_rel_root_coords"] for result in results]) + gt_rel_root_coords = np.concatenate([result["gt_rel_root_coords"] for result in results]) + rel_root_mask = np.array([result["rel_root_mask"] for result in results]) + metrics["MRRPE"] = keypoint_epe(pred_rel_root_coords, gt_rel_root_coords, rel_root_mask) return metrics diff --git a/mmpose/evaluation/metrics/keypoint_2d_metrics.py b/mmpose/evaluation/metrics/keypoint_2d_metrics.py index c0be4b398f2d7310aa687a2376efee3eb068d3cd..59a19f8dd63d214c43ffbf56f50192604cad48f1 100644 --- a/mmpose/evaluation/metrics/keypoint_2d_metrics.py +++ b/mmpose/evaluation/metrics/keypoint_2d_metrics.py @@ -7,8 +7,8 @@ from mmengine.evaluator import BaseMetric from mmengine.logging import MMLogger from mmpose.registry import METRICS -from ..functional import (keypoint_auc, keypoint_epe, keypoint_nme, - keypoint_pck_accuracy) + +from ..functional import keypoint_auc, keypoint_epe, keypoint_nme, keypoint_pck_accuracy @METRICS.register_module() @@ -67,26 +67,22 @@ class PCKAccuracy(BaseMetric): """ - def __init__(self, - thr: float = 0.05, - norm_item: Union[str, Sequence[str]] = 'bbox', - collect_device: str = 'cpu', - prefix: Optional[str] = None) -> None: + def __init__( + self, thr: float = 0.05, 
norm_item: Union[str, Sequence[str]] = "bbox", collect_device: str = "cpu", prefix: Optional[str] = None + ) -> None: super().__init__(collect_device=collect_device, prefix=prefix) self.thr = thr - self.norm_item = norm_item if isinstance(norm_item, - (tuple, - list)) else [norm_item] - allow_normalized_items = ['bbox', 'head', 'torso'] + self.norm_item = norm_item if isinstance(norm_item, (tuple, list)) else [norm_item] + allow_normalized_items = ["bbox", "head", "torso"] for item in self.norm_item: if item not in allow_normalized_items: raise KeyError( - f'The normalized item {item} is not supported by ' + f"The normalized item {item} is not supported by " f"{self.__class__.__name__}. Should be one of 'bbox', " - f"'head', 'torso', but got {item}.") + f"'head', 'torso', but got {item}." + ) - def process(self, data_batch: Sequence[dict], - data_samples: Sequence[dict]) -> None: + def process(self, data_batch: Sequence[dict], data_samples: Sequence[dict]) -> None: """Process one batch of data samples and predictions. The processed @@ -100,51 +96,45 @@ class PCKAccuracy(BaseMetric): """ for data_sample in data_samples: # predicted keypoints coordinates, [1, K, D] - pred_coords = data_sample['pred_instances']['keypoints'] + pred_coords = data_sample["pred_instances"]["keypoints"] # ground truth data_info - gt = data_sample['gt_instances'] + gt = data_sample["gt_instances"] # ground truth keypoints coordinates, [1, K, D] - gt_coords = gt['keypoints'] + gt_coords = gt["keypoints"] # ground truth keypoints_visible, [1, K, 1] - mask = gt['keypoints_visible'].astype(bool) + mask = gt["keypoints_visible"].astype(bool) if mask.ndim == 3: mask = mask[:, :, 0] mask = mask.reshape(1, -1) result = { - 'pred_coords': pred_coords, - 'gt_coords': gt_coords, - 'mask': mask, + "pred_coords": pred_coords, + "gt_coords": gt_coords, + "mask": mask, } - if 'bbox' in self.norm_item: - assert 'bboxes' in gt, 'The ground truth data info do not ' \ - 'have the expected normalized_item ``"bbox"``.' + if "bbox" in self.norm_item: + assert "bboxes" in gt, "The ground truth data info do not " 'have the expected normalized_item ``"bbox"``.' # ground truth bboxes, [1, 4] - bbox_size_ = np.max(gt['bboxes'][0][2:] - gt['bboxes'][0][:2]) + bbox_size_ = np.max(gt["bboxes"][0][2:] - gt["bboxes"][0][:2]) bbox_size = np.array([bbox_size_, bbox_size_]).reshape(-1, 2) - result['bbox_size'] = bbox_size + result["bbox_size"] = bbox_size - if 'head' in self.norm_item: - assert 'head_size' in gt, 'The ground truth data info do ' \ - 'not have the expected normalized_item ``"head_size"``.' + if "head" in self.norm_item: + assert "head_size" in gt, "The ground truth data info do " 'not have the expected normalized_item ``"head_size"``.' # ground truth bboxes - head_size_ = gt['head_size'] + head_size_ = gt["head_size"] head_size = np.array([head_size_, head_size_]).reshape(-1, 2) - result['head_size'] = head_size + result["head_size"] = head_size - if 'torso' in self.norm_item: + if "torso" in self.norm_item: # used in JhmdbDataset torso_size_ = np.linalg.norm(gt_coords[0][4] - gt_coords[0][5]) if torso_size_ < 1: - torso_size_ = np.linalg.norm(pred_coords[0][4] - - pred_coords[0][5]) - warnings.warn('Ground truth torso size < 1. ' - 'Use torso size from predicted ' - 'keypoint results instead.') - torso_size = np.array([torso_size_, - torso_size_]).reshape(-1, 2) - result['torso_size'] = torso_size + torso_size_ = np.linalg.norm(pred_coords[0][4] - pred_coords[0][5]) + warnings.warn("Ground truth torso size < 1. 
" "Use torso size from predicted " "keypoint results instead.") + torso_size = np.array([torso_size_, torso_size_]).reshape(-1, 2) + result["torso_size"] = torso_size self.results.append(result) @@ -164,46 +154,36 @@ class PCKAccuracy(BaseMetric): logger: MMLogger = MMLogger.get_current_instance() # pred_coords: [N, K, D] - pred_coords = np.concatenate( - [result['pred_coords'] for result in results]) + pred_coords = np.concatenate([result["pred_coords"] for result in results]) # gt_coords: [N, K, D] - gt_coords = np.concatenate([result['gt_coords'] for result in results]) + gt_coords = np.concatenate([result["gt_coords"] for result in results]) # mask: [N, K] - mask = np.concatenate([result['mask'] for result in results]) + mask = np.concatenate([result["mask"] for result in results]) metrics = dict() - if 'bbox' in self.norm_item: - norm_size_bbox = np.concatenate( - [result['bbox_size'] for result in results]) + if "bbox" in self.norm_item: + norm_size_bbox = np.concatenate([result["bbox_size"] for result in results]) - logger.info(f'Evaluating {self.__class__.__name__} ' - f'(normalized by ``"bbox_size"``)...') + logger.info(f"Evaluating {self.__class__.__name__} " f'(normalized by ``"bbox_size"``)...') - _, pck, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, - self.thr, norm_size_bbox) - metrics['PCK'] = pck + _, pck, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, self.thr, norm_size_bbox) + metrics["PCK"] = pck - if 'head' in self.norm_item: - norm_size_head = np.concatenate( - [result['head_size'] for result in results]) + if "head" in self.norm_item: + norm_size_head = np.concatenate([result["head_size"] for result in results]) - logger.info(f'Evaluating {self.__class__.__name__} ' - f'(normalized by ``"head_size"``)...') + logger.info(f"Evaluating {self.__class__.__name__} " f'(normalized by ``"head_size"``)...') - _, pckh, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, - self.thr, norm_size_head) - metrics['PCKh'] = pckh + _, pckh, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, self.thr, norm_size_head) + metrics["PCKh"] = pckh - if 'torso' in self.norm_item: - norm_size_torso = np.concatenate( - [result['torso_size'] for result in results]) + if "torso" in self.norm_item: + norm_size_torso = np.concatenate([result["torso_size"] for result in results]) - logger.info(f'Evaluating {self.__class__.__name__} ' - f'(normalized by ``"torso_size"``)...') + logger.info(f"Evaluating {self.__class__.__name__} " f'(normalized by ``"torso_size"``)...') - _, tpck, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, - self.thr, norm_size_torso) - metrics['tPCK'] = tpck + _, tpck, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, self.thr, norm_size_torso) + metrics["tPCK"] = tpck return metrics @@ -268,16 +248,10 @@ class MpiiPCKAccuracy(PCKAccuracy): 'Ankle PCK': 100.0, 'PCK': 100.0, 'PCK@0.1': 100.0} """ - def __init__(self, - thr: float = 0.5, - norm_item: Union[str, Sequence[str]] = 'head', - collect_device: str = 'cpu', - prefix: Optional[str] = None) -> None: - super().__init__( - thr=thr, - norm_item=norm_item, - collect_device=collect_device, - prefix=prefix) + def __init__( + self, thr: float = 0.5, norm_item: Union[str, Sequence[str]] = "head", collect_device: str = "cpu", prefix: Optional[str] = None + ) -> None: + super().__init__(thr=thr, norm_item=norm_item, collect_device=collect_device, prefix=prefix) def compute_metrics(self, results: list) -> Dict[str, float]: """Compute the metrics from processed results. 
@@ -303,39 +277,33 @@ class MpiiPCKAccuracy(PCKAccuracy): logger: MMLogger = MMLogger.get_current_instance() # pred_coords: [N, K, D] - pred_coords = np.concatenate( - [result['pred_coords'] for result in results]) + pred_coords = np.concatenate([result["pred_coords"] for result in results]) # gt_coords: [N, K, D] - gt_coords = np.concatenate([result['gt_coords'] for result in results]) + gt_coords = np.concatenate([result["gt_coords"] for result in results]) # mask: [N, K] - mask = np.concatenate([result['mask'] for result in results]) + mask = np.concatenate([result["mask"] for result in results]) # MPII uses matlab format, gt index is 1-based, # convert 0-based index to 1-based index pred_coords = pred_coords + 1.0 metrics = {} - if 'head' in self.norm_item: - norm_size_head = np.concatenate( - [result['head_size'] for result in results]) + if "head" in self.norm_item: + norm_size_head = np.concatenate([result["head_size"] for result in results]) - logger.info(f'Evaluating {self.__class__.__name__} ' - f'(normalized by ``"head_size"``)...') + logger.info(f"Evaluating {self.__class__.__name__} " f'(normalized by ``"head_size"``)...') - pck_p, _, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, - self.thr, norm_size_head) + pck_p, _, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, self.thr, norm_size_head) jnt_count = np.sum(mask, axis=0) - PCKh = 100. * pck_p + PCKh = 100.0 * pck_p rng = np.arange(0, 0.5 + 0.01, 0.01) pckAll = np.zeros((len(rng), 16), dtype=np.float32) for r, threshold in enumerate(rng): - _pck, _, _ = keypoint_pck_accuracy(pred_coords, gt_coords, - mask, threshold, - norm_size_head) - pckAll[r, :] = 100. * _pck + _pck, _, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, threshold, norm_size_head) + pckAll[r, :] = 100.0 * _pck PCKh = np.ma.array(PCKh, mask=False) PCKh.mask[6:8] = True @@ -353,15 +321,15 @@ class MpiiPCKAccuracy(PCKAccuracy): # lkne 4 rkne 1 # lank 5 rank 0 stats = { - 'Head PCK': PCKh[9], - 'Shoulder PCK': 0.5 * (PCKh[13] + PCKh[12]), - 'Elbow PCK': 0.5 * (PCKh[14] + PCKh[11]), - 'Wrist PCK': 0.5 * (PCKh[15] + PCKh[10]), - 'Hip PCK': 0.5 * (PCKh[3] + PCKh[2]), - 'Knee PCK': 0.5 * (PCKh[4] + PCKh[1]), - 'Ankle PCK': 0.5 * (PCKh[5] + PCKh[0]), - 'PCK': np.sum(PCKh * jnt_ratio), - 'PCK@0.1': np.sum(pckAll[10, :] * jnt_ratio) + "Head PCK": PCKh[9], + "Shoulder PCK": 0.5 * (PCKh[13] + PCKh[12]), + "Elbow PCK": 0.5 * (PCKh[14] + PCKh[11]), + "Wrist PCK": 0.5 * (PCKh[15] + PCKh[10]), + "Hip PCK": 0.5 * (PCKh[3] + PCKh[2]), + "Knee PCK": 0.5 * (PCKh[4] + PCKh[1]), + "Ankle PCK": 0.5 * (PCKh[5] + PCKh[0]), + "PCK": np.sum(PCKh * jnt_ratio), + "PCK@0.1": np.sum(pckAll[10, :] * jnt_ratio), } for stats_name, stat in stats.items(): @@ -433,16 +401,10 @@ class JhmdbPCKAccuracy(PCKAccuracy): 'Hip tPCK': 1.0, 'Knee tPCK': 1.0, 'Ank tPCK': 1.0, 'tPCK': 1.0} """ - def __init__(self, - thr: float = 0.05, - norm_item: Union[str, Sequence[str]] = 'bbox', - collect_device: str = 'cpu', - prefix: Optional[str] = None) -> None: - super().__init__( - thr=thr, - norm_item=norm_item, - collect_device=collect_device, - prefix=prefix) + def __init__( + self, thr: float = 0.05, norm_item: Union[str, Sequence[str]] = "bbox", collect_device: str = "cpu", prefix: Optional[str] = None + ) -> None: + super().__init__(thr=thr, norm_item=norm_item, collect_device=collect_device, prefix=prefix) def compute_metrics(self, results: list) -> Dict[str, float]: """Compute the metrics from processed results. 
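A note on the masked-array bookkeeping in `MpiiPCKAccuracy.compute_metrics`: joints 6 and 7 (pelvis and thorax in the MPII ordering) are excluded from the overall score, and the remaining joints are weighted by how often they are labeled. A self-contained sketch with made-up per-joint values:

```python
import numpy as np

# Hypothetical per-joint PCKh values (%) and label counts for 16 MPII joints.
PCKh = np.ma.array(np.linspace(75.0, 95.0, 16), mask=False)
jnt_count = np.ma.array(np.full(16, 100.0), mask=False)

# Pelvis (6) and thorax (7) do not contribute to the overall score.
PCKh.mask[6:8] = True
jnt_count.mask[6:8] = True

# Weight each remaining joint by its share of the labeled instances;
# masked entries are ignored by np.sum.
jnt_ratio = jnt_count / np.sum(jnt_count).astype(np.float64)
overall_pckh = float(np.sum(PCKh * jnt_ratio))
print(f"overall PCKh: {overall_pckh:.2f}")
```
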
@@ -477,56 +439,49 @@ class JhmdbPCKAccuracy(PCKAccuracy): logger: MMLogger = MMLogger.get_current_instance() # pred_coords: [N, K, D] - pred_coords = np.concatenate( - [result['pred_coords'] for result in results]) + pred_coords = np.concatenate([result["pred_coords"] for result in results]) # gt_coords: [N, K, D] - gt_coords = np.concatenate([result['gt_coords'] for result in results]) + gt_coords = np.concatenate([result["gt_coords"] for result in results]) # mask: [N, K] - mask = np.concatenate([result['mask'] for result in results]) + mask = np.concatenate([result["mask"] for result in results]) metrics = dict() - if 'bbox' in self.norm_item: - norm_size_bbox = np.concatenate( - [result['bbox_size'] for result in results]) + if "bbox" in self.norm_item: + norm_size_bbox = np.concatenate([result["bbox_size"] for result in results]) - logger.info(f'Evaluating {self.__class__.__name__} ' - f'(normalized by ``"bbox_size"``)...') + logger.info(f"Evaluating {self.__class__.__name__} " f'(normalized by ``"bbox_size"``)...') - pck_p, pck, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, - self.thr, norm_size_bbox) + pck_p, pck, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, self.thr, norm_size_bbox) stats = { - 'Head PCK': pck_p[2], - 'Sho PCK': 0.5 * pck_p[3] + 0.5 * pck_p[4], - 'Elb PCK': 0.5 * pck_p[7] + 0.5 * pck_p[8], - 'Wri PCK': 0.5 * pck_p[11] + 0.5 * pck_p[12], - 'Hip PCK': 0.5 * pck_p[5] + 0.5 * pck_p[6], - 'Knee PCK': 0.5 * pck_p[9] + 0.5 * pck_p[10], - 'Ank PCK': 0.5 * pck_p[13] + 0.5 * pck_p[14], - 'PCK': pck + "Head PCK": pck_p[2], + "Sho PCK": 0.5 * pck_p[3] + 0.5 * pck_p[4], + "Elb PCK": 0.5 * pck_p[7] + 0.5 * pck_p[8], + "Wri PCK": 0.5 * pck_p[11] + 0.5 * pck_p[12], + "Hip PCK": 0.5 * pck_p[5] + 0.5 * pck_p[6], + "Knee PCK": 0.5 * pck_p[9] + 0.5 * pck_p[10], + "Ank PCK": 0.5 * pck_p[13] + 0.5 * pck_p[14], + "PCK": pck, } for stats_name, stat in stats.items(): metrics[stats_name] = stat - if 'torso' in self.norm_item: - norm_size_torso = np.concatenate( - [result['torso_size'] for result in results]) + if "torso" in self.norm_item: + norm_size_torso = np.concatenate([result["torso_size"] for result in results]) - logger.info(f'Evaluating {self.__class__.__name__} ' - f'(normalized by ``"torso_size"``)...') + logger.info(f"Evaluating {self.__class__.__name__} " f'(normalized by ``"torso_size"``)...') - pck_p, pck, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, - self.thr, norm_size_torso) + pck_p, pck, _ = keypoint_pck_accuracy(pred_coords, gt_coords, mask, self.thr, norm_size_torso) stats = { - 'Head tPCK': pck_p[2], - 'Sho tPCK': 0.5 * pck_p[3] + 0.5 * pck_p[4], - 'Elb tPCK': 0.5 * pck_p[7] + 0.5 * pck_p[8], - 'Wri tPCK': 0.5 * pck_p[11] + 0.5 * pck_p[12], - 'Hip tPCK': 0.5 * pck_p[5] + 0.5 * pck_p[6], - 'Knee tPCK': 0.5 * pck_p[9] + 0.5 * pck_p[10], - 'Ank tPCK': 0.5 * pck_p[13] + 0.5 * pck_p[14], - 'tPCK': pck + "Head tPCK": pck_p[2], + "Sho tPCK": 0.5 * pck_p[3] + 0.5 * pck_p[4], + "Elb tPCK": 0.5 * pck_p[7] + 0.5 * pck_p[8], + "Wri tPCK": 0.5 * pck_p[11] + 0.5 * pck_p[12], + "Hip tPCK": 0.5 * pck_p[5] + 0.5 * pck_p[6], + "Knee tPCK": 0.5 * pck_p[9] + 0.5 * pck_p[10], + "Ank tPCK": 0.5 * pck_p[13] + 0.5 * pck_p[14], + "tPCK": pck, } for stats_name, stat in stats.items(): @@ -561,17 +516,12 @@ class AUC(BaseMetric): will be used instead. Default: ``None``. 
""" - def __init__(self, - norm_factor: float = 30, - num_thrs: int = 20, - collect_device: str = 'cpu', - prefix: Optional[str] = None) -> None: + def __init__(self, norm_factor: float = 30, num_thrs: int = 20, collect_device: str = "cpu", prefix: Optional[str] = None) -> None: super().__init__(collect_device=collect_device, prefix=prefix) self.norm_factor = norm_factor self.num_thrs = num_thrs - def process(self, data_batch: Sequence[dict], - data_samples: Sequence[dict]) -> None: + def process(self, data_batch: Sequence[dict], data_samples: Sequence[dict]) -> None: """Process one batch of data samples and predictions. The processed results should be stored in ``self.results``, which will be used to compute the metrics when all batches have been processed. @@ -584,21 +534,21 @@ class AUC(BaseMetric): """ for data_sample in data_samples: # predicted keypoints coordinates, [1, K, D] - pred_coords = data_sample['pred_instances']['keypoints'] + pred_coords = data_sample["pred_instances"]["keypoints"] # ground truth data_info - gt = data_sample['gt_instances'] + gt = data_sample["gt_instances"] # ground truth keypoints coordinates, [1, K, D] - gt_coords = gt['keypoints'] + gt_coords = gt["keypoints"] # ground truth keypoints_visible, [1, K, 1] - mask = gt['keypoints_visible'].astype(bool) + mask = gt["keypoints_visible"].astype(bool) if mask.ndim == 3: mask = mask[:, :, 0] mask = mask.reshape(1, -1) result = { - 'pred_coords': pred_coords, - 'gt_coords': gt_coords, - 'mask': mask, + "pred_coords": pred_coords, + "gt_coords": gt_coords, + "mask": mask, } self.results.append(result) @@ -616,20 +566,18 @@ class AUC(BaseMetric): logger: MMLogger = MMLogger.get_current_instance() # pred_coords: [N, K, D] - pred_coords = np.concatenate( - [result['pred_coords'] for result in results]) + pred_coords = np.concatenate([result["pred_coords"] for result in results]) # gt_coords: [N, K, D] - gt_coords = np.concatenate([result['gt_coords'] for result in results]) + gt_coords = np.concatenate([result["gt_coords"] for result in results]) # mask: [N, K] - mask = np.concatenate([result['mask'] for result in results]) + mask = np.concatenate([result["mask"] for result in results]) - logger.info(f'Evaluating {self.__class__.__name__}...') + logger.info(f"Evaluating {self.__class__.__name__}...") - auc = keypoint_auc(pred_coords, gt_coords, mask, self.norm_factor, - self.num_thrs) + auc = keypoint_auc(pred_coords, gt_coords, mask, self.norm_factor, self.num_thrs) metrics = dict() - metrics['AUC'] = auc + metrics["AUC"] = auc return metrics @@ -655,8 +603,7 @@ class EPE(BaseMetric): will be used instead. Default: ``None``. """ - def process(self, data_batch: Sequence[dict], - data_samples: Sequence[dict]) -> None: + def process(self, data_batch: Sequence[dict], data_samples: Sequence[dict]) -> None: """Process one batch of data samples and predictions. The processed results should be stored in ``self.results``, which will be used to compute the metrics when all batches have been processed. 
@@ -669,21 +616,21 @@ class EPE(BaseMetric): """ for data_sample in data_samples: # predicted keypoints coordinates, [1, K, D] - pred_coords = data_sample['pred_instances']['keypoints'] + pred_coords = data_sample["pred_instances"]["keypoints"] # ground truth data_info - gt = data_sample['gt_instances'] + gt = data_sample["gt_instances"] # ground truth keypoints coordinates, [1, K, D] - gt_coords = gt['keypoints'] + gt_coords = gt["keypoints"] # ground truth keypoints_visible, [1, K, 1] - mask = gt['keypoints_visible'].astype(bool) + mask = gt["keypoints_visible"].astype(bool) if mask.ndim == 3: mask = mask[:, :, 0] mask = mask.reshape(1, -1) result = { - 'pred_coords': pred_coords, - 'gt_coords': gt_coords, - 'mask': mask, + "pred_coords": pred_coords, + "gt_coords": gt_coords, + "mask": mask, } self.results.append(result) @@ -701,19 +648,18 @@ class EPE(BaseMetric): logger: MMLogger = MMLogger.get_current_instance() # pred_coords: [N, K, D] - pred_coords = np.concatenate( - [result['pred_coords'] for result in results]) + pred_coords = np.concatenate([result["pred_coords"] for result in results]) # gt_coords: [N, K, D] - gt_coords = np.concatenate([result['gt_coords'] for result in results]) + gt_coords = np.concatenate([result["gt_coords"] for result in results]) # mask: [N, K] - mask = np.concatenate([result['mask'] for result in results]) + mask = np.concatenate([result["mask"] for result in results]) - logger.info(f'Evaluating {self.__class__.__name__}...') + logger.info(f"Evaluating {self.__class__.__name__}...") epe = keypoint_epe(pred_coords, gt_coords, mask) metrics = dict() - metrics['EPE'] = epe + metrics["EPE"] = epe return metrics @@ -759,43 +705,45 @@ class NME(BaseMetric): DEFAULT_KEYPOINT_INDICES = { # horse10: corresponding to `nose` and `eye` keypoints - 'horse10': [0, 1], + "horse10": [0, 1], # 300w: corresponding to `right-most` and `left-most` eye keypoints - '300w': [36, 45], + "300w": [36, 45], # coco_wholebody_face corresponding to `right-most` and `left-most` # eye keypoints - 'coco_wholebody_face': [36, 45], + "coco_wholebody_face": [36, 45], # cofw: corresponding to `right-most` and `left-most` eye keypoints - 'cofw': [8, 9], + "cofw": [8, 9], # wflw: corresponding to `right-most` and `left-most` eye keypoints - 'wflw': [60, 72], + "wflw": [60, 72], # lapa: corresponding to `right-most` and `left-most` eye keypoints - 'lapa': [66, 79], + "lapa": [66, 79], } - def __init__(self, - norm_mode: str, - norm_item: Optional[str] = None, - keypoint_indices: Optional[Sequence[int]] = None, - collect_device: str = 'cpu', - prefix: Optional[str] = None) -> None: + def __init__( + self, + norm_mode: str, + norm_item: Optional[str] = None, + keypoint_indices: Optional[Sequence[int]] = None, + collect_device: str = "cpu", + prefix: Optional[str] = None, + ) -> None: super().__init__(collect_device=collect_device, prefix=prefix) - allowed_norm_modes = ['use_norm_item', 'keypoint_distance'] + allowed_norm_modes = ["use_norm_item", "keypoint_distance"] if norm_mode not in allowed_norm_modes: - raise KeyError("`norm_mode` should be 'use_norm_item' or " - f"'keypoint_distance', but got {norm_mode}.") + raise KeyError("`norm_mode` should be 'use_norm_item' or " f"'keypoint_distance', but got {norm_mode}.") self.norm_mode = norm_mode - if self.norm_mode == 'use_norm_item': + if self.norm_mode == "use_norm_item": if not norm_item: - raise KeyError('`norm_mode` is set to `"use_norm_item"`, ' - 'please specify the `norm_item` in the ' - 'datainfo used as the normalization factor.') + 
raise KeyError( + '`norm_mode` is set to `"use_norm_item"`, ' + "please specify the `norm_item` in the " + "datainfo used as the normalization factor." + ) self.norm_item = norm_item self.keypoint_indices = keypoint_indices - def process(self, data_batch: Sequence[dict], - data_samples: Sequence[dict]) -> None: + def process(self, data_batch: Sequence[dict], data_samples: Sequence[dict]) -> None: """Process one batch of data samples and predictions. The processed results should be stored in ``self.results``, which will be used to compute the metrics when all batches have been processed. @@ -808,39 +756,37 @@ class NME(BaseMetric): """ for data_sample in data_samples: # predicted keypoints coordinates, [1, K, D] - pred_coords = data_sample['pred_instances']['keypoints'] + pred_coords = data_sample["pred_instances"]["keypoints"] # ground truth data_info - gt = data_sample['gt_instances'] + gt = data_sample["gt_instances"] # ground truth keypoints coordinates, [1, K, D] - gt_coords = gt['keypoints'] + gt_coords = gt["keypoints"] # ground truth keypoints_visible, [1, K, 1] - mask = gt['keypoints_visible'].astype(bool) + mask = gt["keypoints_visible"].astype(bool) if mask.ndim == 3: mask = mask[:, :, 0] mask = mask.reshape(1, -1) result = { - 'pred_coords': pred_coords, - 'gt_coords': gt_coords, - 'mask': mask, + "pred_coords": pred_coords, + "gt_coords": gt_coords, + "mask": mask, } if self.norm_item: - if self.norm_item == 'bbox_size': - assert 'bboxes' in gt, 'The ground truth data info do ' \ - 'not have the item ``bboxes`` for expected ' \ - 'normalized_item ``"bbox_size"``.' + if self.norm_item == "bbox_size": + assert "bboxes" in gt, ( + "The ground truth data info do " "not have the item ``bboxes`` for expected " 'normalized_item ``"bbox_size"``.' 
+ ) # ground truth bboxes, [1, 4] - bbox_size = np.max(gt['bboxes'][0][2:] - - gt['bboxes'][0][:2]) - result['bbox_size'] = np.array([bbox_size]).reshape(-1, 1) + bbox_size = np.max(gt["bboxes"][0][2:] - gt["bboxes"][0][:2]) + result["bbox_size"] = np.array([bbox_size]).reshape(-1, 1) else: - assert self.norm_item in gt, f'The ground truth data ' \ - f'info do not have the expected normalized factor ' \ - f'"{self.norm_item}"' + assert self.norm_item in gt, ( + f"The ground truth data " f"info do not have the expected normalized factor " f'"{self.norm_item}"' + ) # ground truth norm_item - result[self.norm_item] = np.array( - gt[self.norm_item]).reshape([-1, 1]) + result[self.norm_item] = np.array(gt[self.norm_item]).reshape([-1, 1]) self.results.append(result) @@ -857,49 +803,44 @@ class NME(BaseMetric): logger: MMLogger = MMLogger.get_current_instance() # pred_coords: [N, K, D] - pred_coords = np.concatenate( - [result['pred_coords'] for result in results]) + pred_coords = np.concatenate([result["pred_coords"] for result in results]) # gt_coords: [N, K, D] - gt_coords = np.concatenate([result['gt_coords'] for result in results]) + gt_coords = np.concatenate([result["gt_coords"] for result in results]) # mask: [N, K] - mask = np.concatenate([result['mask'] for result in results]) + mask = np.concatenate([result["mask"] for result in results]) - logger.info(f'Evaluating {self.__class__.__name__}...') + logger.info(f"Evaluating {self.__class__.__name__}...") metrics = dict() - if self.norm_mode == 'use_norm_item': - normalize_factor_ = np.concatenate( - [result[self.norm_item] for result in results]) + if self.norm_mode == "use_norm_item": + normalize_factor_ = np.concatenate([result[self.norm_item] for result in results]) # normalize_factor: [N, 2] normalize_factor = np.tile(normalize_factor_, [1, 2]) nme = keypoint_nme(pred_coords, gt_coords, mask, normalize_factor) - metrics['NME'] = nme + metrics["NME"] = nme else: if self.keypoint_indices is None: # use default keypoint_indices in some datasets - dataset_name = self.dataset_meta['dataset_name'] + dataset_name = self.dataset_meta["dataset_name"] if dataset_name not in self.DEFAULT_KEYPOINT_INDICES: raise KeyError( - '`norm_mode` is set to `keypoint_distance`, and the ' - 'keypoint_indices is set to None, can not find the ' - 'keypoint_indices in `DEFAULT_KEYPOINT_INDICES`, ' - 'please specify `keypoint_indices` appropriately.') - self.keypoint_indices = self.DEFAULT_KEYPOINT_INDICES[ - dataset_name] + "`norm_mode` is set to `keypoint_distance`, and the " + "keypoint_indices is set to None, can not find the " + "keypoint_indices in `DEFAULT_KEYPOINT_INDICES`, " + "please specify `keypoint_indices` appropriately." + ) + self.keypoint_indices = self.DEFAULT_KEYPOINT_INDICES[dataset_name] else: - assert len(self.keypoint_indices) == 2, 'The keypoint '\ - 'indices used for normalization should be a pair.' - keypoint_id2name = self.dataset_meta['keypoint_id2name'] - dataset_name = self.dataset_meta['dataset_name'] + assert len(self.keypoint_indices) == 2, "The keypoint " "indices used for normalization should be a pair." + keypoint_id2name = self.dataset_meta["keypoint_id2name"] + dataset_name = self.dataset_meta["dataset_name"] for idx in self.keypoint_indices: - assert idx in keypoint_id2name, f'The {dataset_name} '\ - f'dataset does not contain the required '\ - f'{idx}-th keypoint.' + assert idx in keypoint_id2name, f"The {dataset_name} " f"dataset does not contain the required " f"{idx}-th keypoint." 
# normalize_factor: [N, 2] normalize_factor = self._get_normalize_factor(gt_coords=gt_coords) nme = keypoint_nme(pred_coords, gt_coords, mask, normalize_factor) - metrics['NME'] = nme + metrics["NME"] = nme return metrics @@ -916,9 +857,6 @@ class NME(BaseMetric): """ idx1, idx2 = self.keypoint_indices - interocular = np.linalg.norm( - gt_coords[:, idx1, :] - gt_coords[:, idx2, :], - axis=1, - keepdims=True) + interocular = np.linalg.norm(gt_coords[:, idx1, :] - gt_coords[:, idx2, :], axis=1, keepdims=True) return np.tile(interocular, [1, 2]) diff --git a/mmpose/evaluation/metrics/keypoint_3d_metrics.py b/mmpose/evaluation/metrics/keypoint_3d_metrics.py index fb3447bb3ff4a94f192a912c17062f048e838b98..87446bf3773d0d563259883fc98745ec91a0b091 100644 --- a/mmpose/evaluation/metrics/keypoint_3d_metrics.py +++ b/mmpose/evaluation/metrics/keypoint_3d_metrics.py @@ -8,6 +8,7 @@ from mmengine.evaluator import BaseMetric from mmengine.logging import MMLogger from mmpose.registry import METRICS + from ..functional import keypoint_mpjpe @@ -42,24 +43,18 @@ class MPJPE(BaseMetric): to be skipped. Default: []. """ - ALIGNMENT = {'mpjpe': 'none', 'p-mpjpe': 'procrustes', 'n-mpjpe': 'scale'} + ALIGNMENT = {"mpjpe": "none", "p-mpjpe": "procrustes", "n-mpjpe": "scale"} - def __init__(self, - mode: str = 'mpjpe', - collect_device: str = 'cpu', - prefix: Optional[str] = None, - skip_list: List[str] = []) -> None: + def __init__(self, mode: str = "mpjpe", collect_device: str = "cpu", prefix: Optional[str] = None, skip_list: List[str] = []) -> None: super().__init__(collect_device=collect_device, prefix=prefix) allowed_modes = self.ALIGNMENT.keys() if mode not in allowed_modes: - raise KeyError("`mode` should be 'mpjpe', 'p-mpjpe', or " - f"'n-mpjpe', but got '{mode}'.") + raise KeyError("`mode` should be 'mpjpe', 'p-mpjpe', or " f"'n-mpjpe', but got '{mode}'.") self.mode = mode self.skip_list = skip_list - def process(self, data_batch: Sequence[dict], - data_samples: Sequence[dict]) -> None: + def process(self, data_batch: Sequence[dict], data_samples: Sequence[dict]) -> None: """Process one batch of data samples and predictions. The processed results should be stored in ``self.results``, which will be used to compute the metrics when all batches have been processed. 
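An aside on the `ALIGNMENT` table above (not part of the patch): the mode name selects how predictions are aligned to the ground truth before the error is measured. The `'scale'` case (n-mpjpe) can be sketched as a per-sample least-squares rescaling; the function below is an illustrative sketch under that assumption, not the `keypoint_mpjpe` implementation:

```python
import numpy as np

def scale_align_sketch(pred, gt):
    """Rescale each predicted pose so it best matches the ground truth
    in a least-squares sense. pred/gt: [N, K, 3]."""
    # optimal per-sample scale: s = <gt, pred> / <pred, pred>
    num = np.sum(gt * pred, axis=(1, 2), keepdims=True)
    den = np.sum(pred * pred, axis=(1, 2), keepdims=True)
    return pred * (num / den)
```

`'none'` compares raw coordinates, while `'procrustes'` additionally solves for rotation and translation.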
@@ -72,32 +67,26 @@ class MPJPE(BaseMetric): """ for data_sample in data_samples: # predicted keypoints coordinates, [T, K, D] - pred_coords = data_sample['pred_instances']['keypoints'] + pred_coords = data_sample["pred_instances"]["keypoints"] if pred_coords.ndim == 4: pred_coords = np.squeeze(pred_coords, axis=0) # ground truth data_info - gt = data_sample['gt_instances'] + gt = data_sample["gt_instances"] # ground truth keypoints coordinates, [T, K, D] - gt_coords = gt['lifting_target'] + gt_coords = gt["lifting_target"] # ground truth keypoints_visible, [T, K, 1] - mask = gt['lifting_target_visible'].astype(bool).reshape( - gt_coords.shape[0], -1) + mask = gt["lifting_target_visible"].astype(bool).reshape(gt_coords.shape[0], -1) # instance action - img_path = data_sample['target_img_path'][0] - _, rest = osp.basename(img_path).split('_', 1) - action, _ = rest.split('.', 1) + img_path = data_sample["target_img_path"][0] + _, rest = osp.basename(img_path).split("_", 1) + action, _ = rest.split(".", 1) actions = np.array([action] * gt_coords.shape[0]) - subj_act = osp.basename(img_path).split('.')[0] + subj_act = osp.basename(img_path).split(".")[0] if subj_act in self.skip_list: continue - result = { - 'pred_coords': pred_coords, - 'gt_coords': gt_coords, - 'mask': mask, - 'actions': actions - } + result = {"pred_coords": pred_coords, "gt_coords": gt_coords, "mask": mask, "actions": actions} self.results.append(result) @@ -114,30 +103,28 @@ class MPJPE(BaseMetric): logger: MMLogger = MMLogger.get_current_instance() # pred_coords: [N, K, D] - pred_coords = np.concatenate( - [result['pred_coords'] for result in results]) + pred_coords = np.concatenate([result["pred_coords"] for result in results]) # gt_coords: [N, K, D] - gt_coords = np.concatenate([result['gt_coords'] for result in results]) + gt_coords = np.concatenate([result["gt_coords"] for result in results]) # mask: [N, K] - mask = np.concatenate([result['mask'] for result in results]) + mask = np.concatenate([result["mask"] for result in results]) # action_category_indices: Dict[List[int]] action_category_indices = defaultdict(list) - actions = np.concatenate([result['actions'] for result in results]) + actions = np.concatenate([result["actions"] for result in results]) for idx, action in enumerate(actions): - action_category = action.split('_')[0] + action_category = action.split("_")[0] action_category_indices[action_category].append(idx) error_name = self.mode.upper() - logger.info(f'Evaluating {self.mode.upper()}...') + logger.info(f"Evaluating {self.mode.upper()}...") metrics = dict() - metrics[error_name] = keypoint_mpjpe(pred_coords, gt_coords, mask, - self.ALIGNMENT[self.mode]) + metrics[error_name] = keypoint_mpjpe(pred_coords, gt_coords, mask, self.ALIGNMENT[self.mode]) for action_category, indices in action_category_indices.items(): - metrics[f'{error_name}_{action_category}'] = keypoint_mpjpe( - pred_coords[indices], gt_coords[indices], mask[indices], - self.ALIGNMENT[self.mode]) + metrics[f"{error_name}_{action_category}"] = keypoint_mpjpe( + pred_coords[indices], gt_coords[indices], mask[indices], self.ALIGNMENT[self.mode] + ) return metrics diff --git a/mmpose/evaluation/metrics/keypoint_partition_metric.py b/mmpose/evaluation/metrics/keypoint_partition_metric.py index fb30eca0d57f68e94cba93deec1f63bd333468aa..dba72e9dafc04c6c3e8e050f53b987efbb9d8297 100644 --- a/mmpose/evaluation/metrics/keypoint_partition_metric.py +++ b/mmpose/evaluation/metrics/keypoint_partition_metric.py @@ -68,71 +68,66 @@ class 
KeypointPartitionMetric(BaseMetric):
) -> None:
super().__init__()
# check metric type
- supported_metric_types = [
- 'CocoMetric', 'PCKAccuracy', 'AUC', 'EPE', 'NME'
- ]
- if metric['type'] not in supported_metric_types:
+ supported_metric_types = ["CocoMetric", "PCKAccuracy", "AUC", "EPE", "NME"]
+ if metric["type"] not in supported_metric_types:
raise ValueError(
- 'Metrics supported by KeypointPartitionMetric are CocoMetric, '
- 'PCKAccuracy, AUC, EPE and NME, '
- f"but got {metric['type']}")
+ "Metrics supported by KeypointPartitionMetric are CocoMetric, "
+ "PCKAccuracy, AUC, EPE and NME, "
+ f"but got {metric['type']}"
+ )
# check CocoMetric arguments
- if metric['type'] == 'CocoMetric':
- if 'ann_file' in metric:
+ if metric["type"] == "CocoMetric":
+ if "ann_file" in metric:
warnings.warn(
- 'KeypointPartitionMetric does not support the ann_file '
- 'argument of CocoMetric, this argument will be ignored.')
- metric['ann_file'] = None
- score_mode = metric.get('score_mode', 'bbox_keypoint')
- if score_mode != 'bbox':
+ "KeypointPartitionMetric does not support the ann_file " "argument of CocoMetric; this argument will be ignored."
+ )
+ metric["ann_file"] = None
+ score_mode = metric.get("score_mode", "bbox_keypoint")
+ if score_mode != "bbox":
warnings.warn(
- 'When using KeypointPartitionMetric with CocoMetric, '
+ "When using KeypointPartitionMetric with CocoMetric, "
"if score_mode is not 'bbox', pose scores will be "
"calculated part by part rather than by 'wholebody'. "
- 'Therefore, this may produce results different from the '
- 'CocoWholebodyMetric.')
- nms_mode = metric.get('nms_mode', 'oks_nms')
- if nms_mode != 'none':
+ "Therefore, this may produce results different from the "
+ "CocoWholebodyMetric."
+ )
+ nms_mode = metric.get("nms_mode", "oks_nms")
+ if nms_mode != "none":
warnings.warn(
- 'When using KeypointPartitionMetric with CocoMetric, '
- 'oks_nms and soft_oks_nms will be calculated part by part '
+ "When using KeypointPartitionMetric with CocoMetric, "
+ "oks_nms and soft_oks_nms will be calculated part by part "
"rather than by 'wholebody'. Therefore, this may produce "
- 'results different from the CocoWholebodyMetric.')
+ "results different from the CocoWholebodyMetric."
+ )
# check PCKAccuracy arguments
- if metric['type'] == 'PCKAccuracy':
- norm_item = metric.get('norm_item', 'bbox')
- if norm_item == 'torso' or 'torso' in norm_item:
+ if metric["type"] == "PCKAccuracy":
+ norm_item = metric.get("norm_item", "bbox")
+ if norm_item == "torso" or "torso" in norm_item:
warnings.warn(
- 'norm_item torso is used in JhmdbDataset, it may not be '
- 'compatible with other datasets, use at your own risk.')
+ "norm_item torso is used in JhmdbDataset; it may not be " "compatible with other datasets, so use at your own risk."
+ )
# check NME arguments
- if metric['type'] == 'NME':
- assert 'norm_mode' in metric, \
- 'Missing norm_mode required by the NME metric.'
- if metric['norm_mode'] != 'use_norm_item':
- raise ValueError(
- "NME norm_mode 'keypoint_distance' is incompatible with "
- 'KeypointPartitionMetric.')
+ if metric["type"] == "NME":
+ assert "norm_mode" in metric, "Missing norm_mode required by the NME metric."
+ if metric["norm_mode"] != "use_norm_item":
+ raise ValueError("NME norm_mode 'keypoint_distance' is incompatible with " "KeypointPartitionMetric.")
# check partitions
- assert len(partitions) > 0, 'There should be at least one partition.'
+ assert len(partitions) > 0, "There should be at least one partition."
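A usage aside (not part of the patch): the argument checks above imply an evaluator config of roughly the following shape. The partition names and keypoint indices are made-up placeholders, and `val_evaluator` is just the conventional MMEngine config key:

```python
# hypothetical evaluator config wrapping PCKAccuracy; each partition
# maps a name to the keypoint indices it should be scored on
val_evaluator = dict(
    type="KeypointPartitionMetric",
    metric=dict(type="PCKAccuracy", norm_item="bbox"),
    partitions=dict(
        upper_body=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        lower_body=[11, 12, 13, 14, 15, 16],
    ),
)
```

Each partition gets its own metric instance, and `evaluate` later prefixes every result key with the partition name (e.g. `upper_body/PCK`).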
for partition_name, partition in partitions.items(): - assert isinstance(partition, Sequence), \ - 'Each partition should be a sequence.' - assert len(partition) > 0, \ - 'Each partition should have at least one element.' + assert isinstance(partition, Sequence), "Each partition should be a sequence." + assert len(partition) > 0, "Each partition should have at least one element." self.partitions = partitions # instantiate metrics for each partition self.metrics = {} for partition_name in partitions.keys(): _metric = deepcopy(metric) - if 'outfile_prefix' in _metric: - _metric['outfile_prefix'] = _metric[ - 'outfile_prefix'] + '.' + partition_name + if "outfile_prefix" in _metric: + _metric["outfile_prefix"] = _metric["outfile_prefix"] + "." + partition_name self.metrics[partition_name] = METRICS.build(_metric) @BaseMetric.dataset_meta.setter @@ -142,46 +137,34 @@ class KeypointPartitionMetric(BaseMetric): # sigmas required by coco metric have to be split as well for partition_name, keypoint_ids in self.partitions.items(): _dataset_meta = deepcopy(dataset_meta) - _dataset_meta['num_keypoints'] = len(keypoint_ids) - _dataset_meta['sigmas'] = _dataset_meta['sigmas'][keypoint_ids] + _dataset_meta["num_keypoints"] = len(keypoint_ids) + _dataset_meta["sigmas"] = _dataset_meta["sigmas"][keypoint_ids] self.metrics[partition_name].dataset_meta = _dataset_meta - def process(self, data_batch: Sequence[dict], - data_samples: Sequence[dict]) -> None: + def process(self, data_batch: Sequence[dict], data_samples: Sequence[dict]) -> None: """Split data samples by partitions, then call metric.process part by part.""" - parted_data_samples = { - partition_name: [] - for partition_name in self.partitions.keys() - } + parted_data_samples = {partition_name: [] for partition_name in self.partitions.keys()} for data_sample in data_samples: for partition_name, keypoint_ids in self.partitions.items(): _data_sample = deepcopy(data_sample) - if 'keypoint_scores' in _data_sample['pred_instances']: - _data_sample['pred_instances'][ - 'keypoint_scores'] = _data_sample['pred_instances'][ - 'keypoint_scores'][:, keypoint_ids] - _data_sample['pred_instances']['keypoints'] = _data_sample[ - 'pred_instances']['keypoints'][:, keypoint_ids] - _data_sample['gt_instances']['keypoints'] = _data_sample[ - 'gt_instances']['keypoints'][:, keypoint_ids] - _data_sample['gt_instances'][ - 'keypoints_visible'] = _data_sample['gt_instances'][ - 'keypoints_visible'][:, keypoint_ids] + if "keypoint_scores" in _data_sample["pred_instances"]: + _data_sample["pred_instances"]["keypoint_scores"] = _data_sample["pred_instances"]["keypoint_scores"][:, keypoint_ids] + _data_sample["pred_instances"]["keypoints"] = _data_sample["pred_instances"]["keypoints"][:, keypoint_ids] + _data_sample["gt_instances"]["keypoints"] = _data_sample["gt_instances"]["keypoints"][:, keypoint_ids] + _data_sample["gt_instances"]["keypoints_visible"] = _data_sample["gt_instances"]["keypoints_visible"][:, keypoint_ids] # for coco metric - if 'raw_ann_info' in _data_sample: - raw_ann_info = _data_sample['raw_ann_info'] - anns = raw_ann_info if isinstance( - raw_ann_info, list) else [raw_ann_info] + if "raw_ann_info" in _data_sample: + raw_ann_info = _data_sample["raw_ann_info"] + anns = raw_ann_info if isinstance(raw_ann_info, list) else [raw_ann_info] for ann in anns: - if 'keypoints' in ann: - keypoints = np.array(ann['keypoints']).reshape( - -1, 3) + if "keypoints" in ann: + keypoints = np.array(ann["keypoints"]).reshape(-1, 3) keypoints = keypoints[keypoint_ids] 
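# (editorial aside, not part of the patch: COCO-style `ann['keypoints']` is a flat
# [x1, y1, v1, x2, y2, v2, ...] list, so the `reshape(-1, 3)` above yields one
# (x, y, visibility) row per keypoint; slicing those rows by `keypoint_ids`
# is what makes the `num_keypoints` recount below correct for the partition)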
num_keypoints = np.sum(keypoints[:, 2] > 0)
- ann['keypoints'] = keypoints.flatten().tolist()
- ann['num_keypoints'] = num_keypoints
+ ann["keypoints"] = keypoints.flatten().tolist()
+ ann["num_keypoints"] = num_keypoints
parted_data_samples[partition_name].append(_data_sample)
@@ -197,7 +180,7 @@ class KeypointPartitionMetric(BaseMetric):
for partition_name, metric in self.metrics.items():
_eval_results = metric.evaluate(size)
for key in list(_eval_results.keys()):
- new_key = partition_name + '/' + key
+ new_key = partition_name + "/" + key
_eval_results[new_key] = _eval_results.pop(key)
eval_results.update(_eval_results)
return eval_results
diff --git a/mmpose/evaluation/metrics/posetrack18_metric.py b/mmpose/evaluation/metrics/posetrack18_metric.py
index 86f801455a62467aaf45722210a6018c95b0bdd4..7a30959137002440a08c10e8cf2a8de93f3ff1da 100644
--- a/mmpose/evaluation/metrics/posetrack18_metric.py
+++ b/mmpose/evaluation/metrics/posetrack18_metric.py
@@ -8,11 +8,13 @@ from mmengine.fileio import dump, load
from mmengine.logging import MMLogger
from mmpose.registry import METRICS
+
from .coco_metric import CocoMetric
try:
from poseval import eval_helpers
from poseval.evaluateAP import evaluateAP
+
has_poseval = True
except (ImportError, ModuleNotFoundError):
has_poseval = False
@@ -69,23 +71,26 @@ class PoseTrack18Metric(CocoMetric):
If not specified, a temp file will be created. Defaults to ``None``
**kwargs: Keyword parameters passed to :class:`mmeval.BaseMetric`
"""
- default_prefix: Optional[str] = 'posetrack18'
-
- def __init__(self,
- ann_file: Optional[str] = None,
- score_mode: str = 'bbox_keypoint',
- keypoint_score_thr: float = 0.2,
- nms_mode: str = 'oks_nms',
- nms_thr: float = 0.9,
- format_only: bool = False,
- outfile_prefix: Optional[str] = None,
- collect_device: str = 'cpu',
- prefix: Optional[str] = None) -> None:
+
+ default_prefix: Optional[str] = "posetrack18"
+
+ def __init__(
+ self,
+ ann_file: Optional[str] = None,
+ score_mode: str = "bbox_keypoint",
+ keypoint_score_thr: float = 0.2,
+ nms_mode: str = "oks_nms",
+ nms_thr: float = 0.9,
+ format_only: bool = False,
+ outfile_prefix: Optional[str] = None,
+ collect_device: str = "cpu",
+ prefix: Optional[str] = None,
+ ) -> None:
# raise an error early to avoid a long run without getting results
if not has_poseval:
- raise ImportError('Please install ``poseval`` package for '
- 'evaluation on PoseTrack dataset '
- '(see `requirements/optional.txt`)')
+ raise ImportError(
+ "Please install the ``poseval`` package for " "evaluation on the PoseTrack dataset " "(see `requirements/optional.txt`)"
+ )
super().__init__(
ann_file=ann_file,
score_mode=score_mode,
@@ -95,10 +100,10 @@ class PoseTrack18Metric(CocoMetric):
format_only=format_only,
outfile_prefix=outfile_prefix,
collect_device=collect_device,
- prefix=prefix)
+ prefix=prefix,
+ )
- def results2json(self, keypoints: Dict[int, list],
- outfile_prefix: str) -> str:
+ def results2json(self, keypoints: Dict[int, list], outfile_prefix: str) -> str:
"""Dump the keypoint detection results into a json file.
Args: @@ -114,59 +119,83 @@ class PoseTrack18Metric(CocoMetric): categories = [] cat = {} - cat['supercategory'] = 'person' - cat['id'] = 1 - cat['name'] = 'person' - cat['keypoints'] = [ - 'nose', 'head_bottom', 'head_top', 'left_ear', 'right_ear', - 'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow', - 'left_wrist', 'right_wrist', 'left_hip', 'right_hip', 'left_knee', - 'right_knee', 'left_ankle', 'right_ankle' + cat["supercategory"] = "person" + cat["id"] = 1 + cat["name"] = "person" + cat["keypoints"] = [ + "nose", + "head_bottom", + "head_top", + "left_ear", + "right_ear", + "left_shoulder", + "right_shoulder", + "left_elbow", + "right_elbow", + "left_wrist", + "right_wrist", + "left_hip", + "right_hip", + "left_knee", + "right_knee", + "left_ankle", + "right_ankle", + ] + cat["skeleton"] = [ + [16, 14], + [14, 12], + [17, 15], + [15, 13], + [12, 13], + [6, 12], + [7, 13], + [6, 7], + [6, 8], + [7, 9], + [8, 10], + [9, 11], + [2, 3], + [1, 2], + [1, 3], + [2, 4], + [3, 5], + [4, 6], + [5, 7], ] - cat['skeleton'] = [[16, 14], [14, 12], [17, 15], [15, 13], [12, 13], - [6, 12], [7, 13], [6, 7], [6, 8], [7, 9], [8, 10], - [9, 11], [2, 3], [1, 2], [1, 3], [2, 4], [3, 5], - [4, 6], [5, 7]] categories.append(cat) # path of directory for official gt files - gt_folder = osp.join( - osp.dirname(self.ann_file), - osp.splitext(self.ann_file.split('_')[-1])[0]) + gt_folder = osp.join(osp.dirname(self.ann_file), osp.splitext(self.ann_file.split("_")[-1])[0]) # the json file for each video sequence - json_files = [ - pos for pos in os.listdir(gt_folder) if pos.endswith('.json') - ] + json_files = [pos for pos in os.listdir(gt_folder) if pos.endswith(".json")] for json_file in json_files: gt = load(osp.join(gt_folder, json_file)) annotations = [] images = [] - for image in gt['images']: + for image in gt["images"]: img = {} - img['id'] = image['id'] - img['file_name'] = image['file_name'] + img["id"] = image["id"] + img["file_name"] = image["file_name"] images.append(img) - img_kpts = keypoints[img['id']] + img_kpts = keypoints[img["id"]] for track_id, img_kpt in enumerate(img_kpts): ann = {} - ann['image_id'] = img_kpt['img_id'] - ann['keypoints'] = np.array( - img_kpt['keypoints']).reshape(-1).tolist() - ann['scores'] = np.array(ann['keypoints']).reshape( - [-1, 3])[:, 2].tolist() - ann['score'] = float(img_kpt['score']) - ann['track_id'] = track_id + ann["image_id"] = img_kpt["img_id"] + ann["keypoints"] = np.array(img_kpt["keypoints"]).reshape(-1).tolist() + ann["scores"] = np.array(ann["keypoints"]).reshape([-1, 3])[:, 2].tolist() + ann["score"] = float(img_kpt["score"]) + ann["track_id"] = track_id annotations.append(ann) pred_file = osp.join(osp.dirname(outfile_prefix), json_file) info = {} - info['images'] = images - info['categories'] = categories - info['annotations'] = annotations + info["images"] = images + info["categories"] = categories + info["annotations"] = annotations dump(info, pred_file, sort_keys=True, indent=4) @@ -186,34 +215,29 @@ class PoseTrack18Metric(CocoMetric): # path of directory for official gt files # 'xxx/posetrack18_train.json' -> 'xxx/train/' - gt_folder = osp.join( - osp.dirname(self.ann_file), - osp.splitext(self.ann_file.split('_')[-1])[0]) + gt_folder = osp.join(osp.dirname(self.ann_file), osp.splitext(self.ann_file.split("_")[-1])[0]) pred_folder = osp.dirname(outfile_prefix) - argv = ['', gt_folder + '/', pred_folder + '/'] + argv = ["", gt_folder + "/", pred_folder + "/"] - logger.info('Loading data') + logger.info("Loading data") gtFramesAll, 
prFramesAll = eval_helpers.load_data_dir(argv) - logger.info(f'# gt frames : {len(gtFramesAll)}') - logger.info(f'# pred frames: {len(prFramesAll)}') + logger.info(f"# gt frames : {len(gtFramesAll)}") + logger.info(f"# pred frames: {len(prFramesAll)}") # evaluate per-frame multi-person pose estimation (AP) # compute AP - logger.info('Evaluation of per-frame multi-person pose estimation') + logger.info("Evaluation of per-frame multi-person pose estimation") apAll, _, _ = evaluateAP(gtFramesAll, prFramesAll, None, False, False) # print AP - logger.info('Average Precision (AP) metric:') + logger.info("Average Precision (AP) metric:") eval_helpers.printTable(apAll) stats = eval_helpers.getCum(apAll) - stats_names = [ - 'Head AP', 'Shou AP', 'Elb AP', 'Wri AP', 'Hip AP', 'Knee AP', - 'Ankl AP', 'AP' - ] + stats_names = ["Head AP", "Shou AP", "Elb AP", "Wri AP", "Hip AP", "Knee AP", "Ankl AP", "AP"] info_str = list(zip(stats_names, stats)) diff --git a/mmpose/evaluation/metrics/simple_keypoint_3d_metrics.py b/mmpose/evaluation/metrics/simple_keypoint_3d_metrics.py index dc0065d5b9596c8cafa60abc7fc61a09d7313aac..748bc3783882946c12bb3ddd6e21620790c5001c 100644 --- a/mmpose/evaluation/metrics/simple_keypoint_3d_metrics.py +++ b/mmpose/evaluation/metrics/simple_keypoint_3d_metrics.py @@ -6,6 +6,7 @@ from mmengine.evaluator import BaseMetric from mmengine.logging import MMLogger from mmpose.registry import METRICS + from ..functional import keypoint_mpjpe @@ -40,24 +41,18 @@ class SimpleMPJPE(BaseMetric): to be skipped. Default: []. """ - ALIGNMENT = {'mpjpe': 'none', 'p-mpjpe': 'procrustes', 'n-mpjpe': 'scale'} + ALIGNMENT = {"mpjpe": "none", "p-mpjpe": "procrustes", "n-mpjpe": "scale"} - def __init__(self, - mode: str = 'mpjpe', - collect_device: str = 'cpu', - prefix: Optional[str] = None, - skip_list: List[str] = []) -> None: + def __init__(self, mode: str = "mpjpe", collect_device: str = "cpu", prefix: Optional[str] = None, skip_list: List[str] = []) -> None: super().__init__(collect_device=collect_device, prefix=prefix) allowed_modes = self.ALIGNMENT.keys() if mode not in allowed_modes: - raise KeyError("`mode` should be 'mpjpe', 'p-mpjpe', or " - f"'n-mpjpe', but got '{mode}'.") + raise KeyError("`mode` should be 'mpjpe', 'p-mpjpe', or " f"'n-mpjpe', but got '{mode}'.") self.mode = mode self.skip_list = skip_list - def process(self, data_batch: Sequence[dict], - data_samples: Sequence[dict]) -> None: + def process(self, data_batch: Sequence[dict], data_samples: Sequence[dict]) -> None: """Process one batch of data samples and predictions. The processed results should be stored in ``self.results``, which will be used to compute the metrics when all batches have been processed. 
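For context on `SimpleMPJPE` (an aside, not part of the patch): with `mode='mpjpe'`, i.e. alignment `'none'`, the reported value reduces to a visibility-masked mean per-joint position error. A minimal illustrative sketch, not the `keypoint_mpjpe` implementation:

```python
import numpy as np

def masked_mpjpe_sketch(pred, gt, mask):
    """pred/gt: [N, K, 3] coordinates; mask: [N, K] bool visibility.
    Mean Euclidean joint error over visible joints, in input units."""
    err = np.linalg.norm(pred - gt, axis=-1)  # [N, K]
    return err[mask].mean()
```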
@@ -70,21 +65,20 @@ class SimpleMPJPE(BaseMetric): """ for data_sample in data_samples: # predicted keypoints coordinates, [T, K, D] - pred_coords = data_sample['pred_instances']['keypoints'] + pred_coords = data_sample["pred_instances"]["keypoints"] if pred_coords.ndim == 4: pred_coords = np.squeeze(pred_coords, axis=0) # ground truth data_info - gt = data_sample['gt_instances'] + gt = data_sample["gt_instances"] # ground truth keypoints coordinates, [T, K, D] - gt_coords = gt['lifting_target'] + gt_coords = gt["lifting_target"] # ground truth keypoints_visible, [T, K, 1] - mask = gt['lifting_target_visible'].astype(bool).reshape( - gt_coords.shape[0], -1) + mask = gt["lifting_target_visible"].astype(bool).reshape(gt_coords.shape[0], -1) result = { - 'pred_coords': pred_coords, - 'gt_coords': gt_coords, - 'mask': mask, + "pred_coords": pred_coords, + "gt_coords": gt_coords, + "mask": mask, } self.results.append(result) @@ -102,18 +96,13 @@ class SimpleMPJPE(BaseMetric): logger: MMLogger = MMLogger.get_current_instance() # pred_coords: [N, K, D] - pred_coords = np.concatenate( - [result['pred_coords'] for result in results]) + pred_coords = np.concatenate([result["pred_coords"] for result in results]) # gt_coords: [N, K, D] - gt_coords = np.concatenate([result['gt_coords'] for result in results]) + gt_coords = np.concatenate([result["gt_coords"] for result in results]) # mask: [N, K] - mask = np.concatenate([result['mask'] for result in results]) + mask = np.concatenate([result["mask"] for result in results]) error_name = self.mode.upper() - logger.info(f'Evaluating {self.mode.upper()}...') - return { - error_name: - keypoint_mpjpe(pred_coords, gt_coords, mask, - self.ALIGNMENT[self.mode]) - } + logger.info(f"Evaluating {self.mode.upper()}...") + return {error_name: keypoint_mpjpe(pred_coords, gt_coords, mask, self.ALIGNMENT[self.mode])} diff --git a/mmpose/models/__init__.py b/mmpose/models/__init__.py index 7e7b386b92dc4f6900efdc88ee690e6b4d86a43e..4693b6a3cdbd11e203e26488dfc3076fd9c37252 100644 --- a/mmpose/models/__init__.py +++ b/mmpose/models/__init__.py @@ -1,8 +1,17 @@ # Copyright (c) OpenMMLab. All rights reserved. 
from .backbones import * # noqa -from .builder import (BACKBONES, HEADS, LOSSES, NECKS, build_backbone, - build_head, build_loss, build_neck, build_pose_estimator, - build_posenet) +from .builder import ( + BACKBONES, + HEADS, + LOSSES, + NECKS, + build_backbone, + build_head, + build_loss, + build_neck, + build_pose_estimator, + build_posenet, +) from .data_preprocessors import * # noqa from .distillers import * # noqa from .heads import * # noqa @@ -11,14 +20,14 @@ from .necks import * # noqa from .pose_estimators import * # noqa __all__ = [ - 'BACKBONES', - 'HEADS', - 'NECKS', - 'LOSSES', - 'build_backbone', - 'build_head', - 'build_loss', - 'build_posenet', - 'build_neck', - 'build_pose_estimator', + "BACKBONES", + "HEADS", + "NECKS", + "LOSSES", + "build_backbone", + "build_head", + "build_loss", + "build_posenet", + "build_neck", + "build_pose_estimator", ] diff --git a/mmpose/models/backbones/__init__.py b/mmpose/models/backbones/__init__.py index 1559b6288b846248cdabe3e47cdb4620a87f8087..dee9c382a107c1a6e64bd1d69de9a3d9f1680ec0 100644 --- a/mmpose/models/backbones/__init__.py +++ b/mmpose/models/backbones/__init__.py @@ -31,11 +31,36 @@ from .vipnas_mbv3 import ViPNAS_MobileNetV3 from .vipnas_resnet import ViPNAS_ResNet __all__ = [ - 'AlexNet', 'HourglassNet', 'HourglassAENet', 'HRNet', 'MobileNetV2', - 'MobileNetV3', 'RegNet', 'ResNet', 'ResNetV1d', 'ResNeXt', 'SCNet', - 'SEResNet', 'SEResNeXt', 'ShuffleNetV1', 'ShuffleNetV2', 'CPM', 'RSN', - 'MSPN', 'ResNeSt', 'VGG', 'TCN', 'ViPNAS_ResNet', 'ViPNAS_MobileNetV3', - 'LiteHRNet', 'V2VNet', 'HRFormer', 'PyramidVisionTransformer', - 'PyramidVisionTransformerV2', 'SwinTransformer', 'DSTFormer', 'CSPDarknet', - 'CSPNeXt' + "AlexNet", + "HourglassNet", + "HourglassAENet", + "HRNet", + "MobileNetV2", + "MobileNetV3", + "RegNet", + "ResNet", + "ResNetV1d", + "ResNeXt", + "SCNet", + "SEResNet", + "SEResNeXt", + "ShuffleNetV1", + "ShuffleNetV2", + "CPM", + "RSN", + "MSPN", + "ResNeSt", + "VGG", + "TCN", + "ViPNAS_ResNet", + "ViPNAS_MobileNetV3", + "LiteHRNet", + "V2VNet", + "HRFormer", + "PyramidVisionTransformer", + "PyramidVisionTransformerV2", + "SwinTransformer", + "DSTFormer", + "CSPDarknet", + "CSPNeXt", ] diff --git a/mmpose/models/backbones/alexnet.py b/mmpose/models/backbones/alexnet.py index 2262658f4718a079b2effc276282be4d39fbe6ad..e10ecce1f37a0e47a6db5fd0a4d443761ff9aa1e 100644 --- a/mmpose/models/backbones/alexnet.py +++ b/mmpose/models/backbones/alexnet.py @@ -2,6 +2,7 @@ import torch.nn as nn from mmpose.registry import MODELS + from .base_backbone import BaseBackbone @@ -55,4 +56,4 @@ class AlexNet(BaseBackbone): x = x.view(x.size(0), 256 * 6 * 6) x = self.classifier(x) - return (x, ) + return (x,) diff --git a/mmpose/models/backbones/cpm.py b/mmpose/models/backbones/cpm.py index 256769c43a4d7b9d0cdd40fb6de19a90727012e8..52714c09398107f0453d092921354412150bd2c9 100644 --- a/mmpose/models/backbones/cpm.py +++ b/mmpose/models/backbones/cpm.py @@ -7,6 +7,7 @@ from mmcv.cnn import ConvModule from mmengine.model import BaseModule from mmpose.registry import MODELS + from .base_backbone import BaseBackbone @@ -21,12 +22,7 @@ class CpmBlock(BaseModule): Default: None """ - def __init__(self, - in_channels, - channels=(128, 128, 128), - kernels=(11, 11, 11), - norm_cfg=None, - init_cfg=None): + def __init__(self, in_channels, channels=(128, 128, 128), kernels=(11, 11, 11), norm_cfg=None, init_cfg=None): super().__init__(init_cfg=init_cfg) assert len(channels) == len(kernels) @@ -36,13 +32,7 @@ class CpmBlock(BaseModule): 
input_channels = in_channels else: input_channels = channels[i - 1] - layers.append( - ConvModule( - input_channels, - channels[i], - kernels[i], - padding=(kernels[i] - 1) // 2, - norm_cfg=norm_cfg)) + layers.append(ConvModule(input_channels, channels[i], kernels[i], padding=(kernels[i] - 1) // 2, norm_cfg=norm_cfg)) self.model = nn.Sequential(*layers) def forward(self, x): @@ -100,11 +90,8 @@ class CPM(BaseBackbone): feat_channels=128, middle_channels=32, num_stages=6, - norm_cfg=dict(type='BN', requires_grad=True), - init_cfg=[ - dict(type='Normal', std=0.001, layer=['Conv2d']), - dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']) - ], + norm_cfg=dict(type="BN", requires_grad=True), + init_cfg=[dict(type="Normal", std=0.001, layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) @@ -125,7 +112,8 @@ class CPM(BaseBackbone): ConvModule(128, 32, 5, padding=2, norm_cfg=norm_cfg), ConvModule(32, 512, 9, padding=4, norm_cfg=norm_cfg), ConvModule(512, 512, 1, padding=0, norm_cfg=norm_cfg), - ConvModule(512, out_channels, 1, padding=0, act_cfg=None)) + ConvModule(512, out_channels, 1, padding=0, act_cfg=None), + ) self.middle = nn.Sequential( ConvModule(in_channels, 128, 9, padding=4, norm_cfg=norm_cfg), @@ -133,34 +121,34 @@ class CPM(BaseBackbone): ConvModule(128, 128, 9, padding=4, norm_cfg=norm_cfg), nn.MaxPool2d(kernel_size=3, stride=2, padding=1), ConvModule(128, 128, 9, padding=4, norm_cfg=norm_cfg), - nn.MaxPool2d(kernel_size=3, stride=2, padding=1)) - - self.cpm_stages = nn.ModuleList([ - CpmBlock( - middle_channels + out_channels, - channels=[feat_channels, feat_channels, feat_channels], - kernels=[11, 11, 11], - norm_cfg=norm_cfg) for _ in range(num_stages - 1) - ]) - - self.middle_conv = nn.ModuleList([ - nn.Sequential( - ConvModule( - 128, middle_channels, 5, padding=2, norm_cfg=norm_cfg)) - for _ in range(num_stages - 1) - ]) - - self.out_convs = nn.ModuleList([ - nn.Sequential( - ConvModule( - feat_channels, - feat_channels, - 1, - padding=0, - norm_cfg=norm_cfg), - ConvModule(feat_channels, out_channels, 1, act_cfg=None)) - for _ in range(num_stages - 1) - ]) + nn.MaxPool2d(kernel_size=3, stride=2, padding=1), + ) + + self.cpm_stages = nn.ModuleList( + [ + CpmBlock( + middle_channels + out_channels, + channels=[feat_channels, feat_channels, feat_channels], + kernels=[11, 11, 11], + norm_cfg=norm_cfg, + ) + for _ in range(num_stages - 1) + ] + ) + + self.middle_conv = nn.ModuleList( + [nn.Sequential(ConvModule(128, middle_channels, 5, padding=2, norm_cfg=norm_cfg)) for _ in range(num_stages - 1)] + ) + + self.out_convs = nn.ModuleList( + [ + nn.Sequential( + ConvModule(feat_channels, feat_channels, 1, padding=0, norm_cfg=norm_cfg), + ConvModule(feat_channels, out_channels, 1, act_cfg=None), + ) + for _ in range(num_stages - 1) + ] + ) def forward(self, x): """Model forward function.""" @@ -174,8 +162,7 @@ class CPM(BaseBackbone): single_stage = self.cpm_stages[ind] out_conv = self.out_convs[ind] - inp_feat = torch.cat( - [out_feats[-1], self.middle_conv[ind](middle_out)], 1) + inp_feat = torch.cat([out_feats[-1], self.middle_conv[ind](middle_out)], 1) cpm_feat = single_stage(inp_feat) out_feat = out_conv(cpm_feat) out_feats.append(out_feat) diff --git a/mmpose/models/backbones/csp_darknet.py b/mmpose/models/backbones/csp_darknet.py index dbaba0cfd93d9765713c92f6854c91ad014e3a9d..5f76865893764199acaa5345a1f4ad135de5e860 100644 --- 
a/mmpose/models/backbones/csp_darknet.py +++ b/mmpose/models/backbones/csp_darknet.py @@ -8,6 +8,7 @@ from mmengine.model import BaseModule from torch.nn.modules.batchnorm import _BatchNorm from mmpose.registry import MODELS + from ..utils import CSPLayer @@ -27,14 +28,16 @@ class Focus(nn.Module): Default: dict(type='Swish'). """ - def __init__(self, - in_channels, - out_channels, - kernel_size=1, - stride=1, - conv_cfg=None, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish')): + def __init__( + self, + in_channels, + out_channels, + kernel_size=1, + stride=1, + conv_cfg=None, + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + ): super().__init__() self.conv = ConvModule( in_channels * 4, @@ -44,7 +47,8 @@ class Focus(nn.Module): padding=(kernel_size - 1) // 2, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg) + act_cfg=act_cfg, + ) def forward(self, x): # shape of x (b,c,w,h) -> y(b,4c,w/2,h/2) @@ -82,42 +86,27 @@ class SPPBottleneck(BaseModule): Default: None. """ - def __init__(self, - in_channels, - out_channels, - kernel_sizes=(5, 9, 13), - conv_cfg=None, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + kernel_sizes=(5, 9, 13), + conv_cfg=None, + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + init_cfg=None, + ): super().__init__(init_cfg) mid_channels = in_channels // 2 - self.conv1 = ConvModule( - in_channels, - mid_channels, - 1, - stride=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) - self.poolings = nn.ModuleList([ - nn.MaxPool2d(kernel_size=ks, stride=1, padding=ks // 2) - for ks in kernel_sizes - ]) + self.conv1 = ConvModule(in_channels, mid_channels, 1, stride=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + self.poolings = nn.ModuleList([nn.MaxPool2d(kernel_size=ks, stride=1, padding=ks // 2) for ks in kernel_sizes]) conv2_channels = mid_channels * (len(kernel_sizes) + 1) - self.conv2 = ConvModule( - conv2_channels, - out_channels, - 1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) + self.conv2 = ConvModule(conv2_channels, out_channels, 1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) def forward(self, x): x = self.conv1(x) with torch.cuda.amp.autocast(enabled=False): - x = torch.cat( - [x] + [pooling(x) for pooling in self.poolings], dim=1) + x = torch.cat([x] + [pooling(x) for pooling in self.poolings], dim=1) x = self.conv2(x) return x @@ -166,46 +155,43 @@ class CSPDarknet(BaseModule): (1, 512, 26, 26) (1, 1024, 13, 13) """ + # From left to right: # in_channels, out_channels, num_blocks, add_identity, use_spp arch_settings = { - 'P5': [[64, 128, 3, True, False], [128, 256, 9, True, False], - [256, 512, 9, True, False], [512, 1024, 3, False, True]], - 'P6': [[64, 128, 3, True, False], [128, 256, 9, True, False], - [256, 512, 9, True, False], [512, 768, 3, True, False], - [768, 1024, 3, False, True]] + "P5": [[64, 128, 3, True, False], [128, 256, 9, True, False], [256, 512, 9, True, False], [512, 1024, 3, False, True]], + "P6": [ + [64, 128, 3, True, False], + [128, 256, 9, True, False], + [256, 512, 9, True, False], + [512, 768, 3, True, False], + [768, 1024, 3, False, True], + ], } - def __init__(self, - arch='P5', - deepen_factor=1.0, - widen_factor=1.0, - out_indices=(2, 3, 4), - frozen_stages=-1, - use_depthwise=False, - arch_ovewrite=None, - spp_kernal_sizes=(5, 9, 13), - conv_cfg=None, - 
norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), - norm_eval=False, - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=math.sqrt(5), - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu')): + def __init__( + self, + arch="P5", + deepen_factor=1.0, + widen_factor=1.0, + out_indices=(2, 3, 4), + frozen_stages=-1, + use_depthwise=False, + arch_ovewrite=None, + spp_kernal_sizes=(5, 9, 13), + conv_cfg=None, + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + norm_eval=False, + init_cfg=dict(type="Kaiming", layer="Conv2d", a=math.sqrt(5), distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), + ): super().__init__(init_cfg) arch_setting = self.arch_settings[arch] if arch_ovewrite: arch_setting = arch_ovewrite - assert set(out_indices).issubset( - i for i in range(len(arch_setting) + 1)) + assert set(out_indices).issubset(i for i in range(len(arch_setting) + 1)) if frozen_stages not in range(-1, len(arch_setting) + 1): - raise ValueError('frozen_stages must be in range(-1, ' - 'len(arch_setting) + 1). But received ' - f'{frozen_stages}') + raise ValueError("frozen_stages must be in range(-1, " "len(arch_setting) + 1). But received " f"{frozen_stages}") self.out_indices = out_indices self.frozen_stages = frozen_stages @@ -213,39 +199,20 @@ class CSPDarknet(BaseModule): self.norm_eval = norm_eval conv = DepthwiseSeparableConvModule if use_depthwise else ConvModule - self.stem = Focus( - 3, - int(arch_setting[0][0] * widen_factor), - kernel_size=3, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) - self.layers = ['stem'] + self.stem = Focus(3, int(arch_setting[0][0] * widen_factor), kernel_size=3, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + self.layers = ["stem"] - for i, (in_channels, out_channels, num_blocks, add_identity, - use_spp) in enumerate(arch_setting): + for i, (in_channels, out_channels, num_blocks, add_identity, use_spp) in enumerate(arch_setting): in_channels = int(in_channels * widen_factor) out_channels = int(out_channels * widen_factor) num_blocks = max(round(num_blocks * deepen_factor), 1) stage = [] - conv_layer = conv( - in_channels, - out_channels, - 3, - stride=2, - padding=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) + conv_layer = conv(in_channels, out_channels, 3, stride=2, padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) stage.append(conv_layer) if use_spp: spp = SPPBottleneck( - out_channels, - out_channels, - kernel_sizes=spp_kernal_sizes, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) + out_channels, out_channels, kernel_sizes=spp_kernal_sizes, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg + ) stage.append(spp) csp_layer = CSPLayer( out_channels, @@ -255,10 +222,11 @@ class CSPDarknet(BaseModule): use_depthwise=use_depthwise, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg) + act_cfg=act_cfg, + ) stage.append(csp_layer) - self.add_module(f'stage{i + 1}', nn.Sequential(*stage)) - self.layers.append(f'stage{i + 1}') + self.add_module(f"stage{i + 1}", nn.Sequential(*stage)) + self.layers.append(f"stage{i + 1}") def _freeze_stages(self): if self.frozen_stages >= 0: diff --git a/mmpose/models/backbones/cspnext.py b/mmpose/models/backbones/cspnext.py index 5275bb255a5bf2c610c90544c7d7e8227b3111c2..619b80133eaa64bb416cfe14249ed188b9ed5040 100644 --- a/mmpose/models/backbones/cspnext.py +++ b/mmpose/models/backbones/cspnext.py @@ -10,6 +10,7 @@ from torch.nn.modules.batchnorm import 
_BatchNorm from mmpose.registry import MODELS from mmpose.utils.typing import ConfigType + from ..utils import CSPLayer from .csp_darknet import SPPBottleneck @@ -51,19 +52,23 @@ class CSPNeXt(BaseModule): init_cfg (:obj:`ConfigDict` or dict or list[dict] or list[:obj:`ConfigDict`]): Initialization config dict. """ + # From left to right: # in_channels, out_channels, num_blocks, add_identity, use_spp arch_settings = { - 'P5': [[64, 128, 3, True, False], [128, 256, 6, True, False], - [256, 512, 6, True, False], [512, 1024, 3, False, True]], - 'P6': [[64, 128, 3, True, False], [128, 256, 6, True, False], - [256, 512, 6, True, False], [512, 768, 3, True, False], - [768, 1024, 3, False, True]] + "P5": [[64, 128, 3, True, False], [128, 256, 6, True, False], [256, 512, 6, True, False], [512, 1024, 3, False, True]], + "P6": [ + [64, 128, 3, True, False], + [128, 256, 6, True, False], + [256, 512, 6, True, False], + [512, 768, 3, True, False], + [768, 1024, 3, False, True], + ], } def __init__( self, - arch: str = 'P5', + arch: str = "P5", deepen_factor: float = 1.0, widen_factor: float = 1.0, out_indices: Sequence[int] = (2, 3, 4), @@ -74,27 +79,20 @@ class CSPNeXt(BaseModule): spp_kernel_sizes: Sequence[int] = (5, 9, 13), channel_attention: bool = True, conv_cfg: Optional[ConfigType] = None, - norm_cfg: ConfigType = dict(type='BN', momentum=0.03, eps=0.001), - act_cfg: ConfigType = dict(type='SiLU'), + norm_cfg: ConfigType = dict(type="BN", momentum=0.03, eps=0.001), + act_cfg: ConfigType = dict(type="SiLU"), norm_eval: bool = False, init_cfg: Optional[ConfigType] = dict( - type='Kaiming', - layer='Conv2d', - a=math.sqrt(5), - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu') + type="Kaiming", layer="Conv2d", a=math.sqrt(5), distribution="uniform", mode="fan_in", nonlinearity="leaky_relu" + ), ) -> None: super().__init__(init_cfg=init_cfg) arch_setting = self.arch_settings[arch] if arch_ovewrite: arch_setting = arch_ovewrite - assert set(out_indices).issubset( - i for i in range(len(arch_setting) + 1)) + assert set(out_indices).issubset(i for i in range(len(arch_setting) + 1)) if frozen_stages not in range(-1, len(arch_setting) + 1): - raise ValueError('frozen_stages must be in range(-1, ' - 'len(arch_setting) + 1). But received ' - f'{frozen_stages}') + raise ValueError("frozen_stages must be in range(-1, " "len(arch_setting) + 1). 
But received " f"{frozen_stages}") self.out_indices = out_indices self.frozen_stages = frozen_stages @@ -102,14 +100,7 @@ class CSPNeXt(BaseModule): self.norm_eval = norm_eval conv = DepthwiseSeparableConvModule if use_depthwise else ConvModule self.stem = nn.Sequential( - ConvModule( - 3, - int(arch_setting[0][0] * widen_factor // 2), - 3, - padding=1, - stride=2, - norm_cfg=norm_cfg, - act_cfg=act_cfg), + ConvModule(3, int(arch_setting[0][0] * widen_factor // 2), 3, padding=1, stride=2, norm_cfg=norm_cfg, act_cfg=act_cfg), ConvModule( int(arch_setting[0][0] * widen_factor // 2), int(arch_setting[0][0] * widen_factor // 2), @@ -117,7 +108,8 @@ class CSPNeXt(BaseModule): padding=1, stride=1, norm_cfg=norm_cfg, - act_cfg=act_cfg), + act_cfg=act_cfg, + ), ConvModule( int(arch_setting[0][0] * widen_factor // 2), int(arch_setting[0][0] * widen_factor), @@ -125,33 +117,22 @@ class CSPNeXt(BaseModule): padding=1, stride=1, norm_cfg=norm_cfg, - act_cfg=act_cfg)) - self.layers = ['stem'] + act_cfg=act_cfg, + ), + ) + self.layers = ["stem"] - for i, (in_channels, out_channels, num_blocks, add_identity, - use_spp) in enumerate(arch_setting): + for i, (in_channels, out_channels, num_blocks, add_identity, use_spp) in enumerate(arch_setting): in_channels = int(in_channels * widen_factor) out_channels = int(out_channels * widen_factor) num_blocks = max(round(num_blocks * deepen_factor), 1) stage = [] - conv_layer = conv( - in_channels, - out_channels, - 3, - stride=2, - padding=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) + conv_layer = conv(in_channels, out_channels, 3, stride=2, padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) stage.append(conv_layer) if use_spp: spp = SPPBottleneck( - out_channels, - out_channels, - kernel_sizes=spp_kernel_sizes, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) + out_channels, out_channels, kernel_sizes=spp_kernel_sizes, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg + ) stage.append(spp) csp_layer = CSPLayer( out_channels, @@ -164,10 +145,11 @@ class CSPNeXt(BaseModule): channel_attention=channel_attention, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg) + act_cfg=act_cfg, + ) stage.append(csp_layer) - self.add_module(f'stage{i + 1}', nn.Sequential(*stage)) - self.layers.append(f'stage{i + 1}') + self.add_module(f"stage{i + 1}", nn.Sequential(*stage)) + self.layers.append(f"stage{i + 1}") def _freeze_stages(self) -> None: if self.frozen_stages >= 0: diff --git a/mmpose/models/backbones/dstformer.py b/mmpose/models/backbones/dstformer.py index 2ef13bdb02fffe0ce19cd478c12abf5c9e45f499..0d013e8d45c55b900920d05e6cafde9a392f7efd 100644 --- a/mmpose/models/backbones/dstformer.py +++ b/mmpose/models/backbones/dstformer.py @@ -6,19 +6,13 @@ from mmengine.model import BaseModule, constant_init from mmengine.model.weight_init import trunc_normal_ from mmpose.registry import MODELS + from .base_backbone import BaseBackbone class Attention(BaseModule): - def __init__(self, - dim, - num_heads=8, - qkv_bias=False, - qk_scale=None, - attn_drop=0., - proj_drop=0., - mode='spatial'): + def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0, mode="spatial"): super().__init__() self.num_heads = num_heads head_dim = dim // num_heads @@ -37,17 +31,13 @@ class Attention(BaseModule): def forward(self, x, seq_len=1): B, N, C = x.shape - if self.mode == 'temporal': - qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // - self.num_heads).permute(2, 0, 3, 1, 4) - q, k, v = qkv[0], 
qkv[1], qkv[ - 2] # make torchscript happy (cannot use tensor as tuple) + if self.mode == "temporal": + qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) + q, k, v = qkv[0], qkv[1], qkv[2] # make torchscript happy (cannot use tensor as tuple) x = self.forward_temporal(q, k, v, seq_len=seq_len) - elif self.mode == 'spatial': - qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // - self.num_heads).permute(2, 0, 3, 1, 4) - q, k, v = qkv[0], qkv[1], qkv[ - 2] # make torchscript happy (cannot use tensor as tuple) + elif self.mode == "spatial": + qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) + q, k, v = qkv[0], qkv[1], qkv[2] # make torchscript happy (cannot use tensor as tuple) x = self.forward_spatial(q, k, v) else: raise NotImplementedError(self.mode) @@ -67,12 +57,9 @@ class Attention(BaseModule): def forward_temporal(self, q, k, v, seq_len=8): B, _, N, C = q.shape - qt = q.reshape(-1, seq_len, self.num_heads, N, - C).permute(0, 2, 3, 1, 4) # (B, H, N, T, C) - kt = k.reshape(-1, seq_len, self.num_heads, N, - C).permute(0, 2, 3, 1, 4) # (B, H, N, T, C) - vt = v.reshape(-1, seq_len, self.num_heads, N, - C).permute(0, 2, 3, 1, 4) # (B, H, N, T, C) + qt = q.reshape(-1, seq_len, self.num_heads, N, C).permute(0, 2, 3, 1, 4) # (B, H, N, T, C) + kt = k.reshape(-1, seq_len, self.num_heads, N, C).permute(0, 2, 3, 1, 4) # (B, H, N, T, C) + vt = v.reshape(-1, seq_len, self.num_heads, N, C).permute(0, 2, 3, 1, 4) # (B, H, N, T, C) attn = (qt @ kt.transpose(-2, -1)) * self.scale attn = attn.softmax(dim=-1) @@ -85,17 +72,19 @@ class Attention(BaseModule): class AttentionBlock(BaseModule): - def __init__(self, - dim, - num_heads, - mlp_ratio=4., - mlp_out_ratio=1., - qkv_bias=True, - qk_scale=None, - drop=0., - attn_drop=0., - drop_path=0., - st_mode='st'): + def __init__( + self, + dim, + num_heads, + mlp_ratio=4.0, + mlp_out_ratio=1.0, + qkv_bias=True, + qk_scale=None, + drop=0.0, + attn_drop=0.0, + drop_path=0.0, + st_mode="st", + ): super().__init__() self.st_mode = st_mode @@ -103,43 +92,28 @@ class AttentionBlock(BaseModule): self.norm1_t = nn.LayerNorm(dim, eps=1e-06) self.attn_s = Attention( - dim, - num_heads=num_heads, - qkv_bias=qkv_bias, - qk_scale=qk_scale, - attn_drop=attn_drop, - proj_drop=drop, - mode='spatial') + dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop, mode="spatial" + ) self.attn_t = Attention( - dim, - num_heads=num_heads, - qkv_bias=qkv_bias, - qk_scale=qk_scale, - attn_drop=attn_drop, - proj_drop=drop, - mode='temporal') - - self.drop_path = DropPath( - drop_path) if drop_path > 0. 
else nn.Identity() + dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop, mode="temporal" + ) + + self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity() self.norm2_s = nn.LayerNorm(dim, eps=1e-06) self.norm2_t = nn.LayerNorm(dim, eps=1e-06) mlp_hidden_dim = int(dim * mlp_ratio) mlp_out_dim = int(dim * mlp_out_ratio) - self.mlp_s = nn.Sequential( - nn.Linear(dim, mlp_hidden_dim), nn.GELU(), - nn.Linear(mlp_hidden_dim, mlp_out_dim), nn.Dropout(drop)) - self.mlp_t = nn.Sequential( - nn.Linear(dim, mlp_hidden_dim), nn.GELU(), - nn.Linear(mlp_hidden_dim, mlp_out_dim), nn.Dropout(drop)) + self.mlp_s = nn.Sequential(nn.Linear(dim, mlp_hidden_dim), nn.GELU(), nn.Linear(mlp_hidden_dim, mlp_out_dim), nn.Dropout(drop)) + self.mlp_t = nn.Sequential(nn.Linear(dim, mlp_hidden_dim), nn.GELU(), nn.Linear(mlp_hidden_dim, mlp_out_dim), nn.Dropout(drop)) def forward(self, x, seq_len=1): - if self.st_mode == 'st': + if self.st_mode == "st": x = x + self.drop_path(self.attn_s(self.norm1_s(x), seq_len)) x = x + self.drop_path(self.mlp_s(self.norm2_s(x))) x = x + self.drop_path(self.attn_t(self.norm1_t(x), seq_len)) x = x + self.drop_path(self.mlp_t(self.norm2_t(x))) - elif self.st_mode == 'ts': + elif self.st_mode == "ts": x = x + self.drop_path(self.attn_t(self.norm1_t(x), seq_len)) x = x + self.drop_path(self.mlp_t(self.norm2_t(x))) x = x + self.drop_path(self.attn_s(self.norm1_s(x), seq_len)) @@ -186,21 +160,23 @@ class DSTFormer(BaseBackbone): (1, 2, 17, 512) """ - def __init__(self, - in_channels, - feat_size=256, - depth=5, - num_heads=8, - mlp_ratio=4, - num_keypoints=17, - seq_len=243, - qkv_bias=True, - qk_scale=None, - drop_rate=0., - attn_drop_rate=0., - drop_path_rate=0., - att_fuse=True, - init_cfg=None): + def __init__( + self, + in_channels, + feat_size=256, + depth=5, + num_heads=8, + mlp_ratio=4, + num_keypoints=17, + seq_len=243, + qkv_bias=True, + qk_scale=None, + drop_rate=0.0, + attn_drop_rate=0.0, + drop_path_rate=0.0, + att_fuse=True, + init_cfg=None, + ): super().__init__(init_cfg=init_cfg) self.in_channels = in_channels @@ -209,47 +185,52 @@ class DSTFormer(BaseBackbone): self.joints_embed = nn.Linear(in_channels, feat_size) self.pos_drop = nn.Dropout(p=drop_rate) - dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth) - ] # stochastic depth decay rule - - self.blocks_st = nn.ModuleList([ - AttentionBlock( - dim=feat_size, - num_heads=num_heads, - mlp_ratio=mlp_ratio, - qkv_bias=qkv_bias, - qk_scale=qk_scale, - drop=drop_rate, - attn_drop=attn_drop_rate, - drop_path=dpr[i], - st_mode='st') for i in range(depth) - ]) - self.blocks_ts = nn.ModuleList([ - AttentionBlock( - dim=feat_size, - num_heads=num_heads, - mlp_ratio=mlp_ratio, - qkv_bias=qkv_bias, - qk_scale=qk_scale, - drop=drop_rate, - attn_drop=attn_drop_rate, - drop_path=dpr[i], - st_mode='ts') for i in range(depth) - ]) + dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)] # stochastic depth decay rule + + self.blocks_st = nn.ModuleList( + [ + AttentionBlock( + dim=feat_size, + num_heads=num_heads, + mlp_ratio=mlp_ratio, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + drop=drop_rate, + attn_drop=attn_drop_rate, + drop_path=dpr[i], + st_mode="st", + ) + for i in range(depth) + ] + ) + self.blocks_ts = nn.ModuleList( + [ + AttentionBlock( + dim=feat_size, + num_heads=num_heads, + mlp_ratio=mlp_ratio, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + drop=drop_rate, + attn_drop=attn_drop_rate, + drop_path=dpr[i], + st_mode="ts", + ) + for i in 
range(depth) + ] + ) self.norm = nn.LayerNorm(feat_size, eps=1e-06) self.temp_embed = nn.Parameter(torch.zeros(1, seq_len, 1, feat_size)) - self.spat_embed = nn.Parameter( - torch.zeros(1, num_keypoints, feat_size)) + self.spat_embed = nn.Parameter(torch.zeros(1, num_keypoints, feat_size)) - trunc_normal_(self.temp_embed, std=.02) - trunc_normal_(self.spat_embed, std=.02) + trunc_normal_(self.temp_embed, std=0.02) + trunc_normal_(self.spat_embed, std=0.02) self.att_fuse = att_fuse if self.att_fuse: - self.attn_regress = nn.ModuleList( - [nn.Linear(feat_size * 2, 2) for i in range(depth)]) + self.attn_regress = nn.ModuleList([nn.Linear(feat_size * 2, 2) for i in range(depth)]) for i in range(depth): self.attn_regress[i].weight.data.fill_(0) self.attn_regress[i].bias.data.fill_(0.5) @@ -269,8 +250,7 @@ class DSTFormer(BaseBackbone): x = x.reshape(BF, K, C) # (BF, K, feat_size) x = self.pos_drop(x) - for idx, (blk_st, - blk_ts) in enumerate(zip(self.blocks_st, self.blocks_ts)): + for idx, (blk_st, blk_ts) in enumerate(zip(self.blocks_st, self.blocks_ts)): x_st = blk_st(x, F) x_ts = blk_ts(x, F) if self.att_fuse: @@ -290,13 +270,12 @@ class DSTFormer(BaseBackbone): """Initialize the weights in backbone.""" super(DSTFormer, self).init_weights() - if (isinstance(self.init_cfg, dict) - and self.init_cfg['type'] == 'Pretrained'): + if isinstance(self.init_cfg, dict) and self.init_cfg["type"] == "Pretrained": return for m in self.modules(): if isinstance(m, nn.Linear): - trunc_normal_(m.weight, std=.02) + trunc_normal_(m.weight, std=0.02) if isinstance(m, nn.Linear) and m.bias is not None: constant_init(m.bias, 0) elif isinstance(m, nn.LayerNorm): diff --git a/mmpose/models/backbones/hourglass.py b/mmpose/models/backbones/hourglass.py index cfc8d6d328da5b63094015351cc10084cda46da0..1c868db7ddc9131d7c92b31dc111c4987d57354b 100644 --- a/mmpose/models/backbones/hourglass.py +++ b/mmpose/models/backbones/hourglass.py @@ -6,6 +6,7 @@ from mmcv.cnn import ConvModule from mmengine.model import BaseModule from mmpose.registry import MODELS + from .base_backbone import BaseBackbone from .resnet import BasicBlock, ResLayer @@ -26,12 +27,7 @@ class HourglassModule(BaseModule): Default: None """ - def __init__(self, - depth, - stage_channels, - stage_blocks, - norm_cfg=dict(type='BN', requires_grad=True), - init_cfg=None): + def __init__(self, depth, stage_channels, stage_blocks, norm_cfg=dict(type="BN", requires_grad=True), init_cfg=None): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) @@ -44,35 +40,16 @@ class HourglassModule(BaseModule): cur_channel = stage_channels[0] next_channel = stage_channels[1] - self.up1 = ResLayer( - BasicBlock, cur_block, cur_channel, cur_channel, norm_cfg=norm_cfg) + self.up1 = ResLayer(BasicBlock, cur_block, cur_channel, cur_channel, norm_cfg=norm_cfg) - self.low1 = ResLayer( - BasicBlock, - cur_block, - cur_channel, - next_channel, - stride=2, - norm_cfg=norm_cfg) + self.low1 = ResLayer(BasicBlock, cur_block, cur_channel, next_channel, stride=2, norm_cfg=norm_cfg) if self.depth > 1: - self.low2 = HourglassModule(depth - 1, stage_channels[1:], - stage_blocks[1:]) + self.low2 = HourglassModule(depth - 1, stage_channels[1:], stage_blocks[1:]) else: - self.low2 = ResLayer( - BasicBlock, - next_block, - next_channel, - next_channel, - norm_cfg=norm_cfg) - - self.low3 = ResLayer( - BasicBlock, - cur_block, - next_channel, - cur_channel, - norm_cfg=norm_cfg, - downsample_first=False) + self.low2 = ResLayer(BasicBlock, 
next_block, next_channel, next_channel, norm_cfg=norm_cfg) + + self.low3 = ResLayer(BasicBlock, cur_block, next_channel, cur_channel, norm_cfg=norm_cfg, downsample_first=False) self.up2 = nn.Upsample(scale_factor=2) @@ -134,11 +111,8 @@ class HourglassNet(BaseBackbone): stage_channels=(256, 256, 384, 384, 384, 512), stage_blocks=(2, 2, 2, 2, 2, 4), feat_channel=256, - norm_cfg=dict(type='BN', requires_grad=True), - init_cfg=[ - dict(type='Normal', std=0.001, layer=['Conv2d']), - dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']) - ], + norm_cfg=dict(type="BN", requires_grad=True), + init_cfg=[dict(type="Normal", std=0.001, layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) @@ -152,38 +126,22 @@ class HourglassNet(BaseBackbone): cur_channel = stage_channels[0] self.stem = nn.Sequential( - ConvModule(3, 128, 7, padding=3, stride=2, norm_cfg=norm_cfg), - ResLayer(BasicBlock, 1, 128, 256, stride=2, norm_cfg=norm_cfg)) - - self.hourglass_modules = nn.ModuleList([ - HourglassModule(downsample_times, stage_channels, stage_blocks) - for _ in range(num_stacks) - ]) - - self.inters = ResLayer( - BasicBlock, - num_stacks - 1, - cur_channel, - cur_channel, - norm_cfg=norm_cfg) - - self.conv1x1s = nn.ModuleList([ - ConvModule( - cur_channel, cur_channel, 1, norm_cfg=norm_cfg, act_cfg=None) - for _ in range(num_stacks - 1) - ]) - - self.out_convs = nn.ModuleList([ - ConvModule( - cur_channel, feat_channel, 3, padding=1, norm_cfg=norm_cfg) - for _ in range(num_stacks) - ]) - - self.remap_convs = nn.ModuleList([ - ConvModule( - feat_channel, cur_channel, 1, norm_cfg=norm_cfg, act_cfg=None) - for _ in range(num_stacks - 1) - ]) + ConvModule(3, 128, 7, padding=3, stride=2, norm_cfg=norm_cfg), ResLayer(BasicBlock, 1, 128, 256, stride=2, norm_cfg=norm_cfg) + ) + + self.hourglass_modules = nn.ModuleList([HourglassModule(downsample_times, stage_channels, stage_blocks) for _ in range(num_stacks)]) + + self.inters = ResLayer(BasicBlock, num_stacks - 1, cur_channel, cur_channel, norm_cfg=norm_cfg) + + self.conv1x1s = nn.ModuleList( + [ConvModule(cur_channel, cur_channel, 1, norm_cfg=norm_cfg, act_cfg=None) for _ in range(num_stacks - 1)] + ) + + self.out_convs = nn.ModuleList([ConvModule(cur_channel, feat_channel, 3, padding=1, norm_cfg=norm_cfg) for _ in range(num_stacks)]) + + self.remap_convs = nn.ModuleList( + [ConvModule(feat_channel, cur_channel, 1, norm_cfg=norm_cfg, act_cfg=None) for _ in range(num_stacks - 1)] + ) self.relu = nn.ReLU(inplace=True) @@ -201,9 +159,7 @@ class HourglassNet(BaseBackbone): out_feats.append(out_feat) if ind < self.num_stacks - 1: - inter_feat = self.conv1x1s[ind]( - inter_feat) + self.remap_convs[ind]( - out_feat) + inter_feat = self.conv1x1s[ind](inter_feat) + self.remap_convs[ind](out_feat) inter_feat = self.inters[ind](self.relu(inter_feat)) return out_feats diff --git a/mmpose/models/backbones/hourglass_ae.py b/mmpose/models/backbones/hourglass_ae.py index 93e62dd4067c3489de00c5cd1f7875489725de2e..ac017e158f5caabaf534fa90dba3968495c09f68 100644 --- a/mmpose/models/backbones/hourglass_ae.py +++ b/mmpose/models/backbones/hourglass_ae.py @@ -6,6 +6,7 @@ from mmcv.cnn import ConvModule, MaxPool2d from mmengine.model import BaseModule from mmpose.registry import MODELS + from .base_backbone import BaseBackbone @@ -23,11 +24,7 @@ class HourglassAEModule(BaseModule): Default: None """ - def __init__(self, - depth, - stage_channels, - norm_cfg=dict(type='BN', 
requires_grad=True), - init_cfg=None): + def __init__(self, depth, stage_channels, norm_cfg=dict(type="BN", requires_grad=True), init_cfg=None): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) @@ -37,22 +34,18 @@ class HourglassAEModule(BaseModule): cur_channel = stage_channels[0] next_channel = stage_channels[1] - self.up1 = ConvModule( - cur_channel, cur_channel, 3, padding=1, norm_cfg=norm_cfg) + self.up1 = ConvModule(cur_channel, cur_channel, 3, padding=1, norm_cfg=norm_cfg) self.pool1 = MaxPool2d(2, 2) - self.low1 = ConvModule( - cur_channel, next_channel, 3, padding=1, norm_cfg=norm_cfg) + self.low1 = ConvModule(cur_channel, next_channel, 3, padding=1, norm_cfg=norm_cfg) if self.depth > 1: self.low2 = HourglassAEModule(depth - 1, stage_channels[1:]) else: - self.low2 = ConvModule( - next_channel, next_channel, 3, padding=1, norm_cfg=norm_cfg) + self.low2 = ConvModule(next_channel, next_channel, 3, padding=1, norm_cfg=norm_cfg) - self.low3 = ConvModule( - next_channel, cur_channel, 3, padding=1, norm_cfg=norm_cfg) + self.low3 = ConvModule(next_channel, cur_channel, 3, padding=1, norm_cfg=norm_cfg) self.up2 = nn.UpsamplingNearest2d(scale_factor=2) @@ -116,11 +109,8 @@ class HourglassAENet(BaseBackbone): out_channels=34, stage_channels=(256, 384, 512, 640, 768), feat_channels=256, - norm_cfg=dict(type='BN', requires_grad=True), - init_cfg=[ - dict(type='Normal', std=0.001, layer=['Conv2d']), - dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']) - ], + norm_cfg=dict(type="BN", requires_grad=True), + init_cfg=[dict(type="Normal", std=0.001, layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) @@ -140,51 +130,28 @@ class HourglassAENet(BaseBackbone): ConvModule(128, feat_channels, 3, padding=1, norm_cfg=norm_cfg), ) - self.hourglass_modules = nn.ModuleList([ - nn.Sequential( - HourglassAEModule( - downsample_times, stage_channels, norm_cfg=norm_cfg), - ConvModule( - feat_channels, - feat_channels, - 3, - padding=1, - norm_cfg=norm_cfg), - ConvModule( - feat_channels, - feat_channels, - 3, - padding=1, - norm_cfg=norm_cfg)) for _ in range(num_stacks) - ]) - - self.out_convs = nn.ModuleList([ - ConvModule( - cur_channels, - out_channels, - 1, - padding=0, - norm_cfg=None, - act_cfg=None) for _ in range(num_stacks) - ]) - - self.remap_out_convs = nn.ModuleList([ - ConvModule( - out_channels, - feat_channels, - 1, - norm_cfg=norm_cfg, - act_cfg=None) for _ in range(num_stacks - 1) - ]) - - self.remap_feature_convs = nn.ModuleList([ - ConvModule( - feat_channels, - feat_channels, - 1, - norm_cfg=norm_cfg, - act_cfg=None) for _ in range(num_stacks - 1) - ]) + self.hourglass_modules = nn.ModuleList( + [ + nn.Sequential( + HourglassAEModule(downsample_times, stage_channels, norm_cfg=norm_cfg), + ConvModule(feat_channels, feat_channels, 3, padding=1, norm_cfg=norm_cfg), + ConvModule(feat_channels, feat_channels, 3, padding=1, norm_cfg=norm_cfg), + ) + for _ in range(num_stacks) + ] + ) + + self.out_convs = nn.ModuleList( + [ConvModule(cur_channels, out_channels, 1, padding=0, norm_cfg=None, act_cfg=None) for _ in range(num_stacks)] + ) + + self.remap_out_convs = nn.ModuleList( + [ConvModule(out_channels, feat_channels, 1, norm_cfg=norm_cfg, act_cfg=None) for _ in range(num_stacks - 1)] + ) + + self.remap_feature_convs = nn.ModuleList( + [ConvModule(feat_channels, feat_channels, 1, norm_cfg=norm_cfg, act_cfg=None) for _ in 
range(num_stacks - 1)] + ) self.relu = nn.ReLU(inplace=True) @@ -202,8 +169,6 @@ class HourglassAENet(BaseBackbone): out_feats.append(out_feat) if ind < self.num_stacks - 1: - inter_feat = inter_feat + self.remap_out_convs[ind]( - out_feat) + self.remap_feature_convs[ind]( - hourglass_feat) + inter_feat = inter_feat + self.remap_out_convs[ind](out_feat) + self.remap_feature_convs[ind](hourglass_feat) return out_feats diff --git a/mmpose/models/backbones/hrformer.py b/mmpose/models/backbones/hrformer.py index 0b86617f14e3104c84e3d5af5dd82bcf8cbd7879..3cb155dbb41df17d27ba2150a950e3205c2199b2 100644 --- a/mmpose/models/backbones/hrformer.py +++ b/mmpose/models/backbones/hrformer.py @@ -10,6 +10,7 @@ from mmengine.model import BaseModule, trunc_normal_init from torch.nn.functional import pad from mmpose.registry import MODELS + from .hrnet import Bottleneck, HRModule, HRNet @@ -26,7 +27,7 @@ def nlc_to_nchw(x, hw_shape): H, W = hw_shape assert len(x.shape) == 3 B, L, C = x.shape - assert L == H * W, 'The seq_len doesn\'t match H, W' + assert L == H * W, "The seq_len doesn't match H, W" return x.transpose(1, 2).reshape(B, C, H, W) @@ -45,7 +46,7 @@ def nchw_to_nlc(x): def build_drop_path(drop_path_rate): """Build drop path layer.""" - return build_dropout(dict(type='DropPath', drop_prob=drop_path_rate)) + return build_dropout(dict(type="DropPath", drop_prob=drop_path_rate)) class WindowMSA(BaseModule): @@ -69,16 +70,18 @@ class WindowMSA(BaseModule): Default: None. """ - def __init__(self, - embed_dims, - num_heads, - window_size, - qkv_bias=True, - qk_scale=None, - attn_drop_rate=0., - proj_drop_rate=0., - with_rpe=True, - init_cfg=None): + def __init__( + self, + embed_dims, + num_heads, + window_size, + qkv_bias=True, + qk_scale=None, + attn_drop_rate=0.0, + proj_drop_rate=0.0, + with_rpe=True, + init_cfg=None, + ): super().__init__(init_cfg=init_cfg) self.embed_dims = embed_dims @@ -91,15 +94,14 @@ class WindowMSA(BaseModule): if self.with_rpe: # define a parameter table of relative position bias self.relative_position_bias_table = nn.Parameter( - torch.zeros( - (2 * window_size[0] - 1) * (2 * window_size[1] - 1), - num_heads)) # 2*Wh-1 * 2*Ww-1, nH + torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads) + ) # 2*Wh-1 * 2*Ww-1, nH Wh, Ww = self.window_size rel_index_coords = self.double_step_seq(2 * Ww - 1, Wh, 1, Ww) rel_position_index = rel_index_coords + rel_index_coords.T rel_position_index = rel_position_index.flip(1).contiguous() - self.register_buffer('relative_position_index', rel_position_index) + self.register_buffer("relative_position_index", rel_position_index) self.qkv = nn.Linear(embed_dims, embed_dims * 3, bias=qkv_bias) self.attn_drop = nn.Dropout(attn_drop_rate) @@ -120,27 +122,22 @@ class WindowMSA(BaseModule): Wh*Ww, Wh*Ww), value should be between (-inf, 0]. 
""" B, N, C = x.shape - qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, - C // self.num_heads).permute(2, 0, 3, 1, 4) + qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) q, k, v = qkv[0], qkv[1], qkv[2] q = q * self.scale - attn = (q @ k.transpose(-2, -1)) + attn = q @ k.transpose(-2, -1) if self.with_rpe: - relative_position_bias = self.relative_position_bias_table[ - self.relative_position_index.view(-1)].view( - self.window_size[0] * self.window_size[1], - self.window_size[0] * self.window_size[1], - -1) # Wh*Ww,Wh*Ww,nH - relative_position_bias = relative_position_bias.permute( - 2, 0, 1).contiguous() # nH, Wh*Ww, Wh*Ww + relative_position_bias = self.relative_position_bias_table[self.relative_position_index.view(-1)].view( + self.window_size[0] * self.window_size[1], self.window_size[0] * self.window_size[1], -1 + ) # Wh*Ww,Wh*Ww,nH + relative_position_bias = relative_position_bias.permute(2, 0, 1).contiguous() # nH, Wh*Ww, Wh*Ww attn = attn + relative_position_bias.unsqueeze(0) if mask is not None: nW = mask.shape[0] - attn = attn.view(B // nW, nW, self.num_heads, N, - N) + mask.unsqueeze(1).unsqueeze(0) + attn = attn.view(B // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0) attn = attn.view(-1, self.num_heads, N, N) attn = self.softmax(attn) @@ -159,7 +156,7 @@ class WindowMSA(BaseModule): class LocalWindowSelfAttention(BaseModule): - r""" Local-window Self Attention (LSA) module with relative position bias. + r"""Local-window Self Attention (LSA) module with relative position bias. This module is the short-range self-attention module in the Interlaced Sparse Self-Attention `_. @@ -183,17 +180,19 @@ class LocalWindowSelfAttention(BaseModule): Default: None. """ - def __init__(self, - embed_dims, - num_heads, - window_size, - qkv_bias=True, - qk_scale=None, - attn_drop_rate=0., - proj_drop_rate=0., - with_rpe=True, - with_pad_mask=False, - init_cfg=None): + def __init__( + self, + embed_dims, + num_heads, + window_size, + qkv_bias=True, + qk_scale=None, + attn_drop_rate=0.0, + proj_drop_rate=0.0, + with_rpe=True, + with_pad_mask=False, + init_cfg=None, + ): super().__init__(init_cfg=init_cfg) if isinstance(window_size, int): window_size = (window_size, window_size) @@ -208,7 +207,8 @@ class LocalWindowSelfAttention(BaseModule): attn_drop_rate=attn_drop_rate, proj_drop_rate=proj_drop_rate, with_rpe=with_rpe, - init_cfg=init_cfg) + init_cfg=init_cfg, + ) def forward(self, x, H, W, **kwargs): """Forward function.""" @@ -219,8 +219,7 @@ class LocalWindowSelfAttention(BaseModule): # center-pad the feature on H and W axes pad_h = math.ceil(H / Wh) * Wh - H pad_w = math.ceil(W / Ww) * Ww - W - x = pad(x, (0, 0, pad_w // 2, pad_w - pad_w // 2, pad_h // 2, - pad_h - pad_h // 2)) + x = pad(x, (0, 0, pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2)) # permute x = x.view(B, math.ceil(H / Wh), Wh, math.ceil(W / Ww), Ww, C) @@ -230,14 +229,8 @@ class LocalWindowSelfAttention(BaseModule): # attention if self.with_pad_mask and pad_h > 0 and pad_w > 0: pad_mask = x.new_zeros(1, H, W, 1) - pad_mask = pad( - pad_mask, [ - 0, 0, pad_w // 2, pad_w - pad_w // 2, pad_h // 2, - pad_h - pad_h // 2 - ], - value=-float('inf')) - pad_mask = pad_mask.view(1, math.ceil(H / Wh), Wh, - math.ceil(W / Ww), Ww, 1) + pad_mask = pad(pad_mask, [0, 0, pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2], value=-float("inf")) + pad_mask = pad_mask.view(1, math.ceil(H / Wh), Wh, math.ceil(W / Ww), Ww, 1) pad_mask = pad_mask.permute(1, 
3, 0, 2, 4, 5) pad_mask = pad_mask.reshape(-1, Wh * Ww) pad_mask = pad_mask[:, None, :].expand([-1, Wh * Ww, -1]) @@ -251,7 +244,7 @@ class LocalWindowSelfAttention(BaseModule): out = out.reshape(B, H + pad_h, W + pad_w, C) # de-pad - out = out[:, pad_h // 2:H + pad_h // 2, pad_w // 2:W + pad_w // 2] + out = out[:, pad_h // 2 : H + pad_h // 2, pad_w // 2 : W + pad_w // 2] return out.reshape(B, N, C) @@ -272,27 +265,23 @@ class CrossFFN(BaseModule): Default: None. """ - def __init__(self, - in_features, - hidden_features=None, - out_features=None, - act_cfg=dict(type='GELU'), - dw_act_cfg=dict(type='GELU'), - norm_cfg=dict(type='SyncBN'), - init_cfg=None): + def __init__( + self, + in_features, + hidden_features=None, + out_features=None, + act_cfg=dict(type="GELU"), + dw_act_cfg=dict(type="GELU"), + norm_cfg=dict(type="SyncBN"), + init_cfg=None, + ): super().__init__(init_cfg=init_cfg) out_features = out_features or in_features hidden_features = hidden_features or in_features self.fc1 = nn.Conv2d(in_features, hidden_features, kernel_size=1) self.act1 = build_activation_layer(act_cfg) self.norm1 = build_norm_layer(norm_cfg, hidden_features)[1] - self.dw3x3 = nn.Conv2d( - hidden_features, - hidden_features, - kernel_size=3, - stride=1, - groups=hidden_features, - padding=1) + self.dw3x3 = nn.Conv2d(hidden_features, hidden_features, kernel_size=3, stride=1, groups=hidden_features, padding=1) self.act2 = build_activation_layer(dw_act_cfg) self.norm2 = build_norm_layer(norm_cfg, hidden_features)[1] self.fc2 = nn.Conv2d(hidden_features, out_features, kernel_size=1) @@ -332,30 +321,27 @@ class HRFormerBlock(BaseModule): expansion = 1 - def __init__(self, - in_features, - out_features, - num_heads, - window_size=7, - mlp_ratio=4.0, - drop_path=0.0, - act_cfg=dict(type='GELU'), - norm_cfg=dict(type='SyncBN'), - transformer_norm_cfg=dict(type='LN', eps=1e-6), - init_cfg=None, - **kwargs): + def __init__( + self, + in_features, + out_features, + num_heads, + window_size=7, + mlp_ratio=4.0, + drop_path=0.0, + act_cfg=dict(type="GELU"), + norm_cfg=dict(type="SyncBN"), + transformer_norm_cfg=dict(type="LN", eps=1e-6), + init_cfg=None, + **kwargs, + ): super(HRFormerBlock, self).__init__(init_cfg=init_cfg) self.num_heads = num_heads self.window_size = window_size self.mlp_ratio = mlp_ratio self.norm1 = build_norm_layer(transformer_norm_cfg, in_features)[1] - self.attn = LocalWindowSelfAttention( - in_features, - num_heads=num_heads, - window_size=window_size, - init_cfg=None, - **kwargs) + self.attn = LocalWindowSelfAttention(in_features, num_heads=num_heads, window_size=window_size, init_cfg=None, **kwargs) self.norm2 = build_norm_layer(transformer_norm_cfg, out_features)[1] self.ffn = CrossFFN( @@ -365,10 +351,10 @@ class HRFormerBlock(BaseModule): norm_cfg=norm_cfg, act_cfg=act_cfg, dw_act_cfg=act_cfg, - init_cfg=None) + init_cfg=None, + ) - self.drop_path = build_drop_path( - drop_path) if drop_path > 0.0 else nn.Identity() + self.drop_path = build_drop_path(drop_path) if drop_path > 0.0 else nn.Identity() def forward(self, x): """Forward function.""" @@ -383,8 +369,7 @@ class HRFormerBlock(BaseModule): def extra_repr(self): """(Optional) Set the extra information about this module.""" - return 'num_heads={}, window_size={}, mlp_ratio={}'.format( - self.num_heads, self.window_size, self.mlp_ratio) + return "num_heads={}, window_size={}, mlp_ratio={}".format(self.num_heads, self.window_size, self.mlp_ratio) class HRFomerModule(HRModule): @@ -422,25 +407,27 @@ class HRFomerModule(HRModule): Default: 
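# A minimal sketch of the CrossFFN layout defined above, with plain
# BatchNorm/GELU standing in for the configurable norm/act layers (the final
# act/norm pair after fc2 is assumed symmetric with the earlier ones): a
# 1x1 conv expands channels, a depthwise 3x3 mixes spatially, and a 1x1 conv
# projects back.
import torch
import torch.nn as nn

dim, hidden = 32, 128
ffn = nn.Sequential(
    nn.Conv2d(dim, hidden, kernel_size=1), nn.GELU(), nn.BatchNorm2d(hidden),
    nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden),  # dw3x3
    nn.GELU(), nn.BatchNorm2d(hidden),
    nn.Conv2d(hidden, dim, kernel_size=1), nn.GELU(), nn.BatchNorm2d(dim),
)
print(ffn(torch.randn(2, dim, 8, 8)).shape)  # torch.Size([2, 32, 8, 8])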
None. """ - def __init__(self, - num_branches, - block, - num_blocks, - num_inchannels, - num_channels, - num_heads, - num_window_sizes, - num_mlp_ratios, - multiscale_output=True, - drop_paths=0.0, - with_rpe=True, - with_pad_mask=False, - conv_cfg=None, - norm_cfg=dict(type='SyncBN', requires_grad=True), - transformer_norm_cfg=dict(type='LN', eps=1e-6), - with_cp=False, - upsample_cfg=dict(mode='bilinear', align_corners=False), - **kwargs): + def __init__( + self, + num_branches, + block, + num_blocks, + num_inchannels, + num_channels, + num_heads, + num_window_sizes, + num_mlp_ratios, + multiscale_output=True, + drop_paths=0.0, + with_rpe=True, + with_pad_mask=False, + conv_cfg=None, + norm_cfg=dict(type="SyncBN", requires_grad=True), + transformer_norm_cfg=dict(type="LN", eps=1e-6), + with_cp=False, + upsample_cfg=dict(mode="bilinear", align_corners=False), + **kwargs, + ): self.transformer_norm_cfg = transformer_norm_cfg self.drop_paths = drop_paths @@ -450,20 +437,24 @@ class HRFomerModule(HRModule): self.with_rpe = with_rpe self.with_pad_mask = with_pad_mask - super().__init__(num_branches, block, num_blocks, num_inchannels, - num_channels, multiscale_output, with_cp, conv_cfg, - norm_cfg, upsample_cfg, **kwargs) - - def _make_one_branch(self, - branch_index, - block, - num_blocks, - num_channels, - stride=1): + super().__init__( + num_branches, + block, + num_blocks, + num_inchannels, + num_channels, + multiscale_output, + with_cp, + conv_cfg, + norm_cfg, + upsample_cfg, + **kwargs, + ) + + def _make_one_branch(self, branch_index, block, num_blocks, num_channels, stride=1): """Build one branch.""" # HRFormerBlock does not support down sample layer yet. - assert stride == 1 and self.in_channels[branch_index] == num_channels[ - branch_index] + assert stride == 1 and self.in_channels[branch_index] == num_channels[branch_index] layers = [] layers.append( block( @@ -477,10 +468,11 @@ class HRFomerModule(HRModule): transformer_norm_cfg=self.transformer_norm_cfg, init_cfg=None, with_rpe=self.with_rpe, - with_pad_mask=self.with_pad_mask)) + with_pad_mask=self.with_pad_mask, + ) + ) - self.in_channels[ - branch_index] = self.in_channels[branch_index] * block.expansion + self.in_channels[branch_index] = self.in_channels[branch_index] * block.expansion for i in range(1, num_blocks[branch_index]): layers.append( block( @@ -494,7 +486,9 @@ class HRFomerModule(HRModule): transformer_norm_cfg=self.transformer_norm_cfg, init_cfg=None, with_rpe=self.with_rpe, - with_pad_mask=self.with_pad_mask)) + with_pad_mask=self.with_pad_mask, + ) + ) return nn.Sequential(*layers) def _make_fuse_layers(self): @@ -510,20 +504,13 @@ class HRFomerModule(HRModule): if j > i: fuse_layer.append( nn.Sequential( - build_conv_layer( - self.conv_cfg, - num_inchannels[j], - num_inchannels[i], - kernel_size=1, - stride=1, - bias=False), - build_norm_layer(self.norm_cfg, - num_inchannels[i])[1], + build_conv_layer(self.conv_cfg, num_inchannels[j], num_inchannels[i], kernel_size=1, stride=1, bias=False), + build_norm_layer(self.norm_cfg, num_inchannels[i])[1], nn.Upsample( - scale_factor=2**(j - i), - mode=self.upsample_cfg['mode'], - align_corners=self. 
- upsample_cfg['align_corners']))) + scale_factor=2 ** (j - i), mode=self.upsample_cfg["mode"], align_corners=self.upsample_cfg["align_corners"] + ), + ) + ) elif j == i: fuse_layer.append(None) else: @@ -546,8 +533,7 @@ class HRFomerModule(HRModule): groups=num_inchannels[j], bias=False, ), - build_norm_layer(self.norm_cfg, - num_inchannels[j])[1], + build_norm_layer(self.norm_cfg, num_inchannels[j])[1], build_conv_layer( self.conv_cfg, num_inchannels[j], @@ -556,8 +542,7 @@ class HRFomerModule(HRModule): stride=1, bias=False, ), - build_norm_layer(self.norm_cfg, - num_outchannels_conv3x3)[1] + build_norm_layer(self.norm_cfg, num_outchannels_conv3x3)[1], ] if with_out_act: sub_modules.append(nn.ReLU(False)) @@ -664,66 +649,51 @@ class HRFormer(HRNet): (1, 256, 1, 1) """ - blocks_dict = {'BOTTLENECK': Bottleneck, 'HRFORMERBLOCK': HRFormerBlock} + blocks_dict = {"BOTTLENECK": Bottleneck, "HRFORMERBLOCK": HRFormerBlock} def __init__( self, extra, in_channels=3, conv_cfg=None, - norm_cfg=dict(type='BN', requires_grad=True), - transformer_norm_cfg=dict(type='LN', eps=1e-6), + norm_cfg=dict(type="BN", requires_grad=True), + transformer_norm_cfg=dict(type="LN", eps=1e-6), norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=-1, - init_cfg=[ - dict(type='Normal', std=0.001, layer=['Conv2d']), - dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']) - ], + init_cfg=[dict(type="Normal", std=0.001, layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], ): # stochastic depth - depths = [ - extra[stage]['num_blocks'][0] * extra[stage]['num_modules'] - for stage in ['stage2', 'stage3', 'stage4'] - ] + depths = [extra[stage]["num_blocks"][0] * extra[stage]["num_modules"] for stage in ["stage2", "stage3", "stage4"]] depth_s2, depth_s3, _ = depths - drop_path_rate = extra['drop_path_rate'] - dpr = [ - x.item() for x in torch.linspace(0, drop_path_rate, sum(depths)) - ] - extra['stage2']['drop_path_rates'] = dpr[0:depth_s2] - extra['stage3']['drop_path_rates'] = dpr[depth_s2:depth_s2 + depth_s3] - extra['stage4']['drop_path_rates'] = dpr[depth_s2 + depth_s3:] + drop_path_rate = extra["drop_path_rate"] + dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] + extra["stage2"]["drop_path_rates"] = dpr[0:depth_s2] + extra["stage3"]["drop_path_rates"] = dpr[depth_s2 : depth_s2 + depth_s3] + extra["stage4"]["drop_path_rates"] = dpr[depth_s2 + depth_s3 :] # HRFormer use bilinear upsample as default - upsample_cfg = extra.get('upsample', { - 'mode': 'bilinear', - 'align_corners': False - }) - extra['upsample'] = upsample_cfg + upsample_cfg = extra.get("upsample", {"mode": "bilinear", "align_corners": False}) + extra["upsample"] = upsample_cfg self.transformer_norm_cfg = transformer_norm_cfg - self.with_rpe = extra.get('with_rpe', True) - self.with_pad_mask = extra.get('with_pad_mask', False) + self.with_rpe = extra.get("with_rpe", True) + self.with_pad_mask = extra.get("with_pad_mask", False) - super().__init__(extra, in_channels, conv_cfg, norm_cfg, norm_eval, - with_cp, zero_init_residual, frozen_stages, init_cfg) + super().__init__(extra, in_channels, conv_cfg, norm_cfg, norm_eval, with_cp, zero_init_residual, frozen_stages, init_cfg) - def _make_stage(self, - layer_config, - num_inchannels, - multiscale_output=True): + def _make_stage(self, layer_config, num_inchannels, multiscale_output=True): """Make each stage.""" - num_modules = layer_config['num_modules'] - num_branches = layer_config['num_branches'] - num_blocks = 
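# A minimal sketch, with toy numbers, of the stochastic-depth schedule
# computed in HRFormer.__init__ above: drop-path rates grow linearly over all
# blocks, and the flat list is then sliced per stage.
import torch

depths = [2, 3, 4]  # blocks per stage, i.e. num_blocks[0] * num_modules
drop_path_rate = 0.2
dpr = [round(x.item(), 3) for x in torch.linspace(0, drop_path_rate, sum(depths))]
depth_s2, depth_s3, _ = depths
print(dpr[0:depth_s2])                    # stage2 rates: shallow blocks drop least
print(dpr[depth_s2:depth_s2 + depth_s3])  # stage3 rates
print(dpr[depth_s2 + depth_s3:])          # stage4 rates: deepest blocks drop most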
layer_config['num_blocks'] - num_channels = layer_config['num_channels'] - block = self.blocks_dict[layer_config['block']] - num_heads = layer_config['num_heads'] - num_window_sizes = layer_config['window_sizes'] - num_mlp_ratios = layer_config['mlp_ratios'] - drop_path_rates = layer_config['drop_path_rates'] + num_modules = layer_config["num_modules"] + num_branches = layer_config["num_branches"] + num_blocks = layer_config["num_blocks"] + num_channels = layer_config["num_channels"] + block = self.blocks_dict[layer_config["block"]] + num_heads = layer_config["num_heads"] + num_window_sizes = layer_config["window_sizes"] + num_mlp_ratios = layer_config["mlp_ratios"] + drop_path_rates = layer_config["drop_path_rates"] modules = [] for i in range(num_modules): @@ -744,15 +714,16 @@ class HRFormer(HRNet): num_window_sizes, num_mlp_ratios, reset_multiscale_output, - drop_paths=drop_path_rates[num_blocks[0] * - i:num_blocks[0] * (i + 1)], + drop_paths=drop_path_rates[num_blocks[0] * i : num_blocks[0] * (i + 1)], with_rpe=self.with_rpe, with_pad_mask=self.with_pad_mask, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, transformer_norm_cfg=self.transformer_norm_cfg, with_cp=self.with_cp, - upsample_cfg=self.upsample_cfg)) + upsample_cfg=self.upsample_cfg, + ) + ) num_inchannels = modules[-1].get_num_inchannels() return nn.Sequential(*modules), num_inchannels diff --git a/mmpose/models/backbones/hrnet.py b/mmpose/models/backbones/hrnet.py index 381b22d60ec886ecb6d8c52fc9e7ccab52c05e99..59d5a28120fb26ad1f6a97e371c70772bb0600b8 100644 --- a/mmpose/models/backbones/hrnet.py +++ b/mmpose/models/backbones/hrnet.py @@ -7,6 +7,7 @@ from mmengine.model import BaseModule, constant_init from torch.nn.modules.batchnorm import _BatchNorm from mmpose.registry import MODELS + from .base_backbone import BaseBackbone from .resnet import BasicBlock, Bottleneck, get_expansion @@ -18,24 +19,25 @@ class HRModule(BaseModule): is in this module. 
""" - def __init__(self, - num_branches, - blocks, - num_blocks, - in_channels, - num_channels, - multiscale_output=False, - with_cp=False, - conv_cfg=None, - norm_cfg=dict(type='BN'), - upsample_cfg=dict(mode='nearest', align_corners=None), - init_cfg=None): + def __init__( + self, + num_branches, + blocks, + num_blocks, + in_channels, + num_channels, + multiscale_output=False, + with_cp=False, + conv_cfg=None, + norm_cfg=dict(type="BN"), + upsample_cfg=dict(mode="nearest", align_corners=None), + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) - self._check_branches(num_branches, num_blocks, in_channels, - num_channels) + self._check_branches(num_branches, num_blocks, in_channels, num_channels) self.in_channels = in_channels self.num_branches = num_branches @@ -45,8 +47,7 @@ class HRModule(BaseModule): self.conv_cfg = conv_cfg self.upsample_cfg = upsample_cfg self.with_cp = with_cp - self.branches = self._make_branches(num_branches, blocks, num_blocks, - num_channels) + self.branches = self._make_branches(num_branches, blocks, num_blocks, num_channels) self.fuse_layers = self._make_fuse_layers() self.relu = nn.ReLU(inplace=True) @@ -54,31 +55,21 @@ class HRModule(BaseModule): def _check_branches(num_branches, num_blocks, in_channels, num_channels): """Check input to avoid ValueError.""" if num_branches != len(num_blocks): - error_msg = f'NUM_BRANCHES({num_branches}) ' \ - f'!= NUM_BLOCKS({len(num_blocks)})' + error_msg = f"NUM_BRANCHES({num_branches}) " f"!= NUM_BLOCKS({len(num_blocks)})" raise ValueError(error_msg) if num_branches != len(num_channels): - error_msg = f'NUM_BRANCHES({num_branches}) ' \ - f'!= NUM_CHANNELS({len(num_channels)})' + error_msg = f"NUM_BRANCHES({num_branches}) " f"!= NUM_CHANNELS({len(num_channels)})" raise ValueError(error_msg) if num_branches != len(in_channels): - error_msg = f'NUM_BRANCHES({num_branches}) ' \ - f'!= NUM_INCHANNELS({len(in_channels)})' + error_msg = f"NUM_BRANCHES({num_branches}) " f"!= NUM_INCHANNELS({len(in_channels)})" raise ValueError(error_msg) - def _make_one_branch(self, - branch_index, - block, - num_blocks, - num_channels, - stride=1): + def _make_one_branch(self, branch_index, block, num_blocks, num_channels, stride=1): """Make one branch.""" downsample = None - if stride != 1 or \ - self.in_channels[branch_index] != \ - num_channels[branch_index] * get_expansion(block): + if stride != 1 or self.in_channels[branch_index] != num_channels[branch_index] * get_expansion(block): downsample = nn.Sequential( build_conv_layer( self.conv_cfg, @@ -86,10 +77,10 @@ class HRModule(BaseModule): num_channels[branch_index] * get_expansion(block), kernel_size=1, stride=stride, - bias=False), - build_norm_layer( - self.norm_cfg, - num_channels[branch_index] * get_expansion(block))[1]) + bias=False, + ), + build_norm_layer(self.norm_cfg, num_channels[branch_index] * get_expansion(block))[1], + ) layers = [] layers.append( @@ -100,9 +91,10 @@ class HRModule(BaseModule): downsample=downsample, with_cp=self.with_cp, norm_cfg=self.norm_cfg, - conv_cfg=self.conv_cfg)) - self.in_channels[branch_index] = \ - num_channels[branch_index] * get_expansion(block) + conv_cfg=self.conv_cfg, + ) + ) + self.in_channels[branch_index] = num_channels[branch_index] * get_expansion(block) for _ in range(1, num_blocks[branch_index]): layers.append( block( @@ -110,7 +102,9 @@ class HRModule(BaseModule): num_channels[branch_index] * get_expansion(block), with_cp=self.with_cp, 
norm_cfg=self.norm_cfg, - conv_cfg=self.conv_cfg)) + conv_cfg=self.conv_cfg, + ) + ) return nn.Sequential(*layers) @@ -119,8 +113,7 @@ class HRModule(BaseModule): branches = [] for i in range(num_branches): - branches.append( - self._make_one_branch(i, block, num_blocks, num_channels)) + branches.append(self._make_one_branch(i, block, num_blocks, num_channels)) return nn.ModuleList(branches) @@ -140,20 +133,13 @@ class HRModule(BaseModule): if j > i: fuse_layer.append( nn.Sequential( - build_conv_layer( - self.conv_cfg, - in_channels[j], - in_channels[i], - kernel_size=1, - stride=1, - padding=0, - bias=False), + build_conv_layer(self.conv_cfg, in_channels[j], in_channels[i], kernel_size=1, stride=1, padding=0, bias=False), build_norm_layer(self.norm_cfg, in_channels[i])[1], nn.Upsample( - scale_factor=2**(j - i), - mode=self.upsample_cfg['mode'], - align_corners=self. - upsample_cfg['align_corners']))) + scale_factor=2 ** (j - i), mode=self.upsample_cfg["mode"], align_corners=self.upsample_cfg["align_corners"] + ), + ) + ) elif j == i: fuse_layer.append(None) else: @@ -163,29 +149,21 @@ class HRModule(BaseModule): conv_downsamples.append( nn.Sequential( build_conv_layer( - self.conv_cfg, - in_channels[j], - in_channels[i], - kernel_size=3, - stride=2, - padding=1, - bias=False), - build_norm_layer(self.norm_cfg, - in_channels[i])[1])) + self.conv_cfg, in_channels[j], in_channels[i], kernel_size=3, stride=2, padding=1, bias=False + ), + build_norm_layer(self.norm_cfg, in_channels[i])[1], + ) + ) else: conv_downsamples.append( nn.Sequential( build_conv_layer( - self.conv_cfg, - in_channels[j], - in_channels[j], - kernel_size=3, - stride=2, - padding=1, - bias=False), - build_norm_layer(self.norm_cfg, - in_channels[j])[1], - nn.ReLU(inplace=True))) + self.conv_cfg, in_channels[j], in_channels[j], kernel_size=3, stride=2, padding=1, bias=False + ), + build_norm_layer(self.norm_cfg, in_channels[j])[1], + nn.ReLU(inplace=True), + ) + ) fuse_layer.append(nn.Sequential(*conv_downsamples)) fuse_layers.append(nn.ModuleList(fuse_layer)) @@ -279,22 +257,19 @@ class HRNet(BaseBackbone): (1, 32, 8, 8) """ - blocks_dict = {'BASIC': BasicBlock, 'BOTTLENECK': Bottleneck} + blocks_dict = {"BASIC": BasicBlock, "BOTTLENECK": Bottleneck} def __init__( self, extra, in_channels=3, conv_cfg=None, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=-1, - init_cfg=[ - dict(type='Normal', std=0.001, layer=['Conv2d']), - dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']) - ], + init_cfg=[dict(type="Normal", std=0.001, layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) @@ -312,88 +287,58 @@ class HRNet(BaseBackbone): self.norm1_name, norm1 = build_norm_layer(self.norm_cfg, 64, postfix=1) self.norm2_name, norm2 = build_norm_layer(self.norm_cfg, 64, postfix=2) - self.conv1 = build_conv_layer( - self.conv_cfg, - in_channels, - 64, - kernel_size=3, - stride=2, - padding=1, - bias=False) + self.conv1 = build_conv_layer(self.conv_cfg, in_channels, 64, kernel_size=3, stride=2, padding=1, bias=False) self.add_module(self.norm1_name, norm1) - self.conv2 = build_conv_layer( - self.conv_cfg, - 64, - 64, - kernel_size=3, - stride=2, - padding=1, - bias=False) + self.conv2 = build_conv_layer(self.conv_cfg, 64, 64, kernel_size=3, stride=2, padding=1, bias=False) self.add_module(self.norm2_name, norm2) self.relu = nn.ReLU(inplace=True) - 
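# A minimal sketch, with toy channels, of the fuse-layer rule built in
# _make_fuse_layers above: a lower-resolution branch (j > i) is mapped by a
# 1x1 conv and upsampled by 2**(j - i); a higher-resolution branch (j < i) is
# downsampled with stride-2 3x3 convs; the same branch passes through as-is.
import torch
import torch.nn as nn

x = [torch.randn(1, 16, 32, 32), torch.randn(1, 32, 16, 16)]  # two branches

up_1_to_0 = nn.Sequential(  # j=1 -> i=0: channel map + nearest upsample
    nn.Conv2d(32, 16, kernel_size=1, bias=False),
    nn.Upsample(scale_factor=2, mode="nearest"))
down_0_to_1 = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1, bias=False)

y0 = torch.relu(x[0] + up_1_to_0(x[1]))
y1 = torch.relu(down_0_to_1(x[0]) + x[1])
print(y0.shape, y1.shape)  # each branch keeps its own resolution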
self.upsample_cfg = self.extra.get('upsample', { - 'mode': 'nearest', - 'align_corners': None - }) + self.upsample_cfg = self.extra.get("upsample", {"mode": "nearest", "align_corners": None}) # stage 1 - self.stage1_cfg = self.extra['stage1'] - num_channels = self.stage1_cfg['num_channels'][0] - block_type = self.stage1_cfg['block'] - num_blocks = self.stage1_cfg['num_blocks'][0] + self.stage1_cfg = self.extra["stage1"] + num_channels = self.stage1_cfg["num_channels"][0] + block_type = self.stage1_cfg["block"] + num_blocks = self.stage1_cfg["num_blocks"][0] block = self.blocks_dict[block_type] stage1_out_channels = num_channels * get_expansion(block) - self.layer1 = self._make_layer(block, 64, stage1_out_channels, - num_blocks) + self.layer1 = self._make_layer(block, 64, stage1_out_channels, num_blocks) # stage 2 - self.stage2_cfg = self.extra['stage2'] - num_channels = self.stage2_cfg['num_channels'] - block_type = self.stage2_cfg['block'] + self.stage2_cfg = self.extra["stage2"] + num_channels = self.stage2_cfg["num_channels"] + block_type = self.stage2_cfg["block"] block = self.blocks_dict[block_type] - num_channels = [ - channel * get_expansion(block) for channel in num_channels - ] - self.transition1 = self._make_transition_layer([stage1_out_channels], - num_channels) - self.stage2, pre_stage_channels = self._make_stage( - self.stage2_cfg, num_channels) + num_channels = [channel * get_expansion(block) for channel in num_channels] + self.transition1 = self._make_transition_layer([stage1_out_channels], num_channels) + self.stage2, pre_stage_channels = self._make_stage(self.stage2_cfg, num_channels) # stage 3 - self.stage3_cfg = self.extra['stage3'] - num_channels = self.stage3_cfg['num_channels'] - block_type = self.stage3_cfg['block'] + self.stage3_cfg = self.extra["stage3"] + num_channels = self.stage3_cfg["num_channels"] + block_type = self.stage3_cfg["block"] block = self.blocks_dict[block_type] - num_channels = [ - channel * get_expansion(block) for channel in num_channels - ] - self.transition2 = self._make_transition_layer(pre_stage_channels, - num_channels) - self.stage3, pre_stage_channels = self._make_stage( - self.stage3_cfg, num_channels) + num_channels = [channel * get_expansion(block) for channel in num_channels] + self.transition2 = self._make_transition_layer(pre_stage_channels, num_channels) + self.stage3, pre_stage_channels = self._make_stage(self.stage3_cfg, num_channels) # stage 4 - self.stage4_cfg = self.extra['stage4'] - num_channels = self.stage4_cfg['num_channels'] - block_type = self.stage4_cfg['block'] + self.stage4_cfg = self.extra["stage4"] + num_channels = self.stage4_cfg["num_channels"] + block_type = self.stage4_cfg["block"] block = self.blocks_dict[block_type] - num_channels = [ - channel * get_expansion(block) for channel in num_channels - ] - self.transition3 = self._make_transition_layer(pre_stage_channels, - num_channels) + num_channels = [channel * get_expansion(block) for channel in num_channels] + self.transition3 = self._make_transition_layer(pre_stage_channels, num_channels) self.stage4, pre_stage_channels = self._make_stage( - self.stage4_cfg, - num_channels, - multiscale_output=self.stage4_cfg.get('multiscale_output', False)) + self.stage4_cfg, num_channels, multiscale_output=self.stage4_cfg.get("multiscale_output", False) + ) self._freeze_stages() @@ -407,8 +352,7 @@ class HRNet(BaseBackbone): """nn.Module: the normalization layer named "norm2" """ return getattr(self, self.norm2_name) - def _make_transition_layer(self, num_channels_pre_layer, 
- num_channels_cur_layer): + def _make_transition_layer(self, num_channels_pre_layer, num_channels_cur_layer): """Make transition layer.""" num_branches_cur = len(num_channels_cur_layer) num_branches_pre = len(num_channels_pre_layer) @@ -426,30 +370,26 @@ class HRNet(BaseBackbone): kernel_size=3, stride=1, padding=1, - bias=False), - build_norm_layer(self.norm_cfg, - num_channels_cur_layer[i])[1], - nn.ReLU(inplace=True))) + bias=False, + ), + build_norm_layer(self.norm_cfg, num_channels_cur_layer[i])[1], + nn.ReLU(inplace=True), + ) + ) else: transition_layers.append(None) else: conv_downsamples = [] for j in range(i + 1 - num_branches_pre): in_channels = num_channels_pre_layer[-1] - out_channels = num_channels_cur_layer[i] \ - if j == i - num_branches_pre else in_channels + out_channels = num_channels_cur_layer[i] if j == i - num_branches_pre else in_channels conv_downsamples.append( nn.Sequential( - build_conv_layer( - self.conv_cfg, - in_channels, - out_channels, - kernel_size=3, - stride=2, - padding=1, - bias=False), + build_conv_layer(self.conv_cfg, in_channels, out_channels, kernel_size=3, stride=2, padding=1, bias=False), build_norm_layer(self.norm_cfg, out_channels)[1], - nn.ReLU(inplace=True))) + nn.ReLU(inplace=True), + ) + ) transition_layers.append(nn.Sequential(*conv_downsamples)) return nn.ModuleList(transition_layers) @@ -459,14 +399,9 @@ class HRNet(BaseBackbone): downsample = None if stride != 1 or in_channels != out_channels: downsample = nn.Sequential( - build_conv_layer( - self.conv_cfg, - in_channels, - out_channels, - kernel_size=1, - stride=stride, - bias=False), - build_norm_layer(self.norm_cfg, out_channels)[1]) + build_conv_layer(self.conv_cfg, in_channels, out_channels, kernel_size=1, stride=stride, bias=False), + build_norm_layer(self.norm_cfg, out_channels)[1], + ) layers = [] layers.append( @@ -477,25 +412,21 @@ class HRNet(BaseBackbone): downsample=downsample, with_cp=self.with_cp, norm_cfg=self.norm_cfg, - conv_cfg=self.conv_cfg)) + conv_cfg=self.conv_cfg, + ) + ) for _ in range(1, blocks): - layers.append( - block( - out_channels, - out_channels, - with_cp=self.with_cp, - norm_cfg=self.norm_cfg, - conv_cfg=self.conv_cfg)) + layers.append(block(out_channels, out_channels, with_cp=self.with_cp, norm_cfg=self.norm_cfg, conv_cfg=self.conv_cfg)) return nn.Sequential(*layers) def _make_stage(self, layer_config, in_channels, multiscale_output=True): """Make stage.""" - num_modules = layer_config['num_modules'] - num_branches = layer_config['num_branches'] - num_blocks = layer_config['num_blocks'] - num_channels = layer_config['num_channels'] - block = self.blocks_dict[layer_config['block']] + num_modules = layer_config["num_modules"] + num_branches = layer_config["num_branches"] + num_blocks = layer_config["num_blocks"] + num_channels = layer_config["num_channels"] + block = self.blocks_dict[layer_config["block"]] hr_modules = [] for i in range(num_modules): @@ -516,7 +447,9 @@ class HRNet(BaseBackbone): with_cp=self.with_cp, norm_cfg=self.norm_cfg, conv_cfg=self.conv_cfg, - upsample_cfg=self.upsample_cfg)) + upsample_cfg=self.upsample_cfg, + ) + ) in_channels = hr_modules[-1].in_channels @@ -534,16 +467,16 @@ class HRNet(BaseBackbone): for i in range(1, self.frozen_stages + 1): if i == 1: - m = getattr(self, 'layer1') + m = getattr(self, "layer1") else: - m = getattr(self, f'stage{i}') + m = getattr(self, f"stage{i}") m.eval() for param in m.parameters(): param.requires_grad = False if i < 4: - m = getattr(self, f'transition{i}') + m = getattr(self, 
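# A minimal sketch, on a toy module, of the _freeze_stages pattern applied
# right here: the frozen submodule is put in eval mode (fixing BatchNorm
# statistics) and its parameters stop receiving gradients.
import torch.nn as nn

stage = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
stage.eval()
for param in stage.parameters():
    param.requires_grad = False
print(any(p.requires_grad for p in stage.parameters()))  # False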
f"transition{i}") m.eval() for param in m.parameters(): param.requires_grad = False @@ -552,8 +485,7 @@ class HRNet(BaseBackbone): """Initialize the weights in backbone.""" super(HRNet, self).init_weights() - if (isinstance(self.init_cfg, dict) - and self.init_cfg['type'] == 'Pretrained'): + if isinstance(self.init_cfg, dict) and self.init_cfg["type"] == "Pretrained": # Suppress zero_init_residual if use pretrained model. return @@ -575,7 +507,7 @@ class HRNet(BaseBackbone): x = self.layer1(x) x_list = [] - for i in range(self.stage2_cfg['num_branches']): + for i in range(self.stage2_cfg["num_branches"]): if self.transition1[i] is not None: x_list.append(self.transition1[i](x)) else: @@ -583,7 +515,7 @@ class HRNet(BaseBackbone): y_list = self.stage2(x_list) x_list = [] - for i in range(self.stage3_cfg['num_branches']): + for i in range(self.stage3_cfg["num_branches"]): if self.transition2[i] is not None: x_list.append(self.transition2[i](y_list[-1])) else: @@ -591,7 +523,7 @@ class HRNet(BaseBackbone): y_list = self.stage3(x_list) x_list = [] - for i in range(self.stage4_cfg['num_branches']): + for i in range(self.stage4_cfg["num_branches"]): if self.transition3[i] is not None: x_list.append(self.transition3[i](y_list[-1])) else: diff --git a/mmpose/models/backbones/litehrnet.py b/mmpose/models/backbones/litehrnet.py index 1ad5f63014553129a02ca3dc4bfda4c181fcd6a6..48882f286ddd719332ab7c509490f29c46fc7265 100644 --- a/mmpose/models/backbones/litehrnet.py +++ b/mmpose/models/backbones/litehrnet.py @@ -8,12 +8,12 @@ import torch import torch.nn as nn import torch.nn.functional as F import torch.utils.checkpoint as cp -from mmcv.cnn import (ConvModule, DepthwiseSeparableConvModule, - build_conv_layer, build_norm_layer) +from mmcv.cnn import ConvModule, DepthwiseSeparableConvModule, build_conv_layer, build_norm_layer from mmengine.model import BaseModule from torch.nn.modules.batchnorm import _BatchNorm from mmpose.registry import MODELS + from .base_backbone import BaseBackbone from .utils import channel_shuffle @@ -35,13 +35,7 @@ class SpatialWeighting(BaseModule): Default: None """ - def __init__(self, - channels, - ratio=16, - conv_cfg=None, - norm_cfg=None, - act_cfg=(dict(type='ReLU'), dict(type='Sigmoid')), - init_cfg=None): + def __init__(self, channels, ratio=16, conv_cfg=None, norm_cfg=None, act_cfg=(dict(type="ReLU"), dict(type="Sigmoid")), init_cfg=None): super().__init__(init_cfg=init_cfg) if isinstance(act_cfg, dict): act_cfg = (act_cfg, act_cfg) @@ -55,7 +49,8 @@ class SpatialWeighting(BaseModule): stride=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg[0]) + act_cfg=act_cfg[0], + ) self.conv2 = ConvModule( in_channels=int(channels / ratio), out_channels=channels, @@ -63,7 +58,8 @@ class SpatialWeighting(BaseModule): stride=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg[1]) + act_cfg=act_cfg[1], + ) def forward(self, x): out = self.global_avgpool(x) @@ -89,13 +85,7 @@ class CrossResolutionWeighting(BaseModule): Default: None """ - def __init__(self, - channels, - ratio=16, - conv_cfg=None, - norm_cfg=None, - act_cfg=(dict(type='ReLU'), dict(type='Sigmoid')), - init_cfg=None): + def __init__(self, channels, ratio=16, conv_cfg=None, norm_cfg=None, act_cfg=(dict(type="ReLU"), dict(type="Sigmoid")), init_cfg=None): super().__init__(init_cfg=init_cfg) if isinstance(act_cfg, dict): act_cfg = (act_cfg, act_cfg) @@ -110,7 +100,8 @@ class CrossResolutionWeighting(BaseModule): stride=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg[0]) + act_cfg=act_cfg[0], 
+ ) self.conv2 = ConvModule( in_channels=int(total_channel / ratio), out_channels=total_channel, @@ -118,7 +109,8 @@ class CrossResolutionWeighting(BaseModule): stride=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg[1]) + act_cfg=act_cfg[1], + ) def forward(self, x): mini_size = x[-1].size()[-2:] @@ -127,10 +119,7 @@ class CrossResolutionWeighting(BaseModule): out = self.conv1(out) out = self.conv2(out) out = torch.split(out, self.channels, dim=1) - out = [ - s * F.interpolate(a, size=s.size()[-2:], mode='nearest') - for s, a in zip(x, out) - ] + out = [s * F.interpolate(a, size=s.size()[-2:], mode="nearest") for s, a in zip(x, out)] return out @@ -151,14 +140,7 @@ class ConditionalChannelWeighting(BaseModule): Default: None """ - def __init__(self, - in_channels, - stride, - reduce_ratio, - conv_cfg=None, - norm_cfg=dict(type='BN'), - with_cp=False, - init_cfg=None): + def __init__(self, in_channels, stride, reduce_ratio, conv_cfg=None, norm_cfg=dict(type="BN"), with_cp=False, init_cfg=None): super().__init__(init_cfg=init_cfg) self.with_cp = with_cp self.stride = stride @@ -167,28 +149,27 @@ class ConditionalChannelWeighting(BaseModule): branch_channels = [channel // 2 for channel in in_channels] self.cross_resolution_weighting = CrossResolutionWeighting( - branch_channels, - ratio=reduce_ratio, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg) + branch_channels, ratio=reduce_ratio, conv_cfg=conv_cfg, norm_cfg=norm_cfg + ) - self.depthwise_convs = nn.ModuleList([ - ConvModule( - channel, - channel, - kernel_size=3, - stride=self.stride, - padding=1, - groups=channel, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=None) for channel in branch_channels - ]) + self.depthwise_convs = nn.ModuleList( + [ + ConvModule( + channel, + channel, + kernel_size=3, + stride=self.stride, + padding=1, + groups=channel, + conv_cfg=conv_cfg, + norm_cfg=norm_cfg, + act_cfg=None, + ) + for channel in branch_channels + ] + ) - self.spatial_weighting = nn.ModuleList([ - SpatialWeighting(channels=channel, ratio=4) - for channel in branch_channels - ]) + self.spatial_weighting = nn.ModuleList([SpatialWeighting(channels=channel, ratio=4) for channel in branch_channels]) def forward(self, x): @@ -233,15 +214,9 @@ class Stem(BaseModule): Default: None """ - def __init__(self, - in_channels, - stem_channels, - out_channels, - expand_ratio, - conv_cfg=None, - norm_cfg=dict(type='BN'), - with_cp=False, - init_cfg=None): + def __init__( + self, in_channels, stem_channels, out_channels, expand_ratio, conv_cfg=None, norm_cfg=dict(type="BN"), with_cp=False, init_cfg=None + ): super().__init__(init_cfg=init_cfg) self.in_channels = in_channels self.out_channels = out_channels @@ -257,7 +232,8 @@ class Stem(BaseModule): padding=1, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - act_cfg=dict(type='ReLU')) + act_cfg=dict(type="ReLU"), + ) mid_channels = int(round(stem_channels * expand_ratio)) branch_channels = stem_channels // 2 @@ -276,7 +252,8 @@ class Stem(BaseModule): groups=branch_channels, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=None), + act_cfg=None, + ), ConvModule( branch_channels, inc_channels, @@ -285,7 +262,8 @@ class Stem(BaseModule): padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=dict(type='ReLU')), + act_cfg=dict(type="ReLU"), + ), ) self.expand_conv = ConvModule( @@ -296,7 +274,8 @@ class Stem(BaseModule): padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=dict(type='ReLU')) + act_cfg=dict(type="ReLU"), + ) self.depthwise_conv = ConvModule( mid_channels, mid_channels, @@ 
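# A minimal sketch, with toy shapes, of the CrossResolutionWeighting forward
# pass above. Plain convs stand in for the two ConvModules, and the pooling
# of every branch down to mini_size is assumed from the mini_size line shown:
# joint channel weights are computed at the smallest scale, split per branch,
# and upsampled back to re-weight each branch at its own resolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

x = [torch.randn(1, 8, 16, 16), torch.randn(1, 16, 8, 8)]  # branch features
channels = [8, 16]
total = sum(channels)

conv1 = nn.Conv2d(total, total // 4, kernel_size=1)  # squeeze (ReLU act)
conv2 = nn.Conv2d(total // 4, total, kernel_size=1)  # excite (Sigmoid act)

mini_size = x[-1].size()[-2:]
out = [F.adaptive_avg_pool2d(s, mini_size) for s in x[:-1]] + [x[-1]]
out = torch.sigmoid(conv2(F.relu(conv1(torch.cat(out, dim=1)))))
out = torch.split(out, channels, dim=1)
x = [s * F.interpolate(a, size=s.size()[-2:], mode="nearest")
     for s, a in zip(x, out)]
print([tuple(t.shape) for t in x])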
-306,17 +285,18 @@ class Stem(BaseModule): groups=mid_channels, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=None) + act_cfg=None, + ) self.linear_conv = ConvModule( mid_channels, - branch_channels - if stem_channels == self.out_channels else stem_channels, + branch_channels if stem_channels == self.out_channels else stem_channels, kernel_size=1, stride=1, padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=dict(type='ReLU')) + act_cfg=dict(type="ReLU"), + ) def forward(self, x): @@ -353,7 +333,7 @@ class IterativeHead(BaseModule): Default: None """ - def __init__(self, in_channels, norm_cfg=dict(type='BN'), init_cfg=None): + def __init__(self, in_channels, norm_cfg=dict(type="BN"), init_cfg=None): super().__init__(init_cfg=init_cfg) projects = [] num_branchs = len(in_channels) @@ -369,9 +349,11 @@ class IterativeHead(BaseModule): stride=1, padding=1, norm_cfg=norm_cfg, - act_cfg=dict(type='ReLU'), + act_cfg=dict(type="ReLU"), dw_act_cfg=None, - pw_act_cfg=dict(type='ReLU'))) + pw_act_cfg=dict(type="ReLU"), + ) + ) else: projects.append( DepthwiseSeparableConvModule( @@ -381,9 +363,11 @@ class IterativeHead(BaseModule): stride=1, padding=1, norm_cfg=norm_cfg, - act_cfg=dict(type='ReLU'), + act_cfg=dict(type="ReLU"), dw_act_cfg=None, - pw_act_cfg=dict(type='ReLU'))) + pw_act_cfg=dict(type="ReLU"), + ) + ) self.projects = nn.ModuleList(projects) def forward(self, x): @@ -393,11 +377,7 @@ class IterativeHead(BaseModule): last_x = None for i, s in enumerate(x): if last_x is not None: - last_x = F.interpolate( - last_x, - size=s.size()[-2:], - mode='bilinear', - align_corners=True) + last_x = F.interpolate(last_x, size=s.size()[-2:], mode="bilinear", align_corners=True) s = s + last_x s = self.projects[i](s) y.append(s) @@ -425,15 +405,17 @@ class ShuffleUnit(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - stride=1, - conv_cfg=None, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='ReLU'), - with_cp=False, - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + stride=1, + conv_cfg=None, + norm_cfg=dict(type="BN"), + act_cfg=dict(type="ReLU"), + with_cp=False, + init_cfg=None, + ): super().__init__(init_cfg=init_cfg) self.stride = stride self.with_cp = with_cp @@ -441,14 +423,11 @@ class ShuffleUnit(BaseModule): branch_features = out_channels // 2 if self.stride == 1: assert in_channels == branch_features * 2, ( - f'in_channels ({in_channels}) should equal to ' - f'branch_features * 2 ({branch_features * 2}) ' - 'when stride is 1') + f"in_channels ({in_channels}) should be equal to " f"branch_features * 2 ({branch_features * 2}) " "when stride is 1" + ) if in_channels != branch_features * 2: - assert self.stride != 1, ( - f'stride ({self.stride}) should not equal 1 when ' - f'in_channels != branch_features * 2') + assert self.stride != 1, f"stride ({self.stride}) should not equal 1 when " f"in_channels != branch_features * 2" if self.stride > 1: self.branch1 = nn.Sequential( @@ -461,16 +440,11 @@ class ShuffleUnit(BaseModule): groups=in_channels, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=None), + act_cfg=None, + ), ConvModule( - in_channels, - branch_features, - kernel_size=1, - stride=1, - padding=0, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg), + in_channels, branch_features, kernel_size=1, stride=1, padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg + ), ) self.branch2 = nn.Sequential( @@ -482,7 +456,8 @@ class ShuffleUnit(BaseModule): padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - 
act_cfg=act_cfg), + act_cfg=act_cfg, + ), ConvModule( branch_features, branch_features, @@ -492,16 +467,12 @@ class ShuffleUnit(BaseModule): groups=branch_features, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=None), + act_cfg=None, + ), ConvModule( - branch_features, - branch_features, - kernel_size=1, - stride=1, - padding=0, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) + branch_features, branch_features, kernel_size=1, stride=1, padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg + ), + ) def forward(self, x): @@ -547,18 +518,20 @@ class LiteHRModule(BaseModule): Default: None """ - def __init__(self, - num_branches, - num_blocks, - in_channels, - reduce_ratio, - module_type, - multiscale_output=False, - with_fuse=True, - conv_cfg=None, - norm_cfg=dict(type='BN'), - with_cp=False, - init_cfg=None): + def __init__( + self, + num_branches, + num_blocks, + in_channels, + reduce_ratio, + module_type, + multiscale_output=False, + with_fuse=True, + conv_cfg=None, + norm_cfg=dict(type="BN"), + with_cp=False, + init_cfg=None, + ): super().__init__(init_cfg=init_cfg) self._check_branches(num_branches, in_channels) @@ -572,9 +545,9 @@ class LiteHRModule(BaseModule): self.conv_cfg = conv_cfg self.with_cp = with_cp - if self.module_type.upper() == 'LITE': + if self.module_type.upper() == "LITE": self.layers = self._make_weighting_blocks(num_blocks, reduce_ratio) - elif self.module_type.upper() == 'NAIVE': + elif self.module_type.upper() == "NAIVE": self.layers = self._make_naive_branches(num_branches, num_blocks) else: raise ValueError("module_type should be either 'LITE' or 'NAIVE'.") @@ -585,8 +558,7 @@ class LiteHRModule(BaseModule): def _check_branches(self, num_branches, in_channels): """Check input to avoid ValueError.""" if num_branches != len(in_channels): - error_msg = f'NUM_BRANCHES({num_branches}) ' \ - f'!= NUM_INCHANNELS({len(in_channels)})' + error_msg = f"NUM_BRANCHES({num_branches}) " f"!= NUM_INCHANNELS({len(in_channels)})" raise ValueError(error_msg) def _make_weighting_blocks(self, num_blocks, reduce_ratio, stride=1): @@ -600,7 +572,9 @@ class LiteHRModule(BaseModule): reduce_ratio=reduce_ratio, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - with_cp=self.with_cp)) + with_cp=self.with_cp, + ) + ) return nn.Sequential(*layers) @@ -614,8 +588,10 @@ class LiteHRModule(BaseModule): stride=stride, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - act_cfg=dict(type='ReLU'), - with_cp=self.with_cp)) + act_cfg=dict(type="ReLU"), + with_cp=self.with_cp, + ) + ) for i in range(1, num_blocks): layers.append( ShuffleUnit( @@ -624,8 +600,10 @@ class LiteHRModule(BaseModule): stride=1, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - act_cfg=dict(type='ReLU'), - with_cp=self.with_cp)) + act_cfg=dict(type="ReLU"), + with_cp=self.with_cp, + ) + ) return nn.Sequential(*layers) @@ -653,17 +631,11 @@ class LiteHRModule(BaseModule): if j > i: fuse_layer.append( nn.Sequential( - build_conv_layer( - self.conv_cfg, - in_channels[j], - in_channels[i], - kernel_size=1, - stride=1, - padding=0, - bias=False), + build_conv_layer(self.conv_cfg, in_channels[j], in_channels[i], kernel_size=1, stride=1, padding=0, bias=False), build_norm_layer(self.norm_cfg, in_channels[i])[1], - nn.Upsample( - scale_factor=2**(j - i), mode='nearest'))) + nn.Upsample(scale_factor=2 ** (j - i), mode="nearest"), + ) + ) elif j == i: fuse_layer.append(None) else: @@ -680,19 +652,15 @@ class LiteHRModule(BaseModule): stride=2, padding=1, groups=in_channels[j], - bias=False), - 
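# A minimal sketch of channel_shuffle as imported from .utils above (the
# implementation shown here is an assumption, matching the usual ShuffleNet
# formulation): after the two ShuffleUnit branches are concatenated, channels
# are interleaved across groups so the branches exchange information.
import torch

def channel_shuffle(x, groups):
    b, c, h, w = x.size()
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

x = torch.arange(8.0).view(1, 8, 1, 1)  # channels 0..7
print(channel_shuffle(x, 2).flatten().tolist())
# [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0]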
build_norm_layer(self.norm_cfg, - in_channels[j])[1], + bias=False, + ), + build_norm_layer(self.norm_cfg, in_channels[j])[1], build_conv_layer( - self.conv_cfg, - in_channels[j], - in_channels[i], - kernel_size=1, - stride=1, - padding=0, - bias=False), - build_norm_layer(self.norm_cfg, - in_channels[i])[1])) + self.conv_cfg, in_channels[j], in_channels[i], kernel_size=1, stride=1, padding=0, bias=False + ), + build_norm_layer(self.norm_cfg, in_channels[i])[1], + ) + ) else: conv_downsamples.append( nn.Sequential( @@ -704,20 +672,16 @@ class LiteHRModule(BaseModule): stride=2, padding=1, groups=in_channels[j], - bias=False), - build_norm_layer(self.norm_cfg, - in_channels[j])[1], + bias=False, + ), + build_norm_layer(self.norm_cfg, in_channels[j])[1], build_conv_layer( - self.conv_cfg, - in_channels[j], - in_channels[j], - kernel_size=1, - stride=1, - padding=0, - bias=False), - build_norm_layer(self.norm_cfg, - in_channels[j])[1], - nn.ReLU(inplace=True))) + self.conv_cfg, in_channels[j], in_channels[j], kernel_size=1, stride=1, padding=0, bias=False + ), + build_norm_layer(self.norm_cfg, in_channels[j])[1], + nn.ReLU(inplace=True), + ) + ) fuse_layer.append(nn.Sequential(*conv_downsamples)) fuse_layers.append(nn.ModuleList(fuse_layer)) @@ -728,9 +692,9 @@ class LiteHRModule(BaseModule): if self.num_branches == 1: return [self.layers[0](x[0])] - if self.module_type.upper() == 'LITE': + if self.module_type.upper() == "LITE": out = self.layers(x) - elif self.module_type.upper() == 'NAIVE': + elif self.module_type.upper() == "NAIVE": for i in range(self.num_branches): x[i] = self.layers[i](x[i]) out = x @@ -809,20 +773,16 @@ class LiteHRNet(BaseBackbone): (1, 40, 8, 8) """ - def __init__(self, - extra, - in_channels=3, - conv_cfg=None, - norm_cfg=dict(type='BN'), - norm_eval=False, - with_cp=False, - init_cfg=[ - dict(type='Normal', std=0.001, layer=['Conv2d']), - dict( - type='Constant', - val=1, - layer=['_BatchNorm', 'GroupNorm']) - ]): + def __init__( + self, + extra, + in_channels=3, + conv_cfg=None, + norm_cfg=dict(type="BN"), + norm_eval=False, + with_cp=False, + init_cfg=[dict(type="Normal", std=0.001, layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], + ): super().__init__(init_cfg=init_cfg) self.extra = extra self.conv_cfg = conv_cfg @@ -832,38 +792,35 @@ class LiteHRNet(BaseBackbone): self.stem = Stem( in_channels, - stem_channels=self.extra['stem']['stem_channels'], - out_channels=self.extra['stem']['out_channels'], - expand_ratio=self.extra['stem']['expand_ratio'], + stem_channels=self.extra["stem"]["stem_channels"], + out_channels=self.extra["stem"]["out_channels"], + expand_ratio=self.extra["stem"]["expand_ratio"], conv_cfg=self.conv_cfg, - norm_cfg=self.norm_cfg) + norm_cfg=self.norm_cfg, + ) - self.num_stages = self.extra['num_stages'] - self.stages_spec = self.extra['stages_spec'] + self.num_stages = self.extra["num_stages"] + self.stages_spec = self.extra["stages_spec"] num_channels_last = [ self.stem.out_channels, ] for i in range(self.num_stages): - num_channels = self.stages_spec['num_channels'][i] + num_channels = self.stages_spec["num_channels"][i] num_channels = [num_channels[i] for i in range(len(num_channels))] - setattr( - self, f'transition{i}', - self._make_transition_layer(num_channels_last, num_channels)) + setattr(self, f"transition{i}", self._make_transition_layer(num_channels_last, num_channels)) - stage, num_channels_last = self._make_stage( - self.stages_spec, i, num_channels, multiscale_output=True) - setattr(self, 
f'stage{i}', stage) + stage, num_channels_last = self._make_stage(self.stages_spec, i, num_channels, multiscale_output=True) + setattr(self, f"stage{i}", stage) - self.with_head = self.extra['with_head'] + self.with_head = self.extra["with_head"] if self.with_head: self.head_layer = IterativeHead( in_channels=num_channels_last, norm_cfg=self.norm_cfg, ) - def _make_transition_layer(self, num_channels_pre_layer, - num_channels_cur_layer): + def _make_transition_layer(self, num_channels_pre_layer, num_channels_cur_layer): """Make transition layer.""" num_branches_cur = len(num_channels_cur_layer) num_branches_pre = len(num_channels_pre_layer) @@ -882,9 +839,9 @@ class LiteHRNet(BaseBackbone): stride=1, padding=1, groups=num_channels_pre_layer[i], - bias=False), - build_norm_layer(self.norm_cfg, - num_channels_pre_layer[i])[1], + bias=False, + ), + build_norm_layer(self.norm_cfg, num_channels_pre_layer[i])[1], build_conv_layer( self.conv_cfg, num_channels_pre_layer[i], @@ -892,55 +849,41 @@ class LiteHRNet(BaseBackbone): kernel_size=1, stride=1, padding=0, - bias=False), - build_norm_layer(self.norm_cfg, - num_channels_cur_layer[i])[1], - nn.ReLU())) + bias=False, + ), + build_norm_layer(self.norm_cfg, num_channels_cur_layer[i])[1], + nn.ReLU(), + ) + ) else: transition_layers.append(None) else: conv_downsamples = [] for j in range(i + 1 - num_branches_pre): in_channels = num_channels_pre_layer[-1] - out_channels = num_channels_cur_layer[i] \ - if j == i - num_branches_pre else in_channels + out_channels = num_channels_cur_layer[i] if j == i - num_branches_pre else in_channels conv_downsamples.append( nn.Sequential( build_conv_layer( - self.conv_cfg, - in_channels, - in_channels, - kernel_size=3, - stride=2, - padding=1, - groups=in_channels, - bias=False), + self.conv_cfg, in_channels, in_channels, kernel_size=3, stride=2, padding=1, groups=in_channels, bias=False + ), build_norm_layer(self.norm_cfg, in_channels)[1], - build_conv_layer( - self.conv_cfg, - in_channels, - out_channels, - kernel_size=1, - stride=1, - padding=0, - bias=False), + build_conv_layer(self.conv_cfg, in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=False), build_norm_layer(self.norm_cfg, out_channels)[1], - nn.ReLU())) + nn.ReLU(), + ) + ) transition_layers.append(nn.Sequential(*conv_downsamples)) return nn.ModuleList(transition_layers) - def _make_stage(self, - stages_spec, - stage_index, - in_channels, - multiscale_output=True): - num_modules = stages_spec['num_modules'][stage_index] - num_branches = stages_spec['num_branches'][stage_index] - num_blocks = stages_spec['num_blocks'][stage_index] - reduce_ratio = stages_spec['reduce_ratios'][stage_index] - with_fuse = stages_spec['with_fuse'][stage_index] - module_type = stages_spec['module_type'][stage_index] + def _make_stage(self, stages_spec, stage_index, in_channels, multiscale_output=True): + num_modules = stages_spec["num_modules"][stage_index] + num_branches = stages_spec["num_branches"][stage_index] + num_blocks = stages_spec["num_blocks"][stage_index] + reduce_ratio = stages_spec["reduce_ratios"][stage_index] + with_fuse = stages_spec["with_fuse"][stage_index] + module_type = stages_spec["module_type"][stage_index] modules = [] for i in range(num_modules): @@ -961,7 +904,9 @@ class LiteHRNet(BaseBackbone): with_fuse=with_fuse, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - with_cp=self.with_cp)) + with_cp=self.with_cp, + ) + ) in_channels = modules[-1].in_channels return nn.Sequential(*modules), in_channels @@ -973,8 +918,8 @@ class 
LiteHRNet(BaseBackbone): y_list = [x] for i in range(self.num_stages): x_list = [] - transition = getattr(self, f'transition{i}') - for j in range(self.stages_spec['num_branches'][i]): + transition = getattr(self, f"transition{i}") + for j in range(self.stages_spec["num_branches"][i]): if transition[j]: if j >= len(y_list): x_list.append(transition[j](y_list[-1])) @@ -982,13 +927,13 @@ class LiteHRNet(BaseBackbone): x_list.append(transition[j](y_list[j])) else: x_list.append(y_list[j]) - y_list = getattr(self, f'stage{i}')(x_list) + y_list = getattr(self, f"stage{i}")(x_list) x = y_list if self.with_head: x = self.head_layer(x) - return (x[0], ) + return (x[0],) def train(self, mode=True): """Convert the model into training mode.""" diff --git a/mmpose/models/backbones/mobilenet_v2.py b/mmpose/models/backbones/mobilenet_v2.py index b64c0d73d41d3763018a8e46621c6ab695be6856..80d780e6f58344fa5f36f2dff6ed49f6e95a37bf 100644 --- a/mmpose/models/backbones/mobilenet_v2.py +++ b/mmpose/models/backbones/mobilenet_v2.py @@ -8,6 +8,7 @@ from mmengine.model import BaseModule from torch.nn.modules.batchnorm import _BatchNorm from mmpose.registry import MODELS + from .base_backbone import BaseBackbone from .utils import make_divisible @@ -33,23 +34,24 @@ class InvertedResidual(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - stride, - expand_ratio, - conv_cfg=None, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='ReLU6'), - with_cp=False, - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + stride, + expand_ratio, + conv_cfg=None, + norm_cfg=dict(type="BN"), + act_cfg=dict(type="ReLU6"), + with_cp=False, + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) act_cfg = copy.deepcopy(act_cfg) super().__init__(init_cfg=init_cfg) self.stride = stride - assert stride in [1, 2], f'stride must in [1, 2]. ' \ - f'But received {stride}.' + assert stride in [1, 2], f"stride must in [1, 2]. " f"But received {stride}." self.with_cp = with_cp self.use_res_connect = self.stride == 1 and in_channels == out_channels hidden_dim = int(round(in_channels * expand_ratio)) @@ -58,31 +60,27 @@ class InvertedResidual(BaseModule): if expand_ratio != 1: layers.append( ConvModule( - in_channels=in_channels, + in_channels=in_channels, out_channels=hidden_dim, kernel_size=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg + ) + ) + layers.extend( + [ + ConvModule( + in_channels=hidden_dim, out_channels=hidden_dim, - kernel_size=1, + kernel_size=3, + stride=stride, + padding=1, + groups=hidden_dim, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg)) - layers.extend([ - ConvModule( - in_channels=hidden_dim, - out_channels=hidden_dim, - kernel_size=3, - stride=stride, - padding=1, - groups=hidden_dim, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg), - ConvModule( - in_channels=hidden_dim, - out_channels=out_channels, - kernel_size=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=None) - ]) + act_cfg=act_cfg, + ), + ConvModule( + in_channels=hidden_dim, out_channels=out_channels, kernel_size=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=None + ), + ] + ) self.conv = nn.Sequential(*layers) def forward(self, x): @@ -135,26 +133,20 @@ class MobileNetV2(BaseBackbone): # Parameters to build layers. 4 parameters are needed to construct a # layer, from left to right: expand_ratio, channel, num_blocks, stride. 
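The comment above pins down the layout of each `arch_settings` entry in the next hunk: `(expand_ratio, channel, num_blocks, stride)`. As a minimal sketch of how those seven tuples expand into a concrete layer plan under `widen_factor`, assuming the standard MobileNet rounding rule; the local `make_divisible` and `layer_plan` below are illustrative stand-ins (the real helper lives in `mmpose.models.backbones.utils`), not code from this patch:

```python
# Sketch: expanding MobileNetV2's arch_settings into a layer plan.
# Each tuple reads (expand_ratio, channel, num_blocks, stride).
ARCH_SETTINGS = [
    [1, 16, 1, 1], [6, 24, 2, 2], [6, 32, 3, 2], [6, 64, 4, 2],
    [6, 96, 3, 1], [6, 160, 3, 2], [6, 320, 1, 1],
]

def make_divisible(value, divisor=8):
    """Round to the nearest multiple of `divisor`, never dropping
    below 90% of the original value (the usual MobileNet rule)."""
    new_value = max(divisor, int(value + divisor / 2) // divisor * divisor)
    if new_value < 0.9 * value:
        new_value += divisor
    return new_value

def layer_plan(widen_factor=1.0, in_channels=32):
    """Yield (in_ch, out_ch, num_blocks, stride, expand_ratio) per stage.
    The real backbone also scales its stem by widen_factor, so
    in_channels is left as a free parameter here."""
    for expand_ratio, channel, num_blocks, stride in ARCH_SETTINGS:
        out_channels = make_divisible(channel * widen_factor)
        yield in_channels, out_channels, num_blocks, stride, expand_ratio
        in_channels = out_channels

for row in layer_plan():
    print(row)
# (32, 16, 1, 1, 1), (16, 24, 2, 2, 6), (24, 32, 3, 2, 6), ...
```

Only the first block of a stage applies `stride`; the remaining `num_blocks - 1` blocks keep stride 1, which matches how `make_layer` stacks the blocks.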
- arch_settings = [[1, 16, 1, 1], [6, 24, 2, 2], [6, 32, 3, 2], - [6, 64, 4, 2], [6, 96, 3, 1], [6, 160, 3, 2], - [6, 320, 1, 1]] - - def __init__(self, - widen_factor=1., - out_indices=(7, ), - frozen_stages=-1, - conv_cfg=None, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='ReLU6'), - norm_eval=False, - with_cp=False, - init_cfg=[ - dict(type='Kaiming', layer=['Conv2d']), - dict( - type='Constant', - val=1, - layer=['_BatchNorm', 'GroupNorm']) - ]): + arch_settings = [[1, 16, 1, 1], [6, 24, 2, 2], [6, 32, 3, 2], [6, 64, 4, 2], [6, 96, 3, 1], [6, 160, 3, 2], [6, 320, 1, 1]] + + def __init__( + self, + widen_factor=1.0, + out_indices=(7,), + frozen_stages=-1, + conv_cfg=None, + norm_cfg=dict(type="BN"), + act_cfg=dict(type="ReLU6"), + norm_eval=False, + with_cp=False, + init_cfg=[dict(type="Kaiming", layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) act_cfg = copy.deepcopy(act_cfg) @@ -163,12 +155,10 @@ class MobileNetV2(BaseBackbone): self.out_indices = out_indices for index in out_indices: if index not in range(0, 8): - raise ValueError('the item in out_indices must in ' - f'range(0, 8). But received {index}') + raise ValueError("the item in out_indices must in " f"range(0, 8). But received {index}") if frozen_stages not in range(-1, 8): - raise ValueError('frozen_stages must be in range(-1, 8). ' - f'But received {frozen_stages}') + raise ValueError("frozen_stages must be in range(-1, 8). " f"But received {frozen_stages}") self.out_indices = out_indices self.frozen_stages = frozen_stages self.conv_cfg = conv_cfg @@ -187,19 +177,16 @@ class MobileNetV2(BaseBackbone): padding=1, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - act_cfg=self.act_cfg) + act_cfg=self.act_cfg, + ) self.layers = [] for i, layer_cfg in enumerate(self.arch_settings): expand_ratio, channel, num_blocks, stride = layer_cfg out_channels = make_divisible(channel * widen_factor, 8) - inverted_res_layer = self.make_layer( - out_channels=out_channels, - num_blocks=num_blocks, - stride=stride, - expand_ratio=expand_ratio) - layer_name = f'layer{i + 1}' + inverted_res_layer = self.make_layer(out_channels=out_channels, num_blocks=num_blocks, stride=stride, expand_ratio=expand_ratio) + layer_name = f"layer{i + 1}" self.add_module(layer_name, inverted_res_layer) self.layers.append(layer_name) @@ -216,9 +203,10 @@ class MobileNetV2(BaseBackbone): padding=0, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - act_cfg=self.act_cfg) - self.add_module('conv2', layer) - self.layers.append('conv2') + act_cfg=self.act_cfg, + ) + self.add_module("conv2", layer) + self.layers.append("conv2") def make_layer(self, out_channels, num_blocks, stride, expand_ratio): """Stack InvertedResidual blocks to build a layer for MobileNetV2. 
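`make_layer` stacks the `InvertedResidual` block reflowed in the hunks above: a 1x1 expansion (skipped when `expand_ratio == 1`), a 3x3 depthwise conv that carries the stride, and a linear 1x1 projection, with a residual connection only when the stride is 1 and the channel counts match. A self-contained PyTorch sketch of that structure; `TinyInvertedResidual` is a hypothetical stand-in that drops the `ConvModule`/`conv_cfg` plumbing:

```python
import torch
import torch.nn as nn

class TinyInvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride, expand_ratio):
        super().__init__()
        assert stride in (1, 2)
        hidden = int(round(in_ch * expand_ratio))
        self.use_res = stride == 1 and in_ch == out_ch
        layers = []
        if expand_ratio != 1:  # 1x1 expansion
            layers += [nn.Conv2d(in_ch, hidden, 1, bias=False),
                       nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True)]
        layers += [
            # 3x3 depthwise conv carries the stride
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 1x1 linear projection: no activation, as in the hunk above
            nn.Conv2d(hidden, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        ]
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv(x)
        return x + out if self.use_res else out

# Two stride-1 blocks with matching channels, so both use the residual.
blocks = nn.Sequential(*[TinyInvertedResidual(24, 24, 1, 6) for _ in range(2)])
x = torch.randn(1, 24, 32, 32)
print(blocks(x).shape)  # torch.Size([1, 24, 32, 32])
```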
@@ -243,7 +231,9 @@ class MobileNetV2(BaseBackbone): conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, act_cfg=self.act_cfg, - with_cp=self.with_cp)) + with_cp=self.with_cp, + ) + ) self.in_channels = out_channels return nn.Sequential(*layers) @@ -265,7 +255,7 @@ class MobileNetV2(BaseBackbone): for param in self.conv1.parameters(): param.requires_grad = False for i in range(1, self.frozen_stages + 1): - layer = getattr(self, f'layer{i}') + layer = getattr(self, f"layer{i}") layer.eval() for param in layer.parameters(): param.requires_grad = False diff --git a/mmpose/models/backbones/mobilenet_v3.py b/mmpose/models/backbones/mobilenet_v3.py index 03ecf90dd22d42a3650a4eac00c070ec556c7912..b333f73347fbcd3bb4b174c41f485f3dd862fae9 100644 --- a/mmpose/models/backbones/mobilenet_v3.py +++ b/mmpose/models/backbones/mobilenet_v3.py @@ -5,6 +5,7 @@ from mmcv.cnn import ConvModule from torch.nn.modules.batchnorm import _BatchNorm from mmpose.registry import MODELS + from .base_backbone import BaseBackbone from .utils import InvertedResidual @@ -40,64 +41,63 @@ class MobileNetV3(BaseBackbone): layer=['_BatchNorm']) ]`` """ + # Parameters to build each block: # [kernel size, mid channels, out channels, with_se, act type, stride] arch_settings = { - 'small': [[3, 16, 16, True, 'ReLU', 2], - [3, 72, 24, False, 'ReLU', 2], - [3, 88, 24, False, 'ReLU', 1], - [5, 96, 40, True, 'HSwish', 2], - [5, 240, 40, True, 'HSwish', 1], - [5, 240, 40, True, 'HSwish', 1], - [5, 120, 48, True, 'HSwish', 1], - [5, 144, 48, True, 'HSwish', 1], - [5, 288, 96, True, 'HSwish', 2], - [5, 576, 96, True, 'HSwish', 1], - [5, 576, 96, True, 'HSwish', 1]], - 'big': [[3, 16, 16, False, 'ReLU', 1], - [3, 64, 24, False, 'ReLU', 2], - [3, 72, 24, False, 'ReLU', 1], - [5, 72, 40, True, 'ReLU', 2], - [5, 120, 40, True, 'ReLU', 1], - [5, 120, 40, True, 'ReLU', 1], - [3, 240, 80, False, 'HSwish', 2], - [3, 200, 80, False, 'HSwish', 1], - [3, 184, 80, False, 'HSwish', 1], - [3, 184, 80, False, 'HSwish', 1], - [3, 480, 112, True, 'HSwish', 1], - [3, 672, 112, True, 'HSwish', 1], - [5, 672, 160, True, 'HSwish', 1], - [5, 672, 160, True, 'HSwish', 2], - [5, 960, 160, True, 'HSwish', 1]] + "small": [ + [3, 16, 16, True, "ReLU", 2], + [3, 72, 24, False, "ReLU", 2], + [3, 88, 24, False, "ReLU", 1], + [5, 96, 40, True, "HSwish", 2], + [5, 240, 40, True, "HSwish", 1], + [5, 240, 40, True, "HSwish", 1], + [5, 120, 48, True, "HSwish", 1], + [5, 144, 48, True, "HSwish", 1], + [5, 288, 96, True, "HSwish", 2], + [5, 576, 96, True, "HSwish", 1], + [5, 576, 96, True, "HSwish", 1], + ], + "big": [ + [3, 16, 16, False, "ReLU", 1], + [3, 64, 24, False, "ReLU", 2], + [3, 72, 24, False, "ReLU", 1], + [5, 72, 40, True, "ReLU", 2], + [5, 120, 40, True, "ReLU", 1], + [5, 120, 40, True, "ReLU", 1], + [3, 240, 80, False, "HSwish", 2], + [3, 200, 80, False, "HSwish", 1], + [3, 184, 80, False, "HSwish", 1], + [3, 184, 80, False, "HSwish", 1], + [3, 480, 112, True, "HSwish", 1], + [3, 672, 112, True, "HSwish", 1], + [5, 672, 160, True, "HSwish", 1], + [5, 672, 160, True, "HSwish", 2], + [5, 960, 160, True, "HSwish", 1], + ], } # yapf: disable - def __init__(self, - arch='small', - conv_cfg=None, - norm_cfg=dict(type='BN'), - out_indices=(-1, ), - frozen_stages=-1, - norm_eval=False, - with_cp=False, - init_cfg=[ - dict(type='Kaiming', layer=['Conv2d']), - dict(type='Constant', val=1, layer=['_BatchNorm']) - ]): + def __init__( + self, + arch="small", + conv_cfg=None, + norm_cfg=dict(type="BN"), + out_indices=(-1,), + frozen_stages=-1, + norm_eval=False, + 
with_cp=False, + init_cfg=[dict(type="Kaiming", layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm"])], + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) assert arch in self.arch_settings for index in out_indices: - if index not in range(-len(self.arch_settings[arch]), - len(self.arch_settings[arch])): - raise ValueError('the item in out_indices must in ' - f'range(0, {len(self.arch_settings[arch])}). ' - f'But received {index}') + if index not in range(-len(self.arch_settings[arch]), len(self.arch_settings[arch])): + raise ValueError("the item in out_indices must in " f"range(0, {len(self.arch_settings[arch])}). " f"But received {index}") if frozen_stages not in range(-1, len(self.arch_settings[arch])): - raise ValueError('frozen_stages must be in range(-1, ' - f'{len(self.arch_settings[arch])}). ' - f'But received {frozen_stages}') + raise ValueError("frozen_stages must be in range(-1, " f"{len(self.arch_settings[arch])}). " f"But received {frozen_stages}") self.arch = arch self.conv_cfg = conv_cfg self.norm_cfg = norm_cfg @@ -115,7 +115,8 @@ class MobileNetV3(BaseBackbone): padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=dict(type='HSwish')) + act_cfg=dict(type="HSwish"), + ) self.layers = self._make_layer() self.feat_dim = self.arch_settings[arch][-1][2] @@ -124,14 +125,9 @@ class MobileNetV3(BaseBackbone): layers = [] layer_setting = self.arch_settings[self.arch] for i, params in enumerate(layer_setting): - (kernel_size, mid_channels, out_channels, with_se, act, - stride) = params + kernel_size, mid_channels, out_channels, with_se, act, stride = params if with_se: - se_cfg = dict( - channels=mid_channels, - ratio=4, - act_cfg=(dict(type='ReLU'), - dict(type='HSigmoid', bias=1.0, divisor=2.0))) + se_cfg = dict(channels=mid_channels, ratio=4, act_cfg=(dict(type="ReLU"), dict(type="HSigmoid", bias=1.0, divisor=2.0))) else: se_cfg = None @@ -146,9 +142,10 @@ class MobileNetV3(BaseBackbone): conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, act_cfg=dict(type=act), - with_cp=self.with_cp) + with_cp=self.with_cp, + ) self.in_channels = out_channels - layer_name = f'layer{i + 1}' + layer_name = f"layer{i + 1}" self.add_module(layer_name, layer) layers.append(layer_name) return layers @@ -160,8 +157,7 @@ class MobileNetV3(BaseBackbone): for i, layer_name in enumerate(self.layers): layer = getattr(self, layer_name) x = layer(x) - if i in self.out_indices or \ - i - len(self.layers) in self.out_indices: + if i in self.out_indices or i - len(self.layers) in self.out_indices: outs.append(x) return tuple(outs) @@ -171,7 +167,7 @@ class MobileNetV3(BaseBackbone): for param in self.conv1.parameters(): param.requires_grad = False for i in range(1, self.frozen_stages + 1): - layer = getattr(self, f'layer{i}') + layer = getattr(self, f"layer{i}") layer.eval() for param in layer.parameters(): param.requires_grad = False diff --git a/mmpose/models/backbones/mspn.py b/mmpose/models/backbones/mspn.py index bcb636b1a3fdc0357fa7dc7c3751738914d58980..1f4f90c3f67eff52b9aec96bf53d0a5b2772bb51 100644 --- a/mmpose/models/backbones/mspn.py +++ b/mmpose/models/backbones/mspn.py @@ -10,6 +10,7 @@ from mmengine.runner import load_state_dict from mmpose.registry import MODELS from mmpose.utils import get_root_logger + from .base_backbone import BaseBackbone from .resnet import Bottleneck as _Bottleneck from .utils import get_state_dict @@ -52,14 +53,7 @@ class DownsampleModule(BaseModule): Default: None """ - def __init__(self, - 
block, - num_blocks, - num_units=4, - has_skip=False, - norm_cfg=dict(type='BN'), - in_channels=64, - init_cfg=None): + def __init__(self, block, num_blocks, num_units=4, has_skip=False, norm_cfg=dict(type="BN"), in_channels=64, init_cfg=None): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) @@ -71,11 +65,8 @@ class DownsampleModule(BaseModule): self.norm_cfg = norm_cfg self.layer1 = self._make_layer(block, in_channels, num_blocks[0]) for i in range(1, num_units): - module_name = f'layer{i + 1}' - self.add_module( - module_name, - self._make_layer( - block, in_channels * pow(2, i), num_blocks[i], stride=2)) + module_name = f"layer{i + 1}" + self.add_module(module_name, self._make_layer(block, in_channels * pow(2, i), num_blocks[i], stride=2)) def _make_layer(self, block, out_channels, blocks, stride=1): downsample = None @@ -88,16 +79,11 @@ class DownsampleModule(BaseModule): padding=0, norm_cfg=self.norm_cfg, act_cfg=None, - inplace=True) + inplace=True, + ) units = list() - units.append( - block( - self.in_channels, - out_channels, - stride=stride, - downsample=downsample, - norm_cfg=self.norm_cfg)) + units.append(block(self.in_channels, out_channels, stride=stride, downsample=downsample, norm_cfg=self.norm_cfg)) self.in_channels = out_channels * block.expansion for _ in range(1, blocks): units.append(block(self.in_channels, out_channels)) @@ -107,7 +93,7 @@ class DownsampleModule(BaseModule): def forward(self, x, skip1, skip2): out = list() for i in range(self.num_units): - module_name = f'layer{i + 1}' + module_name = f"layer{i + 1}" module_i = getattr(self, module_name) x = module_i(x) if self.has_skip: @@ -142,84 +128,53 @@ class UpsampleUnit(BaseModule): Default: None """ - def __init__(self, - ind, - num_units, - in_channels, - unit_channels=256, - gen_skip=False, - gen_cross_conv=False, - norm_cfg=dict(type='BN'), - out_channels=64, - init_cfg=None): + def __init__( + self, + ind, + num_units, + in_channels, + unit_channels=256, + gen_skip=False, + gen_cross_conv=False, + norm_cfg=dict(type="BN"), + out_channels=64, + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) self.num_units = num_units self.norm_cfg = norm_cfg self.in_skip = ConvModule( - in_channels, - unit_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=self.norm_cfg, - act_cfg=None, - inplace=True) + in_channels, unit_channels, kernel_size=1, stride=1, padding=0, norm_cfg=self.norm_cfg, act_cfg=None, inplace=True + ) self.relu = nn.ReLU(inplace=True) self.ind = ind if self.ind > 0: self.up_conv = ConvModule( - unit_channels, - unit_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=self.norm_cfg, - act_cfg=None, - inplace=True) + unit_channels, unit_channels, kernel_size=1, stride=1, padding=0, norm_cfg=self.norm_cfg, act_cfg=None, inplace=True + ) self.gen_skip = gen_skip if self.gen_skip: - self.out_skip1 = ConvModule( - in_channels, - in_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=self.norm_cfg, - inplace=True) + self.out_skip1 = ConvModule(in_channels, in_channels, kernel_size=1, stride=1, padding=0, norm_cfg=self.norm_cfg, inplace=True) self.out_skip2 = ConvModule( - unit_channels, - in_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=self.norm_cfg, - inplace=True) + unit_channels, in_channels, kernel_size=1, stride=1, padding=0, norm_cfg=self.norm_cfg, inplace=True + ) self.gen_cross_conv = gen_cross_conv if self.ind == 
num_units - 1 and self.gen_cross_conv: self.cross_conv = ConvModule( - unit_channels, - out_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=self.norm_cfg, - inplace=True) + unit_channels, out_channels, kernel_size=1, stride=1, padding=0, norm_cfg=self.norm_cfg, inplace=True + ) def forward(self, x, up_x): out = self.in_skip(x) if self.ind > 0: - up_x = F.interpolate( - up_x, - size=(x.size(2), x.size(3)), - mode='bilinear', - align_corners=True) + up_x = F.interpolate(up_x, size=(x.size(2), x.size(3)), mode="bilinear", align_corners=True) up_x = self.up_conv(up_x) out = out + up_x out = self.relu(out) @@ -256,28 +211,22 @@ class UpsampleModule(BaseModule): Default: None """ - def __init__(self, - unit_channels=256, - num_units=4, - gen_skip=False, - gen_cross_conv=False, - norm_cfg=dict(type='BN'), - out_channels=64, - init_cfg=None): + def __init__( + self, unit_channels=256, num_units=4, gen_skip=False, gen_cross_conv=False, norm_cfg=dict(type="BN"), out_channels=64, init_cfg=None + ): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) self.in_channels = list() for i in range(num_units): - self.in_channels.append(Bottleneck.expansion * out_channels * - pow(2, i)) + self.in_channels.append(Bottleneck.expansion * out_channels * pow(2, i)) self.in_channels.reverse() self.num_units = num_units self.gen_skip = gen_skip self.gen_cross_conv = gen_cross_conv self.norm_cfg = norm_cfg for i in range(num_units): - module_name = f'up{i + 1}' + module_name = f"up{i + 1}" self.add_module( module_name, UpsampleUnit( @@ -288,7 +237,9 @@ class UpsampleModule(BaseModule): self.gen_skip, self.gen_cross_conv, norm_cfg=self.norm_cfg, - out_channels=64)) + out_channels=64, + ), + ) def forward(self, x): out = list() @@ -296,7 +247,7 @@ class UpsampleModule(BaseModule): skip2 = list() cross_conv = None for i in range(self.num_units): - module_i = getattr(self, f'up{i + 1}') + module_i = getattr(self, f"up{i + 1}") if i == 0: outi, skip1_i, skip2_i, _ = module_i(x[i], None) elif i == self.num_units - 1: @@ -334,16 +285,18 @@ class SingleStageNetwork(BaseModule): Default: None """ - def __init__(self, - has_skip=False, - gen_skip=False, - gen_cross_conv=False, - unit_channels=256, - num_units=4, - num_blocks=[2, 2, 2, 2], - norm_cfg=dict(type='BN'), - in_channels=64, - init_cfg=None): + def __init__( + self, + has_skip=False, + gen_skip=False, + gen_cross_conv=False, + unit_channels=256, + num_units=4, + num_blocks=[2, 2, 2, 2], + norm_cfg=dict(type="BN"), + in_channels=64, + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) num_blocks = cp.deepcopy(num_blocks) @@ -357,10 +310,8 @@ class SingleStageNetwork(BaseModule): self.num_blocks = num_blocks self.norm_cfg = norm_cfg - self.downsample = DownsampleModule(Bottleneck, num_blocks, num_units, - has_skip, norm_cfg, in_channels) - self.upsample = UpsampleModule(unit_channels, num_units, gen_skip, - gen_cross_conv, norm_cfg, in_channels) + self.downsample = DownsampleModule(Bottleneck, num_blocks, num_units, has_skip, norm_cfg, in_channels) + self.upsample = UpsampleModule(unit_channels, num_units, gen_skip, gen_cross_conv, norm_cfg, in_channels) def forward(self, x, skip1, skip2): mid = self.downsample(x, skip1, skip2) @@ -380,19 +331,14 @@ class ResNetTop(BaseModule): Default: None """ - def __init__(self, norm_cfg=dict(type='BN'), channels=64, init_cfg=None): + def __init__(self, norm_cfg=dict(type="BN"), channels=64, init_cfg=None): # Protect mutable 
default arguments norm_cfg = cp.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) self.top = nn.Sequential( - ConvModule( - 3, - channels, - kernel_size=7, - stride=2, - padding=3, - norm_cfg=norm_cfg, - inplace=True), MaxPool2d(kernel_size=3, stride=2, padding=1)) + ConvModule(3, channels, kernel_size=7, stride=2, padding=3, norm_cfg=norm_cfg, inplace=True), + MaxPool2d(kernel_size=3, stride=2, padding=1), + ) def forward(self, img): return self.top(img) @@ -447,21 +393,20 @@ class MSPN(BaseBackbone): (1, 256, 128, 128) """ - def __init__(self, - unit_channels=256, - num_stages=4, - num_units=4, - num_blocks=[2, 2, 2, 2], - norm_cfg=dict(type='BN'), - res_top_channels=64, - init_cfg=[ - dict(type='Kaiming', layer=['Conv2d']), - dict( - type='Constant', - val=1, - layer=['_BatchNorm', 'GroupNorm']), - dict(type='Normal', std=0.01, layer=['Linear']), - ]): + def __init__( + self, + unit_channels=256, + num_stages=4, + num_units=4, + num_blocks=[2, 2, 2, 2], + norm_cfg=dict(type="BN"), + res_top_channels=64, + init_cfg=[ + dict(type="Kaiming", layer=["Conv2d"]), + dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"]), + dict(type="Normal", std=0.01, layer=["Linear"]), + ], + ): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) num_blocks = cp.deepcopy(num_blocks) @@ -489,9 +434,8 @@ class MSPN(BaseBackbone): gen_skip = False gen_cross_conv = False self.multi_stage_mspn.append( - SingleStageNetwork(has_skip, gen_skip, gen_cross_conv, - unit_channels, num_units, num_blocks, - norm_cfg, res_top_channels)) + SingleStageNetwork(has_skip, gen_skip, gen_cross_conv, unit_channels, num_units, num_blocks, norm_cfg, res_top_channels) + ) def forward(self, x): """Model forward function.""" @@ -507,35 +451,27 @@ class MSPN(BaseBackbone): def init_weights(self): """Initialize model weights.""" - if (isinstance(self.init_cfg, dict) - and self.init_cfg['type'] == 'Pretrained'): + if isinstance(self.init_cfg, dict) and self.init_cfg["type"] == "Pretrained": logger = get_root_logger() - state_dict_tmp = get_state_dict(self.init_cfg['checkpoint']) + state_dict_tmp = get_state_dict(self.init_cfg["checkpoint"]) state_dict = OrderedDict() - state_dict['top'] = OrderedDict() - state_dict['bottlenecks'] = OrderedDict() + state_dict["top"] = OrderedDict() + state_dict["bottlenecks"] = OrderedDict() for k, v in state_dict_tmp.items(): - if k.startswith('layer'): - if 'downsample.0' in k: - state_dict['bottlenecks'][k.replace( - 'downsample.0', 'downsample.conv')] = v - elif 'downsample.1' in k: - state_dict['bottlenecks'][k.replace( - 'downsample.1', 'downsample.bn')] = v + if k.startswith("layer"): + if "downsample.0" in k: + state_dict["bottlenecks"][k.replace("downsample.0", "downsample.conv")] = v + elif "downsample.1" in k: + state_dict["bottlenecks"][k.replace("downsample.1", "downsample.bn")] = v else: - state_dict['bottlenecks'][k] = v - elif k.startswith('conv1'): - state_dict['top'][k.replace('conv1', 'top.0.conv')] = v - elif k.startswith('bn1'): - state_dict['top'][k.replace('bn1', 'top.0.bn')] = v - - load_state_dict( - self.top, state_dict['top'], strict=False, logger=logger) + state_dict["bottlenecks"][k] = v + elif k.startswith("conv1"): + state_dict["top"][k.replace("conv1", "top.0.conv")] = v + elif k.startswith("bn1"): + state_dict["top"][k.replace("bn1", "top.0.bn")] = v + + load_state_dict(self.top, state_dict["top"], strict=False, logger=logger) for i in range(self.num_stages): - load_state_dict( - self.multi_stage_mspn[i].downsample, - 
state_dict['bottlenecks'], - strict=False, - logger=logger) + load_state_dict(self.multi_stage_mspn[i].downsample, state_dict["bottlenecks"], strict=False, logger=logger) else: super(MSPN, self).init_weights() diff --git a/mmpose/models/backbones/pvt.py b/mmpose/models/backbones/pvt.py index 3f2b6495482b4feadd86f51fa11b64ee10878fef..8e4c5f8cac1cdeecb58288b3355c423286572706 100644 --- a/mmpose/models/backbones/pvt.py +++ b/mmpose/models/backbones/pvt.py @@ -14,6 +14,7 @@ from mmengine.runner import load_state_dict from mmengine.utils import to_2tuple from mmpose.registry import MODELS + from ...utils import get_root_logger from ..utils import PatchEmbed, nchw_to_nlc, nlc_to_nchw, pvt_convert from .utils import get_state_dict @@ -43,14 +44,9 @@ class MixFFN(BaseModule): Default: None """ - def __init__(self, - embed_dims, - feedforward_channels, - act_cfg=dict(type='GELU'), - ffn_drop=0., - dropout_layer=None, - use_conv=False, - init_cfg=None): + def __init__( + self, embed_dims, feedforward_channels, act_cfg=dict(type="GELU"), ffn_drop=0.0, dropout_layer=None, use_conv=False, init_cfg=None + ): super(MixFFN, self).__init__(init_cfg=init_cfg) self.embed_dims = embed_dims @@ -59,12 +55,7 @@ class MixFFN(BaseModule): activate = build_activation_layer(act_cfg) in_channels = embed_dims - fc1 = Conv2d( - in_channels=in_channels, - out_channels=feedforward_channels, - kernel_size=1, - stride=1, - bias=True) + fc1 = Conv2d(in_channels=in_channels, out_channels=feedforward_channels, kernel_size=1, stride=1, bias=True) if use_conv: # 3x3 depth wise conv to provide positional encode information dw_conv = Conv2d( @@ -74,20 +65,15 @@ class MixFFN(BaseModule): stride=1, padding=(3 - 1) // 2, bias=True, - groups=feedforward_channels) - fc2 = Conv2d( - in_channels=feedforward_channels, - out_channels=in_channels, - kernel_size=1, - stride=1, - bias=True) + groups=feedforward_channels, + ) + fc2 = Conv2d(in_channels=feedforward_channels, out_channels=in_channels, kernel_size=1, stride=1, bias=True) drop = nn.Dropout(ffn_drop) layers = [fc1, activate, drop, fc2, drop] if use_conv: layers.insert(1, dw_conv) self.layers = Sequential(*layers) - self.dropout_layer = build_dropout( - dropout_layer) if dropout_layer else torch.nn.Identity() + self.dropout_layer = build_dropout(dropout_layer) if dropout_layer else torch.nn.Identity() def forward(self, x, hw_shape, identity=None): out = nlc_to_nchw(x, hw_shape) @@ -125,17 +111,19 @@ class SpatialReductionAttention(MultiheadAttention): Default: None """ - def __init__(self, - embed_dims, - num_heads, - attn_drop=0., - proj_drop=0., - dropout_layer=None, - batch_first=True, - qkv_bias=True, - norm_cfg=dict(type='LN'), - sr_ratio=1, - init_cfg=None): + def __init__( + self, + embed_dims, + num_heads, + attn_drop=0.0, + proj_drop=0.0, + dropout_layer=None, + batch_first=True, + qkv_bias=True, + norm_cfg=dict(type="LN"), + sr_ratio=1, + init_cfg=None, + ): super().__init__( embed_dims, num_heads, @@ -144,25 +132,25 @@ class SpatialReductionAttention(MultiheadAttention): batch_first=batch_first, dropout_layer=dropout_layer, bias=qkv_bias, - init_cfg=init_cfg) + init_cfg=init_cfg, + ) self.sr_ratio = sr_ratio if sr_ratio > 1: - self.sr = Conv2d( - in_channels=embed_dims, - out_channels=embed_dims, - kernel_size=sr_ratio, - stride=sr_ratio) + self.sr = Conv2d(in_channels=embed_dims, out_channels=embed_dims, kernel_size=sr_ratio, stride=sr_ratio) # The ret[0] of build_norm_layer is norm name. 
self.norm = build_norm_layer(norm_cfg, embed_dims)[1] # handle the BC-breaking from https://github.com/open-mmlab/mmcv/pull/1418 # noqa from mmpose import digit_version, mmcv_version - if mmcv_version < digit_version('1.3.17'): - warnings.warn('The legacy version of forward function in' - 'SpatialReductionAttention is deprecated in' - 'mmcv>=1.3.17 and will no longer support in the' - 'future. Please upgrade your mmcv.') + + if mmcv_version < digit_version("1.3.17"): + warnings.warn( + "The legacy version of forward function in" + "SpatialReductionAttention is deprecated in" + "mmcv>=1.3.17 and will no longer support in the" + "future. Please upgrade your mmcv." + ) self.forward = self.legacy_forward def forward(self, x, hw_shape, identity=None): @@ -241,19 +229,21 @@ class PVTEncoderLayer(BaseModule): Default: None """ - def __init__(self, - embed_dims, - num_heads, - feedforward_channels, - drop_rate=0., - attn_drop_rate=0., - drop_path_rate=0., - qkv_bias=True, - act_cfg=dict(type='GELU'), - norm_cfg=dict(type='LN'), - sr_ratio=1, - use_conv_ffn=False, - init_cfg=None): + def __init__( + self, + embed_dims, + num_heads, + feedforward_channels, + drop_rate=0.0, + attn_drop_rate=0.0, + drop_path_rate=0.0, + qkv_bias=True, + act_cfg=dict(type="GELU"), + norm_cfg=dict(type="LN"), + sr_ratio=1, + use_conv_ffn=False, + init_cfg=None, + ): super(PVTEncoderLayer, self).__init__(init_cfg=init_cfg) # The ret[0] of build_norm_layer is norm name. @@ -264,10 +254,11 @@ class PVTEncoderLayer(BaseModule): num_heads=num_heads, attn_drop=attn_drop_rate, proj_drop=drop_rate, - dropout_layer=dict(type='DropPath', drop_prob=drop_path_rate), + dropout_layer=dict(type="DropPath", drop_prob=drop_path_rate), qkv_bias=qkv_bias, norm_cfg=norm_cfg, - sr_ratio=sr_ratio) + sr_ratio=sr_ratio, + ) # The ret[0] of build_norm_layer is norm name. self.norm2 = build_norm_layer(norm_cfg, embed_dims)[1] @@ -276,9 +267,10 @@ class PVTEncoderLayer(BaseModule): embed_dims=embed_dims, feedforward_channels=feedforward_channels, ffn_drop=drop_rate, - dropout_layer=dict(type='DropPath', drop_prob=drop_path_rate), + dropout_layer=dict(type="DropPath", drop_prob=drop_path_rate), use_conv=use_conv_ffn, - act_cfg=act_cfg) + act_cfg=act_cfg, + ) def forward(self, x, hw_shape): x = self.attn(self.norm1(x), hw_shape, identity=x) @@ -299,7 +291,7 @@ class AbsolutePositionEmbedding(BaseModule): Default: None. """ - def __init__(self, pos_shape, pos_dim, drop_rate=0., init_cfg=None): + def __init__(self, pos_shape, pos_dim, drop_rate=0.0, init_cfg=None): super().__init__(init_cfg=init_cfg) if isinstance(pos_shape, int): @@ -307,20 +299,17 @@ class AbsolutePositionEmbedding(BaseModule): elif isinstance(pos_shape, tuple): if len(pos_shape) == 1: pos_shape = to_2tuple(pos_shape[0]) - assert len(pos_shape) == 2, \ - f'The size of image should have length 1 or 2, ' \ - f'but got {len(pos_shape)}' + assert len(pos_shape) == 2, f"The size of image should have length 1 or 2, " f"but got {len(pos_shape)}" self.pos_shape = pos_shape self.pos_dim = pos_dim - self.pos_embed = nn.Parameter( - torch.zeros(1, pos_shape[0] * pos_shape[1], pos_dim)) + self.pos_embed = nn.Parameter(torch.zeros(1, pos_shape[0] * pos_shape[1], pos_dim)) self.drop = nn.Dropout(p=drop_rate) def init_weights(self): trunc_normal_(self.pos_embed, std=0.02) - def resize_pos_embed(self, pos_embed, input_shape, mode='bilinear'): + def resize_pos_embed(self, pos_embed, input_shape, mode="bilinear"): """Resize pos_embed weights. Resize pos_embed using bilinear interpolate method. 
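The hunk just below reflows `resize_pos_embed`, whose job the docstring states: bilinearly resize the learned position table to the current token grid. A stand-alone sketch of the same reshape, interpolate, flatten sequence; shapes here are invented for illustration, and the real method first slices `pos_embed[:, -pos_h * pos_w:]` so any leading extra tokens are kept out of the resize:

```python
import torch
import torch.nn.functional as F

pos_h, pos_w, dim = 7, 7, 64          # grid the embedding was trained at
pos_embed = torch.zeros(1, pos_h * pos_w, dim)

new_h, new_w = 12, 9                  # token grid of the current input
# [1, H*W, C] -> [1, C, H, W]: treat the table as an image-like grid
weight = pos_embed.reshape(1, pos_h, pos_w, dim).permute(0, 3, 1, 2)
weight = F.interpolate(weight, size=(new_h, new_w), mode="bilinear",
                       align_corners=False)
# [1, C, H', W'] -> [1, H'*W', C]: back to token order
resized = weight.flatten(2).transpose(1, 2).contiguous()
print(resized.shape)  # torch.Size([1, 108, 64]) -- 12 * 9 new positions
```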
@@ -336,20 +325,17 @@ class AbsolutePositionEmbedding(BaseModule): Return: torch.Tensor: The resized pos_embed of shape [B, L_new, C]. """ - assert pos_embed.ndim == 3, 'shape of pos_embed must be [B, L, C]' + assert pos_embed.ndim == 3, "shape of pos_embed must be [B, L, C]" pos_h, pos_w = self.pos_shape - pos_embed_weight = pos_embed[:, (-1 * pos_h * pos_w):] - pos_embed_weight = pos_embed_weight.reshape( - 1, pos_h, pos_w, self.pos_dim).permute(0, 3, 1, 2).contiguous() - pos_embed_weight = F.interpolate( - pos_embed_weight, size=input_shape, mode=mode) - pos_embed_weight = torch.flatten(pos_embed_weight, - 2).transpose(1, 2).contiguous() + pos_embed_weight = pos_embed[:, (-1 * pos_h * pos_w) :] + pos_embed_weight = pos_embed_weight.reshape(1, pos_h, pos_w, self.pos_dim).permute(0, 3, 1, 2).contiguous() + pos_embed_weight = F.interpolate(pos_embed_weight, size=input_shape, mode=mode) + pos_embed_weight = torch.flatten(pos_embed_weight, 2).transpose(1, 2).contiguous() pos_embed = pos_embed_weight return pos_embed - def forward(self, x, hw_shape, mode='bilinear'): + def forward(self, x, hw_shape, mode="bilinear"): pos_embed = self.resize_pos_embed(self.pos_embed, hw_shape, mode) return self.drop(x + pos_embed) @@ -413,34 +399,36 @@ class PyramidVisionTransformer(BaseModule): ]`` """ - def __init__(self, - pretrain_img_size=224, - in_channels=3, - embed_dims=64, - num_stages=4, - num_layers=[3, 4, 6, 3], - num_heads=[1, 2, 5, 8], - patch_sizes=[4, 2, 2, 2], - strides=[4, 2, 2, 2], - paddings=[0, 0, 0, 0], - sr_ratios=[8, 4, 2, 1], - out_indices=(0, 1, 2, 3), - mlp_ratios=[8, 8, 4, 4], - qkv_bias=True, - drop_rate=0., - attn_drop_rate=0., - drop_path_rate=0.1, - use_abs_pos_embed=True, - norm_after_stage=False, - use_conv_ffn=False, - act_cfg=dict(type='GELU'), - norm_cfg=dict(type='LN', eps=1e-6), - convert_weights=True, - init_cfg=[ - dict(type='TruncNormal', std=.02, layer=['Linear']), - dict(type='Constant', val=1, layer=['LayerNorm']), - dict(type='Kaiming', layer=['Conv2d']) - ]): + def __init__( + self, + pretrain_img_size=224, + in_channels=3, + embed_dims=64, + num_stages=4, + num_layers=[3, 4, 6, 3], + num_heads=[1, 2, 5, 8], + patch_sizes=[4, 2, 2, 2], + strides=[4, 2, 2, 2], + paddings=[0, 0, 0, 0], + sr_ratios=[8, 4, 2, 1], + out_indices=(0, 1, 2, 3), + mlp_ratios=[8, 8, 4, 4], + qkv_bias=True, + drop_rate=0.0, + attn_drop_rate=0.0, + drop_path_rate=0.1, + use_abs_pos_embed=True, + norm_after_stage=False, + use_conv_ffn=False, + act_cfg=dict(type="GELU"), + norm_cfg=dict(type="LN", eps=1e-6), + convert_weights=True, + init_cfg=[ + dict(type="TruncNormal", std=0.02, layer=["Linear"]), + dict(type="Constant", val=1, layer=["LayerNorm"]), + dict(type="Kaiming", layer=["Conv2d"]), + ], + ): super().__init__(init_cfg=init_cfg) self.convert_weights = convert_weights @@ -449,9 +437,7 @@ class PyramidVisionTransformer(BaseModule): elif isinstance(pretrain_img_size, tuple): if len(pretrain_img_size) == 1: pretrain_img_size = to_2tuple(pretrain_img_size[0]) - assert len(pretrain_img_size) == 2, \ - f'The size of image should have length 1 or 2, ' \ - f'but got {len(pretrain_img_size)}' + assert len(pretrain_img_size) == 2, f"The size of image should have length 1 or 2, " f"but got {len(pretrain_img_size)}" self.embed_dims = embed_dims @@ -461,17 +447,13 @@ class PyramidVisionTransformer(BaseModule): self.patch_sizes = patch_sizes self.strides = strides self.sr_ratios = sr_ratios - assert num_stages == len(num_layers) == len(num_heads) \ - == len(patch_sizes) == len(strides) == 
len(sr_ratios) + assert num_stages == len(num_layers) == len(num_heads) == len(patch_sizes) == len(strides) == len(sr_ratios) self.out_indices = out_indices assert max(out_indices) < self.num_stages # transformer encoder - dpr = [ - x.item() - for x in torch.linspace(0, drop_path_rate, sum(num_layers)) - ] # stochastic num_layer decay rule + dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(num_layers))] # stochastic num_layer decay rule cur = 0 self.layers = ModuleList() @@ -484,30 +466,32 @@ class PyramidVisionTransformer(BaseModule): stride=strides[i], padding=paddings[i], bias=True, - norm_cfg=norm_cfg) + norm_cfg=norm_cfg, + ) layers = ModuleList() if use_abs_pos_embed: - pos_shape = pretrain_img_size // np.prod(patch_sizes[:i + 1]) - pos_embed = AbsolutePositionEmbedding( - pos_shape=pos_shape, - pos_dim=embed_dims_i, - drop_rate=drop_rate) + pos_shape = pretrain_img_size // np.prod(patch_sizes[: i + 1]) + pos_embed = AbsolutePositionEmbedding(pos_shape=pos_shape, pos_dim=embed_dims_i, drop_rate=drop_rate) layers.append(pos_embed) - layers.extend([ - PVTEncoderLayer( - embed_dims=embed_dims_i, - num_heads=num_heads[i], - feedforward_channels=mlp_ratios[i] * embed_dims_i, - drop_rate=drop_rate, - attn_drop_rate=attn_drop_rate, - drop_path_rate=dpr[cur + idx], - qkv_bias=qkv_bias, - act_cfg=act_cfg, - norm_cfg=norm_cfg, - sr_ratio=sr_ratios[i], - use_conv_ffn=use_conv_ffn) for idx in range(num_layer) - ]) + layers.extend( + [ + PVTEncoderLayer( + embed_dims=embed_dims_i, + num_heads=num_heads[i], + feedforward_channels=mlp_ratios[i] * embed_dims_i, + drop_rate=drop_rate, + attn_drop_rate=attn_drop_rate, + drop_path_rate=dpr[cur + idx], + qkv_bias=qkv_bias, + act_cfg=act_cfg, + norm_cfg=norm_cfg, + sr_ratio=sr_ratios[i], + use_conv_ffn=use_conv_ffn, + ) + for idx in range(num_layer) + ] + ) in_channels = embed_dims_i # The ret[0] of build_norm_layer is norm name. 
if norm_after_stage: @@ -520,13 +504,10 @@ class PyramidVisionTransformer(BaseModule): def init_weights(self): """Initialize the weights in backbone.""" - if (isinstance(self.init_cfg, dict) - and self.init_cfg['type'] == 'Pretrained'): + if isinstance(self.init_cfg, dict) and self.init_cfg["type"] == "Pretrained": logger = get_root_logger() - state_dict = get_state_dict( - self.init_cfg['checkpoint'], map_location='cpu') - logger.warn(f'Load pre-trained model for ' - f'{self.__class__.__name__} from original repo') + state_dict = get_state_dict(self.init_cfg["checkpoint"], map_location="cpu") + logger.warn(f"Load pre-trained model for " f"{self.__class__.__name__} from original repo") if self.convert_weights: # Because pvt backbones are not supported by mmcls, @@ -561,9 +542,5 @@ class PyramidVisionTransformerV2(PyramidVisionTransformer): def __init__(self, **kwargs): super(PyramidVisionTransformerV2, self).__init__( - patch_sizes=[7, 3, 3, 3], - paddings=[3, 1, 1, 1], - use_abs_pos_embed=False, - norm_after_stage=True, - use_conv_ffn=True, - **kwargs) + patch_sizes=[7, 3, 3, 3], paddings=[3, 1, 1, 1], use_abs_pos_embed=False, norm_after_stage=True, use_conv_ffn=True, **kwargs + ) diff --git a/mmpose/models/backbones/regnet.py b/mmpose/models/backbones/regnet.py index 120523e658ecb2b3134eba45508ac47457a87f1d..de838563401efa74f72f48e643e73e899d312877 100644 --- a/mmpose/models/backbones/regnet.py +++ b/mmpose/models/backbones/regnet.py @@ -6,6 +6,7 @@ import torch.nn as nn from mmcv.cnn import build_conv_layer, build_norm_layer from mmpose.registry import MODELS + from .resnet import ResNet from .resnext import Bottleneck @@ -75,77 +76,62 @@ class RegNet(ResNet): (1, 432, 2, 2) (1, 1008, 1, 1) """ + arch_settings = { - 'regnetx_400mf': - dict(w0=24, wa=24.48, wm=2.54, group_w=16, depth=22, bot_mul=1.0), - 'regnetx_800mf': - dict(w0=56, wa=35.73, wm=2.28, group_w=16, depth=16, bot_mul=1.0), - 'regnetx_1.6gf': - dict(w0=80, wa=34.01, wm=2.25, group_w=24, depth=18, bot_mul=1.0), - 'regnetx_3.2gf': - dict(w0=88, wa=26.31, wm=2.25, group_w=48, depth=25, bot_mul=1.0), - 'regnetx_4.0gf': - dict(w0=96, wa=38.65, wm=2.43, group_w=40, depth=23, bot_mul=1.0), - 'regnetx_6.4gf': - dict(w0=184, wa=60.83, wm=2.07, group_w=56, depth=17, bot_mul=1.0), - 'regnetx_8.0gf': - dict(w0=80, wa=49.56, wm=2.88, group_w=120, depth=23, bot_mul=1.0), - 'regnetx_12gf': - dict(w0=168, wa=73.36, wm=2.37, group_w=112, depth=19, bot_mul=1.0), + "regnetx_400mf": dict(w0=24, wa=24.48, wm=2.54, group_w=16, depth=22, bot_mul=1.0), + "regnetx_800mf": dict(w0=56, wa=35.73, wm=2.28, group_w=16, depth=16, bot_mul=1.0), + "regnetx_1.6gf": dict(w0=80, wa=34.01, wm=2.25, group_w=24, depth=18, bot_mul=1.0), + "regnetx_3.2gf": dict(w0=88, wa=26.31, wm=2.25, group_w=48, depth=25, bot_mul=1.0), + "regnetx_4.0gf": dict(w0=96, wa=38.65, wm=2.43, group_w=40, depth=23, bot_mul=1.0), + "regnetx_6.4gf": dict(w0=184, wa=60.83, wm=2.07, group_w=56, depth=17, bot_mul=1.0), + "regnetx_8.0gf": dict(w0=80, wa=49.56, wm=2.88, group_w=120, depth=23, bot_mul=1.0), + "regnetx_12gf": dict(w0=168, wa=73.36, wm=2.37, group_w=112, depth=19, bot_mul=1.0), } - def __init__(self, - arch, - in_channels=3, - stem_channels=32, - base_channels=32, - strides=(2, 2, 2, 2), - dilations=(1, 1, 1, 1), - out_indices=(3, ), - style='pytorch', - deep_stem=False, - avg_down=False, - frozen_stages=-1, - conv_cfg=None, - norm_cfg=dict(type='BN', requires_grad=True), - norm_eval=False, - with_cp=False, - zero_init_residual=True, - init_cfg=[ - dict(type='Kaiming', 
layer=['Conv2d']), - dict( - type='Constant', - val=1, - layer=['_BatchNorm', 'GroupNorm']) - ]): + def __init__( + self, + arch, + in_channels=3, + stem_channels=32, + base_channels=32, + strides=(2, 2, 2, 2), + dilations=(1, 1, 1, 1), + out_indices=(3,), + style="pytorch", + deep_stem=False, + avg_down=False, + frozen_stages=-1, + conv_cfg=None, + norm_cfg=dict(type="BN", requires_grad=True), + norm_eval=False, + with_cp=False, + zero_init_residual=True, + init_cfg=[dict(type="Kaiming", layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super(ResNet, self).__init__(init_cfg=init_cfg) # Generate RegNet parameters first if isinstance(arch, str): - assert arch in self.arch_settings, \ - f'"arch": "{arch}" is not one of the' \ - ' arch_settings' + assert arch in self.arch_settings, f'"arch": "{arch}" is not one of the' " arch_settings" arch = self.arch_settings[arch] elif not isinstance(arch, dict): - raise TypeError('Expect "arch" to be either a string ' - f'or a dict, got {type(arch)}') + raise TypeError('Expect "arch" to be either a string ' f"or a dict, got {type(arch)}") widths, num_stages = self.generate_regnet( - arch['w0'], - arch['wa'], - arch['wm'], - arch['depth'], + arch["w0"], + arch["wa"], + arch["wm"], + arch["depth"], ) # Convert to per stage format stage_widths, stage_blocks = self.get_stages_from_blocks(widths) # Generate group widths and bot muls - group_widths = [arch['group_w'] for _ in range(num_stages)] - self.bottleneck_ratio = [arch['bot_mul'] for _ in range(num_stages)] + group_widths = [arch["group_w"] for _ in range(num_stages)] + self.bottleneck_ratio = [arch["bot_mul"] for _ in range(num_stages)] # Adjust the compatibility of stage_widths and group_widths - stage_widths, group_widths = self.adjust_width_group( - stage_widths, self.bottleneck_ratio, group_widths) + stage_widths, group_widths = self.adjust_width_group(stage_widths, self.bottleneck_ratio, group_widths) # Group params by stage self.stage_widths = stage_widths @@ -163,8 +149,7 @@ class RegNet(ResNet): self.style = style self.deep_stem = deep_stem if self.deep_stem: - raise NotImplementedError( - 'deep_stem has not been implemented for RegNet') + raise NotImplementedError("deep_stem has not been implemented for RegNet") self.avg_down = avg_down self.frozen_stages = frozen_stages self.conv_cfg = conv_cfg @@ -200,9 +185,10 @@ class RegNet(ResNet): norm_cfg=self.norm_cfg, base_channels=self.stage_widths[i], groups=stage_groups, - width_per_group=group_width) + width_per_group=group_width, + ) _in_channels = self.stage_widths[i] - layer_name = f'layer{i + 1}' + layer_name = f"layer{i + 1}" self.add_module(layer_name, res_layer) self.res_layers.append(layer_name) @@ -211,25 +197,13 @@ class RegNet(ResNet): self.feat_dim = stage_widths[-1] def _make_stem_layer(self, in_channels, base_channels): - self.conv1 = build_conv_layer( - self.conv_cfg, - in_channels, - base_channels, - kernel_size=3, - stride=2, - padding=1, - bias=False) - self.norm1_name, norm1 = build_norm_layer( - self.norm_cfg, base_channels, postfix=1) + self.conv1 = build_conv_layer(self.conv_cfg, in_channels, base_channels, kernel_size=3, stride=2, padding=1, bias=False) + self.norm1_name, norm1 = build_norm_layer(self.norm_cfg, base_channels, postfix=1) self.add_module(self.norm1_name, norm1) self.relu = nn.ReLU(inplace=True) @staticmethod - def generate_regnet(initial_width, - width_slope, - width_parameter, - depth, - divisor=8): 
+ def generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8): """Generates per block width from RegNet parameters. Args: @@ -248,8 +222,7 @@ class RegNet(ResNet): assert width_parameter > 1 assert initial_width % divisor == 0 widths_cont = np.arange(depth) * width_slope + initial_width - ks = np.round( - np.log(widths_cont / initial_width) / np.log(width_parameter)) + ks = np.round(np.log(widths_cont / initial_width) / np.log(width_parameter)) widths = initial_width * np.power(width_parameter, ks) widths = np.round(np.divide(widths, divisor)) * divisor num_stages = len(np.unique(widths)) @@ -280,18 +253,10 @@ class RegNet(ResNet): Returns: tuple(list): The adjusted widths and groups of each stage. """ - bottleneck_width = [ - int(w * b) for w, b in zip(widths, bottleneck_ratio) - ] + bottleneck_width = [int(w * b) for w, b in zip(widths, bottleneck_ratio)] groups = [min(g, w_bot) for g, w_bot in zip(groups, bottleneck_width)] - bottleneck_width = [ - self.quantize_float(w_bot, g) - for w_bot, g in zip(bottleneck_width, groups) - ] - widths = [ - int(w_bot / b) - for w_bot, b in zip(bottleneck_width, bottleneck_ratio) - ] + bottleneck_width = [self.quantize_float(w_bot, g) for w_bot, g in zip(bottleneck_width, groups)] + widths = [int(w_bot / b) for w_bot, b in zip(bottleneck_width, bottleneck_ratio)] return widths, groups def get_stages_from_blocks(self, widths): @@ -303,17 +268,9 @@ class RegNet(ResNet): Returns: tuple(list): width and depth of each stage """ - width_diff = [ - width != width_prev - for width, width_prev in zip(widths + [0], [0] + widths) - ] - stage_widths = [ - width for width, diff in zip(widths, width_diff[:-1]) if diff - ] - stage_blocks = np.diff([ - depth for depth, diff in zip(range(len(width_diff)), width_diff) - if diff - ]).tolist() + width_diff = [width != width_prev for width, width_prev in zip(widths + [0], [0] + widths)] + stage_widths = [width for width, diff in zip(widths, width_diff[:-1]) if diff] + stage_blocks = np.diff([depth for depth, diff in zip(range(len(width_diff)), width_diff) if diff]).tolist() return stage_widths, stage_blocks def forward(self, x): diff --git a/mmpose/models/backbones/resnest.py b/mmpose/models/backbones/resnest.py index b5eea8ad7e50c2ab997e2df17316943fcaf3a5fe..6ea2c458328cafd51ea126945aba6f9237b0463a 100644 --- a/mmpose/models/backbones/resnest.py +++ b/mmpose/models/backbones/resnest.py @@ -7,8 +7,8 @@ from mmcv.cnn import build_conv_layer, build_norm_layer from mmengine.model import BaseModule from mmpose.registry import MODELS -from .resnet import Bottleneck as _Bottleneck -from .resnet import ResLayer, ResNetV1d + +from .resnet import Bottleneck as _Bottleneck, ResLayer, ResNetV1d class RSoftmax(nn.Module): @@ -56,19 +56,21 @@ class SplitAttentionConv2d(BaseModule): Default: None """ - def __init__(self, - in_channels, - channels, - kernel_size, - stride=1, - padding=0, - dilation=1, - groups=1, - radix=2, - reduction_factor=4, - conv_cfg=None, - norm_cfg=dict(type='BN'), - init_cfg=None): + def __init__( + self, + in_channels, + channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + radix=2, + reduction_factor=4, + conv_cfg=None, + norm_cfg=dict(type="BN"), + init_cfg=None, + ): super().__init__(init_cfg=init_cfg) inter_channels = max(in_channels * radix // reduction_factor, 32) self.radix = radix @@ -83,18 +85,15 @@ class SplitAttentionConv2d(BaseModule): padding=padding, dilation=dilation, groups=groups * radix, - bias=False) - self.norm0_name, norm0 = build_norm_layer( 
- norm_cfg, channels * radix, postfix=0) + bias=False, + ) + self.norm0_name, norm0 = build_norm_layer(norm_cfg, channels * radix, postfix=0) self.add_module(self.norm0_name, norm0) self.relu = nn.ReLU(inplace=True) - self.fc1 = build_conv_layer( - None, channels, inter_channels, 1, groups=self.groups) - self.norm1_name, norm1 = build_norm_layer( - norm_cfg, inter_channels, postfix=1) + self.fc1 = build_conv_layer(None, channels, inter_channels, 1, groups=self.groups) + self.norm1_name, norm1 = build_norm_layer(norm_cfg, inter_channels, postfix=1) self.add_module(self.norm1_name, norm1) - self.fc2 = build_conv_layer( - None, inter_channels, channels * radix, 1, groups=self.groups) + self.fc2 = build_conv_layer(None, inter_channels, channels * radix, 1, groups=self.groups) self.rsoftmax = RSoftmax(radix, groups) @property @@ -165,16 +164,18 @@ class Bottleneck(_Bottleneck): Default: None """ - def __init__(self, - in_channels, - out_channels, - groups=1, - width_per_group=4, - base_channels=64, - radix=2, - reduction_factor=4, - avg_down_stride=True, - **kwargs): + def __init__( + self, + in_channels, + out_channels, + groups=1, + width_per_group=4, + base_channels=64, + radix=2, + reduction_factor=4, + avg_down_stride=True, + **kwargs, + ): super().__init__(in_channels, out_channels, **kwargs) self.groups = groups @@ -185,23 +186,16 @@ class Bottleneck(_Bottleneck): # groups and width_per_group and the stage it is located in. if groups != 1: assert self.mid_channels % base_channels == 0 - self.mid_channels = ( - groups * width_per_group * self.mid_channels // base_channels) + self.mid_channels = groups * width_per_group * self.mid_channels // base_channels self.avg_down_stride = avg_down_stride and self.conv2_stride > 1 - self.norm1_name, norm1 = build_norm_layer( - self.norm_cfg, self.mid_channels, postfix=1) - self.norm3_name, norm3 = build_norm_layer( - self.norm_cfg, self.out_channels, postfix=3) + self.norm1_name, norm1 = build_norm_layer(self.norm_cfg, self.mid_channels, postfix=1) + self.norm3_name, norm3 = build_norm_layer(self.norm_cfg, self.out_channels, postfix=3) self.conv1 = build_conv_layer( - self.conv_cfg, - self.in_channels, - self.mid_channels, - kernel_size=1, - stride=self.conv1_stride, - bias=False) + self.conv_cfg, self.in_channels, self.mid_channels, kernel_size=1, stride=self.conv1_stride, bias=False + ) self.add_module(self.norm1_name, norm1) self.conv2 = SplitAttentionConv2d( self.mid_channels, @@ -214,18 +208,14 @@ class Bottleneck(_Bottleneck): radix=radix, reduction_factor=reduction_factor, conv_cfg=self.conv_cfg, - norm_cfg=self.norm_cfg) + norm_cfg=self.norm_cfg, + ) delattr(self, self.norm2_name) if self.avg_down_stride: self.avd_layer = nn.AvgPool2d(3, self.conv2_stride, padding=1) - self.conv3 = build_conv_layer( - self.conv_cfg, - self.mid_channels, - self.out_channels, - kernel_size=1, - bias=False) + self.conv3 = build_conv_layer(self.conv_cfg, self.mid_channels, self.out_channels, kernel_size=1, bias=False) self.add_module(self.norm3_name, norm3) def forward(self, x): @@ -324,17 +314,10 @@ class ResNeSt(ResNetV1d): 101: (Bottleneck, (3, 4, 23, 3)), 152: (Bottleneck, (3, 8, 36, 3)), 200: (Bottleneck, (3, 24, 36, 3)), - 269: (Bottleneck, (3, 30, 48, 8)) + 269: (Bottleneck, (3, 30, 48, 8)), } - def __init__(self, - depth, - groups=1, - width_per_group=4, - radix=2, - reduction_factor=4, - avg_down_stride=True, - **kwargs): + def __init__(self, depth, groups=1, width_per_group=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs): self.groups = 
groups self.width_per_group = width_per_group self.radix = radix @@ -350,4 +333,5 @@ class ResNeSt(ResNetV1d): radix=self.radix, reduction_factor=self.reduction_factor, avg_down_stride=self.avg_down_stride, - **kwargs) + **kwargs, + ) diff --git a/mmpose/models/backbones/resnet.py b/mmpose/models/backbones/resnet.py index a04853f60d179ee2450ca199b0a8c28ae893941f..44837c1feb3549873ff2fc44f683320e08b72a3e 100644 --- a/mmpose/models/backbones/resnet.py +++ b/mmpose/models/backbones/resnet.py @@ -8,6 +8,7 @@ from mmengine.model import BaseModule, constant_init from mmengine.utils.dl_utils.parrots_wrapper import _BatchNorm from mmpose.registry import MODELS + from .base_backbone import BaseBackbone @@ -36,18 +37,20 @@ class BasicBlock(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - expansion=1, - stride=1, - dilation=1, - downsample=None, - style='pytorch', - with_cp=False, - conv_cfg=None, - norm_cfg=dict(type='BN'), - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + expansion=1, + stride=1, + dilation=1, + downsample=None, + style="pytorch", + with_cp=False, + conv_cfg=None, + norm_cfg=dict(type="BN"), + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) @@ -64,28 +67,14 @@ class BasicBlock(BaseModule): self.conv_cfg = conv_cfg self.norm_cfg = norm_cfg - self.norm1_name, norm1 = build_norm_layer( - norm_cfg, self.mid_channels, postfix=1) - self.norm2_name, norm2 = build_norm_layer( - norm_cfg, out_channels, postfix=2) + self.norm1_name, norm1 = build_norm_layer(norm_cfg, self.mid_channels, postfix=1) + self.norm2_name, norm2 = build_norm_layer(norm_cfg, out_channels, postfix=2) self.conv1 = build_conv_layer( - conv_cfg, - in_channels, - self.mid_channels, - 3, - stride=stride, - padding=dilation, - dilation=dilation, - bias=False) + conv_cfg, in_channels, self.mid_channels, 3, stride=stride, padding=dilation, dilation=dilation, bias=False + ) self.add_module(self.norm1_name, norm1) - self.conv2 = build_conv_layer( - conv_cfg, - self.mid_channels, - out_channels, - 3, - padding=1, - bias=False) + self.conv2 = build_conv_layer(conv_cfg, self.mid_channels, out_channels, 3, padding=1, bias=False) self.add_module(self.norm2_name, norm2) self.relu = nn.ReLU(inplace=True) @@ -156,22 +145,24 @@ class Bottleneck(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - expansion=4, - stride=1, - dilation=1, - downsample=None, - style='pytorch', - with_cp=False, - conv_cfg=None, - norm_cfg=dict(type='BN'), - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + expansion=4, + stride=1, + dilation=1, + downsample=None, + style="pytorch", + with_cp=False, + conv_cfg=None, + norm_cfg=dict(type="BN"), + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) - assert style in ['pytorch', 'caffe'] + assert style in ["pytorch", "caffe"] self.in_channels = in_channels self.out_channels = out_channels @@ -185,27 +176,18 @@ class Bottleneck(BaseModule): self.conv_cfg = conv_cfg self.norm_cfg = norm_cfg - if self.style == 'pytorch': + if self.style == "pytorch": self.conv1_stride = 1 self.conv2_stride = stride else: self.conv1_stride = stride self.conv2_stride = 1 - self.norm1_name, norm1 = build_norm_layer( - norm_cfg, self.mid_channels, postfix=1) - self.norm2_name, norm2 = build_norm_layer( - norm_cfg, self.mid_channels, postfix=2) - self.norm3_name, norm3 = 
build_norm_layer( - norm_cfg, out_channels, postfix=3) + self.norm1_name, norm1 = build_norm_layer(norm_cfg, self.mid_channels, postfix=1) + self.norm2_name, norm2 = build_norm_layer(norm_cfg, self.mid_channels, postfix=2) + self.norm3_name, norm3 = build_norm_layer(norm_cfg, out_channels, postfix=3) - self.conv1 = build_conv_layer( - conv_cfg, - in_channels, - self.mid_channels, - kernel_size=1, - stride=self.conv1_stride, - bias=False) + self.conv1 = build_conv_layer(conv_cfg, in_channels, self.mid_channels, kernel_size=1, stride=self.conv1_stride, bias=False) self.add_module(self.norm1_name, norm1) self.conv2 = build_conv_layer( conv_cfg, @@ -215,15 +197,11 @@ class Bottleneck(BaseModule): stride=self.conv2_stride, padding=dilation, dilation=dilation, - bias=False) + bias=False, + ) self.add_module(self.norm2_name, norm2) - self.conv3 = build_conv_layer( - conv_cfg, - self.mid_channels, - out_channels, - kernel_size=1, - bias=False) + self.conv3 = build_conv_layer(conv_cfg, self.mid_channels, out_channels, kernel_size=1, bias=False) self.add_module(self.norm3_name, norm3) self.relu = nn.ReLU(inplace=True) @@ -299,16 +277,16 @@ def get_expansion(block, expansion=None): if isinstance(expansion, int): assert expansion > 0 elif expansion is None: - if hasattr(block, 'expansion'): + if hasattr(block, "expansion"): expansion = block.expansion elif issubclass(block, BasicBlock): expansion = 1 elif issubclass(block, Bottleneck): expansion = 4 else: - raise TypeError(f'expansion is not specified for {block.__name__}') + raise TypeError(f"expansion is not specified for {block.__name__}") else: - raise TypeError('expansion must be an integer or None') + raise TypeError("expansion must be an integer or None") return expansion @@ -337,18 +315,20 @@ class ResLayer(nn.Sequential): False for Hourglass, True for ResNet. 
Default: True """ - def __init__(self, - block, - num_blocks, - in_channels, - out_channels, - expansion=None, - stride=1, - avg_down=False, - conv_cfg=None, - norm_cfg=dict(type='BN'), - downsample_first=True, - **kwargs): + def __init__( + self, + block, + num_blocks, + in_channels, + out_channels, + expansion=None, + stride=1, + avg_down=False, + conv_cfg=None, + norm_cfg=dict(type="BN"), + downsample_first=True, + **kwargs, + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) self.block = block @@ -360,22 +340,13 @@ class ResLayer(nn.Sequential): conv_stride = stride if avg_down and stride != 1: conv_stride = 1 - downsample.append( - nn.AvgPool2d( - kernel_size=stride, - stride=stride, - ceil_mode=True, - count_include_pad=False)) - downsample.extend([ - build_conv_layer( - conv_cfg, - in_channels, - out_channels, - kernel_size=1, - stride=conv_stride, - bias=False), - build_norm_layer(norm_cfg, out_channels)[1] - ]) + downsample.append(nn.AvgPool2d(kernel_size=stride, stride=stride, ceil_mode=True, count_include_pad=False)) + downsample.extend( + [ + build_conv_layer(conv_cfg, in_channels, out_channels, kernel_size=1, stride=conv_stride, bias=False), + build_norm_layer(norm_cfg, out_channels)[1], + ] + ) downsample = nn.Sequential(*downsample) layers = [] @@ -389,7 +360,9 @@ class ResLayer(nn.Sequential): downsample=downsample, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - **kwargs)) + **kwargs, + ) + ) in_channels = out_channels for _ in range(1, num_blocks): layers.append( @@ -400,7 +373,9 @@ class ResLayer(nn.Sequential): stride=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - **kwargs)) + **kwargs, + ) + ) else: # downsample_first=False is for HourglassModule for i in range(0, num_blocks - 1): layers.append( @@ -411,7 +386,9 @@ class ResLayer(nn.Sequential): stride=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - **kwargs)) + **kwargs, + ) + ) layers.append( block( in_channels=in_channels, @@ -421,7 +398,9 @@ class ResLayer(nn.Sequential): downsample=downsample, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - **kwargs)) + **kwargs, + ) + ) super().__init__(*layers) @@ -495,40 +474,36 @@ class ResNet(BaseBackbone): 34: (BasicBlock, (3, 4, 6, 3)), 50: (Bottleneck, (3, 4, 6, 3)), 101: (Bottleneck, (3, 4, 23, 3)), - 152: (Bottleneck, (3, 8, 36, 3)) + 152: (Bottleneck, (3, 8, 36, 3)), } - def __init__(self, - depth, - in_channels=3, - stem_channels=64, - base_channels=64, - expansion=None, - num_stages=4, - strides=(1, 2, 2, 2), - dilations=(1, 1, 1, 1), - out_indices=(3, ), - style='pytorch', - deep_stem=False, - avg_down=False, - frozen_stages=-1, - conv_cfg=None, - norm_cfg=dict(type='BN', requires_grad=True), - norm_eval=False, - with_cp=False, - zero_init_residual=True, - init_cfg=[ - dict(type='Kaiming', layer=['Conv2d']), - dict( - type='Constant', - val=1, - layer=['_BatchNorm', 'GroupNorm']) - ]): + def __init__( + self, + depth, + in_channels=3, + stem_channels=64, + base_channels=64, + expansion=None, + num_stages=4, + strides=(1, 2, 2, 2), + dilations=(1, 1, 1, 1), + out_indices=(3,), + style="pytorch", + deep_stem=False, + avg_down=False, + frozen_stages=-1, + conv_cfg=None, + norm_cfg=dict(type="BN", requires_grad=True), + norm_eval=False, + with_cp=False, + zero_init_residual=True, + init_cfg=[dict(type="Kaiming", layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super(ResNet, self).__init__(init_cfg) if depth not in self.arch_settings: - raise KeyError(f'invalid 
depth {depth} for resnet') + raise KeyError(f"invalid depth {depth} for resnet") self.depth = depth self.stem_channels = stem_channels self.base_channels = base_channels @@ -572,10 +547,11 @@ class ResNet(BaseBackbone): avg_down=self.avg_down, with_cp=with_cp, conv_cfg=conv_cfg, - norm_cfg=norm_cfg) + norm_cfg=norm_cfg, + ) _in_channels = _out_channels _out_channels *= 2 - layer_name = f'layer{i + 1}' + layer_name = f"layer{i + 1}" self.add_module(layer_name, res_layer) self.res_layers.append(layer_name) @@ -604,7 +580,8 @@ class ResNet(BaseBackbone): padding=1, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - inplace=True), + inplace=True, + ), ConvModule( stem_channels // 2, stem_channels // 2, @@ -613,7 +590,8 @@ class ResNet(BaseBackbone): padding=1, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - inplace=True), + inplace=True, + ), ConvModule( stem_channels // 2, stem_channels, @@ -622,18 +600,12 @@ class ResNet(BaseBackbone): padding=1, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - inplace=True)) + inplace=True, + ), + ) else: - self.conv1 = build_conv_layer( - self.conv_cfg, - in_channels, - stem_channels, - kernel_size=7, - stride=2, - padding=3, - bias=False) - self.norm1_name, norm1 = build_norm_layer( - self.norm_cfg, stem_channels, postfix=1) + self.conv1 = build_conv_layer(self.conv_cfg, in_channels, stem_channels, kernel_size=7, stride=2, padding=3, bias=False) + self.norm1_name, norm1 = build_norm_layer(self.norm_cfg, stem_channels, postfix=1) self.add_module(self.norm1_name, norm1) self.relu = nn.ReLU(inplace=True) self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) @@ -652,7 +624,7 @@ class ResNet(BaseBackbone): param.requires_grad = False for i in range(1, self.frozen_stages + 1): - m = getattr(self, f'layer{i}') + m = getattr(self, f"layer{i}") m.eval() for param in m.parameters(): param.requires_grad = False @@ -661,8 +633,7 @@ class ResNet(BaseBackbone): """Initialize the weights in backbone.""" super(ResNet, self).init_weights() - if (isinstance(self.init_cfg, dict) - and self.init_cfg['type'] == 'Pretrained'): + if isinstance(self.init_cfg, dict) and self.init_cfg["type"] == "Pretrained": # Suppress zero_init_residual if use pretrained model. return diff --git a/mmpose/models/backbones/resnext.py b/mmpose/models/backbones/resnext.py index 241f83a11449d3e816d4dbb16bd5715cf9ba6e3f..2fa809497b1d2a6f1857057f1503e7a0c3a45e56 100644 --- a/mmpose/models/backbones/resnext.py +++ b/mmpose/models/backbones/resnext.py @@ -2,8 +2,8 @@ from mmcv.cnn import build_conv_layer, build_norm_layer from mmpose.registry import MODELS -from .resnet import Bottleneck as _Bottleneck -from .resnet import ResLayer, ResNet + +from .resnet import Bottleneck as _Bottleneck, ResLayer, ResNet class Bottleneck(_Bottleneck): @@ -31,13 +31,7 @@ class Bottleneck(_Bottleneck): memory while slowing down the training speed. """ - def __init__(self, - in_channels, - out_channels, - base_channels=64, - groups=32, - width_per_group=4, - **kwargs): + def __init__(self, in_channels, out_channels, base_channels=64, groups=32, width_per_group=4, **kwargs): super().__init__(in_channels, out_channels, **kwargs) self.groups = groups self.width_per_group = width_per_group @@ -47,23 +41,15 @@ class Bottleneck(_Bottleneck): # groups and width_per_group and the stage it is located in. 
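The width rescaling in this ResNeXt `Bottleneck.__init__` is the standard grouped-convolution rule. A worked numeric example with the common 32x4d setting (plain Python; the stage widths below are the usual ResNeXt-50 values, used here only for illustration):

```python
# mid_channels = groups * width_per_group * mid_channels // base_channels
base_channels = 64
groups, width_per_group = 32, 4               # the "32x4d" configuration

for out_channels in (256, 512, 1024, 2048):   # ResNeXt-50 stage widths
    mid = out_channels // 4                   # Bottleneck expansion = 4
    mid = groups * width_per_group * mid // base_channels
    print(out_channels, "->", mid)            # 128, 256, 512, 1024
```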
if groups != 1: assert self.mid_channels % base_channels == 0 - self.mid_channels = ( - groups * width_per_group * self.mid_channels // base_channels) + self.mid_channels = groups * width_per_group * self.mid_channels // base_channels - self.norm1_name, norm1 = build_norm_layer( - self.norm_cfg, self.mid_channels, postfix=1) - self.norm2_name, norm2 = build_norm_layer( - self.norm_cfg, self.mid_channels, postfix=2) - self.norm3_name, norm3 = build_norm_layer( - self.norm_cfg, self.out_channels, postfix=3) + self.norm1_name, norm1 = build_norm_layer(self.norm_cfg, self.mid_channels, postfix=1) + self.norm2_name, norm2 = build_norm_layer(self.norm_cfg, self.mid_channels, postfix=2) + self.norm3_name, norm3 = build_norm_layer(self.norm_cfg, self.out_channels, postfix=3) self.conv1 = build_conv_layer( - self.conv_cfg, - self.in_channels, - self.mid_channels, - kernel_size=1, - stride=self.conv1_stride, - bias=False) + self.conv_cfg, self.in_channels, self.mid_channels, kernel_size=1, stride=self.conv1_stride, bias=False + ) self.add_module(self.norm1_name, norm1) self.conv2 = build_conv_layer( self.conv_cfg, @@ -74,15 +60,11 @@ class Bottleneck(_Bottleneck): padding=self.dilation, dilation=self.dilation, groups=groups, - bias=False) + bias=False, + ) self.add_module(self.norm2_name, norm2) - self.conv3 = build_conv_layer( - self.conv_cfg, - self.mid_channels, - self.out_channels, - kernel_size=1, - bias=False) + self.conv3 = build_conv_layer(self.conv_cfg, self.mid_channels, self.out_channels, kernel_size=1, bias=False) self.add_module(self.norm3_name, norm3) @@ -152,11 +134,7 @@ class ResNeXt(ResNet): (1, 2048, 1, 1) """ - arch_settings = { - 50: (Bottleneck, (3, 4, 6, 3)), - 101: (Bottleneck, (3, 4, 23, 3)), - 152: (Bottleneck, (3, 8, 36, 3)) - } + arch_settings = {50: (Bottleneck, (3, 4, 6, 3)), 101: (Bottleneck, (3, 4, 23, 3)), 152: (Bottleneck, (3, 8, 36, 3))} def __init__(self, depth, groups=32, width_per_group=4, **kwargs): self.groups = groups @@ -164,8 +142,4 @@ class ResNeXt(ResNet): super().__init__(depth, **kwargs) def make_res_layer(self, **kwargs): - return ResLayer( - groups=self.groups, - width_per_group=self.width_per_group, - base_channels=self.base_channels, - **kwargs) + return ResLayer(groups=self.groups, width_per_group=self.width_per_group, base_channels=self.base_channels, **kwargs) diff --git a/mmpose/models/backbones/rsn.py b/mmpose/models/backbones/rsn.py index 8267d23d952f9639dff524cfea8e8d111ce19584..8c6781271e2b7666fbbb0d38f1a8207a181f76e8 100644 --- a/mmpose/models/backbones/rsn.py +++ b/mmpose/models/backbones/rsn.py @@ -8,6 +8,7 @@ from mmcv.cnn import ConvModule, MaxPool2d from mmengine.model import BaseModule from mmpose.registry import MODELS + from .base_backbone import BaseBackbone @@ -34,17 +35,19 @@ class RSB(BaseModule): expansion = 1 - def __init__(self, - in_channels, - out_channels, - num_steps=4, - stride=1, - downsample=None, - with_cp=False, - norm_cfg=dict(type='BN'), - expand_times=26, - res_top_channels=64, - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + num_steps=4, + stride=1, + downsample=None, + with_cp=False, + norm_cfg=dict(type="BN"), + expand_times=26, + res_top_channels=64, + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) @@ -65,10 +68,11 @@ class RSB(BaseModule): stride=self.stride, padding=0, norm_cfg=self.norm_cfg, - inplace=False) + inplace=False, + ) for i in range(self.num_steps): for j in range(i + 1): - module_name = 
f'conv_bn_relu2_{i + 1}_{j + 1}' + module_name = f"conv_bn_relu2_{i + 1}_{j + 1}" self.add_module( module_name, ConvModule( @@ -78,7 +82,9 @@ class RSB(BaseModule): stride=1, padding=1, norm_cfg=self.norm_cfg, - inplace=False)) + inplace=False, + ), + ) self.conv_bn3 = ConvModule( self.num_steps * self.branch_channels, self.out_channels * self.expansion, @@ -87,7 +93,8 @@ class RSB(BaseModule): padding=0, act_cfg=None, norm_cfg=self.norm_cfg, - inplace=False) + inplace=False, + ) self.relu = nn.ReLU(inplace=False) def forward(self, x): @@ -108,7 +115,7 @@ class RSB(BaseModule): inputs = outputs[i][j - 1] if i > j: inputs = inputs + outputs[i - 1][j] - module_name = f'conv_bn_relu2_{i + 1}_{j + 1}' + module_name = f"conv_bn_relu2_{i + 1}_{j + 1}" module_i_j = getattr(self, module_name) outputs[i].append(module_i_j(inputs)) @@ -145,16 +152,18 @@ class Downsample_module(BaseModule): Default: None """ - def __init__(self, - block, - num_blocks, - num_steps=4, - num_units=4, - has_skip=False, - norm_cfg=dict(type='BN'), - in_channels=64, - expand_times=26, - init_cfg=None): + def __init__( + self, + block, + num_blocks, + num_steps=4, + num_units=4, + has_skip=False, + norm_cfg=dict(type="BN"), + in_channels=64, + expand_times=26, + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) @@ -165,31 +174,17 @@ class Downsample_module(BaseModule): self.num_units = num_units self.num_steps = num_steps self.norm_cfg = norm_cfg - self.layer1 = self._make_layer( - block, - in_channels, - num_blocks[0], - expand_times=expand_times, - res_top_channels=in_channels) + self.layer1 = self._make_layer(block, in_channels, num_blocks[0], expand_times=expand_times, res_top_channels=in_channels) for i in range(1, num_units): - module_name = f'layer{i + 1}' + module_name = f"layer{i + 1}" self.add_module( module_name, self._make_layer( - block, - in_channels * pow(2, i), - num_blocks[i], - stride=2, - expand_times=expand_times, - res_top_channels=in_channels)) - - def _make_layer(self, - block, - out_channels, - blocks, - stride=1, - expand_times=26, - res_top_channels=64): + block, in_channels * pow(2, i), num_blocks[i], stride=2, expand_times=expand_times, res_top_channels=in_channels + ), + ) + + def _make_layer(self, block, out_channels, blocks, stride=1, expand_times=26, res_top_channels=64): downsample = None if stride != 1 or self.in_channels != out_channels * block.expansion: downsample = ConvModule( @@ -200,7 +195,8 @@ class Downsample_module(BaseModule): padding=0, norm_cfg=self.norm_cfg, act_cfg=None, - inplace=True) + inplace=True, + ) units = list() units.append( @@ -212,23 +208,23 @@ class Downsample_module(BaseModule): downsample=downsample, norm_cfg=self.norm_cfg, expand_times=expand_times, - res_top_channels=res_top_channels)) + res_top_channels=res_top_channels, + ) + ) self.in_channels = out_channels * block.expansion for _ in range(1, blocks): units.append( block( - self.in_channels, - out_channels, - num_steps=self.num_steps, - expand_times=expand_times, - res_top_channels=res_top_channels)) + self.in_channels, out_channels, num_steps=self.num_steps, expand_times=expand_times, res_top_channels=res_top_channels + ) + ) return nn.Sequential(*units) def forward(self, x, skip1, skip2): out = list() for i in range(self.num_units): - module_name = f'layer{i + 1}' + module_name = f"layer{i + 1}" module_i = getattr(self, module_name) x = module_i(x) if self.has_skip: @@ -263,84 +259,53 @@ class Upsample_unit(BaseModule): Default: 
None """ - def __init__(self, - ind, - num_units, - in_channels, - unit_channels=256, - gen_skip=False, - gen_cross_conv=False, - norm_cfg=dict(type='BN'), - out_channels=64, - init_cfg=None): + def __init__( + self, + ind, + num_units, + in_channels, + unit_channels=256, + gen_skip=False, + gen_cross_conv=False, + norm_cfg=dict(type="BN"), + out_channels=64, + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) self.num_units = num_units self.norm_cfg = norm_cfg self.in_skip = ConvModule( - in_channels, - unit_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=self.norm_cfg, - act_cfg=None, - inplace=True) + in_channels, unit_channels, kernel_size=1, stride=1, padding=0, norm_cfg=self.norm_cfg, act_cfg=None, inplace=True + ) self.relu = nn.ReLU(inplace=True) self.ind = ind if self.ind > 0: self.up_conv = ConvModule( - unit_channels, - unit_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=self.norm_cfg, - act_cfg=None, - inplace=True) + unit_channels, unit_channels, kernel_size=1, stride=1, padding=0, norm_cfg=self.norm_cfg, act_cfg=None, inplace=True + ) self.gen_skip = gen_skip if self.gen_skip: - self.out_skip1 = ConvModule( - in_channels, - in_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=self.norm_cfg, - inplace=True) + self.out_skip1 = ConvModule(in_channels, in_channels, kernel_size=1, stride=1, padding=0, norm_cfg=self.norm_cfg, inplace=True) self.out_skip2 = ConvModule( - unit_channels, - in_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=self.norm_cfg, - inplace=True) + unit_channels, in_channels, kernel_size=1, stride=1, padding=0, norm_cfg=self.norm_cfg, inplace=True + ) self.gen_cross_conv = gen_cross_conv if self.ind == num_units - 1 and self.gen_cross_conv: self.cross_conv = ConvModule( - unit_channels, - out_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=self.norm_cfg, - inplace=True) + unit_channels, out_channels, kernel_size=1, stride=1, padding=0, norm_cfg=self.norm_cfg, inplace=True + ) def forward(self, x, up_x): out = self.in_skip(x) if self.ind > 0: - up_x = F.interpolate( - up_x, - size=(x.size(2), x.size(3)), - mode='bilinear', - align_corners=True) + up_x = F.interpolate(up_x, size=(x.size(2), x.size(3)), mode="bilinear", align_corners=True) up_x = self.up_conv(up_x) out = out + up_x out = self.relu(out) @@ -377,14 +342,9 @@ class Upsample_module(BaseModule): Default: None """ - def __init__(self, - unit_channels=256, - num_units=4, - gen_skip=False, - gen_cross_conv=False, - norm_cfg=dict(type='BN'), - out_channels=64, - init_cfg=None): + def __init__( + self, unit_channels=256, num_units=4, gen_skip=False, gen_cross_conv=False, norm_cfg=dict(type="BN"), out_channels=64, init_cfg=None + ): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) @@ -397,7 +357,7 @@ class Upsample_module(BaseModule): self.gen_cross_conv = gen_cross_conv self.norm_cfg = norm_cfg for i in range(num_units): - module_name = f'up{i + 1}' + module_name = f"up{i + 1}" self.add_module( module_name, Upsample_unit( @@ -408,7 +368,9 @@ class Upsample_module(BaseModule): self.gen_skip, self.gen_cross_conv, norm_cfg=self.norm_cfg, - out_channels=64)) + out_channels=64, + ), + ) def forward(self, x): out = list() @@ -416,7 +378,7 @@ class Upsample_module(BaseModule): skip2 = list() cross_conv = None for i in range(self.num_units): - module_i = getattr(self, f'up{i + 1}') + module_i = getattr(self, 
f"up{i + 1}") if i == 0: outi, skip1_i, skip2_i, _ = module_i(x[i], None) elif i == self.num_units - 1: @@ -457,18 +419,20 @@ class Single_stage_RSN(BaseModule): Default: None """ - def __init__(self, - has_skip=False, - gen_skip=False, - gen_cross_conv=False, - unit_channels=256, - num_units=4, - num_steps=4, - num_blocks=[2, 2, 2, 2], - norm_cfg=dict(type='BN'), - in_channels=64, - expand_times=26, - init_cfg=None): + def __init__( + self, + has_skip=False, + gen_skip=False, + gen_cross_conv=False, + unit_channels=256, + num_units=4, + num_steps=4, + num_blocks=[2, 2, 2, 2], + norm_cfg=dict(type="BN"), + in_channels=64, + expand_times=26, + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) num_blocks = cp.deepcopy(num_blocks) @@ -483,11 +447,8 @@ class Single_stage_RSN(BaseModule): self.num_blocks = num_blocks self.norm_cfg = norm_cfg - self.downsample = Downsample_module(RSB, num_blocks, num_steps, - num_units, has_skip, norm_cfg, - in_channels, expand_times) - self.upsample = Upsample_module(unit_channels, num_units, gen_skip, - gen_cross_conv, norm_cfg, in_channels) + self.downsample = Downsample_module(RSB, num_blocks, num_steps, num_units, has_skip, norm_cfg, in_channels, expand_times) + self.upsample = Upsample_module(unit_channels, num_units, gen_skip, gen_cross_conv, norm_cfg, in_channels) def forward(self, x, skip1, skip2): mid = self.downsample(x, skip1, skip2) @@ -507,19 +468,14 @@ class ResNet_top(BaseModule): Default: None """ - def __init__(self, norm_cfg=dict(type='BN'), channels=64, init_cfg=None): + def __init__(self, norm_cfg=dict(type="BN"), channels=64, init_cfg=None): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) self.top = nn.Sequential( - ConvModule( - 3, - channels, - kernel_size=7, - stride=2, - padding=3, - norm_cfg=norm_cfg, - inplace=True), MaxPool2d(kernel_size=3, stride=2, padding=1)) + ConvModule(3, channels, kernel_size=7, stride=2, padding=3, norm_cfg=norm_cfg, inplace=True), + MaxPool2d(kernel_size=3, stride=2, padding=1), + ) def forward(self, img): return self.top(img) @@ -576,23 +532,22 @@ class RSN(BaseBackbone): (1, 256, 128, 128) """ - def __init__(self, - unit_channels=256, - num_stages=4, - num_units=4, - num_blocks=[2, 2, 2, 2], - num_steps=4, - norm_cfg=dict(type='BN'), - res_top_channels=64, - expand_times=26, - init_cfg=[ - dict(type='Kaiming', layer=['Conv2d']), - dict( - type='Constant', - val=1, - layer=['_BatchNorm', 'GroupNorm']), - dict(type='Normal', std=0.01, layer=['Linear']), - ]): + def __init__( + self, + unit_channels=256, + num_stages=4, + num_units=4, + num_blocks=[2, 2, 2, 2], + num_steps=4, + norm_cfg=dict(type="BN"), + res_top_channels=64, + expand_times=26, + init_cfg=[ + dict(type="Kaiming", layer=["Conv2d"]), + dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"]), + dict(type="Normal", std=0.01, layer=["Linear"]), + ], + ): # Protect mutable default arguments norm_cfg = cp.deepcopy(norm_cfg) num_blocks = cp.deepcopy(num_blocks) @@ -622,10 +577,19 @@ class RSN(BaseBackbone): gen_skip = False gen_cross_conv = False self.multi_stage_rsn.append( - Single_stage_RSN(has_skip, gen_skip, gen_cross_conv, - unit_channels, num_units, num_steps, - num_blocks, norm_cfg, res_top_channels, - expand_times)) + Single_stage_RSN( + has_skip, + gen_skip, + gen_cross_conv, + unit_channels, + num_units, + num_steps, + num_blocks, + norm_cfg, + res_top_channels, + expand_times, + ) + ) def forward(self, x): """Model forward function.""" 
diff --git a/mmpose/models/backbones/scnet.py b/mmpose/models/backbones/scnet.py index 5c802d256e711aa70c955ac5bb91d2f7ff724604..75119efb6eafcc674564e309c24a943e3e7586fe 100644 --- a/mmpose/models/backbones/scnet.py +++ b/mmpose/models/backbones/scnet.py @@ -9,6 +9,7 @@ from mmcv.cnn import build_conv_layer, build_norm_layer from mmengine.model import BaseModule from mmpose.registry import MODELS + from .resnet import Bottleneck, ResNet @@ -28,14 +29,7 @@ class SCConv(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - stride, - pooling_r, - conv_cfg=None, - norm_cfg=dict(type='BN', momentum=0.1), - init_cfg=None): + def __init__(self, in_channels, out_channels, stride, pooling_r, conv_cfg=None, norm_cfg=dict(type="BN", momentum=0.1), init_cfg=None): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) @@ -44,36 +38,15 @@ class SCConv(BaseModule): self.k2 = nn.Sequential( nn.AvgPool2d(kernel_size=pooling_r, stride=pooling_r), - build_conv_layer( - conv_cfg, - in_channels, - in_channels, - kernel_size=3, - stride=1, - padding=1, - bias=False), + build_conv_layer(conv_cfg, in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=False), build_norm_layer(norm_cfg, in_channels)[1], ) self.k3 = nn.Sequential( - build_conv_layer( - conv_cfg, - in_channels, - in_channels, - kernel_size=3, - stride=1, - padding=1, - bias=False), + build_conv_layer(conv_cfg, in_channels, in_channels, kernel_size=3, stride=1, padding=1, bias=False), build_norm_layer(norm_cfg, in_channels)[1], ) self.k4 = nn.Sequential( - build_conv_layer( - conv_cfg, - in_channels, - in_channels, - kernel_size=3, - stride=stride, - padding=1, - bias=False), + build_conv_layer(conv_cfg, in_channels, in_channels, kernel_size=3, stride=stride, padding=1, bias=False), build_norm_layer(norm_cfg, out_channels)[1], nn.ReLU(inplace=True), ) @@ -82,9 +55,7 @@ class SCConv(BaseModule): """Forward function.""" identity = x - out = torch.sigmoid( - torch.add(identity, F.interpolate(self.k2(x), - identity.size()[2:]))) + out = torch.sigmoid(torch.add(identity, F.interpolate(self.k2(x), identity.size()[2:]))) out = torch.mul(self.k3(x), out) out = self.k4(out) @@ -105,53 +76,25 @@ class SCBottleneck(Bottleneck): super().__init__(in_channels, out_channels, **kwargs) self.mid_channels = out_channels // self.expansion // 2 - self.norm1_name, norm1 = build_norm_layer( - self.norm_cfg, self.mid_channels, postfix=1) - self.norm2_name, norm2 = build_norm_layer( - self.norm_cfg, self.mid_channels, postfix=2) - self.norm3_name, norm3 = build_norm_layer( - self.norm_cfg, out_channels, postfix=3) - - self.conv1 = build_conv_layer( - self.conv_cfg, - in_channels, - self.mid_channels, - kernel_size=1, - stride=1, - bias=False) + self.norm1_name, norm1 = build_norm_layer(self.norm_cfg, self.mid_channels, postfix=1) + self.norm2_name, norm2 = build_norm_layer(self.norm_cfg, self.mid_channels, postfix=2) + self.norm3_name, norm3 = build_norm_layer(self.norm_cfg, out_channels, postfix=3) + + self.conv1 = build_conv_layer(self.conv_cfg, in_channels, self.mid_channels, kernel_size=1, stride=1, bias=False) self.add_module(self.norm1_name, norm1) self.k1 = nn.Sequential( - build_conv_layer( - self.conv_cfg, - self.mid_channels, - self.mid_channels, - kernel_size=3, - stride=self.stride, - padding=1, - bias=False), + build_conv_layer(self.conv_cfg, self.mid_channels, self.mid_channels, kernel_size=3, stride=self.stride, padding=1, bias=False), 
build_norm_layer(self.norm_cfg, self.mid_channels)[1], - nn.ReLU(inplace=True)) - - self.conv2 = build_conv_layer( - self.conv_cfg, - in_channels, - self.mid_channels, - kernel_size=1, - stride=1, - bias=False) + nn.ReLU(inplace=True), + ) + + self.conv2 = build_conv_layer(self.conv_cfg, in_channels, self.mid_channels, kernel_size=1, stride=1, bias=False) self.add_module(self.norm2_name, norm2) - self.scconv = SCConv(self.mid_channels, self.mid_channels, self.stride, - self.pooling_r, self.conv_cfg, self.norm_cfg) + self.scconv = SCConv(self.mid_channels, self.mid_channels, self.stride, self.pooling_r, self.conv_cfg, self.norm_cfg) - self.conv3 = build_conv_layer( - self.conv_cfg, - self.mid_channels * 2, - out_channels, - kernel_size=1, - stride=1, - bias=False) + self.conv3 = build_conv_layer(self.conv_cfg, self.mid_channels * 2, out_channels, kernel_size=1, stride=1, bias=False) self.add_module(self.norm3_name, norm3) def forward(self, x): @@ -241,12 +184,9 @@ class SCNet(ResNet): (1, 2048, 7, 7) """ - arch_settings = { - 50: (SCBottleneck, [3, 4, 6, 3]), - 101: (SCBottleneck, [3, 4, 23, 3]) - } + arch_settings = {50: (SCBottleneck, [3, 4, 6, 3]), 101: (SCBottleneck, [3, 4, 23, 3])} def __init__(self, depth, **kwargs): if depth not in self.arch_settings: - raise KeyError(f'invalid depth {depth} for SCNet') + raise KeyError(f"invalid depth {depth} for SCNet") super().__init__(depth, **kwargs) diff --git a/mmpose/models/backbones/seresnet.py b/mmpose/models/backbones/seresnet.py index 617a1b72bee737ef0f3fb305e83ce33d8c8a7ea1..5b3875704f96b88821bb170cabb8e4722ef068ba 100644 --- a/mmpose/models/backbones/seresnet.py +++ b/mmpose/models/backbones/seresnet.py @@ -2,6 +2,7 @@ import torch.utils.checkpoint as cp from mmpose.registry import MODELS + from .resnet import Bottleneck, ResLayer, ResNet from .utils.se_layer import SELayer @@ -118,15 +119,11 @@ class SEResNet(ResNet): (1, 2048, 7, 7) """ - arch_settings = { - 50: (SEBottleneck, (3, 4, 6, 3)), - 101: (SEBottleneck, (3, 4, 23, 3)), - 152: (SEBottleneck, (3, 8, 36, 3)) - } + arch_settings = {50: (SEBottleneck, (3, 4, 6, 3)), 101: (SEBottleneck, (3, 4, 23, 3)), 152: (SEBottleneck, (3, 8, 36, 3))} def __init__(self, depth, se_ratio=16, **kwargs): if depth not in self.arch_settings: - raise KeyError(f'invalid depth {depth} for SEResNet') + raise KeyError(f"invalid depth {depth} for SEResNet") self.se_ratio = se_ratio super().__init__(depth, **kwargs) diff --git a/mmpose/models/backbones/seresnext.py b/mmpose/models/backbones/seresnext.py index c1f5a6c8f3fe6b602aceb331781cd119958518b7..6c43b76c8c0a1e7523fb6281abda9be866f22b58 100644 --- a/mmpose/models/backbones/seresnext.py +++ b/mmpose/models/backbones/seresnext.py @@ -2,9 +2,9 @@ from mmcv.cnn import build_conv_layer, build_norm_layer from mmpose.registry import MODELS + from .resnet import ResLayer -from .seresnet import SEBottleneck as _SEBottleneck -from .seresnet import SEResNet +from .seresnet import SEBottleneck as _SEBottleneck, SEResNet class SEBottleneck(_SEBottleneck): @@ -36,14 +36,7 @@ class SEBottleneck(_SEBottleneck): Default: None """ - def __init__(self, - in_channels, - out_channels, - base_channels=64, - groups=32, - width_per_group=4, - se_ratio=16, - **kwargs): + def __init__(self, in_channels, out_channels, base_channels=64, groups=32, width_per_group=4, se_ratio=16, **kwargs): super().__init__(in_channels, out_channels, se_ratio, **kwargs) self.groups = groups self.width_per_group = width_per_group @@ -54,23 +47,15 @@ class SEBottleneck(_SEBottleneck): # groups and 
width_per_group and the stage it is located in. if groups != 1: assert self.mid_channels % base_channels == 0 - self.mid_channels = ( - groups * width_per_group * self.mid_channels // base_channels) + self.mid_channels = groups * width_per_group * self.mid_channels // base_channels - self.norm1_name, norm1 = build_norm_layer( - self.norm_cfg, self.mid_channels, postfix=1) - self.norm2_name, norm2 = build_norm_layer( - self.norm_cfg, self.mid_channels, postfix=2) - self.norm3_name, norm3 = build_norm_layer( - self.norm_cfg, self.out_channels, postfix=3) + self.norm1_name, norm1 = build_norm_layer(self.norm_cfg, self.mid_channels, postfix=1) + self.norm2_name, norm2 = build_norm_layer(self.norm_cfg, self.mid_channels, postfix=2) + self.norm3_name, norm3 = build_norm_layer(self.norm_cfg, self.out_channels, postfix=3) self.conv1 = build_conv_layer( - self.conv_cfg, - self.in_channels, - self.mid_channels, - kernel_size=1, - stride=self.conv1_stride, - bias=False) + self.conv_cfg, self.in_channels, self.mid_channels, kernel_size=1, stride=self.conv1_stride, bias=False + ) self.add_module(self.norm1_name, norm1) self.conv2 = build_conv_layer( self.conv_cfg, @@ -81,15 +66,11 @@ class SEBottleneck(_SEBottleneck): padding=self.dilation, dilation=self.dilation, groups=groups, - bias=False) + bias=False, + ) self.add_module(self.norm2_name, norm2) - self.conv3 = build_conv_layer( - self.conv_cfg, - self.mid_channels, - self.out_channels, - kernel_size=1, - bias=False) + self.conv3 = build_conv_layer(self.conv_cfg, self.mid_channels, self.out_channels, kernel_size=1, bias=False) self.add_module(self.norm3_name, norm3) @@ -160,11 +141,7 @@ class SEResNeXt(SEResNet): (1, 2048, 7, 7) """ - arch_settings = { - 50: (SEBottleneck, (3, 4, 6, 3)), - 101: (SEBottleneck, (3, 4, 23, 3)), - 152: (SEBottleneck, (3, 8, 36, 3)) - } + arch_settings = {50: (SEBottleneck, (3, 4, 6, 3)), 101: (SEBottleneck, (3, 4, 23, 3)), 152: (SEBottleneck, (3, 8, 36, 3))} def __init__(self, depth, groups=32, width_per_group=4, **kwargs): self.groups = groups @@ -172,8 +149,4 @@ class SEResNeXt(SEResNet): super().__init__(depth, **kwargs) def make_res_layer(self, **kwargs): - return ResLayer( - groups=self.groups, - width_per_group=self.width_per_group, - base_channels=self.base_channels, - **kwargs) + return ResLayer(groups=self.groups, width_per_group=self.width_per_group, base_channels=self.base_channels, **kwargs) diff --git a/mmpose/models/backbones/shufflenet_v1.py b/mmpose/models/backbones/shufflenet_v1.py index 17491910e9c1c2ec4eea04ca715dc91293f00cd4..d6e52256deadd1305e09999835b980b4fa33cbc3 100644 --- a/mmpose/models/backbones/shufflenet_v1.py +++ b/mmpose/models/backbones/shufflenet_v1.py @@ -9,6 +9,7 @@ from mmengine.model import BaseModule from torch.nn.modules.batchnorm import _BatchNorm from mmpose.registry import MODELS + from .base_backbone import BaseBackbone from .utils import channel_shuffle, make_divisible @@ -45,17 +46,19 @@ class ShuffleUnit(BaseModule): Tensor: The output tensor. 
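The `channel_shuffle` helper imported at the top of shufflenet_v1.py is the reshape-transpose-reshape that lets information cross group boundaries between grouped 1x1 convolutions. A standalone version consistent with the usual definition (a sketch, not the mmpose util itself):

```python
import torch

def channel_shuffle(x, groups):
    """Interleave channels across groups: (B, C, H, W) -> (B, C, H, W)."""
    b, c, h, w = x.shape
    assert c % groups == 0
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

x = torch.arange(8.0).view(1, 8, 1, 1)
print(channel_shuffle(x, groups=2).flatten().tolist())
# [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0]
```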
""" - def __init__(self, - in_channels, - out_channels, - groups=3, - first_block=True, - combine='add', - conv_cfg=None, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='ReLU'), - with_cp=False, - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + groups=3, + first_block=True, + combine="add", + conv_cfg=None, + norm_cfg=dict(type="BN"), + act_cfg=dict(type="ReLU"), + with_cp=False, + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) act_cfg = copy.deepcopy(act_cfg) @@ -68,20 +71,17 @@ class ShuffleUnit(BaseModule): self.bottleneck_channels = self.out_channels // 4 self.with_cp = with_cp - if self.combine == 'add': + if self.combine == "add": self.depthwise_stride = 1 self._combine_func = self._add - assert in_channels == out_channels, ( - 'in_channels must be equal to out_channels when combine ' - 'is add') - elif self.combine == 'concat': + assert in_channels == out_channels, "in_channels must be equal to out_channels when combine " "is add" + elif self.combine == "concat": self.depthwise_stride = 2 self._combine_func = self._concat self.out_channels -= self.in_channels self.avgpool = nn.AvgPool2d(kernel_size=3, stride=2, padding=1) else: - raise ValueError(f'Cannot combine tensors with {self.combine}. ' - 'Only "add" and "concat" are supported') + raise ValueError(f"Cannot combine tensors with {self.combine}. " 'Only "add" and "concat" are supported') self.first_1x1_groups = 1 if first_block else self.groups self.g_conv_1x1_compress = ConvModule( @@ -91,7 +91,8 @@ class ShuffleUnit(BaseModule): groups=self.first_1x1_groups, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg) + act_cfg=act_cfg, + ) self.depthwise_conv3x3_bn = ConvModule( in_channels=self.bottleneck_channels, @@ -102,7 +103,8 @@ class ShuffleUnit(BaseModule): groups=self.bottleneck_channels, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=None) + act_cfg=None, + ) self.g_conv_1x1_expand = ConvModule( in_channels=self.bottleneck_channels, @@ -111,7 +113,8 @@ class ShuffleUnit(BaseModule): groups=self.groups, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=None) + act_cfg=None, + ) self.act = build_activation_layer(act_cfg) @@ -138,7 +141,7 @@ class ShuffleUnit(BaseModule): out = self.g_conv_1x1_expand(out) - if self.combine == 'concat': + if self.combine == "concat": residual = self.avgpool(residual) out = self.act(out) out = self._combine_func(residual, out) @@ -191,24 +194,22 @@ class ShuffleNetV1(BaseBackbone): ]`` """ - def __init__(self, - groups=3, - widen_factor=1.0, - out_indices=(2, ), - frozen_stages=-1, - conv_cfg=None, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='ReLU'), - norm_eval=False, - with_cp=False, - init_cfg=[ - dict(type='Normal', std=0.01, layer=['Conv2d']), - dict( - type='Constant', - val=1, - bias=0.0001, - layer=['_BatchNorm', 'GroupNorm']) - ]): + def __init__( + self, + groups=3, + widen_factor=1.0, + out_indices=(2,), + frozen_stages=-1, + conv_cfg=None, + norm_cfg=dict(type="BN"), + act_cfg=dict(type="ReLU"), + norm_eval=False, + with_cp=False, + init_cfg=[ + dict(type="Normal", std=0.01, layer=["Conv2d"]), + dict(type="Constant", val=1, bias=0.0001, layer=["_BatchNorm", "GroupNorm"]), + ], + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) act_cfg = copy.deepcopy(act_cfg) @@ -218,12 +219,10 @@ class ShuffleNetV1(BaseBackbone): for index in out_indices: if index not in range(0, 3): - raise ValueError('the item in out_indices must in ' - f'range(0, 3). 
But received {index}') + raise ValueError("the item in out_indices must in " f"range(0, 3). But received {index}") if frozen_stages not in range(-1, 3): - raise ValueError('frozen_stages must be in range(-1, 3). ' - f'But received {frozen_stages}') + raise ValueError("frozen_stages must be in range(-1, 3). " f"But received {frozen_stages}") self.out_indices = out_indices self.frozen_stages = frozen_stages self.conv_cfg = conv_cfg @@ -243,8 +242,7 @@ class ShuffleNetV1(BaseBackbone): elif groups == 8: channels = (384, 768, 1536) else: - raise ValueError(f'{groups} groups is not supported for 1x1 ' - 'Grouped Convolutions') + raise ValueError(f"{groups} groups is not supported for 1x1 " "Grouped Convolutions") channels = [make_divisible(ch * widen_factor, 8) for ch in channels] @@ -258,12 +256,13 @@ class ShuffleNetV1(BaseBackbone): padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg) + act_cfg=act_cfg, + ) self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) self.layers = nn.ModuleList() for i, num_blocks in enumerate(self.stage_blocks): - first_block = (i == 0) + first_block = i == 0 layer = self.make_layer(channels[i], num_blocks, first_block) self.layers.append(layer) @@ -280,12 +279,11 @@ class ShuffleNetV1(BaseBackbone): def init_weights(self, pretrained=None): super(ShuffleNetV1, self).init_weights() - if (isinstance(self.init_cfg, dict) - and self.init_cfg['type'] == 'Pretrained'): + if isinstance(self.init_cfg, dict) and self.init_cfg["type"] == "Pretrained": return for name, m in self.named_modules(): - if isinstance(m, nn.Conv2d) and 'conv1' not in name: + if isinstance(m, nn.Conv2d) and "conv1" not in name: nn.init.normal_(m.weight, mean=0, std=1.0 / m.weight.shape[1]) def make_layer(self, out_channels, num_blocks, first_block=False): @@ -301,7 +299,7 @@ class ShuffleNetV1(BaseBackbone): layers = [] for i in range(num_blocks): first_block = first_block if i == 0 else False - combine_mode = 'concat' if i == 0 else 'add' + combine_mode = "concat" if i == 0 else "add" layers.append( ShuffleUnit( self.in_channels, @@ -312,7 +310,9 @@ class ShuffleNetV1(BaseBackbone): conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, act_cfg=self.act_cfg, - with_cp=self.with_cp)) + with_cp=self.with_cp, + ) + ) self.in_channels = out_channels return nn.Sequential(*layers) diff --git a/mmpose/models/backbones/shufflenet_v2.py b/mmpose/models/backbones/shufflenet_v2.py index 9757841e73bf547fde77cf847a917c46acfb0b00..4df3c0b0674016b355163bee6702ff40b708acdf 100644 --- a/mmpose/models/backbones/shufflenet_v2.py +++ b/mmpose/models/backbones/shufflenet_v2.py @@ -8,6 +8,7 @@ from mmcv.cnn import ConvModule from mmengine.model import BaseModule from mmpose.registry import MODELS + from .base_backbone import BaseBackbone from .utils import channel_shuffle @@ -31,15 +32,17 @@ class InvertedResidual(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - stride=1, - conv_cfg=None, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='ReLU'), - with_cp=False, - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + stride=1, + conv_cfg=None, + norm_cfg=dict(type="BN"), + act_cfg=dict(type="ReLU"), + with_cp=False, + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) act_cfg = copy.deepcopy(act_cfg) @@ -50,14 +53,11 @@ class InvertedResidual(BaseModule): branch_features = out_channels // 2 if self.stride == 1: assert in_channels == branch_features * 2, ( - f'in_channels ({in_channels}) should equal to ' - 
f'branch_features * 2 ({branch_features * 2}) ' - 'when stride is 1') + f"in_channels ({in_channels}) should equal to " f"branch_features * 2 ({branch_features * 2}) " "when stride is 1" + ) if in_channels != branch_features * 2: - assert self.stride != 1, ( - f'stride ({self.stride}) should not equal 1 when ' - f'in_channels != branch_features * 2') + assert self.stride != 1, f"stride ({self.stride}) should not equal 1 when " f"in_channels != branch_features * 2" if self.stride > 1: self.branch1 = nn.Sequential( @@ -70,16 +70,11 @@ class InvertedResidual(BaseModule): groups=in_channels, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=None), + act_cfg=None, + ), ConvModule( - in_channels, - branch_features, - kernel_size=1, - stride=1, - padding=0, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg), + in_channels, branch_features, kernel_size=1, stride=1, padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg + ), ) self.branch2 = nn.Sequential( @@ -91,7 +86,8 @@ class InvertedResidual(BaseModule): padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg), + act_cfg=act_cfg, + ), ConvModule( branch_features, branch_features, @@ -101,16 +97,12 @@ class InvertedResidual(BaseModule): groups=branch_features, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=None), + act_cfg=None, + ), ConvModule( - branch_features, - branch_features, - kernel_size=1, - stride=1, - padding=0, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) + branch_features, branch_features, kernel_size=1, stride=1, padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg + ), + ) def forward(self, x): @@ -167,23 +159,21 @@ class ShuffleNetV2(BaseBackbone): ]`` """ - def __init__(self, - widen_factor=1.0, - out_indices=(3, ), - frozen_stages=-1, - conv_cfg=None, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='ReLU'), - norm_eval=False, - with_cp=False, - init_cfg=[ - dict(type='Normal', std=0.01, layer=['Conv2d']), - dict( - type='Constant', - val=1, - bias=0.0001, - layer=['_BatchNorm', 'GroupNorm']) - ]): + def __init__( + self, + widen_factor=1.0, + out_indices=(3,), + frozen_stages=-1, + conv_cfg=None, + norm_cfg=dict(type="BN"), + act_cfg=dict(type="ReLU"), + norm_eval=False, + with_cp=False, + init_cfg=[ + dict(type="Normal", std=0.01, layer=["Conv2d"]), + dict(type="Constant", val=1, bias=0.0001, layer=["_BatchNorm", "GroupNorm"]), + ], + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) act_cfg = copy.deepcopy(act_cfg) @@ -191,12 +181,10 @@ class ShuffleNetV2(BaseBackbone): self.stage_blocks = [4, 8, 4] for index in out_indices: if index not in range(0, 4): - raise ValueError('the item in out_indices must in ' - f'range(0, 4). But received {index}') + raise ValueError("the item in out_indices must in " f"range(0, 4). But received {index}") if frozen_stages not in range(-1, 4): - raise ValueError('frozen_stages must be in range(-1, 4). ' - f'But received {frozen_stages}') + raise ValueError("frozen_stages must be in range(-1, 4). " f"But received {frozen_stages}") self.out_indices = out_indices self.frozen_stages = frozen_stages self.conv_cfg = conv_cfg @@ -214,8 +202,7 @@ class ShuffleNetV2(BaseBackbone): elif widen_factor == 2.0: channels = [244, 488, 976, 2048] else: - raise ValueError('widen_factor must be in [0.5, 1.0, 1.5, 2.0]. ' - f'But received {widen_factor}') + raise ValueError("widen_factor must be in [0.5, 1.0, 1.5, 2.0]. 
" f"But received {widen_factor}") self.in_channels = 24 self.conv1 = ConvModule( @@ -226,7 +213,8 @@ class ShuffleNetV2(BaseBackbone): padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg) + act_cfg=act_cfg, + ) self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) @@ -243,7 +231,9 @@ class ShuffleNetV2(BaseBackbone): kernel_size=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg)) + act_cfg=act_cfg, + ) + ) def _make_layer(self, out_channels, num_blocks): """Stack blocks to make a layer. @@ -263,7 +253,9 @@ class ShuffleNetV2(BaseBackbone): conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, act_cfg=self.act_cfg, - with_cp=self.with_cp)) + with_cp=self.with_cp, + ) + ) self.in_channels = out_channels return nn.Sequential(*layers) @@ -282,12 +274,11 @@ class ShuffleNetV2(BaseBackbone): def init_weights(self): super(ShuffleNetV2, self).init_weights() - if (isinstance(self.init_cfg, dict) - and self.init_cfg['type'] == 'Pretrained'): + if isinstance(self.init_cfg, dict) and self.init_cfg["type"] == "Pretrained": return for name, m in self.named_modules(): - if isinstance(m, nn.Conv2d) and 'conv1' not in name: + if isinstance(m, nn.Conv2d) and "conv1" not in name: nn.init.normal_(m.weight, mean=0, std=1.0 / m.weight.shape[1]) def forward(self, x): diff --git a/mmpose/models/backbones/swin.py b/mmpose/models/backbones/swin.py index a8f7c972787c19f64eb398615966722c5bdcd533..4648cdcdb23e361acf4008416d8df26636764df9 100644 --- a/mmpose/models/backbones/swin.py +++ b/mmpose/models/backbones/swin.py @@ -14,6 +14,7 @@ from mmengine.utils import to_2tuple from mmpose.registry import MODELS from mmpose.utils import get_root_logger + from ..utils.transformer import PatchEmbed, PatchMerging from .base_backbone import BaseBackbone from .utils import get_state_dict @@ -39,15 +40,9 @@ class WindowMSA(BaseModule): Default: None. """ - def __init__(self, - embed_dims, - num_heads, - window_size, - qkv_bias=True, - qk_scale=None, - attn_drop_rate=0., - proj_drop_rate=0., - init_cfg=None): + def __init__( + self, embed_dims, num_heads, window_size, qkv_bias=True, qk_scale=None, attn_drop_rate=0.0, proj_drop_rate=0.0, init_cfg=None + ): super().__init__(init_cfg=init_cfg) self.embed_dims = embed_dims @@ -58,15 +53,15 @@ class WindowMSA(BaseModule): # define a parameter table of relative position bias self.relative_position_bias_table = nn.Parameter( - torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), - num_heads)) # 2*Wh-1 * 2*Ww-1, nH + torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads) + ) # 2*Wh-1 * 2*Ww-1, nH # About 2x faster than original impl Wh, Ww = self.window_size rel_index_coords = self.double_step_seq(2 * Ww - 1, Wh, 1, Ww) rel_position_index = rel_index_coords + rel_index_coords.T rel_position_index = rel_position_index.flip(1).contiguous() - self.register_buffer('relative_position_index', rel_position_index) + self.register_buffer("relative_position_index", rel_position_index) self.qkv = nn.Linear(embed_dims, embed_dims * 3, bias=qkv_bias) self.attn_drop = nn.Dropout(attn_drop_rate) @@ -87,27 +82,22 @@ class WindowMSA(BaseModule): Wh*Ww, Wh*Ww), value should be between (-inf, 0]. 
""" B, N, C = x.shape - qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, - C // self.num_heads).permute(2, 0, 3, 1, 4) + qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) # make torchscript happy (cannot use tensor as tuple) q, k, v = qkv[0], qkv[1], qkv[2] q = q * self.scale - attn = (q @ k.transpose(-2, -1)) - - relative_position_bias = self.relative_position_bias_table[ - self.relative_position_index.view(-1)].view( - self.window_size[0] * self.window_size[1], - self.window_size[0] * self.window_size[1], - -1) # Wh*Ww,Wh*Ww,nH - relative_position_bias = relative_position_bias.permute( - 2, 0, 1).contiguous() # nH, Wh*Ww, Wh*Ww + attn = q @ k.transpose(-2, -1) + + relative_position_bias = self.relative_position_bias_table[self.relative_position_index.view(-1)].view( + self.window_size[0] * self.window_size[1], self.window_size[0] * self.window_size[1], -1 + ) # Wh*Ww,Wh*Ww,nH + relative_position_bias = relative_position_bias.permute(2, 0, 1).contiguous() # nH, Wh*Ww, Wh*Ww attn = attn + relative_position_bias.unsqueeze(0) if mask is not None: nW = mask.shape[0] - attn = attn.view(B // nW, nW, self.num_heads, N, - N) + mask.unsqueeze(1).unsqueeze(0) + attn = attn.view(B // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0) attn = attn.view(-1, self.num_heads, N, N) attn = self.softmax(attn) @@ -148,17 +138,19 @@ class ShiftWindowMSA(BaseModule): Default: None """ - def __init__(self, - embed_dims, - num_heads, - window_size, - shift_size=0, - qkv_bias=True, - qk_scale=None, - attn_drop_rate=0, - proj_drop_rate=0, - dropout_layer=dict(type='DropPath', drop_prob=0.), - init_cfg=None): + def __init__( + self, + embed_dims, + num_heads, + window_size, + shift_size=0, + qkv_bias=True, + qk_scale=None, + attn_drop_rate=0, + proj_drop_rate=0, + dropout_layer=dict(type="DropPath", drop_prob=0.0), + init_cfg=None, + ): super().__init__(init_cfg=init_cfg) self.window_size = window_size @@ -172,14 +164,15 @@ class ShiftWindowMSA(BaseModule): qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop_rate=attn_drop_rate, - proj_drop_rate=proj_drop_rate) + proj_drop_rate=proj_drop_rate, + ) self.drop = build_dropout(dropout_layer) def forward(self, query, hw_shape): B, L, C = query.shape H, W = hw_shape - assert L == H * W, 'input feature has wrong size' + assert L == H * W, "input feature has wrong size" query = query.view(B, H, W, C) # pad feature maps to multiples of window size @@ -190,19 +183,12 @@ class ShiftWindowMSA(BaseModule): # cyclic shift if self.shift_size > 0: - shifted_query = torch.roll( - query, - shifts=(-self.shift_size, -self.shift_size), - dims=(1, 2)) + shifted_query = torch.roll(query, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2)) # calculate attention mask for SW-MSA img_mask = torch.zeros((1, H_pad, W_pad, 1), device=query.device) - h_slices = (slice(0, -self.window_size), - slice(-self.window_size, - -self.shift_size), slice(-self.shift_size, None)) - w_slices = (slice(0, -self.window_size), - slice(-self.window_size, - -self.shift_size), slice(-self.shift_size, None)) + h_slices = (slice(0, -self.window_size), slice(-self.window_size, -self.shift_size), slice(-self.shift_size, None)) + w_slices = (slice(0, -self.window_size), slice(-self.window_size, -self.shift_size), slice(-self.shift_size, None)) cnt = 0 for h in h_slices: for w in w_slices: @@ -211,12 +197,9 @@ class ShiftWindowMSA(BaseModule): # nW, window_size, window_size, 1 mask_windows = self.window_partition(img_mask) - mask_windows = mask_windows.view( - 
-1, self.window_size * self.window_size) + mask_windows = mask_windows.view(-1, self.window_size * self.window_size) attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2) - attn_mask = attn_mask.masked_fill(attn_mask != 0, - float(-100.0)).masked_fill( - attn_mask == 0, float(0.0)) + attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0)) else: shifted_query = query attn_mask = None @@ -230,17 +213,13 @@ class ShiftWindowMSA(BaseModule): attn_windows = self.w_msa(query_windows, mask=attn_mask) # merge windows - attn_windows = attn_windows.view(-1, self.window_size, - self.window_size, C) + attn_windows = attn_windows.view(-1, self.window_size, self.window_size, C) # B H' W' C shifted_x = self.window_reverse(attn_windows, H_pad, W_pad) # reverse cyclic shift if self.shift_size > 0: - x = torch.roll( - shifted_x, - shifts=(self.shift_size, self.shift_size), - dims=(1, 2)) + x = torch.roll(shifted_x, shifts=(self.shift_size, self.shift_size), dims=(1, 2)) else: x = shifted_x @@ -263,8 +242,7 @@ class ShiftWindowMSA(BaseModule): """ window_size = self.window_size B = int(windows.shape[0] / (H * W / window_size / window_size)) - x = windows.view(B, H // window_size, W // window_size, window_size, - window_size, -1) + x = windows.view(B, H // window_size, W // window_size, window_size, window_size, -1) x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, -1) return x @@ -277,15 +255,14 @@ class ShiftWindowMSA(BaseModule): """ B, H, W, C = x.shape window_size = self.window_size - x = x.view(B, H // window_size, window_size, W // window_size, - window_size, C) + x = x.view(B, H // window_size, window_size, W // window_size, window_size, C) windows = x.permute(0, 1, 3, 2, 4, 5).contiguous() windows = windows.view(-1, window_size, window_size, C) return windows class SwinBlock(BaseModule): - """" + """ " Args: embed_dims (int): The feature dimension. num_heads (int): Parallel attention heads. 
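Further down, `SwinTransformer.__init__` builds its stochastic-depth schedule by slicing one linspace across stages via `dpr[sum(depths[:i]):sum(depths[:i + 1])]`. A quick numeric illustration with the Swin-T layout (printed values rounded):

```python
import torch

depths = (2, 2, 6, 2)                # blocks per stage (Swin-T layout)
drop_path_rate = 0.1
dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]

for i in range(len(depths)):
    stage = dpr[sum(depths[:i]):sum(depths[:i + 1])]
    print(f"stage {i}:", [round(p, 3) for p in stage])
# drop-path probability rises linearly from 0 to 0.1 over all 12 blocks
```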
@@ -309,21 +286,23 @@ class SwinBlock(BaseModule): Default: None """ - def __init__(self, - embed_dims, - num_heads, - feedforward_channels, - window_size=7, - shift=False, - qkv_bias=True, - qk_scale=None, - drop_rate=0., - attn_drop_rate=0., - drop_path_rate=0., - act_cfg=dict(type='GELU'), - norm_cfg=dict(type='LN'), - with_cp=False, - init_cfg=None): + def __init__( + self, + embed_dims, + num_heads, + feedforward_channels, + window_size=7, + shift=False, + qkv_bias=True, + qk_scale=None, + drop_rate=0.0, + attn_drop_rate=0.0, + drop_path_rate=0.0, + act_cfg=dict(type="GELU"), + norm_cfg=dict(type="LN"), + with_cp=False, + init_cfg=None, + ): super(SwinBlock, self).__init__(init_cfg=init_cfg) @@ -339,7 +318,8 @@ class SwinBlock(BaseModule): qk_scale=qk_scale, attn_drop_rate=attn_drop_rate, proj_drop_rate=drop_rate, - dropout_layer=dict(type='DropPath', drop_prob=drop_path_rate)) + dropout_layer=dict(type="DropPath", drop_prob=drop_path_rate), + ) self.norm2 = build_norm_layer(norm_cfg, embed_dims)[1] self.ffn = FFN( @@ -347,10 +327,11 @@ class SwinBlock(BaseModule): feedforward_channels=feedforward_channels, num_fcs=2, ffn_drop=drop_rate, - dropout_layer=dict(type='DropPath', drop_prob=drop_path_rate), + dropout_layer=dict(type="DropPath", drop_prob=drop_path_rate), act_cfg=act_cfg, add_identity=True, - init_cfg=None) + init_cfg=None, + ) def forward(self, x, hw_shape): @@ -404,22 +385,24 @@ class SwinBlockSequence(BaseModule): Default: None """ - def __init__(self, - embed_dims, - num_heads, - feedforward_channels, - depth, - window_size=7, - qkv_bias=True, - qk_scale=None, - drop_rate=0., - attn_drop_rate=0., - drop_path_rate=0., - downsample=None, - act_cfg=dict(type='GELU'), - norm_cfg=dict(type='LN'), - with_cp=False, - init_cfg=None): + def __init__( + self, + embed_dims, + num_heads, + feedforward_channels, + depth, + window_size=7, + qkv_bias=True, + qk_scale=None, + drop_rate=0.0, + attn_drop_rate=0.0, + drop_path_rate=0.0, + downsample=None, + act_cfg=dict(type="GELU"), + norm_cfg=dict(type="LN"), + with_cp=False, + init_cfg=None, + ): super().__init__(init_cfg=init_cfg) if isinstance(drop_path_rate, list): @@ -443,7 +426,8 @@ class SwinBlockSequence(BaseModule): drop_path_rate=drop_path_rates[i], act_cfg=act_cfg, norm_cfg=norm_cfg, - with_cp=with_cp) + with_cp=with_cp, + ) self.blocks.append(block) self.downsample = downsample @@ -461,7 +445,7 @@ class SwinBlockSequence(BaseModule): @MODELS.register_module() class SwinTransformer(BaseBackbone): - """ Swin Transformer + """Swin Transformer A PyTorch implement of : `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows` - https://arxiv.org/abs/2103.14030 @@ -520,33 +504,35 @@ class SwinTransformer(BaseBackbone): ]`` """ - def __init__(self, - pretrain_img_size=224, - in_channels=3, - embed_dims=96, - patch_size=4, - window_size=7, - mlp_ratio=4, - depths=(2, 2, 6, 2), - num_heads=(3, 6, 12, 24), - strides=(4, 2, 2, 2), - out_indices=(0, 1, 2, 3), - qkv_bias=True, - qk_scale=None, - patch_norm=True, - drop_rate=0., - attn_drop_rate=0., - drop_path_rate=0.1, - use_abs_pos_embed=False, - act_cfg=dict(type='GELU'), - norm_cfg=dict(type='LN'), - with_cp=False, - convert_weights=False, - frozen_stages=-1, - init_cfg=[ - dict(type='TruncNormal', std=.02, layer=['Linear']), - dict(type='Constant', val=1, layer=['LayerNorm']), - ]): + def __init__( + self, + pretrain_img_size=224, + in_channels=3, + embed_dims=96, + patch_size=4, + window_size=7, + mlp_ratio=4, + depths=(2, 2, 6, 2), + num_heads=(3, 6, 12, 24), + 
strides=(4, 2, 2, 2), + out_indices=(0, 1, 2, 3), + qkv_bias=True, + qk_scale=None, + patch_norm=True, + drop_rate=0.0, + attn_drop_rate=0.0, + drop_path_rate=0.1, + use_abs_pos_embed=False, + act_cfg=dict(type="GELU"), + norm_cfg=dict(type="LN"), + with_cp=False, + convert_weights=False, + frozen_stages=-1, + init_cfg=[ + dict(type="TruncNormal", std=0.02, layer=["Linear"]), + dict(type="Constant", val=1, layer=["LayerNorm"]), + ], + ): self.convert_weights = convert_weights self.frozen_stages = frozen_stages if isinstance(pretrain_img_size, int): @@ -554,9 +540,7 @@ class SwinTransformer(BaseBackbone): elif isinstance(pretrain_img_size, tuple): if len(pretrain_img_size) == 1: pretrain_img_size = to_2tuple(pretrain_img_size[0]) - assert len(pretrain_img_size) == 2, \ - f'The size of image should have length 1 or 2, ' \ - f'but got {len(pretrain_img_size)}' + assert len(pretrain_img_size) == 2, f"The size of image should have length 1 or 2, " f"but got {len(pretrain_img_size)}" super(SwinTransformer, self).__init__(init_cfg=init_cfg) @@ -564,31 +548,29 @@ class SwinTransformer(BaseBackbone): self.out_indices = out_indices self.use_abs_pos_embed = use_abs_pos_embed - assert strides[0] == patch_size, 'Use non-overlapping patch embed.' + assert strides[0] == patch_size, "Use non-overlapping patch embed." self.patch_embed = PatchEmbed( in_channels=in_channels, embed_dims=embed_dims, - conv_type='Conv2d', + conv_type="Conv2d", kernel_size=patch_size, stride=strides[0], norm_cfg=norm_cfg if patch_norm else None, - init_cfg=None) + init_cfg=None, + ) if self.use_abs_pos_embed: patch_row = pretrain_img_size[0] // patch_size patch_col = pretrain_img_size[1] // patch_size num_patches = patch_row * patch_col - self.absolute_pos_embed = nn.Parameter( - torch.zeros((1, num_patches, embed_dims))) + self.absolute_pos_embed = nn.Parameter(torch.zeros((1, num_patches, embed_dims))) self.drop_after_pos = nn.Dropout(p=drop_rate) # set stochastic depth decay rule total_depth = sum(depths) - dpr = [ - x.item() for x in torch.linspace(0, drop_path_rate, total_depth) - ] + dpr = [x.item() for x in torch.linspace(0, drop_path_rate, total_depth)] self.stages = nn.ModuleList() in_channels = embed_dims @@ -599,7 +581,8 @@ class SwinTransformer(BaseBackbone): out_channels=2 * in_channels, stride=strides[i + 1], norm_cfg=norm_cfg if patch_norm else None, - init_cfg=None) + init_cfg=None, + ) else: downsample = None @@ -613,11 +596,12 @@ class SwinTransformer(BaseBackbone): qk_scale=qk_scale, drop_rate=drop_rate, attn_drop_rate=attn_drop_rate, - drop_path_rate=dpr[sum(depths[:i]):sum(depths[:i + 1])], + drop_path_rate=dpr[sum(depths[:i]) : sum(depths[: i + 1])], downsample=downsample, act_cfg=act_cfg, norm_cfg=norm_cfg, - with_cp=with_cp) + with_cp=with_cp, + ) self.stages.append(stage) if downsample: in_channels = downsample.out_channels @@ -626,7 +610,7 @@ class SwinTransformer(BaseBackbone): # Add a norm layer for each output for i in out_indices: layer = build_norm_layer(norm_cfg, self.num_features[i])[1] - layer_name = f'norm{i}' + layer_name = f"norm{i}" self.add_module(layer_name, layer) def train(self, mode=True): @@ -646,7 +630,7 @@ class SwinTransformer(BaseBackbone): for i in range(1, self.frozen_stages + 1): if (i - 1) in self.out_indices: - norm_layer = getattr(self, f'norm{i-1}') + norm_layer = getattr(self, f"norm{i-1}") norm_layer.eval() for param in norm_layer.parameters(): param.requires_grad = False @@ -663,52 +647,44 @@ class SwinTransformer(BaseBackbone): pretrained (str, optional): Path to 
pre-trained weights. Defaults to None. """ - if (isinstance(self.init_cfg, dict) - and self.init_cfg['type'] == 'Pretrained'): + if isinstance(self.init_cfg, dict) and self.init_cfg["type"] == "Pretrained": # Suppress zero_init_residual if use pretrained model. logger = get_root_logger() - state_dict = get_state_dict( - self.init_cfg['checkpoint'], map_location='cpu') + state_dict = get_state_dict(self.init_cfg["checkpoint"], map_location="cpu") if self.convert_weights: # supported loading weight from original repo state_dict = swin_converter(state_dict) # strip prefix of state_dict - if list(state_dict.keys())[0].startswith('module.'): + if list(state_dict.keys())[0].startswith("module."): state_dict = {k[7:]: v for k, v in state_dict.items()} # reshape absolute position embedding - if state_dict.get('absolute_pos_embed') is not None: - absolute_pos_embed = state_dict['absolute_pos_embed'] + if state_dict.get("absolute_pos_embed") is not None: + absolute_pos_embed = state_dict["absolute_pos_embed"] N1, L, C1 = absolute_pos_embed.size() N2, C2, H, W = self.absolute_pos_embed.size() if N1 != N2 or C1 != C2 or L != H * W: - logger.warning('Error in loading absolute_pos_embed, pass') + logger.warning("Error in loading absolute_pos_embed, pass") else: - state_dict['absolute_pos_embed'] = absolute_pos_embed.view( - N2, H, W, C2).permute(0, 3, 1, 2).contiguous() + state_dict["absolute_pos_embed"] = absolute_pos_embed.view(N2, H, W, C2).permute(0, 3, 1, 2).contiguous() # interpolate position bias table if needed - relative_position_bias_table_keys = [ - k for k in state_dict.keys() - if 'relative_position_bias_table' in k - ] + relative_position_bias_table_keys = [k for k in state_dict.keys() if "relative_position_bias_table" in k] for table_key in relative_position_bias_table_keys: table_pretrained = state_dict[table_key] table_current = self.state_dict()[table_key] L1, nH1 = table_pretrained.size() L2, nH2 = table_current.size() if nH1 != nH2: - logger.warning(f'Error in loading {table_key}, pass') + logger.warning(f"Error in loading {table_key}, pass") elif L1 != L2: S1 = int(L1**0.5) S2 = int(L2**0.5) table_pretrained_resized = F.interpolate( - table_pretrained.permute(1, 0).reshape(1, nH1, S1, S1), - size=(S2, S2), - mode='bicubic') - state_dict[table_key] = table_pretrained_resized.view( - nH2, L2).permute(1, 0).contiguous() + table_pretrained.permute(1, 0).reshape(1, nH1, S1, S1), size=(S2, S2), mode="bicubic" + ) + state_dict[table_key] = table_pretrained_resized.view(nH2, L2).permute(1, 0).contiguous() # load state_dict load_state_dict(self, state_dict, strict=False, logger=logger) @@ -729,11 +705,9 @@ class SwinTransformer(BaseBackbone): for i, stage in enumerate(self.stages): x, hw_shape, out, out_hw_shape = stage(x, hw_shape) if i in self.out_indices: - norm_layer = getattr(self, f'norm{i}') + norm_layer = getattr(self, f"norm{i}") out = norm_layer(out) - out = out.view(-1, *out_hw_shape, - self.num_features[i]).permute(0, 3, 1, - 2).contiguous() + out = out.view(-1, *out_hw_shape, self.num_features[i]).permute(0, 3, 1, 2).contiguous() outs.append(out) return tuple(outs) diff --git a/mmpose/models/backbones/tcn.py b/mmpose/models/backbones/tcn.py index ef49a1ff075288cc7a23f51f47c5b1bcdd383894..2af2d8cee1be6e36616f03f0ff0285195bcc1340 100644 --- a/mmpose/models/backbones/tcn.py +++ b/mmpose/models/backbones/tcn.py @@ -6,6 +6,7 @@ from mmcv.cnn import ConvModule, build_conv_layer from mmengine.model import BaseModule from mmpose.registry import MODELS + from ..utils.regularizations 
import WeightNormClipHook from .base_backbone import BaseBackbone @@ -37,19 +38,21 @@ class BasicTemporalBlock(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - mid_channels=1024, - kernel_size=3, - dilation=3, - dropout=0.25, - causal=False, - residual=True, - use_stride_conv=False, - conv_cfg=dict(type='Conv1d'), - norm_cfg=dict(type='BN1d'), - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + mid_channels=1024, + kernel_size=3, + dilation=3, + dropout=0.25, + causal=False, + residual=True, + use_stride_conv=False, + conv_cfg=dict(type="Conv1d"), + norm_cfg=dict(type="BN1d"), + init_cfg=None, + ): # Protect mutable default arguments conv_cfg = copy.deepcopy(conv_cfg) norm_cfg = copy.deepcopy(norm_cfg) @@ -80,21 +83,15 @@ class BasicTemporalBlock(BaseModule): kernel_size=kernel_size, stride=self.stride, dilation=self.dilation, - bias='auto', - conv_cfg=conv_cfg, - norm_cfg=norm_cfg)) - self.conv2 = nn.Sequential( - ConvModule( - mid_channels, - out_channels, - kernel_size=1, - bias='auto', + bias="auto", conv_cfg=conv_cfg, - norm_cfg=norm_cfg)) + norm_cfg=norm_cfg, + ) + ) + self.conv2 = nn.Sequential(ConvModule(mid_channels, out_channels, kernel_size=1, bias="auto", conv_cfg=conv_cfg, norm_cfg=norm_cfg)) if residual and in_channels != out_channels: - self.short_cut = build_conv_layer(conv_cfg, in_channels, - out_channels, 1) + self.short_cut = build_conv_layer(conv_cfg, in_channels, out_channels, 1) else: self.short_cut = None @@ -105,8 +102,7 @@ class BasicTemporalBlock(BaseModule): if self.use_stride_conv: assert self.causal_shift + self.kernel_size // 2 < x.shape[2] else: - assert 0 <= self.pad + self.causal_shift < x.shape[2] - \ - self.pad + self.causal_shift <= x.shape[2] + assert 0 <= self.pad + self.causal_shift < x.shape[2] - self.pad + self.causal_shift <= x.shape[2] out = self.conv1(x) if self.dropout is not None: @@ -118,12 +114,9 @@ class BasicTemporalBlock(BaseModule): if self.residual: if self.use_stride_conv: - res = x[:, :, self.causal_shift + - self.kernel_size // 2::self.kernel_size] + res = x[:, :, self.causal_shift + self.kernel_size // 2 :: self.kernel_size] else: - res = x[:, :, - (self.pad + self.causal_shift):(x.shape[2] - self.pad + - self.causal_shift)] + res = x[:, :, (self.pad + self.causal_shift) : (x.shape[2] - self.pad + self.causal_shift)] if self.short_cut is not None: res = self.short_cut(res) @@ -192,29 +185,24 @@ class TCN(BaseBackbone): (1, 1024, 217) """ - def __init__(self, - in_channels, - stem_channels=1024, - num_blocks=2, - kernel_sizes=(3, 3, 3), - dropout=0.25, - causal=False, - residual=True, - use_stride_conv=False, - conv_cfg=dict(type='Conv1d'), - norm_cfg=dict(type='BN1d'), - max_norm=None, - init_cfg=[ - dict( - type='Kaiming', - mode='fan_in', - nonlinearity='relu', - layer=['Conv2d']), - dict( - type='Constant', - val=1, - layer=['_BatchNorm', 'GroupNorm']) - ]): + def __init__( + self, + in_channels, + stem_channels=1024, + num_blocks=2, + kernel_sizes=(3, 3, 3), + dropout=0.25, + causal=False, + residual=True, + use_stride_conv=False, + conv_cfg=dict(type="Conv1d"), + norm_cfg=dict(type="BN1d"), + max_norm=None, + init_cfg=[ + dict(type="Kaiming", mode="fan_in", nonlinearity="relu", layer=["Conv2d"]), + dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"]), + ], + ): # Protect mutable default arguments conv_cfg = copy.deepcopy(conv_cfg) norm_cfg = copy.deepcopy(norm_cfg) @@ -231,16 +219,17 @@ class TCN(BaseBackbone): assert num_blocks == len(kernel_sizes) - 1 for ks 
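The residual slicing in `BasicTemporalBlock.forward` above (the `use_stride_conv=False` branch) compensates for the frames consumed by the unpadded dilated convolution: the identity branch is center-cropped by `pad` frames per side, shifted by `causal_shift` in causal mode. A minimal sketch, assuming the usual `pad = (kernel_size - 1) * dilation // 2` from the surrounding class:

```python
import torch

kernel_size, dilation = 3, 3
pad = (kernel_size - 1) * dilation // 2   # 3 frames per side (assumed formula)
causal_shift = 0                          # non-causal case

x = torch.randn(1, 16, 27)                # (N, C, T)
# Same expression as `res = x[:, :, (self.pad + self.causal_shift) : ...]`.
res = x[:, :, (pad + causal_shift):(x.shape[2] - pad + causal_shift)]
print(res.shape)                          # torch.Size([1, 16, 21])
# An unpadded conv with this kernel/dilation also maps T=27 -> 21,
# so the two branches line up for the residual sum.
```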
in kernel_sizes: - assert ks % 2 == 1, 'Only odd filter widths are supported.' + assert ks % 2 == 1, "Only odd filter widths are supported." self.expand_conv = ConvModule( in_channels, stem_channels, kernel_size=kernel_sizes[0], stride=kernel_sizes[0] if use_stride_conv else 1, - bias='auto', + bias="auto", conv_cfg=conv_cfg, - norm_cfg=norm_cfg) + norm_cfg=norm_cfg, + ) dilation = kernel_sizes[0] self.tcn_blocks = nn.ModuleList() @@ -257,7 +246,9 @@ class TCN(BaseBackbone): residual=residual, use_stride_conv=use_stride_conv, conv_cfg=conv_cfg, - norm_cfg=norm_cfg)) + norm_cfg=norm_cfg, + ) + ) dilation *= kernel_sizes[i] if self.max_norm is not None: diff --git a/mmpose/models/backbones/utils/__init__.py b/mmpose/models/backbones/utils/__init__.py index 07e42f89126c9e5663123794f92987b4f9b347f1..4923a45c8e42957e5a3cf54fd1895af5e855afb9 100644 --- a/mmpose/models/backbones/utils/__init__.py +++ b/mmpose/models/backbones/utils/__init__.py @@ -5,7 +5,4 @@ from .make_divisible import make_divisible from .se_layer import SELayer from .utils import get_state_dict, load_checkpoint -__all__ = [ - 'channel_shuffle', 'make_divisible', 'InvertedResidual', 'SELayer', - 'load_checkpoint', 'get_state_dict' -] +__all__ = ["channel_shuffle", "make_divisible", "InvertedResidual", "SELayer", "load_checkpoint", "get_state_dict"] diff --git a/mmpose/models/backbones/utils/channel_shuffle.py b/mmpose/models/backbones/utils/channel_shuffle.py index aedd826bee690d42d92ed8a7f538b221e5b069e2..73f3a1b878c6d72c62e2ccb8a7b9d528888cefcf 100644 --- a/mmpose/models/backbones/utils/channel_shuffle.py +++ b/mmpose/models/backbones/utils/channel_shuffle.py @@ -18,8 +18,7 @@ def channel_shuffle(x, groups): """ batch_size, num_channels, height, width = x.size() - assert (num_channels % groups == 0), ('num_channels should be ' - 'divisible by groups') + assert num_channels % groups == 0, "num_channels should be " "divisible by groups" channels_per_group = num_channels // groups x = x.view(batch_size, groups, channels_per_group, height, width) diff --git a/mmpose/models/backbones/utils/ckpt_convert.py b/mmpose/models/backbones/utils/ckpt_convert.py index 14a43892c6630be31e915ed1f8b9164ba250e8bd..7b6dff7e7f1ab86417bce9ea1a259d5237ccccaf 100644 --- a/mmpose/models/backbones/utils/ckpt_convert.py +++ b/mmpose/models/backbones/utils/ckpt_convert.py @@ -14,8 +14,7 @@ def swin_converter(ckpt): def correct_unfold_reduction_order(x): out_channel, in_channel = x.shape x = x.reshape(out_channel, 4, in_channel // 4) - x = x[:, [0, 2, 1, 3], :].transpose(1, - 2).reshape(out_channel, in_channel) + x = x[:, [0, 2, 1, 3], :].transpose(1, 2).reshape(out_channel, in_channel) return x def correct_unfold_norm_order(x): @@ -25,38 +24,38 @@ def swin_converter(ckpt): return x for k, v in ckpt.items(): - if k.startswith('head'): + if k.startswith("head"): continue - elif k.startswith('layers'): + elif k.startswith("layers"): new_v = v - if 'attn.' in k: - new_k = k.replace('attn.', 'attn.w_msa.') - elif 'mlp.' in k: - if 'mlp.fc1.' in k: - new_k = k.replace('mlp.fc1.', 'ffn.layers.0.0.') - elif 'mlp.fc2.' in k: - new_k = k.replace('mlp.fc2.', 'ffn.layers.1.') + if "attn." in k: + new_k = k.replace("attn.", "attn.w_msa.") + elif "mlp." in k: + if "mlp.fc1." in k: + new_k = k.replace("mlp.fc1.", "ffn.layers.0.0.") + elif "mlp.fc2." in k: + new_k = k.replace("mlp.fc2.", "ffn.layers.1.") else: - new_k = k.replace('mlp.', 'ffn.') - elif 'downsample' in k: + new_k = k.replace("mlp.", "ffn.") + elif "downsample" in k: new_k = k - if 'reduction.' 
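The odd-kernel assertion and the `dilation *= kernel_sizes[i]` schedule above give the TCN a receptive field equal to the product of the kernel sizes: the expand conv sees `kernel_sizes[0]` frames, and each block widens the view by `(ks - 1) * dilation`. A small sketch of that arithmetic:

```python
def tcn_receptive_field(kernel_sizes):
    """Frames seen by one output step under the dilation schedule above."""
    rf, dilation = 1, 1
    for ks in kernel_sizes:
        assert ks % 2 == 1, "Only odd filter widths are supported."
        rf += (ks - 1) * dilation
        dilation *= ks
    return rf

print(tcn_receptive_field((3, 3, 3)))        # 27 == 3 * 3 * 3
print(tcn_receptive_field((3, 3, 3, 3, 3)))  # 243 == 3 ** 5
```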
in k: + if "reduction." in k: new_v = correct_unfold_reduction_order(v) - elif 'norm.' in k: + elif "norm." in k: new_v = correct_unfold_norm_order(v) else: new_k = k - new_k = new_k.replace('layers', 'stages', 1) - elif k.startswith('patch_embed'): + new_k = new_k.replace("layers", "stages", 1) + elif k.startswith("patch_embed"): new_v = v - if 'proj' in k: - new_k = k.replace('proj', 'projection') + if "proj" in k: + new_k = k.replace("proj", "projection") else: new_k = k else: new_v = v new_k = k - new_ckpt['backbone.' + new_k] = new_v + new_ckpt["backbone." + new_k] = new_v return new_ckpt diff --git a/mmpose/models/backbones/utils/inverted_residual.py b/mmpose/models/backbones/utils/inverted_residual.py index dff762c570550e4a738ae1833a4c82c18777115d..aea28097b40cfcd95a99241e3d054f6ed198bb8e 100644 --- a/mmpose/models/backbones/utils/inverted_residual.py +++ b/mmpose/models/backbones/utils/inverted_residual.py @@ -38,24 +38,26 @@ class InvertedResidual(nn.Module): Tensor: The output tensor. """ - def __init__(self, - in_channels, - out_channels, - mid_channels, - kernel_size=3, - groups=None, - stride=1, - se_cfg=None, - with_expand_conv=True, - conv_cfg=None, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='ReLU'), - with_cp=False): + def __init__( + self, + in_channels, + out_channels, + mid_channels, + kernel_size=3, + groups=None, + stride=1, + se_cfg=None, + with_expand_conv=True, + conv_cfg=None, + norm_cfg=dict(type="BN"), + act_cfg=dict(type="ReLU"), + with_cp=False, + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) act_cfg = copy.deepcopy(act_cfg) super().__init__() - self.with_res_shortcut = (stride == 1 and in_channels == out_channels) + self.with_res_shortcut = stride == 1 and in_channels == out_channels assert stride in [1, 2] self.with_cp = with_cp self.with_se = se_cfg is not None @@ -78,7 +80,8 @@ class InvertedResidual(nn.Module): padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg) + act_cfg=act_cfg, + ) self.depthwise_conv = ConvModule( in_channels=mid_channels, out_channels=mid_channels, @@ -88,7 +91,8 @@ class InvertedResidual(nn.Module): groups=groups, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg) + act_cfg=act_cfg, + ) if self.with_se: self.se = SELayer(**se_cfg) self.linear_conv = ConvModule( @@ -99,7 +103,8 @@ class InvertedResidual(nn.Module): padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=None) + act_cfg=None, + ) def forward(self, x): diff --git a/mmpose/models/backbones/utils/se_layer.py b/mmpose/models/backbones/utils/se_layer.py index ec6d7aeaa9a990dbaf437b4ff4f4ba685e008245..94b744b9494b555c29480d0ca7ae5af6dd593ee4 100644 --- a/mmpose/models/backbones/utils/se_layer.py +++ b/mmpose/models/backbones/utils/se_layer.py @@ -21,11 +21,7 @@ class SELayer(nn.Module): Default: (dict(type='ReLU'), dict(type='Sigmoid')) """ - def __init__(self, - channels, - ratio=16, - conv_cfg=None, - act_cfg=(dict(type='ReLU'), dict(type='Sigmoid'))): + def __init__(self, channels, ratio=16, conv_cfg=None, act_cfg=(dict(type="ReLU"), dict(type="Sigmoid"))): super().__init__() if isinstance(act_cfg, dict): act_cfg = (act_cfg, act_cfg) @@ -33,19 +29,11 @@ class SELayer(nn.Module): assert mmengine.is_tuple_of(act_cfg, dict) self.global_avgpool = nn.AdaptiveAvgPool2d(1) self.conv1 = ConvModule( - in_channels=channels, - out_channels=int(channels / ratio), - kernel_size=1, - stride=1, - conv_cfg=conv_cfg, - act_cfg=act_cfg[0]) + in_channels=channels, out_channels=int(channels / ratio), kernel_size=1, stride=1, 
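The renames performed by `swin_converter` above are mechanical but easy to get wrong; here is a minimal sketch covering the common cases (attention, the two MLP linears, and the `layers` to `stages` rename plus the `backbone.` prefix). The value reshuffling for `downsample` weights is omitted.

```python
def convert_key(k):
    # Common cases only; downsample value reordering is not sketched.
    if "attn." in k:
        k = k.replace("attn.", "attn.w_msa.")
    elif "mlp.fc1." in k:
        k = k.replace("mlp.fc1.", "ffn.layers.0.0.")
    elif "mlp.fc2." in k:
        k = k.replace("mlp.fc2.", "ffn.layers.1.")
    return "backbone." + k.replace("layers", "stages", 1)

print(convert_key("layers.0.blocks.0.attn.qkv.weight"))
# backbone.stages.0.blocks.0.attn.w_msa.qkv.weight
print(convert_key("layers.0.blocks.0.mlp.fc1.weight"))
# backbone.stages.0.blocks.0.ffn.layers.0.0.weight
```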
conv_cfg=conv_cfg, act_cfg=act_cfg[0] + ) self.conv2 = ConvModule( - in_channels=int(channels / ratio), - out_channels=channels, - kernel_size=1, - stride=1, - conv_cfg=conv_cfg, - act_cfg=act_cfg[1]) + in_channels=int(channels / ratio), out_channels=channels, kernel_size=1, stride=1, conv_cfg=conv_cfg, act_cfg=act_cfg[1] + ) def forward(self, x): out = self.global_avgpool(x) diff --git a/mmpose/models/backbones/utils/utils.py b/mmpose/models/backbones/utils/utils.py index ebc4fe40cd481391edf73872e2d4f6eb35592779..9e0767cdca5acb39f16f2e13cd50f0f8b6d2ced2 100644 --- a/mmpose/models/backbones/utils/utils.py +++ b/mmpose/models/backbones/utils/utils.py @@ -4,11 +4,7 @@ from collections import OrderedDict from mmengine.runner import CheckpointLoader, load_state_dict -def load_checkpoint(model, - filename, - map_location='cpu', - strict=False, - logger=None): +def load_checkpoint(model, filename, map_location="cpu", strict=False, logger=None): """Load checkpoint from a file or URI. Args: @@ -26,24 +22,23 @@ def load_checkpoint(model, checkpoint = CheckpointLoader.load_checkpoint(filename, map_location) # OrderedDict is a subclass of dict if not isinstance(checkpoint, dict): - raise RuntimeError( - f'No state_dict found in checkpoint file {filename}') + raise RuntimeError(f"No state_dict found in checkpoint file {filename}") # get state_dict from checkpoint - if 'state_dict' in checkpoint: - state_dict_tmp = checkpoint['state_dict'] - elif 'model' in checkpoint: - state_dict_tmp = checkpoint['model'] + if "state_dict" in checkpoint: + state_dict_tmp = checkpoint["state_dict"] + elif "model" in checkpoint: + state_dict_tmp = checkpoint["model"] else: state_dict_tmp = checkpoint state_dict = OrderedDict() # strip prefix of state_dict for k, v in state_dict_tmp.items(): - if k.startswith('module.backbone.'): + if k.startswith("module.backbone."): state_dict[k[16:]] = v - elif k.startswith('module.'): + elif k.startswith("module."): state_dict[k[7:]] = v - elif k.startswith('backbone.'): + elif k.startswith("backbone."): state_dict[k[9:]] = v else: state_dict[k] = v @@ -52,7 +47,7 @@ def load_checkpoint(model, return checkpoint -def get_state_dict(filename, map_location='cpu'): +def get_state_dict(filename, map_location="cpu"): """Get state_dict from a file or URI. 
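The prefix stripping in `load_checkpoint` above handles checkpoints saved from `DataParallel` wrappers (`module.`), full detectors (`backbone.`), or both at once. A self-contained sketch of just that loop:

```python
from collections import OrderedDict

def strip_prefixes(state_dict_tmp):
    state_dict = OrderedDict()
    for k, v in state_dict_tmp.items():
        if k.startswith("module.backbone."):
            state_dict[k[16:]] = v      # len("module.backbone.") == 16
        elif k.startswith("module."):
            state_dict[k[7:]] = v
        elif k.startswith("backbone."):
            state_dict[k[9:]] = v
        else:
            state_dict[k] = v
    return state_dict

print(list(strip_prefixes({
    "module.backbone.conv1.weight": 0,
    "backbone.bn1.bias": 1,
    "fc.weight": 2,
})))
# ['conv1.weight', 'bn1.bias', 'fc.weight']
```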
Args: @@ -66,22 +61,21 @@ def get_state_dict(filename, map_location='cpu'): checkpoint = CheckpointLoader.load_checkpoint(filename, map_location) # OrderedDict is a subclass of dict if not isinstance(checkpoint, dict): - raise RuntimeError( - f'No state_dict found in checkpoint file {filename}') + raise RuntimeError(f"No state_dict found in checkpoint file {filename}") # get state_dict from checkpoint - if 'state_dict' in checkpoint: - state_dict_tmp = checkpoint['state_dict'] + if "state_dict" in checkpoint: + state_dict_tmp = checkpoint["state_dict"] else: state_dict_tmp = checkpoint state_dict = OrderedDict() # strip prefix of state_dict for k, v in state_dict_tmp.items(): - if k.startswith('module.backbone.'): + if k.startswith("module.backbone."): state_dict[k[16:]] = v - elif k.startswith('module.'): + elif k.startswith("module."): state_dict[k[7:]] = v - elif k.startswith('backbone.'): + elif k.startswith("backbone."): state_dict[k[9:]] = v else: state_dict[k] = v diff --git a/mmpose/models/backbones/v2v_net.py b/mmpose/models/backbones/v2v_net.py index 2cd1ab93b105b345aabc0ace2c7e776cd99e36a9..bdd1adb98ce7cef4e35ee34ab203ea577b185169 100644 --- a/mmpose/models/backbones/v2v_net.py +++ b/mmpose/models/backbones/v2v_net.py @@ -11,6 +11,7 @@ from mmcv.cnn import ConvModule from mmengine.model import BaseModule from mmpose.registry import MODELS + from .base_backbone import BaseBackbone @@ -29,13 +30,7 @@ class Basic3DBlock(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - kernel_size, - conv_cfg=dict(type='Conv3d'), - norm_cfg=dict(type='BN3d'), - init_cfg=None): + def __init__(self, in_channels, out_channels, kernel_size, conv_cfg=dict(type="Conv3d"), norm_cfg=dict(type="BN3d"), init_cfg=None): super(Basic3DBlock, self).__init__(init_cfg=init_cfg) self.block = ConvModule( in_channels, @@ -45,7 +40,8 @@ class Basic3DBlock(BaseModule): padding=((kernel_size - 1) // 2), conv_cfg=conv_cfg, norm_cfg=norm_cfg, - bias=True) + bias=True, + ) def forward(self, x): """Forward function.""" @@ -68,13 +64,7 @@ class Res3DBlock(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - kernel_size=3, - conv_cfg=dict(type='Conv3d'), - norm_cfg=dict(type='BN3d'), - init_cfg=None): + def __init__(self, in_channels, out_channels, kernel_size=3, conv_cfg=dict(type="Conv3d"), norm_cfg=dict(type="BN3d"), init_cfg=None): super(Res3DBlock, self).__init__(init_cfg=init_cfg) self.res_branch = nn.Sequential( ConvModule( @@ -85,7 +75,8 @@ class Res3DBlock(BaseModule): padding=((kernel_size - 1) // 2), conv_cfg=conv_cfg, norm_cfg=norm_cfg, - bias=True), + bias=True, + ), ConvModule( out_channels, out_channels, @@ -95,21 +86,16 @@ class Res3DBlock(BaseModule): conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=None, - bias=True)) + bias=True, + ), + ) if in_channels == out_channels: self.skip_con = nn.Sequential() else: self.skip_con = ConvModule( - in_channels, - out_channels, - 1, - stride=1, - padding=0, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=None, - bias=True) + in_channels, out_channels, 1, stride=1, padding=0, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=None, bias=True + ) def forward(self, x): """Forward function.""" @@ -131,8 +117,7 @@ class Pool3DBlock(BaseModule): def forward(self, x): """Forward function.""" - return F.max_pool3d( - x, kernel_size=self.pool_size, stride=self.pool_size) + return F.max_pool3d(x, kernel_size=self.pool_size, stride=self.pool_size) class Upsample3DBlock(BaseModule): @@ -149,23 +134,15 @@ class 
Upsample3DBlock(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - kernel_size=2, - stride=2, - init_cfg=None): + def __init__(self, in_channels, out_channels, kernel_size=2, stride=2, init_cfg=None): super(Upsample3DBlock, self).__init__(init_cfg=init_cfg) assert kernel_size == 2 assert stride == 2 self.block = nn.Sequential( - nn.ConvTranspose3d( - in_channels, - out_channels, - kernel_size=kernel_size, - stride=stride, - padding=0, - output_padding=0), nn.BatchNorm3d(out_channels), nn.ReLU(True)) + nn.ConvTranspose3d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=0, output_padding=0), + nn.BatchNorm3d(out_channels), + nn.ReLU(True), + ) def forward(self, x): """Forward function.""" @@ -192,11 +169,9 @@ class EncoderDecorder(BaseModule): self.mid_res = Res3DBlock(in_channels * 4, in_channels * 4) self.decoder_res2 = Res3DBlock(in_channels * 4, in_channels * 4) - self.decoder_upsample2 = Upsample3DBlock(in_channels * 4, - in_channels * 2, 2, 2) + self.decoder_upsample2 = Upsample3DBlock(in_channels * 4, in_channels * 2, 2, 2) self.decoder_res1 = Res3DBlock(in_channels * 2, in_channels * 2) - self.decoder_upsample1 = Upsample3DBlock(in_channels * 2, in_channels, - 2, 2) + self.decoder_upsample1 = Upsample3DBlock(in_channels * 2, in_channels, 2, 2) self.skip_res1 = Res3DBlock(in_channels, in_channels) self.skip_res2 = Res3DBlock(in_channels * 2, in_channels * 2) @@ -246,14 +221,9 @@ class V2VNet(BaseBackbone): )`` """ - def __init__(self, - input_channels, - output_channels, - mid_channels=32, - init_cfg=dict( - type='Normal', - std=0.001, - layer=['Conv3d', 'ConvTranspose3d'])): + def __init__( + self, input_channels, output_channels, mid_channels=32, init_cfg=dict(type="Normal", std=0.001, layer=["Conv3d", "ConvTranspose3d"]) + ): super(V2VNet, self).__init__(init_cfg=init_cfg) self.front_layers = nn.Sequential( @@ -263,8 +233,7 @@ class V2VNet(BaseBackbone): self.encoder_decoder = EncoderDecorder(in_channels=mid_channels) - self.output_layer = nn.Conv3d( - mid_channels, output_channels, kernel_size=1, stride=1, padding=0) + self.output_layer = nn.Conv3d(mid_channels, output_channels, kernel_size=1, stride=1, padding=0) def forward(self, x): """Forward function.""" @@ -272,4 +241,4 @@ class V2VNet(BaseBackbone): x = self.encoder_decoder(x) x = self.output_layer(x) - return (x, ) + return (x,) diff --git a/mmpose/models/backbones/vgg.py b/mmpose/models/backbones/vgg.py index 8fa09d8dc7ded75678e8e23846474acee763a532..95a6705658f7561d6456e8a07a8e30e6cafa6632 100644 --- a/mmpose/models/backbones/vgg.py +++ b/mmpose/models/backbones/vgg.py @@ -4,18 +4,21 @@ from mmcv.cnn import ConvModule from mmengine.utils.dl_utils.parrots_wrapper import _BatchNorm from mmpose.registry import MODELS + from .base_backbone import BaseBackbone -def make_vgg_layer(in_channels, - out_channels, - num_blocks, - conv_cfg=None, - norm_cfg=None, - act_cfg=dict(type='ReLU'), - dilation=1, - with_norm=False, - ceil_mode=False): +def make_vgg_layer( + in_channels, + out_channels, + num_blocks, + conv_cfg=None, + norm_cfg=None, + act_cfg=dict(type="ReLU"), + dilation=1, + with_norm=False, + ceil_mode=False, +): layers = [] for _ in range(num_blocks): layer = ConvModule( @@ -27,7 +30,8 @@ def make_vgg_layer(in_channels, bias=True, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg) + act_cfg=act_cfg, + ) layers.append(layer) in_channels = out_channels layers.append(nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=ceil_mode)) @@ -80,37 +84,31 @@ class 
VGG(BaseBackbone): # each stage. For example, VGG11 contains 11 layers with learnable # parameters. 11 is computed as 11 = (1 + 1 + 2 + 2 + 2) + 3, # where 3 indicates the last three fully-connected layers. - arch_settings = { - 11: (1, 1, 2, 2, 2), - 13: (2, 2, 2, 2, 2), - 16: (2, 2, 3, 3, 3), - 19: (2, 2, 4, 4, 4) - } - - def __init__(self, - depth, - num_classes=-1, - num_stages=5, - dilations=(1, 1, 1, 1, 1), - out_indices=None, - frozen_stages=-1, - conv_cfg=None, - norm_cfg=None, - act_cfg=dict(type='ReLU'), - norm_eval=False, - ceil_mode=False, - with_last_pool=True, - init_cfg=[ - dict(type='Kaiming', layer=['Conv2d']), - dict( - type='Constant', - val=1, - layer=['_BatchNorm', 'GroupNorm']), - dict(type='Normal', std=0.01, layer=['Linear']), - ]): + arch_settings = {11: (1, 1, 2, 2, 2), 13: (2, 2, 2, 2, 2), 16: (2, 2, 3, 3, 3), 19: (2, 2, 4, 4, 4)} + + def __init__( + self, + depth, + num_classes=-1, + num_stages=5, + dilations=(1, 1, 1, 1, 1), + out_indices=None, + frozen_stages=-1, + conv_cfg=None, + norm_cfg=None, + act_cfg=dict(type="ReLU"), + norm_eval=False, + ceil_mode=False, + with_last_pool=True, + init_cfg=[ + dict(type="Kaiming", layer=["Conv2d"]), + dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"]), + dict(type="Normal", std=0.01, layer=["Linear"]), + ], + ): super().__init__(init_cfg=init_cfg) if depth not in self.arch_settings: - raise KeyError(f'invalid depth {depth} for vgg') + raise KeyError(f"invalid depth {depth} for vgg") assert num_stages >= 1 and num_stages <= 5 stage_blocks = self.arch_settings[depth] self.stage_blocks = stage_blocks[:num_stages] @@ -122,7 +120,7 @@ class VGG(BaseBackbone): with_norm = norm_cfg is not None if out_indices is None: - out_indices = (5, ) if num_classes > 0 else (4, ) + out_indices = (5,) if num_classes > 0 else (4,) assert max(out_indices) <= num_stages self.out_indices = out_indices @@ -144,7 +142,8 @@ class VGG(BaseBackbone): act_cfg=act_cfg, dilation=dilation, with_norm=with_norm, - ceil_mode=ceil_mode) + ceil_mode=ceil_mode, + ) vgg_layers.extend(vgg_layer) self.in_channels = out_channels self.range_sub_modules.append([start_idx, end_idx]) @@ -152,7 +151,7 @@ class VGG(BaseBackbone): if not with_last_pool: vgg_layers.pop(-1) self.range_sub_modules[-1][1] -= 1 - self.module_name = 'features' + self.module_name = "features" self.add_module(self.module_name, nn.Sequential(*vgg_layers)) if self.num_classes > 0: diff --git a/mmpose/models/backbones/vipnas_mbv3.py b/mmpose/models/backbones/vipnas_mbv3.py index 9156cafa56d4f15766e48c77cd492e52345aed65..e9c248cefa0975b2208e3f4a20fab8cf470cf093 100644 --- a/mmpose/models/backbones/vipnas_mbv3.py +++ b/mmpose/models/backbones/vipnas_mbv3.py @@ -5,6 +5,7 @@ from mmcv.cnn import ConvModule from torch.nn.modules.batchnorm import _BatchNorm from mmpose.registry import MODELS + from .base_backbone import BaseBackbone from .utils import InvertedResidual @@ -58,16 +59,13 @@ class ViPNAS_MobileNetV3(BaseBackbone): group=[None, 8, 120, 20, 100, 280, 240], att=[None, True, True, False, True, True, True], stride=[2, 1, 2, 2, 2, 1, 2], - act=['HSwish', 'ReLU', 'ReLU', 'ReLU', 'HSwish', 'HSwish', 'HSwish'], + act=["HSwish", "ReLU", "ReLU", "ReLU", "HSwish", "HSwish", "HSwish"], conv_cfg=None, - norm_cfg=dict(type='BN'), + norm_cfg=dict(type="BN"), frozen_stages=-1, norm_eval=False, with_cp=False, - init_cfg=[ - dict(type='Normal', std=0.001, layer=['Conv2d']), - dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']) - ], + init_cfg=[dict(type="Normal", std=0.001, 
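The layer-count comment above can be checked directly: each named VGG depth is the sum of per-stage conv counts plus the three fully-connected layers.

```python
arch_settings = {
    11: (1, 1, 2, 2, 2),
    13: (2, 2, 2, 2, 2),
    16: (2, 2, 3, 3, 3),
    19: (2, 2, 4, 4, 4),
}
for depth, stage_blocks in arch_settings.items():
    assert depth == sum(stage_blocks) + 3
    print(f"VGG{depth}: {' + '.join(map(str, stage_blocks))} convs + 3 FC")
```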
layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) @@ -94,7 +92,8 @@ class ViPNAS_MobileNetV3(BaseBackbone): padding=self.ks[0] // 2, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=dict(type=self.act[0])) + act_cfg=dict(type=self.act[0]), + ) self.layers = self._make_layer() @@ -105,11 +104,7 @@ class ViPNAS_MobileNetV3(BaseBackbone): mid_channels = self.wid[i + 1] * self.expan[i + 1] if self.att[i + 1]: - se_cfg = dict( - channels=mid_channels, - ratio=4, - act_cfg=(dict(type='ReLU'), - dict(type='HSigmoid', bias=1.0, divisor=2.0))) + se_cfg = dict(channels=mid_channels, ratio=4, act_cfg=(dict(type="ReLU"), dict(type="HSigmoid", bias=1.0, divisor=2.0))) else: se_cfg = None @@ -138,9 +133,10 @@ class ViPNAS_MobileNetV3(BaseBackbone): conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, act_cfg=dict(type=self.act[i + 1]), - with_cp=self.with_cp) + with_cp=self.with_cp, + ) layer_index += 1 - layer_name = f'layer{layer_index}' + layer_name = f"layer{layer_index}" self.add_module(layer_name, layer) layers.append(layer_name) return layers @@ -152,14 +148,14 @@ class ViPNAS_MobileNetV3(BaseBackbone): layer = getattr(self, layer_name) x = layer(x) - return (x, ) + return (x,) def _freeze_stages(self): if self.frozen_stages >= 0: for param in self.conv1.parameters(): param.requires_grad = False for i in range(1, self.frozen_stages + 1): - layer = getattr(self, f'layer{i}') + layer = getattr(self, f"layer{i}") layer.eval() for param in layer.parameters(): param.requires_grad = False diff --git a/mmpose/models/backbones/vipnas_resnet.py b/mmpose/models/backbones/vipnas_resnet.py index 7be810b449c1a840c425c69e3d1d1340583e52ea..2df530000cdb0e8d936c5c2c776b8ff9890cd333 100644 --- a/mmpose/models/backbones/vipnas_resnet.py +++ b/mmpose/models/backbones/vipnas_resnet.py @@ -9,6 +9,7 @@ from mmengine.model import BaseModule, Sequential from mmengine.utils.dl_utils.parrots_wrapper import _BatchNorm from mmpose.registry import MODELS + from .base_backbone import BaseBackbone @@ -41,25 +42,27 @@ class ViPNAS_Bottleneck(BaseModule): Default: None """ - def __init__(self, - in_channels, - out_channels, - expansion=4, - stride=1, - dilation=1, - downsample=None, - style='pytorch', - with_cp=False, - conv_cfg=None, - norm_cfg=dict(type='BN'), - kernel_size=3, - groups=1, - attention=False, - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + expansion=4, + stride=1, + dilation=1, + downsample=None, + style="pytorch", + with_cp=False, + conv_cfg=None, + norm_cfg=dict(type="BN"), + kernel_size=3, + groups=1, + attention=False, + init_cfg=None, + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) - assert style in ['pytorch', 'caffe'] + assert style in ["pytorch", "caffe"] self.in_channels = in_channels self.out_channels = out_channels @@ -73,27 +76,18 @@ class ViPNAS_Bottleneck(BaseModule): self.conv_cfg = conv_cfg self.norm_cfg = norm_cfg - if self.style == 'pytorch': + if self.style == "pytorch": self.conv1_stride = 1 self.conv2_stride = stride else: self.conv1_stride = stride self.conv2_stride = 1 - self.norm1_name, norm1 = build_norm_layer( - norm_cfg, self.mid_channels, postfix=1) - self.norm2_name, norm2 = build_norm_layer( - norm_cfg, self.mid_channels, postfix=2) - self.norm3_name, norm3 = build_norm_layer( - norm_cfg, out_channels, postfix=3) + self.norm1_name, norm1 = build_norm_layer(norm_cfg, self.mid_channels, 
postfix=1) + self.norm2_name, norm2 = build_norm_layer(norm_cfg, self.mid_channels, postfix=2) + self.norm3_name, norm3 = build_norm_layer(norm_cfg, out_channels, postfix=3) - self.conv1 = build_conv_layer( - conv_cfg, - in_channels, - self.mid_channels, - kernel_size=1, - stride=self.conv1_stride, - bias=False) + self.conv1 = build_conv_layer(conv_cfg, in_channels, self.mid_channels, kernel_size=1, stride=self.conv1_stride, bias=False) self.add_module(self.norm1_name, norm1) self.conv2 = build_conv_layer( conv_cfg, @@ -104,20 +98,15 @@ class ViPNAS_Bottleneck(BaseModule): padding=kernel_size // 2, groups=groups, dilation=dilation, - bias=False) + bias=False, + ) self.add_module(self.norm2_name, norm2) - self.conv3 = build_conv_layer( - conv_cfg, - self.mid_channels, - out_channels, - kernel_size=1, - bias=False) + self.conv3 = build_conv_layer(conv_cfg, self.mid_channels, out_channels, kernel_size=1, bias=False) self.add_module(self.norm3_name, norm3) if attention: - self.attention = ContextBlock(out_channels, - max(1.0 / 16, 16.0 / out_channels)) + self.attention = ContextBlock(out_channels, max(1.0 / 16, 16.0 / out_channels)) else: self.attention = None @@ -197,14 +186,14 @@ def get_expansion(block, expansion=None): if isinstance(expansion, int): assert expansion > 0 elif expansion is None: - if hasattr(block, 'expansion'): + if hasattr(block, "expansion"): expansion = block.expansion elif issubclass(block, ViPNAS_Bottleneck): expansion = 1 else: - raise TypeError(f'expansion is not specified for {block.__name__}') + raise TypeError(f"expansion is not specified for {block.__name__}") else: - raise TypeError('expansion must be an integer or None') + raise TypeError("expansion must be an integer or None") return expansion @@ -241,22 +230,24 @@ class ViPNAS_ResLayer(Sequential): Default: None """ - def __init__(self, - block, - num_blocks, - in_channels, - out_channels, - expansion=None, - stride=1, - avg_down=False, - conv_cfg=None, - norm_cfg=dict(type='BN'), - downsample_first=True, - kernel_size=3, - groups=1, - attention=False, - init_cfg=None, - **kwargs): + def __init__( + self, + block, + num_blocks, + in_channels, + out_channels, + expansion=None, + stride=1, + avg_down=False, + conv_cfg=None, + norm_cfg=dict(type="BN"), + downsample_first=True, + kernel_size=3, + groups=1, + attention=False, + init_cfg=None, + **kwargs, + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) self.block = block @@ -268,22 +259,13 @@ class ViPNAS_ResLayer(Sequential): conv_stride = stride if avg_down and stride != 1: conv_stride = 1 - downsample.append( - nn.AvgPool2d( - kernel_size=stride, - stride=stride, - ceil_mode=True, - count_include_pad=False)) - downsample.extend([ - build_conv_layer( - conv_cfg, - in_channels, - out_channels, - kernel_size=1, - stride=conv_stride, - bias=False), - build_norm_layer(norm_cfg, out_channels)[1] - ]) + downsample.append(nn.AvgPool2d(kernel_size=stride, stride=stride, ceil_mode=True, count_include_pad=False)) + downsample.extend( + [ + build_conv_layer(conv_cfg, in_channels, out_channels, kernel_size=1, stride=conv_stride, bias=False), + build_norm_layer(norm_cfg, out_channels)[1], + ] + ) downsample = nn.Sequential(*downsample) layers = [] @@ -300,7 +282,9 @@ class ViPNAS_ResLayer(Sequential): kernel_size=kernel_size, groups=groups, attention=attention, - **kwargs)) + **kwargs, + ) + ) in_channels = out_channels for _ in range(1, num_blocks): layers.append( @@ -314,7 +298,9 @@ class ViPNAS_ResLayer(Sequential): kernel_size=kernel_size, 
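The `avg_down` branch in `ViPNAS_ResLayer` above swaps the usual strided 1x1 shortcut for average pooling followed by a stride-1 projection, so no activations are skipped by the stride. A minimal standalone sketch with illustrative channel sizes:

```python
import torch
import torch.nn as nn

stride, in_c, out_c = 2, 64, 128   # illustrative
downsample = nn.Sequential(
    nn.AvgPool2d(kernel_size=stride, stride=stride, ceil_mode=True,
                 count_include_pad=False),
    nn.Conv2d(in_c, out_c, kernel_size=1, stride=1, bias=False),  # conv_stride=1
    nn.BatchNorm2d(out_c),
)
print(downsample(torch.randn(1, in_c, 56, 56)).shape)
# torch.Size([1, 128, 28, 28])
```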
groups=groups, attention=attention, - **kwargs)) + **kwargs, + ) + ) else: # downsample_first=False is for HourglassModule for i in range(0, num_blocks - 1): layers.append( @@ -328,7 +314,9 @@ class ViPNAS_ResLayer(Sequential): kernel_size=kernel_size, groups=groups, attention=attention, - **kwargs)) + **kwargs, + ) + ) layers.append( block( in_channels=in_channels, @@ -341,7 +329,9 @@ class ViPNAS_ResLayer(Sequential): kernel_size=kernel_size, groups=groups, attention=attention, - **kwargs)) + **kwargs, + ) + ) super().__init__(*layers, init_cfg=init_cfg) @@ -405,40 +395,36 @@ class ViPNAS_ResNet(BaseBackbone): 50: ViPNAS_Bottleneck, } - def __init__(self, - depth, - in_channels=3, - num_stages=4, - strides=(1, 2, 2, 2), - dilations=(1, 1, 1, 1), - out_indices=(3, ), - style='pytorch', - deep_stem=False, - avg_down=False, - frozen_stages=-1, - conv_cfg=None, - norm_cfg=dict(type='BN', requires_grad=True), - norm_eval=False, - with_cp=False, - zero_init_residual=True, - wid=[48, 80, 160, 304, 608], - expan=[None, 1, 1, 1, 1], - dep=[None, 4, 6, 7, 3], - ks=[7, 3, 5, 5, 5], - group=[None, 16, 16, 16, 16], - att=[None, True, False, True, True], - init_cfg=[ - dict(type='Normal', std=0.001, layer=['Conv2d']), - dict( - type='Constant', - val=1, - layer=['_BatchNorm', 'GroupNorm']) - ]): + def __init__( + self, + depth, + in_channels=3, + num_stages=4, + strides=(1, 2, 2, 2), + dilations=(1, 1, 1, 1), + out_indices=(3,), + style="pytorch", + deep_stem=False, + avg_down=False, + frozen_stages=-1, + conv_cfg=None, + norm_cfg=dict(type="BN", requires_grad=True), + norm_eval=False, + with_cp=False, + zero_init_residual=True, + wid=[48, 80, 160, 304, 608], + expan=[None, 1, 1, 1, 1], + dep=[None, 4, 6, 7, 3], + ks=[7, 3, 5, 5, 5], + group=[None, 16, 16, 16, 16], + att=[None, True, False, True, True], + init_cfg=[dict(type="Normal", std=0.001, layer=["Conv2d"]), dict(type="Constant", val=1, layer=["_BatchNorm", "GroupNorm"])], + ): # Protect mutable default arguments norm_cfg = copy.deepcopy(norm_cfg) super().__init__(init_cfg=init_cfg) if depth not in self.arch_settings: - raise KeyError(f'invalid depth {depth} for resnet') + raise KeyError(f"invalid depth {depth} for resnet") self.depth = depth self.stem_channels = dep[0] self.num_stages = num_stages @@ -458,7 +444,7 @@ class ViPNAS_ResNet(BaseBackbone): self.norm_eval = norm_eval self.zero_init_residual = zero_init_residual self.block = self.arch_settings[depth] - self.stage_blocks = dep[1:1 + num_stages] + self.stage_blocks = dep[1 : 1 + num_stages] self._make_stem_layer(in_channels, wid[0], ks[0]) @@ -484,9 +470,10 @@ class ViPNAS_ResNet(BaseBackbone): norm_cfg=norm_cfg, kernel_size=ks[i + 1], groups=group[i + 1], - attention=att[i + 1]) + attention=att[i + 1], + ) _in_channels = _out_channels - layer_name = f'layer{i + 1}' + layer_name = f"layer{i + 1}" self.add_module(layer_name, res_layer) self.res_layers.append(layer_name) @@ -515,7 +502,8 @@ class ViPNAS_ResNet(BaseBackbone): padding=1, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - inplace=True), + inplace=True, + ), ConvModule( stem_channels // 2, stem_channels // 2, @@ -524,7 +512,8 @@ class ViPNAS_ResNet(BaseBackbone): padding=1, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - inplace=True), + inplace=True, + ), ConvModule( stem_channels // 2, stem_channels, @@ -533,18 +522,14 @@ class ViPNAS_ResNet(BaseBackbone): padding=1, conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, - inplace=True)) + inplace=True, + ), + ) else: self.conv1 = build_conv_layer( - self.conv_cfg, - in_channels, 
- stem_channels, - kernel_size=kernel_size, - stride=2, - padding=kernel_size // 2, - bias=False) - self.norm1_name, norm1 = build_norm_layer( - self.norm_cfg, stem_channels, postfix=1) + self.conv_cfg, in_channels, stem_channels, kernel_size=kernel_size, stride=2, padding=kernel_size // 2, bias=False + ) + self.norm1_name, norm1 = build_norm_layer(self.norm_cfg, stem_channels, postfix=1) self.add_module(self.norm1_name, norm1) self.relu = nn.ReLU(inplace=True) self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) @@ -563,7 +548,7 @@ class ViPNAS_ResNet(BaseBackbone): param.requires_grad = False for i in range(1, self.frozen_stages + 1): - m = getattr(self, f'layer{i}') + m = getattr(self, f"layer{i}") m.eval() for param in m.parameters(): param.requires_grad = False diff --git a/mmpose/models/builder.py b/mmpose/models/builder.py index cefaedc29100bcbc4c5b9cde55db8f66b25ab637..4dd1faf3326f98e975747fea97a5b899d0d99c4c 100644 --- a/mmpose/models/builder.py +++ b/mmpose/models/builder.py @@ -37,7 +37,5 @@ def build_pose_estimator(cfg): def build_posenet(cfg): """Build posenet.""" - warnings.warn( - '``build_posenet`` will be deprecated soon, ' - 'please use ``build_pose_estimator`` instead.', DeprecationWarning) + warnings.warn("``build_posenet`` will be deprecated soon, " "please use ``build_pose_estimator`` instead.", DeprecationWarning) return build_pose_estimator(cfg) diff --git a/mmpose/models/data_preprocessors/__init__.py b/mmpose/models/data_preprocessors/__init__.py index 89980f1f6e8538f81faa10a933028eec923b30b0..ba6456dddfa1254b4f37731596ab7ef585ba8246 100644 --- a/mmpose/models/data_preprocessors/__init__.py +++ b/mmpose/models/data_preprocessors/__init__.py @@ -3,6 +3,6 @@ from .batch_augmentation import BatchSyncRandomResize from .data_preprocessor import PoseDataPreprocessor __all__ = [ - 'PoseDataPreprocessor', - 'BatchSyncRandomResize', + "PoseDataPreprocessor", + "BatchSyncRandomResize", ] diff --git a/mmpose/models/data_preprocessors/batch_augmentation.py b/mmpose/models/data_preprocessors/batch_augmentation.py index e4dcd568e53b5d9cd6f6b2e2fd8f716c44bf3c7d..5574e12f1d994b99456de5eb61a4d30ee022e59b 100644 --- a/mmpose/models/data_preprocessors/batch_augmentation.py +++ b/mmpose/models/data_preprocessors/batch_augmentation.py @@ -27,20 +27,15 @@ class BatchSyncRandomResize(nn.Module): Defaults to 32. 
""" - def __init__(self, - random_size_range: Tuple[int, int], - interval: int = 10, - size_divisor: int = 32) -> None: + def __init__(self, random_size_range: Tuple[int, int], interval: int = 10, size_divisor: int = 32) -> None: super().__init__() self.rank, self.world_size = get_dist_info() self._input_size = None - self._random_size_range = (round(random_size_range[0] / size_divisor), - round(random_size_range[1] / size_divisor)) + self._random_size_range = (round(random_size_range[0] / size_divisor), round(random_size_range[1] / size_divisor)) self._interval = interval self._size_divisor = size_divisor - def forward(self, inputs: Tensor, data_samples: List[PoseDataSample] - ) -> Tuple[Tensor, List[PoseDataSample]]: + def forward(self, inputs: Tensor, data_samples: List[PoseDataSample]) -> Tuple[Tensor, List[PoseDataSample]]: """resize a batch of images and bboxes to shape ``self._input_size``""" h, w = inputs.shape[-2:] if self._input_size is None: @@ -48,65 +43,48 @@ class BatchSyncRandomResize(nn.Module): scale_y = self._input_size[0] / h scale_x = self._input_size[1] / w if scale_x != 1 or scale_y != 1: - inputs = F.interpolate( - inputs, - size=self._input_size, - mode='bilinear', - align_corners=False) + inputs = F.interpolate(inputs, size=self._input_size, mode="bilinear", align_corners=False) for data_sample in data_samples: - img_shape = (int(data_sample.img_shape[0] * scale_y), - int(data_sample.img_shape[1] * scale_x)) - pad_shape = (int(data_sample.pad_shape[0] * scale_y), - int(data_sample.pad_shape[1] * scale_x)) - data_sample.set_metainfo({ - 'img_shape': img_shape, - 'pad_shape': pad_shape, - 'batch_input_shape': self._input_size - }) - - if 'gt_instance_labels' not in data_sample: + img_shape = (int(data_sample.img_shape[0] * scale_y), int(data_sample.img_shape[1] * scale_x)) + pad_shape = (int(data_sample.pad_shape[0] * scale_y), int(data_sample.pad_shape[1] * scale_x)) + data_sample.set_metainfo({"img_shape": img_shape, "pad_shape": pad_shape, "batch_input_shape": self._input_size}) + + if "gt_instance_labels" not in data_sample: continue - if 'bboxes' in data_sample.gt_instance_labels: + if "bboxes" in data_sample.gt_instance_labels: data_sample.gt_instance_labels.bboxes[..., 0::2] *= scale_x data_sample.gt_instance_labels.bboxes[..., 1::2] *= scale_y - if 'keypoints' in data_sample.gt_instance_labels: + if "keypoints" in data_sample.gt_instance_labels: data_sample.gt_instance_labels.keypoints[..., 0] *= scale_x data_sample.gt_instance_labels.keypoints[..., 1] *= scale_y - if 'areas' in data_sample.gt_instance_labels: + if "areas" in data_sample.gt_instance_labels: data_sample.gt_instance_labels.areas *= scale_x * scale_y - if 'gt_fields' in data_sample \ - and 'heatmap_mask' in data_sample.gt_fields: + if "gt_fields" in data_sample and "heatmap_mask" in data_sample.gt_fields: mask = data_sample.gt_fields.heatmap_mask.unsqueeze(0) gt_fields = PixelData() gt_fields.set_field( - F.interpolate( - mask.float(), - size=self._input_size, - mode='bilinear', - align_corners=False).squeeze(0), 'heatmap_mask') + F.interpolate(mask.float(), size=self._input_size, mode="bilinear", align_corners=False).squeeze(0), "heatmap_mask" + ) data_sample.gt_fields = gt_fields message_hub = MessageHub.get_current_instance() - if (message_hub.get_info('iter') + 1) % self._interval == 0: - self._input_size = self._get_random_size( - aspect_ratio=float(w / h), device=inputs.device) + if (message_hub.get_info("iter") + 1) % self._interval == 0: + self._input_size = 
self._get_random_size(aspect_ratio=float(w / h), device=inputs.device) return inputs, data_samples - def _get_random_size(self, aspect_ratio: float, - device: torch.device) -> Tuple[int, int]: + def _get_random_size(self, aspect_ratio: float, device: torch.device) -> Tuple[int, int]: """Randomly generate a shape in ``_random_size_range`` and broadcast to all ranks.""" tensor = torch.LongTensor(2).to(device) if self.rank == 0: size = random.randint(*self._random_size_range) - size = (self._size_divisor * size, - self._size_divisor * int(aspect_ratio * size)) + size = (self._size_divisor * size, self._size_divisor * int(aspect_ratio * size)) tensor[0] = size[0] tensor[1] = size[1] barrier() diff --git a/mmpose/models/data_preprocessors/data_preprocessor.py b/mmpose/models/data_preprocessors/data_preprocessor.py index 9442d0ed50bdf0e9ca219e496619c1880777bda4..72ceb5ba50e967b5e84df51bea05e96f2e8dc826 100644 --- a/mmpose/models/data_preprocessors/data_preprocessor.py +++ b/mmpose/models/data_preprocessors/data_preprocessor.py @@ -50,15 +50,17 @@ class PoseDataPreprocessor(ImgDataPreprocessor): transforms on batched data. Defaults to None. """ - def __init__(self, - mean: Sequence[float] = None, - std: Sequence[float] = None, - pad_size_divisor: int = 1, - pad_value: Union[float, int] = 0, - bgr_to_rgb: bool = False, - rgb_to_bgr: bool = False, - non_blocking: Optional[bool] = False, - batch_augments: Optional[List[dict]] = None): + def __init__( + self, + mean: Sequence[float] = None, + std: Sequence[float] = None, + pad_size_divisor: int = 1, + pad_value: Union[float, int] = 0, + bgr_to_rgb: bool = False, + rgb_to_bgr: bool = False, + non_blocking: Optional[bool] = False, + batch_augments: Optional[List[dict]] = None, + ): super().__init__( mean=mean, std=std, @@ -66,11 +68,11 @@ class PoseDataPreprocessor(ImgDataPreprocessor): pad_value=pad_value, bgr_to_rgb=bgr_to_rgb, rgb_to_bgr=rgb_to_bgr, - non_blocking=non_blocking) + non_blocking=non_blocking, + ) if batch_augments is not None: - self.batch_augments = nn.ModuleList( - [MODELS.build(aug) for aug in batch_augments]) + self.batch_augments = nn.ModuleList([MODELS.build(aug) for aug in batch_augments]) else: self.batch_augments = None @@ -87,53 +89,43 @@ class PoseDataPreprocessor(ImgDataPreprocessor): """ batch_pad_shape = self._get_pad_shape(data) data = super().forward(data=data, training=training) - inputs, data_samples = data['inputs'], data['data_samples'] + inputs, data_samples = data["inputs"], data["data_samples"] # update metainfo since the image shape might change batch_input_shape = tuple(inputs[0].size()[-2:]) for data_sample, pad_shape in zip(data_samples, batch_pad_shape): - data_sample.set_metainfo({ - 'batch_input_shape': batch_input_shape, - 'pad_shape': pad_shape - }) + data_sample.set_metainfo({"batch_input_shape": batch_input_shape, "pad_shape": pad_shape}) # apply batch augmentations if training and self.batch_augments is not None: for batch_aug in self.batch_augments: inputs, data_samples = batch_aug(inputs, data_samples) - return {'inputs': inputs, 'data_samples': data_samples} + return {"inputs": inputs, "data_samples": data_samples} def _get_pad_shape(self, data: dict) -> List[tuple]: """Get the pad_shape of each image based on data and pad_size_divisor.""" - _batch_inputs = data['inputs'] + _batch_inputs = data["inputs"] # Process data with `pseudo_collate`. 
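`_get_random_size` above draws the new height as a multiple of `size_divisor` inside the configured range and derives the width from the batch aspect ratio; in distributed training the pair is then broadcast from rank 0 to the other ranks. A minimal single-process sketch that omits the broadcast:

```python
import random

def get_random_size(random_size_range, size_divisor, aspect_ratio):
    lo = round(random_size_range[0] / size_divisor)
    hi = round(random_size_range[1] / size_divisor)
    size = random.randint(lo, hi)
    return (size_divisor * size, size_divisor * int(aspect_ratio * size))

h, w = get_random_size((480, 800), 32, aspect_ratio=640 / 480)
print(h, w)   # a (height, width) pair, both multiples of 32
```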
if is_seq_of(_batch_inputs, torch.Tensor): batch_pad_shape = [] for ori_input in _batch_inputs: - pad_h = int( - np.ceil(ori_input.shape[1] / - self.pad_size_divisor)) * self.pad_size_divisor - pad_w = int( - np.ceil(ori_input.shape[2] / - self.pad_size_divisor)) * self.pad_size_divisor + pad_h = int(np.ceil(ori_input.shape[1] / self.pad_size_divisor)) * self.pad_size_divisor + pad_w = int(np.ceil(ori_input.shape[2] / self.pad_size_divisor)) * self.pad_size_divisor batch_pad_shape.append((pad_h, pad_w)) # Process data with `default_collate`. elif isinstance(_batch_inputs, torch.Tensor): assert _batch_inputs.dim() == 4, ( - 'The input of `ImgDataPreprocessor` should be a NCHW tensor ' - 'or a list of tensor, but got a tensor with shape: ' - f'{_batch_inputs.shape}') - pad_h = int( - np.ceil(_batch_inputs.shape[1] / - self.pad_size_divisor)) * self.pad_size_divisor - pad_w = int( - np.ceil(_batch_inputs.shape[2] / - self.pad_size_divisor)) * self.pad_size_divisor + "The input of `ImgDataPreprocessor` should be a NCHW tensor " + "or a list of tensor, but got a tensor with shape: " + f"{_batch_inputs.shape}" + ) + pad_h = int(np.ceil(_batch_inputs.shape[1] / self.pad_size_divisor)) * self.pad_size_divisor + pad_w = int(np.ceil(_batch_inputs.shape[2] / self.pad_size_divisor)) * self.pad_size_divisor batch_pad_shape = [(pad_h, pad_w)] * _batch_inputs.shape[0] else: - raise TypeError('Output of `cast_data` should be a dict ' - 'or a tuple with inputs and data_samples, but got' - f'{type(data)}: {data}') + raise TypeError( + "Output of `cast_data` should be a dict " "or a tuple with inputs and data_samples, but got" f"{type(data)}: {data}" + ) return batch_pad_shape diff --git a/mmpose/models/distillers/__init__.py b/mmpose/models/distillers/__init__.py index 4cc22a61105dabf5a3d60d0f2f7f6ee2df512bf1..4eb6192b5b90909b9b2b3d760feaa8b2d02738ca 100644 --- a/mmpose/models/distillers/__init__.py +++ b/mmpose/models/distillers/__init__.py @@ -1,4 +1,4 @@ # Copyright (c) OpenMMLab. All rights reserved. from .dwpose_distiller import DWPoseDistiller -__all__ = ['DWPoseDistiller'] +__all__ = ["DWPoseDistiller"] diff --git a/mmpose/models/distillers/dwpose_distiller.py b/mmpose/models/distillers/dwpose_distiller.py index d267951cd549a03eacb4473846c574ada5262144..d540bfd72bed6ebcd3fab79917498c0bc8636e4e 100644 --- a/mmpose/models/distillers/dwpose_distiller.py +++ b/mmpose/models/distillers/dwpose_distiller.py @@ -14,8 +14,7 @@ from mmpose.evaluation.functional import simcc_pck_accuracy from mmpose.models import build_pose_estimator from mmpose.registry import MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ForwardResults, OptConfigType, OptMultiConfig, - OptSampleList, SampleList) +from mmpose.utils.typing import ForwardResults, OptConfigType, OptMultiConfig, OptSampleList, SampleList @MODELS.register_module() @@ -46,27 +45,26 @@ class DWPoseDistiller(BaseModel, metaclass=ABCMeta): .. 
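The pad-shape math in `_get_pad_shape` above simply rounds each spatial dimension up to the nearest multiple of `pad_size_divisor`:

```python
import numpy as np

def pad_shape(h, w, pad_size_divisor):
    pad_h = int(np.ceil(h / pad_size_divisor)) * pad_size_divisor
    pad_w = int(np.ceil(w / pad_size_divisor)) * pad_size_divisor
    return pad_h, pad_w

print(pad_shape(427, 640, 32))  # (448, 640)
```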
_`DWPose`: https://arxiv.org/abs/2307.15880 """ - def __init__(self, - teacher_cfg, - student_cfg, - two_dis=False, - distill_cfg=None, - teacher_pretrained=None, - train_cfg: OptConfigType = None, - data_preprocessor: OptConfigType = None, - init_cfg: OptMultiConfig = None): - super().__init__( - data_preprocessor=data_preprocessor, init_cfg=init_cfg) - - self.teacher = build_pose_estimator( - (Config.fromfile(teacher_cfg)).model) + def __init__( + self, + teacher_cfg, + student_cfg, + two_dis=False, + distill_cfg=None, + teacher_pretrained=None, + train_cfg: OptConfigType = None, + data_preprocessor: OptConfigType = None, + init_cfg: OptMultiConfig = None, + ): + super().__init__(data_preprocessor=data_preprocessor, init_cfg=init_cfg) + + self.teacher = build_pose_estimator((Config.fromfile(teacher_cfg)).model) self.teacher_pretrained = teacher_pretrained self.teacher.eval() for param in self.teacher.parameters(): param.requires_grad = False - self.student = build_pose_estimator( - (Config.fromfile(student_cfg)).model) + self.student = build_pose_estimator((Config.fromfile(student_cfg)).model) self.distill_cfg = distill_cfg self.distill_losses = nn.ModuleDict() @@ -76,8 +74,7 @@ class DWPoseDistiller(BaseModel, metaclass=ABCMeta): loss_name = item_loss.name use_this = item_loss.use_this if use_this: - self.distill_losses[loss_name] = MODELS.build( - item_loss) + self.distill_losses[loss_name] = MODELS.build(item_loss) self.two_dis = two_dis self.train_cfg = train_cfg if train_cfg else self.student.train_cfg @@ -86,8 +83,7 @@ class DWPoseDistiller(BaseModel, metaclass=ABCMeta): def init_weights(self): if self.teacher_pretrained is not None: - load_checkpoint( - self.teacher, self.teacher_pretrained, map_location='cpu') + load_checkpoint(self.teacher, self.teacher_pretrained, map_location="cpu") self.student.init_weights() def set_epoch(self): @@ -96,26 +92,22 @@ class DWPoseDistiller(BaseModel, metaclass=ABCMeta): Used for the decay of distillation loss. """ self.message_hub = MessageHub.get_current_instance() - self.epoch = self.message_hub.get_info('epoch') - self.max_epochs = self.message_hub.get_info('max_epochs') - - def forward(self, - inputs: torch.Tensor, - data_samples: OptSampleList, - mode: str = 'tensor') -> ForwardResults: - if mode == 'loss': + self.epoch = self.message_hub.get_info("epoch") + self.max_epochs = self.message_hub.get_info("max_epochs") + + def forward(self, inputs: torch.Tensor, data_samples: OptSampleList, mode: str = "tensor") -> ForwardResults: + if mode == "loss": return self.loss(inputs, data_samples) - elif mode == 'predict': + elif mode == "predict": # use customed metainfo to override the default metainfo if self.metainfo is not None: for data_sample in data_samples: data_sample.set_metainfo(self.metainfo) return self.predict(inputs, data_samples) - elif mode == 'tensor': + elif mode == "tensor": return self._forward(inputs) else: - raise RuntimeError(f'Invalid mode "{mode}". ' - 'Only supports loss, predict and tensor mode.') + raise RuntimeError(f'Invalid mode "{mode}". ' "Only supports loss, predict and tensor mode.") def loss(self, inputs: Tensor, data_samples: SampleList) -> dict: """Calculate losses from a batch of inputs and data samples. 
@@ -139,31 +131,24 @@ class DWPoseDistiller(BaseModel, metaclass=ABCMeta): if not self.two_dis: fea_s = self.student.extract_feat(inputs) - ori_loss, pred, gt, target_weight = self.head_loss( - fea_s, data_samples, train_cfg=self.train_cfg) + ori_loss, pred, gt, target_weight = self.head_loss(fea_s, data_samples, train_cfg=self.train_cfg) losses.update(ori_loss) else: - ori_loss, pred, gt, target_weight = self.head_loss( - fea_t, data_samples, train_cfg=self.train_cfg) + ori_loss, pred, gt, target_weight = self.head_loss(fea_t, data_samples, train_cfg=self.train_cfg) all_keys = self.distill_losses.keys() - if 'loss_fea' in all_keys: - loss_name = 'loss_fea' - losses[loss_name] = self.distill_losses[loss_name](fea_s[-1], - fea_t[-1]) + if "loss_fea" in all_keys: + loss_name = "loss_fea" + losses[loss_name] = self.distill_losses[loss_name](fea_s[-1], fea_t[-1]) if not self.two_dis: - losses[loss_name] = ( - 1 - self.epoch / self.max_epochs) * losses[loss_name] - - if 'loss_logit' in all_keys: - loss_name = 'loss_logit' - losses[loss_name] = self.distill_losses[loss_name]( - pred, pred_t, self.student.head.loss_module.beta, - target_weight) + losses[loss_name] = (1 - self.epoch / self.max_epochs) * losses[loss_name] + + if "loss_logit" in all_keys: + loss_name = "loss_logit" + losses[loss_name] = self.distill_losses[loss_name](pred, pred_t, self.student.head.loss_module.beta, target_weight) if not self.two_dis: - losses[loss_name] = ( - 1 - self.epoch / self.max_epochs) * losses[loss_name] + losses[loss_name] = (1 - self.epoch / self.max_epochs) * losses[loss_name] return losses @@ -189,18 +174,16 @@ class DWPoseDistiller(BaseModel, metaclass=ABCMeta): (num_instances, K) """ if self.two_dis: - assert self.student.with_head, ( - 'The model must have head to perform prediction.') + assert self.student.with_head, "The model must have head to perform prediction." 
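The `(1 - self.epoch / self.max_epochs)` factor above fades the distillation terms linearly to zero over training when `two_dis` is off, so supervision shifts from the teacher to the ground truth. A quick illustration (the epoch budget is illustrative):

```python
max_epochs = 270   # illustrative training length
for epoch in (0, 135, 269):
    weight = 1 - epoch / max_epochs
    print(f"epoch {epoch:3d}: distill-loss weight = {weight:.3f}")
# epoch   0: distill-loss weight = 1.000
# epoch 135: distill-loss weight = 0.500
# epoch 269: distill-loss weight = 0.004
```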
- if self.test_cfg.get('flip_test', False): + if self.test_cfg.get("flip_test", False): _feats = self.extract_feat(inputs) _feats_flip = self.extract_feat(inputs.flip(-1)) feats = [_feats, _feats_flip] else: feats = self.extract_feat(inputs) - preds = self.student.head.predict( - feats, data_samples, test_cfg=self.student.test_cfg) + preds = self.student.head.predict(feats, data_samples, test_cfg=self.student.test_cfg) if isinstance(preds, tuple): batch_pred_instances, batch_pred_fields = preds @@ -208,8 +191,7 @@ class DWPoseDistiller(BaseModel, metaclass=ABCMeta): batch_pred_instances = preds batch_pred_fields = None - results = self.student.add_pred_to_datasample( - batch_pred_instances, batch_pred_fields, data_samples) + results = self.student.add_pred_to_datasample(batch_pred_instances, batch_pred_fields, data_samples) return results else: @@ -238,19 +220,10 @@ class DWPoseDistiller(BaseModel, metaclass=ABCMeta): pred_x, pred_y = self.student.head.forward(feats) - gt_x = torch.cat([ - d.gt_instance_labels.keypoint_x_labels for d in batch_data_samples - ], - dim=0) - gt_y = torch.cat([ - d.gt_instance_labels.keypoint_y_labels for d in batch_data_samples - ], - dim=0) + gt_x = torch.cat([d.gt_instance_labels.keypoint_x_labels for d in batch_data_samples], dim=0) + gt_y = torch.cat([d.gt_instance_labels.keypoint_y_labels for d in batch_data_samples], dim=0) keypoint_weights = torch.cat( - [ - d.gt_instance_labels.keypoint_weights - for d in batch_data_samples - ], + [d.gt_instance_labels.keypoint_weights for d in batch_data_samples], dim=0, ) @@ -259,8 +232,7 @@ class DWPoseDistiller(BaseModel, metaclass=ABCMeta): # calculate losses losses = dict() - loss = self.student.head.loss_module(pred_simcc, gt_simcc, - keypoint_weights) + loss = self.student.head.loss_module(pred_simcc, gt_simcc, keypoint_weights) losses.update(loss_kpt=loss) diff --git a/mmpose/models/heads/__init__.py b/mmpose/models/heads/__init__.py index 319f0c6836be700a335b9667ca91f442e86ad70a..9f267dc925ab759d890652f0ec5deeaf0a62ab8b 100644 --- a/mmpose/models/heads/__init__.py +++ b/mmpose/models/heads/__init__.py @@ -1,20 +1,41 @@ # Copyright (c) OpenMMLab. All rights reserved. 
from .base_head import BaseHead from .coord_cls_heads import RTMCCHead, RTMWHead, SimCCHead -from .heatmap_heads import (AssociativeEmbeddingHead, CIDHead, CPMHead, - HeatmapHead, InternetHead, MSPNHead, ViPNASHead) -from .hybrid_heads import DEKRHead, RTMOHead, VisPredictHead -from .regression_heads import (DSNTHead, IntegralRegressionHead, - MotionRegressionHead, RegressionHead, RLEHead, - TemporalRegressionHead, - TrajectoryRegressionHead) +from .heatmap_heads import AssociativeEmbeddingHead, CIDHead, CPMHead, HeatmapHead, InternetHead, MSPNHead, ViPNASHead +from .hybrid_heads import DEKRHead, MultiHead, RTMOHead, VisPredictHead +from .regression_heads import ( + DSNTHead, + IntegralRegressionHead, + MotionRegressionHead, + RegressionHead, + RLEHead, + TemporalRegressionHead, + TrajectoryRegressionHead, +) from .transformer_heads import EDPoseHead __all__ = [ - 'BaseHead', 'HeatmapHead', 'CPMHead', 'MSPNHead', 'ViPNASHead', - 'RegressionHead', 'IntegralRegressionHead', 'SimCCHead', 'RLEHead', - 'DSNTHead', 'AssociativeEmbeddingHead', 'DEKRHead', 'VisPredictHead', - 'CIDHead', 'RTMCCHead', 'TemporalRegressionHead', - 'TrajectoryRegressionHead', 'MotionRegressionHead', 'EDPoseHead', - 'InternetHead', 'RTMWHead', 'RTMOHead' + "BaseHead", + "HeatmapHead", + "CPMHead", + "MSPNHead", + "ViPNASHead", + "RegressionHead", + "IntegralRegressionHead", + "SimCCHead", + "RLEHead", + "DSNTHead", + "AssociativeEmbeddingHead", + "DEKRHead", + "VisPredictHead", + "CIDHead", + "RTMCCHead", + "TemporalRegressionHead", + "TrajectoryRegressionHead", + "MotionRegressionHead", + "EDPoseHead", + "InternetHead", + "RTMWHead", + "RTMOHead", + "MultiHead", ] diff --git a/mmpose/models/heads/base_head.py b/mmpose/models/heads/base_head.py index d35c27b8b2b8db0ea737765d962de204304d2f19..0dc60aa7b1ba703ca36451be59bd020585ab9018 100644 --- a/mmpose/models/heads/base_head.py +++ b/mmpose/models/heads/base_head.py @@ -7,8 +7,7 @@ from mmengine.structures import InstanceData from torch import Tensor from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (Features, InstanceList, OptConfigType, - OptSampleList, Predictions) +from mmpose.utils.typing import Features, InstanceList, OptConfigType, OptSampleList, Predictions class BaseHead(BaseModule, metaclass=ABCMeta): @@ -24,21 +23,14 @@ class BaseHead(BaseModule, metaclass=ABCMeta): """Forward the network.""" @abstractmethod - def predict(self, - feats: Features, - batch_data_samples: OptSampleList, - test_cfg: OptConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: OptConfigType = {}) -> Predictions: """Predict results from features.""" @abstractmethod - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: OptConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: OptConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples.""" - def decode(self, batch_outputs: Union[Tensor, - Tuple[Tensor]]) -> InstanceList: + def decode(self, batch_outputs: Union[Tensor, Tuple[Tensor]]) -> InstanceList: """Decode keypoints from outputs. Args: @@ -52,18 +44,18 @@ class BaseHead(BaseModule, metaclass=ABCMeta): def _pack_and_call(args, func): if not isinstance(args, tuple): - args = (args, ) + args = (args,) return func(*args) if self.decoder is None: raise RuntimeError( - f'The decoder has not been set in {self.__class__.__name__}. 
' - 'Please set the decoder configs in the init parameters to ' - 'enable head methods `head.predict()` and `head.decode()`') + f"The decoder has not been set in {self.__class__.__name__}. " + "Please set the decoder configs in the init parameters to " + "enable head methods `head.predict()` and `head.decode()`" + ) if self.decoder.support_batch_decoding: - batch_keypoints, batch_scores = _pack_and_call( - batch_outputs, self.decoder.batch_decode) + batch_keypoints, batch_scores = _pack_and_call(batch_outputs, self.decoder.batch_decode) if isinstance(batch_scores, tuple) and len(batch_scores) == 2: batch_scores, batch_visibility = batch_scores else: @@ -75,8 +67,7 @@ class BaseHead(BaseModule, metaclass=ABCMeta): batch_scores = [] batch_visibility = [] for outputs in batch_output_np: - keypoints, scores = _pack_and_call(outputs, - self.decoder.decode) + keypoints, scores = _pack_and_call(outputs, self.decoder.decode) batch_keypoints.append(keypoints) if isinstance(scores, tuple) and len(scores) == 2: batch_scores.append(scores[0]) @@ -86,8 +77,7 @@ class BaseHead(BaseModule, metaclass=ABCMeta): batch_visibility.append(None) preds = [] - for keypoints, scores, visibility in zip(batch_keypoints, batch_scores, - batch_visibility): + for keypoints, scores, visibility in zip(batch_keypoints, batch_scores, batch_visibility): pred = InstanceData(keypoints=keypoints, keypoint_scores=scores) if visibility is not None: pred.keypoints_visible = visibility diff --git a/mmpose/models/heads/coord_cls_heads/__init__.py b/mmpose/models/heads/coord_cls_heads/__init__.py index 6a4e51c4d7307486a8d14a49f757caddacfbe2cc..32498ce7fb5b5228364c6216c3edd657a03a38a9 100644 --- a/mmpose/models/heads/coord_cls_heads/__init__.py +++ b/mmpose/models/heads/coord_cls_heads/__init__.py @@ -3,4 +3,4 @@ from .rtmcc_head import RTMCCHead from .rtmw_head import RTMWHead from .simcc_head import SimCCHead -__all__ = ['SimCCHead', 'RTMCCHead', 'RTMWHead'] +__all__ = ["SimCCHead", "RTMCCHead", "RTMWHead"] diff --git a/mmpose/models/heads/coord_cls_heads/rtmcc_head.py b/mmpose/models/heads/coord_cls_heads/rtmcc_head.py index 5df0733c4827af56ffe7635d7ba083890efb9f2b..2694293a295edfe9af151305e7c83f88a4053d68 100644 --- a/mmpose/models/heads/coord_cls_heads/rtmcc_head.py +++ b/mmpose/models/heads/coord_cls_heads/rtmcc_head.py @@ -13,8 +13,8 @@ from mmpose.models.utils.rtmcc_block import RTMCCBlock, ScaleNorm from mmpose.models.utils.tta import flip_vectors from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, InstanceList, OptConfigType, - OptSampleList) +from mmpose.utils.typing import ConfigType, InstanceList, OptConfigType, OptSampleList + from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @@ -64,15 +64,9 @@ class RTMCCHead(BaseHead): simcc_split_ratio: float = 2.0, final_layer_kernel_size: int = 1, gau_cfg: ConfigType = dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='ReLU', - use_rel_bias=False, - pos_enc=False), - loss: ConfigType = dict(type='KLDiscretLoss', use_target_weight=True), + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="ReLU", use_rel_bias=False, pos_enc=False + ), + loss: ConfigType = dict(type="KLDiscretLoss", use_target_weight=True), decoder: OptConfigType = None, init_cfg: OptConfigType = None, ): @@ -95,41 +89,35 @@ class RTMCCHead(BaseHead): self.decoder = None if isinstance(in_channels, (tuple, list)): - raise 
ValueError( - f'{self.__class__.__name__} does not support selecting ' - 'multiple input features.') + raise ValueError(f"{self.__class__.__name__} does not support selecting " "multiple input features.") # Define SimCC layers flatten_dims = self.in_featuremap_size[0] * self.in_featuremap_size[1] self.final_layer = nn.Conv2d( - in_channels, - out_channels, - kernel_size=final_layer_kernel_size, - stride=1, - padding=final_layer_kernel_size // 2) - self.mlp = nn.Sequential( - ScaleNorm(flatten_dims), - nn.Linear(flatten_dims, gau_cfg['hidden_dims'], bias=False)) + in_channels, out_channels, kernel_size=final_layer_kernel_size, stride=1, padding=final_layer_kernel_size // 2 + ) + self.mlp = nn.Sequential(ScaleNorm(flatten_dims), nn.Linear(flatten_dims, gau_cfg["hidden_dims"], bias=False)) W = int(self.input_size[0] * self.simcc_split_ratio) H = int(self.input_size[1] * self.simcc_split_ratio) self.gau = RTMCCBlock( self.out_channels, - gau_cfg['hidden_dims'], - gau_cfg['hidden_dims'], - s=gau_cfg['s'], - expansion_factor=gau_cfg['expansion_factor'], - dropout_rate=gau_cfg['dropout_rate'], - drop_path=gau_cfg['drop_path'], - attn_type='self-attn', - act_fn=gau_cfg['act_fn'], - use_rel_bias=gau_cfg['use_rel_bias'], - pos_enc=gau_cfg['pos_enc']) - - self.cls_x = nn.Linear(gau_cfg['hidden_dims'], W, bias=False) - self.cls_y = nn.Linear(gau_cfg['hidden_dims'], H, bias=False) + gau_cfg["hidden_dims"], + gau_cfg["hidden_dims"], + s=gau_cfg["s"], + expansion_factor=gau_cfg["expansion_factor"], + dropout_rate=gau_cfg["dropout_rate"], + drop_path=gau_cfg["drop_path"], + attn_type="self-attn", + act_fn=gau_cfg["act_fn"], + use_rel_bias=gau_cfg["use_rel_bias"], + pos_enc=gau_cfg["pos_enc"], + ) + + self.cls_x = nn.Linear(gau_cfg["hidden_dims"], W, bias=False) + self.cls_y = nn.Linear(gau_cfg["hidden_dims"], H, bias=False) def forward(self, feats: Tuple[Tensor]) -> Tuple[Tensor, Tensor]: """Forward the network. @@ -190,19 +178,16 @@ class RTMCCHead(BaseHead): intensity distribution in the y direction """ - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _batch_pred_x, _batch_pred_y = self.forward(_feats) _batch_pred_x_flip, _batch_pred_y_flip = self.forward(_feats_flip) - _batch_pred_x_flip, _batch_pred_y_flip = flip_vectors( - _batch_pred_x_flip, - _batch_pred_y_flip, - flip_indices=flip_indices) + _batch_pred_x_flip, _batch_pred_y_flip = flip_vectors(_batch_pred_x_flip, _batch_pred_y_flip, flip_indices=flip_indices) batch_pred_x = (_batch_pred_x + _batch_pred_x_flip) * 0.5 batch_pred_y = (_batch_pred_y + _batch_pred_y_flip) * 0.5 @@ -211,13 +196,15 @@ class RTMCCHead(BaseHead): preds = self.decode((batch_pred_x, batch_pred_y)) - if test_cfg.get('output_heatmaps', False): + if test_cfg.get("output_heatmaps", False): rank, _ = get_dist_info() if rank == 0: - warnings.warn('The predicted simcc values are normalized for ' - 'visualization. This may cause discrepancy ' - 'between the keypoint scores and the 1D heatmaps' - '.') + warnings.warn( + "The predicted simcc values are normalized for " + "visualization. This may cause discrepancy " + "between the keypoint scores and the 1D heatmaps" + "." 
+ ) # normalize the predicted 1d distribution batch_pred_x = get_simcc_normalized(batch_pred_x) @@ -230,13 +217,9 @@ class RTMCCHead(BaseHead): y = batch_pred_y.reshape(B, K, -1, 1) # B, K, Wx, Wy batch_heatmaps = torch.matmul(y, x) - pred_fields = [ - PixelData(heatmaps=hm) for hm in batch_heatmaps.detach() - ] + pred_fields = [PixelData(heatmaps=hm) for hm in batch_heatmaps.detach()] - for pred_instances, pred_x, pred_y in zip(preds, - to_numpy(batch_pred_x), - to_numpy(batch_pred_y)): + for pred_instances, pred_x, pred_y in zip(preds, to_numpy(batch_pred_x), to_numpy(batch_pred_y)): pred_instances.keypoint_x_labels = pred_x[None] pred_instances.keypoint_y_labels = pred_y[None] @@ -255,19 +238,10 @@ class RTMCCHead(BaseHead): pred_x, pred_y = self.forward(feats) - gt_x = torch.cat([ - d.gt_instance_labels.keypoint_x_labels for d in batch_data_samples - ], - dim=0) - gt_y = torch.cat([ - d.gt_instance_labels.keypoint_y_labels for d in batch_data_samples - ], - dim=0) + gt_x = torch.cat([d.gt_instance_labels.keypoint_x_labels for d in batch_data_samples], dim=0) + gt_y = torch.cat([d.gt_instance_labels.keypoint_y_labels for d in batch_data_samples], dim=0) keypoint_weights = torch.cat( - [ - d.gt_instance_labels.keypoint_weights - for d in batch_data_samples - ], + [d.gt_instance_labels.keypoint_weights for d in batch_data_samples], dim=0, ) @@ -296,8 +270,8 @@ class RTMCCHead(BaseHead): @property def default_init_cfg(self): init_cfg = [ - dict(type='Normal', layer=['Conv2d'], std=0.001), - dict(type='Constant', layer='BatchNorm2d', val=1), - dict(type='Normal', layer=['Linear'], std=0.01, bias=0), + dict(type="Normal", layer=["Conv2d"], std=0.001), + dict(type="Constant", layer="BatchNorm2d", val=1), + dict(type="Normal", layer=["Linear"], std=0.01, bias=0), ] return init_cfg diff --git a/mmpose/models/heads/coord_cls_heads/rtmw_head.py b/mmpose/models/heads/coord_cls_heads/rtmw_head.py index 7111f9044615e5e58726f42372406a973f86cf8b..642ef2a90f2488078f60633473b59b5187c3b31e 100644 --- a/mmpose/models/heads/coord_cls_heads/rtmw_head.py +++ b/mmpose/models/heads/coord_cls_heads/rtmw_head.py @@ -14,8 +14,8 @@ from mmpose.models.utils.rtmcc_block import RTMCCBlock, ScaleNorm from mmpose.models.utils.tta import flip_vectors from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, InstanceList, OptConfigType, - OptSampleList) +from mmpose.utils.typing import ConfigType, InstanceList, OptConfigType, OptSampleList + from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @@ -62,15 +62,9 @@ class RTMWHead(BaseHead): simcc_split_ratio: float = 2.0, final_layer_kernel_size: int = 1, gau_cfg: ConfigType = dict( - hidden_dims=256, - s=128, - expansion_factor=2, - dropout_rate=0., - drop_path=0., - act_fn='ReLU', - use_rel_bias=False, - pos_enc=False), - loss: ConfigType = dict(type='KLDiscretLoss', use_target_weight=True), + hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="ReLU", use_rel_bias=False, pos_enc=False + ), + loss: ConfigType = dict(type="KLDiscretLoss", use_target_weight=True), decoder: OptConfigType = None, init_cfg: OptConfigType = None, ): @@ -93,9 +87,7 @@ class RTMWHead(BaseHead): self.decoder = None if isinstance(in_channels, (tuple, list)): - raise ValueError( - f'{self.__class__.__name__} does not support selecting ' - 'multiple input features.') + raise ValueError(f"{self.__class__.__name__} does not support selecting " "multiple input 
features.") # Define SimCC layers flatten_dims = self.in_featuremap_size[0] * self.in_featuremap_size[1] @@ -108,8 +100,9 @@ class RTMWHead(BaseHead): kernel_size=final_layer_kernel_size, stride=1, padding=final_layer_kernel_size // 2, - norm_cfg=dict(type='BN', requires_grad=True), - act_cfg=dict(type='ReLU')) + norm_cfg=dict(type="BN", requires_grad=True), + act_cfg=dict(type="ReLU"), + ) self.final_layer = ConvModule( in_channels, @@ -117,44 +110,42 @@ class RTMWHead(BaseHead): kernel_size=final_layer_kernel_size, stride=1, padding=final_layer_kernel_size // 2, - norm_cfg=dict(type='BN', requires_grad=True), - act_cfg=dict(type='ReLU')) + norm_cfg=dict(type="BN", requires_grad=True), + act_cfg=dict(type="ReLU"), + ) self.final_layer2 = ConvModule( in_channels // ps + in_channels // 4, out_channels, kernel_size=final_layer_kernel_size, stride=1, padding=final_layer_kernel_size // 2, - norm_cfg=dict(type='BN', requires_grad=True), - act_cfg=dict(type='ReLU')) + norm_cfg=dict(type="BN", requires_grad=True), + act_cfg=dict(type="ReLU"), + ) - self.mlp = nn.Sequential( - ScaleNorm(flatten_dims), - nn.Linear(flatten_dims, gau_cfg['hidden_dims'] // 2, bias=False)) + self.mlp = nn.Sequential(ScaleNorm(flatten_dims), nn.Linear(flatten_dims, gau_cfg["hidden_dims"] // 2, bias=False)) - self.mlp2 = nn.Sequential( - ScaleNorm(flatten_dims * ps**2), - nn.Linear( - flatten_dims * ps**2, gau_cfg['hidden_dims'] // 2, bias=False)) + self.mlp2 = nn.Sequential(ScaleNorm(flatten_dims * ps**2), nn.Linear(flatten_dims * ps**2, gau_cfg["hidden_dims"] // 2, bias=False)) W = int(self.input_size[0] * self.simcc_split_ratio) H = int(self.input_size[1] * self.simcc_split_ratio) self.gau = RTMCCBlock( self.out_channels, - gau_cfg['hidden_dims'], - gau_cfg['hidden_dims'], - s=gau_cfg['s'], - expansion_factor=gau_cfg['expansion_factor'], - dropout_rate=gau_cfg['dropout_rate'], - drop_path=gau_cfg['drop_path'], - attn_type='self-attn', - act_fn=gau_cfg['act_fn'], - use_rel_bias=gau_cfg['use_rel_bias'], - pos_enc=gau_cfg['pos_enc']) - - self.cls_x = nn.Linear(gau_cfg['hidden_dims'], W, bias=False) - self.cls_y = nn.Linear(gau_cfg['hidden_dims'], H, bias=False) + gau_cfg["hidden_dims"], + gau_cfg["hidden_dims"], + s=gau_cfg["s"], + expansion_factor=gau_cfg["expansion_factor"], + dropout_rate=gau_cfg["dropout_rate"], + drop_path=gau_cfg["drop_path"], + attn_type="self-attn", + act_fn=gau_cfg["act_fn"], + use_rel_bias=gau_cfg["use_rel_bias"], + pos_enc=gau_cfg["pos_enc"], + ) + + self.cls_x = nn.Linear(gau_cfg["hidden_dims"], W, bias=False) + self.cls_y = nn.Linear(gau_cfg["hidden_dims"], H, bias=False) def forward(self, feats: Tuple[Tensor]) -> Tuple[Tensor, Tensor]: """Forward the network. 
@@ -224,19 +215,16 @@ class RTMWHead(BaseHead): intensity distribution in the y direction """ - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _batch_pred_x, _batch_pred_y = self.forward(_feats) _batch_pred_x_flip, _batch_pred_y_flip = self.forward(_feats_flip) - _batch_pred_x_flip, _batch_pred_y_flip = flip_vectors( - _batch_pred_x_flip, - _batch_pred_y_flip, - flip_indices=flip_indices) + _batch_pred_x_flip, _batch_pred_y_flip = flip_vectors(_batch_pred_x_flip, _batch_pred_y_flip, flip_indices=flip_indices) batch_pred_x = (_batch_pred_x + _batch_pred_x_flip) * 0.5 batch_pred_y = (_batch_pred_y + _batch_pred_y_flip) * 0.5 @@ -245,13 +233,15 @@ class RTMWHead(BaseHead): preds = self.decode((batch_pred_x, batch_pred_y)) - if test_cfg.get('output_heatmaps', False): + if test_cfg.get("output_heatmaps", False): rank, _ = get_dist_info() if rank == 0: - warnings.warn('The predicted simcc values are normalized for ' - 'visualization. This may cause discrepancy ' - 'between the keypoint scores and the 1D heatmaps' - '.') + warnings.warn( + "The predicted simcc values are normalized for " + "visualization. This may cause discrepancy " + "between the keypoint scores and the 1D heatmaps" + "." + ) # normalize the predicted 1d distribution batch_pred_x = get_simcc_normalized(batch_pred_x) @@ -264,13 +254,9 @@ class RTMWHead(BaseHead): y = batch_pred_y.reshape(B, K, -1, 1) # B, K, Wx, Wy batch_heatmaps = torch.matmul(y, x) - pred_fields = [ - PixelData(heatmaps=hm) for hm in batch_heatmaps.detach() - ] + pred_fields = [PixelData(heatmaps=hm) for hm in batch_heatmaps.detach()] - for pred_instances, pred_x, pred_y in zip(preds, - to_numpy(batch_pred_x), - to_numpy(batch_pred_y)): + for pred_instances, pred_x, pred_y in zip(preds, to_numpy(batch_pred_x), to_numpy(batch_pred_y)): pred_instances.keypoint_x_labels = pred_x[None] pred_instances.keypoint_y_labels = pred_y[None] @@ -289,19 +275,10 @@ class RTMWHead(BaseHead): pred_x, pred_y = self.forward(feats) - gt_x = torch.cat([ - d.gt_instance_labels.keypoint_x_labels for d in batch_data_samples - ], - dim=0) - gt_y = torch.cat([ - d.gt_instance_labels.keypoint_y_labels for d in batch_data_samples - ], - dim=0) + gt_x = torch.cat([d.gt_instance_labels.keypoint_x_labels for d in batch_data_samples], dim=0) + gt_y = torch.cat([d.gt_instance_labels.keypoint_y_labels for d in batch_data_samples], dim=0) keypoint_weights = torch.cat( - [ - d.gt_instance_labels.keypoint_weights - for d in batch_data_samples - ], + [d.gt_instance_labels.keypoint_weights for d in batch_data_samples], dim=0, ) @@ -330,8 +307,8 @@ class RTMWHead(BaseHead): @property def default_init_cfg(self): init_cfg = [ - dict(type='Normal', layer=['Conv2d'], std=0.001), - dict(type='Constant', layer='BatchNorm2d', val=1), - dict(type='Normal', layer=['Linear'], std=0.01, bias=0), + dict(type="Normal", layer=["Conv2d"], std=0.001), + dict(type="Constant", layer="BatchNorm2d", val=1), + dict(type="Normal", layer=["Linear"], std=0.01, bias=0), ] return init_cfg diff --git a/mmpose/models/heads/coord_cls_heads/simcc_head.py b/mmpose/models/heads/coord_cls_heads/simcc_head.py index d9e7001cbc31685d5a46f2cedde19606001fc8c8..e1e6b14ced009e3ee561e1597888965f25130a2b 100644 --- a/mmpose/models/heads/coord_cls_heads/simcc_head.py 
+++ b/mmpose/models/heads/coord_cls_heads/simcc_head.py @@ -13,8 +13,8 @@ from mmpose.evaluation.functional import simcc_pck_accuracy from mmpose.models.utils.tta import flip_vectors from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, InstanceList, OptConfigType, - OptSampleList) +from mmpose.utils.typing import ConfigType, InstanceList, OptConfigType, OptSampleList + from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @@ -77,14 +77,14 @@ class SimCCHead(BaseHead): input_size: Tuple[int, int], in_featuremap_size: Tuple[int, int], simcc_split_ratio: float = 2.0, - deconv_type: str = 'heatmap', + deconv_type: str = "heatmap", deconv_out_channels: OptIntSeq = (256, 256, 256), deconv_kernel_sizes: OptIntSeq = (4, 4, 4), deconv_num_groups: OptIntSeq = (16, 16, 16), conv_out_channels: OptIntSeq = None, conv_kernel_sizes: OptIntSeq = None, final_layer: dict = dict(kernel_size=1), - loss: ConfigType = dict(type='KLDiscretLoss', use_target_weight=True), + loss: ConfigType = dict(type="KLDiscretLoss", use_target_weight=True), decoder: OptConfigType = None, init_cfg: OptConfigType = None, ): @@ -94,11 +94,10 @@ class SimCCHead(BaseHead): super().__init__(init_cfg) - if deconv_type not in {'heatmap', 'vipnas'}: + if deconv_type not in {"heatmap", "vipnas"}: raise ValueError( - f'{self.__class__.__name__} got invalid `deconv_type` value' - f'{deconv_type}. Should be one of ' - '{"heatmap", "vipnas"}') + f"{self.__class__.__name__} got invalid `deconv_type` value" f"{deconv_type}. Should be one of " '{"heatmap", "vipnas"}' + ) self.in_channels = in_channels self.out_channels = out_channels @@ -113,8 +112,7 @@ class SimCCHead(BaseHead): num_deconv = len(deconv_out_channels) if deconv_out_channels else 0 if num_deconv != 0: - self.heatmap_size = tuple( - [s * (2**num_deconv) for s in in_featuremap_size]) + self.heatmap_size = tuple([s * (2**num_deconv) for s in in_featuremap_size]) # deconv layers + 1x1 conv self.deconv_head = self._make_deconv_head( @@ -126,7 +124,8 @@ class SimCCHead(BaseHead): deconv_num_groups=deconv_num_groups, conv_out_channels=conv_out_channels, conv_kernel_sizes=conv_kernel_sizes, - final_layer=final_layer) + final_layer=final_layer, + ) if final_layer is not None: in_channels = out_channels @@ -137,11 +136,7 @@ class SimCCHead(BaseHead): self.deconv_head = None if final_layer is not None: - cfg = dict( - type='Conv2d', - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1) + cfg = dict(type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=1) cfg.update(final_layer) self.final_layer = build_conv_layer(cfg) else: @@ -162,38 +157,42 @@ class SimCCHead(BaseHead): self, in_channels: Union[int, Sequence[int]], out_channels: int, - deconv_type: str = 'heatmap', + deconv_type: str = "heatmap", deconv_out_channels: OptIntSeq = (256, 256, 256), deconv_kernel_sizes: OptIntSeq = (4, 4, 4), deconv_num_groups: OptIntSeq = (16, 16, 16), conv_out_channels: OptIntSeq = None, conv_kernel_sizes: OptIntSeq = None, - final_layer: dict = dict(kernel_size=1) + final_layer: dict = dict(kernel_size=1), ) -> nn.Module: """Create deconvolutional layers by given parameters.""" - if deconv_type == 'heatmap': + if deconv_type == "heatmap": deconv_head = MODELS.build( dict( - type='HeatmapHead', + type="HeatmapHead", in_channels=self.in_channels, out_channels=out_channels, deconv_out_channels=deconv_out_channels, deconv_kernel_sizes=deconv_kernel_sizes, 
conv_out_channels=conv_out_channels, conv_kernel_sizes=conv_kernel_sizes, - final_layer=final_layer)) + final_layer=final_layer, + ) + ) else: deconv_head = MODELS.build( dict( - type='ViPNASHead', + type="ViPNASHead", in_channels=in_channels, out_channels=out_channels, deconv_out_channels=deconv_out_channels, deconv_num_groups=deconv_num_groups, conv_out_channels=conv_out_channels, conv_kernel_sizes=conv_kernel_sizes, - final_layer=final_layer)) + final_layer=final_layer, + ) + ) return deconv_head @@ -256,19 +255,16 @@ class SimCCHead(BaseHead): intensity distribution in the y direction """ - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _batch_pred_x, _batch_pred_y = self.forward(_feats) _batch_pred_x_flip, _batch_pred_y_flip = self.forward(_feats_flip) - _batch_pred_x_flip, _batch_pred_y_flip = flip_vectors( - _batch_pred_x_flip, - _batch_pred_y_flip, - flip_indices=flip_indices) + _batch_pred_x_flip, _batch_pred_y_flip = flip_vectors(_batch_pred_x_flip, _batch_pred_y_flip, flip_indices=flip_indices) batch_pred_x = (_batch_pred_x + _batch_pred_x_flip) * 0.5 batch_pred_y = (_batch_pred_y + _batch_pred_y_flip) * 0.5 @@ -277,13 +273,15 @@ class SimCCHead(BaseHead): preds = self.decode((batch_pred_x, batch_pred_y)) - if test_cfg.get('output_heatmaps', False): + if test_cfg.get("output_heatmaps", False): rank, _ = get_dist_info() if rank == 0: - warnings.warn('The predicted simcc values are normalized for ' - 'visualization. This may cause discrepancy ' - 'between the keypoint scores and the 1D heatmaps' - '.') + warnings.warn( + "The predicted simcc values are normalized for " + "visualization. This may cause discrepancy " + "between the keypoint scores and the 1D heatmaps" + "." 
+ ) # normalize the predicted 1d distribution sigma = self.decoder.sigma @@ -297,13 +295,9 @@ class SimCCHead(BaseHead): y = batch_pred_y.reshape(B, K, -1, 1) # B, K, Wx, Wy batch_heatmaps = torch.matmul(y, x) - pred_fields = [ - PixelData(heatmaps=hm) for hm in batch_heatmaps.detach() - ] + pred_fields = [PixelData(heatmaps=hm) for hm in batch_heatmaps.detach()] - for pred_instances, pred_x, pred_y in zip(preds, - to_numpy(batch_pred_x), - to_numpy(batch_pred_y)): + for pred_instances, pred_x, pred_y in zip(preds, to_numpy(batch_pred_x), to_numpy(batch_pred_y)): pred_instances.keypoint_x_labels = pred_x[None] pred_instances.keypoint_y_labels = pred_y[None] @@ -322,19 +316,10 @@ class SimCCHead(BaseHead): pred_x, pred_y = self.forward(feats) - gt_x = torch.cat([ - d.gt_instance_labels.keypoint_x_labels for d in batch_data_samples - ], - dim=0) - gt_y = torch.cat([ - d.gt_instance_labels.keypoint_y_labels for d in batch_data_samples - ], - dim=0) + gt_x = torch.cat([d.gt_instance_labels.keypoint_x_labels for d in batch_data_samples], dim=0) + gt_y = torch.cat([d.gt_instance_labels.keypoint_y_labels for d in batch_data_samples], dim=0) keypoint_weights = torch.cat( - [ - d.gt_instance_labels.keypoint_weights - for d in batch_data_samples - ], + [d.gt_instance_labels.keypoint_weights for d in batch_data_samples], dim=0, ) @@ -363,9 +348,8 @@ class SimCCHead(BaseHead): @property def default_init_cfg(self): init_cfg = [ - dict( - type='Normal', layer=['Conv2d', 'ConvTranspose2d'], std=0.001), - dict(type='Constant', layer='BatchNorm2d', val=1), - dict(type='Normal', layer=['Linear'], std=0.01, bias=0), + dict(type="Normal", layer=["Conv2d", "ConvTranspose2d"], std=0.001), + dict(type="Constant", layer="BatchNorm2d", val=1), + dict(type="Normal", layer=["Linear"], std=0.01, bias=0), ] return init_cfg diff --git a/mmpose/models/heads/heatmap_heads/__init__.py b/mmpose/models/heads/heatmap_heads/__init__.py index c629455c195755ae1800e87390a56ab56d1dae96..327cf21fdf4a4fc70cb71f33c5f9cbe085b5f4de 100644 --- a/mmpose/models/heads/heatmap_heads/__init__.py +++ b/mmpose/models/heads/heatmap_heads/__init__.py @@ -7,7 +7,4 @@ from .internet_head import InternetHead from .mspn_head import MSPNHead from .vipnas_head import ViPNASHead -__all__ = [ - 'HeatmapHead', 'CPMHead', 'MSPNHead', 'ViPNASHead', - 'AssociativeEmbeddingHead', 'CIDHead', 'InternetHead' -] +__all__ = ["HeatmapHead", "CPMHead", "MSPNHead", "ViPNASHead", "AssociativeEmbeddingHead", "CIDHead", "InternetHead"] diff --git a/mmpose/models/heads/heatmap_heads/ae_head.py b/mmpose/models/heads/heatmap_heads/ae_head.py index c9559eebc2696fa0363ffdb4807c9a0e70d04e26..c7fd303410b863c12e11ff5135c377a37d9b7e0c 100644 --- a/mmpose/models/heads/heatmap_heads/ae_head.py +++ b/mmpose/models/heads/heatmap_heads/ae_head.py @@ -9,8 +9,8 @@ from torch import Tensor from mmpose.models.utils.tta import aggregate_heatmaps, flip_heatmaps from mmpose.registry import MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, Features, InstanceList, - OptConfigType, OptSampleList, Predictions) +from mmpose.utils.typing import ConfigType, Features, InstanceList, OptConfigType, OptSampleList, Predictions + from .heatmap_head import HeatmapHead OptIntSeq = Optional[Sequence[int]] @@ -19,29 +19,29 @@ OptIntSeq = Optional[Sequence[int]] @MODELS.register_module() class AssociativeEmbeddingHead(HeatmapHead): - def __init__(self, - in_channels: Union[int, Sequence[int]], - num_keypoints: int, - tag_dim: int = 1, - tag_per_keypoint: 
bool = True, - deconv_out_channels: OptIntSeq = (256, 256, 256), - deconv_kernel_sizes: OptIntSeq = (4, 4, 4), - conv_out_channels: OptIntSeq = None, - conv_kernel_sizes: OptIntSeq = None, - final_layer: dict = dict(kernel_size=1), - keypoint_loss: ConfigType = dict(type='KeypointMSELoss'), - tag_loss: ConfigType = dict(type='AssociativeEmbeddingLoss'), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + num_keypoints: int, + tag_dim: int = 1, + tag_per_keypoint: bool = True, + deconv_out_channels: OptIntSeq = (256, 256, 256), + deconv_kernel_sizes: OptIntSeq = (4, 4, 4), + conv_out_channels: OptIntSeq = None, + conv_kernel_sizes: OptIntSeq = None, + final_layer: dict = dict(kernel_size=1), + keypoint_loss: ConfigType = dict(type="KeypointMSELoss"), + tag_loss: ConfigType = dict(type="AssociativeEmbeddingLoss"), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if tag_per_keypoint: out_channels = num_keypoints * (1 + tag_dim) else: out_channels = num_keypoints + tag_dim - loss = dict( - type='CombinedLoss', - losses=dict(keypoint_loss=keypoint_loss, tag_loss=tag_loss)) + loss = dict(type="CombinedLoss", losses=dict(keypoint_loss=keypoint_loss, tag_loss=tag_loss)) super().__init__( in_channels=in_channels, @@ -53,16 +53,14 @@ class AssociativeEmbeddingHead(HeatmapHead): final_layer=final_layer, loss=loss, decoder=decoder, - init_cfg=init_cfg) + init_cfg=init_cfg, + ) self.num_keypoints = num_keypoints self.tag_dim = tag_dim self.tag_per_keypoint = tag_per_keypoint - def predict(self, - feats: Features, - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from features. 
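The channel arithmetic above fixes the head's output layout: with `tag_per_keypoint=True` every keypoint gets its own tag map(s), otherwise all keypoints share `tag_dim` maps. A sketch of the resulting split, with illustrative sizes:

```python
import torch

B, K, tag_dim, H, W = 2, 17, 1, 64, 48

out_channels = K * (1 + tag_dim)  # tag_per_keypoint=True -> 34 channels
output = torch.randn(B, out_channels, H, W)
heatmaps = output[:, :K]          # (B, 17, H, W) keypoint heatmaps
tags = output[:, K:]              # (B, 17, H, W) per-keypoint embedding maps

out_channels_shared = K + tag_dim  # tag_per_keypoint=False -> 18 channels
```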
Args: @@ -99,12 +97,12 @@ class AssociativeEmbeddingHead(HeatmapHead): - heatmaps (Tensor): The predicted heatmaps in shape (K, h, w) """ # test configs - multiscale_test = test_cfg.get('multiscale_test', False) - flip_test = test_cfg.get('flip_test', False) - shift_heatmap = test_cfg.get('shift_heatmap', False) - align_corners = test_cfg.get('align_corners', False) - restore_heatmap_size = test_cfg.get('restore_heatmap_size', False) - output_heatmaps = test_cfg.get('output_heatmaps', False) + multiscale_test = test_cfg.get("multiscale_test", False) + flip_test = test_cfg.get("flip_test", False) + shift_heatmap = test_cfg.get("shift_heatmap", False) + align_corners = test_cfg.get("align_corners", False) + restore_heatmap_size = test_cfg.get("restore_heatmap_size", False) + output_heatmaps = test_cfg.get("output_heatmaps", False) # enable multi-scale test if multiscale_test: @@ -116,9 +114,8 @@ class AssociativeEmbeddingHead(HeatmapHead): # resize heatmaps to align with with input size if restore_heatmap_size: - img_shape = batch_data_samples[0].metainfo['img_shape'] - assert all(d.metainfo['img_shape'] == img_shape - for d in batch_data_samples) + img_shape = batch_data_samples[0].metainfo["img_shape"] + assert all(d.metainfo["img_shape"] == img_shape for d in batch_data_samples) img_h, img_w = img_shape heatmap_size = (img_w, img_h) else: @@ -134,36 +131,24 @@ class AssociativeEmbeddingHead(HeatmapHead): else: # TTA: flip test assert isinstance(_feats, list) and len(_feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] # original _feats_orig, _feats_flip = _feats _heatmaps_orig, _tags_orig = self.forward(_feats_orig) # flipped _heatmaps_flip, _tags_flip = self.forward(_feats_flip) - _heatmaps_flip = flip_heatmaps( - _heatmaps_flip, - flip_mode='heatmap', - flip_indices=flip_indices, - shift_heatmap=shift_heatmap) - _tags_flip = self._flip_tags( - _tags_flip, - flip_indices=flip_indices, - shift_heatmap=shift_heatmap) + _heatmaps_flip = flip_heatmaps(_heatmaps_flip, flip_mode="heatmap", flip_indices=flip_indices, shift_heatmap=shift_heatmap) + _tags_flip = self._flip_tags(_tags_flip, flip_indices=flip_indices, shift_heatmap=shift_heatmap) # aggregated heatmaps _heatmaps = aggregate_heatmaps( - [_heatmaps_orig, _heatmaps_flip], - size=heatmap_size, - align_corners=align_corners, - mode='average') + [_heatmaps_orig, _heatmaps_flip], size=heatmap_size, align_corners=align_corners, mode="average" + ) # aggregated tags (only at original scale) if scale_idx == 0: - _tags = aggregate_heatmaps([_tags_orig, _tags_flip], - size=heatmap_size, - align_corners=align_corners, - mode='concat') + _tags = aggregate_heatmaps([_tags_orig, _tags_flip], size=heatmap_size, align_corners=align_corners, mode="concat") else: _tags = None @@ -172,10 +157,7 @@ class AssociativeEmbeddingHead(HeatmapHead): # aggregate multi-scale heatmaps if len(feats) > 1: - batch_heatmaps = aggregate_heatmaps( - multiscale_heatmaps, - align_corners=align_corners, - mode='average') + batch_heatmaps = aggregate_heatmaps(multiscale_heatmaps, align_corners=align_corners, mode="average") else: batch_heatmaps = multiscale_heatmaps[0] # only keep tags at original scale @@ -186,18 +168,14 @@ class AssociativeEmbeddingHead(HeatmapHead): if output_heatmaps: pred_fields = [] - for _heatmaps, _tags in zip(batch_heatmaps.detach(), - batch_tags.detach()): + for _heatmaps, _tags in zip(batch_heatmaps.detach(), batch_tags.detach()): 
pred_fields.append(PixelData(heatmaps=_heatmaps, tags=_tags)) return preds, pred_fields else: return preds - def _flip_tags(self, - tags: Tensor, - flip_indices: List[int], - shift_heatmap: bool = True): + def _flip_tags(self, tags: Tensor, flip_indices: List[int], shift_heatmap: bool = True): """Flip the tagging heatmaps horizontally for test-time augmentation. Args: @@ -227,8 +205,7 @@ class AssociativeEmbeddingHead(HeatmapHead): return tags - def decode(self, batch_outputs: Union[Tensor, - Tuple[Tensor]]) -> InstanceList: + def decode(self, batch_outputs: Union[Tensor, Tuple[Tensor]]) -> InstanceList: """Decode keypoints from outputs. Args: @@ -242,18 +219,18 @@ class AssociativeEmbeddingHead(HeatmapHead): def _pack_and_call(args, func): if not isinstance(args, tuple): - args = (args, ) + args = (args,) return func(*args) if self.decoder is None: raise RuntimeError( - f'The decoder has not been set in {self.__class__.__name__}. ' - 'Please set the decoder configs in the init parameters to ' - 'enable head methods `head.predict()` and `head.decode()`') + f"The decoder has not been set in {self.__class__.__name__}. " + "Please set the decoder configs in the init parameters to " + "enable head methods `head.predict()` and `head.decode()`" + ) if self.decoder.support_batch_decoding: - batch_keypoints, batch_scores, batch_instance_scores = \ - _pack_and_call(batch_outputs, self.decoder.batch_decode) + batch_keypoints, batch_scores, batch_instance_scores = _pack_and_call(batch_outputs, self.decoder.batch_decode) else: batch_output_np = to_numpy(batch_outputs, unzip=True) @@ -261,19 +238,14 @@ class AssociativeEmbeddingHead(HeatmapHead): batch_scores = [] batch_instance_scores = [] for outputs in batch_output_np: - keypoints, scores, instance_scores = _pack_and_call( - outputs, self.decoder.decode) + keypoints, scores, instance_scores = _pack_and_call(outputs, self.decoder.decode) batch_keypoints.append(keypoints) batch_scores.append(scores) batch_instance_scores.append(instance_scores) preds = [ - InstanceData( - bbox_scores=instance_scores, - keypoints=keypoints, - keypoint_scores=scores) - for keypoints, scores, instance_scores in zip( - batch_keypoints, batch_scores, batch_instance_scores) + InstanceData(bbox_scores=instance_scores, keypoints=keypoints, keypoint_scores=scores) + for keypoints, scores, instance_scores in zip(batch_keypoints, batch_scores, batch_instance_scores) ] return preds @@ -292,14 +264,11 @@ class AssociativeEmbeddingHead(HeatmapHead): """ output = super().forward(feats) - heatmaps = output[:, :self.num_keypoints] - tags = output[:, self.num_keypoints:] + heatmaps = output[:, : self.num_keypoints] + tags = output[:, self.num_keypoints :] return heatmaps, tags - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. 
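Tag maps need the same test-time treatment as heatmaps: mirror horizontally, swap left/right keypoint channels, and optionally shift one pixel to compensate for the flip. A minimal sketch of that idea, assuming COCO-style `flip_indices` (the real `_flip_tags` above additionally handles the shared-tag layout):

```python
import torch

def flip_tags_sketch(tags, flip_indices, shift_heatmap=True):
    tags = tags.flip(-1)                        # mirror along the width axis
    tags = tags[:, flip_indices]                # swap left/right keypoint channels
    if shift_heatmap:
        tags[..., 1:] = tags[..., :-1].clone()  # 1-px shift after flipping (assumed)
    return tags

tags = torch.randn(1, 17, 64, 48)
flip_indices = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
print(flip_tags_sketch(tags, flip_indices).shape)  # torch.Size([1, 17, 64, 48])
```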
Args: @@ -317,27 +286,15 @@ class AssociativeEmbeddingHead(HeatmapHead): if not self.tag_per_keypoint: pred_tags = pred_tags.repeat((1, self.num_keypoints, 1, 1)) - gt_heatmaps = torch.stack( - [d.gt_fields.heatmaps for d in batch_data_samples]) - gt_masks = torch.stack( - [d.gt_fields.heatmap_mask for d in batch_data_samples]) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) - keypoint_indices = [ - d.gt_instance_labels.keypoint_indices for d in batch_data_samples - ] + gt_heatmaps = torch.stack([d.gt_fields.heatmaps for d in batch_data_samples]) + gt_masks = torch.stack([d.gt_fields.heatmap_mask for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) + keypoint_indices = [d.gt_instance_labels.keypoint_indices for d in batch_data_samples] - loss_kpt = self.loss_module.keypoint_loss(pred_heatmaps, gt_heatmaps, - keypoint_weights, gt_masks) + loss_kpt = self.loss_module.keypoint_loss(pred_heatmaps, gt_heatmaps, keypoint_weights, gt_masks) - loss_pull, loss_push = self.loss_module.tag_loss( - pred_tags, keypoint_indices) + loss_pull, loss_push = self.loss_module.tag_loss(pred_tags, keypoint_indices) - losses = { - 'loss_kpt': loss_kpt, - 'loss_pull': loss_pull, - 'loss_push': loss_push - } + losses = {"loss_kpt": loss_kpt, "loss_pull": loss_pull, "loss_push": loss_push} return losses diff --git a/mmpose/models/heads/heatmap_heads/cid_head.py b/mmpose/models/heads/heatmap_heads/cid_head.py index 39e0211a3e135c1c101c14e37956528d3330ca1b..bf8dbe64a863890bc0447a92b59d7f356108166e 100644 --- a/mmpose/models/heads/heatmap_heads/cid_head.py +++ b/mmpose/models/heads/heatmap_heads/cid_head.py @@ -12,8 +12,8 @@ from torch import Tensor from mmpose.models.utils.tta import flip_heatmaps from mmpose.registry import KEYPOINT_CODECS, MODELS -from mmpose.utils.typing import (ConfigType, Features, OptConfigType, - OptSampleList, Predictions) +from mmpose.utils.typing import ConfigType, Features, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead @@ -27,8 +27,7 @@ def smooth_heatmaps(heatmaps: Tensor, blur_kernel_size: int) -> Tensor: Returns: Tensor: The smoothed heatmaps. 
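A quick numeric check of `smooth_heatmaps` as implemented just below: a stride-1 average pool blurs at the same resolution, and the result is averaged with the raw maps, which damps noise while keeping peak locations:

```python
import torch
import torch.nn.functional as F

heatmaps = torch.rand(1, 17, 64, 48)
k = 3  # blur_kernel_size
blurred = F.avg_pool2d(heatmaps, k, 1, (k - 1) // 2)  # same spatial size
smoothed = (heatmaps + blurred) / 2.0
assert smoothed.shape == heatmaps.shape
```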
""" - smoothed_heatmaps = torch.nn.functional.avg_pool2d( - heatmaps, blur_kernel_size, 1, (blur_kernel_size - 1) // 2) + smoothed_heatmaps = torch.nn.functional.avg_pool2d(heatmaps, blur_kernel_size, 1, (blur_kernel_size - 1) // 2) smoothed_heatmaps = (heatmaps + smoothed_heatmaps) / 2.0 return smoothed_heatmaps @@ -78,12 +77,7 @@ class IIAModule(BaseModule): ): super().__init__(init_cfg=init_cfg) - self.keypoint_root_conv = build_conv_layer( - dict( - type='Conv2d', - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1)) + self.keypoint_root_conv = build_conv_layer(dict(type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=1)) self.sigmoid = TruncSigmoid(min=clamp_delta, max=1 - clamp_delta) def forward(self, feats: Tensor): @@ -110,12 +104,10 @@ class IIAModule(BaseModule): w, h = [ind.squeeze(-1) for ind in indices.split(1, -1)] instance_feats = feats[:, :, h, w] instance_feats = instance_feats.permute(0, 2, 1) - instance_feats = instance_feats.reshape(-1, - instance_feats.shape[-1]) + instance_feats = instance_feats.reshape(-1, instance_feats.shape[-1]) else: - raise ValueError(f'`indices` should have 2 or 3 channels, ' - f'but got f{indices.shape[1]}') + raise ValueError(f"`indices` should have 2 or 3 channels, " f"but got f{indices.shape[1]}") return instance_feats def _hierarchical_pool(self, heatmaps: Tensor) -> Tensor: @@ -137,8 +129,7 @@ class IIAModule(BaseModule): maxm = torch.nn.functional.max_pool2d(heatmaps, 3, 1, 1) return maxm - def forward_train(self, feats: Tensor, instance_coords: Tensor, - instance_imgids: Tensor) -> Tuple[Tensor, Tensor]: + def forward_train(self, feats: Tensor, instance_coords: Tensor, instance_imgids: Tensor) -> Tuple[Tensor, Tensor]: """Forward pass during training. Args: @@ -157,9 +148,7 @@ class IIAModule(BaseModule): return instance_feats, heatmaps - def forward_test( - self, feats: Tensor, test_cfg: Dict - ) -> Tuple[Optional[Tensor], Optional[Tensor], Optional[Tensor]]: + def forward_test(self, feats: Tensor, test_cfg: Dict) -> Tuple[Optional[Tensor], Optional[Tensor], Optional[Tensor]]: """Forward pass during testing. Args: @@ -180,14 +169,14 @@ class IIAModule(BaseModule): coordinates, and scores of the instances. Any of these can be empty Tensor if no instances are extracted. """ - blur_kernel_size = test_cfg.get('blur_kernel_size', 3) - max_instances = test_cfg.get('max_instances', 30) - score_threshold = test_cfg.get('score_threshold', 0.01) + blur_kernel_size = test_cfg.get("blur_kernel_size", 3) + max_instances = test_cfg.get("max_instances", 30) + score_threshold = test_cfg.get("score_threshold", 0.01) H, W = feats.shape[-2:] # compute heatmaps heatmaps = self.forward(feats).narrow(1, -1, 1) - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): heatmaps = heatmaps.mean(dim=0, keepdims=True) smoothed_heatmaps = smooth_heatmaps(heatmaps, blur_kernel_size) @@ -243,7 +232,7 @@ class SpatialAttention(nn.Module): self.feat_stride = 4 self.conv = nn.Conv2d(3, 1, 5, 1, 2) - def _get_pixel_coords(self, heatmap_size: Tuple, device: str = 'cpu'): + def _get_pixel_coords(self, heatmap_size: Tuple, device: str = "cpu"): """Get pixel coordinates for each element in the heatmap. 
Args: @@ -260,8 +249,7 @@ class SpatialAttention(nn.Module): pixel_coords = pixel_coords.float().to(device) + 0.5 return pixel_coords - def forward(self, global_feats: Tensor, instance_feats: Tensor, - instance_coords: Tensor) -> Tensor: + def forward(self, global_feats: Tensor, instance_feats: Tensor, instance_coords: Tensor) -> Tensor: """Perform spatial attention. Args: @@ -281,8 +269,7 @@ class SpatialAttention(nn.Module): fsum = torch.sum(feats, dim=1, keepdim=True) pixel_coords = self._get_pixel_coords((W, H), feats.device) - relative_coords = instance_coords.reshape( - -1, 1, 2) - pixel_coords.reshape(1, -1, 2) + relative_coords = instance_coords.reshape(-1, 1, 2) - pixel_coords.reshape(1, -1, 2) relative_coords = relative_coords.permute(0, 2, 1) / 32.0 relative_coords = relative_coords.reshape(B, 2, H, W) @@ -316,27 +303,12 @@ class GFDModule(BaseModule): ): super().__init__(init_cfg=init_cfg) - self.conv_down = build_conv_layer( - dict( - type='Conv2d', - in_channels=in_channels, - out_channels=gfd_channels, - kernel_size=1)) + self.conv_down = build_conv_layer(dict(type="Conv2d", in_channels=in_channels, out_channels=gfd_channels, kernel_size=1)) self.channel_attention = ChannelAttention(in_channels, gfd_channels) self.spatial_attention = SpatialAttention(in_channels, gfd_channels) - self.fuse_attention = build_conv_layer( - dict( - type='Conv2d', - in_channels=gfd_channels * 2, - out_channels=gfd_channels, - kernel_size=1)) - self.heatmap_conv = build_conv_layer( - dict( - type='Conv2d', - in_channels=gfd_channels, - out_channels=out_channels, - kernel_size=1)) + self.fuse_attention = build_conv_layer(dict(type="Conv2d", in_channels=gfd_channels * 2, out_channels=gfd_channels, kernel_size=1)) + self.heatmap_conv = build_conv_layer(dict(type="Conv2d", in_channels=gfd_channels, out_channels=out_channels, kernel_size=1)) self.sigmoid = TruncSigmoid(min=clamp_delta, max=1 - clamp_delta) def forward( @@ -364,10 +336,9 @@ class GFDModule(BaseModule): global_feats = self.conv_down(feats) global_feats = global_feats[instance_imgids] cond_instance_feats = torch.cat( - (self.channel_attention(global_feats, instance_feats), - self.spatial_attention(global_feats, instance_feats, - instance_coords)), - dim=1) + (self.channel_attention(global_feats, instance_feats), self.spatial_attention(global_feats, instance_feats, instance_coords)), + dim=1, + ) cond_instance_feats = self.fuse_attention(cond_instance_feats) cond_instance_feats = torch.nn.functional.relu(cond_instance_feats) @@ -409,20 +380,21 @@ class CIDHead(BaseHead): Contextual_Instance_Decoupling_for_Robust_Multi-Person_Pose_Estimation_ CVPR_2022_paper.html """ + _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - gfd_channels: int, - num_keypoints: int, - prior_prob: float = 0.01, - coupled_heatmap_loss: OptConfigType = dict( - type='FocalHeatmapLoss'), - decoupled_heatmap_loss: OptConfigType = dict( - type='FocalHeatmapLoss'), - contrastive_loss: OptConfigType = dict(type='InfoNCELoss'), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + gfd_channels: int, + num_keypoints: int, + prior_prob: float = 0.01, + coupled_heatmap_loss: OptConfigType = dict(type="FocalHeatmapLoss"), + decoupled_heatmap_loss: OptConfigType = dict(type="FocalHeatmapLoss"), + contrastive_loss: OptConfigType = dict(type="InfoNCELoss"), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = 
self.default_init_cfg @@ -441,32 +413,30 @@ class CIDHead(BaseHead): self.iia_module = IIAModule( in_channels, num_keypoints + 1, - init_cfg=init_cfg + [ + init_cfg=init_cfg + + [ dict( - type='Normal', - layer=['Conv2d', 'Linear'], + type="Normal", + layer=["Conv2d", "Linear"], std=0.001, - override=dict( - name='keypoint_root_conv', - type='Normal', - std=0.001, - bias=bias_value)) - ]) + override=dict(name="keypoint_root_conv", type="Normal", std=0.001, bias=bias_value), + ) + ], + ) self.gfd_module = GFDModule( in_channels, num_keypoints, gfd_channels, - init_cfg=init_cfg + [ + init_cfg=init_cfg + + [ dict( - type='Normal', - layer=['Conv2d', 'Linear'], + type="Normal", + layer=["Conv2d", "Linear"], std=0.001, - override=dict( - name='heatmap_conv', - type='Normal', - std=0.001, - bias=bias_value)) - ]) + override=dict(name="heatmap_conv", type="Normal", std=0.001, bias=bias_value), + ) + ], + ) # build losses self.loss_module = ModuleDict( @@ -474,17 +444,15 @@ class CIDHead(BaseHead): heatmap_coupled=MODELS.build(coupled_heatmap_loss), heatmap_decoupled=MODELS.build(decoupled_heatmap_loss), contrastive=MODELS.build(contrastive_loss), - )) + ) + ) # Register the hook to automatically convert old version state dicts self._register_load_state_dict_pre_hook(self._load_state_dict_pre_hook) @property def default_init_cfg(self): - init_cfg = [ - dict(type='Normal', layer=['Conv2d', 'Linear'], std=0.001), - dict(type='Constant', layer='BatchNorm2d', val=1) - ] + init_cfg = [dict(type="Normal", layer=["Conv2d", "Linear"], std=0.001), dict(type="Constant", layer="BatchNorm2d", val=1)] return init_cfg def forward(self, feats: Tuple[Tensor]) -> Tensor: @@ -500,17 +468,12 @@ class CIDHead(BaseHead): feats = feats[-1] instance_info = self.iia_module.forward_test(feats, {}) instance_feats, instance_coords, instance_scores = instance_info - instance_imgids = torch.zeros( - instance_coords.size(0), dtype=torch.long, device=feats.device) - instance_heatmaps = self.gfd_module(feats, instance_feats, - instance_coords, instance_imgids) + instance_imgids = torch.zeros(instance_coords.size(0), dtype=torch.long, device=feats.device) + instance_heatmaps = self.gfd_module(feats, instance_feats, instance_coords, instance_imgids) return instance_heatmaps - def predict(self, - feats: Features, - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from features. 
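The `bias=bias_value` overrides above are derived from `prior_prob`. Assuming the usual focal-loss prior initialisation (the formula itself sits outside the visible hunk), the final-layer bias is chosen so that sigmoid outputs start near the prior, i.e. the heatmaps begin almost empty:

```python
import math
import torch

prior_prob = 0.01  # rare-positive prior; focal-loss-style init (assumed)
bias_value = -math.log((1 - prior_prob) / prior_prob)
print(torch.sigmoid(torch.tensor(bias_value)))  # tensor(0.0100)
```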
Args: @@ -542,7 +505,7 @@ class CIDHead(BaseHead): """ metainfo = batch_data_samples[0].metainfo - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): assert isinstance(feats, list) and len(feats) == 2 feats_flipped = flip_heatmaps(feats[1][-1], shift_heatmap=False) @@ -553,57 +516,35 @@ class CIDHead(BaseHead): instance_info = self.iia_module.forward_test(feats, test_cfg) instance_feats, instance_coords, instance_scores = instance_info if len(instance_coords) > 0: - instance_imgids = torch.zeros( - instance_coords.size(0), dtype=torch.long, device=feats.device) - if test_cfg.get('flip_test', False): + instance_imgids = torch.zeros(instance_coords.size(0), dtype=torch.long, device=feats.device) + if test_cfg.get("flip_test", False): instance_coords = torch.cat((instance_coords, instance_coords)) - instance_imgids = torch.cat( - (instance_imgids, instance_imgids + 1)) - instance_heatmaps = self.gfd_module(feats, instance_feats, - instance_coords, - instance_imgids) - if test_cfg.get('flip_test', False): - flip_indices = batch_data_samples[0].metainfo['flip_indices'] - instance_heatmaps, instance_heatmaps_flip = torch.chunk( - instance_heatmaps, 2, dim=0) - instance_heatmaps_flip = \ - instance_heatmaps_flip[:, flip_indices, :, :] - instance_heatmaps = (instance_heatmaps + - instance_heatmaps_flip) / 2.0 - instance_heatmaps = smooth_heatmaps( - instance_heatmaps, test_cfg.get('blur_kernel_size', 3)) + instance_imgids = torch.cat((instance_imgids, instance_imgids + 1)) + instance_heatmaps = self.gfd_module(feats, instance_feats, instance_coords, instance_imgids) + if test_cfg.get("flip_test", False): + flip_indices = batch_data_samples[0].metainfo["flip_indices"] + instance_heatmaps, instance_heatmaps_flip = torch.chunk(instance_heatmaps, 2, dim=0) + instance_heatmaps_flip = instance_heatmaps_flip[:, flip_indices, :, :] + instance_heatmaps = (instance_heatmaps + instance_heatmaps_flip) / 2.0 + instance_heatmaps = smooth_heatmaps(instance_heatmaps, test_cfg.get("blur_kernel_size", 3)) preds = self.decode((instance_heatmaps, instance_scores[:, None])) preds = InstanceData.cat(preds) - preds.keypoints[..., 0] += metainfo['input_size'][ - 0] / instance_heatmaps.shape[-1] / 2.0 - preds.keypoints[..., 1] += metainfo['input_size'][ - 1] / instance_heatmaps.shape[-2] / 2.0 + preds.keypoints[..., 0] += metainfo["input_size"][0] / instance_heatmaps.shape[-1] / 2.0 + preds.keypoints[..., 1] += metainfo["input_size"][1] / instance_heatmaps.shape[-2] / 2.0 preds = [preds] else: - preds = [ - InstanceData( - keypoints=np.empty((0, self.num_keypoints, 2)), - keypoint_scores=np.empty((0, self.num_keypoints))) - ] - instance_heatmaps = torch.empty(0, self.num_keypoints, - *feats.shape[-2:]) - - if test_cfg.get('output_heatmaps', False): - pred_fields = [ - PixelData( - heatmaps=instance_heatmaps.reshape( - -1, *instance_heatmaps.shape[-2:])) - ] + preds = [InstanceData(keypoints=np.empty((0, self.num_keypoints, 2)), keypoint_scores=np.empty((0, self.num_keypoints)))] + instance_heatmaps = torch.empty(0, self.num_keypoints, *feats.shape[-2:]) + + if test_cfg.get("output_heatmaps", False): + pred_fields = [PixelData(heatmaps=instance_heatmaps.reshape(-1, *instance_heatmaps.shape[-2:]))] return preds, pred_fields else: return preds - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch 
of inputs and data samples. Args: @@ -625,17 +566,12 @@ class CIDHead(BaseHead): gt_heatmaps.append(d.gt_fields.heatmaps) gt_instance_coords.append(d.gt_instance_labels.instance_coords) keypoint_weights.append(d.gt_instance_labels.keypoint_weights) - instance_imgids.append( - torch.ones( - len(d.gt_instance_labels.instance_coords), - dtype=torch.long) * i) - - instance_heatmaps = d.gt_fields.instance_heatmaps.reshape( - -1, self.num_keypoints, - *d.gt_fields.instance_heatmaps.shape[1:]) + instance_imgids.append(torch.ones(len(d.gt_instance_labels.instance_coords), dtype=torch.long) * i) + + instance_heatmaps = d.gt_fields.instance_heatmaps.reshape(-1, self.num_keypoints, *d.gt_fields.instance_heatmaps.shape[1:]) gt_instance_heatmaps.append(instance_heatmaps) - if 'heatmap_mask' in d.gt_fields: + if "heatmap_mask" in d.gt_fields: heatmap_mask.append(d.gt_fields.heatmap_mask) gt_heatmaps = torch.stack(gt_heatmaps) @@ -648,25 +584,19 @@ class CIDHead(BaseHead): # feed-forward feats = feats[-1] - pred_instance_feats, pred_heatmaps = self.iia_module.forward_train( - feats, gt_instance_coords, instance_imgids) + pred_instance_feats, pred_heatmaps = self.iia_module.forward_train(feats, gt_instance_coords, instance_imgids) # conpute contrastive loss contrastive_loss = 0 for i in range(len(batch_data_samples)): pred_instance_feat = pred_instance_feats[instance_imgids == i] - contrastive_loss += self.loss_module['contrastive']( - pred_instance_feat) + contrastive_loss += self.loss_module["contrastive"](pred_instance_feat) contrastive_loss = contrastive_loss / max(1, len(instance_imgids)) # limit the number of instances - max_train_instances = train_cfg.get('max_train_instances', -1) - if (max_train_instances > 0 - and len(instance_imgids) > max_train_instances): - selected_indices = torch.randperm( - len(instance_imgids), - device=gt_heatmaps.device, - dtype=torch.long)[:max_train_instances] + max_train_instances = train_cfg.get("max_train_instances", -1) + if max_train_instances > 0 and len(instance_imgids) > max_train_instances: + selected_indices = torch.randperm(len(instance_imgids), device=gt_heatmaps.device, dtype=torch.long)[:max_train_instances] gt_instance_coords = gt_instance_coords[selected_indices] keypoint_weights = keypoint_weights[selected_indices] gt_instance_heatmaps = gt_instance_heatmaps[selected_indices] @@ -674,70 +604,62 @@ class CIDHead(BaseHead): pred_instance_feats = pred_instance_feats[selected_indices] # calculate the decoupled heatmaps for each instance - pred_instance_heatmaps = self.gfd_module(feats, pred_instance_feats, - gt_instance_coords, - instance_imgids) + pred_instance_heatmaps = self.gfd_module(feats, pred_instance_feats, gt_instance_coords, instance_imgids) # calculate losses - losses = { - 'loss/heatmap_coupled': - self.loss_module['heatmap_coupled'](pred_heatmaps, gt_heatmaps, - None, heatmap_mask) - } + losses = {"loss/heatmap_coupled": self.loss_module["heatmap_coupled"](pred_heatmaps, gt_heatmaps, None, heatmap_mask)} if len(instance_imgids) > 0: - losses.update({ - 'loss/heatmap_decoupled': - self.loss_module['heatmap_decoupled'](pred_instance_heatmaps, - gt_instance_heatmaps, - keypoint_weights), - 'loss/contrastive': - contrastive_loss - }) + losses.update( + { + "loss/heatmap_decoupled": self.loss_module["heatmap_decoupled"]( + pred_instance_heatmaps, gt_instance_heatmaps, keypoint_weights + ), + "loss/contrastive": contrastive_loss, + } + ) return losses - def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, - **kwargs): + def 
_load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, **kwargs): """A hook function to convert old-version state dict of :class:`CIDHead` (before MMPose v1.0.0) to a compatible format of :class:`CIDHead`. The hook will be automatically registered during initialization. """ - version = local_meta.get('version', None) + version = local_meta.get("version", None) if version and version >= self._version: return # convert old-version state dict keys = list(state_dict.keys()) for k in keys: - if 'keypoint_center_conv' in k: + if "keypoint_center_conv" in k: v = state_dict.pop(k) - k = k.replace('keypoint_center_conv', - 'iia_module.keypoint_root_conv') + k = k.replace("keypoint_center_conv", "iia_module.keypoint_root_conv") state_dict[k] = v - if 'conv_down' in k: + if "conv_down" in k: v = state_dict.pop(k) - k = k.replace('conv_down', 'gfd_module.conv_down') + k = k.replace("conv_down", "gfd_module.conv_down") state_dict[k] = v - if 'c_attn' in k: + if "c_attn" in k: v = state_dict.pop(k) - k = k.replace('c_attn', 'gfd_module.channel_attention') + k = k.replace("c_attn", "gfd_module.channel_attention") state_dict[k] = v - if 's_attn' in k: + if "s_attn" in k: v = state_dict.pop(k) - k = k.replace('s_attn', 'gfd_module.spatial_attention') + k = k.replace("s_attn", "gfd_module.spatial_attention") state_dict[k] = v - if 'fuse_attn' in k: + if "fuse_attn" in k: v = state_dict.pop(k) - k = k.replace('fuse_attn', 'gfd_module.fuse_attention') + k = k.replace("fuse_attn", "gfd_module.fuse_attention") state_dict[k] = v - if 'heatmap_conv' in k: + if "heatmap_conv" in k: v = state_dict.pop(k) - k = k.replace('heatmap_conv', 'gfd_module.heatmap_conv') + k = k.replace("heatmap_conv", "gfd_module.heatmap_conv") state_dict[k] = v diff --git a/mmpose/models/heads/heatmap_heads/cpm_head.py b/mmpose/models/heads/heatmap_heads/cpm_head.py index 1ba46357ec5cf72b29b43635a53354f2ed2fd048..fcd6b64e01434abcb3a73df0bc788e81f123a801 100644 --- a/mmpose/models/heads/heatmap_heads/cpm_head.py +++ b/mmpose/models/heads/heatmap_heads/cpm_head.py @@ -10,8 +10,8 @@ from mmpose.evaluation.functional import pose_pck_accuracy from mmpose.models.utils.tta import flip_heatmaps from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (Features, MultiConfig, OptConfigType, - OptSampleList, Predictions) +from mmpose.utils.typing import Features, MultiConfig, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @@ -51,17 +51,18 @@ class CPMHead(BaseHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - out_channels: int, - num_stages: int, - deconv_out_channels: OptIntSeq = None, - deconv_kernel_sizes: OptIntSeq = None, - final_layer: dict = dict(kernel_size=1), - loss: MultiConfig = dict( - type='KeypointMSELoss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + out_channels: int, + num_stages: int, + deconv_out_channels: OptIntSeq = None, + deconv_kernel_sizes: OptIntSeq = None, + final_layer: dict = dict(kernel_size=1), + loss: MultiConfig = dict(type="KeypointMSELoss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -73,11 +74,8 @@ class CPMHead(BaseHead): if isinstance(loss, list): if len(loss) != num_stages: - raise ValueError( - f'The 
length of loss_module({len(loss)}) did not match ' - f'`num_stages`({num_stages})') - self.loss_module = nn.ModuleList( - MODELS.build(_loss) for _loss in loss) + raise ValueError(f"The length of loss_module({len(loss)}) did not match " f"`num_stages`({num_stages})") + self.loss_module = nn.ModuleList(MODELS.build(_loss) for _loss in loss) else: self.loss_module = MODELS.build(loss) @@ -89,13 +87,13 @@ class CPMHead(BaseHead): # build multi-stage deconv layers self.multi_deconv_layers = nn.ModuleList([]) if deconv_out_channels: - if deconv_kernel_sizes is None or len(deconv_out_channels) != len( - deconv_kernel_sizes): + if deconv_kernel_sizes is None or len(deconv_out_channels) != len(deconv_kernel_sizes): raise ValueError( '"deconv_out_channels" and "deconv_kernel_sizes" should ' - 'be integer sequences with the same length. Got ' - f'mismatched lengths {deconv_out_channels} and ' - f'{deconv_kernel_sizes}') + "be integer sequences with the same length. Got " + f"mismatched lengths {deconv_out_channels} and " + f"{deconv_kernel_sizes}" + ) for _ in range(self.num_stages): deconv_layers = self._make_deconv_layers( @@ -112,11 +110,7 @@ class CPMHead(BaseHead): # build multi-stage final layers self.multi_final_layers = nn.ModuleList([]) if final_layer is not None: - cfg = dict( - type='Conv2d', - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1) + cfg = dict(type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=1) cfg.update(final_layer) for _ in range(self.num_stages): self.multi_final_layers.append(build_conv_layer(cfg)) @@ -126,21 +120,14 @@ class CPMHead(BaseHead): @property def default_init_cfg(self): - init_cfg = [ - dict( - type='Normal', layer=['Conv2d', 'ConvTranspose2d'], std=0.001), - dict(type='Constant', layer='BatchNorm2d', val=1) - ] + init_cfg = [dict(type="Normal", layer=["Conv2d", "ConvTranspose2d"], std=0.001), dict(type="Constant", layer="BatchNorm2d", val=1)] return init_cfg - def _make_deconv_layers(self, in_channels: int, - layer_out_channels: Sequence[int], - layer_kernel_sizes: Sequence[int]) -> nn.Module: + def _make_deconv_layers(self, in_channels: int, layer_out_channels: Sequence[int], layer_kernel_sizes: Sequence[int]) -> nn.Module: """Create deconvolutional layers by given parameters.""" layers = [] - for out_channels, kernel_size in zip(layer_out_channels, - layer_kernel_sizes): + for out_channels, kernel_size in zip(layer_out_channels, layer_kernel_sizes): if kernel_size == 4: padding = 1 output_padding = 0 @@ -151,18 +138,17 @@ class CPMHead(BaseHead): padding = 0 output_padding = 0 else: - raise ValueError(f'Unsupported kernel size {kernel_size} for' - 'deconvlutional layers in ' - f'{self.__class__.__name__}') + raise ValueError(f"Unsupported kernel size {kernel_size} for" "deconvlutional layers in " f"{self.__class__.__name__}") cfg = dict( - type='deconv', + type="deconv", in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=2, padding=padding, output_padding=output_padding, - bias=False) + bias=False, + ) layers.append(build_upsample_layer(cfg)) layers.append(nn.BatchNorm2d(num_features=out_channels)) layers.append(nn.ReLU(inplace=True)) @@ -181,9 +167,7 @@ class CPMHead(BaseHead): List[Tensor]: A list of output heatmaps from multiple stages. 
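CPM-style heads supervise every stage against the same target and accumulate the per-stage losses into a single `loss_kpt`, as the loss method after this hunk does. A toy version of that accumulation, with MSE standing in for the configured keypoint loss:

```python
import torch
import torch.nn.functional as F

num_stages = 3
preds = [torch.rand(2, 17, 64, 48) for _ in range(num_stages)]  # one map per stage
gt = torch.rand(2, 17, 64, 48)

loss_kpt = sum(F.mse_loss(p, gt) for p in preds)  # same target at every stage
```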
""" out = [] - assert len(feats) == self.num_stages, ( - f'The length of feature maps did not match the ' - f'`num_stages` in {self.__class__.__name__}') + assert len(feats) == self.num_stages, f"The length of feature maps did not match the " f"`num_stages` in {self.__class__.__name__}" for i in range(self.num_stages): y = self.multi_deconv_layers[i](feats[i]) y = self.multi_final_layers[i](y) @@ -191,10 +175,7 @@ class CPMHead(BaseHead): return out - def predict(self, - feats: Features, - batch_data_samples: OptSampleList, - test_cfg: OptConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: OptConfigType = {}) -> Predictions: """Predict results from multi-stage feature maps. Args: @@ -225,17 +206,18 @@ class CPMHead(BaseHead): - heatmaps (Tensor): The predicted heatmaps in shape (K, h, w) """ - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _batch_heatmaps = self.forward(_feats)[-1] _batch_heatmaps_flip = flip_heatmaps( self.forward(_feats_flip)[-1], - flip_mode=test_cfg.get('flip_mode', 'heatmap'), + flip_mode=test_cfg.get("flip_mode", "heatmap"), flip_indices=flip_indices, - shift_heatmap=test_cfg.get('shift_heatmap', False)) + shift_heatmap=test_cfg.get("shift_heatmap", False), + ) batch_heatmaps = (_batch_heatmaps + _batch_heatmaps_flip) * 0.5 else: multi_stage_heatmaps = self.forward(feats) @@ -243,18 +225,13 @@ class CPMHead(BaseHead): preds = self.decode(batch_heatmaps) - if test_cfg.get('output_heatmaps', False): - pred_fields = [ - PixelData(heatmaps=hm) for hm in batch_heatmaps.detach() - ] + if test_cfg.get("output_heatmaps", False): + pred_fields = [PixelData(heatmaps=hm) for hm in batch_heatmaps.detach()] return preds, pred_fields else: return preds - def loss(self, - feats: Sequence[Tensor], - batch_data_samples: OptSampleList, - train_cfg: OptConfigType = {}) -> dict: + def loss(self, feats: Sequence[Tensor], batch_data_samples: OptSampleList, train_cfg: OptConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. 
Args: @@ -269,11 +246,8 @@ class CPMHead(BaseHead): """ multi_stage_pred_heatmaps = self.forward(feats) - gt_heatmaps = torch.stack( - [d.gt_fields.heatmaps for d in batch_data_samples]) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) + gt_heatmaps = torch.stack([d.gt_fields.heatmaps for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) # calculate losses over multiple stages losses = dict() @@ -287,19 +261,17 @@ class CPMHead(BaseHead): # the `gt_heatmaps` and `keypoint_weights` used to calculate loss # for different stages are the same - loss_i = loss_func(multi_stage_pred_heatmaps[i], gt_heatmaps, - keypoint_weights) + loss_i = loss_func(multi_stage_pred_heatmaps[i], gt_heatmaps, keypoint_weights) - if 'loss_kpt' not in losses: - losses['loss_kpt'] = loss_i + if "loss_kpt" not in losses: + losses["loss_kpt"] = loss_i else: - losses['loss_kpt'] += loss_i + losses["loss_kpt"] += loss_i # calculate accuracy _, avg_acc, _ = pose_pck_accuracy( - output=to_numpy(multi_stage_pred_heatmaps[-1]), - target=to_numpy(gt_heatmaps), - mask=to_numpy(keypoint_weights) > 0) + output=to_numpy(multi_stage_pred_heatmaps[-1]), target=to_numpy(gt_heatmaps), mask=to_numpy(keypoint_weights) > 0 + ) acc_pose = torch.tensor(avg_acc, device=gt_heatmaps.device) losses.update(acc_pose=acc_pose) diff --git a/mmpose/models/heads/heatmap_heads/heatmap_head.py b/mmpose/models/heads/heatmap_heads/heatmap_head.py index ccb10fcf546243a7b3f013f79806a91d180f1da5..27c4ddb3ae0b88be36ca10db6e846a3abd18afb2 100644 --- a/mmpose/models/heads/heatmap_heads/heatmap_head.py +++ b/mmpose/models/heads/heatmap_heads/heatmap_head.py @@ -10,8 +10,8 @@ from mmpose.evaluation.functional import pose_pck_accuracy from mmpose.models.utils.tta import flip_heatmaps from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, Features, OptConfigType, - OptSampleList, Predictions) +from mmpose.utils.typing import ConfigType, Features, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @@ -54,18 +54,19 @@ class HeatmapHead(BaseHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - out_channels: int, - deconv_out_channels: OptIntSeq = (256, 256, 256), - deconv_kernel_sizes: OptIntSeq = (4, 4, 4), - conv_out_channels: OptIntSeq = None, - conv_kernel_sizes: OptIntSeq = None, - final_layer: dict = dict(kernel_size=1), - loss: ConfigType = dict( - type='KeypointMSELoss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + out_channels: int, + deconv_out_channels: OptIntSeq = (256, 256, 256), + deconv_kernel_sizes: OptIntSeq = (4, 4, 4), + conv_out_channels: OptIntSeq = None, + conv_kernel_sizes: OptIntSeq = None, + final_layer: dict = dict(kernel_size=1), + loss: ConfigType = dict(type="KeypointMSELoss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -81,13 +82,13 @@ class HeatmapHead(BaseHead): self.decoder = None if deconv_out_channels: - if deconv_kernel_sizes is None or len(deconv_out_channels) != len( - deconv_kernel_sizes): + if deconv_kernel_sizes is None or len(deconv_out_channels) != len(deconv_kernel_sizes): raise ValueError( 
'"deconv_out_channels" and "deconv_kernel_sizes" should ' - 'be integer sequences with the same length. Got ' - f'mismatched lengths {deconv_out_channels} and ' - f'{deconv_kernel_sizes}') + "be integer sequences with the same length. Got " + f"mismatched lengths {deconv_out_channels} and " + f"{deconv_kernel_sizes}" + ) self.deconv_layers = self._make_deconv_layers( in_channels=in_channels, @@ -99,28 +100,23 @@ class HeatmapHead(BaseHead): self.deconv_layers = nn.Identity() if conv_out_channels: - if conv_kernel_sizes is None or len(conv_out_channels) != len( - conv_kernel_sizes): + if conv_kernel_sizes is None or len(conv_out_channels) != len(conv_kernel_sizes): raise ValueError( '"conv_out_channels" and "conv_kernel_sizes" should ' - 'be integer sequences with the same length. Got ' - f'mismatched lengths {conv_out_channels} and ' - f'{conv_kernel_sizes}') + "be integer sequences with the same length. Got " + f"mismatched lengths {conv_out_channels} and " + f"{conv_kernel_sizes}" + ) self.conv_layers = self._make_conv_layers( - in_channels=in_channels, - layer_out_channels=conv_out_channels, - layer_kernel_sizes=conv_kernel_sizes) + in_channels=in_channels, layer_out_channels=conv_out_channels, layer_kernel_sizes=conv_kernel_sizes + ) in_channels = conv_out_channels[-1] else: self.conv_layers = nn.Identity() if final_layer is not None: - cfg = dict( - type='Conv2d', - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1) + cfg = dict(type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=1) cfg.update(final_layer) self.final_layer = build_conv_layer(cfg) else: @@ -129,22 +125,15 @@ class HeatmapHead(BaseHead): # Register the hook to automatically convert old version state dicts self._register_load_state_dict_pre_hook(self._load_state_dict_pre_hook) - def _make_conv_layers(self, in_channels: int, - layer_out_channels: Sequence[int], - layer_kernel_sizes: Sequence[int]) -> nn.Module: + def _make_conv_layers(self, in_channels: int, layer_out_channels: Sequence[int], layer_kernel_sizes: Sequence[int]) -> nn.Module: """Create convolutional layers by given parameters.""" layers = [] - for out_channels, kernel_size in zip(layer_out_channels, - layer_kernel_sizes): + for out_channels, kernel_size in zip(layer_out_channels, layer_kernel_sizes): padding = (kernel_size - 1) // 2 cfg = dict( - type='Conv2d', - in_channels=in_channels, - out_channels=out_channels, - kernel_size=kernel_size, - stride=1, - padding=padding) + type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=1, padding=padding + ) layers.append(build_conv_layer(cfg)) layers.append(nn.BatchNorm2d(num_features=out_channels)) layers.append(nn.ReLU(inplace=True)) @@ -152,14 +141,11 @@ class HeatmapHead(BaseHead): return nn.Sequential(*layers) - def _make_deconv_layers(self, in_channels: int, - layer_out_channels: Sequence[int], - layer_kernel_sizes: Sequence[int]) -> nn.Module: + def _make_deconv_layers(self, in_channels: int, layer_out_channels: Sequence[int], layer_kernel_sizes: Sequence[int]) -> nn.Module: """Create deconvolutional layers by given parameters.""" layers = [] - for out_channels, kernel_size in zip(layer_out_channels, - layer_kernel_sizes): + for out_channels, kernel_size in zip(layer_out_channels, layer_kernel_sizes): if kernel_size == 4: padding = 1 output_padding = 0 @@ -170,18 +156,17 @@ class HeatmapHead(BaseHead): padding = 0 output_padding = 0 else: - raise ValueError(f'Unsupported kernel size {kernel_size} for' - 
'deconvlutional layers in ' - f'{self.__class__.__name__}') + raise ValueError(f"Unsupported kernel size {kernel_size} for deconvolutional layers in {self.__class__.__name__}") cfg = dict( - type='deconv', + type="deconv", in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=2, padding=padding, output_padding=output_padding, - bias=False) + bias=False, + ) layers.append(build_upsample_layer(cfg)) layers.append(nn.BatchNorm2d(num_features=out_channels)) layers.append(nn.ReLU(inplace=True)) @@ -191,11 +176,7 @@ class HeatmapHead(BaseHead): @property def default_init_cfg(self): - init_cfg = [ - dict( - type='Normal', layer=['Conv2d', 'ConvTranspose2d'], std=0.001), - dict(type='Constant', layer='BatchNorm2d', val=1) - ] + init_cfg = [dict(type="Normal", layer=["Conv2d", "ConvTranspose2d"], std=0.001), dict(type="Constant", layer="BatchNorm2d", val=1)] return init_cfg def forward(self, feats: Tuple[Tensor]) -> Tensor: @@ -216,10 +197,7 @@ class HeatmapHead(BaseHead): return x - def predict(self, - feats: Features, - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from features. Args: @@ -250,35 +228,31 @@ class HeatmapHead(BaseHead): - heatmaps (Tensor): The predicted heatmaps in shape (K, h, w) """ - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _batch_heatmaps = self.forward(_feats) _batch_heatmaps_flip = flip_heatmaps( self.forward(_feats_flip), - flip_mode=test_cfg.get('flip_mode', 'heatmap'), + flip_mode=test_cfg.get("flip_mode", "heatmap"), flip_indices=flip_indices, - shift_heatmap=test_cfg.get('shift_heatmap', False)) + shift_heatmap=test_cfg.get("shift_heatmap", False), + ) batch_heatmaps = (_batch_heatmaps + _batch_heatmaps_flip) * 0.5 else: batch_heatmaps = self.forward(feats) preds = self.decode(batch_heatmaps) - if test_cfg.get('output_heatmaps', False): - pred_fields = [ - PixelData(heatmaps=hm) for hm in batch_heatmaps.detach() - ] + if test_cfg.get("output_heatmaps", False): + pred_fields = [PixelData(heatmaps=hm) for hm in batch_heatmaps.detach()] return preds, pred_fields else: return preds - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. Args: @@ -292,11 +266,8 @@ class HeatmapHead(BaseHead): dict: A dictionary of losses.
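`HeatmapHead.predict()` above implements flip-test TTA: heatmaps predicted on the mirrored image are un-flipped with `flip_heatmaps` and averaged with the originals, `(h + h_flip) * 0.5`. A self-contained sketch of the same fusion rule; `naive_flip_heatmaps` is a simplified stand-in for mmpose's `flip_heatmaps`, and the three-keypoint pairing is invented for the example:

```python
import torch

def naive_flip_heatmaps(heatmaps: torch.Tensor, flip_indices: list) -> torch.Tensor:
    # Mirror horizontally, then swap chirally paired channels (e.g. left/right wrist).
    return heatmaps.flip(-1)[:, flip_indices]

flip_indices = [0, 2, 1]                    # keypoint 0 is its own mirror; 1 <-> 2
heatmaps = torch.rand(2, 3, 64, 48)         # predictions on the original image
heatmaps_flip = torch.rand(2, 3, 64, 48)    # predictions on the mirrored image

# Same averaging as in predict(): undo the flip, then take the mean.
fused = (heatmaps + naive_flip_heatmaps(heatmaps_flip, flip_indices)) * 0.5
```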
""" pred_fields = self.forward(feats) - gt_heatmaps = torch.stack( - [d.gt_fields.heatmaps for d in batch_data_samples]) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) + gt_heatmaps = torch.stack([d.gt_fields.heatmaps for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) # calculate losses losses = dict() @@ -305,26 +276,24 @@ class HeatmapHead(BaseHead): losses.update(loss_kpt=loss) # calculate accuracy - if train_cfg.get('compute_acc', True): + if train_cfg.get("compute_acc", True): _, avg_acc, _ = pose_pck_accuracy( - output=to_numpy(pred_fields), - target=to_numpy(gt_heatmaps), - mask=to_numpy(keypoint_weights) > 0) + output=to_numpy(pred_fields), target=to_numpy(gt_heatmaps), mask=to_numpy(keypoint_weights) > 0 + ) acc_pose = torch.tensor(avg_acc, device=gt_heatmaps.device) losses.update(acc_pose=acc_pose) return losses - def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, - **kwargs): + def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, **kwargs): """A hook function to convert old-version state dict of :class:`TopdownHeatmapSimpleHead` (before MMPose v1.0.0) to a compatible format of :class:`HeatmapHead`. The hook will be automatically registered during initialization. """ - version = local_meta.get('version', None) + version = local_meta.get("version", None) if version and version >= self._version: return @@ -334,7 +303,7 @@ class HeatmapHead(BaseHead): if not _k.startswith(prefix): continue v = state_dict.pop(_k) - k = _k[len(prefix):] + k = _k[len(prefix) :] # In old version, "final_layer" includes both intermediate # conv layers (new "conv_layers") and final conv layers (new # "final_layer"). @@ -347,17 +316,17 @@ class HeatmapHead(BaseHead): # have keys like "final_layer.n.xxx", where the weights of the last # one should be renamed "final_layer.xxx", and others should be # renamed "conv_layers.n.xxx" - k_parts = k.split('.') - if k_parts[0] == 'final_layer': + k_parts = k.split(".") + if k_parts[0] == "final_layer": if len(k_parts) == 3: assert isinstance(self.conv_layers, nn.Sequential) idx = int(k_parts[1]) if idx < len(self.conv_layers): # final_layer.n.xxx -> conv_layers.n.xxx - k_new = 'conv_layers.' + '.'.join(k_parts[1:]) + k_new = "conv_layers." + ".".join(k_parts[1:]) else: # final_layer.n.xxx -> final_layer.xxx - k_new = 'final_layer.' + k_parts[2] + k_new = "final_layer." 
+ k_parts[2] else: # final_layer.xxx remains final_layer.xxx k_new = k diff --git a/mmpose/models/heads/heatmap_heads/internet_head.py b/mmpose/models/heads/heatmap_heads/internet_head.py index 62de8e96db769ec18b64d0483adbc0f2fadde635..183d10d2b6048e5cb2ad3071663ee872fc6f0430 100644 --- a/mmpose/models/heads/heatmap_heads/internet_head.py +++ b/mmpose/models/heads/heatmap_heads/internet_head.py @@ -12,8 +12,8 @@ from mmpose.models.necks import GlobalAveragePooling from mmpose.models.utils.tta import flip_heatmaps from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, Features, InstanceList, - OptConfigType, OptSampleList, Predictions) +from mmpose.utils.typing import ConfigType, Features, InstanceList, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead from .heatmap_head import HeatmapHead @@ -25,8 +25,7 @@ def make_linear_layers(feat_dims, relu_final=False): layers = [] for i in range(len(feat_dims) - 1): layers.append(nn.Linear(feat_dims[i], feat_dims[i + 1])) - if i < len(feat_dims) - 2 or \ - (i == len(feat_dims) - 2 and relu_final): + if i < len(feat_dims) - 2 or (i == len(feat_dims) - 2 and relu_final): layers.append(nn.ReLU(inplace=True)) return nn.Sequential(*layers) @@ -53,14 +52,16 @@ class Heatmap3DHead(HeatmapHead): :attr:`default_init_cfg` for default settings. """ - def __init__(self, - in_channels: Union[int, Sequence[int]], - out_channels: int, - depth_size: int = 64, - deconv_out_channels: OptIntSeq = (256, 256, 256), - deconv_kernel_sizes: OptIntSeq = (4, 4, 4), - final_layer: dict = dict(kernel_size=1), - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + out_channels: int, + depth_size: int = 64, + deconv_out_channels: OptIntSeq = (256, 256, 256), + deconv_kernel_sizes: OptIntSeq = (4, 4, 4), + final_layer: dict = dict(kernel_size=1), + init_cfg: OptConfigType = None, + ): super().__init__( in_channels=in_channels, @@ -68,7 +69,8 @@ class Heatmap3DHead(HeatmapHead): deconv_out_channels=deconv_out_channels, deconv_kernel_sizes=deconv_kernel_sizes, final_layer=final_layer, - init_cfg=init_cfg) + init_cfg=init_cfg, + ) assert out_channels % depth_size == 0 self.depth_size = depth_size @@ -104,10 +106,7 @@ class Heatmap1DHead(nn.Module): Defaults to ``(512, )``. """ - def __init__(self, - in_channels: int = 2048, - heatmap_size: int = 64, - hidden_dims: Sequence[int] = (512, )): + def __init__(self, in_channels: int = 2048, heatmap_size: int = 64, hidden_dims: Sequence[int] = (512,)): super().__init__() @@ -119,9 +118,7 @@ class Heatmap1DHead(nn.Module): def soft_argmax_1d(self, heatmap1d): heatmap1d = F.softmax(heatmap1d, 1) - accu = heatmap1d * torch.arange( - self.heatmap_size, dtype=heatmap1d.dtype, - device=heatmap1d.device)[None, :] + accu = heatmap1d * torch.arange(self.heatmap_size, dtype=heatmap1d.dtype, device=heatmap1d.device)[None, :] coord = accu.sum(dim=1) return coord @@ -156,10 +153,7 @@ class MultilabelClassificationHead(nn.Module): Defaults to ``(512, )``. 
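`Heatmap1DHead.soft_argmax_1d` above reads the root depth out of a 1-D heatmap as the expectation of the bin index under a softmax, which, unlike a hard `argmax`, stays differentiable. A standalone restatement of that computation:

```python
import torch
import torch.nn.functional as F

def soft_argmax_1d(heatmap1d: torch.Tensor) -> torch.Tensor:
    # Expected bin index under the softmax distribution, as in Heatmap1DHead.
    probs = F.softmax(heatmap1d, dim=1)                       # (B, heatmap_size)
    bins = torch.arange(heatmap1d.size(1), dtype=heatmap1d.dtype,
                        device=heatmap1d.device)[None, :]     # (1, heatmap_size)
    return (probs * bins).sum(dim=1)                          # (B,)

logits = torch.zeros(1, 64)
logits[0, 40] = 10.0               # a sharp peak at bin 40
print(soft_argmax_1d(logits))      # close to 40.0
```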
""" - def __init__(self, - in_channels: int = 2048, - num_labels: int = 2, - hidden_dims: Sequence[int] = (512, )): + def __init__(self, in_channels: int = 2048, num_labels: int = 2, hidden_dims: Sequence[int] = (512,)): super().__init__() @@ -206,18 +200,17 @@ class InternetHead(BaseHead): _version = 2 - def __init__(self, - keypoint_head_cfg: ConfigType, - root_head_cfg: ConfigType, - hand_type_head_cfg: ConfigType, - loss: ConfigType = dict( - type='KeypointMSELoss', use_target_weight=True), - loss_root_depth: ConfigType = dict( - type='L1Loss', use_target_weight=True), - loss_hand_type: ConfigType = dict( - type='BCELoss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + keypoint_head_cfg: ConfigType, + root_head_cfg: ConfigType, + hand_type_head_cfg: ConfigType, + loss: ConfigType = dict(type="KeypointMSELoss", use_target_weight=True), + loss_root_depth: ConfigType = dict(type="L1Loss", use_target_weight=True), + loss_hand_type: ConfigType = dict(type="BCELoss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): super().__init__() @@ -225,8 +218,7 @@ class InternetHead(BaseHead): self.right_hand_head = Heatmap3DHead(**keypoint_head_cfg) self.left_hand_head = Heatmap3DHead(**keypoint_head_cfg) self.root_head = Heatmap1DHead(**root_head_cfg) - self.hand_type_head = MultilabelClassificationHead( - **hand_type_head_cfg) + self.hand_type_head = MultilabelClassificationHead(**hand_type_head_cfg) self.neck = GlobalAveragePooling() self.loss_module = MODELS.build(loss) @@ -251,18 +243,13 @@ class InternetHead(BaseHead): """ x = feats[-1] outputs = [] - outputs.append( - torch.cat([self.right_hand_head(x), - self.left_hand_head(x)], dim=1)) + outputs.append(torch.cat([self.right_hand_head(x), self.left_hand_head(x)], dim=1)) x = self.neck(x) outputs.append(self.root_head(x)) outputs.append(self.hand_type_head(x)) return outputs - def predict(self, - feats: Features, - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from features. Args: @@ -286,10 +273,10 @@ class InternetHead(BaseHead): shape (num_instances, K) """ - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _batch_outputs = self.forward(_feats) _batch_heatmaps = _batch_outputs[0] @@ -297,9 +284,10 @@ class InternetHead(BaseHead): _batch_outputs_flip = self.forward(_feats_flip) _batch_heatmaps_flip = flip_heatmaps( _batch_outputs_flip[0], - flip_mode=test_cfg.get('flip_mode', 'heatmap'), + flip_mode=test_cfg.get("flip_mode", "heatmap"), flip_indices=flip_indices, - shift_heatmap=test_cfg.get('shift_heatmap', False)) + shift_heatmap=test_cfg.get("shift_heatmap", False), + ) batch_heatmaps = (_batch_heatmaps + _batch_heatmaps_flip) * 0.5 @@ -324,10 +312,7 @@ class InternetHead(BaseHead): return preds - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. 
Args: @@ -343,13 +328,8 @@ class InternetHead(BaseHead): pred_fields = self.forward(feats) pred_heatmaps = pred_fields[0] _, K, D, W, H = pred_heatmaps.shape - gt_heatmaps = torch.stack([ - d.gt_fields.heatmaps.reshape(K, D, W, H) - for d in batch_data_samples - ]) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) + gt_heatmaps = torch.stack([d.gt_fields.heatmaps.reshape(K, D, W, H) for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) # calculate losses losses = dict() @@ -359,39 +339,27 @@ class InternetHead(BaseHead): losses.update(loss_kpt=loss) # relative root depth loss - gt_roots = torch.stack( - [d.gt_instance_labels.root_depth for d in batch_data_samples]) - root_weights = torch.stack([ - d.gt_instance_labels.root_depth_weight for d in batch_data_samples - ]) - loss_root = self.root_loss_module(pred_fields[1], gt_roots, - root_weights) + gt_roots = torch.stack([d.gt_instance_labels.root_depth for d in batch_data_samples]) + root_weights = torch.stack([d.gt_instance_labels.root_depth_weight for d in batch_data_samples]) + loss_root = self.root_loss_module(pred_fields[1], gt_roots, root_weights) losses.update(loss_rel_root=loss_root) # hand type loss - gt_types = torch.stack([ - d.gt_instance_labels.type.reshape(-1) for d in batch_data_samples - ]) - type_weights = torch.stack( - [d.gt_instance_labels.type_weight for d in batch_data_samples]) - loss_type = self.hand_loss_module(pred_fields[2], gt_types, - type_weights) + gt_types = torch.stack([d.gt_instance_labels.type.reshape(-1) for d in batch_data_samples]) + type_weights = torch.stack([d.gt_instance_labels.type_weight for d in batch_data_samples]) + loss_type = self.hand_loss_module(pred_fields[2], gt_types, type_weights) losses.update(loss_hand_type=loss_type) # calculate accuracy - if train_cfg.get('compute_acc', True): - acc = multilabel_classification_accuracy( - pred=to_numpy(pred_fields[2]), - gt=to_numpy(gt_types), - mask=to_numpy(type_weights)) + if train_cfg.get("compute_acc", True): + acc = multilabel_classification_accuracy(pred=to_numpy(pred_fields[2]), gt=to_numpy(gt_types), mask=to_numpy(type_weights)) acc_pose = torch.tensor(acc, device=gt_types.device) losses.update(acc_pose=acc_pose) return losses - def decode(self, batch_outputs: Union[Tensor, - Tuple[Tensor]]) -> InstanceList: + def decode(self, batch_outputs: Union[Tensor, Tuple[Tensor]]) -> InstanceList: """Decode keypoints from outputs. Args: @@ -405,14 +373,15 @@ class InternetHead(BaseHead): def _pack_and_call(args, func): if not isinstance(args, tuple): - args = (args, ) + args = (args,) return func(*args) if self.decoder is None: raise RuntimeError( - f'The decoder has not been set in {self.__class__.__name__}. ' - 'Please set the decoder configs in the init parameters to ' - 'enable head methods `head.predict()` and `head.decode()`') + f"The decoder has not been set in {self.__class__.__name__}. 
" + "Please set the decoder configs in the init parameters to " + "enable head methods `head.predict()` and `head.decode()`" + ) batch_output_np = to_numpy(batch_outputs[0], unzip=True) batch_root_np = to_numpy(batch_outputs[1], unzip=True) @@ -421,23 +390,16 @@ class InternetHead(BaseHead): batch_scores = [] batch_roots = [] batch_types = [] - for outputs, roots, types in zip(batch_output_np, batch_root_np, - batch_type_np): - keypoints, scores, rel_root_depth, hand_type = _pack_and_call( - tuple([outputs, roots, types]), self.decoder.decode) + for outputs, roots, types in zip(batch_output_np, batch_root_np, batch_type_np): + keypoints, scores, rel_root_depth, hand_type = _pack_and_call(tuple([outputs, roots, types]), self.decoder.decode) batch_keypoints.append(keypoints) batch_scores.append(scores) batch_roots.append(rel_root_depth) batch_types.append(hand_type) preds = [ - InstanceData( - keypoints=keypoints, - keypoint_scores=scores, - rel_root_depth=rel_root_depth, - hand_type=hand_type) - for keypoints, scores, rel_root_depth, hand_type in zip( - batch_keypoints, batch_scores, batch_roots, batch_types) + InstanceData(keypoints=keypoints, keypoint_scores=scores, rel_root_depth=rel_root_depth, hand_type=hand_type) + for keypoints, scores, rel_root_depth, hand_type in zip(batch_keypoints, batch_scores, batch_roots, batch_types) ] return preds diff --git a/mmpose/models/heads/heatmap_heads/mspn_head.py b/mmpose/models/heads/heatmap_heads/mspn_head.py index 8b7cddf7988bfc57cae314ef944f44b4d0d7df09..fabb890df3b241fc7d086cfa2a58f9a182282731 100644 --- a/mmpose/models/heads/heatmap_heads/mspn_head.py +++ b/mmpose/models/heads/heatmap_heads/mspn_head.py @@ -3,8 +3,7 @@ import copy from typing import List, Optional, Sequence, Union import torch -from mmcv.cnn import (ConvModule, DepthwiseSeparableConvModule, Linear, - build_activation_layer, build_norm_layer) +from mmcv.cnn import ConvModule, DepthwiseSeparableConvModule, Linear, build_activation_layer, build_norm_layer from mmengine.structures import PixelData from torch import Tensor, nn @@ -12,8 +11,8 @@ from mmpose.evaluation.functional import pose_pck_accuracy from mmpose.models.utils.tta import flip_heatmaps from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, MultiConfig, OptConfigType, - OptSampleList, Predictions) +from mmpose.utils.typing import ConfigType, MultiConfig, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @@ -33,9 +32,7 @@ class PRM(nn.Module): Defaults to ``dict(type='BN')`` """ - def __init__(self, - out_channels: int, - norm_cfg: ConfigType = dict(type='BN')): + def __init__(self, out_channels: int, norm_cfg: ConfigType = dict(type="BN")): super().__init__() # Protect mutable default arguments @@ -44,38 +41,22 @@ class PRM(nn.Module): self.global_pooling = nn.AdaptiveAvgPool2d((1, 1)) self.middle_path = nn.Sequential( Linear(self.out_channels, self.out_channels), - build_norm_layer(dict(type='BN1d'), out_channels)[1], - build_activation_layer(dict(type='ReLU')), + build_norm_layer(dict(type="BN1d"), out_channels)[1], + build_activation_layer(dict(type="ReLU")), Linear(self.out_channels, self.out_channels), - build_norm_layer(dict(type='BN1d'), out_channels)[1], - build_activation_layer(dict(type='ReLU')), - build_activation_layer(dict(type='Sigmoid'))) + build_norm_layer(dict(type="BN1d"), out_channels)[1], + build_activation_layer(dict(type="ReLU")), + 
build_activation_layer(dict(type="Sigmoid")), + ) self.bottom_path = nn.Sequential( - ConvModule( - self.out_channels, - self.out_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=norm_cfg, - inplace=False), - DepthwiseSeparableConvModule( - self.out_channels, - 1, - kernel_size=9, - stride=1, - padding=4, - norm_cfg=norm_cfg, - inplace=False), build_activation_layer(dict(type='Sigmoid'))) + ConvModule(self.out_channels, self.out_channels, kernel_size=1, stride=1, padding=0, norm_cfg=norm_cfg, inplace=False), + DepthwiseSeparableConvModule(self.out_channels, 1, kernel_size=9, stride=1, padding=4, norm_cfg=norm_cfg, inplace=False), + build_activation_layer(dict(type="Sigmoid")), + ) self.conv_bn_relu_prm_1 = ConvModule( - self.out_channels, - self.out_channels, - kernel_size=3, - stride=1, - padding=1, - norm_cfg=norm_cfg, - inplace=False) + self.out_channels, self.out_channels, kernel_size=3, stride=1, padding=1, norm_cfg=norm_cfg, inplace=False + ) def forward(self, x: Tensor) -> Tensor: """Forward the network. The input heatmaps will be refined. @@ -113,12 +94,9 @@ class PredictHeatmap(nn.Module): Defaults to ``dict(type='BN')`` """ - def __init__(self, - unit_channels: int, - out_channels: int, - out_shape: tuple, - use_prm: bool = False, - norm_cfg: ConfigType = dict(type='BN')): + def __init__( + self, unit_channels: int, out_channels: int, out_shape: tuple, use_prm: bool = False, norm_cfg: ConfigType = dict(type="BN") + ): super().__init__() @@ -131,23 +109,9 @@ class PredictHeatmap(nn.Module): if use_prm: self.prm = PRM(out_channels, norm_cfg=norm_cfg) self.conv_layers = nn.Sequential( - ConvModule( - unit_channels, - unit_channels, - kernel_size=1, - stride=1, - padding=0, - norm_cfg=norm_cfg, - inplace=False), - ConvModule( - unit_channels, - out_channels, - kernel_size=3, - stride=1, - padding=1, - norm_cfg=norm_cfg, - act_cfg=None, - inplace=False)) + ConvModule(unit_channels, unit_channels, kernel_size=1, stride=1, padding=0, norm_cfg=norm_cfg, inplace=False), + ConvModule(unit_channels, out_channels, kernel_size=3, stride=1, padding=1, norm_cfg=norm_cfg, act_cfg=None, inplace=False), + ) def forward(self, feature: Tensor) -> Tensor: """Forward the network. @@ -159,8 +123,7 @@ class PredictHeatmap(nn.Module): Tensor: output heatmaps. """ feature = self.conv_layers(feature) - output = nn.functional.interpolate( - feature, size=self.out_shape, mode='bilinear', align_corners=True) + output = nn.functional.interpolate(feature, size=self.out_shape, mode="bilinear", align_corners=True) if self.use_prm: output = self.prm(output) return output @@ -198,21 +161,23 @@ class MSPNHead(BaseHead): .. _`MSPN`: https://arxiv.org/abs/1901.00148 .. 
_`RSN`: https://arxiv.org/abs/2003.04030 """ + _version = 2 - def __init__(self, - num_stages: int = 4, - num_units: int = 4, - out_shape: tuple = (64, 48), - unit_channels: int = 256, - out_channels: int = 17, - use_prm: bool = False, - norm_cfg: ConfigType = dict(type='BN'), - level_indices: Sequence[int] = [], - loss: MultiConfig = dict( - type='KeypointMSELoss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + num_stages: int = 4, + num_units: int = 4, + out_shape: tuple = (64, 48), + unit_channels: int = 256, + out_channels: int = 17, + use_prm: bool = False, + norm_cfg: ConfigType = dict(type="BN"), + level_indices: Sequence[int] = [], + loss: MultiConfig = dict(type="KeypointMSELoss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg super().__init__(init_cfg) @@ -224,23 +189,22 @@ class MSPNHead(BaseHead): self.out_channels = out_channels if len(level_indices) != num_stages * num_units: raise ValueError( - f'The length of level_indices({len(level_indices)}) did not ' - f'match `num_stages`({num_stages}) * `num_units`({num_units})') + f"The length of level_indices({len(level_indices)}) did not " f"match `num_stages`({num_stages}) * `num_units`({num_units})" + ) self.level_indices = level_indices if isinstance(loss, list) and len(loss) != num_stages * num_units: raise ValueError( - f'The length of loss_module({len(loss)}) did not match ' - f'`num_stages`({num_stages}) * `num_units`({num_units})') + f"The length of loss_module({len(loss)}) did not match " f"`num_stages`({num_stages}) * `num_units`({num_units})" + ) if isinstance(loss, list): if len(loss) != num_stages * num_units: raise ValueError( - f'The length of loss_module({len(loss)}) did not match ' - f'`num_stages`({num_stages}) * `num_units`({num_units})') - self.loss_module = nn.ModuleList( - MODELS.build(_loss) for _loss in loss) + f"The length of loss_module({len(loss)}) did not match " f"`num_stages`({num_stages}) * `num_units`({num_units})" + ) + self.loss_module = nn.ModuleList(MODELS.build(_loss) for _loss in loss) else: self.loss_module = MODELS.build(loss) @@ -255,21 +219,15 @@ class MSPNHead(BaseHead): self.predict_layers = nn.ModuleList([]) for i in range(self.num_stages): for j in range(self.num_units): - self.predict_layers.append( - PredictHeatmap( - unit_channels, - out_channels, - out_shape, - use_prm, - norm_cfg=norm_cfg)) + self.predict_layers.append(PredictHeatmap(unit_channels, out_channels, out_shape, use_prm, norm_cfg=norm_cfg)) @property def default_init_cfg(self): """Default config for weight initialization.""" init_cfg = [ - dict(type='Kaiming', layer='Conv2d'), - dict(type='Normal', layer='Linear', std=0.01), - dict(type='Constant', layer='BatchNorm2d', val=1), + dict(type="Kaiming", layer="Conv2d"), + dict(type="Normal", layer="Linear", std=0.01), + dict(type="Constant", layer="BatchNorm2d", val=1), ] return init_cfg @@ -286,17 +244,13 @@ class MSPNHead(BaseHead): and units. 
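`MSPNHead` checks that `level_indices` has exactly one entry per stage/unit output; in `loss()` further below, entry `i` picks which ground-truth level supervises output `i`. A hedged sketch of that pairing; the shapes, the two-level setup, and the plain MSE are illustrative rather than the repository's actual loss configuration:

```python
import torch
import torch.nn.functional as F

num_stages, num_units = 2, 2
level_indices = [0, 1, 0, 1]   # len == num_stages * num_units, as the check above enforces

# Two ground-truth "levels", e.g. heatmaps rendered with different Gaussian sigmas.
gt_levels = [torch.rand(4, 17, 64, 48), torch.rand(4, 17, 64, 48)]
# One predicted heatmap per stage/unit output.
preds = [torch.rand(4, 17, 64, 48) for _ in range(num_stages * num_units)]

# Output i is supervised by the level that level_indices[i] points at.
loss = sum(F.mse_loss(preds[i], gt_levels[level_indices[i]])
           for i in range(num_stages * num_units))
```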
""" out = [] - assert len(feats) == self.num_stages, ( - f'The length of feature maps did not match the ' - f'`num_stages` in {self.__class__.__name__}') + assert len(feats) == self.num_stages, f"The length of feature maps did not match the " f"`num_stages` in {self.__class__.__name__}" for feat in feats: - assert len(feat) == self.num_units, ( - f'The length of feature maps did not match the ' - f'`num_units` in {self.__class__.__name__}') + assert len(feat) == self.num_units, f"The length of feature maps did not match the " f"`num_units` in {self.__class__.__name__}" for f in feat: assert f.shape[1] == self.unit_channels, ( - f'The number of feature map channels did not match the ' - f'`unit_channels` in {self.__class__.__name__}') + f"The number of feature map channels did not match the " f"`unit_channels` in {self.__class__.__name__}" + ) for i in range(self.num_stages): for j in range(self.num_units): @@ -304,10 +258,9 @@ class MSPNHead(BaseHead): out.append(y) return out - def predict(self, - feats: Union[MSMUFeatures, List[MSMUFeatures]], - batch_data_samples: OptSampleList, - test_cfg: OptConfigType = {}) -> Predictions: + def predict( + self, feats: Union[MSMUFeatures, List[MSMUFeatures]], batch_data_samples: OptSampleList, test_cfg: OptConfigType = {} + ) -> Predictions: """Predict results from multi-stage feature maps. Args: @@ -338,17 +291,18 @@ class MSPNHead(BaseHead): - heatmaps (Tensor): The predicted heatmaps in shape (K, h, w) """ # multi-stage multi-unit batch heatmaps - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _batch_heatmaps = self.forward(_feats)[-1] _batch_heatmaps_flip = flip_heatmaps( self.forward(_feats_flip)[-1], - flip_mode=test_cfg.get('flip_mode', 'heatmap'), + flip_mode=test_cfg.get("flip_mode", "heatmap"), flip_indices=flip_indices, - shift_heatmap=test_cfg.get('shift_heatmap', False)) + shift_heatmap=test_cfg.get("shift_heatmap", False), + ) batch_heatmaps = (_batch_heatmaps + _batch_heatmaps_flip) * 0.5 else: msmu_batch_heatmaps = self.forward(feats) @@ -356,18 +310,13 @@ class MSPNHead(BaseHead): preds = self.decode(batch_heatmaps) - if test_cfg.get('output_heatmaps', False): - pred_fields = [ - PixelData(heatmaps=hm) for hm in batch_heatmaps.detach() - ] + if test_cfg.get("output_heatmaps", False): + pred_fields = [PixelData(heatmaps=hm) for hm in batch_heatmaps.detach()] return preds, pred_fields else: return preds - def loss(self, - feats: MSMUFeatures, - batch_data_samples: OptSampleList, - train_cfg: OptConfigType = {}) -> dict: + def loss(self, feats: MSMUFeatures, batch_data_samples: OptSampleList, train_cfg: OptConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. 
Note: @@ -392,9 +341,7 @@ class MSPNHead(BaseHead): # multi-stage multi-unit predict heatmaps msmu_pred_heatmaps = self.forward(feats) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) # shape: [B*N, L, K] + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) # shape: [B*N, L, K] # calculate losses over multiple stages and multiple units losses = dict() @@ -408,23 +355,18 @@ class MSPNHead(BaseHead): # select `gt_heatmaps` and `keypoint_weights` for different level # according to `self.level_indices` to calculate loss - gt_heatmaps = torch.stack([ - d.gt_fields[self.level_indices[i]].heatmaps - for d in batch_data_samples - ]) - loss_i = loss_func(msmu_pred_heatmaps[i], gt_heatmaps, - keypoint_weights[:, self.level_indices[i]]) - - if 'loss_kpt' not in losses: - losses['loss_kpt'] = loss_i + gt_heatmaps = torch.stack([d.gt_fields[self.level_indices[i]].heatmaps for d in batch_data_samples]) + loss_i = loss_func(msmu_pred_heatmaps[i], gt_heatmaps, keypoint_weights[:, self.level_indices[i]]) + + if "loss_kpt" not in losses: + losses["loss_kpt"] = loss_i else: - losses['loss_kpt'] += loss_i + losses["loss_kpt"] += loss_i # calculate accuracy _, avg_acc, _ = pose_pck_accuracy( - output=to_numpy(msmu_pred_heatmaps[-1]), - target=to_numpy(gt_heatmaps), - mask=to_numpy(keypoint_weights[:, -1]) > 0) + output=to_numpy(msmu_pred_heatmaps[-1]), target=to_numpy(gt_heatmaps), mask=to_numpy(keypoint_weights[:, -1]) > 0 + ) acc_pose = torch.tensor(avg_acc, device=gt_heatmaps.device) losses.update(acc_pose=acc_pose) diff --git a/mmpose/models/heads/heatmap_heads/vipnas_head.py b/mmpose/models/heads/heatmap_heads/vipnas_head.py index 949ee95b096124a162f6d9719446fa80bd26a201..5eb91a406f6b0166621090d95aa7397ab4227104 100644 --- a/mmpose/models/heads/heatmap_heads/vipnas_head.py +++ b/mmpose/models/heads/heatmap_heads/vipnas_head.py @@ -6,6 +6,7 @@ from torch import nn from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.typing import ConfigType, OptConfigType + from .heatmap_head import HeatmapHead OptIntSeq = Optional[Sequence[int]] @@ -54,19 +55,20 @@ class ViPNASHead(HeatmapHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - out_channels: int, - deconv_out_channels: OptIntSeq = (144, 144, 144), - deconv_kernel_sizes: OptIntSeq = (4, 4, 4), - deconv_num_groups: OptIntSeq = (16, 16, 16), - conv_out_channels: OptIntSeq = None, - conv_kernel_sizes: OptIntSeq = None, - final_layer: dict = dict(kernel_size=1), - loss: ConfigType = dict( - type='KeypointMSELoss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + out_channels: int, + deconv_out_channels: OptIntSeq = (144, 144, 144), + deconv_kernel_sizes: OptIntSeq = (4, 4, 4), + deconv_num_groups: OptIntSeq = (16, 16, 16), + conv_out_channels: OptIntSeq = None, + conv_kernel_sizes: OptIntSeq = None, + final_layer: dict = dict(kernel_size=1), + loss: ConfigType = dict(type="KeypointMSELoss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -82,20 +84,20 @@ class ViPNASHead(HeatmapHead): self.decoder = None if deconv_out_channels: - if deconv_kernel_sizes is None or len(deconv_out_channels) != len( - deconv_kernel_sizes): + if deconv_kernel_sizes is None or len(deconv_out_channels) != 
len(deconv_kernel_sizes): raise ValueError( '"deconv_out_channels" and "deconv_kernel_sizes" should ' - 'be integer sequences with the same length. Got ' - f'mismatched lengths {deconv_out_channels} and ' - f'{deconv_kernel_sizes}') - if deconv_num_groups is None or len(deconv_out_channels) != len( - deconv_num_groups): + "be integer sequences with the same length. Got " + f"mismatched lengths {deconv_out_channels} and " + f"{deconv_kernel_sizes}" + ) + if deconv_num_groups is None or len(deconv_out_channels) != len(deconv_num_groups): raise ValueError( '"deconv_out_channels" and "deconv_num_groups" should ' - 'be integer sequences with the same length. Got ' - f'mismatched lengths {deconv_out_channels} and ' - f'{deconv_num_groups}') + "be integer sequences with the same length. Got " + f"mismatched lengths {deconv_out_channels} and " + f"{deconv_num_groups}" + ) self.deconv_layers = self._make_deconv_layers( in_channels=in_channels, @@ -108,28 +110,23 @@ class ViPNASHead(HeatmapHead): self.deconv_layers = nn.Identity() if conv_out_channels: - if conv_kernel_sizes is None or len(conv_out_channels) != len( - conv_kernel_sizes): + if conv_kernel_sizes is None or len(conv_out_channels) != len(conv_kernel_sizes): raise ValueError( '"conv_out_channels" and "conv_kernel_sizes" should ' - 'be integer sequences with the same length. Got ' - f'mismatched lengths {conv_out_channels} and ' - f'{conv_kernel_sizes}') + "be integer sequences with the same length. Got " + f"mismatched lengths {conv_out_channels} and " + f"{conv_kernel_sizes}" + ) self.conv_layers = self._make_conv_layers( - in_channels=in_channels, - layer_out_channels=conv_out_channels, - layer_kernel_sizes=conv_kernel_sizes) + in_channels=in_channels, layer_out_channels=conv_out_channels, layer_kernel_sizes=conv_kernel_sizes + ) in_channels = conv_out_channels[-1] else: self.conv_layers = nn.Identity() if final_layer is not None: - cfg = dict( - type='Conv2d', - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1) + cfg = dict(type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=1) cfg.update(final_layer) self.final_layer = build_conv_layer(cfg) else: @@ -138,16 +135,13 @@ class ViPNASHead(HeatmapHead): # Register the hook to automatically convert old version state dicts self._register_load_state_dict_pre_hook(self._load_state_dict_pre_hook) - def _make_deconv_layers(self, in_channels: int, - layer_out_channels: Sequence[int], - layer_kernel_sizes: Sequence[int], - layer_groups: Sequence[int]) -> nn.Module: + def _make_deconv_layers( + self, in_channels: int, layer_out_channels: Sequence[int], layer_kernel_sizes: Sequence[int], layer_groups: Sequence[int] + ) -> nn.Module: """Create deconvolutional layers by given parameters.""" layers = [] - for out_channels, kernel_size, groups in zip(layer_out_channels, - layer_kernel_sizes, - layer_groups): + for out_channels, kernel_size, groups in zip(layer_out_channels, layer_kernel_sizes, layer_groups): if kernel_size == 4: padding = 1 output_padding = 0 @@ -158,11 +152,9 @@ class ViPNASHead(HeatmapHead): padding = 0 output_padding = 0 else: - raise ValueError(f'Unsupported kernel size {kernel_size} for' - 'deconvlutional layers in ' - f'{self.__class__.__name__}') + raise ValueError(f"Unsupported kernel size {kernel_size} for deconvolutional layers in {self.__class__.__name__}") cfg = dict( - type='deconv', + type="deconv", in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, @@ -170,7 +162,8 @@ class
ViPNASHead(HeatmapHead): stride=2, padding=padding, output_padding=output_padding, - bias=False) + bias=False, + ) layers.append(build_upsample_layer(cfg)) layers.append(nn.BatchNorm2d(num_features=out_channels)) layers.append(nn.ReLU(inplace=True)) diff --git a/mmpose/models/heads/hybrid_heads/__init__.py b/mmpose/models/heads/hybrid_heads/__init__.py index 767f87a19c10871033c1cb5bbbadc226d5bd4551..4149b8bd64928f59b9d278a36049d115f1f04a09 100644 --- a/mmpose/models/heads/hybrid_heads/__init__.py +++ b/mmpose/models/heads/hybrid_heads/__init__.py @@ -1,9 +1,10 @@ # Copyright (c) OpenMMLab. All rights reserved. +from .calibration_head import CalibrationHead from .dekr_head import DEKRHead +from .multi_head import MultiHead +from .poseid_head import PoseIDHead from .rtmo_head import RTMOHead from .vis_head import VisPredictHead from .yoloxpose_head import YOLOXPoseHead -from .poseid_head import PoseIDHead -from .calibration_head import CalibrationHead -__all__ = ['DEKRHead', 'VisPredictHead', 'YOLOXPoseHead', 'RTMOHead', 'PoseIDHead', 'CalibrationHead'] +__all__ = ["DEKRHead", "VisPredictHead", "YOLOXPoseHead", "RTMOHead", "MultiHead", "PoseIDHead", "CalibrationHead"] diff --git a/mmpose/models/heads/hybrid_heads/calibration_head.py b/mmpose/models/heads/hybrid_heads/calibration_head.py index d29d440a896165318cd5753f0627a9d6806c3bfe..3995ade0cb388309bcbb385e04430663349967f4 100644 --- a/mmpose/models/heads/hybrid_heads/calibration_head.py +++ b/mmpose/models/heads/hybrid_heads/calibration_head.py @@ -1,6 +1,10 @@ -# Copyright (c) OpenMMLab. All rights reserved. +# Copyright (c) Miroslav Purkrabek, ProbPose. All rights reserved. +import os +import shutil from typing import Optional, Sequence, Tuple, Union +import cv2 +import numpy as np import torch from mmcv.cnn import build_conv_layer, build_upsample_layer from mmengine.structures import PixelData @@ -9,27 +13,19 @@ from torch import Tensor, nn from mmpose.evaluation.functional import pose_pck_accuracy from mmpose.models.utils.tta import flip_heatmaps from mmpose.registry import KEYPOINT_CODECS, MODELS +from mmpose.structures.keypoint import fix_bbox_aspect_ratio from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, Features, OptConfigType, - OptSampleList, Predictions) -from ..base_head import BaseHead - -import numpy as np - +from mmpose.utils.typing import ConfigType, Features, OptConfigType, OptSampleList, Predictions from sparsemax import Sparsemax -import os -import shutil -import cv2 - -from mmpose.structures.keypoint import fix_bbox_aspect_ratio +from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @MODELS.register_module() class CalibrationHead(BaseHead): - """Multi-variate head predicting all information about keypoints. Apart + """Multi-variate head predicting all information about keypoints. Apart from the heatmap, it also predicts: 1) Heatmap for each keypoint 2) Probability of keypoint being in the heatmap @@ -68,7 +64,7 @@ class CalibrationHead(BaseHead): :class:`MSELoss` error_loss (Config): Config of the error loss. Defaults to use :class:`L1LogLoss` - normalize (bool): Whether to normalize values in the heatmaps between + normalize (bool): Whether to normalize values in the heatmaps between 0 and 1 with sigmoid. Defaults to ``False`` detach_probability (bool): Whether to detach the probability from gradient computation. 
Defaults to ``True`` @@ -97,38 +93,32 @@ class CalibrationHead(BaseHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - out_channels: int, - deconv_out_channels: OptIntSeq = (256, 256, 256), - deconv_kernel_sizes: OptIntSeq = (4, 4, 4), - conv_out_channels: OptIntSeq = None, - conv_kernel_sizes: OptIntSeq = None, - final_layer_dict: dict = dict(kernel_size=1), - keypoint_loss: ConfigType = dict( - type='KeypointMSELoss', use_target_weight=True), - probability_loss: ConfigType = dict( - type='BCELoss', use_target_weight=True), - visibility_loss: ConfigType = dict( - type='BCELoss', use_target_weight=True), - oks_loss: ConfigType = dict( - type='MSELoss', use_target_weight=True), - error_loss: ConfigType = dict( - type='L1LogLoss', use_target_weight=True), - normalize: float = None, - detach_probability: bool = True, - detach_visibility: bool = True, - learn_heatmaps_from_zeros: bool = False, - freeze_heatmaps: bool = False, - freeze_probability: bool = False, - freeze_visibility: bool = False, - freeze_oks: bool = False, - freeze_error: bool = False, - decoder: OptConfigType = dict( - type='UDPHeatmap', input_size=(192, 256), - heatmap_size=(48, 64), sigma=2), - init_cfg: OptConfigType = None, - ): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + out_channels: int, + deconv_out_channels: OptIntSeq = (256, 256, 256), + deconv_kernel_sizes: OptIntSeq = (4, 4, 4), + conv_out_channels: OptIntSeq = None, + conv_kernel_sizes: OptIntSeq = None, + final_layer_dict: dict = dict(kernel_size=1), + keypoint_loss: ConfigType = dict(type="KeypointMSELoss", use_target_weight=True), + probability_loss: ConfigType = dict(type="BCELoss", use_target_weight=True), + visibility_loss: ConfigType = dict(type="BCELoss", use_target_weight=True), + oks_loss: ConfigType = dict(type="MSELoss", use_target_weight=True), + error_loss: ConfigType = dict(type="L1LogLoss", use_target_weight=True), + normalize: float = None, + detach_probability: bool = True, + detach_visibility: bool = True, + learn_heatmaps_from_zeros: bool = False, + freeze_heatmaps: bool = False, + freeze_probability: bool = False, + freeze_visibility: bool = False, + freeze_oks: bool = False, + freeze_error: bool = False, + decoder: OptConfigType = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2), + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -147,12 +137,8 @@ class CalibrationHead(BaseHead): self.gauss_sigma = 2.0 self.gauss_kernel_size = int(2.0 * 3.0 * self.gauss_sigma + 1.0) - ts = torch.linspace( - - self.gauss_kernel_size // 2, - self.gauss_kernel_size // 2, - self.gauss_kernel_size - ) - gauss = torch.exp(-(ts / self.gauss_sigma)**2 / 2) + ts = torch.linspace(-self.gauss_kernel_size // 2, self.gauss_kernel_size // 2, self.gauss_kernel_size) + gauss = torch.exp(-((ts / self.gauss_sigma) ** 2) / 2) gauss = gauss / gauss.sum() self.gauss_kernel = gauss.unsqueeze(0) * gauss.unsqueeze(1) @@ -165,8 +151,7 @@ class CalibrationHead(BaseHead): self.loss_vis_folder = "work_dirs/loss_vis_{:05d}".format(unique_hash) self.interval = 50 shutil.rmtree(self.loss_vis_folder, ignore_errors=True) - print("Will save heatmap visualizations to folder '{:s}'".format(self.loss_vis_folder)) - + self._build_heatmap_head( in_channels=in_channels, out_channels=out_channels, @@ -176,32 +161,21 @@ class CalibrationHead(BaseHead): conv_kernel_sizes=conv_kernel_sizes, final_layer_dict=final_layer_dict, normalize=normalize, - freeze=freeze_heatmaps) 
- + freeze=freeze_heatmaps, + ) + self.normalize = normalize - + self.detach_probability = detach_probability - self._build_probability_head( - in_channels=in_channels, - out_channels=out_channels, - freeze=freeze_probability) - + self._build_probability_head(in_channels=in_channels, out_channels=out_channels, freeze=freeze_probability) + self.detach_visibility = detach_visibility - self._build_visibility_head( - in_channels=in_channels, - out_channels=out_channels, - freeze=freeze_visibility) - - self._build_oks_head( - in_channels=in_channels, - out_channels=out_channels, - freeze=freeze_oks) + self._build_visibility_head(in_channels=in_channels, out_channels=out_channels, freeze=freeze_visibility) + + self._build_oks_head(in_channels=in_channels, out_channels=out_channels, freeze=freeze_oks) self.freeze_oks = freeze_oks - self._build_error_head( - in_channels=in_channels, - out_channels=out_channels, - freeze=freeze_error) + self._build_error_head(in_channels=in_channels, out_channels=out_channels, freeze=freeze_error) self.freeze_error = freeze_error # Register the hook to automatically convert old version state dicts @@ -210,33 +184,36 @@ class CalibrationHead(BaseHead): self._freeze_all_but_temperature() # Print all params and their gradients - print("\n", "="*20) + print("\n", "=" * 20) for name, param in self.named_parameters(): print(name, param.requires_grad) - def _freeze_all_but_temperature(self): for param in self.parameters(): param.requires_grad = False self.temperature.requires_grad = True - def _build_heatmap_head(self, in_channels: int, out_channels: int, - deconv_out_channels: Sequence[int], - deconv_kernel_sizes: Sequence[int], - conv_out_channels: Sequence[int], - conv_kernel_sizes: Sequence[int], - final_layer_dict: dict, - normalize: bool = False, - freeze: bool = False) -> nn.Module: + def _build_heatmap_head( + self, + in_channels: int, + out_channels: int, + deconv_out_channels: Sequence[int], + deconv_kernel_sizes: Sequence[int], + conv_out_channels: Sequence[int], + conv_kernel_sizes: Sequence[int], + final_layer_dict: dict, + normalize: bool = False, + freeze: bool = False, + ) -> nn.Module: """Build the heatmap head module.""" if deconv_out_channels: - if deconv_kernel_sizes is None or len(deconv_out_channels) != len( - deconv_kernel_sizes): + if deconv_kernel_sizes is None or len(deconv_out_channels) != len(deconv_kernel_sizes): raise ValueError( '"deconv_out_channels" and "deconv_kernel_sizes" should ' - 'be integer sequences with the same length. Got ' - f'mismatched lengths {deconv_out_channels} and ' - f'{deconv_kernel_sizes}') + "be integer sequences with the same length. Got " + f"mismatched lengths {deconv_out_channels} and " + f"{deconv_kernel_sizes}" + ) self.deconv_layers = self._make_deconv_layers( in_channels=in_channels, @@ -248,28 +225,23 @@ class CalibrationHead(BaseHead): self.deconv_layers = nn.Identity() if conv_out_channels: - if conv_kernel_sizes is None or len(conv_out_channels) != len( - conv_kernel_sizes): + if conv_kernel_sizes is None or len(conv_out_channels) != len(conv_kernel_sizes): raise ValueError( '"conv_out_channels" and "conv_kernel_sizes" should ' - 'be integer sequences with the same length. Got ' - f'mismatched lengths {conv_out_channels} and ' - f'{conv_kernel_sizes}') + "be integer sequences with the same length. 
Got " + f"mismatched lengths {conv_out_channels} and " + f"{conv_kernel_sizes}" + ) self.conv_layers = self._make_conv_layers( - in_channels=in_channels, - layer_out_channels=conv_out_channels, - layer_kernel_sizes=conv_kernel_sizes) + in_channels=in_channels, layer_out_channels=conv_out_channels, layer_kernel_sizes=conv_kernel_sizes + ) in_channels = conv_out_channels[-1] else: self.conv_layers = nn.Identity() if final_layer_dict is not None: - cfg = dict( - type='Conv2d', - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1) + cfg = dict(type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=1) cfg.update(final_layer_dict) self.final_layer = build_conv_layer(cfg) else: @@ -286,33 +258,20 @@ class CalibrationHead(BaseHead): for param in self.final_layer.parameters(): param.requires_grad = False - def _build_probability_head(self, in_channels: int, out_channels: int, - freeze: bool = False) -> nn.Module: + def _build_probability_head(self, in_channels: int, out_channels: int, freeze: bool = False) -> nn.Module: """Build the probability head module.""" ppb_layers = [] kernel_sizes = [(4, 3), (2, 2), (2, 2)] for i in range(len(kernel_sizes)): ppb_layers.append( - build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=in_channels, - kernel_size=3, - stride=1, - padding=1)) - ppb_layers.append( - nn.BatchNorm2d(num_features=in_channels)) - ppb_layers.append( - nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=in_channels, kernel_size=3, stride=1, padding=1) + ) + ppb_layers.append(nn.BatchNorm2d(num_features=in_channels)) + ppb_layers.append(nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) ppb_layers.append(self.nonlinearity) ppb_layers.append( - build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1, - stride=1, - padding=0)) + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0) + ) ppb_layers.append(nn.Sigmoid()) self.probability_layers = nn.Sequential(*ppb_layers) @@ -320,33 +279,20 @@ class CalibrationHead(BaseHead): for param in self.probability_layers.parameters(): param.requires_grad = False - def _build_visibility_head(self, in_channels: int, out_channels: int, - freeze: bool = False) -> nn.Module: + def _build_visibility_head(self, in_channels: int, out_channels: int, freeze: bool = False) -> nn.Module: """Build the visibility head module.""" vis_layers = [] kernel_sizes = [(4, 3), (2, 2), (2, 2)] for i in range(len(kernel_sizes)): vis_layers.append( - build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=in_channels, - kernel_size=3, - stride=1, - padding=1)) - vis_layers.append( - nn.BatchNorm2d(num_features=in_channels)) - vis_layers.append( - nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=in_channels, kernel_size=3, stride=1, padding=1) + ) + vis_layers.append(nn.BatchNorm2d(num_features=in_channels)) + vis_layers.append(nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) vis_layers.append(self.nonlinearity) vis_layers.append( - build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1, - stride=1, - padding=0)) + 
build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0) + ) vis_layers.append(nn.Sigmoid()) self.visibility_layers = nn.Sequential(*vis_layers) @@ -354,33 +300,20 @@ class CalibrationHead(BaseHead): for param in self.visibility_layers.parameters(): param.requires_grad = False - def _build_oks_head(self, in_channels: int, out_channels: int, - freeze: bool = False) -> nn.Module: + def _build_oks_head(self, in_channels: int, out_channels: int, freeze: bool = False) -> nn.Module: """Build the oks head module.""" oks_layers = [] kernel_sizes = [(4, 3), (2, 2), (2, 2)] for i in range(len(kernel_sizes)): oks_layers.append( - build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=in_channels, - kernel_size=3, - stride=1, - padding=1)) - oks_layers.append( - nn.BatchNorm2d(num_features=in_channels)) - oks_layers.append( - nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=in_channels, kernel_size=3, stride=1, padding=1) + ) + oks_layers.append(nn.BatchNorm2d(num_features=in_channels)) + oks_layers.append(nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) oks_layers.append(self.nonlinearity) oks_layers.append( - build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1, - stride=1, - padding=0)) + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0) + ) oks_layers.append(nn.Sigmoid()) self.oks_layers = nn.Sequential(*oks_layers) @@ -388,33 +321,20 @@ class CalibrationHead(BaseHead): for param in self.oks_layers.parameters(): param.requires_grad = False - def _build_error_head(self, in_channels: int, out_channels: int, - freeze: bool = False) -> nn.Module: + def _build_error_head(self, in_channels: int, out_channels: int, freeze: bool = False) -> nn.Module: """Build the error head module.""" error_layers = [] kernel_sizes = [(4, 3), (2, 2), (2, 2)] for i in range(len(kernel_sizes)): error_layers.append( - build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=in_channels, - kernel_size=3, - stride=1, - padding=1)) - error_layers.append( - nn.BatchNorm2d(num_features=in_channels)) - error_layers.append( - nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=in_channels, kernel_size=3, stride=1, padding=1) + ) + error_layers.append(nn.BatchNorm2d(num_features=in_channels)) + error_layers.append(nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) error_layers.append(self.nonlinearity) error_layers.append( - build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1, - stride=1, - padding=0)) + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0) + ) error_layers.append(self.nonlinearity) self.error_layers = nn.Sequential(*error_layers) @@ -422,22 +342,15 @@ class CalibrationHead(BaseHead): for param in self.error_layers.parameters(): param.requires_grad = False - def _make_conv_layers(self, in_channels: int, - layer_out_channels: Sequence[int], - layer_kernel_sizes: Sequence[int]) -> nn.Module: + def _make_conv_layers(self, in_channels: int, layer_out_channels: 
Sequence[int], layer_kernel_sizes: Sequence[int]) -> nn.Module:
         """Create convolutional layers by given parameters."""

         layers = []
-        for out_channels, kernel_size in zip(layer_out_channels,
-                                             layer_kernel_sizes):
+        for out_channels, kernel_size in zip(layer_out_channels, layer_kernel_sizes):
             padding = (kernel_size - 1) // 2
             cfg = dict(
-                type='Conv2d',
-                in_channels=in_channels,
-                out_channels=out_channels,
-                kernel_size=kernel_size,
-                stride=1,
-                padding=padding)
+                type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=1, padding=padding
+            )
             layers.append(build_conv_layer(cfg))
             layers.append(nn.BatchNorm2d(num_features=out_channels))
             layers.append(self.nonlinearity)
@@ -445,14 +358,11 @@ class CalibrationHead(BaseHead):

         return nn.Sequential(*layers)

-    def _make_deconv_layers(self, in_channels: int,
-                            layer_out_channels: Sequence[int],
-                            layer_kernel_sizes: Sequence[int]) -> nn.Module:
+    def _make_deconv_layers(self, in_channels: int, layer_out_channels: Sequence[int], layer_kernel_sizes: Sequence[int]) -> nn.Module:
         """Create deconvolutional layers by given parameters."""

         layers = []
-        for out_channels, kernel_size in zip(layer_out_channels,
-                                             layer_kernel_sizes):
+        for out_channels, kernel_size in zip(layer_out_channels, layer_kernel_sizes):
             if kernel_size == 4:
                 padding = 1
                 output_padding = 0
@@ -463,18 +373,17 @@ class CalibrationHead(BaseHead):
                 padding = 0
                 output_padding = 0
             else:
-                raise ValueError(f'Unsupported kernel size {kernel_size} for'
-                                 'deconvlutional layers in '
-                                 f'{self.__class__.__name__}')
+                raise ValueError(f"Unsupported kernel size {kernel_size} for deconvolutional layers in {self.__class__.__name__}")
             cfg = dict(
-                type='deconv',
+                type="deconv",
                 in_channels=in_channels,
                 out_channels=out_channels,
                 kernel_size=kernel_size,
                 stride=2,
                 padding=padding,
                 output_padding=output_padding,
-                bias=False)
+                bias=False,
+            )
             layers.append(build_upsample_layer(cfg))
             layers.append(nn.BatchNorm2d(num_features=out_channels))
             layers.append(self.nonlinearity)
@@ -503,11 +412,11 @@ class CalibrationHead(BaseHead):
             coords, score = self.decoder.decode(gt_htm)
             coords = coords.squeeze()
             gt_coords[i, :, :] = coords
-
+
             coords, score = self.decoder.decode(dt_htm)
             coords = coords.squeeze()
             dt_coords[i, :, :] = coords
-
+
         # NaN coordinates mean empty heatmaps -> set them to -1
         # as the error will be ignored by weight
         gt_coords[np.isnan(gt_coords)] = -1
@@ -517,7 +426,7 @@ class CalibrationHead(BaseHead):
         assert (target_errors >= 0).all(), "Euclidean distance cannot be negative"

         return target_errors
-
+
    def _oks_from_heatmaps(self, gt_heatmaps: Tensor, dt_heatmaps: Tensor, weight: Tensor) -> Tensor:
        """Calculate the OKS from heatmaps.
@@ -542,7 +451,7 @@ class CalibrationHead(BaseHead): coords, score = self.decoder.decode(gt_htm) coords = coords.squeeze() gt_coords[i, :, :] = coords - + coords, score = self.decoder.decode(dt_htm) coords = coords.squeeze() dt_coords[i, :, :] = coords @@ -553,8 +462,8 @@ class CalibrationHead(BaseHead): # Add probability as visibility gt_coords = gt_coords * weight dt_coords = dt_coords * weight - gt_coords = np.concatenate((gt_coords, weight*2), axis=2) - dt_coords = np.concatenate((dt_coords, weight*2), axis=2) + gt_coords = np.concatenate((gt_coords, weight * 2), axis=2) + dt_coords = np.concatenate((dt_coords, weight * 2), axis=2) # Calculate the oks target_oks = [] @@ -569,19 +478,23 @@ class CalibrationHead(BaseHead): oks_weights.append(0) continue - gt_bbox = np.array([ - 0, 0, - 64, 48, - ]) + gt_bbox = np.array( + [ + 0, + 0, + 64, + 48, + ] + ) gt = { - 'keypoints': gt_kpts, - 'bbox': gt_bbox, - 'area': gt_bbox[2] * gt_bbox[3], + "keypoints": gt_kpts, + "bbox": gt_bbox, + "area": gt_bbox[2] * gt_bbox[3], } dt = { - 'keypoints': dt_kpts, - 'bbox': gt_bbox, - 'area': gt_bbox[2] * gt_bbox[3], + "keypoints": dt_kpts, + "bbox": gt_bbox, + "area": gt_bbox[2] * gt_bbox[3], } # Changed for per-keypoint OKS oks = compute_oks(gt, dt, use_area=False, per_kpt=True) @@ -598,11 +511,7 @@ class CalibrationHead(BaseHead): @property def default_init_cfg(self): - init_cfg = [ - dict( - type='Normal', layer=['Conv2d', 'ConvTranspose2d'], std=0.001), - dict(type='Constant', layer='BatchNorm2d', val=1) - ] + init_cfg = [dict(type="Normal", layer=["Conv2d", "ConvTranspose2d"], std=0.001), dict(type="Constant", layer="BatchNorm2d", val=1)] return init_cfg def forward(self, feats: Tuple[Tensor]) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]: @@ -624,7 +533,7 @@ class CalibrationHead(BaseHead): errors = self.forward_error(x) return heatmaps, probabilities, visibilities, oks, errors - + def forward_heatmap(self, x: Tensor) -> Tensor: """Forward the network. The input is multi scale feature maps and the output is the heatmap. @@ -639,8 +548,8 @@ class CalibrationHead(BaseHead): x = self.conv_layers(x) x = self.final_layer(x) B, C, H, W = x.shape - x = x.reshape((B, C, H*W)) - x = self.normalize_layer(x/self.temperature) + x = x.reshape((B, C, H * W)) + x = self.normalize_layer(x / self.temperature) if self.normalize is not None: x = x * self.normalize x = torch.clamp(x, 0, 1) @@ -652,7 +561,7 @@ class CalibrationHead(BaseHead): # x = x.reshape((B, C, H, W)) return x - + def forward_probability(self, x: Tensor) -> Tensor: """Forward the network. The input is multi scale feature maps and the output is the probability. @@ -684,7 +593,7 @@ class CalibrationHead(BaseHead): x = x.detach() x = self.visibility_layers(x) return x - + def forward_oks(self, x: Tensor) -> Tensor: """Forward the network. The input is multi scale feature maps and the output is the oks. @@ -698,7 +607,7 @@ class CalibrationHead(BaseHead): x = x.detach() x = self.oks_layers(x) return x - + def forward_error(self, x: Tensor) -> Tensor: """Forward the network. The input is multi scale feature maps and the output is the euclidean error. @@ -713,10 +622,7 @@ class CalibrationHead(BaseHead): x = self.error_layers(x) return x - def predict(self, - feats: Features, - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from features. 
Args: @@ -747,10 +653,10 @@ class CalibrationHead(BaseHead): - heatmaps (Tensor): The predicted heatmaps in shape (K, h, w) """ - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _htm, _prob, _vis, _oks, _err = self.forward(_feats) @@ -760,9 +666,10 @@ class CalibrationHead(BaseHead): # Flip back the keypoints _htm_flip = flip_heatmaps( _htm_flip, - flip_mode=test_cfg.get('flip_mode', 'heatmap'), + flip_mode=test_cfg.get("flip_mode", "heatmap"), flip_indices=flip_indices, - shift_heatmap=test_cfg.get('shift_heatmap', False)) + shift_heatmap=test_cfg.get("shift_heatmap", False), + ) heatmaps = (_htm + _htm_flip) * 0.5 # Flip back scalars @@ -770,7 +677,7 @@ class CalibrationHead(BaseHead): _vis_flip = _vis_flip[:, flip_indices] _oks_flip = _oks_flip[:, flip_indices] _err_flip = _err_flip[:, flip_indices] - + probabilities = (_prob + _prob_flip) * 0.5 visibilities = (_vis + _vis_flip) * 0.5 oks = (_oks + _oks_flip) * 0.5 @@ -784,13 +691,13 @@ class CalibrationHead(BaseHead): visibilities = to_numpy(visibilities).reshape((B, 1, C)) oks = to_numpy(oks).reshape((B, 1, C)) errors = to_numpy(errors).reshape((B, 1, C)) - + # Normalize errors by dividing with the diagonal of the heatmap htm_diagonal = np.sqrt(H**2 + W**2) errors = errors / htm_diagonal for pi, p in enumerate(preds): - p.set_field(p['keypoint_scores'], "keypoints_conf") + p.set_field(p["keypoint_scores"], "keypoints_conf") p.set_field(probabilities[pi], "keypoints_probs") p.set_field(visibilities[pi], "keypoints_visible") p.set_field(oks[pi], "keypoints_oks") @@ -803,19 +710,14 @@ class CalibrationHead(BaseHead): # hm = heatmaps.detach().cpu().numpy() # print("Heatmaps:", hm.shape, hm.min(), hm.max()) - - if test_cfg.get('output_heatmaps', False): - pred_fields = [ - PixelData(heatmaps=hm) for hm in heatmaps.detach() - ] + + if test_cfg.get("output_heatmaps", False): + pred_fields = [PixelData(heatmaps=hm) for hm in heatmaps.detach()] return preds, pred_fields else: return preds - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. Args: @@ -829,21 +731,15 @@ class CalibrationHead(BaseHead): dict: A dictionary of losses. 
""" dt_heatmaps, dt_probs, dt_vis, dt_oks, dt_errs = self.forward(feats) - device=dt_heatmaps.device + device = dt_heatmaps.device B, C, H, W = dt_heatmaps.shape - + # Extract GT data - gt_heatmaps = torch.stack( - [d.gt_fields.heatmaps for d in batch_data_samples]) - gt_probs = np.stack( - [d.gt_instances.in_image.astype(int) for d in batch_data_samples]) - gt_annotated = np.stack( - [d.gt_instances.keypoints_visible.astype(int) for d in batch_data_samples]) - gt_vis = np.stack( - [d.gt_instances.keypoints_visibility.astype(int) for d in batch_data_samples]) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) + gt_heatmaps = torch.stack([d.gt_fields.heatmaps for d in batch_data_samples]) + gt_probs = np.stack([d.gt_instances.in_image.astype(int) for d in batch_data_samples]) + gt_annotated = np.stack([d.gt_instances.keypoints_visible.astype(int) for d in batch_data_samples]) + gt_vis = np.stack([d.gt_instances.keypoints_visibility.astype(int) for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) # Compute GT errors and OKS if self.freeze_error: @@ -864,7 +760,7 @@ class CalibrationHead(BaseHead): gt_probs = torch.tensor(gt_probs, device=device, dtype=dt_probs.dtype) gt_vis = torch.tensor(gt_vis, device=device, dtype=dt_vis.dtype) gt_annotated = torch.tensor(gt_annotated, device=device) - + gt_oks = gt_oks.to(device).to(dt_oks.dtype) oks_weight = oks_weight.to(device).to(dt_oks.dtype) gt_errs = gt_errs.to(device).to(dt_errs.dtype) @@ -895,44 +791,17 @@ class CalibrationHead(BaseHead): heatmap_weights = annotated_in heatmap_loss_pxl = self.keypoint_loss_module(dt_heatmaps, gt_heatmaps, annotated_in, per_pixel=True) - heatmap_loss = self.keypoint_loss_module(dt_heatmaps, gt_heatmaps, annotated_in) + heatmap_loss = self.keypoint_loss_module(dt_heatmaps, gt_heatmaps, annotated_in) # probability_loss = self.probability_loss_module(dt_probs, gt_probs, gt_annotated) # visibility_loss = self.visibility_loss_module(dt_vis, gt_vis, annotated_in) # oks_loss = self.oks_loss_module(dt_oks, gt_oks, annotated_in) # error_loss = self.error_loss_module(dt_errs, gt_errs, annotated_in) - # Visualize some heatmaps - for i in range(0, B): - # continue - if self.num_iters % self.interval == 0: - self.interval = int(self.interval * 1.3) - os.makedirs(self.loss_vis_folder, exist_ok=True) - for kpt_i in np.random.choice(C, 17, replace=False): - tgt = gt_heatmaps[i, kpt_i].detach().cpu().numpy() - htm = dt_heatmaps[i, kpt_i].detach().cpu().numpy() - lss = heatmap_loss_pxl[i, kpt_i].detach().cpu().numpy() - save_img = self._visualize_heatmaps( - htm, tgt, lss, keypoint_weights[i, kpt_i], gt_probs[i, kpt_i] - ) - - save_path = os.path.join( - self.loss_vis_folder, - "heatmap_{:07d}-{:d}-{:d}.png".format(self.num_iters, i, kpt_i) - ) - cv2.imwrite(save_path, save_img) - - self.num_iters += 1 - - - losses.update( - loss_kpt=heatmap_loss - ) - + losses.update(loss_kpt=heatmap_loss) + # calculate accuracy - if train_cfg.get('compute_acc', True): - acc_pose = self.get_pose_accuracy( - dt_heatmaps, gt_heatmaps, keypoint_weights > 0.5 - ) + if train_cfg.get("compute_acc", True): + acc_pose = self.get_pose_accuracy(dt_heatmaps, gt_heatmaps, keypoint_weights > 0.5) losses.update(acc_pose=acc_pose) # Calculate the best binary accuracy for probability @@ -971,7 +840,7 @@ class CalibrationHead(BaseHead): # Calculate the MAE between Euclidean error and OKS err_to_oks_mae = self.get_mae( - 
self.error_to_OKS(dt_errs, area=H*W), + self.error_to_OKS(dt_errs, area=H * W), gt_oks, annotated_in > 0.5, ) @@ -980,15 +849,8 @@ class CalibrationHead(BaseHead): print(self.temperature.item()) return losses - - def _visualize_heatmaps( - self, - htm, - tgt, - lss, - weight, - prob - ): + + def _visualize_heatmaps(self, htm, tgt, lss, weight, prob): tgt_range = (tgt.min(), tgt.max()) htm_range = (htm.min(), htm.max()) lss_range = (lss.min(), lss.max()) @@ -996,64 +858,91 @@ class CalibrationHead(BaseHead): tgt[tgt < 0] = 0 htm[htm < 0] = 0 lss[lss < 0] = 0 - + # Normalize heatmaps between 0 and 1 - tgt /= (tgt.max()+1e-10) - htm /= (htm.max()+1e-10) - lss /= (lss.max()+1e-10) + tgt /= tgt.max() + 1e-10 + htm /= htm.max() + 1e-10 + lss /= lss.max() + 1e-10 scale = 6 - - htm_color = cv2.cvtColor((htm*255).astype(np.uint8), cv2.COLOR_GRAY2BGR) + + htm_color = cv2.cvtColor((htm * 255).astype(np.uint8), cv2.COLOR_GRAY2BGR) htm_color = cv2.applyColorMap(htm_color, cv2.COLORMAP_JET) - htm_color = cv2.resize(htm_color, (htm.shape[1]*scale, htm.shape[0]*scale), interpolation=cv2.INTER_NEAREST) - - tgt_color = cv2.cvtColor((tgt*255).astype(np.uint8), cv2.COLOR_GRAY2BGR) + htm_color = cv2.resize(htm_color, (htm.shape[1] * scale, htm.shape[0] * scale), interpolation=cv2.INTER_NEAREST) + + tgt_color = cv2.cvtColor((tgt * 255).astype(np.uint8), cv2.COLOR_GRAY2BGR) tgt_color = cv2.applyColorMap(tgt_color, cv2.COLORMAP_JET) - tgt_color = cv2.resize(tgt_color, (htm.shape[1]*scale, htm.shape[0]*scale), interpolation=cv2.INTER_NEAREST) - - lss_color = cv2.cvtColor((lss*255).astype(np.uint8), cv2.COLOR_GRAY2BGR) + tgt_color = cv2.resize(tgt_color, (htm.shape[1] * scale, htm.shape[0] * scale), interpolation=cv2.INTER_NEAREST) + + lss_color = cv2.cvtColor((lss * 255).astype(np.uint8), cv2.COLOR_GRAY2BGR) lss_color = cv2.applyColorMap(lss_color, cv2.COLORMAP_JET) - lss_color = cv2.resize(lss_color, (htm.shape[1]*scale, htm.shape[0]*scale), interpolation=cv2.INTER_NEAREST) - + lss_color = cv2.resize(lss_color, (htm.shape[1] * scale, htm.shape[0] * scale), interpolation=cv2.INTER_NEAREST) + if scale > 2: tgt_color_text = tgt_color.copy() - cv2.putText(tgt_color_text, "tgt ({:.1f}, {:.1f})".format(tgt_range[0]*10, tgt_range[1]*10), (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1) + cv2.putText( + tgt_color_text, + "tgt ({:.1f}, {:.1f})".format(tgt_range[0] * 10, tgt_range[1] * 10), + (10, 20), + cv2.FONT_HERSHEY_SIMPLEX, + 0.5, + (255, 255, 255), + 1, + ) tgt_color = cv2.addWeighted(tgt_color, 0.6, tgt_color_text, 0.4, 0) - + htm_color_text = htm_color.copy() - cv2.putText(htm_color_text, "htm ({:.1f}, {:.1f})".format(htm_range[0]*10, htm_range[1]*10), (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1) + cv2.putText( + htm_color_text, + "htm ({:.1f}, {:.1f})".format(htm_range[0] * 10, htm_range[1] * 10), + (10, 20), + cv2.FONT_HERSHEY_SIMPLEX, + 0.5, + (255, 255, 255), + 1, + ) htm_color = cv2.addWeighted(htm_color, 0.6, htm_color_text, 0.4, 0) lss_color_text = lss_color.copy() - cv2.putText(lss_color_text, "lss ({:.1f}, {:.1f})".format(lss_range[0]*10, lss_range[1]*10), (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1) + cv2.putText( + lss_color_text, + "lss ({:.1f}, {:.1f})".format(lss_range[0] * 10, lss_range[1] * 10), + (10, 20), + cv2.FONT_HERSHEY_SIMPLEX, + 0.5, + (255, 255, 255), + 1, + ) lss_color = cv2.addWeighted(lss_color, 0.6, lss_color_text, 0.4, 0) # Get argmax of the target and draw horizontal and vertical lines tgt_argmax = np.unravel_index(tgt.argmax(), tgt.shape) 
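        # The same GT-argmax crosshair is drawn on all three panels (target,
        # prediction, loss) so that prediction drift is visible at a glance.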
tgt_color_line = tgt_color.copy() - cv2.line(tgt_color_line, (0, tgt_argmax[0]*scale), (tgt_color.shape[1], tgt_argmax[0]*scale), (0, 255, 255), 1) - cv2.line(tgt_color_line, (tgt_argmax[1]*scale, 0), (tgt_argmax[1]*scale, tgt_color.shape[0]), (0, 255, 255), 1) + cv2.line(tgt_color_line, (0, tgt_argmax[0] * scale), (tgt_color.shape[1], tgt_argmax[0] * scale), (0, 255, 255), 1) + cv2.line(tgt_color_line, (tgt_argmax[1] * scale, 0), (tgt_argmax[1] * scale, tgt_color.shape[0]), (0, 255, 255), 1) tgt_color = cv2.addWeighted(tgt_color, 0.6, tgt_color_line, 0.4, 0) htm_color_line = htm_color.copy() - cv2.line(htm_color_line, (0, tgt_argmax[0]*scale), (tgt_color.shape[1], tgt_argmax[0]*scale), (0, 255, 255), 1) - cv2.line(htm_color_line, (tgt_argmax[1]*scale, 0), (tgt_argmax[1]*scale, tgt_color.shape[0]), (0, 255, 255), 1) + cv2.line(htm_color_line, (0, tgt_argmax[0] * scale), (tgt_color.shape[1], tgt_argmax[0] * scale), (0, 255, 255), 1) + cv2.line(htm_color_line, (tgt_argmax[1] * scale, 0), (tgt_argmax[1] * scale, tgt_color.shape[0]), (0, 255, 255), 1) htm_color = cv2.addWeighted(htm_color, 0.6, htm_color_line, 0.4, 0) lss_color_line = lss_color.copy() - cv2.line(lss_color_line, (0, tgt_argmax[0]*scale), (tgt_color.shape[1], tgt_argmax[0]*scale), (0, 255, 255), 1) - cv2.line(lss_color_line, (tgt_argmax[1]*scale, 0), (tgt_argmax[1]*scale, tgt_color.shape[0]), (0, 255, 255), 1) + cv2.line(lss_color_line, (0, tgt_argmax[0] * scale), (tgt_color.shape[1], tgt_argmax[0] * scale), (0, 255, 255), 1) + cv2.line(lss_color_line, (tgt_argmax[1] * scale, 0), (tgt_argmax[1] * scale, tgt_color.shape[0]), (0, 255, 255), 1) lss_color = cv2.addWeighted(lss_color, 0.6, lss_color_line, 0.4, 0) white_column = np.ones((tgt_color.shape[0], 1, 3), dtype=np.uint8) * 255 - save_img = np.concatenate(( - tgt_color, - white_column, - htm_color, - white_column, - lss_color, - ), axis=1) - + save_img = np.concatenate( + ( + tgt_color, + white_column, + htm_color, + white_column, + lss_color, + ), + axis=1, + ) + if weight < 0.5: # Draw a red X across the whole save_img cv2.line(save_img, (0, 0), (save_img.shape[1], save_img.shape[0]), (0, 0, 255), 2) @@ -1064,18 +953,17 @@ class CalibrationHead(BaseHead): cv2.line(save_img, (0, save_img.shape[0]), (save_img.shape[1], 0), (0, 255, 255), 2) return save_img - def get_pose_accuracy(self, dt, gt, mask): """Calculate the accuracy of predicted pose.""" _, avg_acc, _ = pose_pck_accuracy( output=to_numpy(dt), target=to_numpy(gt), mask=to_numpy(mask), - method='argmax', + method="argmax", ) acc_pose = torch.tensor(avg_acc, device=gt.device) return acc_pose - + def get_binary_accuracy(self, dt, gt, mask, force_balanced=False): """Calculate the binary accuracy.""" assert dt.shape == gt.shape @@ -1107,7 +995,7 @@ class CalibrationHead(BaseHead): n_samples = len(gt) thresholds = np.arange(0.1, 1.0, 0.05) - preds = (dt[:, None] > thresholds) + preds = dt[:, None] > thresholds correct = preds == gt[:, None] counts = correct.sum(axis=0) @@ -1127,7 +1015,7 @@ class CalibrationHead(BaseHead): dt = to_numpy(dt) gt = to_numpy(gt) mask = to_numpy(mask) - + dt = dt[mask] gt = gt[mask] mae = np.abs(dt - gt).mean() @@ -1135,15 +1023,14 @@ class CalibrationHead(BaseHead): mae = torch.tensor(mae, device=device) return mae - def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, - **kwargs): + def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, **kwargs): """A hook function to convert old-version state dict of :class:`TopdownHeatmapSimpleHead` (before MMPose 
v1.0.0) to a compatible format of :class:`HeatmapHead`. The hook will be automatically registered during initialization. """ - version = local_meta.get('version', None) + version = local_meta.get("version", None) if version and version >= self._version: return @@ -1153,7 +1040,7 @@ class CalibrationHead(BaseHead): if not _k.startswith(prefix): continue v = state_dict.pop(_k) - k = _k[len(prefix):] + k = _k[len(prefix) :] # In old version, "final_layer" includes both intermediate # conv layers (new "conv_layers") and final conv layers (new # "final_layer"). @@ -1166,17 +1053,17 @@ class CalibrationHead(BaseHead): # have keys like "final_layer.n.xxx", where the weights of the last # one should be renamed "final_layer.xxx", and others should be # renamed "conv_layers.n.xxx" - k_parts = k.split('.') - if k_parts[0] == 'final_layer': + k_parts = k.split(".") + if k_parts[0] == "final_layer": if len(k_parts) == 3: assert isinstance(self.conv_layers, nn.Sequential) idx = int(k_parts[1]) if idx < len(self.conv_layers): # final_layer.n.xxx -> conv_layers.n.xxx - k_new = 'conv_layers.' + '.'.join(k_parts[1:]) + k_new = "conv_layers." + ".".join(k_parts[1:]) else: # final_layer.n.xxx -> final_layer.xxx - k_new = 'final_layer.' + k_parts[2] + k_new = "final_layer." + k_parts[2] else: # final_layer.xxx remains final_layer.xxx k_new = k @@ -1187,32 +1074,35 @@ class CalibrationHead(BaseHead): def error_to_OKS(self, error, area=1.0): """Convert the error to OKS.""" - sigmas = np.array( - [.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89])/10.0 + sigmas = np.array([0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72, 0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89]) / 10.0 if isinstance(error, torch.Tensor): sigmas = torch.tensor(sigmas, device=error.device) - vars = (sigmas * 2)**2 + vars = (sigmas * 2) ** 2 norm_error = error**2 / vars / area / 2.0 return torch.exp(-norm_error) def compute_oks(gt, dt, use_area=True, per_kpt=False): - sigmas = np.array( - [.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89])/10.0 - vars = (sigmas * 2)**2 + sigmas = np.array([0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72, 0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89]) / 10.0 + vars = (sigmas * 2) ** 2 k = len(sigmas) visibility_condition = lambda x: x > 0 - g = np.array(gt['keypoints']).reshape(k, 3) - xg = g[:, 0]; yg = g[:, 1]; vg = g[:, 2] + g = np.array(gt["keypoints"]).reshape(k, 3) + xg = g[:, 0] + yg = g[:, 1] + vg = g[:, 2] k1 = np.count_nonzero(visibility_condition(vg)) - bb = gt['bbox'] - x0 = bb[0] - bb[2]; x1 = bb[0] + bb[2] * 2 - y0 = bb[1] - bb[3]; y1 = bb[1] + bb[3] * 2 - - d = np.array(dt['keypoints']).reshape((k, 3)) - xd = d[:, 0]; yd = d[:, 1] - - if k1>0: + bb = gt["bbox"] + x0 = bb[0] - bb[2] + x1 = bb[0] + bb[2] * 2 + y0 = bb[1] - bb[3] + y1 = bb[1] + bb[3] * 2 + + d = np.array(dt["keypoints"]).reshape((k, 3)) + xd = d[:, 0] + yd = d[:, 1] + + if k1 > 0: # measure the per-keypoint distance if keypoints visible dx = xd - xg dy = yd - yg @@ -1220,15 +1110,15 @@ def compute_oks(gt, dt, use_area=True, per_kpt=False): else: # measure minimum distance to keypoints in (x0,y0) & (x1,y1) z = np.zeros((k)) - dx = np.max((z, x0-xd),axis=0)+np.max((z, xd-x1),axis=0) - dy = np.max((z, y0-yd),axis=0)+np.max((z, yd-y1),axis=0) + dx = np.max((z, x0 - xd), axis=0) + np.max((z, xd - x1), axis=0) + dy = np.max((z, y0 - yd), axis=0) + np.max((z, yd - y1), axis=0) if use_area: - e = (dx**2 + dy**2) / vars / (gt['area']+np.spacing(1)) / 2 + e 
= (dx**2 + dy**2) / vars / (gt["area"] + np.spacing(1)) / 2 else: - tmparea = gt['bbox'][3] * gt['bbox'][2] * 0.53 - e = (dx**2 + dy**2) / vars / (tmparea+np.spacing(1)) / 2 - + tmparea = gt["bbox"][3] * gt["bbox"][2] * 0.53 + e = (dx**2 + dy**2) / vars / (tmparea + np.spacing(1)) / 2 + if per_kpt: oks = np.exp(-e) if k1 > 0: @@ -1236,7 +1126,7 @@ def compute_oks(gt, dt, use_area=True, per_kpt=False): else: if k1 > 0: - e=e[visibility_condition(vg)] + e = e[visibility_condition(vg)] oks = np.sum(np.exp(-e)) / e.shape[0] - return oks \ No newline at end of file + return oks diff --git a/mmpose/models/heads/hybrid_heads/dekr_head.py b/mmpose/models/heads/hybrid_heads/dekr_head.py index 41f7cfc4ce9f7cbb061c18ba14a4847a67a07ffc..fea5ae20c11d71db0cffbec8b2d5b0880ee1da11 100644 --- a/mmpose/models/heads/hybrid_heads/dekr_head.py +++ b/mmpose/models/heads/hybrid_heads/dekr_head.py @@ -2,8 +2,7 @@ from typing import Sequence, Tuple, Union import torch -from mmcv.cnn import (ConvModule, build_activation_layer, build_conv_layer, - build_norm_layer) +from mmcv.cnn import ConvModule, build_activation_layer, build_conv_layer, build_norm_layer from mmengine.model import BaseModule, ModuleDict, Sequential from mmengine.structures import InstanceData, PixelData from torch import Tensor @@ -12,13 +11,14 @@ from mmpose.evaluation.functional.nms import nearby_joints_nms from mmpose.models.utils.tta import flip_heatmaps from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, Features, InstanceList, - OptConfigType, OptSampleList, Predictions) +from mmpose.utils.typing import ConfigType, Features, InstanceList, OptConfigType, OptSampleList, Predictions + from ...backbones.resnet import BasicBlock from ..base_head import BaseHead try: from mmcv.ops import DeformConv2d + has_mmcv_full = True except (ImportError, ModuleNotFoundError): has_mmcv_full = False @@ -37,44 +37,25 @@ class AdaptiveActivationBlock(BaseModule): act_cfg (dict): Config for activation layers. 
""" - def __init__(self, - in_channels, - out_channels, - groups=1, - norm_cfg=dict(type='BN'), - act_cfg=dict(type='ReLU'), - init_cfg=None): + def __init__(self, in_channels, out_channels, groups=1, norm_cfg=dict(type="BN"), act_cfg=dict(type="ReLU"), init_cfg=None): super(AdaptiveActivationBlock, self).__init__(init_cfg=init_cfg) assert in_channels % groups == 0 and out_channels % groups == 0 self.groups = groups - regular_matrix = torch.tensor([[-1, -1, -1, 0, 0, 0, 1, 1, 1], - [-1, 0, 1, -1, 0, 1, -1, 0, 1], - [1, 1, 1, 1, 1, 1, 1, 1, 1]]) - self.register_buffer('regular_matrix', regular_matrix.float()) + regular_matrix = torch.tensor([[-1, -1, -1, 0, 0, 0, 1, 1, 1], [-1, 0, 1, -1, 0, 1, -1, 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]) + self.register_buffer("regular_matrix", regular_matrix.float()) self.transform_matrix_conv = build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=6 * groups, - kernel_size=3, - padding=1, - groups=groups, - bias=True) + dict(type="Conv2d"), in_channels=in_channels, out_channels=6 * groups, kernel_size=3, padding=1, groups=groups, bias=True + ) if has_mmcv_full: self.adapt_conv = DeformConv2d( - in_channels, - out_channels, - kernel_size=3, - padding=1, - bias=False, - groups=groups, - deform_groups=groups) + in_channels, out_channels, kernel_size=3, padding=1, bias=False, groups=groups, deform_groups=groups + ) else: - raise ImportError('Please install the full version of mmcv ' - 'to use `DeformConv2d`.') + raise ImportError("Please install the full version of mmcv " "to use `DeformConv2d`.") self.norm = build_norm_layer(norm_cfg, out_channels)[1] self.act = build_activation_layer(act_cfg) @@ -139,13 +120,11 @@ class RescoreNet(BaseModule): joint_1, joint_2 = zip(*skeleton) num_link = len(skeleton) - joint_relate = (keypoints[:, joint_1] - - keypoints[:, joint_2])[:, :, :2] + joint_relate = (keypoints[:, joint_1] - keypoints[:, joint_2])[:, :, :2] joint_length = joint_relate.norm(dim=2) # To use the torso distance to normalize - normalize = (joint_length[:, self.norm_indexes[0]] + - joint_length[:, self.norm_indexes[1]]) / 2 + normalize = (joint_length[:, self.norm_indexes[0]] + joint_length[:, self.norm_indexes[1]]) / 2 normalize = normalize.unsqueeze(1).expand(normalize.size(0), num_link) normalize = normalize.clamp(min=1).contiguous() @@ -153,8 +132,7 @@ class RescoreNet(BaseModule): joint_relate = joint_relate / normalize.unsqueeze(-1) joint_relate = joint_relate.flatten(1) - feature = torch.cat((joint_relate, joint_length, keypoint_scores), - dim=1).float() + feature = torch.cat((joint_relate, joint_length, keypoint_scores), dim=1).float() return feature def forward(self, keypoints, keypoint_scores, skeleton): @@ -197,20 +175,18 @@ class DEKRHead(BaseHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - num_keypoints: int, - num_heatmap_filters: int = 32, - num_displacement_filters_per_keypoint: int = 15, - heatmap_loss: ConfigType = dict( - type='KeypointMSELoss', use_target_weight=True), - displacement_loss: ConfigType = dict( - type='SoftWeightSmoothL1Loss', - use_target_weight=True, - supervise_empty=False), - decoder: OptConfigType = None, - rescore_cfg: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + num_keypoints: int, + num_heatmap_filters: int = 32, + num_displacement_filters_per_keypoint: int = 15, + heatmap_loss: ConfigType = dict(type="KeypointMSELoss", use_target_weight=True), + displacement_loss: 
ConfigType = dict(type="SoftWeightSmoothL1Loss", use_target_weight=True, supervise_empty=False), + decoder: OptConfigType = None, + rescore_cfg: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -232,14 +208,16 @@ class DEKRHead(BaseHead): in_channels=in_channels, out_channels=2 * num_keypoints, num_filters=num_keypoints * num_displacement_filters_per_keypoint, - groups=num_keypoints) + groups=num_keypoints, + ) # build losses self.loss_module = ModuleDict( dict( heatmap=MODELS.build(heatmap_loss), displacement=MODELS.build(displacement_loss), - )) + ) + ) # build decoder if decoder is not None: @@ -258,52 +236,28 @@ class DEKRHead(BaseHead): @property def default_init_cfg(self): - init_cfg = [ - dict( - type='Normal', layer=['Conv2d', 'ConvTranspose2d'], std=0.001), - dict(type='Constant', layer='BatchNorm2d', val=1) - ] + init_cfg = [dict(type="Normal", layer=["Conv2d", "ConvTranspose2d"], std=0.001), dict(type="Constant", layer="BatchNorm2d", val=1)] return init_cfg - def _make_heatmap_conv_layers(self, in_channels: int, out_channels: int, - num_filters: int): + def _make_heatmap_conv_layers(self, in_channels: int, out_channels: int, num_filters: int): """Create convolutional layers of heatmap branch by given parameters.""" layers = [ - ConvModule( - in_channels=in_channels, - out_channels=num_filters, - kernel_size=1, - norm_cfg=dict(type='BN')), + ConvModule(in_channels=in_channels, out_channels=num_filters, kernel_size=1, norm_cfg=dict(type="BN")), BasicBlock(num_filters, num_filters), - build_conv_layer( - dict(type='Conv2d'), - in_channels=num_filters, - out_channels=out_channels, - kernel_size=1), + build_conv_layer(dict(type="Conv2d"), in_channels=num_filters, out_channels=out_channels, kernel_size=1), ] return Sequential(*layers) - def _make_displacement_conv_layers(self, in_channels: int, - out_channels: int, num_filters: int, - groups: int): + def _make_displacement_conv_layers(self, in_channels: int, out_channels: int, num_filters: int, groups: int): """Create convolutional layers of displacement branch by given parameters.""" layers = [ - ConvModule( - in_channels=in_channels, - out_channels=num_filters, - kernel_size=1, - norm_cfg=dict(type='BN')), + ConvModule(in_channels=in_channels, out_channels=num_filters, kernel_size=1, norm_cfg=dict(type="BN")), AdaptiveActivationBlock(num_filters, num_filters, groups=groups), AdaptiveActivationBlock(num_filters, num_filters, groups=groups), - build_conv_layer( - dict(type='Conv2d'), - in_channels=num_filters, - out_channels=out_channels, - kernel_size=1, - groups=groups) + build_conv_layer(dict(type="Conv2d"), in_channels=num_filters, out_channels=out_channels, kernel_size=1, groups=groups), ] return Sequential(*layers) @@ -325,10 +279,7 @@ class DEKRHead(BaseHead): return heatmaps, displacements - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. Args: @@ -342,40 +293,31 @@ class DEKRHead(BaseHead): dict: A dictionary of losses. 
""" pred_heatmaps, pred_displacements = self.forward(feats) - gt_heatmaps = torch.stack( - [d.gt_fields.heatmaps for d in batch_data_samples]) - heatmap_weights = torch.stack( - [d.gt_fields.heatmap_weights for d in batch_data_samples]) - gt_displacements = torch.stack( - [d.gt_fields.displacements for d in batch_data_samples]) - displacement_weights = torch.stack( - [d.gt_fields.displacement_weights for d in batch_data_samples]) - - if 'heatmap_mask' in batch_data_samples[0].gt_fields.keys(): - heatmap_mask = torch.stack( - [d.gt_fields.heatmap_mask for d in batch_data_samples]) + gt_heatmaps = torch.stack([d.gt_fields.heatmaps for d in batch_data_samples]) + heatmap_weights = torch.stack([d.gt_fields.heatmap_weights for d in batch_data_samples]) + gt_displacements = torch.stack([d.gt_fields.displacements for d in batch_data_samples]) + displacement_weights = torch.stack([d.gt_fields.displacement_weights for d in batch_data_samples]) + + if "heatmap_mask" in batch_data_samples[0].gt_fields.keys(): + heatmap_mask = torch.stack([d.gt_fields.heatmap_mask for d in batch_data_samples]) else: heatmap_mask = None # calculate losses losses = dict() - heatmap_loss = self.loss_module['heatmap'](pred_heatmaps, gt_heatmaps, - heatmap_weights, - heatmap_mask) - displacement_loss = self.loss_module['displacement']( - pred_displacements, gt_displacements, displacement_weights) - - losses.update({ - 'loss/heatmap': heatmap_loss, - 'loss/displacement': displacement_loss, - }) + heatmap_loss = self.loss_module["heatmap"](pred_heatmaps, gt_heatmaps, heatmap_weights, heatmap_mask) + displacement_loss = self.loss_module["displacement"](pred_displacements, gt_displacements, displacement_weights) + + losses.update( + { + "loss/heatmap": heatmap_loss, + "loss/displacement": displacement_loss, + } + ) return losses - def predict(self, - feats: Features, - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from features. 
Args: @@ -409,46 +351,40 @@ class DEKRHead(BaseHead): in shape (K*2, h, w) """ - assert len(batch_data_samples) == 1, f'DEKRHead only supports ' \ - f'prediction with batch_size 1, but got {len(batch_data_samples)}' + assert len(batch_data_samples) == 1, f"DEKRHead only supports " f"prediction with batch_size 1, but got {len(batch_data_samples)}" - multiscale_test = test_cfg.get('multiscale_test', False) - flip_test = test_cfg.get('flip_test', False) + multiscale_test = test_cfg.get("multiscale_test", False) + flip_test = test_cfg.get("flip_test", False) metainfo = batch_data_samples[0].metainfo aug_scales = [1] if not multiscale_test: feats = [feats] else: - aug_scales = aug_scales + metainfo['aug_scales'] + aug_scales = aug_scales + metainfo["aug_scales"] heatmaps, displacements = [], [] for feat, s in zip(feats, aug_scales): if flip_test: assert isinstance(feat, list) and len(feat) == 2 - flip_indices = metainfo['flip_indices'] + flip_indices = metainfo["flip_indices"] _feat, _feat_flip = feat _heatmaps, _displacements = self.forward(_feat) _heatmaps_flip, _displacements_flip = self.forward(_feat_flip) _heatmaps_flip = flip_heatmaps( _heatmaps_flip, - flip_mode='heatmap', + flip_mode="heatmap", flip_indices=flip_indices + [len(flip_indices)], - shift_heatmap=test_cfg.get('shift_heatmap', False)) + shift_heatmap=test_cfg.get("shift_heatmap", False), + ) _heatmaps = (_heatmaps + _heatmaps_flip) / 2.0 - _displacements_flip = flip_heatmaps( - _displacements_flip, - flip_mode='offset', - flip_indices=flip_indices, - shift_heatmap=False) + _displacements_flip = flip_heatmaps(_displacements_flip, flip_mode="offset", flip_indices=flip_indices, shift_heatmap=False) # this is a coordinate amendment. - x_scale_factor = s * ( - metainfo['input_size'][0] / _heatmaps.shape[-1]) - _displacements_flip[:, ::2] += (x_scale_factor - 1) / ( - x_scale_factor) + x_scale_factor = s * (metainfo["input_size"][0] / _heatmaps.shape[-1]) + _displacements_flip[:, ::2] += (x_scale_factor - 1) / (x_scale_factor) _displacements = (_displacements + _displacements_flip) / 2.0 else: @@ -459,25 +395,18 @@ class DEKRHead(BaseHead): preds = self.decode(heatmaps, displacements, test_cfg, metainfo) - if test_cfg.get('output_heatmaps', False): + if test_cfg.get("output_heatmaps", False): heatmaps = [hm.detach() for hm in heatmaps] displacements = [dm.detach() for dm in displacements] B = heatmaps[0].shape[0] pred_fields = [] for i in range(B): - pred_fields.append( - PixelData( - heatmaps=heatmaps[0][i], - displacements=displacements[0][i])) + pred_fields.append(PixelData(heatmaps=heatmaps[0][i], displacements=displacements[0][i])) return preds, pred_fields else: return preds - def decode(self, - heatmaps: Tuple[Tensor], - displacements: Tuple[Tensor], - test_cfg: ConfigType = {}, - metainfo: dict = {}) -> InstanceList: + def decode(self, heatmaps: Tuple[Tensor], displacements: Tuple[Tensor], test_cfg: ConfigType = {}, metainfo: dict = {}) -> InstanceList: """Decode keypoints from outputs. Args: @@ -496,12 +425,13 @@ class DEKRHead(BaseHead): if self.decoder is None: raise RuntimeError( - f'The decoder has not been set in {self.__class__.__name__}. ' - 'Please set the decoder configs in the init parameters to ' - 'enable head methods `head.predict()` and `head.decode()`') + f"The decoder has not been set in {self.__class__.__name__}. 
" + "Please set the decoder configs in the init parameters to " + "enable head methods `head.predict()` and `head.decode()`" + ) - multiscale_test = test_cfg.get('multiscale_test', False) - skeleton = metainfo.get('skeleton_links', None) + multiscale_test = test_cfg.get("multiscale_test", False) + skeleton = metainfo.get("skeleton_links", None) preds = [] batch_size = heatmaps[0].shape[0] @@ -510,72 +440,66 @@ class DEKRHead(BaseHead): if multiscale_test: raise NotImplementedError else: - keypoints, (root_scores, - keypoint_scores) = self.decoder.decode( - heatmaps[0][b], displacements[0][b]) + keypoints, (root_scores, keypoint_scores) = self.decoder.decode(heatmaps[0][b], displacements[0][b]) # rescore each instance - if self.rescore_net is not None and skeleton and len( - keypoints) > 0: - instance_scores = self.rescore_net(keypoints, keypoint_scores, - skeleton) + if self.rescore_net is not None and skeleton and len(keypoints) > 0: + instance_scores = self.rescore_net(keypoints, keypoint_scores, skeleton) instance_scores[torch.isnan(instance_scores)] = 0 root_scores = root_scores * instance_scores # nms keypoints, keypoint_scores = to_numpy((keypoints, keypoint_scores)) scores = to_numpy(root_scores)[..., None] * keypoint_scores - if len(keypoints) > 0 and test_cfg.get('nms_dist_thr', 0) > 0: + if len(keypoints) > 0 and test_cfg.get("nms_dist_thr", 0) > 0: kpts_db = [] for i in range(len(keypoints)): - kpts_db.append( - dict(keypoints=keypoints[i], score=keypoint_scores[i])) + kpts_db.append(dict(keypoints=keypoints[i], score=keypoint_scores[i])) keep_instance_inds = nearby_joints_nms( kpts_db, - test_cfg['nms_dist_thr'], - test_cfg.get('nms_joints_thr', None), + test_cfg["nms_dist_thr"], + test_cfg.get("nms_joints_thr", None), score_per_joint=True, - max_dets=test_cfg.get('max_num_people', 30)) + max_dets=test_cfg.get("max_num_people", 30), + ) keypoints = keypoints[keep_instance_inds] scores = scores[keep_instance_inds] # pack outputs - preds.append( - InstanceData(keypoints=keypoints, keypoint_scores=scores)) + preds.append(InstanceData(keypoints=keypoints, keypoint_scores=scores)) return preds - def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, - **kwargs): + def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, **kwargs): """A hook function to convert old-version state dict of :class:`DEKRHead` (before MMPose v1.0.0) to a compatible format of :class:`DEKRHead`. The hook will be automatically registered during initialization. """ - version = local_meta.get('version', None) + version = local_meta.get("version", None) if version and version >= self._version: return # convert old-version state dict keys = list(state_dict.keys()) for k in keys: - if 'offset_conv_layer' in k: + if "offset_conv_layer" in k: v = state_dict.pop(k) - k = k.replace('offset_conv_layers', 'displacement_conv_layers') - if 'displacement_conv_layers.3.' in k: + k = k.replace("offset_conv_layers", "displacement_conv_layers") + if "displacement_conv_layers.3." in k: # the source and target of displacement vectors are # opposite between two versions. v = -v state_dict[k] = v - if 'heatmap_conv_layers.2' in k: + if "heatmap_conv_layers.2" in k: # root heatmap is at the first/last channel of the # heatmap tensor in MMPose v0.x/1.x, respectively. 
v = state_dict.pop(k) state_dict[k] = torch.cat((v[1:], v[:1])) - if 'rescore_net' in k: + if "rescore_net" in k: v = state_dict.pop(k) - k = k.replace('rescore_net', 'head.rescore_net') + k = k.replace("rescore_net", "head.rescore_net") state_dict[k] = v diff --git a/mmpose/models/heads/hybrid_heads/multi_head.py b/mmpose/models/heads/hybrid_heads/multi_head.py new file mode 100644 index 0000000000000000000000000000000000000000..432810a5c0dbfbdd937e60f151b1695585b41999 --- /dev/null +++ b/mmpose/models/heads/hybrid_heads/multi_head.py @@ -0,0 +1,1318 @@ +# Copyright (c) Miroslav Purkrabek, ProbPose. All rights reserved. +import os +import shutil +from typing import Optional, Sequence, Tuple, Union + +import cv2 +import numpy as np +import torch +from mmcv.cnn import build_conv_layer, build_upsample_layer +from mmengine.structures import PixelData +from torch import Tensor, nn + +from mmpose.evaluation.functional import pose_pck_accuracy +from mmpose.models.utils.tta import flip_heatmaps +from mmpose.registry import KEYPOINT_CODECS, MODELS +from mmpose.structures.keypoint import fix_bbox_aspect_ratio +from mmpose.utils.tensor_utils import to_numpy +from mmpose.utils.typing import ConfigType, Features, OptConfigType, OptSampleList, Predictions +from sparsemax import Sparsemax + +from ..base_head import BaseHead + +OptIntSeq = Optional[Sequence[int]] + +# The default sigmas are used for COCO dataset. +sigmas_17 = ( + np.array( + [ + 2.6, # 1. Nose + 2.5, # 2. Left eye + 2.5, # 3. Right eye + 3.5, # 4. Left ear + 3.5, # 5. Right ear + 7.9, # 6. Left shoulder + 7.9, # 7. Right shoulder + 7.2, # 8. Left elbow + 7.2, # 9. Right elbow + 6.2, # 10. Left wrist + 6.2, # 11. Right wrist + 10.7, # 12. Left hip + 10.7, # 13. Right hip + 8.7, # 14. Left knee + 8.7, # 15. Right knee + 8.9, # 16. Left ankle + 8.9, # 17. Right ankle + ] + ) + / 100 +) + +# The sigmas for mergeable 21 keypoints. +sigmas_21 = ( + np.array( + [ + 2.6, # 1. Nose + 2.5, # 2. Left eye + 2.5, # 3. Right eye + 3.5, # 4. Left ear + 3.5, # 5. Right ear + 7.9, # 6. Left shoulder + 7.9, # 7. Right shoulder + 7.2, # 8. Left elbow + 7.2, # 9. Right elbow + 6.2, # 10. Left wrist + 6.2, # 11. Right wrist + 10.7, # 12. Left hip + 10.7, # 13. Right hip + 8.7, # 14. Left knee + 8.7, # 15. Right knee + 8.9, # 16. Left ankle + 8.9, # 17. Right ankle + 7.9, # 18. Thorax (MPII+AIC) + 7.9, # 19. Neck (MPII) + 3.5, # 20. Head top (MPII+AIC) + 10.7, # 21. Pelvis (MPII) + ] + ) + / 100 +) + +# The sigmas for mergeable 23 keypoints. +sigmas_23 = ( + np.array( + [ + 2.6, # 1. Nose + 2.5, # 2. Left eye + 2.5, # 3. Right eye + 3.5, # 4. Left ear + 3.5, # 5. Right ear + 7.9, # 6. Left shoulder + 7.9, # 7. Right shoulder + 7.2, # 8. Left elbow + 7.2, # 9. Right elbow + 6.2, # 10. Left wrist + 6.2, # 11. Right wrist + 10.7, # 12. Left hip + 10.7, # 13. Right hip + 8.7, # 14. Left knee + 8.7, # 15. Right knee + 8.9, # 16. Left ankle + 8.9, # 17. Right ankle + 7.9, # 18. Thorax (MPII) + 10.7, # 19. Pelvis (MPII) + 7.9, # 20. Neck (MPII) + 3.5, # 21. Head top (MPII) + 7.9, # 22. Neck (AIC) + 3.5, # 23. Head top (AIC) + ] + ) + / 100 +) + +# The sigmas for 47 keypoints - non-merged combination of COCO+AIC+MPII. +sigmas_47 = ( + np.array( + [ + # ----- COCO keypoints (17) ----- + 2.6, # 1. Nose + 2.5, # 2. Left eye + 2.5, # 3. Right eye + 3.5, # 4. Left ear + 3.5, # 5. Right ear + 7.9, # 6. Left shoulder + 7.9, # 7. Right shoulder + 7.2, # 8. Left elbow + 7.2, # 9. Right elbow + 6.2, # 10. Left wrist + 6.2, # 11. Right wrist + 10.7, # 12. Left hip + 10.7, # 13. 
Right hip
+            8.7,  # 14. Left knee
+            8.7,  # 15. Right knee
+            8.9,  # 16. Left ankle
+            8.9,  # 17. Right ankle
+            # ----- MPII keypoints (16) -----
+            8.9,  # 18. Right ankle
+            8.7,  # 19. Right knee
+            10.7,  # 20. Right hip
+            10.7,  # 21. Left hip
+            8.7,  # 22. Left knee
+            8.9,  # 23. Left ankle
+            10.7,  # 24. Pelvis
+            7.9,  # 25. Thorax
+            7.9,  # 26. Neck
+            3.5,  # 27. Head top
+            6.2,  # 28. Right wrist
+            7.2,  # 29. Right elbow
+            7.9,  # 30. Right shoulder
+            7.9,  # 31. Left shoulder
+            7.2,  # 32. Left elbow
+            6.2,  # 33. Left wrist
+            # ----- AIC keypoints (14) -----
+            7.9,  # 34. Right shoulder
+            7.2,  # 35. Right elbow
+            6.2,  # 36. Right wrist
+            7.9,  # 37. Left shoulder
+            7.2,  # 38. Left elbow
+            6.2,  # 39. Left wrist
+            10.7,  # 40. Right hip
+            8.7,  # 41. Right knee
+            8.9,  # 42. Right ankle
+            10.7,  # 43. Left hip
+            8.7,  # 44. Left knee
+            8.9,  # 45. Left ankle
+            3.5,  # 46. Head top
+            7.9,  # 47. Neck
+        ]
+    )
+    / 100
+)
+
+
+@MODELS.register_module()
+class MultiHead(BaseHead):
+    """Multi-variate head predicting all information about keypoints. Apart
+    from the heatmap, it also predicts:
+      1) Heatmap for each keypoint
+      2) Probability of keypoint being in the heatmap
+      3) Visibility of each keypoint
+      4) Predicted OKS per keypoint
+      5) Predicted Euclidean error per keypoint
+    The heatmap predicting part is the same as HeatmapHead introduced
+    in `Simple Baselines`_ by Xiao et al (2018).
+
+    Args:
+        in_channels (int | Sequence[int]): Number of channels in the input
+            feature map
+        out_channels (int): Number of channels in the output heatmap
+        deconv_out_channels (Sequence[int], optional): The output channel
+            number of each deconv layer. Defaults to ``(256, 256, 256)``
+        deconv_kernel_sizes (Sequence[int | tuple], optional): The kernel size
+            of each deconv layer. Each element should be either an integer for
+            both height and width dimensions, or a tuple of two integers for
+            the height and the width dimension respectively. Defaults to
+            ``(4, 4, 4)``
+        conv_out_channels (Sequence[int], optional): The output channel number
+            of each intermediate conv layer. ``None`` means no intermediate
+            conv layer between deconv layers and the final conv layer.
+            Defaults to ``None``
+        conv_kernel_sizes (Sequence[int | tuple], optional): The kernel size
+            of each intermediate conv layer. Defaults to ``None``
+        final_layer_dict (dict): Arguments of the final Conv2d layer.
+            Defaults to ``dict(kernel_size=1)``
+        keypoint_loss (Config): Config of the keypoint loss. Defaults to use
+            :class:`KeypointMSELoss`
+        probability_loss (Config): Config of the probability loss. Defaults to use
+            :class:`BCELoss`
+        visibility_loss (Config): Config of the visibility loss. Defaults to use
+            :class:`BCELoss`
+        oks_loss (Config): Config of the OKS loss. Defaults to use
+            :class:`MSELoss`
+        error_loss (Config): Config of the error loss. Defaults to use
+            :class:`L1LogLoss`
+        normalize (float, optional): If not ``None``, normalize the flattened
+            heatmaps with Sparsemax and scale them by this value. Defaults to
+            ``None``
+        detach_probability (bool): Whether to detach the probability
+            from gradient computation. Defaults to ``True``
+        detach_visibility (bool): Whether to detach the visibility
+            from gradient computation. Defaults to ``True``
+        learn_heatmaps_from_zeros (bool): Whether to learn the
+            heatmaps from zeros. Defaults to ``False``
+        freeze_heatmaps (bool): Whether to freeze the heatmaps prediction.
+            Defaults to ``False``
+        freeze_probability (bool): Whether to freeze the probability prediction.
+            Defaults to ``False``
+        freeze_visibility (bool): Whether to freeze the visibility prediction.
+            Defaults to ``False``
+        freeze_oks (bool): Whether to freeze the oks prediction.
+            Defaults to ``False``
+        freeze_error (bool): Whether to freeze the error prediction.
+            Defaults to ``False``
+        decoder (Config, optional): The decoder config that controls decoding
+            keypoint coordinates from the network output. Defaults to a
+            ``UDPHeatmap`` codec
+        init_cfg (Config, optional): Config to control the initialization. See
+            :attr:`default_init_cfg` for default settings
+
+
+    .. _`Simple Baselines`: https://arxiv.org/abs/1804.06208
+    """
+
+    _version = 2
+
+    def __init__(
+        self,
+        in_channels: Union[int, Sequence[int]],
+        out_channels: int,
+        deconv_out_channels: OptIntSeq = (256, 256, 256),
+        deconv_kernel_sizes: OptIntSeq = (4, 4, 4),
+        conv_out_channels: OptIntSeq = None,
+        conv_kernel_sizes: OptIntSeq = None,
+        final_layer_dict: dict = dict(kernel_size=1),
+        keypoint_loss: ConfigType = dict(type="KeypointMSELoss", use_target_weight=True),
+        probability_loss: ConfigType = dict(type="BCELoss", use_target_weight=True),
+        visibility_loss: ConfigType = dict(type="BCELoss", use_target_weight=True),
+        oks_loss: ConfigType = dict(type="MSELoss", use_target_weight=True),
+        error_loss: ConfigType = dict(type="L1LogLoss", use_target_weight=True),
+        normalize: float = None,
+        detach_probability: bool = True,
+        detach_visibility: bool = True,
+        learn_heatmaps_from_zeros: bool = False,
+        freeze_heatmaps: bool = False,
+        freeze_probability: bool = False,
+        freeze_visibility: bool = False,
+        freeze_oks: bool = False,
+        freeze_error: bool = False,
+        decoder: OptConfigType = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2),
+        init_cfg: OptConfigType = None,
+    ):
+
+        if init_cfg is None:
+            init_cfg = self.default_init_cfg
+
+        super().__init__(init_cfg)
+
+        self.in_channels = in_channels
+        self.out_channels = out_channels
+        self.keypoint_loss_module = MODELS.build(keypoint_loss)
+        self.probability_loss_module = MODELS.build(probability_loss)
+        self.visibility_loss_module = MODELS.build(visibility_loss)
+        self.oks_loss_module = MODELS.build(oks_loss)
+        self.error_loss_module = MODELS.build(error_loss)
+
+        self.gauss_sigma = 2.0
+        self.gauss_kernel_size = int(2.0 * 3.0 * self.gauss_sigma + 1.0)
+        ts = torch.linspace(-self.gauss_kernel_size // 2, self.gauss_kernel_size // 2, self.gauss_kernel_size)
+        gauss = torch.exp(-((ts / self.gauss_sigma) ** 2) / 2)
+        gauss = gauss / gauss.sum()
+        self.gauss_kernel = gauss.unsqueeze(0) * gauss.unsqueeze(1)
+
+        self.decoder = KEYPOINT_CODECS.build(decoder)
+        if "oks" in decoder["type"].lower():
+            self.fast_decoder = KEYPOINT_CODECS.build(dict(type="OKSArgMaxHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=-1))
+        else:
+            self.fast_decoder = KEYPOINT_CODECS.build(decoder)
+        # NOTE: the unconditional rebuild below overrides the branch above,
+        # so the fast decoder is currently always OKSArgMaxHeatmap.
+        self.fast_decoder = KEYPOINT_CODECS.build(dict(type="OKSArgMaxHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=-1))
+        self.nonlinearity = nn.ReLU(inplace=True)
+        self.learn_heatmaps_from_zeros = learn_heatmaps_from_zeros
+
+        self.num_iters = 0
+        unique_hash = np.random.randint(0, 100000)
+        self.interval = 50
+
+        self._build_heatmap_head(
+            in_channels=in_channels,
+            out_channels=out_channels,
+            deconv_out_channels=deconv_out_channels,
+            deconv_kernel_sizes=deconv_kernel_sizes,
+            conv_out_channels=conv_out_channels,
+            conv_kernel_sizes=conv_kernel_sizes,
+            final_layer_dict=final_layer_dict,
+            normalize=normalize,
+            freeze=freeze_heatmaps,
+        )
+
+        self.normalize =
normalize + + self.detach_probability = detach_probability + self._build_probability_head(in_channels=in_channels, out_channels=out_channels, freeze=freeze_probability) + + self.detach_visibility = detach_visibility + self._build_visibility_head( + in_channels=in_channels, + # in_channels=self.decoder.heatmap_size[0] * self.decoder.heatmap_size[1], + out_channels=out_channels, + freeze=freeze_visibility, + ) + + self._build_oks_head(in_channels=in_channels, out_channels=out_channels, freeze=freeze_oks) + self.freeze_oks = freeze_oks + + self._build_error_head(in_channels=in_channels, out_channels=out_channels, freeze=freeze_error) + self.freeze_error = freeze_error + + # Register the hook to automatically convert old version state dicts + self._register_load_state_dict_pre_hook(self._load_state_dict_pre_hook) + + def _build_heatmap_head( + self, + in_channels: int, + out_channels: int, + deconv_out_channels: Sequence[int], + deconv_kernel_sizes: Sequence[int], + conv_out_channels: Sequence[int], + conv_kernel_sizes: Sequence[int], + final_layer_dict: dict, + normalize: float = None, + freeze: bool = False, + ) -> nn.Module: + """Build the heatmap head module.""" + if deconv_out_channels: + if deconv_kernel_sizes is None or len(deconv_out_channels) != len(deconv_kernel_sizes): + raise ValueError( + '"deconv_out_channels" and "deconv_kernel_sizes" should ' + "be integer sequences with the same length. Got " + f"mismatched lengths {deconv_out_channels} and " + f"{deconv_kernel_sizes}" + ) + + self.deconv_layers = self._make_deconv_layers( + in_channels=in_channels, + layer_out_channels=deconv_out_channels, + layer_kernel_sizes=deconv_kernel_sizes, + ) + in_channels = deconv_out_channels[-1] + else: + self.deconv_layers = nn.Identity() + + if conv_out_channels: + if conv_kernel_sizes is None or len(conv_out_channels) != len(conv_kernel_sizes): + raise ValueError( + '"conv_out_channels" and "conv_kernel_sizes" should ' + "be integer sequences with the same length. 
Got " + f"mismatched lengths {conv_out_channels} and " + f"{conv_kernel_sizes}" + ) + + self.conv_layers = self._make_conv_layers( + in_channels=in_channels, layer_out_channels=conv_out_channels, layer_kernel_sizes=conv_kernel_sizes + ) + in_channels = conv_out_channels[-1] + else: + self.conv_layers = nn.Identity() + + if final_layer_dict is not None: + cfg = dict(type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=1) + cfg.update(final_layer_dict) + self.final_layer = build_conv_layer(cfg) + else: + self.final_layer = nn.Identity() + # self.normalize_layer = lambda x: x / x.sum(dim=-1, keepdim=True) if normalize else nn.Identity() + # self.normalize_layer = nn.Softmax(dim=-1) if normalize else nn.Identity() + self.normalize_layer = nn.Identity() if normalize is None else Sparsemax(dim=-1) + + if freeze: + for param in self.deconv_layers.parameters(): + param.requires_grad = False + for param in self.conv_layers.parameters(): + param.requires_grad = False + for param in self.final_layer.parameters(): + param.requires_grad = False + + def _build_probability_head(self, in_channels: int, out_channels: int, freeze: bool = False) -> nn.Module: + """Build the probability head module.""" + ppb_layers = [] + kernel_sizes = [(4, 3), (2, 2), (2, 2)] + for i in range(len(kernel_sizes)): + ppb_layers.append( + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=in_channels, kernel_size=3, stride=1, padding=1) + ) + ppb_layers.append(nn.BatchNorm2d(num_features=in_channels)) + ppb_layers.append(nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) + ppb_layers.append(self.nonlinearity) + ppb_layers.append( + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0) + ) + ppb_layers.append(nn.Sigmoid()) + self.probability_layers = nn.Sequential(*ppb_layers) + + if freeze: + for param in self.probability_layers.parameters(): + param.requires_grad = False + + def _build_visibility_head(self, in_channels: int, out_channels: int, freeze: bool = False) -> nn.Module: + """Build the visibility head module.""" + vis_layers = [] + kernel_sizes = [(4, 3), (2, 2), (2, 2)] + for i in range(len(kernel_sizes)): + vis_layers.append( + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=in_channels, kernel_size=3, stride=1, padding=1) + ) + vis_layers.append(nn.BatchNorm2d(num_features=in_channels)) + vis_layers.append(nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) + vis_layers.append(self.nonlinearity) + vis_layers.append( + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0) + ) + vis_layers.append(nn.Sigmoid()) + self.visibility_layers = nn.Sequential(*vis_layers) + # self.visibility_layers = nn.Sequential( + # nn.Conv1d(in_channels=in_channels, out_channels=64, kernel_size=1, padding=0), + # nn.ReLU(), + # nn.Conv1d(in_channels=64, out_channels=32, kernel_size=1, padding=0), + # nn.ReLU(), + # nn.Conv1d(in_channels=32, out_channels=1, kernel_size=1, padding=0), + # nn.Sigmoid(), + # ) + + if freeze: + for param in self.visibility_layers.parameters(): + param.requires_grad = False + + def _build_oks_head(self, in_channels: int, out_channels: int, freeze: bool = False) -> nn.Module: + """Build the oks head module.""" + oks_layers = [] + kernel_sizes = [(4, 3), (2, 2), (2, 2)] + for i in range(len(kernel_sizes)): + oks_layers.append( + 
build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=in_channels, kernel_size=3, stride=1, padding=1)
+ )
+ oks_layers.append(nn.BatchNorm2d(num_features=in_channels))
+ oks_layers.append(nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0))
+ oks_layers.append(self.nonlinearity)
+ oks_layers.append(
+ build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0)
+ )
+ oks_layers.append(nn.Sigmoid())
+ self.oks_layers = nn.Sequential(*oks_layers)
+
+ if freeze:
+ for param in self.oks_layers.parameters():
+ param.requires_grad = False
+
+ def _build_error_head(self, in_channels: int, out_channels: int, freeze: bool = False) -> nn.Module:
+ """Build the error head module."""
+ error_layers = []
+ kernel_sizes = [(4, 3), (2, 2), (2, 2)]
+ for i in range(len(kernel_sizes)):
+ error_layers.append(
+ build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=in_channels, kernel_size=3, stride=1, padding=1)
+ )
+ error_layers.append(nn.BatchNorm2d(num_features=in_channels))
+ error_layers.append(nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0))
+ error_layers.append(self.nonlinearity)
+ error_layers.append(
+ build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0)
+ )
+ error_layers.append(self.nonlinearity)
+ self.error_layers = nn.Sequential(*error_layers)
+
+ if freeze:
+ for param in self.error_layers.parameters():
+ param.requires_grad = False
+
+ def _make_conv_layers(self, in_channels: int, layer_out_channels: Sequence[int], layer_kernel_sizes: Sequence[int]) -> nn.Module:
+ """Create convolutional layers by given parameters."""
+
+ layers = []
+ for out_channels, kernel_size in zip(layer_out_channels, layer_kernel_sizes):
+ padding = (kernel_size - 1) // 2
+ cfg = dict(
+ type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=1, padding=padding
+ )
+ layers.append(build_conv_layer(cfg))
+ layers.append(nn.BatchNorm2d(num_features=out_channels))
+ layers.append(self.nonlinearity)
+ in_channels = out_channels
+
+ return nn.Sequential(*layers)
+
+ def _make_deconv_layers(self, in_channels: int, layer_out_channels: Sequence[int], layer_kernel_sizes: Sequence[int]) -> nn.Module:
+ """Create deconvolutional layers by given parameters."""
+
+ layers = []
+ for out_channels, kernel_size in zip(layer_out_channels, layer_kernel_sizes):
+ if kernel_size == 4:
+ padding = 1
+ output_padding = 0
+ elif kernel_size == 3:
+ padding = 1
+ output_padding = 1
+ elif kernel_size == 2:
+ padding = 0
+ output_padding = 0
+ else:
+ raise ValueError(f"Unsupported kernel size {kernel_size} for " f"deconvolutional layers in {self.__class__.__name__}")
+ cfg = dict(
+ type="deconv",
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=2,
+ padding=padding,
+ output_padding=output_padding,
+ bias=False,
+ )
+ layers.append(build_upsample_layer(cfg))
+ layers.append(nn.BatchNorm2d(num_features=out_channels))
+ layers.append(self.nonlinearity)
+ in_channels = out_channels
+
+ return nn.Sequential(*layers)
+
+ def _error_from_heatmaps(self, gt_heatmaps: Tensor, dt_heatmaps: Tensor) -> Tensor:
+ """Calculate the error from heatmaps.
+
+ Args:
+ gt_heatmaps (Tensor): The ground-truth heatmaps.
+ dt_heatmaps (Tensor): The predicted heatmaps.
+
+ Returns:
+ Tensor: The predicted error.
+ """ + # Transform to numpy + gt_heatmaps = to_numpy(gt_heatmaps) + dt_heatmaps = to_numpy(dt_heatmaps) + + # Get locations from heatmaps + B, C, H, W = gt_heatmaps.shape + gt_coords = np.zeros((B, C, 2)) + dt_coords = np.zeros((B, C, 2)) + for i, (gt_htm, dt_htm) in enumerate(zip(gt_heatmaps, dt_heatmaps)): + coords, score = self.fast_decoder.decode(gt_htm) + coords = coords.squeeze() + gt_coords[i, :, :] = coords + + coords, score = self.fast_decoder.decode(dt_htm) + coords = coords.squeeze() + dt_coords[i, :, :] = coords + + # NaN coordinates mean empty heatmaps -> set them to -1 + # as the error will be ignored by weight + gt_coords[np.isnan(gt_coords)] = -1 + + # Calculate the error + target_errors = np.linalg.norm(gt_coords - dt_coords, axis=2) + assert (target_errors >= 0).all(), "Euclidean distance cannot be negative" + + return target_errors + + def _oks_from_heatmaps(self, gt_heatmaps: Tensor, dt_heatmaps: Tensor, weight: Tensor) -> Tensor: + """Calculate the OKS from heatmaps. + + Args: + heatmaps (Tensor): The predicted heatmaps. + + Returns: + Tensor: The predicted OKS. + """ + C = dt_heatmaps.shape[1] + + # Transform to numpy + gt_heatmaps = to_numpy(gt_heatmaps) + dt_heatmaps = to_numpy(dt_heatmaps) + B, C, H, W = gt_heatmaps.shape + weight = to_numpy(weight).squeeze().reshape((B, C, 1)) + + # Get locations from heatmaps + gt_coords = np.zeros((B, C, 2)) + dt_coords = np.zeros((B, C, 2)) + for i, (gt_htm, dt_htm) in enumerate(zip(gt_heatmaps, dt_heatmaps)): + coords, score = self.fast_decoder.decode(gt_htm) + coords = coords.squeeze() + gt_coords[i, :, :] = coords + + coords, score = self.fast_decoder.decode(dt_htm) + coords = coords.squeeze() + dt_coords[i, :, :] = coords + + # NaN coordinates mean empty heatmaps -> set them to 0 + gt_coords[np.isnan(gt_coords)] = 0 + + # Add probability as visibility + gt_coords = gt_coords * weight + dt_coords = dt_coords * weight + gt_coords = np.concatenate((gt_coords, weight * 2), axis=2) + dt_coords = np.concatenate((dt_coords, weight * 2), axis=2) + + # Calculate the oks + target_oks = [] + oks_weights = [] + for i in range(len(gt_coords)): + gt_kpts = gt_coords[i] + dt_kpts = dt_coords[i] + valid_gt_kpts = gt_kpts[:, 2] > 0 + if not valid_gt_kpts.any(): + # Changed for per-keypoint OKS + target_oks.append(np.zeros(C)) + oks_weights.append(0) + continue + + gt_bbox = np.array( + [ + 0, + 0, + 64, + 48, + ] + ) + gt = { + "keypoints": gt_kpts, + "bbox": gt_bbox, + "area": gt_bbox[2] * gt_bbox[3], + } + dt = { + "keypoints": dt_kpts, + "bbox": gt_bbox, + "area": gt_bbox[2] * gt_bbox[3], + } + # Changed for per-keypoint OKS + oks = compute_oks(gt, dt, use_area=False, per_kpt=True) + target_oks.append(oks) + oks_weights.append(1) + + target_oks = np.array(target_oks) + target_oks = torch.from_numpy(target_oks).float() + + oks_weights = np.array(oks_weights) + oks_weights = torch.from_numpy(oks_weights).float() + + return target_oks, oks_weights + + @property + def default_init_cfg(self): + init_cfg = [dict(type="Normal", layer=["Conv2d", "ConvTranspose2d"], std=0.001), dict(type="Constant", layer="BatchNorm2d", val=1)] + return init_cfg + + def forward(self, feats: Tuple[Tensor]) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]: + """Forward the network. The input is multi scale feature maps and the + output is (1) the heatmap, (2) probability, (3) visibility, (4) oks and (5) error. + + Args: + feats (Tensor): Multi scale feature maps. + + Returns: + Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]: outputs. 
+ """ + x = feats[-1] + + heatmaps = self.forward_heatmap(x) + probabilities = self.forward_probability(x) + visibilities = self.forward_visibility(x) + # visibilities = self.forward_visibility(heatmaps) + oks = self.forward_oks(x) + errors = self.forward_error(x) + + return heatmaps, probabilities, visibilities, oks, errors + + def forward_heatmap(self, x: Tensor) -> Tensor: + """Forward the network. The input is multi scale feature maps and the + output is the heatmap. + + Args: + x (Tensor): Multi scale feature maps. + + Returns: + Tensor: output heatmap. + """ + x = self.deconv_layers(x) + x = self.conv_layers(x) + x = self.final_layer(x) + B, C, H, W = x.shape + x = x.reshape((B, C, H * W)) + x = self.normalize_layer(x / 0.5) + if self.normalize is not None: + x = x * self.normalize + x = torch.clamp(x, 0, 1) + x = x.reshape((B, C, H, W)) + + # # Blur the heatmaps with Gaussian + # x = x.reshape((B*C, 1, H, W)) + # x = nn.functional.conv2d(x, self.gauss_kernel[None, None, :, :].to(x.device), padding='same') + # x = x.reshape((B, C, H, W)) + + return x + + def forward_probability(self, x: Tensor) -> Tensor: + """Forward the network. The input is multi scale feature maps and the + output is the probability. + + Args: + x (Tensor): Multi scale feature maps. + detach (bool): Whether to detach the probability from gradient + + Returns: + Tensor: output probability. + """ + if self.detach_probability: + x = x.detach() + x = self.probability_layers(x) + return x + + def forward_visibility(self, x: Tensor) -> Tensor: + """Forward the network. The input is multi scale feature maps and the + output is the visibility. + + Args: + x (Tensor): Multi scale feature maps. + detach (bool): Whether to detach the visibility from gradient + + Returns: + Tensor: output visibility. + """ + if self.detach_visibility: + x = x.detach() + # # Reshape from (B, C, H, W) to (B, H*W, C) + # B, C, H, W = x.shape + # x = x.reshape((B, C, -1)) + # x = x.permute(0, 2, 1) + x = self.visibility_layers(x) + # x = x.view((B, -1, 1, 1)) + return x + + def forward_oks(self, x: Tensor) -> Tensor: + """Forward the network. The input is multi scale feature maps and the + output is the oks. + + Args: + x (Tensor): Multi scale feature maps. + + Returns: + Tensor: output oks. + """ + x = x.detach() + x = self.oks_layers(x) + return x + + def forward_error(self, x: Tensor) -> Tensor: + """Forward the network. The input is multi scale feature maps and the + output is the euclidean error. + + Args: + x (Tensor): Multi scale feature maps. + + Returns: + Tensor: output error. + """ + x = x.detach() + x = self.error_layers(x) + return x + + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: + """Predict results from features. + + Args: + feats (Tuple[Tensor] | List[Tuple[Tensor]]): The multi-stage + features (or multiple multi-stage features in TTA) + batch_data_samples (List[:obj:`PoseDataSample`]): The batch + data samples + test_cfg (dict): The runtime config for testing process. Defaults + to {} + + Returns: + Union[InstanceList | Tuple[InstanceList | PixelDataList]]: If + ``test_cfg['output_heatmap']==True``, return both pose and heatmap + prediction; otherwise only return the pose prediction. 
+ + The pose prediction is a list of ``InstanceData``, each contains + the following fields: + + - keypoints (np.ndarray): predicted keypoint coordinates in + shape (num_instances, K, D) where K is the keypoint number + and D is the keypoint dimension + - keypoint_scores (np.ndarray): predicted keypoint scores in + shape (num_instances, K) + + The heatmap prediction is a list of ``PixelData``, each contains + the following fields: + + - heatmaps (Tensor): The predicted heatmaps in shape (K, h, w) + """ + + if test_cfg.get("flip_test", False): + # TTA: flip test -> feats = [orig, flipped] + assert isinstance(feats, list) and len(feats) == 2 + flip_indices = batch_data_samples[0].metainfo["flip_indices"] + _feats, _feats_flip = feats + + _htm, _prob, _vis, _oks, _err = self.forward(_feats) + _htm_flip, _prob_flip, _vis_flip, _oks_flip, _err_flip = self.forward(_feats_flip) + B, C, H, W = _htm.shape + + # Flip back the keypoints + _htm_flip = flip_heatmaps( + _htm_flip, + flip_mode=test_cfg.get("flip_mode", "heatmap"), + flip_indices=flip_indices, + shift_heatmap=test_cfg.get("shift_heatmap", False), + ) + heatmaps = (_htm + _htm_flip) * 0.5 + + # Flip back scalars + _prob_flip = _prob_flip[:, flip_indices] + _vis_flip = _vis_flip[:, flip_indices] + _oks_flip = _oks_flip[:, flip_indices] + _err_flip = _err_flip[:, flip_indices] + + probabilities = (_prob + _prob_flip) * 0.5 + visibilities = (_vis + _vis_flip) * 0.5 + oks = (_oks + _oks_flip) * 0.5 + errors = (_err + _err_flip) * 0.5 + else: + heatmaps, probabilities, visibilities, oks, errors = self.forward(feats) + B, C, H, W = heatmaps.shape + + preds = self.decode(heatmaps) + probabilities = to_numpy(probabilities).reshape((B, 1, C)) + visibilities = to_numpy(visibilities).reshape((B, 1, C)) + oks = to_numpy(oks).reshape((B, 1, C)) + errors = to_numpy(errors).reshape((B, 1, C)) + + # Normalize errors by dividing with the diagonal of the heatmap + htm_diagonal = np.sqrt(H**2 + W**2) + errors = errors / htm_diagonal + + for pi, p in enumerate(preds): + p.set_field(p["keypoint_scores"], "keypoints_conf") + p.set_field(probabilities[pi], "keypoints_probs") + p.set_field(visibilities[pi], "keypoints_visible") + p.set_field(oks[pi], "keypoints_oks") + p.set_field(errors[pi], "keypoints_error") + + # Replace the keypoint scores with OKS/errors + if not self.freeze_oks: + p.set_field(oks[pi], "keypoint_scores") + # print("Setting OKS as keypoint scores") + # p.set_field(1-errors[pi], "keypoint_scores") + + # hm = heatmaps.detach().cpu().numpy() + # print("Heatmaps:", hm.shape, hm.min(), hm.max()) + + if test_cfg.get("output_heatmaps", False): + pred_fields = [PixelData(heatmaps=hm) for hm in heatmaps.detach()] + return preds, pred_fields + else: + return preds + + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: + """Calculate losses from a batch of inputs and data samples. + + Args: + feats (Tuple[Tensor]): The multi-stage features + batch_data_samples (List[:obj:`PoseDataSample`]): The batch + data samples + train_cfg (dict): The runtime config for training process. + Defaults to {} + + Returns: + dict: A dictionary of losses. 
+ """ + dt_heatmaps, dt_probs, dt_vis, dt_oks, dt_errs = self.forward(feats) + device = dt_heatmaps.device + B, C, H, W = dt_heatmaps.shape + + # Extract GT data + gt_heatmaps = torch.stack([d.gt_fields.heatmaps for d in batch_data_samples]) + gt_probs = np.stack([d.gt_instances.in_image.astype(int) for d in batch_data_samples]) + gt_annotated = np.stack([d.gt_instances.keypoints_visible.astype(int) for d in batch_data_samples]) + gt_vis = np.stack([d.gt_instances.keypoints_visibility.astype(int) for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) + + # Compute GT errors and OKS + if self.freeze_error: + gt_errs = torch.zeros((B, C, 1), device=device, dtype=dt_errs.dtype) + else: + gt_errs = self._error_from_heatmaps(gt_heatmaps, dt_heatmaps) + if self.freeze_oks: + gt_oks = torch.zeros((B, C, 1), device=device, dtype=dt_oks.dtype) + oks_weight = torch.zeros((B, C, 1), device=device, dtype=dt_oks.dtype) + else: + gt_oks, oks_weight = self._oks_from_heatmaps( + gt_heatmaps, + dt_heatmaps, + gt_probs & gt_annotated, + ) + + # Convert everything to tensors + gt_probs = torch.tensor(gt_probs, device=device, dtype=dt_probs.dtype) + gt_vis = torch.tensor(gt_vis, device=device, dtype=dt_vis.dtype) + gt_annotated = torch.tensor(gt_annotated, device=device) + + gt_oks = gt_oks.to(device).to(dt_oks.dtype) + oks_weight = oks_weight.to(device).to(dt_oks.dtype) + gt_errs = gt_errs.to(device).to(dt_errs.dtype) + + # Reshape everything to comparable shapes + gt_heatmaps = gt_heatmaps.view((B, C, H, W)) + dt_heatmaps = dt_heatmaps.view((B, C, H, W)) + gt_probs = gt_probs.view((B, C)) + dt_probs = dt_probs.view((B, C)) + gt_vis = gt_vis.view((B, C)) + dt_vis = dt_vis.view((B, C)) + gt_oks = gt_oks.view((B, C)) + dt_oks = dt_oks.view((B, C)) + gt_errs = gt_errs.view((B, C)) + dt_errs = dt_errs.view((B, C)) + keypoint_weights = keypoint_weights.view((B, C)) + gt_annotated = gt_annotated.view((B, C)) + # oks_weight = oks_weight.view((B, C)) + + annotated_in = gt_annotated & (gt_probs > 0.5) + + # calculate losses + losses = dict() + if self.learn_heatmaps_from_zeros: + heatmap_weights = gt_annotated + else: + heatmap_weights = keypoint_weights + # heatmap_weights = annotated_in + + heatmap_loss_pxl = self.keypoint_loss_module(dt_heatmaps, gt_heatmaps, heatmap_weights, per_pixel=True) + heatmap_loss = heatmap_loss_pxl.mean() + probability_loss = self.probability_loss_module(dt_probs, gt_probs, gt_annotated) + + # Weight the annotated keypoints such that sum of weights of invisible keypoints is the same as visible ones + invisible_in = (gt_vis == 0) & (gt_annotated > 0.5) + visible_in = (gt_vis > 0) & (gt_annotated > 0.5) + weighted_annotated_in = annotated_in.clone().to(float) + weighted_annotated_in[invisible_in] = (1 / (invisible_in.sum() + 1e-10)).to(weighted_annotated_in.dtype) + weighted_annotated_in[visible_in] = (1 / (visible_in.sum() + 1e-10)).to(weighted_annotated_in.dtype) + weighted_annotated_in = weighted_annotated_in / weighted_annotated_in[weighted_annotated_in > 0].min() + weighted_annotated_in = weighted_annotated_in.to(dt_vis.dtype) + + visibility_loss = self.visibility_loss_module(dt_vis, gt_vis, weighted_annotated_in) + oks_loss = self.oks_loss_module(dt_oks, gt_oks, annotated_in) + error_loss = self.error_loss_module(dt_errs, gt_errs, annotated_in) + + losses.update( + loss_kpt=heatmap_loss, + loss_probability=probability_loss, + loss_visibility=visibility_loss, + loss_oks=oks_loss, + loss_error=error_loss, + ) + 
+ # calculate accuracy + if train_cfg.get("compute_acc", True): + acc_pose = self.get_pose_accuracy(dt_heatmaps, gt_heatmaps, keypoint_weights > 0.5) + losses.update(acc_pose=acc_pose) + + # Calculate the best binary accuracy for probability + acc_prob, _ = self.get_binary_accuracy( + dt_probs, + gt_probs, + gt_annotated > 0.5, + force_balanced=True, + ) + losses.update(acc_prob=acc_prob) + + # Calculate the best binary accuracy for visibility + acc_vis, _ = self.get_binary_accuracy( + dt_vis, + gt_vis, + annotated_in > 0.5, + force_balanced=True, + ) + losses.update(acc_vis=acc_vis) + + # Calculate the MAE for OKS + acc_oks = self.get_mae( + dt_oks, + gt_oks, + annotated_in > 0.5, + ) + losses.update(mae_oks=acc_oks) + + # Calculate the MAE for euclidean error + acc_err = self.get_mae( + dt_errs, + gt_errs, + annotated_in > 0.5, + ) + losses.update(mae_err=acc_err) + + # Calculate the MAE between Euclidean error and OKS + # err_to_oks_mae = self.get_mae( + # self.error_to_OKS(dt_errs, area=H*W), + # gt_oks, + # annotated_in > 0.5, + # ) + # losses.update(mae_err_to_oks=err_to_oks_mae) + + return losses + + def _visualize_heatmaps(self, htm, tgt, lss, weight, prob): + tgt_range = (tgt.min(), tgt.max()) + htm_range = (htm.min(), htm.max()) + lss_range = (lss.min(), lss.max()) + + tgt[tgt < 0] = 0 + htm[htm < 0] = 0 + lss[lss < 0] = 0 + + # Normalize heatmaps between 0 and 1 + tgt /= tgt.max() + 1e-10 + htm /= htm.max() + 1e-10 + lss /= lss.max() + 1e-10 + + scale = 6 + + htm_color = cv2.cvtColor((htm * 255).astype(np.uint8), cv2.COLOR_GRAY2BGR) + htm_color = cv2.applyColorMap(htm_color, cv2.COLORMAP_JET) + htm_color = cv2.resize(htm_color, (htm.shape[1] * scale, htm.shape[0] * scale), interpolation=cv2.INTER_NEAREST) + + tgt_color = cv2.cvtColor((tgt * 255).astype(np.uint8), cv2.COLOR_GRAY2BGR) + tgt_color = cv2.applyColorMap(tgt_color, cv2.COLORMAP_JET) + tgt_color = cv2.resize(tgt_color, (htm.shape[1] * scale, htm.shape[0] * scale), interpolation=cv2.INTER_NEAREST) + + lss_color = cv2.cvtColor((lss * 255).astype(np.uint8), cv2.COLOR_GRAY2BGR) + lss_color = cv2.applyColorMap(lss_color, cv2.COLORMAP_JET) + lss_color = cv2.resize(lss_color, (htm.shape[1] * scale, htm.shape[0] * scale), interpolation=cv2.INTER_NEAREST) + + if scale > 2: + tgt_color_text = tgt_color.copy() + cv2.putText( + tgt_color_text, + "tgt ({:.1f}, {:.1f})".format(tgt_range[0] * 10, tgt_range[1] * 10), + (10, 20), + cv2.FONT_HERSHEY_SIMPLEX, + 0.5, + (255, 255, 255), + 1, + ) + tgt_color = cv2.addWeighted(tgt_color, 0.6, tgt_color_text, 0.4, 0) + + htm_color_text = htm_color.copy() + cv2.putText( + htm_color_text, + "htm ({:.1f}, {:.1f})".format(htm_range[0] * 10, htm_range[1] * 10), + (10, 20), + cv2.FONT_HERSHEY_SIMPLEX, + 0.5, + (255, 255, 255), + 1, + ) + htm_color = cv2.addWeighted(htm_color, 0.6, htm_color_text, 0.4, 0) + + lss_color_text = lss_color.copy() + cv2.putText( + lss_color_text, + "lss ({:.1f}, {:.1f})".format(lss_range[0] * 10, lss_range[1] * 10), + (10, 20), + cv2.FONT_HERSHEY_SIMPLEX, + 0.5, + (255, 255, 255), + 1, + ) + lss_color = cv2.addWeighted(lss_color, 0.6, lss_color_text, 0.4, 0) + + # Get argmax of the target and draw horizontal and vertical lines + tgt_argmax = np.unravel_index(tgt.argmax(), tgt.shape) + tgt_color_line = tgt_color.copy() + cv2.line(tgt_color_line, (0, tgt_argmax[0] * scale), (tgt_color.shape[1], tgt_argmax[0] * scale), (0, 255, 255), 1) + cv2.line(tgt_color_line, (tgt_argmax[1] * scale, 0), (tgt_argmax[1] * scale, tgt_color.shape[0]), (0, 255, 255), 1) + tgt_color = 
cv2.addWeighted(tgt_color, 0.6, tgt_color_line, 0.4, 0)
+ htm_color_line = htm_color.copy()
+ cv2.line(htm_color_line, (0, tgt_argmax[0] * scale), (tgt_color.shape[1], tgt_argmax[0] * scale), (0, 255, 255), 1)
+ cv2.line(htm_color_line, (tgt_argmax[1] * scale, 0), (tgt_argmax[1] * scale, tgt_color.shape[0]), (0, 255, 255), 1)
+ htm_color = cv2.addWeighted(htm_color, 0.6, htm_color_line, 0.4, 0)
+ lss_color_line = lss_color.copy()
+ cv2.line(lss_color_line, (0, tgt_argmax[0] * scale), (tgt_color.shape[1], tgt_argmax[0] * scale), (0, 255, 255), 1)
+ cv2.line(lss_color_line, (tgt_argmax[1] * scale, 0), (tgt_argmax[1] * scale, tgt_color.shape[0]), (0, 255, 255), 1)
+ lss_color = cv2.addWeighted(lss_color, 0.6, lss_color_line, 0.4, 0)
+
+ white_column = np.ones((tgt_color.shape[0], 1, 3), dtype=np.uint8) * 255
+
+ save_img = np.concatenate(
+ (
+ tgt_color,
+ white_column,
+ htm_color,
+ white_column,
+ lss_color,
+ ),
+ axis=1,
+ )
+
+ if weight < 0.5:
+ # Draw a red X across the whole save_img
+ cv2.line(save_img, (0, 0), (save_img.shape[1], save_img.shape[0]), (0, 0, 255), 2)
+ cv2.line(save_img, (0, save_img.shape[0]), (save_img.shape[1], 0), (0, 0, 255), 2)
+ elif prob < 0.5:
+ # Draw a yellow X across the whole save_img
+ cv2.line(save_img, (0, 0), (save_img.shape[1], save_img.shape[0]), (0, 255, 255), 2)
+ cv2.line(save_img, (0, save_img.shape[0]), (save_img.shape[1], 0), (0, 255, 255), 2)
+ return save_img
+
+ def get_pose_accuracy(self, dt, gt, mask):
+ """Calculate the accuracy of predicted pose."""
+ _, avg_acc, _ = pose_pck_accuracy(
+ output=to_numpy(dt),
+ target=to_numpy(gt),
+ mask=to_numpy(mask),
+ method="argmax",
+ )
+ acc_pose = torch.tensor(avg_acc, device=gt.device)
+ return acc_pose
+
+ def get_binary_accuracy(self, dt, gt, mask, force_balanced=False):
+ """Calculate the binary accuracy."""
+ assert dt.shape == gt.shape
+ device = gt.device
+ dt = to_numpy(dt)
+ gt = to_numpy(gt)
+ mask = to_numpy(mask)
+
+ dt = dt[mask]
+ gt = gt[mask]
+ gt = gt.astype(bool)
+
+ if force_balanced:
+ # Force the number of positive and negative samples to be balanced
+ pos_num = np.sum(gt)
+ neg_num = len(gt) - pos_num
+ num = min(pos_num, neg_num)
+ if num == 0:
+ return torch.tensor([0.0], device=device), torch.tensor([0.0], device=device)
+ pos_idx = np.where(gt)[0]
+ neg_idx = np.where(~gt)[0]
+
+ # Randomly sample the same number of positive and negative samples
+ np.random.shuffle(pos_idx)
+ np.random.shuffle(neg_idx)
+ idx = np.concatenate([pos_idx[:num], neg_idx[:num]])
+ dt = dt[idx]
+ gt = gt[idx]
+
+ n_samples = len(gt)
+ thresholds = np.arange(0.1, 1.0, 0.05)
+ preds = dt[:, None] > thresholds
+ correct = preds == gt[:, None]
+ counts = correct.sum(axis=0)
+
+ # Find the threshold that maximizes the accuracy
+ best_idx = np.argmax(counts)
+ best_threshold = thresholds[best_idx]
+ best_acc = counts[best_idx] / n_samples
+
+ best_acc = torch.tensor(best_acc, device=device).float()
+ best_threshold = torch.tensor(best_threshold, device=device).float()
+ return best_acc, best_threshold
+
+ def get_mae(self, dt, gt, mask):
+ """Calculate the mean absolute error."""
+ assert dt.shape == gt.shape
+ device = gt.device
+ dt = to_numpy(dt)
+ gt = to_numpy(gt)
+ mask = to_numpy(mask)
+
+ dt = dt[mask]
+ gt = gt[mask]
+ mae = np.abs(dt - gt).mean()
+
+ mae = torch.tensor(mae, device=device)
+ return mae
+
+ def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, **kwargs):
+ """A hook function to convert old-version state dict of
+ :class:`TopdownHeatmapSimpleHead`
(before MMPose v1.0.0) to a + compatible format of :class:`HeatmapHead`. + + The hook will be automatically registered during initialization. + """ + version = local_meta.get("version", None) + if version and version >= self._version: + return + + # convert old-version state dict + keys = list(state_dict.keys()) + for _k in keys: + if not _k.startswith(prefix): + continue + v = state_dict.pop(_k) + k = _k[len(prefix) :] + # In old version, "final_layer" includes both intermediate + # conv layers (new "conv_layers") and final conv layers (new + # "final_layer"). + # + # If there is no intermediate conv layer, old "final_layer" will + # have keys like "final_layer.xxx", which should be still + # named "final_layer.xxx"; + # + # If there are intermediate conv layers, old "final_layer" will + # have keys like "final_layer.n.xxx", where the weights of the last + # one should be renamed "final_layer.xxx", and others should be + # renamed "conv_layers.n.xxx" + k_parts = k.split(".") + if k_parts[0] == "final_layer": + if len(k_parts) == 3: + assert isinstance(self.conv_layers, nn.Sequential) + idx = int(k_parts[1]) + if idx < len(self.conv_layers): + # final_layer.n.xxx -> conv_layers.n.xxx + k_new = "conv_layers." + ".".join(k_parts[1:]) + else: + # final_layer.n.xxx -> final_layer.xxx + k_new = "final_layer." + k_parts[2] + else: + # final_layer.xxx remains final_layer.xxx + k_new = k + else: + k_new = k + + state_dict[prefix + k_new] = v + + def error_to_OKS(self, error, area=1.0): + """Convert the error to OKS.""" + sigmas = np.array([0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72, 0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89]) / 10.0 + if isinstance(error, torch.Tensor): + sigmas = torch.tensor(sigmas, device=error.device) + vars = (sigmas * 2) ** 2 + norm_error = error**2 / vars / area / 2.0 + return torch.exp(-norm_error) + + +def compute_oks(gt, dt, use_area=True, per_kpt=False): + g = np.array(gt["keypoints"]).reshape(-1, 3) + + if g.shape[0] == 17: + sigmas = sigmas_17 + elif g.shape[0] == 21: + sigmas = sigmas_21 + elif g.shape[0] == 23: + sigmas = sigmas_23 + elif g.shape[0] == 47: + sigmas = sigmas_47 + else: + raise ValueError(f"Unsupported number of keypoints: {g.shape[0]}") + + vars = (sigmas * 2) ** 2 + k = len(sigmas) + + visibility_condition = lambda x: x > 0 + xg = g[:, 0] + yg = g[:, 1] + vg = g[:, 2] + k1 = np.count_nonzero(visibility_condition(vg)) + bb = gt["bbox"] + x0 = bb[0] - bb[2] + x1 = bb[0] + bb[2] * 2 + y0 = bb[1] - bb[3] + y1 = bb[1] + bb[3] * 2 + + d = np.array(dt["keypoints"]).reshape((-1, 3)) + xd = d[:, 0] + yd = d[:, 1] + + if k > d.shape[0]: + sigmas = sigmas[: d.shape[0]] + k = d.shape[0] + + if k1 > 0: + # measure the per-keypoint distance if keypoints visible + dx = xd - xg + dy = yd - yg + + else: + # measure minimum distance to keypoints in (x0,y0) & (x1,y1) + z = np.zeros((k)) + dx = np.max((z, x0 - xd), axis=0) + np.max((z, xd - x1), axis=0) + dy = np.max((z, y0 - yd), axis=0) + np.max((z, yd - y1), axis=0) + + if use_area: + e = (dx**2 + dy**2) / vars / (gt["area"] + np.spacing(1)) / 2 + else: + tmparea = gt["bbox"][3] * gt["bbox"][2] * 0.53 + e = (dx**2 + dy**2) / vars / (tmparea + np.spacing(1)) / 2 + + if per_kpt: + oks = np.exp(-e) + if k1 > 0: + oks[~visibility_condition(vg)] = 0 + + else: + if k1 > 0: + e = e[visibility_condition(vg)] + oks = np.sum(np.exp(-e)) / e.shape[0] + + return oks diff --git a/mmpose/models/heads/hybrid_heads/poseid_head.py b/mmpose/models/heads/hybrid_heads/poseid_head.py index 
218f7751b92943abd5e8d9602d437fe6621b58c6..7611ef6c42d9ecae779664cfdbb0f2efd275ef23 100644 --- a/mmpose/models/heads/hybrid_heads/poseid_head.py +++ b/mmpose/models/heads/hybrid_heads/poseid_head.py @@ -1,6 +1,7 @@ -# Copyright (c) OpenMMLab. All rights reserved. +# Copyright (c) Miroslav Purkrabek, ProbPose. All rights reserved. from typing import Optional, Sequence, Tuple, Union +import numpy as np import torch from mmcv.cnn import build_conv_layer, build_upsample_layer from mmengine.structures import PixelData @@ -10,18 +11,16 @@ from mmpose.evaluation.functional import pose_pck_accuracy from mmpose.models.utils.tta import flip_heatmaps from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, Features, OptConfigType, - OptSampleList, Predictions) -from ..base_head import BaseHead +from mmpose.utils.typing import ConfigType, Features, OptConfigType, OptSampleList, Predictions -import numpy as np +from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @MODELS.register_module() class PoseIDHead(BaseHead): - """Multi-variate head predicting all information about keypoints. Apart + """Multi-variate head predicting all information about keypoints. Apart from the heatmap, it also predicts: 1) Heatmap for each keypoint 2) Usefulness of the pose for identification @@ -66,27 +65,24 @@ class PoseIDHead(BaseHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - out_channels: int, - deconv_out_channels: OptIntSeq = (256, 256, 256), - deconv_kernel_sizes: OptIntSeq = (4, 4, 4), - conv_out_channels: OptIntSeq = None, - conv_kernel_sizes: OptIntSeq = None, - final_layer_dict: dict = dict(kernel_size=1), - keypoint_loss: ConfigType = dict( - type='KeypointMSELoss', use_target_weight=True), - usefulness_loss: ConfigType = dict( - type='MSELoss', use_target_weight=True), - usefulness_thr: float = None, - freeze_heatmaps: bool = False, - freeze_usefulness: bool = False, - detach_usefulness: bool = True, - decoder: OptConfigType = dict( - type='UDPHeatmap', input_size=(192, 256), - heatmap_size=(48, 64), sigma=2), - init_cfg: OptConfigType = None, - ): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + out_channels: int, + deconv_out_channels: OptIntSeq = (256, 256, 256), + deconv_kernel_sizes: OptIntSeq = (4, 4, 4), + conv_out_channels: OptIntSeq = None, + conv_kernel_sizes: OptIntSeq = None, + final_layer_dict: dict = dict(kernel_size=1), + keypoint_loss: ConfigType = dict(type="KeypointMSELoss", use_target_weight=True), + usefulness_loss: ConfigType = dict(type="MSELoss", use_target_weight=True), + usefulness_thr: float = None, + freeze_heatmaps: bool = False, + freeze_usefulness: bool = False, + detach_usefulness: bool = True, + decoder: OptConfigType = dict(type="UDPHeatmap", input_size=(192, 256), heatmap_size=(48, 64), sigma=2), + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -100,7 +96,7 @@ class PoseIDHead(BaseHead): self.decoder = KEYPOINT_CODECS.build(decoder) self.nonlinearity = nn.ReLU(inplace=True) - + self.usefulness_thr = usefulness_thr self.detach_usefulness = detach_usefulness @@ -113,33 +109,35 @@ class PoseIDHead(BaseHead): conv_kernel_sizes=conv_kernel_sizes, final_layer_dict=final_layer_dict, normalize=False, - freeze=freeze_heatmaps) - - self._build_usefulness_head( - in_channels=in_channels, - out_channels=out_channels, - freeze=freeze_usefulness) - + freeze=freeze_heatmaps, + ) + + 
self._build_usefulness_head(in_channels=in_channels, out_channels=out_channels, freeze=freeze_usefulness) + # Register the hook to automatically convert old version state dicts self._register_load_state_dict_pre_hook(self._load_state_dict_pre_hook) - def _build_heatmap_head(self, in_channels: int, out_channels: int, - deconv_out_channels: Sequence[int], - deconv_kernel_sizes: Sequence[int], - conv_out_channels: Sequence[int], - conv_kernel_sizes: Sequence[int], - final_layer_dict: dict, - normalize: bool = False, - freeze: bool = False) -> nn.Module: + def _build_heatmap_head( + self, + in_channels: int, + out_channels: int, + deconv_out_channels: Sequence[int], + deconv_kernel_sizes: Sequence[int], + conv_out_channels: Sequence[int], + conv_kernel_sizes: Sequence[int], + final_layer_dict: dict, + normalize: bool = False, + freeze: bool = False, + ) -> nn.Module: """Build the heatmap head module.""" if deconv_out_channels: - if deconv_kernel_sizes is None or len(deconv_out_channels) != len( - deconv_kernel_sizes): + if deconv_kernel_sizes is None or len(deconv_out_channels) != len(deconv_kernel_sizes): raise ValueError( '"deconv_out_channels" and "deconv_kernel_sizes" should ' - 'be integer sequences with the same length. Got ' - f'mismatched lengths {deconv_out_channels} and ' - f'{deconv_kernel_sizes}') + "be integer sequences with the same length. Got " + f"mismatched lengths {deconv_out_channels} and " + f"{deconv_kernel_sizes}" + ) self.deconv_layers = self._make_deconv_layers( in_channels=in_channels, @@ -151,28 +149,23 @@ class PoseIDHead(BaseHead): self.deconv_layers = nn.Identity() if conv_out_channels: - if conv_kernel_sizes is None or len(conv_out_channels) != len( - conv_kernel_sizes): + if conv_kernel_sizes is None or len(conv_out_channels) != len(conv_kernel_sizes): raise ValueError( '"conv_out_channels" and "conv_kernel_sizes" should ' - 'be integer sequences with the same length. Got ' - f'mismatched lengths {conv_out_channels} and ' - f'{conv_kernel_sizes}') + "be integer sequences with the same length. 
Got " + f"mismatched lengths {conv_out_channels} and " + f"{conv_kernel_sizes}" + ) self.conv_layers = self._make_conv_layers( - in_channels=in_channels, - layer_out_channels=conv_out_channels, - layer_kernel_sizes=conv_kernel_sizes) + in_channels=in_channels, layer_out_channels=conv_out_channels, layer_kernel_sizes=conv_kernel_sizes + ) in_channels = conv_out_channels[-1] else: self.conv_layers = nn.Identity() if final_layer_dict is not None: - cfg = dict( - type='Conv2d', - in_channels=in_channels, - out_channels=out_channels, - kernel_size=1) + cfg = dict(type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=1) cfg.update(final_layer_dict) self.final_layer = build_conv_layer(cfg) else: @@ -187,33 +180,20 @@ class PoseIDHead(BaseHead): for param in self.final_layer.parameters(): param.requires_grad = False - def _build_usefulness_head(self, in_channels: int, out_channels: int, - freeze: bool = False) -> nn.Module: + def _build_usefulness_head(self, in_channels: int, out_channels: int, freeze: bool = False) -> nn.Module: """Build the probability head module.""" usf_layers = [] kernel_sizes = [(4, 3), (2, 2), (2, 2)] for i in range(len(kernel_sizes)): usf_layers.append( - build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=in_channels, - kernel_size=3, - stride=1, - padding=1)) - usf_layers.append( - nn.BatchNorm2d(num_features=in_channels)) - usf_layers.append( - nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=in_channels, kernel_size=3, stride=1, padding=1) + ) + usf_layers.append(nn.BatchNorm2d(num_features=in_channels)) + usf_layers.append(nn.MaxPool2d(kernel_size=kernel_sizes[i], stride=kernel_sizes[i], padding=0)) usf_layers.append(self.nonlinearity) usf_layers.append( - build_conv_layer( - dict(type='Conv2d'), - in_channels=in_channels, - out_channels=1, - kernel_size=1, - stride=1, - padding=0)) + build_conv_layer(dict(type="Conv2d"), in_channels=in_channels, out_channels=1, kernel_size=1, stride=1, padding=0) + ) usf_layers.append(nn.Sigmoid()) self.usefulness_layers = nn.Sequential(*usf_layers) @@ -221,22 +201,15 @@ class PoseIDHead(BaseHead): for param in self.usefulness_layers.parameters(): param.requires_grad = False - def _make_conv_layers(self, in_channels: int, - layer_out_channels: Sequence[int], - layer_kernel_sizes: Sequence[int]) -> nn.Module: + def _make_conv_layers(self, in_channels: int, layer_out_channels: Sequence[int], layer_kernel_sizes: Sequence[int]) -> nn.Module: """Create convolutional layers by given parameters.""" layers = [] - for out_channels, kernel_size in zip(layer_out_channels, - layer_kernel_sizes): + for out_channels, kernel_size in zip(layer_out_channels, layer_kernel_sizes): padding = (kernel_size - 1) // 2 cfg = dict( - type='Conv2d', - in_channels=in_channels, - out_channels=out_channels, - kernel_size=kernel_size, - stride=1, - padding=padding) + type="Conv2d", in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=1, padding=padding + ) layers.append(build_conv_layer(cfg)) layers.append(nn.BatchNorm2d(num_features=out_channels)) layers.append(self.nonlinearity) @@ -244,14 +217,11 @@ class PoseIDHead(BaseHead): return nn.Sequential(*layers) - def _make_deconv_layers(self, in_channels: int, - layer_out_channels: Sequence[int], - layer_kernel_sizes: Sequence[int]) -> nn.Module: + def _make_deconv_layers(self, in_channels: int, layer_out_channels: 
Sequence[int], layer_kernel_sizes: Sequence[int]) -> nn.Module:
"""Create deconvolutional layers by given parameters."""
layers = []
- for out_channels, kernel_size in zip(layer_out_channels,
- layer_kernel_sizes):
+ for out_channels, kernel_size in zip(layer_out_channels, layer_kernel_sizes):
if kernel_size == 4:
padding = 1
output_padding = 0
@@ -262,18 +232,17 @@ class PoseIDHead(BaseHead):
padding = 0
output_padding = 0
else:
- raise ValueError(f'Unsupported kernel size {kernel_size} for'
- 'deconvlutional layers in '
- f'{self.__class__.__name__}')
+ raise ValueError(f"Unsupported kernel size {kernel_size} for " f"deconvolutional layers in {self.__class__.__name__}")
cfg = dict(
- type='deconv',
+ type="deconv",
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=2,
padding=padding,
output_padding=output_padding,
- bias=False)
+ bias=False,
+ )
layers.append(build_upsample_layer(cfg))
layers.append(nn.BatchNorm2d(num_features=out_channels))
layers.append(self.nonlinearity)
@@ -283,11 +252,7 @@ class PoseIDHead(BaseHead):
@property
def default_init_cfg(self):
- init_cfg = [
- dict(
- type='Normal', layer=['Conv2d', 'ConvTranspose2d'], std=0.001),
- dict(type='Constant', layer='BatchNorm2d', val=1)
- ]
+ init_cfg = [dict(type="Normal", layer=["Conv2d", "ConvTranspose2d"], std=0.001), dict(type="Constant", layer="BatchNorm2d", val=1)]
return init_cfg
def forward(self, feats: Tuple[Tensor]) -> Tuple[Tensor, Tensor]:
@@ -304,9 +269,9 @@ class PoseIDHead(BaseHead):
heatmaps = self.forward_heatmap(x)
usefulness = self.forward_usefulness(x)
- 
+
return heatmaps, usefulness
- 
+
def forward_heatmap(self, x: Tensor) -> Tensor:
"""Forward the network. The input is multi scale feature maps and the
output is the heatmap.
@@ -322,7 +287,7 @@ class PoseIDHead(BaseHead):
x = self.final_layer(x)
x = self.normalize_layer(x)
return x
- 
+
def forward_usefulness(self, x: Tensor) -> Tensor:
"""Forward the network. The input is multi scale feature maps and the
output is the probability.
@@ -339,10 +304,7 @@ class PoseIDHead(BaseHead):
x = self.usefulness_layers(x)
return x
- def predict(self,
- feats: Features,
- batch_data_samples: OptSampleList,
- test_cfg: ConfigType = {}) -> Predictions:
+ def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions:
"""Predict results from features.
Args: @@ -373,10 +335,10 @@ class PoseIDHead(BaseHead): - heatmaps (Tensor): The predicted heatmaps in shape (K, h, w) """ - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _htm, _usf = self.forward(_feats) @@ -386,14 +348,15 @@ class PoseIDHead(BaseHead): # Flip back the keypoints _htm_flip = flip_heatmaps( _htm_flip, - flip_mode=test_cfg.get('flip_mode', 'heatmap'), + flip_mode=test_cfg.get("flip_mode", "heatmap"), flip_indices=flip_indices, - shift_heatmap=test_cfg.get('shift_heatmap', False)) + shift_heatmap=test_cfg.get("shift_heatmap", False), + ) heatmaps = (_htm + _htm_flip) * 0.5 # Flip back scalars # _usf_flip = _usf_flip[:, flip_indices] - + usefulness = (_usf + _usf_flip) * 0.5 else: heatmaps, usefulness = self.forward(feats) @@ -401,22 +364,17 @@ class PoseIDHead(BaseHead): preds = self.decode(heatmaps) usefulness = to_numpy(usefulness).reshape((B, 1)) - + for pi, p in enumerate(preds): p.set_field(usefulness[pi], "keypoints_usf") - - if test_cfg.get('output_heatmaps', False): - pred_fields = [ - PixelData(heatmaps=hm) for hm in heatmaps.detach() - ] + + if test_cfg.get("output_heatmaps", False): + pred_fields = [PixelData(heatmaps=hm) for hm in heatmaps.detach()] return preds, pred_fields else: return preds - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. Args: @@ -430,28 +388,23 @@ class PoseIDHead(BaseHead): dict: A dictionary of losses. 
""" dt_heatmaps, dt_usfs = self.forward(feats) - device=dt_heatmaps.device + device = dt_heatmaps.device B, C, H, W = dt_heatmaps.shape - + # Extract GT data - gt_heatmaps = torch.stack( - [d.gt_fields.heatmaps for d in batch_data_samples]) + gt_heatmaps = torch.stack([d.gt_fields.heatmaps for d in batch_data_samples]) # breakpoint() - gt_usfs = np.stack( - [d.gt_instances.identified.astype(float) for d in batch_data_samples]) + gt_usfs = np.stack([d.gt_instances.identified.astype(float) for d in batch_data_samples]) if self.usefulness_thr is not None: gt_usfs = (gt_usfs > self.usefulness_thr).astype(int) - gt_annotated = np.stack( - [d.gt_instances.keypoints_visible.astype(int) for d in batch_data_samples]) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) + gt_annotated = np.stack([d.gt_instances.keypoints_visible.astype(int) for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) # Convert everything to tensors gt_usfs = torch.tensor(gt_usfs, device=device, dtype=dt_usfs.dtype) gt_annotated = torch.tensor(gt_annotated, device=device) - + # Reshape everything to comparable shapes gt_heatmaps = gt_heatmaps.view((B, C, H, W)) dt_heatmaps = dt_heatmaps.view((B, C, H, W)) @@ -462,47 +415,40 @@ class PoseIDHead(BaseHead): # Compute uselfulness weights # usfs_weights = torch.ones_like(dt_usfs, dtype=torch.float, device=device) - usfs_weights = gt_usfs.detach().clone() * 8.0 + 1.0 # Weight the useful poses more ais the ratio in data is approx 1:9 + usfs_weights = gt_usfs.detach().clone() * 8.0 + 1.0 # Weight the useful poses more ais the ratio in data is approx 1:9 # calculate losses losses = dict() heatmap_weights = keypoint_weights - heatmap_loss = self.keypoint_loss_module(dt_heatmaps, gt_heatmaps, heatmap_weights) - usefulness_loss = self.usefulness_loss_module( - dt_usfs, gt_usfs, - target_weight=usfs_weights - ) - + heatmap_loss = self.keypoint_loss_module(dt_heatmaps, gt_heatmaps, heatmap_weights) + usefulness_loss = self.usefulness_loss_module(dt_usfs, gt_usfs, target_weight=usfs_weights) + losses.update( loss_kpt=heatmap_loss, loss_usefulness=usefulness_loss, ) - + # calculate accuracy - if train_cfg.get('compute_acc', True): - acc_pose = self.get_pose_accuracy( - dt_heatmaps, gt_heatmaps, keypoint_weights > 0.5 - ) + if train_cfg.get("compute_acc", True): + acc_pose = self.get_pose_accuracy(dt_heatmaps, gt_heatmaps, keypoint_weights > 0.5) losses.update(acc_pose=acc_pose) # Calculate the best binary accuracy for probability if self.usefulness_thr is not None: - usf_acc, usf_thr = self.get_binary_accuracy( - dt_usfs, gt_usfs, torch.ones_like(dt_usfs, dtype=torch.bool) - ) + usf_acc, usf_thr = self.get_binary_accuracy(dt_usfs, gt_usfs, torch.ones_like(dt_usfs, dtype=torch.bool)) losses.update(usf_acc=usf_acc, usf_thr=usf_thr) else: usf_err = self.get_mae( dt_usfs, gt_usfs, # (gt_annotated > 0.5).any(axis=1).view(dt_usfs.shape), - mask=torch.ones_like(dt_usfs, dtype=torch.bool) + mask=torch.ones_like(dt_usfs, dtype=torch.bool), ) losses.update(usf_mae=usf_err) return losses - + def get_pose_accuracy(self, dt, gt, mask): """Calculate the accuracy of predicted pose.""" _, avg_acc, _ = pose_pck_accuracy( @@ -512,7 +458,7 @@ class PoseIDHead(BaseHead): ) acc_pose = torch.tensor(avg_acc, device=gt.device) return acc_pose - + def get_binary_accuracy(self, dt, gt, mask, force_balanced=False): """Calculate the binary accuracy.""" assert dt.shape == gt.shape @@ -544,7 
+490,7 @@ class PoseIDHead(BaseHead): n_samples = len(gt) thresholds = np.arange(0.1, 1.0, 0.05) - preds = (dt[:, None] > thresholds) + preds = dt[:, None] > thresholds correct = preds == gt[:, None] counts = correct.sum(axis=0) @@ -564,7 +510,7 @@ class PoseIDHead(BaseHead): dt = to_numpy(dt) gt = to_numpy(gt) mask = to_numpy(mask) - + dt = dt[mask] gt = gt[mask] mae = np.abs(dt - gt).mean() @@ -572,15 +518,14 @@ class PoseIDHead(BaseHead): mae = torch.tensor(mae, device=device) return mae - def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, - **kwargs): + def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, **kwargs): """A hook function to convert old-version state dict of :class:`TopdownHeatmapSimpleHead` (before MMPose v1.0.0) to a compatible format of :class:`HeatmapHead`. The hook will be automatically registered during initialization. """ - version = local_meta.get('version', None) + version = local_meta.get("version", None) if version and version >= self._version: return @@ -590,7 +535,7 @@ class PoseIDHead(BaseHead): if not _k.startswith(prefix): continue v = state_dict.pop(_k) - k = _k[len(prefix):] + k = _k[len(prefix) :] # In old version, "final_layer" includes both intermediate # conv layers (new "conv_layers") and final conv layers (new # "final_layer"). @@ -603,17 +548,17 @@ class PoseIDHead(BaseHead): # have keys like "final_layer.n.xxx", where the weights of the last # one should be renamed "final_layer.xxx", and others should be # renamed "conv_layers.n.xxx" - k_parts = k.split('.') - if k_parts[0] == 'final_layer': + k_parts = k.split(".") + if k_parts[0] == "final_layer": if len(k_parts) == 3: assert isinstance(self.conv_layers, nn.Sequential) idx = int(k_parts[1]) if idx < len(self.conv_layers): # final_layer.n.xxx -> conv_layers.n.xxx - k_new = 'conv_layers.' + '.'.join(k_parts[1:]) + k_new = "conv_layers." + ".".join(k_parts[1:]) else: # final_layer.n.xxx -> final_layer.xxx - k_new = 'final_layer.' + k_parts[2] + k_new = "final_layer." 
+ k_parts[2] else: # final_layer.xxx remains final_layer.xxx k_new = k diff --git a/mmpose/models/heads/hybrid_heads/rtmo_head.py b/mmpose/models/heads/hybrid_heads/rtmo_head.py index c364c20e98fdbb2c1af9f4bba832c24fc20aec37..473b8c55ec4863066ce7e9549fbe0b6e625f4d1d 100644 --- a/mmpose/models/heads/hybrid_heads/rtmo_head.py +++ b/mmpose/models/heads/hybrid_heads/rtmo_head.py @@ -12,11 +12,11 @@ from mmengine.structures import InstanceData from torch import Tensor from mmpose.evaluation.functional import nms_torch -from mmpose.models.utils import (GAUEncoder, SinePositionalEncoding, - filter_scores_and_topk) +from mmpose.models.utils import GAUEncoder, SinePositionalEncoding, filter_scores_and_topk from mmpose.registry import MODELS from mmpose.structures.bbox import bbox_xyxy2cs from mmpose.utils.typing import Features, OptSampleList, Predictions + from .yoloxpose_head import YOLOXPoseHead EPS = 1e-8 @@ -69,17 +69,17 @@ class RTMOHeadModule(BaseModule): channels_per_group=36, pose_vec_channels=-1, featmap_strides: Sequence[int] = [8, 16, 32], - conv_bias: Union[bool, str] = 'auto', + conv_bias: Union[bool, str] = "auto", conv_cfg: Optional[ConfigType] = None, - norm_cfg: ConfigType = dict(type='BN', momentum=0.03, eps=0.001), - act_cfg: ConfigType = dict(type='SiLU', inplace=True), + norm_cfg: ConfigType = dict(type="BN", momentum=0.03, eps=0.001), + act_cfg: ConfigType = dict(type="SiLU", inplace=True), init_cfg: Optional[ConfigType] = None, ): super().__init__(init_cfg=init_cfg) self.num_classes = num_classes self.cls_feat_channels = int(cls_feat_channels * widen_factor) self.stacked_convs = stacked_convs - assert conv_bias == 'auto' or isinstance(conv_bias, bool) + assert conv_bias == "auto" or isinstance(conv_bias, bool) self.conv_bias = conv_bias self.conv_cfg = conv_cfg @@ -118,14 +118,15 @@ class RTMOHeadModule(BaseModule): conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, act_cfg=self.act_cfg, - bias=self.conv_bias)) + bias=self.conv_bias, + ) + ) self.conv_cls.append(nn.Sequential(*stacked_convs)) # output layers self.out_cls = nn.ModuleList() for _ in self.featmap_strides: - self.out_cls.append( - nn.Conv2d(self.cls_feat_channels, self.num_classes, 1)) + self.out_cls.append(nn.Conv2d(self.cls_feat_channels, self.num_classes, 1)) def _init_pose_branch(self): """Initialize pose prediction branch for all level feature maps.""" @@ -147,7 +148,9 @@ class RTMOHeadModule(BaseModule): conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, act_cfg=self.act_cfg, - bias=self.conv_bias)) + bias=self.conv_bias, + ) + ) self.conv_pose.append(nn.Sequential(*stacked_convs)) # output layers @@ -156,15 +159,13 @@ class RTMOHeadModule(BaseModule): self.out_kpt_vis = nn.ModuleList() for _ in self.featmap_strides: self.out_bbox.append(nn.Conv2d(out_chn, 4, 1)) - self.out_kpt_reg.append( - nn.Conv2d(out_chn, self.num_keypoints * 2, 1)) + self.out_kpt_reg.append(nn.Conv2d(out_chn, self.num_keypoints * 2, 1)) self.out_kpt_vis.append(nn.Conv2d(out_chn, self.num_keypoints, 1)) if self.pose_vec_channels > 0: self.out_pose = nn.ModuleList() for _ in self.featmap_strides: - self.out_pose.append( - nn.Conv2d(out_chn, self.pose_vec_channels, 1)) + self.out_pose.append(nn.Conv2d(out_chn, self.pose_vec_channels, 1)) def init_weights(self): """Initialize weights of the head. 
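
The hunks above only reflow the construction of RTMO's per-level output layers, which remain 1x1 convolutions: 4 channels for the box, 2 per keypoint for offsets, 1 per keypoint for visibility, and optionally `pose_vec_channels` for the pose vector. A quick shape check under assumed sizes (K = 17 keypoints, 256-channel features; editorial sketch, not from the patch):

```python
# Shape check for the per-level 1x1 output convs (illustrative values only).
import torch
import torch.nn as nn

K, C = 17, 256                        # assumed keypoint count and feature width
feat = torch.randn(2, C, 32, 32)      # one feature-map level, batch of 2

out_bbox = nn.Conv2d(C, 4, 1)         # box regression
out_kpt_reg = nn.Conv2d(C, K * 2, 1)  # per-keypoint (x, y) offsets
out_kpt_vis = nn.Conv2d(C, K, 1)      # per-keypoint visibility logits

print(out_bbox(feat).shape)     # torch.Size([2, 4, 32, 32])
print(out_kpt_reg(feat).shape)  # torch.Size([2, 34, 32, 32])
print(out_kpt_vis(feat).shape)  # torch.Size([2, 17, 32, 32])
```
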
@@ -244,13 +245,8 @@ class DCC(BaseModule): spe_channels: int = 128, spe_temperature: float = 300.0, gau_cfg: Optional[dict] = dict( - s=128, - expansion_factor=2, - dropout_rate=0.0, - drop_path=0.0, - act_fn='SiLU', - use_rel_bias=False, - pos_enc='add'), + s=128, expansion_factor=2, dropout_rate=0.0, drop_path=0.0, act_fn="SiLU", use_rel_bias=False, pos_enc="add" + ), ): super().__init__() @@ -276,18 +272,16 @@ class DCC(BaseModule): # GAU encoder if self.gau_cfg is not None: gau_cfg = self.gau_cfg.copy() - gau_cfg['in_token_dims'] = self.feat_channels - gau_cfg['out_token_dims'] = self.feat_channels + gau_cfg["in_token_dims"] = self.feat_channels + gau_cfg["out_token_dims"] = self.feat_channels self.gau = GAUEncoder(**gau_cfg) - if gau_cfg.get('pos_enc', 'none') in ('add', 'rope'): - self.pos_enc = nn.Parameter( - torch.randn(self.num_keypoints, gau_cfg['s'])) + if gau_cfg.get("pos_enc", "none") in ("add", "rope"): + self.pos_enc = nn.Parameter(torch.randn(self.num_keypoints, gau_cfg["s"])) # fully-connected layers to convert pose feats to keypoint feats pose_to_kpts = [ - nn.Linear(self.in_channels, - self.feat_channels * self.num_keypoints), - nn.BatchNorm1d(self.feat_channels * self.num_keypoints) + nn.Linear(self.in_channels, self.feat_channels * self.num_keypoints), + nn.BatchNorm1d(self.feat_channels * self.num_keypoints), ] self.pose_to_kpts = nn.Sequential(*pose_to_kpts) @@ -296,16 +290,12 @@ class DCC(BaseModule): self.y_fc = nn.Linear(self.spe_feat_channels, self.feat_channels) # fully-connected layers to predict sigma - self.sigma_fc = nn.Sequential( - nn.Linear(self.in_channels, self.num_keypoints), nn.Sigmoid(), - Scale(0.1)) + self.sigma_fc = nn.Sequential(nn.Linear(self.in_channels, self.num_keypoints), nn.Sigmoid(), Scale(0.1)) def _build_basic_bins(self): """Builds basic bin coordinates for x and y.""" - self.register_buffer('y_bins', - torch.linspace(-0.5, 0.5, self.num_bins[1])) - self.register_buffer('x_bins', - torch.linspace(-0.5, 0.5, self.num_bins[0])) + self.register_buffer("y_bins", torch.linspace(-0.5, 0.5, self.num_bins[1])) + self.register_buffer("x_bins", torch.linspace(-0.5, 0.5, self.num_bins[0])) def _apply_softmax(self, x_hms, y_hms): """Apply softmax on 1-D heatmaps. 
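
For orientation before the next hunks: DCC classifies each keypoint coordinate against 1-D bins that are stretched to the instance's bounding-box scale and shifted to its center (the `x_bins.view(...) * scale + center` expressions reflowed below). A toy rendition of that dynamic bin allocation, with made-up numbers (editorial sketch, not library code):

```python
# Toy rendition of DCC's dynamic bin allocation.
import torch

num_bins = 5
x_bins = torch.linspace(-0.5, 0.5, num_bins)  # basic bins, cf. _build_basic_bins
center = torch.tensor([[100.0]])              # bbox center x for one instance
scale = torch.tensor([[50.0]])                # bbox width for one instance

dyn_bins = x_bins.view(1, -1) * scale + center
print(dyn_bins)  # tensor([[ 75.0000,  87.5000, 100.0000, 112.5000, 125.0000]])
```
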
@@ -353,10 +343,8 @@ class DCC(BaseModule): x_bins, y_bins = self.x_bins, self.y_bins # dynamic bin allocation - x_bins = x_bins.view(*((1,) * (scale.ndim-1)), -1) \ - * scale[..., 0:1] + center[..., 0:1] - y_bins = y_bins.view(*((1,) * (scale.ndim-1)), -1) \ - * scale[..., 1:2] + center[..., 1:2] + x_bins = x_bins.view(*((1,) * (scale.ndim - 1)), -1) * scale[..., 0:1] + center[..., 0:1] + y_bins = y_bins.view(*((1,) * (scale.ndim - 1)), -1) * scale[..., 1:2] + center[..., 1:2] # dynamic bin encoding x_bins_enc = self.x_fc(self.spe(position=x_bins)) @@ -384,17 +372,13 @@ class DCC(BaseModule): kpt_feats = self.pose_to_kpts(pose_feats) - kpt_feats = kpt_feats.reshape(*kpt_feats.shape[:-1], - self.num_keypoints, self.feat_channels) + kpt_feats = kpt_feats.reshape(*kpt_feats.shape[:-1], self.num_keypoints, self.feat_channels) - if hasattr(self, 'gau'): - kpt_feats = self.gau( - kpt_feats, pos_enc=getattr(self, 'pos_enc', None)) + if hasattr(self, "gau"): + kpt_feats = self.gau(kpt_feats, pos_enc=getattr(self, "pos_enc", None)) - x_hms = torch.matmul(kpt_feats, - x_bins_enc.transpose(-1, -2).contiguous()) - y_hms = torch.matmul(kpt_feats, - y_bins_enc.transpose(-1, -2).contiguous()) + x_hms = torch.matmul(kpt_feats, x_bins_enc.transpose(-1, -2).contiguous()) + y_hms = torch.matmul(kpt_feats, y_bins_enc.transpose(-1, -2).contiguous()) return x_hms, y_hms @@ -418,10 +402,8 @@ class DCC(BaseModule): x_bins, y_bins = self.x_bins, self.y_bins - x_bins = x_bins.view(*((1,) * (scale.ndim-1)), -1) \ - * scale[..., 0:1] + center[..., 0:1] - y_bins = y_bins.view(*((1,) * (scale.ndim-1)), -1) \ - * scale[..., 1:2] + center[..., 1:2] + x_bins = x_bins.view(*((1,) * (scale.ndim - 1)), -1) * scale[..., 0:1] + center[..., 0:1] + y_bins = y_bins.view(*((1,) * (scale.ndim - 1)), -1) * scale[..., 1:2] + center[..., 1:2] x = (x_hms * x_bins.unsqueeze(1)).sum(dim=-1) y = (y_hms * y_bins.unsqueeze(1)).sum(dim=-1) @@ -449,10 +431,8 @@ class DCC(BaseModule): # calculate the error of each bin from the GT keypoint coordinates center, scale = bbox_cs.split(2, dim=-1) - x_bins = self.x_bins.view(*((1,) * (scale.ndim-1)), -1) \ - * scale[..., 0:1] + center[..., 0:1] - y_bins = self.y_bins.view(*((1,) * (scale.ndim-1)), -1) \ - * scale[..., 1:2] + center[..., 1:2] + x_bins = self.x_bins.view(*((1,) * (scale.ndim - 1)), -1) * scale[..., 0:1] + center[..., 0:1] + y_bins = self.y_bins.view(*((1,) * (scale.ndim - 1)), -1) * scale[..., 1:2] + center[..., 1:2] dist_x = torch.abs(kpt_targets.narrow(2, 0, 1) - x_bins.unsqueeze(1)) dist_y = torch.abs(kpt_targets.narrow(2, 1, 1) - y_bins.unsqueeze(1)) @@ -486,8 +466,7 @@ class DCC(BaseModule): """ sigmas = self.sigma_fc(pose_feats) x_bins_enc, y_bins_enc = self._get_bin_enc(bbox_cs, grids) - x_hms, y_hms = self._pose_feats_to_heatmaps(pose_feats, x_bins_enc, - y_bins_enc) + x_hms, y_hms = self._pose_feats_to_heatmaps(pose_feats, x_bins_enc, y_bins_enc) x_hms, y_hms = self._apply_softmax(x_hms, y_hms) pose_preds = self._decode_xy_heatmaps(x_hms, y_hms, bbox_cs) return pose_preds, (x_hms, y_hms), sigmas @@ -509,18 +488,17 @@ class DCC(BaseModule): Tensor: Pose predictions tensor. 
""" x_bins_enc, y_bins_enc = self._get_bin_enc(bbox_cs, grids) - x_hms, y_hms = self._pose_feats_to_heatmaps(pose_feats, x_bins_enc, - y_bins_enc) + x_hms, y_hms = self._pose_feats_to_heatmaps(pose_feats, x_bins_enc, y_bins_enc) x_hms, y_hms = self._apply_softmax(x_hms, y_hms) pose_preds = self._decode_xy_heatmaps(x_hms, y_hms, bbox_cs) return pose_preds def switch_to_deploy(self, test_cfg: Optional[Dict] = None): - if getattr(self, 'deploy', False): + if getattr(self, "deploy", False): return self._convert_pose_to_kpts() - if hasattr(self, 'gau'): + if hasattr(self, "gau"): self._convert_gau() self._convert_forward_test() @@ -562,7 +540,7 @@ class DCC(BaseModule): beta_k = self.gau.beta[1].view(1, 1, 1, self.gau.beta.size(-1)) # Adjust beta tensors with position encoding if available - if hasattr(self, 'pos_enc'): + if hasattr(self, "pos_enc"): pos_enc = self.pos_enc.reshape(1, 1, *self.pos_enc.shape) beta_q = beta_q + pos_enc beta_k = beta_k + pos_enc @@ -614,9 +592,7 @@ class DCC(BaseModule): # step 1: pose features -> keypoint features kpt_feats = self.pose_to_kpts(pose_feats) - kpt_feats = kpt_feats.reshape(*kpt_feats.shape[:-1], - self.num_keypoints, - self.feat_channels) + kpt_feats = kpt_feats.reshape(*kpt_feats.shape[:-1], self.num_keypoints, self.feat_channels) kpt_feats = self.gau(kpt_feats) # step 2: dynamic bin encoding @@ -725,30 +701,28 @@ class RTMOHead(YOLOXPoseHead): loss_oks=loss_oks, loss_vis=loss_vis, loss_bbox_aux=loss_bbox_aux, - overlaps_power=overlaps_power) + overlaps_power=overlaps_power, + ) self.bbox_padding = bbox_padding # override to ensure consistency - head_module_cfg['featmap_strides'] = featmap_strides - head_module_cfg['num_keypoints'] = num_keypoints + head_module_cfg["featmap_strides"] = featmap_strides + head_module_cfg["num_keypoints"] = num_keypoints # build modules self.head_module = RTMOHeadModule(**head_module_cfg) self.proxy_target_cc = proxy_target_cc if dcc_cfg is not None: - dcc_cfg['num_keypoints'] = num_keypoints + dcc_cfg["num_keypoints"] = num_keypoints self.dcc = DCC(**dcc_cfg) # build losses if loss_mle is not None: self.loss_mle = MODELS.build(loss_mle) - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. Args: @@ -763,49 +737,52 @@ class RTMOHead(YOLOXPoseHead): """ # 1. 
collect & reform predictions - cls_scores, bbox_preds, kpt_offsets, kpt_vis, pose_vecs = self.forward( - feats) + cls_scores, bbox_preds, kpt_offsets, kpt_vis, pose_vecs = self.forward(feats) featmap_sizes = [cls_score.shape[2:] for cls_score in cls_scores] mlvl_priors = self.prior_generator.grid_priors( - featmap_sizes, - dtype=cls_scores[0].dtype, - device=cls_scores[0].device, - with_stride=True) + featmap_sizes, dtype=cls_scores[0].dtype, device=cls_scores[0].device, with_stride=True + ) flatten_priors = torch.cat(mlvl_priors) # flatten cls_scores, bbox_preds and objectness flatten_cls_scores = self._flatten_predictions(cls_scores) flatten_bbox_preds = self._flatten_predictions(bbox_preds) - flatten_objectness = torch.ones_like( - flatten_cls_scores).detach().narrow(-1, 0, 1) * 1e4 + flatten_objectness = torch.ones_like(flatten_cls_scores).detach().narrow(-1, 0, 1) * 1e4 flatten_kpt_offsets = self._flatten_predictions(kpt_offsets) flatten_kpt_vis = self._flatten_predictions(kpt_vis) flatten_pose_vecs = self._flatten_predictions(pose_vecs) - flatten_bbox_decoded = self.decode_bbox(flatten_bbox_preds, - flatten_priors[..., :2], - flatten_priors[..., -1]) - flatten_kpt_decoded = self.decode_kpt_reg(flatten_kpt_offsets, - flatten_priors[..., :2], - flatten_priors[..., -1]) + flatten_bbox_decoded = self.decode_bbox(flatten_bbox_preds, flatten_priors[..., :2], flatten_priors[..., -1]) + flatten_kpt_decoded = self.decode_kpt_reg(flatten_kpt_offsets, flatten_priors[..., :2], flatten_priors[..., -1]) # 2. generate targets - targets = self._get_targets(flatten_priors, - flatten_cls_scores.detach(), - flatten_objectness.detach(), - flatten_bbox_decoded.detach(), - flatten_kpt_decoded.detach(), - flatten_kpt_vis.detach(), - batch_data_samples) - pos_masks, cls_targets, obj_targets, obj_weights, \ - bbox_targets, bbox_aux_targets, kpt_targets, kpt_aux_targets, \ - vis_targets, vis_weights, pos_areas, pos_priors, group_indices, \ - num_fg_imgs = targets - - num_pos = torch.tensor( - sum(num_fg_imgs), - dtype=torch.float, - device=flatten_cls_scores.device) + targets = self._get_targets( + flatten_priors, + flatten_cls_scores.detach(), + flatten_objectness.detach(), + flatten_bbox_decoded.detach(), + flatten_kpt_decoded.detach(), + flatten_kpt_vis.detach(), + batch_data_samples, + ) + ( + pos_masks, + cls_targets, + obj_targets, + obj_weights, + bbox_targets, + bbox_aux_targets, + kpt_targets, + kpt_aux_targets, + vis_targets, + vis_weights, + pos_areas, + pos_priors, + group_indices, + num_fg_imgs, + ) = targets + + num_pos = torch.tensor(sum(num_fg_imgs), dtype=torch.float, device=flatten_cls_scores.device) num_total_samples = max(reduce_mean(num_pos), 1.0) # 3. 
calculate loss @@ -817,39 +794,27 @@ class RTMOHead(YOLOXPoseHead): # 3.1 bbox loss bbox_preds = flatten_bbox_decoded.view(-1, 4)[pos_masks] - losses['loss_bbox'] = self.loss_bbox( - bbox_preds, bbox_targets) / num_total_samples + losses["loss_bbox"] = self.loss_bbox(bbox_preds, bbox_targets) / num_total_samples if self.use_aux_loss: - if hasattr(self, 'loss_bbox_aux'): + if hasattr(self, "loss_bbox_aux"): bbox_preds_raw = flatten_bbox_preds.view(-1, 4)[pos_masks] - losses['loss_bbox_aux'] = self.loss_bbox_aux( - bbox_preds_raw, bbox_aux_targets) / num_total_samples + losses["loss_bbox_aux"] = self.loss_bbox_aux(bbox_preds_raw, bbox_aux_targets) / num_total_samples # 3.2 keypoint visibility loss - kpt_vis_preds = flatten_kpt_vis.view(-1, - self.num_keypoints)[pos_masks] - losses['loss_vis'] = self.loss_vis(kpt_vis_preds, vis_targets, - vis_weights) + kpt_vis_preds = flatten_kpt_vis.view(-1, self.num_keypoints)[pos_masks] + losses["loss_vis"] = self.loss_vis(kpt_vis_preds, vis_targets, vis_weights) # 3.3 keypoint loss - kpt_reg_preds = flatten_kpt_decoded.view(-1, self.num_keypoints, - 2)[pos_masks] - - if hasattr(self, 'loss_mle') and self.loss_mle.loss_weight > 0: - pose_vecs = flatten_pose_vecs.view( - -1, flatten_pose_vecs.size(-1))[pos_masks] - bbox_cs = torch.cat( - bbox_xyxy2cs(bbox_preds, self.bbox_padding), dim=1) + kpt_reg_preds = flatten_kpt_decoded.view(-1, self.num_keypoints, 2)[pos_masks] + + if hasattr(self, "loss_mle") and self.loss_mle.loss_weight > 0: + pose_vecs = flatten_pose_vecs.view(-1, flatten_pose_vecs.size(-1))[pos_masks] + bbox_cs = torch.cat(bbox_xyxy2cs(bbox_preds, self.bbox_padding), dim=1) # 'cc' refers to 'coordinate classification' - kpt_cc_preds, pred_hms, sigmas = \ - self.dcc.forward_train(pose_vecs, - bbox_cs, - pos_priors[..., :2]) - target_hms = self.dcc.generate_target_heatmap( - kpt_targets, bbox_cs, sigmas, pos_areas) - losses['loss_mle'] = self.loss_mle(pred_hms, target_hms, - vis_targets) + kpt_cc_preds, pred_hms, sigmas = self.dcc.forward_train(pose_vecs, bbox_cs, pos_priors[..., :2]) + target_hms = self.dcc.generate_target_heatmap(kpt_targets, bbox_cs, sigmas, pos_areas) + losses["loss_mle"] = self.loss_mle(pred_hms, target_hms, vis_targets) if self.proxy_target_cc: # form the regression target using the coordinate @@ -859,37 +824,28 @@ class RTMOHead(YOLOXPoseHead): diff_reg = torch.norm(kpt_reg_preds - kpt_targets, dim=-1) mask = (diff_reg > diff_cc).float() kpt_weights_reg = vis_targets * mask - oks = self.assigner.oks_calculator(kpt_cc_preds, - kpt_targets, - vis_targets, pos_areas) + oks = self.assigner.oks_calculator(kpt_cc_preds, kpt_targets, vis_targets, pos_areas) cls_targets = oks.unsqueeze(1) - losses['loss_oks'] = self.loss_oks(kpt_reg_preds, - kpt_cc_preds.detach(), - kpt_weights_reg, pos_areas) + losses["loss_oks"] = self.loss_oks(kpt_reg_preds, kpt_cc_preds.detach(), kpt_weights_reg, pos_areas) else: - losses['loss_oks'] = self.loss_oks(kpt_reg_preds, kpt_targets, - vis_targets, pos_areas) + losses["loss_oks"] = self.loss_oks(kpt_reg_preds, kpt_targets, vis_targets, pos_areas) # update the target for classification loss # the target for the positive grids is set to the oks calculated # using predictions and assigned ground truth instances - extra_info['overlaps'] = cls_targets + extra_info["overlaps"] = cls_targets cls_targets = cls_targets.pow(self.overlaps_power).detach() obj_targets[pos_masks] = cls_targets.to(obj_targets) # 3.4 classification loss - losses['loss_cls'] = self.loss_cls(cls_preds_all, obj_targets, - obj_weights)
/ num_total_samples + losses["loss_cls"] = self.loss_cls(cls_preds_all, obj_targets, obj_weights) / num_total_samples losses.update(extra_info) return losses - def predict(self, - feats: Features, - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from features. Args: @@ -932,17 +888,12 @@ class RTMOHead(YOLOXPoseHead): # If the shape does not change, use the previous mlvl_priors if featmap_sizes != self.featmap_sizes: - self.mlvl_priors = self.prior_generator.grid_priors( - featmap_sizes, - dtype=cls_scores[0].dtype, - device=cls_scores[0].device) + self.mlvl_priors = self.prior_generator.grid_priors(featmap_sizes, dtype=cls_scores[0].dtype, device=cls_scores[0].device) self.featmap_sizes = featmap_sizes flatten_priors = torch.cat(self.mlvl_priors) mlvl_strides = [ - flatten_priors.new_full((featmap_size.numel(), ), - stride) for featmap_size, stride in zip( - featmap_sizes, self.featmap_strides) + flatten_priors.new_full((featmap_size.numel(),), stride) for featmap_size, stride in zip(featmap_sizes, self.featmap_strides) ] flatten_stride = torch.cat(mlvl_strides) @@ -953,22 +904,19 @@ class RTMOHead(YOLOXPoseHead): flatten_pose_vecs = self._flatten_predictions(pose_vecs) if flatten_pose_vecs is None: flatten_pose_vecs = [None] * len(batch_img_metas) - flatten_bbox_preds = self.decode_bbox(flatten_bbox_preds, - flatten_priors, flatten_stride) + flatten_bbox_preds = self.decode_bbox(flatten_bbox_preds, flatten_priors, flatten_stride) results_list = [] - for (bboxes, scores, kpt_vis, pose_vecs, - img_meta) in zip(flatten_bbox_preds, flatten_cls_scores, - flatten_kpt_vis, flatten_pose_vecs, - batch_img_metas): + for bboxes, scores, kpt_vis, pose_vecs, img_meta in zip( + flatten_bbox_preds, flatten_cls_scores, flatten_kpt_vis, flatten_pose_vecs, batch_img_metas + ): - score_thr = cfg.get('score_thr', 0.01) + score_thr = cfg.get("score_thr", 0.01) - nms_pre = cfg.get('nms_pre', 100000) + nms_pre = cfg.get("nms_pre", 100000) scores, labels = scores.max(1, keepdim=True) - scores, _, keep_idxs_score, results = filter_scores_and_topk( - scores, score_thr, nms_pre, results=dict(labels=labels[:, 0])) - labels = results['labels'] + scores, _, keep_idxs_score, results = filter_scores_and_topk(scores, score_thr, nms_pre, results=dict(labels=labels[:, 0])) + labels = results["labels"] bboxes = bboxes[keep_idxs_score] kpt_vis = kpt_vis[keep_idxs_score] @@ -976,7 +924,7 @@ class RTMOHead(YOLOXPoseHead): stride = flatten_stride[keep_idxs_score] if bboxes.numel() > 0: - nms_thr = cfg.get('nms_thr', 1.0) + nms_thr = cfg.get("nms_thr", 1.0) if nms_thr < 1.0: keep_idxs_nms = nms_torch(bboxes, scores, nms_thr) @@ -987,8 +935,7 @@ class RTMOHead(YOLOXPoseHead): scores = scores[keep_idxs_nms] pose_vecs = pose_vecs[keep_idxs_score][keep_idxs_nms] - bbox_cs = torch.cat( - bbox_xyxy2cs(bboxes, self.bbox_padding), dim=1) + bbox_cs = torch.cat(bbox_xyxy2cs(bboxes, self.bbox_padding), dim=1) grids = grids[keep_idxs_nms] keypoints = self.dcc.forward_test(pose_vecs, bbox_cs, grids) @@ -1003,9 +950,10 @@ class RTMOHead(YOLOXPoseHead): bbox_scores=scores, keypoints=keypoints, keypoint_scores=kpt_vis, - keypoints_visible=kpt_vis) + keypoints_visible=kpt_vis, + ) - input_size = img_meta['input_size'] + input_size = img_meta["input_size"] results.bboxes[:, 0::2].clamp_(0, input_size[0]) results.bboxes[:, 1::2].clamp_(0, input_size[1]) @@ -1016,25 +964,23 @@ class 
RTMOHead(YOLOXPoseHead): def switch_to_deploy(self, test_cfg: Optional[Dict]): """Precompute and save the grid coordinates and strides.""" - if getattr(self, 'deploy', False): + if getattr(self, "deploy", False): return self.deploy = True # grid generator - input_size = test_cfg.get('input_size', (640, 640)) + input_size = test_cfg.get("input_size", (640, 640)) featmaps = [] for s in self.featmap_strides: - featmaps.append( - torch.rand(1, 1, input_size[0] // s, input_size[1] // s)) + featmaps.append(torch.rand(1, 1, input_size[0] // s, input_size[1] // s)) featmap_sizes = [fmap.shape[2:] for fmap in featmaps] - self.mlvl_priors = self.prior_generator.grid_priors( - featmap_sizes, dtype=torch.float32, device='cpu') + self.mlvl_priors = self.prior_generator.grid_priors(featmap_sizes, dtype=torch.float32, device="cpu") self.flatten_priors = torch.cat(self.mlvl_priors) mlvl_strides = [ - self.flatten_priors.new_full((featmap_size.numel(), ), stride) for - featmap_size, stride in zip(featmap_sizes, self.featmap_strides) + self.flatten_priors.new_full((featmap_size.numel(),), stride) + for featmap_size, stride in zip(featmap_sizes, self.featmap_strides) ] self.flatten_stride = torch.cat(mlvl_strides) diff --git a/mmpose/models/heads/hybrid_heads/vis_head.py b/mmpose/models/heads/hybrid_heads/vis_head.py index 6f808670ad2c23841d56043c98a78a3fa0ae9aaa..5bab74da38c85acfe076c251367eabff6c0e2240 100644 --- a/mmpose/models/heads/hybrid_heads/vis_head.py +++ b/mmpose/models/heads/hybrid_heads/vis_head.py @@ -7,8 +7,8 @@ from torch import Tensor, nn from mmpose.models.utils.tta import flip_visibility from mmpose.registry import MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, InstanceList, OptConfigType, - OptSampleList, Predictions) +from mmpose.utils.typing import ConfigType, InstanceList, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead @@ -27,39 +27,34 @@ class VisPredictHead(BaseHead): :attr:`default_init_cfg` for default settings """ - def __init__(self, - pose_cfg: ConfigType, - loss: ConfigType = dict( - type='BCELoss', use_target_weight=False, - use_sigmoid=True), - init_cfg: OptConfigType = None): + def __init__( + self, + pose_cfg: ConfigType, + loss: ConfigType = dict(type="BCELoss", use_target_weight=False, use_sigmoid=True), + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg super().__init__(init_cfg) - self.in_channels = pose_cfg['in_channels'] - if pose_cfg.get('num_joints', None) is not None: - self.out_channels = pose_cfg['num_joints'] - elif pose_cfg.get('out_channels', None) is not None: - self.out_channels = pose_cfg['out_channels'] + self.in_channels = pose_cfg["in_channels"] + if pose_cfg.get("num_joints", None) is not None: + self.out_channels = pose_cfg["num_joints"] + elif pose_cfg.get("out_channels", None) is not None: + self.out_channels = pose_cfg["out_channels"] else: - raise ValueError('VisPredictHead requires \'num_joints\' or' - ' \'out_channels\' in the pose_cfg.') + raise ValueError("VisPredictHead requires 'num_joints' or" " 'out_channels' in the pose_cfg.") self.loss_module = MODELS.build(loss) self.pose_head = MODELS.build(pose_cfg) self.pose_cfg = pose_cfg - self.use_sigmoid = loss.get('use_sigmoid', False) + self.use_sigmoid = loss.get("use_sigmoid", False) - modules = [ - nn.AdaptiveAvgPool2d(1), - nn.Flatten(), - nn.Linear(self.in_channels, self.out_channels) - ] + modules = [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(self.in_channels, 
self.out_channels)] if self.use_sigmoid: modules.append(nn.Sigmoid()) @@ -96,8 +91,7 @@ class VisPredictHead(BaseHead): return x_pose, x_vis - def integrate(self, batch_vis: Tensor, - pose_preds: Union[Tuple, Predictions]) -> InstanceList: + def integrate(self, batch_vis: Tensor, pose_preds: Union[Tuple, Predictions]) -> InstanceList: """Add keypoints visibility prediction to pose prediction. Overwrite the original keypoint_scores. @@ -116,10 +110,7 @@ class VisPredictHead(BaseHead): return pose_pred_instances, pose_pred_fields - def predict(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from features. Args: @@ -152,15 +143,14 @@ class VisPredictHead(BaseHead): - heatmaps (Tensor): The predicted heatmaps in shape (K, h, w) """ - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _batch_vis = self.vis_forward(_feats) - _batch_vis_flip = flip_visibility( - self.vis_forward(_feats_flip), flip_indices=flip_indices) + _batch_vis_flip = flip_visibility(self.vis_forward(_feats_flip), flip_indices=flip_indices) batch_vis = (_batch_vis + _batch_vis_flip) * 0.5 else: batch_vis = self.vis_forward(feats) # (B, K, D) @@ -170,8 +160,7 @@ class VisPredictHead(BaseHead): if not self.use_sigmoid: batch_vis = torch.sigmoid(batch_vis) - batch_pose = self.pose_head.predict(feats, batch_data_samples, - test_cfg) + batch_pose = self.pose_head.predict(feats, batch_data_samples, test_cfg) return self.integrate(batch_vis, batch_pose) @@ -184,16 +173,12 @@ class VisPredictHead(BaseHead): predictions = (vis_pred_outputs >= threshold).float() correct = (predictions == vis_labels).float() if vis_weights is not None: - accuracy = (correct * vis_weights).sum(dim=1) / ( - vis_weights.sum(dim=1) + 1e-6) + accuracy = (correct * vis_weights).sum(dim=1) / (vis_weights.sum(dim=1) + 1e-6) else: accuracy = correct.mean(dim=1) return accuracy.mean() - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: OptConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: OptConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. 
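The visibility branch that VisPredictHead places alongside the wrapped pose head (the `modules` list above) is a pooled linear classifier over the backbone feature map. A self-contained sketch under assumed channel and keypoint counts; the class name here is hypothetical:

```python
import torch
from torch import nn

class TinyVisBranch(nn.Module):
    """Hypothetical standalone visibility branch: pool, flatten, classify."""

    def __init__(self, in_channels: int = 768, num_keypoints: int = 17):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                # (B, C, H, W) -> (B, C, 1, 1)
            nn.Flatten(),                           # -> (B, C)
            nn.Linear(in_channels, num_keypoints),  # one logit per keypoint
            nn.Sigmoid(),                           # visibility score in (0, 1)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)                      # (B, K)

vis = TinyVisBranch()(torch.randn(2, 768, 24, 18))
print(vis.shape)  # torch.Size([2, 17])
```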
Args: @@ -213,9 +198,7 @@ class VisPredictHead(BaseHead): vis_label = d.gt_instance_labels.keypoint_weights.float() vis_labels.append(vis_label) if vis_weights is not None: - vis_weights.append( - getattr(d.gt_instance_labels, 'keypoints_visible_weights', - vis_label.new_ones(vis_label.shape))) + vis_weights.append(getattr(d.gt_instance_labels, "keypoints_visible_weights", vis_label.new_ones(vis_label.shape))) vis_labels = torch.cat(vis_labels) vis_weights = torch.cat(vis_weights) if vis_weights else None @@ -237,5 +220,5 @@ class VisPredictHead(BaseHead): @property def default_init_cfg(self): - init_cfg = [dict(type='Normal', layer=['Linear'], std=0.01, bias=0)] + init_cfg = [dict(type="Normal", layer=["Linear"], std=0.01, bias=0)] return init_cfg diff --git a/mmpose/models/heads/hybrid_heads/yoloxpose_head.py b/mmpose/models/heads/hybrid_heads/yoloxpose_head.py index 07ae63a32500aacdc66d091ead817cb0f7b3243b..2fe85a1dc8e095517235eb01f280cb2f7f786c54 100644 --- a/mmpose/models/heads/hybrid_heads/yoloxpose_head.py +++ b/mmpose/models/heads/hybrid_heads/yoloxpose_head.py @@ -15,8 +15,7 @@ from mmpose.models.utils import filter_scores_and_topk from mmpose.registry import MODELS, TASK_UTILS from mmpose.structures import PoseDataSample from mmpose.utils import reduce_mean -from mmpose.utils.typing import (ConfigType, Features, OptSampleList, - Predictions, SampleList) +from mmpose.utils.typing import ConfigType, Features, OptSampleList, Predictions, SampleList class YOLOXPoseHeadModule(BaseModule): @@ -64,17 +63,17 @@ class YOLOXPoseHeadModule(BaseModule): feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], - conv_bias: Union[bool, str] = 'auto', + conv_bias: Union[bool, str] = "auto", conv_cfg: Optional[ConfigType] = None, - norm_cfg: ConfigType = dict(type='BN', momentum=0.03, eps=0.001), - act_cfg: ConfigType = dict(type='SiLU', inplace=True), + norm_cfg: ConfigType = dict(type="BN", momentum=0.03, eps=0.001), + act_cfg: ConfigType = dict(type="SiLU", inplace=True), init_cfg: Optional[ConfigType] = None, ): super().__init__(init_cfg=init_cfg) self.num_classes = num_classes self.feat_channels = int(feat_channels * widen_factor) self.stacked_convs = stacked_convs - assert conv_bias == 'auto' or isinstance(conv_bias, bool) + assert conv_bias == "auto" or isinstance(conv_bias, bool) self.conv_bias = conv_bias self.conv_cfg = conv_cfg @@ -112,15 +111,16 @@ class YOLOXPoseHeadModule(BaseModule): conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, act_cfg=self.act_cfg, - bias=self.conv_bias)) + bias=self.conv_bias, + ) + ) self.conv_cls.append(nn.Sequential(*stacked_convs)) # output layers self.out_cls = nn.ModuleList() self.out_obj = nn.ModuleList() for _ in self.featmap_strides: - self.out_cls.append( - nn.Conv2d(self.feat_channels, self.num_classes, 1)) + self.out_cls.append(nn.Conv2d(self.feat_channels, self.num_classes, 1)) def _init_reg_branch(self): """Initialize regression branch for all level feature maps.""" @@ -139,7 +139,9 @@ conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, act_cfg=self.act_cfg, - bias=self.conv_bias)) + bias=self.conv_bias, + ) + ) self.conv_reg.append(nn.Sequential(*stacked_convs)) # output layers @@ -166,17 +168,17 @@ conv_cfg=self.conv_cfg, norm_cfg=self.norm_cfg, act_cfg=self.act_cfg, - bias=self.conv_bias)) + bias=self.conv_bias, + ) + ) self.conv_pose.append(nn.Sequential(*stacked_convs)) # output layers self.out_kpt = nn.ModuleList()
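# One 1x1 Conv2d per FPN stride is appended to each ModuleList below:
# out_kpt regresses 2 * K keypoint-offset channels and out_kpt_vis predicts
# K visibility logits, mirroring the per-level out_cls / out_obj heads above.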
self.out_kpt_vis = nn.ModuleList() for _ in self.featmap_strides: - self.out_kpt.append( - nn.Conv2d(self.feat_channels, self.num_keypoints * 2, 1)) - self.out_kpt_vis.append( - nn.Conv2d(self.feat_channels, self.num_keypoints, 1)) + self.out_kpt.append(nn.Conv2d(self.feat_channels, self.num_keypoints * 2, 1)) + self.out_kpt_vis.append(nn.Conv2d(self.feat_channels, self.num_keypoints, 1)) def init_weights(self): """Initialize weights of the head.""" @@ -252,8 +254,8 @@ class YOLOXPoseHead(BaseModule): self.prior_generator = TASK_UTILS.build(prior_generator) if head_module_cfg is not None: - head_module_cfg['featmap_strides'] = featmap_strides - head_module_cfg['num_keypoints'] = num_keypoints + head_module_cfg["featmap_strides"] = featmap_strides + head_module_cfg["num_keypoints"] = num_keypoints self.head_module = YOLOXPoseHeadModule(**head_module_cfg) self.assigner = TASK_UTILS.build(assigner) @@ -273,10 +275,7 @@ class YOLOXPoseHead(BaseModule): assert isinstance(feats, (tuple, list)) return self.head_module(feats) - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples. Args: @@ -291,15 +290,12 @@ class YOLOXPoseHead(BaseModule): """ # 1. collect & reform predictions - cls_scores, objectnesses, bbox_preds, kpt_offsets, \ - kpt_vis = self.forward(feats) + cls_scores, objectnesses, bbox_preds, kpt_offsets, kpt_vis = self.forward(feats) featmap_sizes = [cls_score.shape[2:] for cls_score in cls_scores] mlvl_priors = self.prior_generator.grid_priors( - featmap_sizes, - dtype=cls_scores[0].dtype, - device=cls_scores[0].device, - with_stride=True) + featmap_sizes, dtype=cls_scores[0].dtype, device=cls_scores[0].device, with_stride=True + ) flatten_priors = torch.cat(mlvl_priors) # flatten cls_scores, bbox_preds and objectness @@ -308,30 +304,37 @@ class YOLOXPoseHead(BaseModule): flatten_objectness = self._flatten_predictions(objectnesses) flatten_kpt_offsets = self._flatten_predictions(kpt_offsets) flatten_kpt_vis = self._flatten_predictions(kpt_vis) - flatten_bbox_decoded = self.decode_bbox(flatten_bbox_preds, - flatten_priors[..., :2], - flatten_priors[..., -1]) - flatten_kpt_decoded = self.decode_kpt_reg(flatten_kpt_offsets, - flatten_priors[..., :2], - flatten_priors[..., -1]) + flatten_bbox_decoded = self.decode_bbox(flatten_bbox_preds, flatten_priors[..., :2], flatten_priors[..., -1]) + flatten_kpt_decoded = self.decode_kpt_reg(flatten_kpt_offsets, flatten_priors[..., :2], flatten_priors[..., -1]) # 2. 
generate targets - targets = self._get_targets(flatten_priors, - flatten_cls_scores.detach(), - flatten_objectness.detach(), - flatten_bbox_decoded.detach(), - flatten_kpt_decoded.detach(), - flatten_kpt_vis.detach(), - batch_data_samples) - pos_masks, cls_targets, obj_targets, obj_weights, \ - bbox_targets, bbox_aux_targets, kpt_targets, kpt_aux_targets, \ - vis_targets, vis_weights, pos_areas, pos_priors, group_indices, \ - num_fg_imgs = targets - - num_pos = torch.tensor( - sum(num_fg_imgs), - dtype=torch.float, - device=flatten_cls_scores.device) + targets = self._get_targets( + flatten_priors, + flatten_cls_scores.detach(), + flatten_objectness.detach(), + flatten_bbox_decoded.detach(), + flatten_kpt_decoded.detach(), + flatten_kpt_vis.detach(), + batch_data_samples, + ) + ( + pos_masks, + cls_targets, + obj_targets, + obj_weights, + bbox_targets, + bbox_aux_targets, + kpt_targets, + kpt_aux_targets, + vis_targets, + vis_weights, + pos_areas, + pos_priors, + group_indices, + num_fg_imgs, + ) = targets + + num_pos = torch.tensor(sum(num_fg_imgs), dtype=torch.float, device=flatten_cls_scores.device) num_total_samples = max(reduce_mean(num_pos), 1.0) # 3. calculate loss @@ -339,49 +342,38 @@ class YOLOXPoseHead(BaseModule): losses = dict() obj_preds = flatten_objectness.view(-1, 1) - losses['loss_obj'] = self.loss_obj(obj_preds, obj_targets, - obj_weights) / num_total_samples + losses["loss_obj"] = self.loss_obj(obj_preds, obj_targets, obj_weights) / num_total_samples if num_pos > 0: # 3.2 bbox loss bbox_preds = flatten_bbox_decoded.view(-1, 4)[pos_masks] - losses['loss_bbox'] = self.loss_bbox( - bbox_preds, bbox_targets) / num_total_samples + losses["loss_bbox"] = self.loss_bbox(bbox_preds, bbox_targets) / num_total_samples # 3.3 keypoint loss - kpt_preds = flatten_kpt_decoded.view(-1, self.num_keypoints, - 2)[pos_masks] - losses['loss_kpt'] = self.loss_oks(kpt_preds, kpt_targets, - vis_targets, pos_areas) + kpt_preds = flatten_kpt_decoded.view(-1, self.num_keypoints, 2)[pos_masks] + losses["loss_kpt"] = self.loss_oks(kpt_preds, kpt_targets, vis_targets, pos_areas) # 3.4 keypoint visibility loss - kpt_vis_preds = flatten_kpt_vis.view(-1, - self.num_keypoints)[pos_masks] - losses['loss_vis'] = self.loss_vis(kpt_vis_preds, vis_targets, - vis_weights) + kpt_vis_preds = flatten_kpt_vis.view(-1, self.num_keypoints)[pos_masks] + losses["loss_vis"] = self.loss_vis(kpt_vis_preds, vis_targets, vis_weights) # 3.5 classification loss - cls_preds = flatten_cls_scores.view(-1, - self.num_classes)[pos_masks] - losses['overlaps'] = cls_targets + cls_preds = flatten_cls_scores.view(-1, self.num_classes)[pos_masks] + losses["overlaps"] = cls_targets cls_targets = cls_targets.pow(self.overlaps_power).detach() - losses['loss_cls'] = self.loss_cls(cls_preds, - cls_targets) / num_total_samples + losses["loss_cls"] = self.loss_cls(cls_preds, cls_targets) / num_total_samples if self.use_aux_loss: - if hasattr(self, 'loss_bbox_aux'): + if hasattr(self, "loss_bbox_aux"): # 3.6 auxiliary bbox regression loss bbox_preds_raw = flatten_bbox_preds.view(-1, 4)[pos_masks] - losses['loss_bbox_aux'] = self.loss_bbox_aux( - bbox_preds_raw, bbox_aux_targets) / num_total_samples + losses["loss_bbox_aux"] = self.loss_bbox_aux(bbox_preds_raw, bbox_aux_targets) / num_total_samples - if hasattr(self, 'loss_kpt_aux'): + if hasattr(self, "loss_kpt_aux"): # 3.7 auxiliary keypoint regression loss - kpt_preds_raw = flatten_kpt_offsets.view( - -1, self.num_keypoints, 2)[pos_masks] + kpt_preds_raw = flatten_kpt_offsets.view(-1, 
self.num_keypoints, 2)[pos_masks] kpt_weights = vis_targets / vis_targets.size(-1) - losses['loss_kpt_aux'] = self.loss_kpt_aux( - kpt_preds_raw, kpt_aux_targets, kpt_weights) + losses["loss_kpt_aux"] = self.loss_kpt_aux(kpt_preds_raw, kpt_aux_targets, kpt_weights) return losses @@ -407,12 +399,15 @@ class YOLOXPoseHead(BaseModule): targets_each = [] for i in range(num_imgs): - target = self._get_targets_single(priors, batch_cls_scores[i], - batch_objectness[i], - batch_decoded_bboxes[i], - batch_decoded_kpts[i], - batch_kpt_vis[i], - batch_data_samples[i]) + target = self._get_targets_single( + priors, + batch_cls_scores[i], + batch_objectness[i], + batch_decoded_bboxes[i], + batch_decoded_kpts[i], + batch_kpt_vis[i], + batch_data_samples[i], + ) targets_each.append(target) targets = list(zip(*targets_each)) @@ -422,29 +417,49 @@ class YOLOXPoseHead(BaseModule): if len(target) > 0: targets[i] = torch.cat(target) - foreground_masks, cls_targets, obj_targets, obj_weights, \ - bbox_targets, kpt_targets, vis_targets, vis_weights, pos_areas, \ - pos_priors, group_indices, num_pos_per_img = targets + ( + foreground_masks, + cls_targets, + obj_targets, + obj_weights, + bbox_targets, + kpt_targets, + vis_targets, + vis_weights, + pos_areas, + pos_priors, + group_indices, + num_pos_per_img, + ) = targets # post-processing for targets if self.use_aux_loss: bbox_cxcy = (bbox_targets[:, :2] + bbox_targets[:, 2:]) / 2.0 bbox_wh = bbox_targets[:, 2:] - bbox_targets[:, :2] - bbox_aux_targets = torch.cat([ - (bbox_cxcy - pos_priors[:, :2]) / pos_priors[:, 2:], - torch.log(bbox_wh / pos_priors[:, 2:] + 1e-8) - ], - dim=-1) - - kpt_aux_targets = (kpt_targets - pos_priors[:, None, :2]) \ - / pos_priors[:, None, 2:] + bbox_aux_targets = torch.cat( + [(bbox_cxcy - pos_priors[:, :2]) / pos_priors[:, 2:], torch.log(bbox_wh / pos_priors[:, 2:] + 1e-8)], dim=-1 + ) + + kpt_aux_targets = (kpt_targets - pos_priors[:, None, :2]) / pos_priors[:, None, 2:] else: bbox_aux_targets, kpt_aux_targets = None, None - return (foreground_masks, cls_targets, obj_targets, obj_weights, - bbox_targets, bbox_aux_targets, kpt_targets, kpt_aux_targets, - vis_targets, vis_weights, pos_areas, pos_priors, group_indices, - num_pos_per_img) + return ( + foreground_masks, + cls_targets, + obj_targets, + obj_weights, + bbox_targets, + bbox_aux_targets, + kpt_targets, + kpt_aux_targets, + vis_targets, + vis_weights, + pos_areas, + pos_priors, + group_indices, + num_pos_per_img, + ) @torch.no_grad() def _get_targets_single( @@ -501,7 +516,7 @@ class YOLOXPoseHead(BaseModule): # TODO: change the shape of objectness to [num_priors] num_priors = priors.size(0) gt_instances = data_sample.gt_instance_labels - gt_fields = data_sample.get('gt_fields', dict()) + gt_fields = data_sample.get("gt_fields", dict()) num_gts = len(gt_instances) # No target @@ -513,12 +528,23 @@ class YOLOXPoseHead(BaseModule): kpt_target = cls_scores.new_zeros((0, self.num_keypoints, 2)) vis_target = cls_scores.new_zeros((0, self.num_keypoints)) vis_weight = cls_scores.new_zeros((0, self.num_keypoints)) - pos_areas = cls_scores.new_zeros((0, )) + pos_areas = cls_scores.new_zeros((0,)) pos_priors = priors[:0] foreground_mask = cls_scores.new_zeros(num_priors).bool() - return (foreground_mask, cls_target, obj_target, obj_weight, - bbox_target, kpt_target, vis_target, vis_weight, pos_areas, - pos_priors, [], 0) + return ( + foreground_mask, + cls_target, + obj_target, + obj_weight, + bbox_target, + kpt_target, + vis_target, + vis_weight, + pos_areas, + pos_priors, + [], + 0, + 
) # assign positive samples scores = cls_scores * objectness @@ -529,30 +555,26 @@ class YOLOXPoseHead(BaseModule): keypoints=decoded_kpts, keypoints_visible=kpt_vis, ) - assign_result = self.assigner.assign( - pred_instances=pred_instances, gt_instances=gt_instances) + assign_result = self.assigner.assign(pred_instances=pred_instances, gt_instances=gt_instances) # sampling - pos_inds = torch.nonzero( - assign_result['gt_inds'] > 0, as_tuple=False).squeeze(-1).unique() + pos_inds = torch.nonzero(assign_result["gt_inds"] > 0, as_tuple=False).squeeze(-1).unique() num_pos_per_img = pos_inds.size(0) - pos_gt_labels = assign_result['labels'][pos_inds] - pos_assigned_gt_inds = assign_result['gt_inds'][pos_inds] - 1 + pos_gt_labels = assign_result["labels"][pos_inds] + pos_assigned_gt_inds = assign_result["gt_inds"][pos_inds] - 1 # bbox target bbox_target = gt_instances.bboxes[pos_assigned_gt_inds.long()] # cls target - max_overlaps = assign_result['max_overlaps'][pos_inds] - cls_target = F.one_hot(pos_gt_labels, - self.num_classes) * max_overlaps.unsqueeze(-1) + max_overlaps = assign_result["max_overlaps"][pos_inds] + cls_target = F.one_hot(pos_gt_labels, self.num_classes) * max_overlaps.unsqueeze(-1) # pose targets kpt_target = gt_instances.keypoints[pos_assigned_gt_inds] vis_target = gt_instances.keypoints_visible[pos_assigned_gt_inds] - if 'keypoints_visible_weights' in gt_instances: - vis_weight = gt_instances.keypoints_visible_weights[ - pos_assigned_gt_inds] + if "keypoints_visible_weights" in gt_instances: + vis_weight = gt_instances.keypoints_visible_weights[pos_assigned_gt_inds] else: vis_weight = vis_target.new_ones(vis_target.shape) pos_areas = gt_instances.areas[pos_assigned_gt_inds] @@ -561,18 +583,16 @@ class YOLOXPoseHead(BaseModule): obj_target = torch.zeros_like(objectness) obj_target[pos_inds] = 1 - invalid_mask = gt_fields.get('heatmap_mask', None) + invalid_mask = gt_fields.get("heatmap_mask", None) if invalid_mask is not None and (invalid_mask != 0.0).any(): # ignore the tokens that predict the unlabeled instances pred_vis = (kpt_vis.unsqueeze(-1) > 0.3).float() - mean_kpts = (decoded_kpts * pred_vis).sum(dim=1) / pred_vis.sum( - dim=1).clamp(min=1e-8) + mean_kpts = (decoded_kpts * pred_vis).sum(dim=1) / pred_vis.sum(dim=1).clamp(min=1e-8) mean_kpts = mean_kpts.reshape(1, -1, 1, 2) wh = invalid_mask.shape[-1] grids = mean_kpts / (wh - 1) * 2 - 1 mask = invalid_mask.unsqueeze(0).float() - weight = F.grid_sample( - mask, grids, mode='bilinear', padding_mode='zeros') + weight = F.grid_sample(mask, grids, mode="bilinear", padding_mode="zeros") obj_weight = 1.0 - weight.reshape(num_priors, 1) else: obj_weight = obj_target.new_ones(obj_target.shape) @@ -581,19 +601,24 @@ class YOLOXPoseHead(BaseModule): foreground_mask = torch.zeros_like(objectness.squeeze()).to(torch.bool) foreground_mask[pos_inds] = 1 pos_priors = priors[pos_inds] - group_index = [ - torch.where(pos_assigned_gt_inds == num)[0] - for num in torch.unique(pos_assigned_gt_inds) - ] - - return (foreground_mask, cls_target, obj_target, obj_weight, - bbox_target, kpt_target, vis_target, vis_weight, pos_areas, - pos_priors, group_index, num_pos_per_img) + group_index = [torch.where(pos_assigned_gt_inds == num)[0] for num in torch.unique(pos_assigned_gt_inds)] + + return ( + foreground_mask, + cls_target, + obj_target, + obj_weight, + bbox_target, + kpt_target, + vis_target, + vis_weight, + pos_areas, + pos_priors, + group_index, + num_pos_per_img, + ) - def predict(self, - feats: Features, - batch_data_samples:
OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from features. Args: @@ -627,8 +652,7 @@ class YOLOXPoseHead(BaseModule): in shape (K*2, h, w) """ - cls_scores, objectnesses, bbox_preds, kpt_offsets, \ - kpt_vis = self.forward(feats) + cls_scores, objectnesses, bbox_preds, kpt_offsets, kpt_vis = self.forward(feats) cfg = copy.deepcopy(test_cfg) @@ -637,17 +661,12 @@ class YOLOXPoseHead(BaseModule): # If the shape does not change, use the previous mlvl_priors if featmap_sizes != self.featmap_sizes: - self.mlvl_priors = self.prior_generator.grid_priors( - featmap_sizes, - dtype=cls_scores[0].dtype, - device=cls_scores[0].device) + self.mlvl_priors = self.prior_generator.grid_priors(featmap_sizes, dtype=cls_scores[0].dtype, device=cls_scores[0].device) self.featmap_sizes = featmap_sizes flatten_priors = torch.cat(self.mlvl_priors) mlvl_strides = [ - flatten_priors.new_full((featmap_size.numel(), ), - stride) for featmap_size, stride in zip( - featmap_sizes, self.featmap_strides) + flatten_priors.new_full((featmap_size.numel(),), stride) for featmap_size, stride in zip(featmap_sizes, self.featmap_strides) ] flatten_stride = torch.cat(mlvl_strides) @@ -657,25 +676,21 @@ class YOLOXPoseHead(BaseModule): flatten_objectness = self._flatten_predictions(objectnesses).sigmoid() flatten_kpt_offsets = self._flatten_predictions(kpt_offsets) flatten_kpt_vis = self._flatten_predictions(kpt_vis).sigmoid() - flatten_bbox_preds = self.decode_bbox(flatten_bbox_preds, - flatten_priors, flatten_stride) - flatten_kpt_reg = self.decode_kpt_reg(flatten_kpt_offsets, - flatten_priors, flatten_stride) + flatten_bbox_preds = self.decode_bbox(flatten_bbox_preds, flatten_priors, flatten_stride) + flatten_kpt_reg = self.decode_kpt_reg(flatten_kpt_offsets, flatten_priors, flatten_stride) results_list = [] - for (bboxes, scores, objectness, kpt_reg, kpt_vis, - img_meta) in zip(flatten_bbox_preds, flatten_cls_scores, - flatten_objectness, flatten_kpt_reg, - flatten_kpt_vis, batch_img_metas): + for bboxes, scores, objectness, kpt_reg, kpt_vis, img_meta in zip( + flatten_bbox_preds, flatten_cls_scores, flatten_objectness, flatten_kpt_reg, flatten_kpt_vis, batch_img_metas + ): - score_thr = cfg.get('score_thr', 0.01) + score_thr = cfg.get("score_thr", 0.01) scores *= objectness - nms_pre = cfg.get('nms_pre', 100000) + nms_pre = cfg.get("nms_pre", 100000) scores, labels = scores.max(1, keepdim=True) - scores, _, keep_idxs_score, results = filter_scores_and_topk( - scores, score_thr, nms_pre, results=dict(labels=labels[:, 0])) - labels = results['labels'] + scores, _, keep_idxs_score, results = filter_scores_and_topk(scores, score_thr, nms_pre, results=dict(labels=labels[:, 0])) + labels = results["labels"] bboxes = bboxes[keep_idxs_score] kpt_vis = kpt_vis[keep_idxs_score] @@ -683,7 +698,7 @@ class YOLOXPoseHead(BaseModule): keypoints = kpt_reg[keep_idxs_score] if bboxes.numel() > 0: - nms_thr = cfg.get('nms_thr', 1.0) + nms_thr = cfg.get("nms_thr", 1.0) if nms_thr < 1.0: keep_idxs_nms = nms_torch(bboxes, scores, nms_thr) bboxes = bboxes[keep_idxs_nms] @@ -700,9 +715,10 @@ class YOLOXPoseHead(BaseModule): bbox_scores=scores, keypoints=keypoints, keypoint_scores=kpt_vis, - keypoints_visible=kpt_vis) + keypoints_visible=kpt_vis, + ) - input_size = img_meta['input_size'] + input_size = img_meta["input_size"] results.bboxes[:, 0::2].clamp_(0, input_size[0]) results.bboxes[:, 
1::2].clamp_(0, input_size[1]) @@ -710,8 +726,7 @@ class YOLOXPoseHead(BaseModule): return results_list - def decode_bbox(self, pred_bboxes: torch.Tensor, priors: torch.Tensor, - stride: Union[torch.Tensor, int]) -> torch.Tensor: + def decode_bbox(self, pred_bboxes: torch.Tensor, priors: torch.Tensor, stride: Union[torch.Tensor, int]) -> torch.Tensor: """Decode regression results (delta_x, delta_y, log_w, log_h) to bounding boxes (tl_x, tl_y, br_x, br_y). @@ -747,9 +762,7 @@ class YOLOXPoseHead(BaseModule): decoded_bboxes = torch.stack([tl_x, tl_y, br_x, br_y], -1) return decoded_bboxes - def decode_kpt_reg(self, pred_kpt_offsets: torch.Tensor, - priors: torch.Tensor, - stride: torch.Tensor) -> torch.Tensor: + def decode_kpt_reg(self, pred_kpt_offsets: torch.Tensor, priors: torch.Tensor, stride: torch.Tensor) -> torch.Tensor: """Decode regression results (delta_x, delta_y) to keypoints coordinates (x, y). @@ -766,8 +779,7 @@ class YOLOXPoseHead(BaseModule): """ stride = stride.view(1, stride.size(0), 1, 1) priors = priors.view(1, priors.size(0), 1, 2) - pred_kpt_offsets = pred_kpt_offsets.reshape( - *pred_kpt_offsets.shape[:-1], self.num_keypoints, 2) + pred_kpt_offsets = pred_kpt_offsets.reshape(*pred_kpt_offsets.shape[:-1], self.num_keypoints, 2) decoded_kpts = pred_kpt_offsets * stride + priors return decoded_kpts diff --git a/mmpose/models/heads/regression_heads/__init__.py b/mmpose/models/heads/regression_heads/__init__.py index 729d193b51981b9819290a3787f8292c72bc16d4..d831a818ec0ce50aca0630be2a52f128968c4ef8 100644 --- a/mmpose/models/heads/regression_heads/__init__.py +++ b/mmpose/models/heads/regression_heads/__init__.py @@ -8,7 +8,11 @@ from .temporal_regression_head import TemporalRegressionHead from .trajectory_regression_head import TrajectoryRegressionHead __all__ = [ - 'RegressionHead', 'IntegralRegressionHead', 'DSNTHead', 'RLEHead', - 'TemporalRegressionHead', 'TrajectoryRegressionHead', - 'MotionRegressionHead' + "RegressionHead", + "IntegralRegressionHead", + "DSNTHead", + "RLEHead", + "TemporalRegressionHead", + "TrajectoryRegressionHead", + "MotionRegressionHead", ] diff --git a/mmpose/models/heads/regression_heads/dsnt_head.py b/mmpose/models/heads/regression_heads/dsnt_head.py index 3bd49e385db31c996de086419285e2f5fa7748b3..1af5bde88e46fa9cefbda18a715ced290270cab3 100644 --- a/mmpose/models/heads/regression_heads/dsnt_head.py +++ b/mmpose/models/heads/regression_heads/dsnt_head.py @@ -10,6 +10,7 @@ from mmpose.evaluation.functional import keypoint_pck_accuracy from mmpose.registry import MODELS from mmpose.utils.tensor_utils import to_numpy from mmpose.utils.typing import ConfigType, OptConfigType, OptSampleList + from .integral_regression_head import IntegralRegressionHead OptIntSeq = Optional[Sequence[int]] @@ -62,26 +63,26 @@ class DSNTHead(IntegralRegressionHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - in_featuremap_size: Tuple[int, int], - num_joints: int, - lambda_t: int = -1, - debias: bool = False, - beta: float = 1.0, - deconv_out_channels: OptIntSeq = (256, 256, 256), - deconv_kernel_sizes: OptIntSeq = (4, 4, 4), - conv_out_channels: OptIntSeq = None, - conv_kernel_sizes: OptIntSeq = None, - final_layer: dict = dict(kernel_size=1), - loss: ConfigType = dict( - type='MultipleLossWrapper', - losses=[ - dict(type='SmoothL1Loss', use_target_weight=True), - dict(type='JSDiscretLoss', use_target_weight=True) - ]), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: 
Union[int, Sequence[int]], + in_featuremap_size: Tuple[int, int], + num_joints: int, + lambda_t: int = -1, + debias: bool = False, + beta: float = 1.0, + deconv_out_channels: OptIntSeq = (256, 256, 256), + deconv_kernel_sizes: OptIntSeq = (4, 4, 4), + conv_out_channels: OptIntSeq = None, + conv_kernel_sizes: OptIntSeq = None, + final_layer: dict = dict(kernel_size=1), + loss: ConfigType = dict( + type="MultipleLossWrapper", + losses=[dict(type="SmoothL1Loss", use_target_weight=True), dict(type="JSDiscretLoss", use_target_weight=True)], + ), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): super().__init__( in_channels=in_channels, @@ -96,24 +97,18 @@ class DSNTHead(IntegralRegressionHead): final_layer=final_layer, loss=loss, decoder=decoder, - init_cfg=init_cfg) + init_cfg=init_cfg, + ) self.lambda_t = lambda_t - def loss(self, - inputs: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, inputs: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples.""" pred_coords, pred_heatmaps = self.forward(inputs) - keypoint_labels = torch.cat( - [d.gt_instance_labels.keypoint_labels for d in batch_data_samples]) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) - gt_heatmaps = torch.stack( - [d.gt_fields.heatmaps for d in batch_data_samples]) + keypoint_labels = torch.cat([d.gt_instance_labels.keypoint_labels for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) + gt_heatmaps = torch.stack([d.gt_fields.heatmaps for d in batch_data_samples]) input_list = [pred_coords, pred_heatmaps] target_list = [keypoint_labels, gt_heatmaps] @@ -126,7 +121,7 @@ class DSNTHead(IntegralRegressionHead): if self.lambda_t > 0: mh = MessageHub.get_current_instance() - cur_epoch = mh.get_info('epoch') + cur_epoch = mh.get_info("epoch") if cur_epoch >= self.lambda_t: loss = loss_list[0] @@ -138,7 +133,8 @@ class DSNTHead(IntegralRegressionHead): gt=to_numpy(keypoint_labels), mask=to_numpy(keypoint_weights) > 0, thr=0.05, - norm_factor=np.ones((pred_coords.size(0), 2), dtype=np.float32)) + norm_factor=np.ones((pred_coords.size(0), 2), dtype=np.float32), + ) acc_pose = torch.tensor(avg_acc, device=keypoint_labels.device) losses.update(acc_pose=acc_pose) diff --git a/mmpose/models/heads/regression_heads/integral_regression_head.py b/mmpose/models/heads/regression_heads/integral_regression_head.py index 9046d94ad4318a19a3037f839ee054a445c80c68..f4889d49f9d595c93f1ef45498e4a5117319d640 100644 --- a/mmpose/models/heads/regression_heads/integral_regression_head.py +++ b/mmpose/models/heads/regression_heads/integral_regression_head.py @@ -13,8 +13,8 @@ from mmpose.evaluation.functional import keypoint_pck_accuracy from mmpose.models.utils.tta import flip_coordinates, flip_heatmaps from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, OptConfigType, OptSampleList, - Predictions) +from mmpose.utils.typing import ConfigType, OptConfigType, OptSampleList, Predictions + from .. 
import HeatmapHead from ..base_head import BaseHead @@ -66,21 +66,22 @@ class IntegralRegressionHead(BaseHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - in_featuremap_size: Tuple[int, int], - num_joints: int, - debias: bool = False, - beta: float = 1.0, - deconv_out_channels: OptIntSeq = (256, 256, 256), - deconv_kernel_sizes: OptIntSeq = (4, 4, 4), - conv_out_channels: OptIntSeq = None, - conv_kernel_sizes: OptIntSeq = None, - final_layer: dict = dict(kernel_size=1), - loss: ConfigType = dict( - type='SmoothL1Loss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + in_featuremap_size: Tuple[int, int], + num_joints: int, + debias: bool = False, + beta: float = 1.0, + deconv_out_channels: OptIntSeq = (256, 256, 256), + deconv_kernel_sizes: OptIntSeq = (4, 4, 4), + conv_out_channels: OptIntSeq = None, + conv_kernel_sizes: OptIntSeq = None, + final_layer: dict = dict(kernel_size=1), + loss: ConfigType = dict(type="SmoothL1Loss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -100,8 +101,7 @@ class IntegralRegressionHead(BaseHead): num_deconv = len(deconv_out_channels) if deconv_out_channels else 0 if num_deconv != 0: - self.heatmap_size = tuple( - [s * (2**num_deconv) for s in in_featuremap_size]) + self.heatmap_size = tuple([s * (2**num_deconv) for s in in_featuremap_size]) # deconv layers + 1x1 conv self.simplebaseline_head = HeatmapHead( @@ -111,7 +111,8 @@ class IntegralRegressionHead(BaseHead): deconv_kernel_sizes=deconv_kernel_sizes, conv_out_channels=conv_out_channels, conv_kernel_sizes=conv_kernel_sizes, - final_layer=final_layer) + final_layer=final_layer, + ) if final_layer is not None: in_channels = num_joints @@ -122,11 +123,7 @@ class IntegralRegressionHead(BaseHead): self.simplebaseline_head = None if final_layer is not None: - cfg = dict( - type='Conv2d', - in_channels=in_channels, - out_channels=num_joints, - kernel_size=1) + cfg = dict(type="Conv2d", in_channels=in_channels, out_channels=num_joints, kernel_size=1) cfg.update(final_layer) self.final_layer = build_conv_layer(cfg) else: @@ -135,9 +132,7 @@ class IntegralRegressionHead(BaseHead): self.heatmap_size = in_featuremap_size if isinstance(in_channels, list): - raise ValueError( - f'{self.__class__.__name__} does not support selecting ' - 'multiple input features.') + raise ValueError(f"{self.__class__.__name__} does not support selecting " "multiple input features.") W, H = self.heatmap_size self.linspace_x = torch.arange(0.0, 1.0 * W, 1).reshape(1, 1, 1, W) / W @@ -148,8 +143,7 @@ class IntegralRegressionHead(BaseHead): self._register_load_state_dict_pre_hook(self._load_state_dict_pre_hook) - def _linear_expectation(self, heatmaps: Tensor, - linspace: Tensor) -> Tensor: + def _linear_expectation(self, heatmaps: Tensor, linspace: Tensor) -> Tensor: """Calculate linear expectation.""" B, N, _, _ = heatmaps.shape @@ -199,10 +193,7 @@ class IntegralRegressionHead(BaseHead): coords = torch.cat([pred_x, pred_y], dim=-1) return coords, heatmaps - def predict(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from features. 
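The `_linear_expectation` step this head builds its `linspace_x`/`linspace_y` buffers for is a normalized soft-argmax: multiply the normalized heatmap by a coordinate grid and sum. A minimal sketch with assumed shapes (not the module's exact internals):

```python
import torch

B, K, H, W = 2, 17, 64, 48                         # assumed batch/keypoint/heatmap sizes
heatmaps = torch.rand(B, K, H, W)
heatmaps = heatmaps / heatmaps.sum(dim=(2, 3), keepdim=True)  # normalize to sum 1

linspace_x = torch.arange(W).float().view(1, 1, 1, W) / W  # x grid in [0, 1)
linspace_y = torch.arange(H).float().view(1, 1, H, 1) / H  # y grid in [0, 1)

pred_x = (heatmaps * linspace_x).sum(dim=(2, 3))   # (B, K) expected x
pred_y = (heatmaps * linspace_y).sum(dim=(2, 3))   # (B, K) expected y
coords = torch.stack([pred_x, pred_y], dim=-1)     # (B, K, 2), normalized coords
print(coords.shape)  # torch.Size([2, 17, 2])
```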
Args: @@ -233,27 +224,22 @@ class IntegralRegressionHead(BaseHead): - heatmaps (Tensor): The predicted heatmaps in shape (K, h, w) """ - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] - input_size = batch_data_samples[0].metainfo['input_size'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] + input_size = batch_data_samples[0].metainfo["input_size"] _feats, _feats_flip = feats _batch_coords, _batch_heatmaps = self.forward(_feats) - _batch_coords_flip, _batch_heatmaps_flip = self.forward( - _feats_flip) + _batch_coords_flip, _batch_heatmaps_flip = self.forward(_feats_flip) _batch_coords_flip = flip_coordinates( - _batch_coords_flip, - flip_indices=flip_indices, - shift_coords=test_cfg.get('shift_coords', True), - input_size=input_size) + _batch_coords_flip, flip_indices=flip_indices, shift_coords=test_cfg.get("shift_coords", True), input_size=input_size + ) _batch_heatmaps_flip = flip_heatmaps( - _batch_heatmaps_flip, - flip_mode='heatmap', - flip_indices=flip_indices, - shift_heatmap=test_cfg.get('shift_heatmap', False)) + _batch_heatmaps_flip, flip_mode="heatmap", flip_indices=flip_indices, shift_heatmap=test_cfg.get("shift_heatmap", False) + ) batch_coords = (_batch_coords + _batch_coords_flip) * 0.5 batch_heatmaps = (_batch_heatmaps + _batch_heatmaps_flip) * 0.5 @@ -263,26 +249,18 @@ class IntegralRegressionHead(BaseHead): batch_coords.unsqueeze_(dim=1) # (B, N, K, D) preds = self.decode(batch_coords) - if test_cfg.get('output_heatmaps', False): - pred_fields = [ - PixelData(heatmaps=hm) for hm in batch_heatmaps.detach() - ] + if test_cfg.get("output_heatmaps", False): + pred_fields = [PixelData(heatmaps=hm) for hm in batch_heatmaps.detach()] return preds, pred_fields else: return preds - def loss(self, - inputs: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, inputs: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples.""" pred_coords, _ = self.forward(inputs) - keypoint_labels = torch.cat( - [d.gt_instance_labels.keypoint_labels for d in batch_data_samples]) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) + keypoint_labels = torch.cat([d.gt_instance_labels.keypoint_labels for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) # calculate losses losses = dict() @@ -298,7 +276,8 @@ class IntegralRegressionHead(BaseHead): gt=to_numpy(keypoint_labels), mask=to_numpy(keypoint_weights) > 0, thr=0.05, - norm_factor=np.ones((pred_coords.size(0), 2), dtype=np.float32)) + norm_factor=np.ones((pred_coords.size(0), 2), dtype=np.float32), + ) acc_pose = torch.tensor(avg_acc, device=keypoint_labels.device) losses.update(acc_pose=acc_pose) @@ -307,11 +286,10 @@ class IntegralRegressionHead(BaseHead): @property def default_init_cfg(self): - init_cfg = [dict(type='Normal', layer=['Linear'], std=0.01, bias=0)] + init_cfg = [dict(type="Normal", layer=["Linear"], std=0.01, bias=0)] return init_cfg - def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, - **kwargs): + def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, **kwargs): """A hook function to load weights of deconv layers 
from :class:`HeatmapHead` into `simplebaseline_head`. @@ -327,13 +305,11 @@ class IntegralRegressionHead(BaseHead): k = _k.lstrip(prefix) k_new = _k - k_parts = k.split('.') + k_parts = k.split(".") if self.simplebaseline_head is not None: - if k_parts[0] == 'conv_layers': - k_new = ( - prefix + 'simplebaseline_head.deconv_layers.' + - '.'.join(k_parts[1:])) - elif k_parts[0] == 'final_layer': - k_new = prefix + 'simplebaseline_head.' + k + if k_parts[0] == "conv_layers": + k_new = prefix + "simplebaseline_head.deconv_layers." + ".".join(k_parts[1:]) + elif k_parts[0] == "final_layer": + k_new = prefix + "simplebaseline_head." + k state_dict[k_new] = v diff --git a/mmpose/models/heads/regression_heads/motion_regression_head.py b/mmpose/models/heads/regression_heads/motion_regression_head.py index 2ad94973459156f6f44eadac02656bbf6c8a39b8..9646b92a1f4394fa4253453219c9f0a6e03cce25 100644 --- a/mmpose/models/heads/regression_heads/motion_regression_head.py +++ b/mmpose/models/heads/regression_heads/motion_regression_head.py @@ -10,8 +10,8 @@ from mmpose.evaluation.functional import keypoint_mpjpe from mmpose.models.utils.tta import flip_coordinates from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, OptConfigType, OptSampleList, - Predictions) +from mmpose.utils.typing import ConfigType, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead @@ -35,14 +35,15 @@ class MotionRegressionHead(BaseHead): _version = 2 - def __init__(self, - in_channels: int = 256, - out_channels: int = 3, - embedding_size: int = 512, - loss: ConfigType = dict( - type='MSELoss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: int = 256, + out_channels: int = 3, + embedding_size: int = 512, + loss: ConfigType = dict(type="MSELoss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -58,12 +59,8 @@ class MotionRegressionHead(BaseHead): self.decoder = None # Define fully-connected layers - self.pre_logits = nn.Sequential( - OrderedDict([('fc', nn.Linear(in_channels, embedding_size)), - ('act', nn.Tanh())])) - self.fc = nn.Linear( - embedding_size, - out_channels) if embedding_size > 0 else nn.Identity() + self.pre_logits = nn.Sequential(OrderedDict([("fc", nn.Linear(in_channels, embedding_size)), ("act", nn.Tanh())])) + self.fc = nn.Linear(embedding_size, out_channels) if embedding_size > 0 else nn.Identity() def forward(self, feats: Tuple[Tensor]) -> Tensor: """Forward the network. The input is multi scale feature maps and the @@ -81,10 +78,7 @@ class MotionRegressionHead(BaseHead): return x - def predict(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from outputs. Returns: @@ -96,99 +90,67 @@ class MotionRegressionHead(BaseHead): (B, N, K). 
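The head body defined above is a small projection: a Tanh-activated embedding followed by a linear layer to the output coordinates. A standalone sketch using the default sizes from the diff; the input shape is an assumption for illustration:

```python
from collections import OrderedDict

import torch
from torch import nn

in_channels, embedding_size, out_channels = 256, 512, 3  # defaults from the diff
pre_logits = nn.Sequential(OrderedDict([
    ("fc", nn.Linear(in_channels, embedding_size)),
    ("act", nn.Tanh()),
]))
fc = nn.Linear(embedding_size, out_channels)

feats = torch.randn(2, 243, 17, in_channels)  # assumed (B, T, K, C) motion features
coords = fc(pre_logits(feats))                # linear layers act on the last dim
print(coords.shape)  # torch.Size([2, 243, 17, 3])
```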
""" - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] _feats, _feats_flip = feats _batch_coords = self.forward(_feats) - _batch_coords_flip = torch.stack([ - flip_coordinates( - _batch_coord_flip, - flip_indices=flip_indices, - shift_coords=test_cfg.get('shift_coords', True), - input_size=(1, 1)) - for _batch_coord_flip in self.forward(_feats_flip) - ], - dim=0) + _batch_coords_flip = torch.stack( + [ + flip_coordinates( + _batch_coord_flip, flip_indices=flip_indices, shift_coords=test_cfg.get("shift_coords", True), input_size=(1, 1) + ) + for _batch_coord_flip in self.forward(_feats_flip) + ], + dim=0, + ) batch_coords = (_batch_coords + _batch_coords_flip) * 0.5 else: batch_coords = self.forward(feats) # Restore global position with camera_param and factor - camera_param = batch_data_samples[0].metainfo.get('camera_param', None) + camera_param = batch_data_samples[0].metainfo.get("camera_param", None) if camera_param is not None: - w = torch.stack([ - torch.from_numpy(np.array([b.metainfo['camera_param']['w']])) - for b in batch_data_samples - ]) - h = torch.stack([ - torch.from_numpy(np.array([b.metainfo['camera_param']['h']])) - for b in batch_data_samples - ]) + w = torch.stack([torch.from_numpy(np.array([b.metainfo["camera_param"]["w"]])) for b in batch_data_samples]) + h = torch.stack([torch.from_numpy(np.array([b.metainfo["camera_param"]["h"]])) for b in batch_data_samples]) else: - w = torch.stack([ - torch.empty((0), dtype=torch.float32) - for _ in batch_data_samples - ]) - h = torch.stack([ - torch.empty((0), dtype=torch.float32) - for _ in batch_data_samples - ]) - - factor = batch_data_samples[0].metainfo.get('factor', None) + w = torch.stack([torch.empty((0), dtype=torch.float32) for _ in batch_data_samples]) + h = torch.stack([torch.empty((0), dtype=torch.float32) for _ in batch_data_samples]) + + factor = batch_data_samples[0].metainfo.get("factor", None) if factor is not None: - factor = torch.stack([ - torch.from_numpy(b.metainfo['factor']) - for b in batch_data_samples - ]) + factor = torch.stack([torch.from_numpy(b.metainfo["factor"]) for b in batch_data_samples]) else: - factor = torch.stack([ - torch.empty((0), dtype=torch.float32) - for _ in batch_data_samples - ]) + factor = torch.stack([torch.empty((0), dtype=torch.float32) for _ in batch_data_samples]) preds = self.decode((batch_coords, w, h, factor)) return preds - def loss(self, - inputs: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, inputs: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples.""" pred_outputs = self.forward(inputs) - lifting_target_label = torch.stack([ - d.gt_instance_labels.lifting_target_label - for d in batch_data_samples - ]) - lifting_target_weight = torch.stack([ - d.gt_instance_labels.lifting_target_weight - for d in batch_data_samples - ]) + lifting_target_label = torch.stack([d.gt_instance_labels.lifting_target_label for d in batch_data_samples]) + lifting_target_weight = torch.stack([d.gt_instance_labels.lifting_target_weight for d in batch_data_samples]) # calculate losses losses = dict() - loss = self.loss_module(pred_outputs, lifting_target_label, - 
lifting_target_weight.unsqueeze(-1)) + loss = self.loss_module(pred_outputs, lifting_target_label, lifting_target_weight.unsqueeze(-1)) losses.update(loss_pose3d=loss) # calculate accuracy - mpjpe_err = keypoint_mpjpe( - pred=to_numpy(pred_outputs), - gt=to_numpy(lifting_target_label), - mask=to_numpy(lifting_target_weight) > 0) + mpjpe_err = keypoint_mpjpe(pred=to_numpy(pred_outputs), gt=to_numpy(lifting_target_label), mask=to_numpy(lifting_target_weight) > 0) - mpjpe_pose = torch.tensor( - mpjpe_err, device=lifting_target_label.device) + mpjpe_pose = torch.tensor(mpjpe_err, device=lifting_target_label.device) losses.update(mpjpe=mpjpe_pose) return losses @property def default_init_cfg(self): - init_cfg = [dict(type='TruncNormal', layer=['Linear'], std=0.02)] + init_cfg = [dict(type="TruncNormal", layer=["Linear"], std=0.02)] return init_cfg diff --git a/mmpose/models/heads/regression_heads/regression_head.py b/mmpose/models/heads/regression_heads/regression_head.py index 8ff73aa6ef1bed93e8985d9be20f3c94355d8c21..b0b6d0bca53574da0294e12a9f81a44524297f68 100644 --- a/mmpose/models/heads/regression_heads/regression_head.py +++ b/mmpose/models/heads/regression_heads/regression_head.py @@ -9,8 +9,8 @@ from mmpose.evaluation.functional import keypoint_pck_accuracy from mmpose.models.utils.tta import flip_coordinates from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, OptConfigType, OptSampleList, - Predictions) +from mmpose.utils.typing import ConfigType, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @@ -37,13 +37,14 @@ class RegressionHead(BaseHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - num_joints: int, - loss: ConfigType = dict( - type='SmoothL1Loss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + num_joints: int, + loss: ConfigType = dict(type="SmoothL1Loss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -78,25 +79,20 @@ class RegressionHead(BaseHead): return x.reshape(-1, self.num_joints, 2) - def predict(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from outputs.""" - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = batch_data_samples[0].metainfo['flip_indices'] - input_size = batch_data_samples[0].metainfo['input_size'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] + input_size = batch_data_samples[0].metainfo["input_size"] _feats, _feats_flip = feats _batch_coords = self.forward(_feats) _batch_coords_flip = flip_coordinates( - self.forward(_feats_flip), - flip_indices=flip_indices, - shift_coords=test_cfg.get('shift_coords', True), - input_size=input_size) + self.forward(_feats_flip), flip_indices=flip_indices, shift_coords=test_cfg.get("shift_coords", True), input_size=input_size + ) batch_coords = (_batch_coords + _batch_coords_flip) * 0.5 else: batch_coords = self.forward(feats) # (B, K, D) @@ -106,24 +102,17 @@ 
class RegressionHead(BaseHead): return preds - def loss(self, - inputs: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, inputs: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples.""" pred_outputs = self.forward(inputs) - keypoint_labels = torch.cat( - [d.gt_instance_labels.keypoint_labels for d in batch_data_samples]) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) + keypoint_labels = torch.cat([d.gt_instance_labels.keypoint_labels for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) # calculate losses losses = dict() - loss = self.loss_module(pred_outputs, keypoint_labels, - keypoint_weights.unsqueeze(-1)) + loss = self.loss_module(pred_outputs, keypoint_labels, keypoint_weights.unsqueeze(-1)) losses.update(loss_kpt=loss) @@ -133,7 +122,8 @@ class RegressionHead(BaseHead): gt=to_numpy(keypoint_labels), mask=to_numpy(keypoint_weights) > 0, thr=0.05, - norm_factor=np.ones((pred_outputs.size(0), 2), dtype=np.float32)) + norm_factor=np.ones((pred_outputs.size(0), 2), dtype=np.float32), + ) acc_pose = torch.tensor(avg_acc, device=keypoint_labels.device) losses.update(acc_pose=acc_pose) @@ -142,5 +132,5 @@ class RegressionHead(BaseHead): @property def default_init_cfg(self): - init_cfg = [dict(type='Normal', layer=['Linear'], std=0.01, bias=0)] + init_cfg = [dict(type="Normal", layer=["Linear"], std=0.01, bias=0)] return init_cfg diff --git a/mmpose/models/heads/regression_heads/rle_head.py b/mmpose/models/heads/regression_heads/rle_head.py index ef696dffa6b53c9c981879732416a21fa8f45349..e0ad24b45d17304cf112bb9578c7abd0e7914c25 100644 --- a/mmpose/models/heads/regression_heads/rle_head.py +++ b/mmpose/models/heads/regression_heads/rle_head.py @@ -9,8 +9,8 @@ from mmpose.evaluation.functional import keypoint_pck_accuracy from mmpose.models.utils.tta import flip_coordinates from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, OptConfigType, OptSampleList, - Predictions) +from mmpose.utils.typing import ConfigType, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @@ -37,13 +37,14 @@ class RLEHead(BaseHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - num_joints: int, - loss: ConfigType = dict( - type='RLELoss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + num_joints: int, + loss: ConfigType = dict(type="RLELoss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -81,17 +82,14 @@ class RLEHead(BaseHead): return x.reshape(-1, self.num_joints, 4) - def predict(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from outputs.""" - if test_cfg.get('flip_test', False): + if test_cfg.get("flip_test", False): # TTA: flip test -> feats = [orig, flipped] assert isinstance(feats, list) and len(feats) == 2 - flip_indices = 
batch_data_samples[0].metainfo['flip_indices'] - input_size = batch_data_samples[0].metainfo['input_size'] + flip_indices = batch_data_samples[0].metainfo["flip_indices"] + input_size = batch_data_samples[0].metainfo["input_size"] _feats, _feats_flip = feats @@ -99,10 +97,8 @@ class RLEHead(BaseHead): _batch_coords[..., 2:] = _batch_coords[..., 2:].sigmoid() _batch_coords_flip = flip_coordinates( - self.forward(_feats_flip), - flip_indices=flip_indices, - shift_coords=test_cfg.get('shift_coords', True), - input_size=input_size) + self.forward(_feats_flip), flip_indices=flip_indices, shift_coords=test_cfg.get("shift_coords", True), input_size=input_size + ) _batch_coords_flip[..., 2:] = _batch_coords_flip[..., 2:].sigmoid() batch_coords = (_batch_coords + _batch_coords_flip) * 0.5 @@ -115,27 +111,20 @@ class RLEHead(BaseHead): return preds - def loss(self, - inputs: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, inputs: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples.""" pred_outputs = self.forward(inputs) - keypoint_labels = torch.cat( - [d.gt_instance_labels.keypoint_labels for d in batch_data_samples]) - keypoint_weights = torch.cat([ - d.gt_instance_labels.keypoint_weights for d in batch_data_samples - ]) + keypoint_labels = torch.cat([d.gt_instance_labels.keypoint_labels for d in batch_data_samples]) + keypoint_weights = torch.cat([d.gt_instance_labels.keypoint_weights for d in batch_data_samples]) pred_coords = pred_outputs[:, :, :2] pred_sigma = pred_outputs[:, :, 2:4] # calculate losses losses = dict() - loss = self.loss_module(pred_coords, pred_sigma, keypoint_labels, - keypoint_weights.unsqueeze(-1)) + loss = self.loss_module(pred_coords, pred_sigma, keypoint_labels, keypoint_weights.unsqueeze(-1)) losses.update(loss_kpt=loss) @@ -145,15 +134,15 @@ class RLEHead(BaseHead): gt=to_numpy(keypoint_labels), mask=to_numpy(keypoint_weights) > 0, thr=0.05, - norm_factor=np.ones((pred_coords.size(0), 2), dtype=np.float32)) + norm_factor=np.ones((pred_coords.size(0), 2), dtype=np.float32), + ) acc_pose = torch.tensor(avg_acc, device=keypoint_labels.device) losses.update(acc_pose=acc_pose) return losses - def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, - **kwargs): + def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, **kwargs): """A hook function to convert old-version state dict of :class:`DeepposeRegressionHead` (before MMPose v1.0.0) to a compatible format of :class:`RegressionHead`. @@ -161,7 +150,7 @@ class RLEHead(BaseHead): The hook will be automatically registered during initialization. """ - version = local_meta.get('version', None) + version = local_meta.get("version", None) if version and version >= self._version: return @@ -172,10 +161,10 @@ class RLEHead(BaseHead): k = _k.lstrip(prefix) # In old version, "loss" includes the instances of loss, # now it should be renamed "loss_module" - k_parts = k.split('.') - if k_parts[0] == 'loss': + k_parts = k.split(".") + if k_parts[0] == "loss": # loss.xxx -> loss_module.xxx - k_new = prefix + 'loss_module.' + '.'.join(k_parts[1:]) + k_new = prefix + "loss_module." 
+ ".".join(k_parts[1:]) else: k_new = _k @@ -183,5 +172,5 @@ class RLEHead(BaseHead): @property def default_init_cfg(self): - init_cfg = [dict(type='Normal', layer=['Linear'], std=0.01, bias=0)] + init_cfg = [dict(type="Normal", layer=["Linear"], std=0.01, bias=0)] return init_cfg diff --git a/mmpose/models/heads/regression_heads/temporal_regression_head.py b/mmpose/models/heads/regression_heads/temporal_regression_head.py index 902be8099ef6d8643f78542dc4949eea5004816b..96df8589a0afb1ff3674298dfbf30138d6e6f7fa 100644 --- a/mmpose/models/heads/regression_heads/temporal_regression_head.py +++ b/mmpose/models/heads/regression_heads/temporal_regression_head.py @@ -7,8 +7,8 @@ from torch import Tensor, nn from mmpose.evaluation.functional import keypoint_mpjpe from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, OptConfigType, OptSampleList, - Predictions) +from mmpose.utils.typing import ConfigType, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @@ -33,13 +33,14 @@ class TemporalRegressionHead(BaseHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - num_joints: int, - loss: ConfigType = dict( - type='MSELoss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + num_joints: int, + loss: ConfigType = dict(type="MSELoss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -73,10 +74,7 @@ class TemporalRegressionHead(BaseHead): return x.reshape(-1, self.num_joints, 3) - def predict(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from outputs. 
Returns: @@ -91,59 +89,39 @@ class TemporalRegressionHead(BaseHead): batch_coords = self.forward(feats) # (B, K, D) # Restore global position with target_root - target_root = batch_data_samples[0].metainfo.get('target_root', None) + target_root = batch_data_samples[0].metainfo.get("target_root", None) if target_root is not None: - target_root = torch.stack([ - torch.from_numpy(b.metainfo['target_root']) - for b in batch_data_samples - ]) + target_root = torch.stack([torch.from_numpy(b.metainfo["target_root"]) for b in batch_data_samples]) else: - target_root = torch.stack([ - torch.empty((0), dtype=torch.float32) - for _ in batch_data_samples - ]) + target_root = torch.stack([torch.empty((0), dtype=torch.float32) for _ in batch_data_samples]) preds = self.decode((batch_coords, target_root)) return preds - def loss(self, - inputs: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, inputs: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples.""" pred_outputs = self.forward(inputs) - lifting_target_label = torch.cat([ - d.gt_instance_labels.lifting_target_label - for d in batch_data_samples - ]) - lifting_target_weight = torch.cat([ - d.gt_instance_labels.lifting_target_weight - for d in batch_data_samples - ]) + lifting_target_label = torch.cat([d.gt_instance_labels.lifting_target_label for d in batch_data_samples]) + lifting_target_weight = torch.cat([d.gt_instance_labels.lifting_target_weight for d in batch_data_samples]) # calculate losses losses = dict() - loss = self.loss_module(pred_outputs, lifting_target_label, - lifting_target_weight.unsqueeze(-1)) + loss = self.loss_module(pred_outputs, lifting_target_label, lifting_target_weight.unsqueeze(-1)) losses.update(loss_pose3d=loss) # calculate accuracy - mpjpe_err = keypoint_mpjpe( - pred=to_numpy(pred_outputs), - gt=to_numpy(lifting_target_label), - mask=to_numpy(lifting_target_weight) > 0) + mpjpe_err = keypoint_mpjpe(pred=to_numpy(pred_outputs), gt=to_numpy(lifting_target_label), mask=to_numpy(lifting_target_weight) > 0) - mpjpe_pose = torch.tensor( - mpjpe_err, device=lifting_target_label.device) + mpjpe_pose = torch.tensor(mpjpe_err, device=lifting_target_label.device) losses.update(mpjpe=mpjpe_pose) return losses @property def default_init_cfg(self): - init_cfg = [dict(type='Normal', layer=['Linear'], std=0.01, bias=0)] + init_cfg = [dict(type="Normal", layer=["Linear"], std=0.01, bias=0)] return init_cfg diff --git a/mmpose/models/heads/regression_heads/trajectory_regression_head.py b/mmpose/models/heads/regression_heads/trajectory_regression_head.py index b4d02f2ce37061a0c6cd6abd0e188937dd405c21..406b07c1f65b84dfa6701c0bd8cec1aa7820125a 100644 --- a/mmpose/models/heads/regression_heads/trajectory_regression_head.py +++ b/mmpose/models/heads/regression_heads/trajectory_regression_head.py @@ -7,8 +7,8 @@ from torch import Tensor, nn from mmpose.evaluation.functional import keypoint_mpjpe from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, OptConfigType, OptSampleList, - Predictions) +from mmpose.utils.typing import ConfigType, OptConfigType, OptSampleList, Predictions + from ..base_head import BaseHead OptIntSeq = Optional[Sequence[int]] @@ -33,13 +33,14 @@ class TrajectoryRegressionHead(BaseHead): _version = 2 - def __init__(self, - in_channels: Union[int, Sequence[int]], - num_joints: int, 
- loss: ConfigType = dict( - type='MPJPELoss', use_target_weight=True), - decoder: OptConfigType = None, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: Union[int, Sequence[int]], + num_joints: int, + loss: ConfigType = dict(type="MPJPELoss", use_target_weight=True), + decoder: OptConfigType = None, + init_cfg: OptConfigType = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -73,10 +74,7 @@ class TrajectoryRegressionHead(BaseHead): return x.reshape(-1, self.num_joints, 3) - def predict(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - test_cfg: ConfigType = {}) -> Predictions: + def predict(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions: """Predict results from outputs. Returns: @@ -91,58 +89,39 @@ class TrajectoryRegressionHead(BaseHead): batch_coords = self.forward(feats) # (B, K, D) # Restore global position with target_root - target_root = batch_data_samples[0].metainfo.get('target_root', None) + target_root = batch_data_samples[0].metainfo.get("target_root", None) if target_root is not None: - target_root = torch.stack([ - torch.from_numpy(b.metainfo['target_root']) - for b in batch_data_samples - ]) + target_root = torch.stack([torch.from_numpy(b.metainfo["target_root"]) for b in batch_data_samples]) else: - target_root = torch.stack([ - torch.empty((0), dtype=torch.float32) - for _ in batch_data_samples - ]) + target_root = torch.stack([torch.empty((0), dtype=torch.float32) for _ in batch_data_samples]) preds = self.decode((batch_coords, target_root)) return preds - def loss(self, - inputs: Union[Tensor, Tuple[Tensor]], - batch_data_samples: OptSampleList, - train_cfg: ConfigType = {}) -> dict: + def loss(self, inputs: Union[Tensor, Tuple[Tensor]], batch_data_samples: OptSampleList, train_cfg: ConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples.""" pred_outputs = self.forward(inputs) - lifting_target_label = torch.cat([ - d.gt_instance_labels.lifting_target_label - for d in batch_data_samples - ]) - trajectory_weights = torch.cat([ - d.gt_instance_labels.trajectory_weights for d in batch_data_samples - ]) + lifting_target_label = torch.cat([d.gt_instance_labels.lifting_target_label for d in batch_data_samples]) + trajectory_weights = torch.cat([d.gt_instance_labels.trajectory_weights for d in batch_data_samples]) # calculate losses losses = dict() - loss = self.loss_module(pred_outputs, lifting_target_label, - trajectory_weights.unsqueeze(-1)) + loss = self.loss_module(pred_outputs, lifting_target_label, trajectory_weights.unsqueeze(-1)) losses.update(loss_traj=loss) # calculate accuracy - mpjpe_err = keypoint_mpjpe( - pred=to_numpy(pred_outputs), - gt=to_numpy(lifting_target_label), - mask=to_numpy(trajectory_weights) > 0) + mpjpe_err = keypoint_mpjpe(pred=to_numpy(pred_outputs), gt=to_numpy(lifting_target_label), mask=to_numpy(trajectory_weights) > 0) - mpjpe_traj = torch.tensor( - mpjpe_err, device=lifting_target_label.device) + mpjpe_traj = torch.tensor(mpjpe_err, device=lifting_target_label.device) losses.update(mpjpe_traj=mpjpe_traj) return losses @property def default_init_cfg(self): - init_cfg = [dict(type='Normal', layer=['Linear'], std=0.01, bias=0)] + init_cfg = [dict(type="Normal", layer=["Linear"], std=0.01, bias=0)] return init_cfg diff --git a/mmpose/models/heads/transformer_heads/__init__.py b/mmpose/models/heads/transformer_heads/__init__.py index 
bb16484ff8b441db1f32a721cd8f0d410234289e..a8e81c85880374edc62199606e84a0e9b76bfb8b 100644 --- a/mmpose/models/heads/transformer_heads/__init__.py +++ b/mmpose/models/heads/transformer_heads/__init__.py @@ -1,17 +1,28 @@ # Copyright (c) OpenMMLab. All rights reserved. from .edpose_head import EDPoseHead -from .transformers import (FFN, DeformableDetrTransformerDecoder, - DeformableDetrTransformerDecoderLayer, - DeformableDetrTransformerEncoder, - DeformableDetrTransformerEncoderLayer, - DetrTransformerDecoder, DetrTransformerDecoderLayer, - DetrTransformerEncoder, DetrTransformerEncoderLayer, - PositionEmbeddingSineHW) +from .transformers import ( + FFN, + DeformableDetrTransformerDecoder, + DeformableDetrTransformerDecoderLayer, + DeformableDetrTransformerEncoder, + DeformableDetrTransformerEncoderLayer, + DetrTransformerDecoder, + DetrTransformerDecoderLayer, + DetrTransformerEncoder, + DetrTransformerEncoderLayer, + PositionEmbeddingSineHW, +) __all__ = [ - 'EDPoseHead', 'DetrTransformerEncoder', 'DetrTransformerDecoder', - 'DetrTransformerEncoderLayer', 'DetrTransformerDecoderLayer', - 'DeformableDetrTransformerEncoder', 'DeformableDetrTransformerDecoder', - 'DeformableDetrTransformerEncoderLayer', - 'DeformableDetrTransformerDecoderLayer', 'PositionEmbeddingSineHW', 'FFN' + "EDPoseHead", + "DetrTransformerEncoder", + "DetrTransformerDecoder", + "DetrTransformerEncoderLayer", + "DetrTransformerDecoderLayer", + "DeformableDetrTransformerEncoder", + "DeformableDetrTransformerDecoder", + "DeformableDetrTransformerEncoderLayer", + "DeformableDetrTransformerDecoderLayer", + "PositionEmbeddingSineHW", + "FFN", ] diff --git a/mmpose/models/heads/transformer_heads/base_transformer_head.py b/mmpose/models/heads/transformer_heads/base_transformer_head.py index 96855e186d8874a712f38d0ef6a604ace5d34b7f..2f64d88a2f063cd00b6020dd8eecdc611f009b1b 100644 --- a/mmpose/models/heads/transformer_heads/base_transformer_head.py +++ b/mmpose/models/heads/transformer_heads/base_transformer_head.py @@ -6,8 +6,8 @@ import torch from torch import Tensor from mmpose.registry import MODELS -from mmpose.utils.typing import (Features, OptConfigType, OptMultiConfig, - OptSampleList, Predictions) +from mmpose.utils.typing import Features, OptConfigType, OptMultiConfig, OptSampleList, Predictions + from ..base_head import BaseHead @@ -34,14 +34,16 @@ class TransformerHead(BaseHead): init_cfg (ConfigDict, optional): Config to control the initialization. 
""" - def __init__(self, - encoder: OptConfigType = None, - decoder: OptConfigType = None, - out_head: OptConfigType = None, - positional_encoding: OptConfigType = None, - num_queries: int = 100, - loss: OptConfigType = None, - init_cfg: OptMultiConfig = None): + def __init__( + self, + encoder: OptConfigType = None, + decoder: OptConfigType = None, + out_head: OptConfigType = None, + positional_encoding: OptConfigType = None, + num_queries: int = 100, + loss: OptConfigType = None, + init_cfg: OptMultiConfig = None, + ): if init_cfg is None: init_cfg = self.default_init_cfg @@ -54,46 +56,34 @@ class TransformerHead(BaseHead): self.positional_encoding_cfg = positional_encoding self.num_queries = num_queries - def forward(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList = None) -> Dict: + def forward(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList = None) -> Dict: """Forward the network.""" encoder_outputs_dict = self.forward_encoder(feats, batch_data_samples) decoder_outputs_dict = self.forward_decoder(**encoder_outputs_dict) - head_outputs_dict = self.forward_out_head(batch_data_samples, - **decoder_outputs_dict) + head_outputs_dict = self.forward_out_head(batch_data_samples, **decoder_outputs_dict) return head_outputs_dict @abstractmethod - def predict(self, - feats: Features, - batch_data_samples: OptSampleList, - test_cfg: OptConfigType = {}) -> Predictions: + def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: OptConfigType = {}) -> Predictions: """Predict results from features.""" pass - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: OptSampleList, - train_cfg: OptConfigType = {}) -> dict: + def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: OptConfigType = {}) -> dict: """Calculate losses from a batch of inputs and data samples.""" pass @abstractmethod - def forward_encoder(self, feat: Tensor, feat_mask: Tensor, - feat_pos: Tensor, **kwargs) -> Dict: + def forward_encoder(self, feat: Tensor, feat_mask: Tensor, feat_pos: Tensor, **kwargs) -> Dict: pass @abstractmethod - def forward_decoder(self, query: Tensor, query_pos: Tensor, memory: Tensor, - **kwargs) -> Dict: + def forward_decoder(self, query: Tensor, query_pos: Tensor, memory: Tensor, **kwargs) -> Dict: pass @abstractmethod - def forward_out_head(self, query: Tensor, query_pos: Tensor, - memory: Tensor, **kwargs) -> Dict: + def forward_out_head(self, query: Tensor, query_pos: Tensor, memory: Tensor, **kwargs) -> Dict: pass @staticmethod diff --git a/mmpose/models/heads/transformer_heads/edpose_head.py b/mmpose/models/heads/transformer_heads/edpose_head.py index d864f8fadd9882c16a3b81c55ca319def8d5d3aa..30a1e23b25fc7c021491fcc48ebf3b10df919560 100644 --- a/mmpose/models/heads/transformer_heads/edpose_head.py +++ b/mmpose/models/heads/transformer_heads/edpose_head.py @@ -19,11 +19,10 @@ from torch import Tensor, nn from mmpose.models.utils import inverse_sigmoid from mmpose.registry import KEYPOINT_CODECS, MODELS from mmpose.utils.tensor_utils import to_numpy -from mmpose.utils.typing import (ConfigType, Features, OptConfigType, - OptSampleList, Predictions) +from mmpose.utils.typing import ConfigType, Features, OptConfigType, OptSampleList, Predictions + from .base_transformer_head import TransformerHead -from .transformers.deformable_detr_layers import ( - DeformableDetrTransformerDecoderLayer, DeformableDetrTransformerEncoder) +from .transformers.deformable_detr_layers import DeformableDetrTransformerDecoderLayer, 
DeformableDetrTransformerEncoder from .transformers.utils import FFN, PositionEmbeddingSineHW @@ -46,41 +45,37 @@ class EDPoseDecoder(BaseModule): num_group (int): Number of decoder layers. """ - def __init__(self, - layer_cfg, - num_layers, - return_intermediate, - embed_dims: int = 256, - query_dim=4, - num_feature_levels=1, - num_box_decoder_layers=2, - num_keypoints=17, - num_dn=100, - num_group=100): + def __init__( + self, + layer_cfg, + num_layers, + return_intermediate, + embed_dims: int = 256, + query_dim=4, + num_feature_levels=1, + num_box_decoder_layers=2, + num_keypoints=17, + num_dn=100, + num_group=100, + ): super().__init__() self.layer_cfg = layer_cfg self.num_layers = num_layers self.embed_dims = embed_dims - assert return_intermediate, 'support return_intermediate only' + assert return_intermediate, "support return_intermediate only" self.return_intermediate = return_intermediate - assert query_dim in [ - 2, 4 - ], 'query_dim should be 2/4 but {}'.format(query_dim) + assert query_dim in [2, 4], "query_dim should be 2/4 but {}".format(query_dim) self.query_dim = query_dim self.num_feature_levels = num_feature_levels - self.layers = ModuleList([ - DeformableDetrTransformerDecoderLayer(**self.layer_cfg) - for _ in range(self.num_layers) - ]) + self.layers = ModuleList([DeformableDetrTransformerDecoderLayer(**self.layer_cfg) for _ in range(self.num_layers)]) self.norm = nn.LayerNorm(self.embed_dims) - self.ref_point_head = FFN(self.query_dim // 2 * self.embed_dims, - self.embed_dims, self.embed_dims, 2) + self.ref_point_head = FFN(self.query_dim // 2 * self.embed_dims, self.embed_dims, self.embed_dims, 2) self.num_keypoints = num_keypoints self.query_scale = None @@ -95,16 +90,21 @@ class EDPoseDecoder(BaseModule): self.num_dn = num_dn self.hw = nn.Embedding(self.num_keypoints, 2) self.keypoint_embed = nn.Embedding(self.num_keypoints, embed_dims) - self.kpt_index = [ - x for x in range(self.num_group * (self.num_keypoints + 1)) - if x % (self.num_keypoints + 1) != 0 - ] - - def forward(self, query: Tensor, value: Tensor, key_padding_mask: Tensor, - reference_points: Tensor, spatial_shapes: Tensor, - level_start_index: Tensor, valid_ratios: Tensor, - humandet_attn_mask: Tensor, human2pose_attn_mask: Tensor, - **kwargs) -> Tuple[Tensor]: + self.kpt_index = [x for x in range(self.num_group * (self.num_keypoints + 1)) if x % (self.num_keypoints + 1) != 0] + + def forward( + self, + query: Tensor, + value: Tensor, + key_padding_mask: Tensor, + reference_points: Tensor, + spatial_shapes: Tensor, + level_start_index: Tensor, + valid_ratios: Tensor, + humandet_attn_mask: Tensor, + human2pose_attn_mask: Tensor, + **kwargs, + ) -> Tuple[Tensor]: """Forward function of decoder Args: query (Tensor): The input queries, has shape (bs, num_queries, @@ -145,17 +145,12 @@ class EDPoseDecoder(BaseModule): inter_select_number = self.num_group for layer_id, layer in enumerate(self.layers): if reference_points.shape[-1] == 4: - reference_points_input = \ - reference_points[:, :, None] * \ - torch.cat([valid_ratios, valid_ratios], -1)[None, :] + reference_points_input = reference_points[:, :, None] * torch.cat([valid_ratios, valid_ratios], -1)[None, :] else: assert reference_points.shape[-1] == 2 - reference_points_input = \ - reference_points[:, :, None] * \ - valid_ratios[None, :] + reference_points_input = reference_points[:, :, None] * valid_ratios[None, :] - query_sine_embed = self.get_proposal_pos_embed( - reference_points_input[:, :, 0, :]) # nq, bs, 256*2 + query_sine_embed = 
self.get_proposal_pos_embed(reference_points_input[:, :, 0, :]) # nq, bs, 256*2 query_pos = self.ref_point_head(query_sine_embed) # nq, bs, 256 output = layer( @@ -166,62 +161,45 @@ class EDPoseDecoder(BaseModule): spatial_shapes=spatial_shapes, level_start_index=level_start_index, valid_ratios=valid_ratios, - reference_points=reference_points_input.transpose( - 0, 1).contiguous(), + reference_points=reference_points_input.transpose(0, 1).contiguous(), self_attn_mask=attn_mask, - **kwargs) + **kwargs, + ) output = output.transpose(0, 1) intermediate.append(self.norm(output)) # human update if layer_id < self.num_box_decoder_layers: delta_unsig = self.bbox_embed[layer_id](output) - new_reference_points = delta_unsig + inverse_sigmoid( - reference_points) + new_reference_points = delta_unsig + inverse_sigmoid(reference_points) new_reference_points = new_reference_points.sigmoid() # query expansion if layer_id == self.num_box_decoder_layers - 1: dn_output = output[:effect_num_dn] dn_new_reference_points = new_reference_points[:effect_num_dn] - class_unselected = self.class_embed[layer_id]( - output)[effect_num_dn:] - topk_proposals = torch.topk( - class_unselected.max(-1)[0], inter_select_number, dim=0)[1] + class_unselected = self.class_embed[layer_id](output)[effect_num_dn:] + topk_proposals = torch.topk(class_unselected.max(-1)[0], inter_select_number, dim=0)[1] new_reference_points_for_box = torch.gather( - new_reference_points[effect_num_dn:], 0, - topk_proposals.unsqueeze(-1).repeat(1, 1, 4)) - new_output_for_box = torch.gather( - output[effect_num_dn:], 0, - topk_proposals.unsqueeze(-1).repeat(1, 1, self.embed_dims)) + new_reference_points[effect_num_dn:], 0, topk_proposals.unsqueeze(-1).repeat(1, 1, 4) + ) + new_output_for_box = torch.gather(output[effect_num_dn:], 0, topk_proposals.unsqueeze(-1).repeat(1, 1, self.embed_dims)) bs = new_output_for_box.shape[1] - new_output_for_keypoint = new_output_for_box[:, None, :, :] \ - + self.keypoint_embed.weight[None, :, None, :] + new_output_for_keypoint = new_output_for_box[:, None, :, :] + self.keypoint_embed.weight[None, :, None, :] if self.num_keypoints == 17: - delta_xy = self.pose_embed[-1](new_output_for_keypoint)[ - ..., :2] + delta_xy = self.pose_embed[-1](new_output_for_keypoint)[..., :2] else: - delta_xy = self.pose_embed[0](new_output_for_keypoint)[ - ..., :2] - keypoint_xy = (inverse_sigmoid( - new_reference_points_for_box[..., :2][:, None]) + - delta_xy).sigmoid() + delta_xy = self.pose_embed[0](new_output_for_keypoint)[..., :2] + keypoint_xy = (inverse_sigmoid(new_reference_points_for_box[..., :2][:, None]) + delta_xy).sigmoid() num_queries, _, bs, _ = keypoint_xy.shape - keypoint_wh_weight = self.hw.weight.unsqueeze(0).unsqueeze( - -2).repeat(num_queries, 1, bs, 1).sigmoid() - keypoint_wh = keypoint_wh_weight * \ - new_reference_points_for_box[..., 2:][:, None] - new_reference_points_for_keypoint = torch.cat( - (keypoint_xy, keypoint_wh), dim=-1) - new_reference_points = torch.cat( - (new_reference_points_for_box.unsqueeze(1), - new_reference_points_for_keypoint), - dim=1).flatten(0, 1) - output = torch.cat( - (new_output_for_box.unsqueeze(1), new_output_for_keypoint), - dim=1).flatten(0, 1) + keypoint_wh_weight = self.hw.weight.unsqueeze(0).unsqueeze(-2).repeat(num_queries, 1, bs, 1).sigmoid() + keypoint_wh = keypoint_wh_weight * new_reference_points_for_box[..., 2:][:, None] + new_reference_points_for_keypoint = torch.cat((keypoint_xy, keypoint_wh), dim=-1) new_reference_points = torch.cat( - (dn_new_reference_points, 
new_reference_points), dim=0) + (new_reference_points_for_box.unsqueeze(1), new_reference_points_for_keypoint), dim=1 + ).flatten(0, 1) + output = torch.cat((new_output_for_box.unsqueeze(1), new_output_for_keypoint), dim=1).flatten(0, 1) + new_reference_points = torch.cat((dn_new_reference_points, new_reference_points), dim=0) output = torch.cat((dn_output, output), dim=0) attn_mask = human2pose_attn_mask @@ -231,62 +209,41 @@ class EDPoseDecoder(BaseModule): inter_select_number = self.num_group ref_before_sigmoid = inverse_sigmoid(reference_points) output_bbox_dn = output[:effect_num_dn] - output_bbox_norm = output[effect_num_dn:][0::( - self.num_keypoints + 1)] - ref_before_sigmoid_bbox_dn = \ - ref_before_sigmoid[:effect_num_dn] - ref_before_sigmoid_bbox_norm = \ - ref_before_sigmoid[effect_num_dn:][0::( - self.num_keypoints + 1)] + output_bbox_norm = output[effect_num_dn:][0 :: (self.num_keypoints + 1)] + ref_before_sigmoid_bbox_dn = ref_before_sigmoid[:effect_num_dn] + ref_before_sigmoid_bbox_norm = ref_before_sigmoid[effect_num_dn:][0 :: (self.num_keypoints + 1)] delta_unsig_dn = self.bbox_embed[layer_id](output_bbox_dn) delta_unsig_norm = self.bbox_embed[layer_id](output_bbox_norm) outputs_unsig_dn = delta_unsig_dn + ref_before_sigmoid_bbox_dn - outputs_unsig_norm = delta_unsig_norm + \ - ref_before_sigmoid_bbox_norm + outputs_unsig_norm = delta_unsig_norm + ref_before_sigmoid_bbox_norm new_reference_points_for_box_dn = outputs_unsig_dn.sigmoid() - new_reference_points_for_box_norm = outputs_unsig_norm.sigmoid( + new_reference_points_for_box_norm = outputs_unsig_norm.sigmoid() + output_kpt = output[effect_num_dn:].index_select(0, torch.tensor(self.kpt_index, device=output.device)) + delta_xy_unsig = self.pose_embed[layer_id - self.num_box_decoder_layers](output_kpt) + outputs_unsig = ( + ref_before_sigmoid[effect_num_dn:].index_select(0, torch.tensor(self.kpt_index, device=output.device)).clone() ) - output_kpt = output[effect_num_dn:].index_select( - 0, torch.tensor(self.kpt_index, device=output.device)) - delta_xy_unsig = self.pose_embed[layer_id - - self.num_box_decoder_layers]( - output_kpt) - outputs_unsig = ref_before_sigmoid[ - effect_num_dn:].index_select( - 0, torch.tensor(self.kpt_index, - device=output.device)).clone() - delta_hw_unsig = self.pose_hw_embed[ - layer_id - self.num_box_decoder_layers]( - output_kpt) + delta_hw_unsig = self.pose_hw_embed[layer_id - self.num_box_decoder_layers](output_kpt) outputs_unsig[..., :2] += delta_xy_unsig[..., :2] outputs_unsig[..., 2:] += delta_hw_unsig new_reference_points_for_keypoint = outputs_unsig.sigmoid() bs = new_reference_points_for_box_norm.shape[1] new_reference_points_norm = torch.cat( - (new_reference_points_for_box_norm.unsqueeze(1), - new_reference_points_for_keypoint.view( - -1, self.num_keypoints, bs, 4)), - dim=1).flatten(0, 1) - new_reference_points = torch.cat( - (new_reference_points_for_box_dn, - new_reference_points_norm), - dim=0) + (new_reference_points_for_box_norm.unsqueeze(1), new_reference_points_for_keypoint.view(-1, self.num_keypoints, bs, 4)), + dim=1, + ).flatten(0, 1) + new_reference_points = torch.cat((new_reference_points_for_box_dn, new_reference_points_norm), dim=0) reference_points = new_reference_points.detach() intermediate_reference_points.append(reference_points) decoder_outputs = [itm_out.transpose(0, 1) for itm_out in intermediate] - reference_points = [ - itm_refpoint.transpose(0, 1) - for itm_refpoint in intermediate_reference_points - ] + reference_points = [itm_refpoint.transpose(0, 1) 
for itm_refpoint in intermediate_reference_points] return decoder_outputs, reference_points @staticmethod - def get_proposal_pos_embed(pos_tensor: Tensor, - temperature: int = 10000, - num_pos_feats: int = 128) -> Tensor: + def get_proposal_pos_embed(pos_tensor: Tensor, temperature: int = 10000, num_pos_feats: int = 128) -> Tensor: """Get the position embedding of the proposal. Args: @@ -307,36 +264,28 @@ class EDPoseDecoder(BaseModule): """ scale = 2 * math.pi - dim_t = torch.arange( - num_pos_feats, dtype=torch.float32, device=pos_tensor.device) - dim_t = temperature**(2 * (dim_t // 2) / num_pos_feats) + dim_t = torch.arange(num_pos_feats, dtype=torch.float32, device=pos_tensor.device) + dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats) x_embed = pos_tensor[:, :, 0] * scale y_embed = pos_tensor[:, :, 1] * scale pos_x = x_embed[:, :, None] / dim_t pos_y = y_embed[:, :, None] / dim_t - pos_x = torch.stack((pos_x[:, :, 0::2].sin(), pos_x[:, :, 1::2].cos()), - dim=3).flatten(2) - pos_y = torch.stack((pos_y[:, :, 0::2].sin(), pos_y[:, :, 1::2].cos()), - dim=3).flatten(2) + pos_x = torch.stack((pos_x[:, :, 0::2].sin(), pos_x[:, :, 1::2].cos()), dim=3).flatten(2) + pos_y = torch.stack((pos_y[:, :, 0::2].sin(), pos_y[:, :, 1::2].cos()), dim=3).flatten(2) if pos_tensor.size(-1) == 2: pos = torch.cat((pos_y, pos_x), dim=2) elif pos_tensor.size(-1) == 4: w_embed = pos_tensor[:, :, 2] * scale pos_w = w_embed[:, :, None] / dim_t - pos_w = torch.stack( - (pos_w[:, :, 0::2].sin(), pos_w[:, :, 1::2].cos()), - dim=3).flatten(2) + pos_w = torch.stack((pos_w[:, :, 0::2].sin(), pos_w[:, :, 1::2].cos()), dim=3).flatten(2) h_embed = pos_tensor[:, :, 3] * scale pos_h = h_embed[:, :, None] / dim_t - pos_h = torch.stack( - (pos_h[:, :, 0::2].sin(), pos_h[:, :, 1::2].cos()), - dim=3).flatten(2) + pos_h = torch.stack((pos_h[:, :, 0::2].sin(), pos_h[:, :, 1::2].cos()), dim=3).flatten(2) pos = torch.cat((pos_y, pos_x, pos_w, pos_h), dim=2) else: - raise ValueError('Unknown pos_tensor shape(-1):{}'.format( - pos_tensor.size(-1))) + raise ValueError("Unknown pos_tensor shape(-1):{}".format(pos_tensor.size(-1))) return pos @@ -366,21 +315,23 @@ class EDPoseOutHead(BaseModule): for all the pose prediction layers. Defaults to `False`. 
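`get_proposal_pos_embed` above is the standard DETR-style sinusoidal encoding applied to normalized proposal coordinates: each coordinate is scaled by `2 * pi` and expanded into interleaved sin/cos features over `num_pos_feats` frequencies, and the per-coordinate embeddings are concatenated (the 4-coordinate case simply repeats the treatment for w and h). A runnable distillation of the two-coordinate case; `proposal_pos_embed_xy` is an illustrative name, not an API in this repo:

```python
import math

import torch


def proposal_pos_embed_xy(pos: torch.Tensor, temperature: int = 10000, num_pos_feats: int = 128) -> torch.Tensor:
    # pos: (nq, bs, 2) with values normalized to [0, 1]
    scale = 2 * math.pi
    dim_t = torch.arange(num_pos_feats, dtype=torch.float32, device=pos.device)
    dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)  # paired frequencies
    x_embed = pos[:, :, 0] * scale
    y_embed = pos[:, :, 1] * scale
    pos_x = x_embed[:, :, None] / dim_t
    pos_y = y_embed[:, :, None] / dim_t
    # interleave sin/cos per frequency pair, then flatten the feature axis
    pos_x = torch.stack((pos_x[:, :, 0::2].sin(), pos_x[:, :, 1::2].cos()), dim=3).flatten(2)
    pos_y = torch.stack((pos_y[:, :, 0::2].sin(), pos_y[:, :, 1::2].cos()), dim=3).flatten(2)
    return torch.cat((pos_y, pos_x), dim=2)  # (nq, bs, 2 * num_pos_feats)


emb = proposal_pos_embed_xy(torch.rand(4, 2, 2))
assert emb.shape == (4, 2, 256)
```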
""" - def __init__(self, - num_classes, - num_keypoints: int = 17, - num_queries: int = 900, - cls_no_bias: bool = False, - embed_dims: int = 256, - as_two_stage: bool = False, - refine_queries_num: int = 100, - num_box_decoder_layers: int = 2, - num_group: int = 100, - num_pred_layer: int = 6, - dec_pred_class_embed_share: bool = False, - dec_pred_bbox_embed_share: bool = False, - dec_pred_pose_embed_share: bool = False, - **kwargs): + def __init__( + self, + num_classes, + num_keypoints: int = 17, + num_queries: int = 900, + cls_no_bias: bool = False, + embed_dims: int = 256, + as_two_stage: bool = False, + refine_queries_num: int = 100, + num_box_decoder_layers: int = 2, + num_group: int = 100, + num_pred_layer: int = 6, + dec_pred_class_embed_share: bool = False, + dec_pred_bbox_embed_share: bool = False, + dec_pred_pose_embed_share: bool = False, + **kwargs, + ): super().__init__() self.embed_dims = embed_dims self.as_two_stage = as_two_stage @@ -395,8 +346,7 @@ class EDPoseOutHead(BaseModule): self.dec_pred_bbox_embed_share = dec_pred_bbox_embed_share self.dec_pred_pose_embed_share = dec_pred_pose_embed_share # prepare class & box embed - _class_embed = nn.Linear( - self.embed_dims, self.num_classes, bias=(not cls_no_bias)) + _class_embed = nn.Linear(self.embed_dims, self.num_classes, bias=(not cls_no_bias)) if not cls_no_bias: prior_prob = 0.01 bias_value = -math.log((1 - prior_prob) / prior_prob) @@ -410,45 +360,24 @@ class EDPoseOutHead(BaseModule): if dec_pred_bbox_embed_share: box_embed_layerlist = [_bbox_embed for i in range(num_pred_layer)] else: - box_embed_layerlist = [ - copy.deepcopy(_bbox_embed) for i in range(num_pred_layer) - ] + box_embed_layerlist = [copy.deepcopy(_bbox_embed) for i in range(num_pred_layer)] if dec_pred_class_embed_share: - class_embed_layerlist = [ - _class_embed for i in range(num_pred_layer) - ] + class_embed_layerlist = [_class_embed for i in range(num_pred_layer)] else: - class_embed_layerlist = [ - copy.deepcopy(_class_embed) for i in range(num_pred_layer) - ] + class_embed_layerlist = [copy.deepcopy(_class_embed) for i in range(num_pred_layer)] if num_keypoints == 17: if dec_pred_pose_embed_share: - pose_embed_layerlist = [ - _pose_embed - for i in range(num_pred_layer - num_box_decoder_layers + 1) - ] + pose_embed_layerlist = [_pose_embed for i in range(num_pred_layer - num_box_decoder_layers + 1)] else: - pose_embed_layerlist = [ - copy.deepcopy(_pose_embed) - for i in range(num_pred_layer - num_box_decoder_layers + 1) - ] + pose_embed_layerlist = [copy.deepcopy(_pose_embed) for i in range(num_pred_layer - num_box_decoder_layers + 1)] else: if dec_pred_pose_embed_share: - pose_embed_layerlist = [ - _pose_embed - for i in range(num_pred_layer - num_box_decoder_layers) - ] + pose_embed_layerlist = [_pose_embed for i in range(num_pred_layer - num_box_decoder_layers)] else: - pose_embed_layerlist = [ - copy.deepcopy(_pose_embed) - for i in range(num_pred_layer - num_box_decoder_layers) - ] - - pose_hw_embed_layerlist = [ - _pose_hw_embed - for i in range(num_pred_layer - num_box_decoder_layers) - ] + pose_embed_layerlist = [copy.deepcopy(_pose_embed) for i in range(num_pred_layer - num_box_decoder_layers)] + + pose_hw_embed_layerlist = [_pose_hw_embed for i in range(num_pred_layer - num_box_decoder_layers)] self.bbox_embed = nn.ModuleList(box_embed_layerlist) self.class_embed = nn.ModuleList(class_embed_layerlist) self.pose_embed = nn.ModuleList(pose_embed_layerlist) @@ -462,9 +391,15 @@ class EDPoseOutHead(BaseModule): for m in self.pose_embed: 
constant_init(m[-1], 0, bias=0) - def forward(self, hidden_states: List[Tensor], references: List[Tensor], - mask_dict: Dict, hidden_states_enc: Tensor, - referens_enc: Tensor, batch_data_samples) -> Dict: + def forward( + self, + hidden_states: List[Tensor], + references: List[Tensor], + mask_dict: Dict, + hidden_states_enc: Tensor, + referens_enc: Tensor, + batch_data_samples, + ) -> Dict: """Forward function. Args: @@ -484,39 +419,29 @@ class EDPoseOutHead(BaseModule): effec_dn_num = self.refine_queries_num if self.training else 0 outputs_coord_list = [] outputs_class = [] - for dec_lid, (layer_ref_sig, layer_bbox_embed, layer_cls_embed, - layer_hs) in enumerate( - zip(references[:-1], self.bbox_embed, - self.class_embed, hidden_states)): + for dec_lid, (layer_ref_sig, layer_bbox_embed, layer_cls_embed, layer_hs) in enumerate( + zip(references[:-1], self.bbox_embed, self.class_embed, hidden_states) + ): if dec_lid < self.num_box_decoder_layers: layer_delta_unsig = layer_bbox_embed(layer_hs) - layer_outputs_unsig = layer_delta_unsig + inverse_sigmoid( - layer_ref_sig) + layer_outputs_unsig = layer_delta_unsig + inverse_sigmoid(layer_ref_sig) layer_outputs_unsig = layer_outputs_unsig.sigmoid() layer_cls = layer_cls_embed(layer_hs) outputs_coord_list.append(layer_outputs_unsig) outputs_class.append(layer_cls) else: layer_hs_bbox_dn = layer_hs[:, :effec_dn_num, :] - layer_hs_bbox_norm = \ - layer_hs[:, effec_dn_num:, :][:, 0::( - self.num_keypoints + 1), :] + layer_hs_bbox_norm = layer_hs[:, effec_dn_num:, :][:, 0 :: (self.num_keypoints + 1), :] bs = layer_ref_sig.shape[0] - ref_before_sigmoid_bbox_dn = \ - layer_ref_sig[:, : effec_dn_num, :] - ref_before_sigmoid_bbox_norm = \ - layer_ref_sig[:, effec_dn_num:, :][:, 0::( - self.num_keypoints + 1), :] + ref_before_sigmoid_bbox_dn = layer_ref_sig[:, :effec_dn_num, :] + ref_before_sigmoid_bbox_norm = layer_ref_sig[:, effec_dn_num:, :][:, 0 :: (self.num_keypoints + 1), :] layer_delta_unsig_dn = layer_bbox_embed(layer_hs_bbox_dn) layer_delta_unsig_norm = layer_bbox_embed(layer_hs_bbox_norm) - layer_outputs_unsig_dn = layer_delta_unsig_dn + \ - inverse_sigmoid(ref_before_sigmoid_bbox_dn) + layer_outputs_unsig_dn = layer_delta_unsig_dn + inverse_sigmoid(ref_before_sigmoid_bbox_dn) layer_outputs_unsig_dn = layer_outputs_unsig_dn.sigmoid() - layer_outputs_unsig_norm = layer_delta_unsig_norm + \ - inverse_sigmoid(ref_before_sigmoid_bbox_norm) + layer_outputs_unsig_norm = layer_delta_unsig_norm + inverse_sigmoid(ref_before_sigmoid_bbox_norm) layer_outputs_unsig_norm = layer_outputs_unsig_norm.sigmoid() - layer_outputs_unsig = torch.cat( - (layer_outputs_unsig_dn, layer_outputs_unsig_norm), dim=1) + layer_outputs_unsig = torch.cat((layer_outputs_unsig_dn, layer_outputs_unsig_norm), dim=1) layer_cls_dn = layer_cls_embed(layer_hs_bbox_dn) layer_cls_norm = layer_cls_embed(layer_hs_bbox_norm) layer_cls = torch.cat((layer_cls_dn, layer_cls_norm), dim=1) @@ -525,58 +450,36 @@ class EDPoseOutHead(BaseModule): # update keypoints boxes outputs_keypoints_list = [] - kpt_index = [ - x for x in range(self.num_group * (self.num_keypoints + 1)) - if x % (self.num_keypoints + 1) != 0 - ] - for dec_lid, (layer_ref_sig, layer_hs) in enumerate( - zip(references[:-1], hidden_states)): + kpt_index = [x for x in range(self.num_group * (self.num_keypoints + 1)) if x % (self.num_keypoints + 1) != 0] + for dec_lid, (layer_ref_sig, layer_hs) in enumerate(zip(references[:-1], hidden_states)): if dec_lid < self.num_box_decoder_layers: assert isinstance(layer_hs, torch.Tensor) bs 
= layer_hs.shape[0] - layer_res = layer_hs.new_zeros( - (bs, self.num_queries, self.num_keypoints * 3)) + layer_res = layer_hs.new_zeros((bs, self.num_queries, self.num_keypoints * 3)) outputs_keypoints_list.append(layer_res) else: bs = layer_ref_sig.shape[0] - layer_hs_kpt = \ - layer_hs[:, effec_dn_num:, :].index_select( - 1, torch.tensor(kpt_index, device=layer_hs.device)) - delta_xy_unsig = self.pose_embed[dec_lid - - self.num_box_decoder_layers]( - layer_hs_kpt) - layer_ref_sig_kpt = \ - layer_ref_sig[:, effec_dn_num:, :].index_select( - 1, torch.tensor(kpt_index, device=layer_hs.device)) - layer_outputs_unsig_keypoints = delta_xy_unsig + \ - inverse_sigmoid(layer_ref_sig_kpt[..., :2]) - vis_xy_unsig = torch.ones_like( - layer_outputs_unsig_keypoints, - device=layer_outputs_unsig_keypoints.device) - xyv = torch.cat((layer_outputs_unsig_keypoints, - vis_xy_unsig[:, :, 0].unsqueeze(-1)), - dim=-1) + layer_hs_kpt = layer_hs[:, effec_dn_num:, :].index_select(1, torch.tensor(kpt_index, device=layer_hs.device)) + delta_xy_unsig = self.pose_embed[dec_lid - self.num_box_decoder_layers](layer_hs_kpt) + layer_ref_sig_kpt = layer_ref_sig[:, effec_dn_num:, :].index_select(1, torch.tensor(kpt_index, device=layer_hs.device)) + layer_outputs_unsig_keypoints = delta_xy_unsig + inverse_sigmoid(layer_ref_sig_kpt[..., :2]) + vis_xy_unsig = torch.ones_like(layer_outputs_unsig_keypoints, device=layer_outputs_unsig_keypoints.device) + xyv = torch.cat((layer_outputs_unsig_keypoints, vis_xy_unsig[:, :, 0].unsqueeze(-1)), dim=-1) xyv = xyv.sigmoid() - layer_res = xyv.reshape( - (bs, self.num_group, self.num_keypoints, 3)).flatten(2, 3) + layer_res = xyv.reshape((bs, self.num_group, self.num_keypoints, 3)).flatten(2, 3) layer_res = self.keypoint_xyzxyz_to_xyxyzz(layer_res) outputs_keypoints_list.append(layer_res) dn_mask_dict = mask_dict if self.refine_queries_num > 0 and dn_mask_dict is not None: - outputs_class, outputs_coord_list, outputs_keypoints_list = \ - self.dn_post_process2( - outputs_class, outputs_coord_list, - outputs_keypoints_list, dn_mask_dict - ) + outputs_class, outputs_coord_list, outputs_keypoints_list = self.dn_post_process2( + outputs_class, outputs_coord_list, outputs_keypoints_list, dn_mask_dict + ) - for _out_class, _out_bbox, _out_keypoint in zip( - outputs_class, outputs_coord_list, outputs_keypoints_list): - assert _out_class.shape[1] == \ - _out_bbox.shape[1] == _out_keypoint.shape[1] + for _out_class, _out_bbox, _out_keypoint in zip(outputs_class, outputs_coord_list, outputs_keypoints_list): + assert _out_class.shape[1] == _out_bbox.shape[1] == _out_keypoint.shape[1] - return outputs_class[-1], outputs_coord_list[ - -1], outputs_keypoints_list[-1] + return outputs_class[-1], outputs_coord_list[-1], outputs_keypoints_list[-1] def keypoint_xyzxyz_to_xyxyzz(self, keypoints: torch.Tensor): """ @@ -585,9 +488,9 @@ class EDPoseOutHead(BaseModule): """ res = torch.zeros_like(keypoints) num_points = keypoints.shape[-1] // 3 - res[..., 0:2 * num_points:2] = keypoints[..., 0::3] - res[..., 1:2 * num_points:2] = keypoints[..., 1::3] - res[..., 2 * num_points:] = keypoints[..., 2::3] + res[..., 0 : 2 * num_points : 2] = keypoints[..., 0::3] + res[..., 1 : 2 * num_points : 2] = keypoints[..., 1::3] + res[..., 2 * num_points :] = keypoints[..., 2::3] return res @@ -631,21 +534,23 @@ class EDPoseHead(TransformerHead): two_stage_keep_all_tokens (bool): Whether to keep all tokens. 
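The strided assignments in `keypoint_xyzxyz_to_xyxyzz` above are easiest to verify on a concrete input: the last axis is regrouped from interleaved `[x1, y1, v1, x2, y2, v2, ...]` into `[x1, y1, x2, y2, ..., v1, v2, ...]`, i.e. all coordinates first and all visibility scores last. A standalone copy of the same logic with a two-keypoint example:

```python
import torch


def xyzxyz_to_xyxyzz(kpts: torch.Tensor) -> torch.Tensor:
    res = torch.zeros_like(kpts)
    n = kpts.shape[-1] // 3
    res[..., 0 : 2 * n : 2] = kpts[..., 0::3]  # x coords into even slots
    res[..., 1 : 2 * n : 2] = kpts[..., 1::3]  # y coords into odd slots
    res[..., 2 * n :] = kpts[..., 2::3]        # visibilities moved to the tail
    return res


k = torch.tensor([[1.0, 2.0, 0.9, 3.0, 4.0, 0.8]])  # two keypoints, xyv-interleaved
print(xyzxyz_to_xyxyzz(k))  # -> [x1, y1, x2, y2, v1, v2] = [1, 2, 3, 4, 0.9, 0.8]
```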
""" - def __init__(self, - num_queries: int = 100, - num_feature_levels: int = 4, - num_keypoints: int = 17, - as_two_stage: bool = False, - encoder: OptConfigType = None, - decoder: OptConfigType = None, - out_head: OptConfigType = None, - positional_encoding: OptConfigType = None, - data_decoder: OptConfigType = None, - denosing_cfg: OptConfigType = None, - dec_pred_class_embed_share: bool = False, - dec_pred_bbox_embed_share: bool = False, - refine_queries_num: int = 100, - two_stage_keep_all_tokens: bool = False) -> None: + def __init__( + self, + num_queries: int = 100, + num_feature_levels: int = 4, + num_keypoints: int = 17, + as_two_stage: bool = False, + encoder: OptConfigType = None, + decoder: OptConfigType = None, + out_head: OptConfigType = None, + positional_encoding: OptConfigType = None, + data_decoder: OptConfigType = None, + denosing_cfg: OptConfigType = None, + dec_pred_class_embed_share: bool = False, + dec_pred_bbox_embed_share: bool = False, + refine_queries_num: int = 100, + two_stage_keep_all_tokens: bool = False, + ) -> None: self.as_two_stage = as_two_stage self.num_feature_levels = num_feature_levels @@ -653,8 +558,8 @@ class EDPoseHead(TransformerHead): self.dec_pred_class_embed_share = dec_pred_class_embed_share self.dec_pred_bbox_embed_share = dec_pred_bbox_embed_share self.two_stage_keep_all_tokens = two_stage_keep_all_tokens - self.num_heads = decoder['layer_cfg']['self_attn_cfg']['num_heads'] - self.num_group = decoder['num_group'] + self.num_heads = decoder["layer_cfg"]["self_attn_cfg"]["num_heads"] + self.num_group = decoder["num_group"] self.num_keypoints = num_keypoints self.denosing_cfg = denosing_cfg if data_decoder is not None: @@ -663,35 +568,28 @@ class EDPoseHead(TransformerHead): self.data_decoder = None super().__init__( - encoder=encoder, - decoder=decoder, - out_head=out_head, - positional_encoding=positional_encoding, - num_queries=num_queries) - - self.positional_encoding = PositionEmbeddingSineHW( - **self.positional_encoding_cfg) + encoder=encoder, decoder=decoder, out_head=out_head, positional_encoding=positional_encoding, num_queries=num_queries + ) + + self.positional_encoding = PositionEmbeddingSineHW(**self.positional_encoding_cfg) self.encoder = DeformableDetrTransformerEncoder(**self.encoder_cfg) - self.decoder = EDPoseDecoder( - num_keypoints=num_keypoints, **self.decoder_cfg) + self.decoder = EDPoseDecoder(num_keypoints=num_keypoints, **self.decoder_cfg) self.out_head = EDPoseOutHead( num_keypoints=num_keypoints, as_two_stage=as_two_stage, refine_queries_num=refine_queries_num, **self.out_head_cfg, - **self.decoder_cfg) + **self.decoder_cfg, + ) self.embed_dims = self.encoder.embed_dims - self.label_enc = nn.Embedding( - self.denosing_cfg['dn_labelbook_size'] + 1, self.embed_dims) + self.label_enc = nn.Embedding(self.denosing_cfg["dn_labelbook_size"] + 1, self.embed_dims) if not self.as_two_stage: - self.query_embedding = nn.Embedding(self.num_queries, - self.embed_dims) + self.query_embedding = nn.Embedding(self.num_queries, self.embed_dims) self.refpoint_embedding = nn.Embedding(self.num_queries, 4) - self.level_embed = nn.Parameter( - torch.Tensor(self.num_feature_levels, self.embed_dims)) + self.level_embed = nn.Parameter(torch.Tensor(self.num_feature_levels, self.embed_dims)) self.decoder.bbox_embed = self.out_head.bbox_embed self.decoder.pose_embed = self.out_head.pose_embed @@ -704,14 +602,12 @@ class EDPoseHead(TransformerHead): if dec_pred_class_embed_share and dec_pred_bbox_embed_share: self.enc_out_bbox_embed = 
self.out_head.bbox_embed[0] else: - self.enc_out_bbox_embed = copy.deepcopy( - self.out_head.bbox_embed[0]) + self.enc_out_bbox_embed = copy.deepcopy(self.out_head.bbox_embed[0]) if dec_pred_class_embed_share and dec_pred_bbox_embed_share: self.enc_out_class_embed = self.out_head.class_embed[0] else: - self.enc_out_class_embed = copy.deepcopy( - self.out_head.class_embed[0]) + self.enc_out_class_embed = copy.deepcopy(self.out_head.class_embed[0]) def init_weights(self) -> None: """Initialize weights for Transformer and other components.""" @@ -728,10 +624,7 @@ class EDPoseHead(TransformerHead): nn.init.normal_(self.level_embed) - def pre_transformer(self, - img_feats: Tuple[Tensor], - batch_data_samples: OptSampleList = None - ) -> Tuple[Dict]: + def pre_transformer(self, img_feats: Tuple[Tensor], batch_data_samples: OptSampleList = None) -> Tuple[Dict]: """Process image features before feeding them to the transformer. Args: @@ -768,17 +661,14 @@ class EDPoseHead(TransformerHead): mlvl_masks = [] mlvl_pos_embeds = [] for feat in img_feats: - mlvl_masks.append( - F.interpolate(masks[None], - size=feat.shape[-2:]).to(torch.bool).squeeze(0)) + mlvl_masks.append(F.interpolate(masks[None], size=feat.shape[-2:]).to(torch.bool).squeeze(0)) mlvl_pos_embeds.append(self.positional_encoding(mlvl_masks[-1])) feat_flatten = [] lvl_pos_embed_flatten = [] mask_flatten = [] spatial_shapes = [] - for lvl, (feat, mask, pos_embed) in enumerate( - zip(img_feats, mlvl_masks, mlvl_pos_embeds)): + for lvl, (feat, mask, pos_embed) in enumerate(zip(img_feats, mlvl_masks, mlvl_pos_embeds)): batch_size, c, h, w = feat.shape # [bs, c, h_lvl, w_lvl] -> [bs, h_lvl*w_lvl, c] feat = feat.view(batch_size, c, -1).permute(0, 2, 1) @@ -799,26 +689,17 @@ class EDPoseHead(TransformerHead): # (bs, num_feat_points), where num_feat_points = sum_lvl(h_lvl*w_lvl) mask_flatten = torch.cat(mask_flatten, 1) - spatial_shapes = torch.as_tensor( # (num_level, 2) - spatial_shapes, - dtype=torch.long, - device=feat_flatten.device) - level_start_index = torch.cat(( - spatial_shapes.new_zeros((1, )), # (num_level) - spatial_shapes.prod(1).cumsum(0)[:-1])) - valid_ratios = torch.stack( # (bs, num_level, 2) - [self.get_valid_ratio(m) for m in mlvl_masks], 1) + spatial_shapes = torch.as_tensor(spatial_shapes, dtype=torch.long, device=feat_flatten.device) # (num_level, 2) + level_start_index = torch.cat((spatial_shapes.new_zeros((1,)), spatial_shapes.prod(1).cumsum(0)[:-1])) # (num_level) + valid_ratios = torch.stack([self.get_valid_ratio(m) for m in mlvl_masks], 1) # (bs, num_level, 2) if self.refine_queries_num > 0 or batch_data_samples is not None: - input_query_label, input_query_bbox, humandet_attn_mask, \ - human2pose_attn_mask, mask_dict =\ - self.prepare_for_denosing( - batch_data_samples, - device=img_feats[0].device) + input_query_label, input_query_bbox, humandet_attn_mask, human2pose_attn_mask, mask_dict = self.prepare_for_denosing( + batch_data_samples, device=img_feats[0].device + ) else: assert batch_data_samples is None - input_query_bbox = input_query_label = \ - humandet_attn_mask = human2pose_attn_mask = mask_dict = None + input_query_bbox = input_query_label = humandet_attn_mask = human2pose_attn_mask = mask_dict = None encoder_inputs_dict = dict( query=feat_flatten, @@ -826,7 +707,8 @@ class EDPoseHead(TransformerHead): key_padding_mask=mask_flatten, spatial_shapes=spatial_shapes, level_start_index=level_start_index, - valid_ratios=valid_ratios) + valid_ratios=valid_ratios, + ) decoder_inputs_dict = dict( 
memory_mask=mask_flatten, spatial_shapes=spatial_shapes, @@ -836,12 +718,11 @@ class EDPoseHead(TransformerHead): human2pose_attn_mask=human2pose_attn_mask, input_query_bbox=input_query_bbox, input_query_label=input_query_label, - mask_dict=mask_dict) + mask_dict=mask_dict, + ) return encoder_inputs_dict, decoder_inputs_dict - def forward_encoder(self, - img_feats: Tuple[Tensor], - batch_data_samples: OptSampleList = None) -> Dict: + def forward_encoder(self, img_feats: Tuple[Tensor], batch_data_samples: OptSampleList = None) -> Dict: """Forward with Transformer encoder. The forward procedure is defined as: @@ -860,16 +741,15 @@ class EDPoseHead(TransformerHead): dict: The dictionary of encoder outputs, which includes the `memory` of the encoder output. """ - encoder_inputs_dict, decoder_inputs_dict = self.pre_transformer( - img_feats, batch_data_samples) + encoder_inputs_dict, decoder_inputs_dict = self.pre_transformer(img_feats, batch_data_samples) memory = self.encoder(**encoder_inputs_dict) encoder_outputs_dict = dict(memory=memory, **decoder_inputs_dict) return encoder_outputs_dict - def pre_decoder(self, memory: Tensor, memory_mask: Tensor, - spatial_shapes: Tensor, input_query_bbox: Tensor, - input_query_label: Tensor) -> Tuple[Dict, Dict]: + def pre_decoder( + self, memory: Tensor, memory_mask: Tensor, spatial_shapes: Tensor, input_query_bbox: Tensor, input_query_label: Tensor + ) -> Tuple[Dict, Dict]: """Prepare intermediate variables before entering Transformer decoder, such as `query` and `reference_points`. @@ -895,29 +775,20 @@ class EDPoseHead(TransformerHead): """ bs, _, c = memory.shape if self.as_two_stage: - output_memory, output_proposals = \ - self.gen_encoder_output_proposals( - memory, memory_mask, spatial_shapes) + output_memory, output_proposals = self.gen_encoder_output_proposals(memory, memory_mask, spatial_shapes) enc_outputs_class = self.enc_out_class_embed(output_memory) - enc_outputs_coord_unact = self.enc_out_bbox_embed( - output_memory) + output_proposals - - topk_proposals = torch.topk( - enc_outputs_class.max(-1)[0], self.num_queries, dim=1)[1] - topk_coords_undetach = torch.gather( - enc_outputs_coord_unact, 1, - topk_proposals.unsqueeze(-1).repeat(1, 1, 4)) + enc_outputs_coord_unact = self.enc_out_bbox_embed(output_memory) + output_proposals + + topk_proposals = torch.topk(enc_outputs_class.max(-1)[0], self.num_queries, dim=1)[1] + topk_coords_undetach = torch.gather(enc_outputs_coord_unact, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4)) topk_coords_unact = topk_coords_undetach.detach() reference_points = topk_coords_unact.sigmoid() - query_undetach = torch.gather( - output_memory, 1, - topk_proposals.unsqueeze(-1).repeat(1, 1, self.embed_dims)) + query_undetach = torch.gather(output_memory, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, self.embed_dims)) query = query_undetach.detach() if input_query_bbox is not None: - reference_points = torch.cat( - [input_query_bbox, topk_coords_unact], dim=1).sigmoid() + reference_points = torch.cat([input_query_bbox, topk_coords_unact], dim=1).sigmoid() query = torch.cat([input_query_label, query], dim=1) if self.two_stage_keep_all_tokens: hidden_states_enc = output_memory.unsqueeze(0) @@ -927,28 +798,31 @@ class EDPoseHead(TransformerHead): referens_enc = topk_coords_undetach.sigmoid().unsqueeze(0) else: hidden_states_enc, referens_enc = None, None - query = self.query_embedding.weight[:, None, :].repeat( - 1, bs, 1).transpose(0, 1) - reference_points = \ - self.refpoint_embedding.weight[:, None, :].repeat(1, bs, 
1) + query = self.query_embedding.weight[:, None, :].repeat(1, bs, 1).transpose(0, 1) + reference_points = self.refpoint_embedding.weight[:, None, :].repeat(1, bs, 1) if input_query_bbox is not None: - reference_points = torch.cat( - [input_query_bbox, reference_points], dim=1) + reference_points = torch.cat([input_query_bbox, reference_points], dim=1) query = torch.cat([input_query_label, query], dim=1) reference_points = reference_points.sigmoid() - decoder_inputs_dict = dict( - query=query, reference_points=reference_points) - head_inputs_dict = dict( - hidden_states_enc=hidden_states_enc, referens_enc=referens_enc) + decoder_inputs_dict = dict(query=query, reference_points=reference_points) + head_inputs_dict = dict(hidden_states_enc=hidden_states_enc, referens_enc=referens_enc) return decoder_inputs_dict, head_inputs_dict - def forward_decoder(self, memory: Tensor, memory_mask: Tensor, - spatial_shapes: Tensor, level_start_index: Tensor, - valid_ratios: Tensor, humandet_attn_mask: Tensor, - human2pose_attn_mask: Tensor, input_query_bbox: Tensor, - input_query_label: Tensor, mask_dict: Dict) -> Dict: + def forward_decoder( + self, + memory: Tensor, + memory_mask: Tensor, + spatial_shapes: Tensor, + level_start_index: Tensor, + valid_ratios: Tensor, + humandet_attn_mask: Tensor, + human2pose_attn_mask: Tensor, + input_query_bbox: Tensor, + input_query_label: Tensor, + mask_dict: Dict, + ) -> Dict: """Forward with Transformer decoder. The forward procedure is defined as: @@ -977,64 +851,50 @@ class EDPoseHead(TransformerHead): `hidden_states` of the decoder output and `references` including the initial and intermediate reference_points. """ - decoder_in, head_in = self.pre_decoder(memory, memory_mask, - spatial_shapes, - input_query_bbox, - input_query_label) + decoder_in, head_in = self.pre_decoder(memory, memory_mask, spatial_shapes, input_query_bbox, input_query_label) inter_states, inter_references = self.decoder( - query=decoder_in['query'].transpose(0, 1), + query=decoder_in["query"].transpose(0, 1), value=memory.transpose(0, 1), key_padding_mask=memory_mask, # for cross_attn - reference_points=decoder_in['reference_points'].transpose(0, 1), + reference_points=decoder_in["reference_points"].transpose(0, 1), spatial_shapes=spatial_shapes, level_start_index=level_start_index, valid_ratios=valid_ratios, humandet_attn_mask=humandet_attn_mask, - human2pose_attn_mask=human2pose_attn_mask) + human2pose_attn_mask=human2pose_attn_mask, + ) references = inter_references - decoder_outputs_dict = dict( - hidden_states=inter_states, - references=references, - mask_dict=mask_dict) + decoder_outputs_dict = dict(hidden_states=inter_states, references=references, mask_dict=mask_dict) decoder_outputs_dict.update(head_in) return decoder_outputs_dict - def forward_out_head(self, batch_data_samples: OptSampleList, - hidden_states: List[Tensor], references: List[Tensor], - mask_dict: Dict, hidden_states_enc: Tensor, - referens_enc: Tensor) -> Tuple[Tensor]: + def forward_out_head( + self, + batch_data_samples: OptSampleList, + hidden_states: List[Tensor], + references: List[Tensor], + mask_dict: Dict, + hidden_states_enc: Tensor, + referens_enc: Tensor, + ) -> Tuple[Tensor]: """Forward function.""" - out = self.out_head(hidden_states, references, mask_dict, - hidden_states_enc, referens_enc, - batch_data_samples) + out = self.out_head(hidden_states, references, mask_dict, hidden_states_enc, referens_enc, batch_data_samples) return out - def predict(self, - feats: Features, - batch_data_samples: 
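For intuition, here is the two-stage query-selection pattern that `pre_decoder` implements above, reduced to standalone tensors: score every encoder token, keep the top `num_queries`, and gather the matching coordinates and features. Shapes and values below are toy stand-ins, not the real model dimensions.

```python
import torch

# Toy sizes, not the real model dimensions.
bs, num_tokens, embed_dims, num_classes, num_queries = 2, 100, 256, 1, 10

memory = torch.randn(bs, num_tokens, embed_dims)       # encoder output
enc_class = torch.randn(bs, num_tokens, num_classes)   # per-token class logits
enc_coord_unact = torch.randn(bs, num_tokens, 4)       # stand-in for bbox_embed(memory) + proposals

# Rank tokens by their best class score and keep the top-k as decoder queries.
topk = torch.topk(enc_class.max(-1)[0], num_queries, dim=1)[1]  # (bs, num_queries)

# Gather the boxes and features belonging to the selected tokens.
topk_coords = torch.gather(enc_coord_unact, 1, topk.unsqueeze(-1).repeat(1, 1, 4))
query = torch.gather(memory, 1, topk.unsqueeze(-1).repeat(1, 1, embed_dims))

# Detach so the selection acts as an initialization, not a gradient path;
# sigmoid maps the unactivated coordinates to normalized (0, 1) boxes.
reference_points = topk_coords.detach().sigmoid()
print(query.shape, reference_points.shape)  # (2, 10, 256) (2, 10, 4)
```

The `detach()` calls mirror the ones in the head above: the encoder still learns through its auxiliary outputs, but the top-k gather itself is not differentiated through.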
OptSampleList,
-                test_cfg: ConfigType = {}) -> Predictions:
+    def predict(self, feats: Features, batch_data_samples: OptSampleList, test_cfg: ConfigType = {}) -> Predictions:
         """Predict results from features."""
-        input_shapes = np.array(
-            [d.metainfo['input_size'] for d in batch_data_samples])
+        input_shapes = np.array([d.metainfo["input_size"] for d in batch_data_samples])
 
-        if test_cfg.get('flip_test', False):
-            assert NotImplementedError(
-                'flip_test is currently not supported '
-                'for EDPose. Please set `model.test_cfg.flip_test=False`')
+        if test_cfg.get("flip_test", False):
+            raise NotImplementedError("flip_test is currently not supported for EDPose. Please set `model.test_cfg.flip_test=False`")
         else:
-            pred_logits, pred_boxes, pred_keypoints = self.forward(
-                feats, batch_data_samples)  # (B, K, D)
-
-            pred = self.decode(
-                input_shapes,
-                pred_logits=pred_logits,
-                pred_boxes=pred_boxes,
-                pred_keypoints=pred_keypoints)
+            pred_logits, pred_boxes, pred_keypoints = self.forward(feats, batch_data_samples)  # (B, K, D)
+
+            pred = self.decode(input_shapes, pred_logits=pred_logits, pred_boxes=pred_boxes, pred_keypoints=pred_keypoints)
         return pred
 
-    def decode(self, input_shapes: np.ndarray, pred_logits: Tensor,
-               pred_boxes: Tensor, pred_keypoints: Tensor):
+    def decode(self, input_shapes: np.ndarray, pred_logits: Tensor, pred_boxes: Tensor, pred_keypoints: Tensor):
         """Select the final top-k keypoints, and decode the results from
         the normalized size to the original input size.
@@ -1048,37 +908,30 @@ class EDPoseHead(TransformerHead):
         """
         if self.data_decoder is None:
-            raise RuntimeError(f'The data decoder has not been set in \
-                {self.__class__.__name__}. '
-                               'Please set the data decoder configs in \
-                the init parameters to '
-                               'enable head methods `head.predict()` and \
-                `head.decode()`')
+            raise RuntimeError(
+                f"The data decoder has not been set in {self.__class__.__name__}. "
+                "Please set the data decoder configs in the init parameters to "
+                "enable head methods `head.predict()` and `head.decode()`"
+            )
 
         preds = []
         pred_logits = pred_logits.sigmoid()
-        pred_logits, pred_boxes, pred_keypoints = to_numpy(
-            [pred_logits, pred_boxes, pred_keypoints])
+        pred_logits, pred_boxes, pred_keypoints = to_numpy([pred_logits, pred_boxes, pred_keypoints])
 
-        for input_shape, pred_logit, pred_bbox, pred_kpts in zip(
-                input_shapes, pred_logits, pred_boxes, pred_keypoints):
+        for input_shape, pred_logit, pred_bbox, pred_kpts in zip(input_shapes, pred_logits, pred_boxes, pred_keypoints):
 
-            bboxes, keypoints, keypoint_scores = self.data_decoder.decode(
-                input_shape, pred_logit, pred_bbox, pred_kpts)
+            bboxes, keypoints, keypoint_scores = self.data_decoder.decode(input_shape, pred_logit, pred_bbox, pred_kpts)
 
             # pack outputs
-            preds.append(
-                InstanceData(
-                    keypoints=keypoints,
-                    keypoint_scores=keypoint_scores,
-                    bboxes=bboxes))
+            preds.append(InstanceData(keypoints=keypoints, keypoint_scores=keypoint_scores, bboxes=bboxes))
 
         return preds
 
-    def gen_encoder_output_proposals(self, memory: Tensor, memory_mask: Tensor,
-                                     spatial_shapes: Tensor
-                                     ) -> Tuple[Tensor, Tensor]:
+    def gen_encoder_output_proposals(self, memory: Tensor, memory_mask: Tensor, spatial_shapes: Tensor) -> Tuple[Tensor, Tensor]:
         """Generate proposals from encoded memory. The function will only be
         used when `as_two_stage` is `True`.
@@ -1103,16 +956,14 @@ class EDPoseHead(TransformerHead): proposals = [] _cur = 0 # start index in the sequence of the current level for lvl, (H, W) in enumerate(spatial_shapes): - mask_flatten_ = memory_mask[:, - _cur:(_cur + H * W)].view(bs, H, W, 1) + mask_flatten_ = memory_mask[:, _cur : (_cur + H * W)].view(bs, H, W, 1) valid_H = torch.sum(~mask_flatten_[:, :, 0, 0], 1).unsqueeze(-1) valid_W = torch.sum(~mask_flatten_[:, 0, :, 0], 1).unsqueeze(-1) grid_y, grid_x = torch.meshgrid( - torch.linspace( - 0, H - 1, H, dtype=torch.float32, device=memory.device), - torch.linspace( - 0, W - 1, W, dtype=torch.float32, device=memory.device)) + torch.linspace(0, H - 1, H, dtype=torch.float32, device=memory.device), + torch.linspace(0, W - 1, W, dtype=torch.float32, device=memory.device), + ) grid = torch.cat([grid_x.unsqueeze(-1), grid_y.unsqueeze(-1)], -1) scale = torch.cat([valid_W, valid_H], 1).view(bs, 1, 1, 2) @@ -1120,23 +971,17 @@ class EDPoseHead(TransformerHead): wh = torch.ones_like(grid) * 0.05 * (2.0**lvl) proposal = torch.cat((grid, wh), -1).view(bs, -1, 4) proposals.append(proposal) - _cur += (H * W) + _cur += H * W output_proposals = torch.cat(proposals, 1) - output_proposals_valid = ((output_proposals > 0.01) & - (output_proposals < 0.99)).all( - -1, keepdim=True) + output_proposals_valid = ((output_proposals > 0.01) & (output_proposals < 0.99)).all(-1, keepdim=True) output_proposals = inverse_sigmoid(output_proposals) - output_proposals = output_proposals.masked_fill( - memory_mask.unsqueeze(-1), float('inf')) - output_proposals = output_proposals.masked_fill( - ~output_proposals_valid, float('inf')) + output_proposals = output_proposals.masked_fill(memory_mask.unsqueeze(-1), float("inf")) + output_proposals = output_proposals.masked_fill(~output_proposals_valid, float("inf")) output_memory = memory - output_memory = output_memory.masked_fill( - memory_mask.unsqueeze(-1), float(0)) - output_memory = output_memory.masked_fill(~output_proposals_valid, - float(0)) + output_memory = output_memory.masked_fill(memory_mask.unsqueeze(-1), float(0)) + output_memory = output_memory.masked_fill(~output_proposals_valid, float(0)) output_memory = self.memory_trans_fc(output_memory) output_memory = self.memory_trans_norm(output_memory) # [bs, sum(hw), 2] @@ -1144,7 +989,7 @@ class EDPoseHead(TransformerHead): @property def default_init_cfg(self): - init_cfg = [dict(type='Normal', layer=['Linear'], std=0.01, bias=0)] + init_cfg = [dict(type="Normal", layer=["Linear"], std=0.01, bias=0)] return init_cfg def prepare_for_denosing(self, targets: OptSampleList, device): @@ -1157,12 +1002,10 @@ class EDPoseHead(TransformerHead): self.num_group * (self.num_keypoints + 1), self.num_group * (self.num_keypoints + 1), device=device, - dtype=torch.bool) - group_bbox_kpt = (self.num_keypoints + 1) - kpt_index = [ - x for x in range(self.num_group * (self.num_keypoints + 1)) - if x % (self.num_keypoints + 1) == 0 - ] + dtype=torch.bool, + ) + group_bbox_kpt = self.num_keypoints + 1 + kpt_index = [x for x in range(self.num_group * (self.num_keypoints + 1)) if x % (self.num_keypoints + 1) == 0] for matchj in range(self.num_group * (self.num_keypoints + 1)): sj = (matchj // group_bbox_kpt) * group_bbox_kpt ej = (matchj // group_bbox_kpt + 1) * group_bbox_kpt @@ -1178,17 +1021,17 @@ class EDPoseHead(TransformerHead): return None, None, None, attn_mask_infere, None # targets, dn_scalar, noise_scale = dn_args - device = targets[0]['boxes'].device + device = targets[0]["boxes"].device bs = len(targets) 
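The `gen_encoder_output_proposals` hunks above build one normalized `(cx, cy, w, h)` anchor per feature-map cell, with the size prior doubling at each pyramid level. A minimal sketch of that grid, ignoring the per-image valid-region scaling and the `inverse_sigmoid`/`inf`-masking steps:

```python
import torch

spatial_shapes = [(8, 8), (4, 4)]  # toy (H, W) per feature level
proposals = []
for lvl, (H, W) in enumerate(spatial_shapes):
    grid_y, grid_x = torch.meshgrid(
        torch.linspace(0, H - 1, H), torch.linspace(0, W - 1, W), indexing="ij"
    )
    grid = torch.stack([grid_x, grid_y], -1)          # (H, W, 2) cell coordinates
    centers = (grid + 0.5) / torch.tensor([W, H])     # normalized box centers in (0, 1)
    wh = torch.full_like(centers, 0.05 * 2.0 ** lvl)  # size prior doubles per level
    proposals.append(torch.cat([centers, wh], -1).view(-1, 4))

output_proposals = torch.cat(proposals, 0)            # (sum(H*W), 4)
# Proposals too close to the border are treated as invalid, as in the head above.
valid = ((output_proposals > 0.01) & (output_proposals < 0.99)).all(-1)
print(output_proposals.shape, int(valid.sum()))
```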
refine_queries_num = self.refine_queries_num # gather gt boxes and labels - gt_boxes = [t['boxes'] for t in targets] - gt_labels = [t['labels'] for t in targets] - gt_keypoints = [t['keypoints'] for t in targets] + gt_boxes = [t["boxes"] for t in targets] + gt_labels = [t["labels"] for t in targets] + gt_keypoints = [t["keypoints"] for t in targets] # repeat them - def get_indices_for_repeat(now_num, target_num, device='cuda'): + def get_indices_for_repeat(now_num, target_num, device="cuda"): """ Input: - now_num: int @@ -1201,31 +1044,24 @@ class EDPoseHead(TransformerHead): multiplier = target_num // now_num out_indice.append(base_indice.repeat(multiplier)) residue = target_num % now_num - out_indice.append(base_indice[torch.randint( - 0, now_num, (residue, ), device=device)]) + out_indice.append(base_indice[torch.randint(0, now_num, (residue,), device=device)]) return torch.cat(out_indice) gt_boxes_expand = [] gt_labels_expand = [] gt_keypoints_expand = [] - for idx, (gt_boxes_i, gt_labels_i, gt_keypoint_i) in enumerate( - zip(gt_boxes, gt_labels, gt_keypoints)): + for idx, (gt_boxes_i, gt_labels_i, gt_keypoint_i) in enumerate(zip(gt_boxes, gt_labels, gt_keypoints)): num_gt_i = gt_boxes_i.shape[0] if num_gt_i > 0: - indices = get_indices_for_repeat(num_gt_i, refine_queries_num, - device) + indices = get_indices_for_repeat(num_gt_i, refine_queries_num, device) gt_boxes_expand_i = gt_boxes_i[indices] # num_dn, 4 gt_labels_expand_i = gt_labels_i[indices] gt_keypoints_expand_i = gt_keypoint_i[indices] else: # all negative samples when no gt boxes - gt_boxes_expand_i = torch.rand( - refine_queries_num, 4, device=device) - gt_labels_expand_i = torch.ones( - refine_queries_num, dtype=torch.int64, - device=device) * int(self.num_classes) - gt_keypoints_expand_i = torch.rand( - refine_queries_num, self.num_keypoints * 3, device=device) + gt_boxes_expand_i = torch.rand(refine_queries_num, 4, device=device) + gt_labels_expand_i = torch.ones(refine_queries_num, dtype=torch.int64, device=device) * int(self.num_classes) + gt_keypoints_expand_i = torch.rand(refine_queries_num, self.num_keypoints * 3, device=device) gt_boxes_expand.append(gt_boxes_expand_i) gt_labels_expand.append(gt_labels_expand_i) gt_keypoints_expand.append(gt_keypoints_expand_i) @@ -1236,38 +1072,34 @@ class EDPoseHead(TransformerHead): knwon_labels_expand = gt_labels_expand.clone() # add noise - if self.denosing_cfg['dn_label_noise_ratio'] > 0: + if self.denosing_cfg["dn_label_noise_ratio"] > 0: prob = torch.rand_like(knwon_labels_expand.float()) - chosen_indice = prob < self.denosing_cfg['dn_label_noise_ratio'] - new_label = torch.randint_like( - knwon_labels_expand[chosen_indice], 0, - self.dn_labelbook_size) # randomly put a new one here + chosen_indice = prob < self.denosing_cfg["dn_label_noise_ratio"] + new_label = torch.randint_like(knwon_labels_expand[chosen_indice], 0, self.dn_labelbook_size) # randomly put a new one here knwon_labels_expand[chosen_indice] = new_label - if self.denosing_cfg['dn_box_noise_scale'] > 0: + if self.denosing_cfg["dn_box_noise_scale"] > 0: diff = torch.zeros_like(knwon_boxes_expand) diff[..., :2] = knwon_boxes_expand[..., 2:] / 2 diff[..., 2:] = knwon_boxes_expand[..., 2:] - knwon_boxes_expand += torch.mul( - (torch.rand_like(knwon_boxes_expand) * 2 - 1.0), - diff) * self.denosing_cfg['dn_box_noise_scale'] + knwon_boxes_expand += torch.mul((torch.rand_like(knwon_boxes_expand) * 2 - 1.0), diff) * self.denosing_cfg["dn_box_noise_scale"] knwon_boxes_expand = knwon_boxes_expand.clamp(min=0.0, 
max=1.0) input_query_label = self.label_enc(knwon_labels_expand) input_query_bbox = inverse_sigmoid(knwon_boxes_expand) # prepare mask - if 'group2group' in self.denosing_cfg['dn_attn_mask_type_list']: + if "group2group" in self.denosing_cfg["dn_attn_mask_type_list"]: attn_mask = torch.zeros( bs, self.num_heads, refine_queries_num + self.num_queries, refine_queries_num + self.num_queries, device=device, - dtype=torch.bool) + dtype=torch.bool, + ) attn_mask[:, :, refine_queries_num:, :refine_queries_num] = True - for idx, (gt_boxes_i, - gt_labels_i) in enumerate(zip(gt_boxes, gt_labels)): + for idx, (gt_boxes_i, gt_labels_i) in enumerate(zip(gt_boxes, gt_labels)): num_gt_i = gt_boxes_i.shape[0] if num_gt_i == 0: continue @@ -1280,38 +1112,31 @@ class EDPoseHead(TransformerHead): attn_mask[idx, :, matchi, ei:refine_queries_num] = True attn_mask = attn_mask.flatten(0, 1) - if 'group2group' in self.denosing_cfg['dn_attn_mask_type_list']: + if "group2group" in self.denosing_cfg["dn_attn_mask_type_list"]: attn_mask2 = torch.zeros( bs, self.num_heads, refine_queries_num + self.num_group * (self.num_keypoints + 1), refine_queries_num + self.num_group * (self.num_keypoints + 1), device=device, - dtype=torch.bool) + dtype=torch.bool, + ) attn_mask2[:, :, refine_queries_num:, :refine_queries_num] = True - group_bbox_kpt = (self.num_keypoints + 1) - kpt_index = [ - x for x in range(self.num_group * (self.num_keypoints + 1)) - if x % (self.num_keypoints + 1) == 0 - ] + group_bbox_kpt = self.num_keypoints + 1 + kpt_index = [x for x in range(self.num_group * (self.num_keypoints + 1)) if x % (self.num_keypoints + 1) == 0] for matchj in range(self.num_group * (self.num_keypoints + 1)): sj = (matchj // group_bbox_kpt) * group_bbox_kpt ej = (matchj // group_bbox_kpt + 1) * group_bbox_kpt if sj > 0: - attn_mask2[:, :, refine_queries_num:, - refine_queries_num:][:, :, matchj, :sj] = True + attn_mask2[:, :, refine_queries_num:, refine_queries_num:][:, :, matchj, :sj] = True if ej < self.num_group * (self.num_keypoints + 1): - attn_mask2[:, :, refine_queries_num:, - refine_queries_num:][:, :, matchj, ej:] = True + attn_mask2[:, :, refine_queries_num:, refine_queries_num:][:, :, matchj, ej:] = True for match_x in range(self.num_group * (self.num_keypoints + 1)): if match_x % group_bbox_kpt == 0: - attn_mask2[:, :, refine_queries_num:, - refine_queries_num:][:, :, match_x, - kpt_index] = False + attn_mask2[:, :, refine_queries_num:, refine_queries_num:][:, :, match_x, kpt_index] = False - for idx, (gt_boxes_i, - gt_labels_i) in enumerate(zip(gt_boxes, gt_labels)): + for idx, (gt_boxes_i, gt_labels_i) in enumerate(zip(gt_boxes, gt_labels)): num_gt_i = gt_boxes_i.shape[0] if num_gt_i == 0: continue @@ -1321,26 +1146,19 @@ class EDPoseHead(TransformerHead): if si > 0: attn_mask2[idx, :, matchi, :si] = True if ei < refine_queries_num: - attn_mask2[idx, :, matchi, - ei:refine_queries_num] = True + attn_mask2[idx, :, matchi, ei:refine_queries_num] = True attn_mask2 = attn_mask2.flatten(0, 1) mask_dict = { - 'pad_size': refine_queries_num, - 'known_bboxs': gt_boxes_expand, - 'known_labels': gt_labels_expand, - 'known_keypoints': gt_keypoints_expand + "pad_size": refine_queries_num, + "known_bboxs": gt_boxes_expand, + "known_labels": gt_labels_expand, + "known_keypoints": gt_keypoints_expand, } - return input_query_label, input_query_bbox, \ - attn_mask, attn_mask2, mask_dict + return input_query_label, input_query_bbox, attn_mask, attn_mask2, mask_dict - def loss(self, - feats: Tuple[Tensor], - batch_data_samples: 
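Before the (unimplemented) `loss` method below, here is a toy sketch of the two denoising perturbations that `prepare_for_denosing` applies above, with hypothetical `dn_label_noise_ratio`/`dn_box_noise_scale` values: a random fraction of labels is flipped to random classes, and boxes are jittered proportionally to their own size, then clamped back into the image.

```python
import torch

num_dn, num_classes = 8, 2
dn_label_noise_ratio, dn_box_noise_scale = 0.2, 0.4  # hypothetical config values

labels = torch.zeros(num_dn, dtype=torch.int64)  # repeated GT labels
boxes = torch.rand(num_dn, 4)                    # repeated GT (cx, cy, w, h) in [0, 1]

# Label noise: flip a random subset of labels to random classes.
flip = torch.rand(num_dn) < dn_label_noise_ratio
labels[flip] = torch.randint(0, num_classes, (int(flip.sum()),))

# Box noise: jitter centers by up to half the box size and sizes by up to their
# full extent, scaled by dn_box_noise_scale, then clamp to the unit image.
diff = torch.zeros_like(boxes)
diff[:, :2] = boxes[:, 2:] / 2
diff[:, 2:] = boxes[:, 2:]
boxes = (boxes + (torch.rand_like(boxes) * 2 - 1.0) * diff * dn_box_noise_scale).clamp(0.0, 1.0)
print(labels, boxes.shape)
```

The group-to-group attention masks built above then keep each noised group from attending to the others, so every group denoises its own copy of the ground truth independently.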
OptSampleList,
-             train_cfg: OptConfigType = {}) -> dict:
+    def loss(self, feats: Tuple[Tensor], batch_data_samples: OptSampleList, train_cfg: OptConfigType = {}) -> dict:
         """Calculate losses from a batch of inputs and data samples."""
-        assert NotImplementedError(
-            'the training of EDPose has not been '
-            'supported. Please stay tuned for further update.')
+        raise NotImplementedError("the training of EDPose is not yet supported. Please stay tuned for further updates.")
diff --git a/mmpose/models/heads/transformer_heads/transformers/__init__.py b/mmpose/models/heads/transformer_heads/transformers/__init__.py
index 0e9f115cd1e770554ffc405a969d11e745b493ed..10550e9860e5ac2605bc1efc5870578a415677b6 100644
--- a/mmpose/models/heads/transformer_heads/transformers/__init__.py
+++ b/mmpose/models/heads/transformer_heads/transformers/__init__.py
@@ -1,16 +1,22 @@
 # Copyright (c) OpenMMLab. All rights reserved.
-from .deformable_detr_layers import (DeformableDetrTransformerDecoder,
-                                     DeformableDetrTransformerDecoderLayer,
-                                     DeformableDetrTransformerEncoder,
-                                     DeformableDetrTransformerEncoderLayer)
-from .detr_layers import (DetrTransformerDecoder, DetrTransformerDecoderLayer,
-                          DetrTransformerEncoder, DetrTransformerEncoderLayer)
+from .deformable_detr_layers import (
+    DeformableDetrTransformerDecoder,
+    DeformableDetrTransformerDecoderLayer,
+    DeformableDetrTransformerEncoder,
+    DeformableDetrTransformerEncoderLayer,
+)
+from .detr_layers import DetrTransformerDecoder, DetrTransformerDecoderLayer, DetrTransformerEncoder, DetrTransformerEncoderLayer
 from .utils import FFN, PositionEmbeddingSineHW
 
 __all__ = [
-    'DetrTransformerEncoder', 'DetrTransformerDecoder',
-    'DetrTransformerEncoderLayer', 'DetrTransformerDecoderLayer',
-    'DeformableDetrTransformerEncoder', 'DeformableDetrTransformerDecoder',
-    'DeformableDetrTransformerEncoderLayer',
-    'DeformableDetrTransformerDecoderLayer', 'PositionEmbeddingSineHW', 'FFN'
+    "DetrTransformerEncoder",
+    "DetrTransformerDecoder",
+    "DetrTransformerEncoderLayer",
+    "DetrTransformerDecoderLayer",
+    "DeformableDetrTransformerEncoder",
+    "DeformableDetrTransformerDecoder",
+    "DeformableDetrTransformerEncoderLayer",
+    "DeformableDetrTransformerDecoderLayer",
+    "PositionEmbeddingSineHW",
+    "FFN",
 ]
diff --git a/mmpose/models/heads/transformer_heads/transformers/deformable_detr_layers.py b/mmpose/models/heads/transformer_heads/transformers/deformable_detr_layers.py
index 149f04e469ba48ff1f1f8b8474e44ce74ecebdb0..c0e472fe0fa33648fa02257b243e7b4ae5748206 100644
--- a/mmpose/models/heads/transformer_heads/transformers/deformable_detr_layers.py
+++ b/mmpose/models/heads/transformer_heads/transformers/deformable_detr_layers.py
@@ -9,8 +9,8 @@ from mmengine.model import ModuleList
 from torch import Tensor, nn
 
 from mmpose.models.utils import inverse_sigmoid
-from .detr_layers import (DetrTransformerDecoder, DetrTransformerDecoderLayer,
-                          DetrTransformerEncoder, DetrTransformerEncoderLayer)
+
+from .detr_layers import DetrTransformerDecoder, DetrTransformerDecoderLayer, DetrTransformerEncoder, DetrTransformerEncoderLayer
 
 class DeformableDetrTransformerEncoder(DetrTransformerEncoder):
@@ -18,16 +18,19 @@ class DeformableDetrTransformerEncoder(DetrTransformerEncoder):
 
     def _init_layers(self) -> None:
         """Initialize encoder layers."""
-        self.layers = ModuleList([
-            DeformableDetrTransformerEncoderLayer(**self.layer_cfg)
-            for _ in range(self.num_layers)
-        ])
+        self.layers = ModuleList([DeformableDetrTransformerEncoderLayer(**self.layer_cfg) for _ in range(self.num_layers)])
self.embed_dims = self.layers[0].embed_dims - def forward(self, query: Tensor, query_pos: Tensor, - key_padding_mask: Tensor, spatial_shapes: Tensor, - level_start_index: Tensor, valid_ratios: Tensor, - **kwargs) -> Tensor: + def forward( + self, + query: Tensor, + query_pos: Tensor, + key_padding_mask: Tensor, + spatial_shapes: Tensor, + level_start_index: Tensor, + valid_ratios: Tensor, + **kwargs, + ) -> Tensor: """Forward function of Transformer encoder. Args: @@ -50,8 +53,7 @@ class DeformableDetrTransformerEncoder(DetrTransformerEncoder): called 'encoder output embeddings' or 'memory', has shape (bs, num_queries, dim) """ - reference_points = self.get_encoder_reference_points( - spatial_shapes, valid_ratios, device=query.device) + reference_points = self.get_encoder_reference_points(spatial_shapes, valid_ratios, device=query.device) for layer in self.layers: query = layer( query=query, @@ -61,14 +63,12 @@ class DeformableDetrTransformerEncoder(DetrTransformerEncoder): level_start_index=level_start_index, valid_ratios=valid_ratios, reference_points=reference_points, - **kwargs) + **kwargs, + ) return query @staticmethod - def get_encoder_reference_points(spatial_shapes: Tensor, - valid_ratios: Tensor, - device: Union[torch.device, - str]) -> Tensor: + def get_encoder_reference_points(spatial_shapes: Tensor, valid_ratios: Tensor, device: Union[torch.device, str]) -> Tensor: """Get the reference points used in encoder. Args: @@ -88,14 +88,11 @@ class DeformableDetrTransformerEncoder(DetrTransformerEncoder): reference_points_list = [] for lvl, (H, W) in enumerate(spatial_shapes): ref_y, ref_x = torch.meshgrid( - torch.linspace( - 0.5, H - 0.5, H, dtype=torch.float32, device=device), - torch.linspace( - 0.5, W - 0.5, W, dtype=torch.float32, device=device)) - ref_y = ref_y.reshape(-1)[None] / ( - valid_ratios[:, None, lvl, 1] * H) - ref_x = ref_x.reshape(-1)[None] / ( - valid_ratios[:, None, lvl, 0] * W) + torch.linspace(0.5, H - 0.5, H, dtype=torch.float32, device=device), + torch.linspace(0.5, W - 0.5, W, dtype=torch.float32, device=device), + ) + ref_y = ref_y.reshape(-1)[None] / (valid_ratios[:, None, lvl, 1] * H) + ref_x = ref_x.reshape(-1)[None] / (valid_ratios[:, None, lvl, 0] * W) ref = torch.stack((ref_x, ref_y), -1) reference_points_list.append(ref) reference_points = torch.cat(reference_points_list, 1) @@ -109,26 +106,24 @@ class DeformableDetrTransformerDecoder(DetrTransformerDecoder): def _init_layers(self) -> None: """Initialize decoder layers.""" - self.layers = ModuleList([ - DeformableDetrTransformerDecoderLayer(**self.layer_cfg) - for _ in range(self.num_layers) - ]) + self.layers = ModuleList([DeformableDetrTransformerDecoderLayer(**self.layer_cfg) for _ in range(self.num_layers)]) self.embed_dims = self.layers[0].embed_dims if self.post_norm_cfg is not None: - raise ValueError('There is not post_norm in ' - f'{self._get_name()}') - - def forward(self, - query: Tensor, - query_pos: Tensor, - value: Tensor, - key_padding_mask: Tensor, - reference_points: Tensor, - spatial_shapes: Tensor, - level_start_index: Tensor, - valid_ratios: Tensor, - reg_branches: Optional[nn.Module] = None, - **kwargs) -> Tuple[Tensor]: + raise ValueError("There is not post_norm in " f"{self._get_name()}") + + def forward( + self, + query: Tensor, + query_pos: Tensor, + value: Tensor, + key_padding_mask: Tensor, + reference_points: Tensor, + spatial_shapes: Tensor, + level_start_index: Tensor, + valid_ratios: Tensor, + reg_branches: Optional[nn.Module] = None, + **kwargs, + ) -> Tuple[Tensor]: 
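`get_encoder_reference_points` above assigns one normalized reference point to every feature-map cell, dividing by the per-image valid ratio so padded regions fall outside `(0, 1)`. A single-level, no-padding sketch:

```python
import torch

H, W = 4, 6
valid_ratio = torch.ones(1, 2)  # (bs, 2): fractions of (W, H) that are not padding

ref_y, ref_x = torch.meshgrid(
    torch.linspace(0.5, H - 0.5, H), torch.linspace(0.5, W - 0.5, W), indexing="ij"
)
# Normalize cell centers by the valid extent of the feature map.
ref_y = ref_y.reshape(-1)[None] / (valid_ratio[:, None, 1] * H)
ref_x = ref_x.reshape(-1)[None] / (valid_ratio[:, None, 0] * W)
reference_points = torch.stack((ref_x, ref_y), -1)  # (bs, H*W, 2), values in (0, 1)
print(reference_points.shape, reference_points.min().item(), reference_points.max().item())
```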
"""Forward function of Transformer decoder. Args: @@ -176,14 +171,10 @@ class DeformableDetrTransformerDecoder(DetrTransformerDecoder): intermediate_reference_points = [] for layer_id, layer in enumerate(self.layers): if reference_points.shape[-1] == 4: - reference_points_input = \ - reference_points[:, :, None] * \ - torch.cat([valid_ratios, valid_ratios], -1)[:, None] + reference_points_input = reference_points[:, :, None] * torch.cat([valid_ratios, valid_ratios], -1)[:, None] else: assert reference_points.shape[-1] == 2 - reference_points_input = \ - reference_points[:, :, None] * \ - valid_ratios[:, None] + reference_points_input = reference_points[:, :, None] * valid_ratios[:, None] output = layer( output, query_pos=query_pos, @@ -193,19 +184,18 @@ class DeformableDetrTransformerDecoder(DetrTransformerDecoder): level_start_index=level_start_index, valid_ratios=valid_ratios, reference_points=reference_points_input, - **kwargs) + **kwargs, + ) if reg_branches is not None: tmp_reg_preds = reg_branches[layer_id](output) if reference_points.shape[-1] == 4: - new_reference_points = tmp_reg_preds + inverse_sigmoid( - reference_points) + new_reference_points = tmp_reg_preds + inverse_sigmoid(reference_points) new_reference_points = new_reference_points.sigmoid() else: assert reference_points.shape[-1] == 2 new_reference_points = tmp_reg_preds - new_reference_points[..., :2] = tmp_reg_preds[ - ..., :2] + inverse_sigmoid(reference_points) + new_reference_points[..., :2] = tmp_reg_preds[..., :2] + inverse_sigmoid(reference_points) new_reference_points = new_reference_points.sigmoid() reference_points = new_reference_points.detach() @@ -214,8 +204,7 @@ class DeformableDetrTransformerDecoder(DetrTransformerDecoder): intermediate_reference_points.append(reference_points) if self.return_intermediate: - return torch.stack(intermediate), torch.stack( - intermediate_reference_points) + return torch.stack(intermediate), torch.stack(intermediate_reference_points) return output, reference_points @@ -228,10 +217,7 @@ class DeformableDetrTransformerEncoderLayer(DetrTransformerEncoderLayer): self.self_attn = MultiScaleDeformableAttention(**self.self_attn_cfg) self.embed_dims = self.self_attn.embed_dims self.ffn = FFN(**self.ffn_cfg) - norms_list = [ - build_norm_layer(self.norm_cfg, self.embed_dims)[1] - for _ in range(2) - ] + norms_list = [build_norm_layer(self.norm_cfg, self.embed_dims)[1] for _ in range(2)] self.norms = ModuleList(norms_list) @@ -244,8 +230,5 @@ class DeformableDetrTransformerDecoderLayer(DetrTransformerDecoderLayer): self.cross_attn = MultiScaleDeformableAttention(**self.cross_attn_cfg) self.embed_dims = self.self_attn.embed_dims self.ffn = FFN(**self.ffn_cfg) - norms_list = [ - build_norm_layer(self.norm_cfg, self.embed_dims)[1] - for _ in range(3) - ] + norms_list = [build_norm_layer(self.norm_cfg, self.embed_dims)[1] for _ in range(3)] self.norms = ModuleList(norms_list) diff --git a/mmpose/models/heads/transformer_heads/transformers/detr_layers.py b/mmpose/models/heads/transformer_heads/transformers/detr_layers.py index a669c5dda6c7693d405be0abc11b7486cce896e7..179070a12768e161bd68787b6666499888894782 100644 --- a/mmpose/models/heads/transformer_heads/transformers/detr_layers.py +++ b/mmpose/models/heads/transformer_heads/transformers/detr_layers.py @@ -22,10 +22,7 @@ class DetrTransformerEncoder(BaseModule): the initialization. Defaults to None. 
""" - def __init__(self, - num_layers: int, - layer_cfg: ConfigType, - init_cfg: OptConfigType = None) -> None: + def __init__(self, num_layers: int, layer_cfg: ConfigType, init_cfg: OptConfigType = None) -> None: super().__init__(init_cfg=init_cfg) self.num_layers = num_layers @@ -34,14 +31,10 @@ class DetrTransformerEncoder(BaseModule): def _init_layers(self) -> None: """Initialize encoder layers.""" - self.layers = ModuleList([ - DetrTransformerEncoderLayer(**self.layer_cfg) - for _ in range(self.num_layers) - ]) + self.layers = ModuleList([DetrTransformerEncoderLayer(**self.layer_cfg) for _ in range(self.num_layers)]) self.embed_dims = self.layers[0].embed_dims - def forward(self, query: Tensor, query_pos: Tensor, - key_padding_mask: Tensor, **kwargs) -> Tensor: + def forward(self, query: Tensor, query_pos: Tensor, key_padding_mask: Tensor, **kwargs) -> Tensor: """Forward function of encoder. Args: @@ -76,12 +69,14 @@ class DetrTransformerDecoder(BaseModule): the initialization. Defaults to None. """ - def __init__(self, - num_layers: int, - layer_cfg: ConfigType, - post_norm_cfg: OptConfigType = dict(type='LN'), - return_intermediate: bool = True, - init_cfg: Union[dict, ConfigDict] = None) -> None: + def __init__( + self, + num_layers: int, + layer_cfg: ConfigType, + post_norm_cfg: OptConfigType = dict(type="LN"), + return_intermediate: bool = True, + init_cfg: Union[dict, ConfigDict] = None, + ) -> None: super().__init__(init_cfg=init_cfg) self.layer_cfg = layer_cfg self.num_layers = num_layers @@ -91,17 +86,13 @@ class DetrTransformerDecoder(BaseModule): def _init_layers(self) -> None: """Initialize decoder layers.""" - self.layers = ModuleList([ - DetrTransformerDecoderLayer(**self.layer_cfg) - for _ in range(self.num_layers) - ]) + self.layers = ModuleList([DetrTransformerDecoderLayer(**self.layer_cfg) for _ in range(self.num_layers)]) self.embed_dims = self.layers[0].embed_dims - self.post_norm = build_norm_layer(self.post_norm_cfg, - self.embed_dims)[1] + self.post_norm = build_norm_layer(self.post_norm_cfg, self.embed_dims)[1] - def forward(self, query: Tensor, key: Tensor, value: Tensor, - query_pos: Tensor, key_pos: Tensor, key_padding_mask: Tensor, - **kwargs) -> Tensor: + def forward( + self, query: Tensor, key: Tensor, value: Tensor, query_pos: Tensor, key_pos: Tensor, key_padding_mask: Tensor, **kwargs + ) -> Tensor: """Forward function of decoder Args: query (Tensor): The input query, has shape (bs, num_queries, dim). @@ -121,14 +112,7 @@ class DetrTransformerDecoder(BaseModule): """ intermediate = [] for layer in self.layers: - query = layer( - query, - key=key, - value=value, - query_pos=query_pos, - key_pos=key_pos, - key_padding_mask=key_padding_mask, - **kwargs) + query = layer(query, key=key, value=value, query_pos=query_pos, key_pos=key_pos, key_padding_mask=key_padding_mask, **kwargs) if self.return_intermediate: intermediate.append(self.post_norm(query)) query = self.post_norm(query) @@ -153,27 +137,27 @@ class DetrTransformerEncoderLayer(BaseModule): the initialization. Defaults to None. 
""" - def __init__(self, - self_attn_cfg: OptConfigType = dict( - embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg: OptConfigType = dict( - embed_dims=256, - feedforward_channels=1024, - num_fcs=2, - ffn_drop=0., - act_cfg=dict(type='ReLU', inplace=True)), - norm_cfg: OptConfigType = dict(type='LN'), - init_cfg: OptConfigType = None) -> None: + def __init__( + self, + self_attn_cfg: OptConfigType = dict(embed_dims=256, num_heads=8, dropout=0.0), + ffn_cfg: OptConfigType = dict( + embed_dims=256, feedforward_channels=1024, num_fcs=2, ffn_drop=0.0, act_cfg=dict(type="ReLU", inplace=True) + ), + norm_cfg: OptConfigType = dict(type="LN"), + init_cfg: OptConfigType = None, + ) -> None: super().__init__(init_cfg=init_cfg) self.self_attn_cfg = self_attn_cfg - if 'batch_first' not in self.self_attn_cfg: - self.self_attn_cfg['batch_first'] = True + if "batch_first" not in self.self_attn_cfg: + self.self_attn_cfg["batch_first"] = True else: - assert self.self_attn_cfg['batch_first'] is True, 'First \ + assert ( + self.self_attn_cfg["batch_first"] is True + ), "First \ dimension of all DETRs in mmdet is `batch`, \ - please set `batch_first` flag.' + please set `batch_first` flag." self.ffn_cfg = ffn_cfg self.norm_cfg = norm_cfg @@ -184,14 +168,10 @@ class DetrTransformerEncoderLayer(BaseModule): self.self_attn = MultiheadAttention(**self.self_attn_cfg) self.embed_dims = self.self_attn.embed_dims self.ffn = FFN(**self.ffn_cfg) - norms_list = [ - build_norm_layer(self.norm_cfg, self.embed_dims)[1] - for _ in range(2) - ] + norms_list = [build_norm_layer(self.norm_cfg, self.embed_dims)[1] for _ in range(2)] self.norms = ModuleList(norms_list) - def forward(self, query: Tensor, query_pos: Tensor, - key_padding_mask: Tensor, **kwargs) -> Tensor: + def forward(self, query: Tensor, query_pos: Tensor, key_padding_mask: Tensor, **kwargs) -> Tensor: """Forward function of an encoder layer. Args: @@ -204,13 +184,8 @@ class DetrTransformerEncoderLayer(BaseModule): Tensor: forwarded results, has shape (bs, num_queries, dim). """ query = self.self_attn( - query=query, - key=query, - value=query, - query_pos=query_pos, - key_pos=query_pos, - key_padding_mask=key_padding_mask, - **kwargs) + query=query, key=query, value=query, query_pos=query_pos, key_pos=query_pos, key_padding_mask=key_padding_mask, **kwargs + ) query = self.norms[0](query) query = self.ffn(query) query = self.norms[1](query) @@ -234,44 +209,42 @@ class DetrTransformerDecoderLayer(BaseModule): the initialization. Defaults to None. 
""" - def __init__(self, - self_attn_cfg: OptConfigType = dict( - embed_dims=256, - num_heads=8, - dropout=0.0, - batch_first=True), - cross_attn_cfg: OptConfigType = dict( - embed_dims=256, - num_heads=8, - dropout=0.0, - batch_first=True), - ffn_cfg: OptConfigType = dict( - embed_dims=256, - feedforward_channels=1024, - num_fcs=2, - ffn_drop=0., - act_cfg=dict(type='ReLU', inplace=True), - ), - norm_cfg: OptConfigType = dict(type='LN'), - init_cfg: OptConfigType = None) -> None: + def __init__( + self, + self_attn_cfg: OptConfigType = dict(embed_dims=256, num_heads=8, dropout=0.0, batch_first=True), + cross_attn_cfg: OptConfigType = dict(embed_dims=256, num_heads=8, dropout=0.0, batch_first=True), + ffn_cfg: OptConfigType = dict( + embed_dims=256, + feedforward_channels=1024, + num_fcs=2, + ffn_drop=0.0, + act_cfg=dict(type="ReLU", inplace=True), + ), + norm_cfg: OptConfigType = dict(type="LN"), + init_cfg: OptConfigType = None, + ) -> None: super().__init__(init_cfg=init_cfg) self.self_attn_cfg = self_attn_cfg self.cross_attn_cfg = cross_attn_cfg - if 'batch_first' not in self.self_attn_cfg: - self.self_attn_cfg['batch_first'] = True + if "batch_first" not in self.self_attn_cfg: + self.self_attn_cfg["batch_first"] = True else: - assert self.self_attn_cfg['batch_first'] is True, 'First \ + assert ( + self.self_attn_cfg["batch_first"] is True + ), "First \ dimension of all DETRs in mmdet is `batch`, \ - please set `batch_first` flag.' + please set `batch_first` flag." - if 'batch_first' not in self.cross_attn_cfg: - self.cross_attn_cfg['batch_first'] = True + if "batch_first" not in self.cross_attn_cfg: + self.cross_attn_cfg["batch_first"] = True else: - assert self.cross_attn_cfg['batch_first'] is True, 'First \ + assert ( + self.cross_attn_cfg["batch_first"] is True + ), "First \ dimension of all DETRs in mmdet is `batch`, \ - please set `batch_first` flag.' + please set `batch_first` flag." self.ffn_cfg = ffn_cfg self.norm_cfg = norm_cfg @@ -283,22 +256,21 @@ class DetrTransformerDecoderLayer(BaseModule): self.cross_attn = MultiheadAttention(**self.cross_attn_cfg) self.embed_dims = self.self_attn.embed_dims self.ffn = FFN(**self.ffn_cfg) - norms_list = [ - build_norm_layer(self.norm_cfg, self.embed_dims)[1] - for _ in range(3) - ] + norms_list = [build_norm_layer(self.norm_cfg, self.embed_dims)[1] for _ in range(3)] self.norms = ModuleList(norms_list) - def forward(self, - query: Tensor, - key: Tensor = None, - value: Tensor = None, - query_pos: Tensor = None, - key_pos: Tensor = None, - self_attn_mask: Tensor = None, - cross_attn_mask: Tensor = None, - key_padding_mask: Tensor = None, - **kwargs) -> Tensor: + def forward( + self, + query: Tensor, + key: Tensor = None, + value: Tensor = None, + query_pos: Tensor = None, + key_pos: Tensor = None, + self_attn_mask: Tensor = None, + cross_attn_mask: Tensor = None, + key_padding_mask: Tensor = None, + **kwargs, + ) -> Tensor: """ Args: query (Tensor): The input query, has shape (bs, num_queries, dim). 
@@ -330,13 +302,8 @@ class DetrTransformerDecoderLayer(BaseModule): """ query = self.self_attn( - query=query, - key=query, - value=query, - query_pos=query_pos, - key_pos=query_pos, - attn_mask=self_attn_mask, - **kwargs) + query=query, key=query, value=query, query_pos=query_pos, key_pos=query_pos, attn_mask=self_attn_mask, **kwargs + ) query = self.norms[0](query) query = self.cross_attn( query=query, @@ -346,7 +313,8 @@ class DetrTransformerDecoderLayer(BaseModule): key_pos=key_pos, attn_mask=cross_attn_mask, key_padding_mask=key_padding_mask, - **kwargs) + **kwargs, + ) query = self.norms[1](query) query = self.ffn(query) query = self.norms[2](query) diff --git a/mmpose/models/heads/transformer_heads/transformers/utils.py b/mmpose/models/heads/transformer_heads/transformers/utils.py index 7d7c086dc838bc45347f1c7de9f3a7718e15a3cd..9ed5379819b7d9a184fdde5bdab9fdd6e9f1d158 100644 --- a/mmpose/models/heads/transformer_heads/transformers/utils.py +++ b/mmpose/models/heads/transformer_heads/transformers/utils.py @@ -19,8 +19,7 @@ class FFN(BaseModule): num_layers (int): Number of FFN layers.. """ - def __init__(self, input_dim: int, hidden_dim: int, output_dim: int, - num_layers: int) -> None: + def __init__(self, input_dim: int, hidden_dim: int, output_dim: int, num_layers: int) -> None: super().__init__() self.num_layers = num_layers @@ -53,19 +52,14 @@ class PositionEmbeddingSineHW(BaseModule): to the one used by the Attention is all you need paper, generalized to work on images.""" - def __init__(self, - num_pos_feats=64, - temperatureH=10000, - temperatureW=10000, - normalize=False, - scale=None): + def __init__(self, num_pos_feats=64, temperatureH=10000, temperatureW=10000, normalize=False, scale=None): super().__init__() self.num_pos_feats = num_pos_feats self.temperatureH = temperatureH self.temperatureW = temperatureW self.normalize = normalize if scale is not None and normalize is False: - raise ValueError('normalize should be True if scale is passed') + raise ValueError("normalize should be True if scale is passed") if scale is None: scale = 2 * math.pi self.scale = scale @@ -82,22 +76,16 @@ class PositionEmbeddingSineHW(BaseModule): y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale - dim_tx = torch.arange( - self.num_pos_feats, dtype=torch.float32, device=mask.device) - dim_tx = self.temperatureW**(2 * (dim_tx // 2) / self.num_pos_feats) + dim_tx = torch.arange(self.num_pos_feats, dtype=torch.float32, device=mask.device) + dim_tx = self.temperatureW ** (2 * (dim_tx // 2) / self.num_pos_feats) pos_x = x_embed[:, :, :, None] / dim_tx - dim_ty = torch.arange( - self.num_pos_feats, dtype=torch.float32, device=mask.device) - dim_ty = self.temperatureH**(2 * (dim_ty // 2) / self.num_pos_feats) + dim_ty = torch.arange(self.num_pos_feats, dtype=torch.float32, device=mask.device) + dim_ty = self.temperatureH ** (2 * (dim_ty // 2) / self.num_pos_feats) pos_y = y_embed[:, :, :, None] / dim_ty - pos_x = torch.stack( - (pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), - dim=4).flatten(3) - pos_y = torch.stack( - (pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), - dim=4).flatten(3) + pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3) + pos_y = torch.stack((pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4).flatten(3) pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2) return pos diff --git a/mmpose/models/losses/__init__.py 
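Before moving on to the losses package: `PositionEmbeddingSineHW` above generalizes the DETR sine embedding by allowing different temperatures along height and width. A functional sketch of its normalized path (same math as above, but `normalize=True` is hardcoded, device handling is dropped, and the function name is ours, not the library's):

```python
import math
import torch

def sine_pos_embed_hw(mask, num_pos_feats=128, temp_h=10000, temp_w=10000):
    """Sine/cosine position embedding with separate H and W temperatures."""
    not_mask = ~mask                                   # (bs, H, W), True = valid pixel
    y_embed = not_mask.cumsum(1, dtype=torch.float32)  # row index over valid rows
    x_embed = not_mask.cumsum(2, dtype=torch.float32)
    scale, eps = 2 * math.pi, 1e-6
    y_embed = y_embed / (y_embed[:, -1:, :] + eps) * scale  # normalize to (0, 2*pi]
    x_embed = x_embed / (x_embed[:, :, -1:] + eps) * scale

    dim_tx = torch.arange(num_pos_feats, dtype=torch.float32)
    dim_tx = temp_w ** (2 * (dim_tx // 2) / num_pos_feats)
    pos_x = x_embed[:, :, :, None] / dim_tx
    dim_ty = torch.arange(num_pos_feats, dtype=torch.float32)
    dim_ty = temp_h ** (2 * (dim_ty // 2) / num_pos_feats)
    pos_y = y_embed[:, :, :, None] / dim_ty

    # Interleave sin/cos over the feature dimension, then stack y before x.
    pos_x = torch.stack((pos_x[..., 0::2].sin(), pos_x[..., 1::2].cos()), dim=4).flatten(3)
    pos_y = torch.stack((pos_y[..., 0::2].sin(), pos_y[..., 1::2].cos()), dim=4).flatten(3)
    return torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)  # (bs, 2*num_pos_feats, H, W)

mask = torch.zeros(1, 8, 8, dtype=torch.bool)  # no padding
print(sine_pos_embed_hw(mask).shape)           # torch.Size([1, 256, 8, 8])
```

With `num_pos_feats=128` this yields the usual 256-channel embedding; distinct `temp_h`/`temp_w` let the frequency spectrum differ per axis, which is the point of the HW variant.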
b/mmpose/models/losses/__init__.py index 9fbf6ba1de561a5d772a224e5d806c5996aacff2..bd392d48e8140859c2dfee1d51e002408002a427 100644 --- a/mmpose/models/losses/__init__.py +++ b/mmpose/models/losses/__init__.py @@ -1,27 +1,55 @@ # Copyright (c) OpenMMLab. All rights reserved. from .ae_loss import AssociativeEmbeddingLoss from .bbox_loss import IoULoss -from .classification_loss import (BCELoss, JSDiscretLoss, KLDiscretLoss, - VariFocalLoss) +from .classification_loss import BCELoss, JSDiscretLoss, KLDiscretLoss, VariFocalLoss from .fea_dis_loss import FeaLoss -from .heatmap_loss import (AdaptiveWingLoss, KeypointMSELoss, - KeypointOHKMMSELoss, MLECCLoss, - OKSHeatmapLoss, CalibrationLoss) +from .heatmap_loss import AdaptiveWingLoss, CalibrationLoss, KeypointMSELoss, KeypointOHKMMSELoss, MLECCLoss, OKSHeatmapLoss from .logit_dis_loss import KDLoss from .loss_wrappers import CombinedLoss, MultipleLossWrapper -from .regression_loss import (BoneLoss, L1Loss, MPJPELoss, - MPJPEVelocityJointLoss, MSELoss, OKSLoss, - RLELoss, SemiSupervisionLoss, SmoothL1Loss, - SoftWeightSmoothL1Loss, SoftWingLoss, WingLoss, - L1LogLoss) +from .regression_loss import ( + BoneLoss, + L1LogLoss, + L1Loss, + MPJPELoss, + MPJPEVelocityJointLoss, + MSELoss, + OKSLoss, + RLELoss, + SemiSupervisionLoss, + SmoothL1Loss, + SoftWeightSmoothL1Loss, + SoftWingLoss, + WingLoss, +) __all__ = [ - 'KeypointMSELoss', 'KeypointOHKMMSELoss', 'SmoothL1Loss', 'WingLoss', - 'MPJPELoss', 'MSELoss', 'L1Loss', 'BCELoss', 'BoneLoss', - 'SemiSupervisionLoss', 'SoftWingLoss', 'AdaptiveWingLoss', 'RLELoss', - 'KLDiscretLoss', 'MultipleLossWrapper', 'JSDiscretLoss', 'CombinedLoss', - 'AssociativeEmbeddingLoss', 'SoftWeightSmoothL1Loss', - 'MPJPEVelocityJointLoss', 'FeaLoss', 'KDLoss', 'OKSLoss', 'IoULoss', - 'VariFocalLoss', 'MLECCLoss', 'L1LogLoss', 'OKSHeatmapLoss', - 'CalibrationLoss' + "KeypointMSELoss", + "KeypointOHKMMSELoss", + "SmoothL1Loss", + "WingLoss", + "MPJPELoss", + "MSELoss", + "L1Loss", + "BCELoss", + "BoneLoss", + "SemiSupervisionLoss", + "SoftWingLoss", + "AdaptiveWingLoss", + "RLELoss", + "KLDiscretLoss", + "MultipleLossWrapper", + "JSDiscretLoss", + "CombinedLoss", + "AssociativeEmbeddingLoss", + "SoftWeightSmoothL1Loss", + "MPJPEVelocityJointLoss", + "FeaLoss", + "KDLoss", + "OKSLoss", + "IoULoss", + "VariFocalLoss", + "MLECCLoss", + "L1LogLoss", + "OKSHeatmapLoss", + "CalibrationLoss", ] diff --git a/mmpose/models/losses/ae_loss.py b/mmpose/models/losses/ae_loss.py index 1f1e08181beaf835238596d95fe509b122c64b3d..faa515cc44163bf321df4f96a9cd484c3b2eb41f 100644 --- a/mmpose/models/losses/ae_loss.py +++ b/mmpose/models/losses/ae_loss.py @@ -32,9 +32,7 @@ class AssociativeEmbeddingLoss(nn.Module): the push loss and the pull loss. 
Defaults to 0.5 """ - def __init__(self, - loss_weight: float = 1.0, - push_loss_factor: float = 0.5) -> None: + def __init__(self, loss_weight: float = 1.0, push_loss_factor: float = 0.5) -> None: super().__init__() self.loss_weight = loss_weight self.push_loss_factor = push_loss_factor @@ -72,9 +70,7 @@ class AssociativeEmbeddingLoss(nn.Module): pull_loss = tags.new_zeros(size=(), requires_grad=True) push_loss = tags.new_zeros(size=(), requires_grad=True) else: - pull_loss = sum( - F.mse_loss(_kpt_tags, _tag.expand_as(_kpt_tags)) - for (_kpt_tags, _tag) in zip(instance_kpt_tags, instance_tags)) + pull_loss = sum(F.mse_loss(_kpt_tags, _tag.expand_as(_kpt_tags)) for (_kpt_tags, _tag) in zip(instance_kpt_tags, instance_tags)) if N == 1: push_loss = tags.new_zeros(size=(), requires_grad=True) @@ -90,8 +86,7 @@ class AssociativeEmbeddingLoss(nn.Module): return pull_loss, push_loss - def forward(self, tags: Tensor, keypoint_indices: Union[List[Tensor], - Tensor]): + def forward(self, tags: Tensor, keypoint_indices: Union[List[Tensor], Tensor]): """Compute associative embedding loss on a batch of data. Args: @@ -111,12 +106,11 @@ class AssociativeEmbeddingLoss(nn.Module): assert tags.shape[0] == len(keypoint_indices) - pull_loss = 0. - push_loss = 0. + pull_loss = 0.0 + push_loss = 0.0 for i in range(tags.shape[0]): - _pull, _push = self._ae_loss_per_image(tags[i], - keypoint_indices[i]) + _pull, _push = self._ae_loss_per_image(tags[i], keypoint_indices[i]) pull_loss += _pull * self.loss_weight push_loss += _push * self.loss_weight * self.push_loss_factor diff --git a/mmpose/models/losses/bbox_loss.py b/mmpose/models/losses/bbox_loss.py index 2694076b26e59625e52ac0fc7fa1045c39574b72..32181fb00deba1b4a6e98cc42146d3cb274b3ec6 100644 --- a/mmpose/models/losses/bbox_loss.py +++ b/mmpose/models/losses/bbox_loss.py @@ -20,23 +20,19 @@ class IoULoss(nn.Module): Default: 'log' """ - def __init__(self, - reduction='mean', - mode='log', - eps: float = 1e-16, - loss_weight=1.): + def __init__(self, reduction="mean", mode="log", eps: float = 1e-16, loss_weight=1.0): super().__init__() - assert reduction in ('mean', 'sum', 'none'), f'the argument ' \ - f'`reduction` should be either \'mean\', \'sum\' or \'none\', ' \ - f'but got {reduction}' + assert reduction in ("mean", "sum", "none"), ( + f"the argument " f"`reduction` should be either 'mean', 'sum' or 'none', " f"but got {reduction}" + ) - assert mode in ('linear', 'square', 'log'), f'the argument ' \ - f'`reduction` should be either \'linear\', \'square\' or ' \ - f'\'log\', but got {mode}' + assert mode in ("linear", "square", "log"), ( + f"the argument " f"`reduction` should be either 'linear', 'square' or " f"'log', but got {mode}" + ) self.reduction = reduction - self.criterion = partial(F.cross_entropy, reduction='none') + self.criterion = partial(F.cross_entropy, reduction="none") self.loss_weight = loss_weight self.mode = mode self.eps = eps @@ -52,14 +48,13 @@ class IoULoss(nn.Module): output (torch.Tensor[N, K]): Output classification. target (torch.Tensor[N, K]): Target classification. 
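Looping back to `AssociativeEmbeddingLoss` above: the pull term draws each keypoint tag towards its instance's mean tag, exactly as in `_ae_loss_per_image`. A hand-computed toy example follows; note that the push term shown here is one common Gaussian form, `exp(-diff^2)` summed over instance pairs with the self-pairs removed, which is an assumption on our part since the full push-loss code is elided in the hunks above.

```python
import torch
import torch.nn.functional as F

# Tags of the visible keypoints of two instances (embedding dim 1 for brevity).
instance_kpt_tags = [torch.tensor([1.0, 1.2]), torch.tensor([3.0, 2.8])]
instance_tags = [t.mean() for t in instance_kpt_tags]  # per-instance mean tag

# Pull: each keypoint tag is drawn towards its own instance's mean tag.
pull_loss = sum(
    F.mse_loss(kpt_tags, tag.expand_as(kpt_tags))
    for kpt_tags, tag in zip(instance_kpt_tags, instance_tags)
)

# Push (assumed form): Gaussian penalty on pairwise distances between
# instance mean tags, with the N self-pairs subtracted off.
tags = torch.stack(instance_tags)         # (N,)
diff = tags[None, :] - tags[:, None]      # (N, N) pairwise differences
push_loss = torch.exp(-diff.pow(2)).sum() - len(tags)

print(f"pull={pull_loss:.4f} push={push_loss:.4f}")
```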
""" - ious = bbox_overlaps( - output, target, is_aligned=True).clamp(min=self.eps) + ious = bbox_overlaps(output, target, is_aligned=True).clamp(min=self.eps) - if self.mode == 'linear': + if self.mode == "linear": loss = 1 - ious - elif self.mode == 'square': + elif self.mode == "square": loss = 1 - ious.pow(2) - elif self.mode == 'log': + elif self.mode == "log": loss = -ious.log() else: raise NotImplementedError @@ -69,9 +64,9 @@ class IoULoss(nn.Module): target_weight = target_weight.unsqueeze(-1) loss = loss * target_weight - if self.reduction == 'sum': + if self.reduction == "sum": loss = loss.sum() - elif self.reduction == 'mean': + elif self.reduction == "mean": loss = loss.mean() return loss * self.loss_weight diff --git a/mmpose/models/losses/classification_loss.py b/mmpose/models/losses/classification_loss.py index 0b70d88cfa7739fa42ff4b5c219db26f6e78e987..815668091208fd44f073148ac19f68769f9f1359 100644 --- a/mmpose/models/losses/classification_loss.py +++ b/mmpose/models/losses/classification_loss.py @@ -21,22 +21,17 @@ class BCELoss(nn.Module): before output. Defaults to False. """ - def __init__(self, - use_target_weight=False, - loss_weight=1., - reduction='mean', - use_sigmoid=False): + def __init__(self, use_target_weight=False, loss_weight=1.0, reduction="mean", use_sigmoid=False): super().__init__() - assert reduction in ('mean', 'sum', 'none'), f'the argument ' \ - f'`reduction` should be either \'mean\', \'sum\' or \'none\', ' \ - f'but got {reduction}' + assert reduction in ("mean", "sum", "none"), ( + f"the argument " f"`reduction` should be either 'mean', 'sum' or 'none', " f"but got {reduction}" + ) self.reduction = reduction self.use_sigmoid = use_sigmoid - criterion = F.binary_cross_entropy if use_sigmoid \ - else F.binary_cross_entropy_with_logits - self.criterion = partial(criterion, reduction='none') + criterion = F.binary_cross_entropy if use_sigmoid else F.binary_cross_entropy_with_logits + self.criterion = partial(criterion, reduction="none") self.use_target_weight = use_target_weight self.loss_weight = loss_weight @@ -59,13 +54,13 @@ class BCELoss(nn.Module): loss = self.criterion(output, target) if target_weight.dim() == 1: target_weight = target_weight[:, None] - loss = (loss * target_weight) + loss = loss * target_weight else: loss = self.criterion(output, target) - if self.reduction == 'sum': + if self.reduction == "sum": loss = loss.sum() - elif self.reduction == 'mean': + elif self.reduction == "mean": loss = loss.mean() return loss * self.loss_weight @@ -92,7 +87,7 @@ class JSDiscretLoss(nn.Module): super(JSDiscretLoss, self).__init__() self.use_target_weight = use_target_weight self.size_average = size_average - self.kl_loss = nn.KLDivLoss(reduction='none') + self.kl_loss = nn.KLDivLoss(reduction="none") def kl(self, p, q): """Kullback-Leibler Divergence.""" @@ -156,13 +151,7 @@ class KLDiscretLoss(nn.Module): mask_weight (float): Weight of masked keypoints. Default: 1.0. 
""" - def __init__(self, - beta=1.0, - label_softmax=False, - label_beta=10.0, - use_target_weight=True, - mask=None, - mask_weight=1.0): + def __init__(self, beta=1.0, label_softmax=False, label_beta=10.0, use_target_weight=True, mask=None, mask_weight=1.0): super(KLDiscretLoss, self).__init__() self.beta = beta self.label_softmax = label_softmax @@ -172,7 +161,7 @@ class KLDiscretLoss(nn.Module): self.mask_weight = mask_weight self.log_softmax = nn.LogSoftmax(dim=1) - self.kl_loss = nn.KLDivLoss(reduction='none') + self.kl_loss = nn.KLDivLoss(reduction="none") def criterion(self, dec_outs, labels): """Criterion function.""" @@ -198,7 +187,7 @@ class KLDiscretLoss(nn.Module): if self.use_target_weight: weight = target_weight.reshape(-1) else: - weight = 1. + weight = 1.0 for pred, target in zip(pred_simcc, gt_simcc): pred = pred.reshape(-1, pred.size(-1)) @@ -233,8 +222,7 @@ class InfoNCELoss(nn.Module): def __init__(self, temperature: float = 1.0, loss_weight=1.0) -> None: super(InfoNCELoss, self).__init__() - assert temperature > 0, f'the argument `temperature` must be ' \ - f'positive, but got {temperature}' + assert temperature > 0, f"the argument `temperature` must be " f"positive, but got {temperature}" self.temp = temperature self.loss_weight = loss_weight @@ -252,7 +240,7 @@ class InfoNCELoss(nn.Module): features_norm = F.normalize(features, dim=1) logits = features_norm.mm(features_norm.t()) / self.temp targets = torch.arange(n, dtype=torch.long, device=features.device) - loss = F.cross_entropy(logits, targets, reduction='sum') + loss = F.cross_entropy(logits, targets, reduction="sum") return loss * self.loss_weight @@ -271,17 +259,12 @@ class VariFocalLoss(nn.Module): Defaults to 2.0. """ - def __init__(self, - use_target_weight=False, - loss_weight=1., - reduction='mean', - alpha=0.75, - gamma=2.0): + def __init__(self, use_target_weight=False, loss_weight=1.0, reduction="mean", alpha=0.75, gamma=2.0): super().__init__() - assert reduction in ('mean', 'sum', 'none'), f'the argument ' \ - f'`reduction` should be either \'mean\', \'sum\' or \'none\', ' \ - f'but got {reduction}' + assert reduction in ("mean", "sum", "none"), ( + f"the argument " f"`reduction` should be either 'mean', 'sum' or 'none', " f"but got {reduction}" + ) self.reduction = reduction self.use_target_weight = use_target_weight @@ -291,12 +274,9 @@ class VariFocalLoss(nn.Module): def criterion(self, output, target): label = (target > 1e-4).to(target) - weight = self.alpha * output.sigmoid().pow( - self.gamma) * (1 - label) + target + weight = self.alpha * output.sigmoid().pow(self.gamma) * (1 - label) + target output = output.clip(min=-10, max=10) - vfl = ( - F.binary_cross_entropy_with_logits( - output, target, reduction='none') * weight) + vfl = F.binary_cross_entropy_with_logits(output, target, reduction="none") * weight return vfl def forward(self, output, target, target_weight=None): @@ -318,16 +298,16 @@ class VariFocalLoss(nn.Module): loss = self.criterion(output, target) if target_weight.dim() == 1: target_weight = target_weight.unsqueeze(1) - loss = (loss * target_weight) + loss = loss * target_weight else: loss = self.criterion(output, target) loss[torch.isinf(loss)] = 0.0 loss[torch.isnan(loss)] = 0.0 - if self.reduction == 'sum': + if self.reduction == "sum": loss = loss.sum() - elif self.reduction == 'mean': + elif self.reduction == "mean": loss = loss.mean() return loss * self.loss_weight diff --git a/mmpose/models/losses/fea_dis_loss.py b/mmpose/models/losses/fea_dis_loss.py index 
b90ca9d24f56139de25ed95b8f2d19e6012cb516..7343827de4e35b276f21808c6a3d81c935bb35e1 100644 --- a/mmpose/models/losses/fea_dis_loss.py +++ b/mmpose/models/losses/fea_dis_loss.py @@ -28,12 +28,7 @@ class FeaLoss(nn.Module): self.alpha_fea = alpha_fea if teacher_channels != student_channels: - self.align = nn.Conv2d( - student_channels, - teacher_channels, - kernel_size=1, - stride=1, - padding=0) + self.align = nn.Conv2d(student_channels, teacher_channels, kernel_size=1, stride=1, padding=0) else: self.align = None @@ -55,7 +50,7 @@ class FeaLoss(nn.Module): return loss def get_dis_loss(self, preds_S, preds_T): - loss_mse = nn.MSELoss(reduction='sum') + loss_mse = nn.MSELoss(reduction="sum") N, C, H, W = preds_T.shape dis_loss = loss_mse(preds_S, preds_T) / N * self.alpha_fea diff --git a/mmpose/models/losses/heatmap_loss.py b/mmpose/models/losses/heatmap_loss.py index 908e3636bdf922fee10fbee0c06bc084b18a0940..1f717a53fe07b6589f6f9196f2d573db47d788d1 100644 --- a/mmpose/models/losses/heatmap_loss.py +++ b/mmpose/models/losses/heatmap_loss.py @@ -24,22 +24,21 @@ class KeypointMSELoss(nn.Module): loss_weight (float): Weight of the loss. Defaults to 1.0 """ - def __init__(self, - use_target_weight: bool = False, - skip_empty_channel: bool = False, - loss_weight: float = 1.): + def __init__(self, use_target_weight: bool = False, skip_empty_channel: bool = False, loss_weight: float = 1.0): super().__init__() self.use_target_weight = use_target_weight self.skip_empty_channel = skip_empty_channel self.loss_weight = loss_weight - def forward(self, - output: Tensor, - target: Tensor, - target_weights: Optional[Tensor] = None, - mask: Optional[Tensor] = None, - per_keypoint: bool = False, - per_pixel: bool = False) -> Tensor: + def forward( + self, + output: Tensor, + target: Tensor, + target_weights: Optional[Tensor] = None, + mask: Optional[Tensor] = None, + per_keypoint: bool = False, + per_pixel: bool = False, + ) -> Tensor: """Forward function of loss. Note: @@ -63,9 +62,9 @@ class KeypointMSELoss(nn.Module): """ _mask = self._get_mask(target, target_weights, mask) - - _loss = F.mse_loss(output, target, reduction='none') - + + _loss = F.mse_loss(output, target, reduction="none") + if _mask is not None: loss = _loss * _mask @@ -78,8 +77,7 @@ class KeypointMSELoss(nn.Module): return loss * self.loss_weight - def _get_mask(self, target: Tensor, target_weights: Optional[Tensor], - mask: Optional[Tensor]) -> Optional[Tensor]: + def _get_mask(self, target: Tensor, target_weights: Optional[Tensor], mask: Optional[Tensor]) -> Optional[Tensor]: """Generate the heatmap mask w.r.t. the given mask, target weight and `skip_empty_channel` setting. @@ -90,23 +88,19 @@ class KeypointMSELoss(nn.Module): # Given spatial mask if mask is not None: # check mask has matching type with target - assert (mask.ndim == target.ndim and all( - d_m == d_t or d_m == 1 - for d_m, d_t in zip(mask.shape, target.shape))), ( - f'mask and target have mismatched shapes {mask.shape} v.s.' - f'{target.shape}') + assert mask.ndim == target.ndim and all(d_m == d_t or d_m == 1 for d_m, d_t in zip(mask.shape, target.shape)), ( + f"mask and target have mismatched shapes {mask.shape} v.s." f"{target.shape}" + ) # Mask by target weights (keypoint-wise mask) if target_weights is not None: # check target weight has matching shape with target - assert (target_weights.ndim in (2, 4) and target_weights.shape - == target.shape[:target_weights.ndim]), ( - 'target_weights and target have mismatched shapes ' - f'{target_weights.shape} v.s. 
{target.shape}') + assert target_weights.ndim in (2, 4) and target_weights.shape == target.shape[: target_weights.ndim], ( + "target_weights and target have mismatched shapes " f"{target_weights.shape} v.s. {target.shape}" + ) ndim_pad = target.ndim - target_weights.ndim - _mask = target_weights.view(target_weights.shape + - (1, ) * ndim_pad) + _mask = target_weights.view(target_weights.shape + (1,) * ndim_pad) if mask is None: mask = _mask @@ -117,7 +111,7 @@ class KeypointMSELoss(nn.Module): if self.skip_empty_channel: _mask = (target != 0).flatten(2).any(dim=2) ndim_pad = target.ndim - _mask.ndim - _mask = _mask.view(_mask.shape + (1, ) * ndim_pad) + _mask = _mask.view(_mask.shape + (1,) * ndim_pad) if mask is None: mask = _mask @@ -143,16 +137,13 @@ class CombinedTargetMSELoss(nn.Module): loss_weight (float): Weight of the loss. Defaults to 1.0 """ - def __init__(self, - use_target_weight: bool = False, - loss_weight: float = 1.): + def __init__(self, use_target_weight: bool = False, loss_weight: float = 1.0): super().__init__() - self.criterion = nn.MSELoss(reduction='mean') + self.criterion = nn.MSELoss(reduction="mean") self.use_target_weight = use_target_weight self.loss_weight = loss_weight - def forward(self, output: Tensor, target: Tensor, - target_weights: Tensor) -> Tensor: + def forward(self, output: Tensor, target: Tensor, target_weights: Tensor) -> Tensor: """Forward function of loss. Note: @@ -174,11 +165,9 @@ class CombinedTargetMSELoss(nn.Module): """ batch_size = output.size(0) num_channels = output.size(1) - heatmaps_pred = output.reshape( - (batch_size, num_channels, -1)).split(1, 1) - heatmaps_gt = target.reshape( - (batch_size, num_channels, -1)).split(1, 1) - loss = 0. + heatmaps_pred = output.reshape((batch_size, num_channels, -1)).split(1, 1) + heatmaps_gt = target.reshape((batch_size, num_channels, -1)).split(1, 1) + loss = 0.0 num_joints = num_channels // 3 for idx in range(num_joints): heatmap_pred = heatmaps_pred[idx * 3].squeeze() @@ -194,10 +183,8 @@ class CombinedTargetMSELoss(nn.Module): # classification loss loss += 0.5 * self.criterion(heatmap_pred, heatmap_gt) # regression loss - loss += 0.5 * self.criterion(heatmap_gt * offset_x_pred, - heatmap_gt * offset_x_gt) - loss += 0.5 * self.criterion(heatmap_gt * offset_y_pred, - heatmap_gt * offset_y_gt) + loss += 0.5 * self.criterion(heatmap_gt * offset_x_pred, heatmap_gt * offset_x_gt) + loss += 0.5 * self.criterion(heatmap_gt * offset_y_pred, heatmap_gt * offset_y_gt) return loss / num_joints * self.loss_weight @@ -213,13 +200,10 @@ class KeypointOHKMMSELoss(nn.Module): loss_weight (float): Weight of the loss. Defaults to 1.0 """ - def __init__(self, - use_target_weight: bool = False, - topk: int = 8, - loss_weight: float = 1.): + def __init__(self, use_target_weight: bool = False, topk: int = 8, loss_weight: float = 1.0): super().__init__() assert topk > 0 - self.criterion = nn.MSELoss(reduction='none') + self.criterion = nn.MSELoss(reduction="none") self.use_target_weight = use_target_weight self.topk = topk self.loss_weight = loss_weight @@ -237,19 +221,17 @@ class KeypointOHKMMSELoss(nn.Module): Returns: Tensor: The calculated loss. """ - ohkm_loss = 0. 
+ ohkm_loss = 0.0 B = losses.shape[0] for i in range(B): sub_loss = losses[i] - _, topk_idx = torch.topk( - sub_loss, k=self.topk, dim=0, sorted=False) + _, topk_idx = torch.topk(sub_loss, k=self.topk, dim=0, sorted=False) tmp_loss = torch.gather(sub_loss, 0, topk_idx) ohkm_loss += torch.sum(tmp_loss) / self.topk ohkm_loss /= B return ohkm_loss - def forward(self, output: Tensor, target: Tensor, - target_weights: Tensor) -> Tensor: + def forward(self, output: Tensor, target: Tensor, target_weights: Tensor) -> Tensor: """Forward function of loss. Note: @@ -269,16 +251,13 @@ class KeypointOHKMMSELoss(nn.Module): """ num_keypoints = output.size(1) if num_keypoints < self.topk: - raise ValueError(f'topk ({self.topk}) should not be ' - f'larger than num_keypoints ({num_keypoints}).') + raise ValueError(f"topk ({self.topk}) should not be " f"larger than num_keypoints ({num_keypoints}).") losses = [] for idx in range(num_keypoints): if self.use_target_weight: target_weight = target_weights[:, idx, None, None] - losses.append( - self.criterion(output[:, idx] * target_weight, - target[:, idx] * target_weight)) + losses.append(self.criterion(output[:, idx] * target_weight, target[:, idx] * target_weight)) else: losses.append(self.criterion(output[:, idx], target[:, idx])) @@ -301,13 +280,7 @@ class AdaptiveWingLoss(nn.Module): loss_weight (float): Weight of the loss. Default: 1.0. """ - def __init__(self, - alpha=2.1, - omega=14, - epsilon=1, - theta=0.5, - use_target_weight=False, - loss_weight=1.): + def __init__(self, alpha=2.1, omega=14, epsilon=1, theta=0.5, use_target_weight=False, loss_weight=1.0): super().__init__() self.alpha = float(alpha) self.omega = float(omega) @@ -330,27 +303,22 @@ class AdaptiveWingLoss(nn.Module): H, W = pred.shape[2:4] delta = (target - pred).abs() - A = self.omega * ( - 1 / (1 + torch.pow(self.theta / self.epsilon, self.alpha - target)) - ) * (self.alpha - target) * (torch.pow( - self.theta / self.epsilon, - self.alpha - target - 1)) * (1 / self.epsilon) - C = self.theta * A - self.omega * torch.log( - 1 + torch.pow(self.theta / self.epsilon, self.alpha - target)) + A = ( + self.omega + * (1 / (1 + torch.pow(self.theta / self.epsilon, self.alpha - target))) + * (self.alpha - target) + * (torch.pow(self.theta / self.epsilon, self.alpha - target - 1)) + * (1 / self.epsilon) + ) + C = self.theta * A - self.omega * torch.log(1 + torch.pow(self.theta / self.epsilon, self.alpha - target)) losses = torch.where( - delta < self.theta, - self.omega * - torch.log(1 + - torch.pow(delta / self.epsilon, self.alpha - target)), - A * delta - C) + delta < self.theta, self.omega * torch.log(1 + torch.pow(delta / self.epsilon, self.alpha - target)), A * delta - C + ) return torch.mean(losses) - def forward(self, - output: Tensor, - target: Tensor, - target_weights: Optional[Tensor] = None): + def forward(self, output: Tensor, target: Tensor, target_weights: Optional[Tensor] = None): """Forward function. Note: @@ -364,16 +332,13 @@ class AdaptiveWingLoss(nn.Module): Weights across different joint types. """ if self.use_target_weight: - assert (target_weights.ndim in (2, 4) and target_weights.shape - == target.shape[:target_weights.ndim]), ( - 'target_weights and target have mismatched shapes ' - f'{target_weights.shape} v.s. {target.shape}') + assert target_weights.ndim in (2, 4) and target_weights.shape == target.shape[: target_weights.ndim], ( + "target_weights and target have mismatched shapes " f"{target_weights.shape} v.s. 
{target.shape}" + ) ndim_pad = target.ndim - target_weights.ndim - target_weights = target_weights.view(target_weights.shape + - (1, ) * ndim_pad) - loss = self.criterion(output * target_weights, - target * target_weights) + target_weights = target_weights.view(target_weights.shape + (1,) * ndim_pad) + loss = self.criterion(output * target_weights, target * target_weights) else: loss = self.criterion(output, target) @@ -403,22 +368,14 @@ class FocalHeatmapLoss(KeypointMSELoss): loss_weight (float): Weight of the loss. Defaults to 1.0 """ - def __init__(self, - alpha: int = 2, - beta: int = 4, - use_target_weight: bool = False, - skip_empty_channel: bool = False, - loss_weight: float = 1.0): - super(FocalHeatmapLoss, self).__init__(use_target_weight, - skip_empty_channel, loss_weight) + def __init__( + self, alpha: int = 2, beta: int = 4, use_target_weight: bool = False, skip_empty_channel: bool = False, loss_weight: float = 1.0 + ): + super(FocalHeatmapLoss, self).__init__(use_target_weight, skip_empty_channel, loss_weight) self.alpha = alpha self.beta = beta - def forward(self, - output: Tensor, - target: Tensor, - target_weights: Optional[Tensor] = None, - mask: Optional[Tensor] = None) -> Tensor: + def forward(self, output: Tensor, target: Tensor, target_weights: Optional[Tensor] = None, mask: Optional[Tensor] = None) -> Tensor: """Calculate the modified focal loss for heatmap prediction. Note: @@ -451,10 +408,8 @@ class FocalHeatmapLoss(KeypointMSELoss): neg_weights = torch.pow(1 - target, self.beta) - pos_loss = torch.log(output) * torch.pow(1 - output, - self.alpha) * pos_inds - neg_loss = torch.log(1 - output) * torch.pow( - output, self.alpha) * neg_weights * neg_inds + pos_loss = torch.log(output) * torch.pow(1 - output, self.alpha) * pos_inds + neg_loss = torch.log(1 - output) * torch.pow(output, self.alpha) * neg_weights * neg_inds num_pos = pos_inds.float().sum() if num_pos == 0: @@ -486,18 +441,10 @@ class MLECCLoss(nn.Module): NotImplementedError: If the selected mode is not implemented. """ - def __init__(self, - reduction: str = 'mean', - mode: str = 'log', - use_target_weight: bool = False, - loss_weight: float = 1.0): + def __init__(self, reduction: str = "mean", mode: str = "log", use_target_weight: bool = False, loss_weight: float = 1.0): super().__init__() - assert reduction in ('mean', 'sum', 'none'), \ - f"`reduction` should be either 'mean', 'sum', or 'none', " \ - f'but got {reduction}' - assert mode in ('linear', 'square', 'log'), \ - f"`mode` should be either 'linear', 'square', or 'log', " \ - f'but got {mode}' + assert reduction in ("mean", "sum", "none"), f"`reduction` should be either 'mean', 'sum', or 'none', " f"but got {reduction}" + assert mode in ("linear", "square", "log"), f"`mode` should be either 'linear', 'square', or 'log', " f"but got {mode}" self.reduction = reduction self.mode = mode @@ -518,18 +465,17 @@ class MLECCLoss(nn.Module): reduction. 
""" - assert len(outputs) == len(targets), \ - 'Outputs and targets must have the same length' + assert len(outputs) == len(targets), "Outputs and targets must have the same length" prob = 1.0 for o, t in zip(outputs, targets): prob *= (o * t).sum(dim=-1) - if self.mode == 'linear': + if self.mode == "linear": loss = 1.0 - prob - elif self.mode == 'square': + elif self.mode == "square": loss = 1.0 - prob.pow(2) - elif self.mode == 'log': + elif self.mode == "log": loss = -torch.log(prob + 1e-4) loss[torch.isnan(loss)] = 0.0 @@ -540,9 +486,9 @@ class MLECCLoss(nn.Module): target_weight = target_weight.unsqueeze(-1) loss = loss * target_weight - if self.reduction == 'sum': + if self.reduction == "sum": loss = loss.flatten(1).sum(dim=1) - elif self.reduction == 'mean': + elif self.reduction == "mean": loss = loss.flatten(1).mean(dim=1) return loss * self.loss_weight @@ -563,13 +509,15 @@ class OKSHeatmapLoss(nn.Module): loss_weight (float): Weight of the loss. Defaults to 1.0 """ - def __init__(self, - use_target_weight: bool = False, - skip_empty_channel: bool = False, - smoothing_weight: float = 0.2, - gaussian_weight: float = 0.0, - loss_weight: float = 1., - oks_type: str = "minus"): + def __init__( + self, + use_target_weight: bool = False, + skip_empty_channel: bool = False, + smoothing_weight: float = 0.2, + gaussian_weight: float = 0.0, + loss_weight: float = 1.0, + oks_type: str = "minus", + ): super().__init__() self.use_target_weight = use_target_weight self.skip_empty_channel = skip_empty_channel @@ -580,13 +528,15 @@ class OKSHeatmapLoss(nn.Module): assert self.oks_type in ["minus", "plus", "both"] - def forward(self, - output: Tensor, - target: Tensor, - target_weights: Optional[Tensor] = None, - mask: Optional[Tensor] = None, - per_pixel: bool = False, - per_keypoint: bool = False) -> Tensor: + def forward( + self, + output: Tensor, + target: Tensor, + target_weights: Optional[Tensor] = None, + mask: Optional[Tensor] = None, + per_pixel: bool = False, + per_keypoint: bool = False, + ) -> Tensor: """Forward function of loss. Note: @@ -609,15 +559,14 @@ class OKSHeatmapLoss(nn.Module): Tensor: The calculated loss. 
""" - assert target.max() <= 1, 'target should be normalized' - assert target.min() >= 0, 'target should be normalized' + target = torch.clamp(target, 0, 1) B, K, H, W = output.shape _mask = self._get_mask(target, target_weights, mask) - - oks_minus = output * (1-target) - oks_plus = (1-output) * (target) + + oks_minus = output * (1 - target) + oks_plus = (1 - output) * (target) if self.oks_type == "both": oks = (oks_minus + oks_plus) / 2 elif self.oks_type == "minus": @@ -626,51 +575,39 @@ class OKSHeatmapLoss(nn.Module): oks = oks_plus else: raise ValueError(f"oks_type {self.oks_type} not recognized") - - mse = F.mse_loss(output, target, reduction='none') + + mse = F.mse_loss(output, target, reduction="none") # Smoothness loss sobel_x = torch.tensor([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=torch.float32).view(1, 1, 3, 3).to(output.device) sobel_y = torch.tensor([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=torch.float32).view(1, 1, 3, 3).to(output.device) - gradient_x = F.conv2d(output.reshape(B*K, 1, H, W), sobel_x, padding='same') - gradient_y = F.conv2d(output.reshape(B*K, 1, H, W), sobel_y, padding='same') + gradient_x = F.conv2d(output.reshape(B * K, 1, H, W), sobel_x, padding="same") + gradient_y = F.conv2d(output.reshape(B * K, 1, H, W), sobel_y, padding="same") gradient = (gradient_x**2 + gradient_y**2).reshape(B, K, H, W) - + if _mask is not None: oks = oks * _mask mse = mse * _mask gradient = gradient * _mask - - oks_minus_weight = ( - 1 - self.smoothing_weight - self.gaussian_weight - ) + oks_minus_weight = 1 - self.smoothing_weight - self.gaussian_weight if per_pixel: - loss = ( - self.smoothing_weight * gradient + - oks_minus_weight * oks + - self.gaussian_weight * mse - ) + loss = self.smoothing_weight * gradient + oks_minus_weight * oks + self.gaussian_weight * mse elif per_keypoint: - max_gradient, _ = gradient.reshape((B, K, H*W)).max(dim=-1) + max_gradient, _ = gradient.reshape((B, K, H * W)).max(dim=-1) loss = ( - oks_minus_weight * oks.sum(dim=(2, 3)) + - self.smoothing_weight * max_gradient + - self.gaussian_weight * mse.mean(dim=(2, 3)) + oks_minus_weight * oks.sum(dim=(2, 3)) + self.smoothing_weight * max_gradient + self.gaussian_weight * mse.mean(dim=(2, 3)) ) else: - max_gradient, _ = gradient.reshape((B, K, H*W)).max(dim=-1) + max_gradient, _ = gradient.reshape((B, K, H * W)).max(dim=-1) loss = ( - oks_minus_weight * oks.sum(dim=(2, 3)) + - self.smoothing_weight * max_gradient + - self.gaussian_weight * mse.mean(dim=(2, 3)) + oks_minus_weight * oks.sum(dim=(2, 3)) + self.smoothing_weight * max_gradient + self.gaussian_weight * mse.mean(dim=(2, 3)) ).mean() - + return loss * self.loss_weight - def _get_mask(self, target: Tensor, target_weights: Optional[Tensor], - mask: Optional[Tensor]) -> Optional[Tensor]: + def _get_mask(self, target: Tensor, target_weights: Optional[Tensor], mask: Optional[Tensor]) -> Optional[Tensor]: """Generate the heatmap mask w.r.t. the given mask, target weight and `skip_empty_channel` setting. @@ -681,23 +618,19 @@ class OKSHeatmapLoss(nn.Module): # Given spatial mask if mask is not None: # check mask has matching type with target - assert (mask.ndim == target.ndim and all( - d_m == d_t or d_m == 1 - for d_m, d_t in zip(mask.shape, target.shape))), ( - f'mask and target have mismatched shapes {mask.shape} v.s.' - f'{target.shape}') + assert mask.ndim == target.ndim and all(d_m == d_t or d_m == 1 for d_m, d_t in zip(mask.shape, target.shape)), ( + f"mask and target have mismatched shapes {mask.shape} v.s." 
f"{target.shape}" + ) # Mask by target weights (keypoint-wise mask) if target_weights is not None: # check target weight has matching shape with target - assert (target_weights.ndim in (2, 4) and target_weights.shape - == target.shape[:target_weights.ndim]), ( - 'target_weights and target have mismatched shapes ' - f'{target_weights.shape} v.s. {target.shape}') + assert target_weights.ndim in (2, 4) and target_weights.shape == target.shape[: target_weights.ndim], ( + "target_weights and target have mismatched shapes " f"{target_weights.shape} v.s. {target.shape}" + ) ndim_pad = target.ndim - target_weights.ndim - _mask = target_weights.view(target_weights.shape + - (1, ) * ndim_pad) + _mask = target_weights.view(target_weights.shape + (1,) * ndim_pad) if mask is None: mask = _mask @@ -708,7 +641,7 @@ class OKSHeatmapLoss(nn.Module): if self.skip_empty_channel: _mask = (target != 0).flatten(2).any(dim=2) ndim_pad = target.ndim - _mask.ndim - _mask = _mask.view(_mask.shape + (1, ) * ndim_pad) + _mask = _mask.view(_mask.shape + (1,) * ndim_pad) if mask is None: mask = _mask @@ -719,7 +652,6 @@ class OKSHeatmapLoss(nn.Module): @MODELS.register_module() - class CalibrationLoss(nn.Module): """OKS-based loss for heatmaps. @@ -734,24 +666,28 @@ class CalibrationLoss(nn.Module): loss_weight (float): Weight of the loss. Defaults to 1.0 """ - def __init__(self, - use_target_weight: bool = False, - skip_empty_channel: bool = False, - loss_weight: float = 1., - ignore_bottom_percentile: float = 0.7): + def __init__( + self, + use_target_weight: bool = False, + skip_empty_channel: bool = False, + loss_weight: float = 1.0, + ignore_bottom_percentile: float = 0.7, + ): super().__init__() self.use_target_weight = use_target_weight self.skip_empty_channel = skip_empty_channel self.loss_weight = loss_weight self.ignore_bottom_percentile = ignore_bottom_percentile - def forward(self, - output: Tensor, - target: Tensor, - target_weights: Optional[Tensor] = None, - mask: Optional[Tensor] = None, - per_pixel: bool = False, - per_keypoint: bool = False) -> Tensor: + def forward( + self, + output: Tensor, + target: Tensor, + target_weights: Optional[Tensor] = None, + mask: Optional[Tensor] = None, + per_pixel: bool = False, + per_keypoint: bool = False, + ) -> Tensor: """Forward function of loss. Note: @@ -774,18 +710,18 @@ class CalibrationLoss(nn.Module): Tensor: The calculated loss. """ - assert target.max() <= 1, 'target should be normalized' - assert target.min() >= 0, 'target should be normalized' + assert target.max() <= 1, "target should be normalized" + assert target.min() >= 0, "target should be normalized" B, K, H, W = output.shape _mask = self._get_mask(target, target_weights, mask) - + pred_probs = output * target - pred_probs_sum = pred_probs.sum(dim=(2,3)) + pred_probs_sum = pred_probs.sum(dim=(2, 3)) # threshold = torch.quantile(pred_probs_sum.detach(), self.ignore_bottom_percentile) # _mask = _mask * (pred_probs_sum > self.ignore_bottom_percentile).view(B, K, 1, 1) - + # print() # tmp = -torch.log(pred_probs_sum.flatten() + 1e-10)[:, None] # tmp = torch.cat([pred_probs_sum.flatten()[:, None], tmp, _mask.reshape(tmp.shape)], dim=1) @@ -804,9 +740,7 @@ class CalibrationLoss(nn.Module): return loss * self.loss_weight - - def _get_mask(self, target: Tensor, target_weights: Optional[Tensor], - mask: Optional[Tensor]) -> Optional[Tensor]: + def _get_mask(self, target: Tensor, target_weights: Optional[Tensor], mask: Optional[Tensor]) -> Optional[Tensor]: """Generate the heatmap mask w.r.t. 
the given mask, target weight and `skip_empty_channel` setting. @@ -817,23 +751,19 @@ class CalibrationLoss(nn.Module): # Given spatial mask if mask is not None: # check mask has matching type with target - assert (mask.ndim == target.ndim and all( - d_m == d_t or d_m == 1 - for d_m, d_t in zip(mask.shape, target.shape))), ( - f'mask and target have mismatched shapes {mask.shape} v.s.' - f'{target.shape}') + assert mask.ndim == target.ndim and all(d_m == d_t or d_m == 1 for d_m, d_t in zip(mask.shape, target.shape)), ( + f"mask and target have mismatched shapes {mask.shape} v.s." f"{target.shape}" + ) # Mask by target weights (keypoint-wise mask) if target_weights is not None: # check target weight has matching shape with target - assert (target_weights.ndim in (2, 4) and target_weights.shape - == target.shape[:target_weights.ndim]), ( - 'target_weights and target have mismatched shapes ' - f'{target_weights.shape} v.s. {target.shape}') + assert target_weights.ndim in (2, 4) and target_weights.shape == target.shape[: target_weights.ndim], ( + "target_weights and target have mismatched shapes " f"{target_weights.shape} v.s. {target.shape}" + ) ndim_pad = target.ndim - target_weights.ndim - _mask = target_weights.view(target_weights.shape + - (1, ) * ndim_pad) + _mask = target_weights.view(target_weights.shape + (1,) * ndim_pad) if mask is None: mask = _mask @@ -844,7 +774,7 @@ class CalibrationLoss(nn.Module): if self.skip_empty_channel: _mask = (target != 0).flatten(2).any(dim=2) ndim_pad = target.ndim - _mask.ndim - _mask = _mask.view(_mask.shape + (1, ) * ndim_pad) + _mask = _mask.view(_mask.shape + (1,) * ndim_pad) if mask is None: mask = _mask diff --git a/mmpose/models/losses/logit_dis_loss.py b/mmpose/models/losses/logit_dis_loss.py index 32906a1c3f1a07723548946322dc637f1761a71b..c9a68ab398429bf1c19cb73c6b660cf20058d792 100644 --- a/mmpose/models/losses/logit_dis_loss.py +++ b/mmpose/models/losses/logit_dis_loss.py @@ -25,7 +25,7 @@ class KDLoss(nn.Module): super(KDLoss, self).__init__() self.log_softmax = nn.LogSoftmax(dim=1) - self.kl_loss = nn.KLDivLoss(reduction='none') + self.kl_loss = nn.KLDivLoss(reduction="none") self.weight = weight def forward(self, pred, pred_t, beta, target_weight): @@ -38,8 +38,8 @@ class KDLoss(nn.Module): num_joints = ls_x.size(1) loss = 0 - loss += (self.loss(ls_x, lt_x, beta, target_weight)) - loss += (self.loss(ls_y, lt_y, beta, target_weight)) + loss += self.loss(ls_x, lt_x, beta, target_weight) + loss += self.loss(ls_y, lt_y, beta, target_weight) return loss / num_joints diff --git a/mmpose/models/losses/loss_wrappers.py b/mmpose/models/losses/loss_wrappers.py index d821661b48a133ffd6c9232d5a6a2d3eb6bf0a50..5363f973a71e2364834a0cb3bec3ae16080e07e3 100644 --- a/mmpose/models/losses/loss_wrappers.py +++ b/mmpose/models/losses/loss_wrappers.py @@ -40,9 +40,9 @@ class MultipleLossWrapper(nn.Module): keypoint_weights (Tensor[N, K, D]): Weights across different joint types. 
""" - assert isinstance(input_list, list), '' - assert isinstance(target_list, list), '' - assert len(input_list) == len(target_list), '' + assert isinstance(input_list, list), "" + assert isinstance(target_list, list), "" + assert len(input_list) == len(target_list), "" losses = [] for i in range(self.num_losses): diff --git a/mmpose/models/losses/regression_loss.py b/mmpose/models/losses/regression_loss.py index 591bfb1b9cde41accdea2fd11456162bc6bfec0e..1e369d1e00f49a65e2bc984cee3b02106aeb185e 100644 --- a/mmpose/models/losses/regression_loss.py +++ b/mmpose/models/losses/regression_loss.py @@ -9,6 +9,7 @@ import torch.nn.functional as F from mmpose.datasets.datasets.utils import parse_pose_metainfo from mmpose.registry import MODELS + from ..utils.realnvp import RealNVP @@ -32,11 +33,7 @@ class RLELoss(nn.Module): Options: "laplace" or "gaussian" """ - def __init__(self, - use_target_weight=False, - size_average=True, - residual=True, - q_distribution='laplace'): + def __init__(self, use_target_weight=False, size_average=True, residual=True, q_distribution="laplace"): super(RLELoss, self).__init__() self.size_average = size_average self.use_target_weight = use_target_weight @@ -66,17 +63,15 @@ class RLELoss(nn.Module): # (B, K, 2) log_phi = self.flow_model.log_prob(error.reshape(-1, 2)) log_phi = log_phi.reshape(target.shape[0], target.shape[1], 1) - log_sigma = torch.log(sigma).reshape(target.shape[0], target.shape[1], - 2) + log_sigma = torch.log(sigma).reshape(target.shape[0], target.shape[1], 2) nf_loss = log_sigma - log_phi if self.residual: - assert self.q_distribution in ['laplace', 'gaussian'] - if self.q_distribution == 'laplace': + assert self.q_distribution in ["laplace", "gaussian"] + if self.q_distribution == "laplace": loss_q = torch.log(sigma * 2) + torch.abs(error) else: - loss_q = torch.log( - sigma * math.sqrt(2 * math.pi)) + 0.5 * error**2 + loss_q = torch.log(sigma * math.sqrt(2 * math.pi)) + 0.5 * error**2 loss = nf_loss + loss_q else: @@ -102,7 +97,7 @@ class SmoothL1Loss(nn.Module): loss_weight (float): Weight of the loss. Default: 1.0. """ - def __init__(self, use_target_weight=False, loss_weight=1.): + def __init__(self, use_target_weight=False, loss_weight=1.0): super().__init__() self.criterion = F.smooth_l1_loss self.use_target_weight = use_target_weight @@ -130,8 +125,7 @@ class SmoothL1Loss(nn.Module): for i in range(output.ndim - target_weight.ndim): target_weight = target_weight.unsqueeze(-1) - loss = self.criterion(output * target_weight, - target * target_weight) + loss = self.criterion(output * target_weight, target * target_weight) else: loss = self.criterion(output, target) @@ -148,7 +142,7 @@ class L1LogLoss(nn.Module): loss_weight (float): Weight of the loss. Default: 1.0. """ - def __init__(self, use_target_weight=False, loss_weight=1.): + def __init__(self, use_target_weight=False, loss_weight=1.0): super().__init__() self.criterion = F.smooth_l1_loss self.use_target_weight = use_target_weight @@ -179,8 +173,7 @@ class L1LogLoss(nn.Module): for i in range(output.ndim - target_weight.ndim): target_weight = target_weight.unsqueeze(-1) - loss = self.criterion(output * target_weight, - target * target_weight) + loss = self.criterion(output * target_weight, target * target_weight) else: loss = self.criterion(output, target) @@ -201,23 +194,18 @@ class SoftWeightSmoothL1Loss(nn.Module): loss_weight (float): Weight of the loss. Default: 1.0. 
""" - def __init__(self, - use_target_weight=False, - supervise_empty=True, - beta=1.0, - loss_weight=1.): + def __init__(self, use_target_weight=False, supervise_empty=True, beta=1.0, loss_weight=1.0): super().__init__() - reduction = 'none' if use_target_weight else 'mean' - self.criterion = partial( - self.smooth_l1_loss, reduction=reduction, beta=beta) + reduction = "none" if use_target_weight else "mean" + self.criterion = partial(self.smooth_l1_loss, reduction=reduction, beta=beta) self.supervise_empty = supervise_empty self.use_target_weight = use_target_weight self.loss_weight = loss_weight @staticmethod - def smooth_l1_loss(input, target, reduction='none', beta=1.0): + def smooth_l1_loss(input, target, reduction="none", beta=1.0): """Re-implement torch.nn.functional.smooth_l1_loss with beta to support pytorch <= 1.6.""" delta = input - target @@ -225,15 +213,14 @@ class SoftWeightSmoothL1Loss(nn.Module): delta[mask] = (delta[mask]).pow(2) / (2 * beta) delta[~mask] = delta[~mask].abs() - beta / 2 - if reduction == 'mean': + if reduction == "mean": return delta.mean() - elif reduction == 'sum': + elif reduction == "sum": return delta.sum() - elif reduction == 'none': + elif reduction == "none": return delta else: - raise ValueError(f'reduction must be \'mean\', \'sum\' or ' - f'\'none\', but got \'{reduction}\'') + raise ValueError(f"reduction must be 'mean', 'sum' or " f"'none', but got '{reduction}'") def forward(self, output, target, target_weight=None): """Forward function. @@ -281,11 +268,7 @@ class WingLoss(nn.Module): loss_weight (float): Weight of the loss. Default: 1.0. """ - def __init__(self, - omega=10.0, - epsilon=2.0, - use_target_weight=False, - loss_weight=1.): + def __init__(self, omega=10.0, epsilon=2.0, use_target_weight=False, loss_weight=1.0): super().__init__() self.omega = omega self.epsilon = epsilon @@ -309,9 +292,7 @@ class WingLoss(nn.Module): target (torch.Tensor[N, K, D]): Target regression. """ delta = (target - pred).abs() - losses = torch.where( - delta < self.omega, - self.omega * torch.log(1.0 + delta / self.epsilon), delta - self.C) + losses = torch.where(delta < self.omega, self.omega * torch.log(1.0 + delta / self.epsilon), delta - self.C) return torch.mean(torch.sum(losses, dim=[1, 2]), dim=0) def forward(self, output, target, target_weight=None): @@ -330,8 +311,7 @@ class WingLoss(nn.Module): """ if self.use_target_weight: assert target_weight is not None - loss = self.criterion(output * target_weight, - target * target_weight) + loss = self.criterion(output * target_weight, target * target_weight) else: loss = self.criterion(output, target) @@ -356,12 +336,7 @@ class SoftWingLoss(nn.Module): loss_weight (float): Weight of the loss. Default: 1.0. """ - def __init__(self, - omega1=2.0, - omega2=20.0, - epsilon=0.5, - use_target_weight=False, - loss_weight=1.): + def __init__(self, omega1=2.0, omega2=20.0, epsilon=0.5, use_target_weight=False, loss_weight=1.0): super().__init__() self.omega1 = omega1 self.omega2 = omega2 @@ -371,8 +346,7 @@ class SoftWingLoss(nn.Module): # constant that smoothly links the piecewise-defined linear # and nonlinear parts - self.B = self.omega1 - self.omega2 * math.log(1.0 + self.omega1 / - self.epsilon) + self.B = self.omega1 - self.omega2 * math.log(1.0 + self.omega1 / self.epsilon) def criterion(self, pred, target): """Criterion of wingloss. @@ -387,9 +361,7 @@ class SoftWingLoss(nn.Module): target (torch.Tensor[N, K, D]): Target regression. 
""" delta = (target - pred).abs() - losses = torch.where( - delta < self.omega1, delta, - self.omega2 * torch.log(1.0 + delta / self.epsilon) + self.B) + losses = torch.where(delta < self.omega1, delta, self.omega2 * torch.log(1.0 + delta / self.epsilon) + self.B) return torch.mean(torch.sum(losses, dim=[1, 2]), dim=0) def forward(self, output, target, target_weight=None): @@ -408,8 +380,7 @@ class SoftWingLoss(nn.Module): """ if self.use_target_weight: assert target_weight is not None - loss = self.criterion(output * target_weight, - target * target_weight) + loss = self.criterion(output * target_weight, target * target_weight) else: loss = self.criterion(output, target) @@ -426,11 +397,7 @@ class MPJPEVelocityJointLoss(nn.Module): lambda_3d_velocity (float): Factor of the velocity loss. Default: 20.0. """ - def __init__(self, - use_target_weight=False, - loss_weight=1., - lambda_scale=0.5, - lambda_3d_velocity=20.0): + def __init__(self, use_target_weight=False, loss_weight=1.0, lambda_scale=0.5, lambda_3d_velocity=20.0): super().__init__() self.use_target_weight = use_target_weight self.loss_weight = loss_weight @@ -451,45 +418,27 @@ class MPJPEVelocityJointLoss(nn.Module): target_weight (torch.Tensor[N,K,D]): Weights across different joint types. """ - norm_output = torch.mean( - torch.sum(torch.square(output), dim=-1, keepdim=True), - dim=-2, - keepdim=True) - norm_target = torch.mean( - torch.sum(target * output, dim=-1, keepdim=True), - dim=-2, - keepdim=True) + norm_output = torch.mean(torch.sum(torch.square(output), dim=-1, keepdim=True), dim=-2, keepdim=True) + norm_target = torch.mean(torch.sum(target * output, dim=-1, keepdim=True), dim=-2, keepdim=True) velocity_output = output[..., 1:, :, :] - output[..., :-1, :, :] velocity_target = target[..., 1:, :, :] - target[..., :-1, :, :] if self.use_target_weight: assert target_weight is not None - mpjpe = torch.mean( - torch.norm((output - target) * target_weight, dim=-1)) - - nmpjpe = torch.mean( - torch.norm( - (norm_target / norm_output * output - target) * - target_weight, - dim=-1)) - - loss_3d_velocity = torch.mean( - torch.norm( - (velocity_output - velocity_target) * target_weight, - dim=-1)) + mpjpe = torch.mean(torch.norm((output - target) * target_weight, dim=-1)) + + nmpjpe = torch.mean(torch.norm((norm_target / norm_output * output - target) * target_weight, dim=-1)) + + loss_3d_velocity = torch.mean(torch.norm((velocity_output - velocity_target) * target_weight, dim=-1)) else: mpjpe = torch.mean(torch.norm(output - target, dim=-1)) - nmpjpe = torch.mean( - torch.norm( - norm_target / norm_output * output - target, dim=-1)) + nmpjpe = torch.mean(torch.norm(norm_target / norm_output * output - target, dim=-1)) - loss_3d_velocity = torch.mean( - torch.norm(velocity_output - velocity_target, dim=-1)) + loss_3d_velocity = torch.mean(torch.norm(velocity_output - velocity_target, dim=-1)) - loss = mpjpe + nmpjpe * self.lambda_scale + \ - loss_3d_velocity * self.lambda_3d_velocity + loss = mpjpe + nmpjpe * self.lambda_scale + loss_3d_velocity * self.lambda_3d_velocity return loss * self.loss_weight @@ -504,7 +453,7 @@ class MPJPELoss(nn.Module): loss_weight (float): Weight of the loss. Default: 1.0. 
""" - def __init__(self, use_target_weight=False, loss_weight=1.): + def __init__(self, use_target_weight=False, loss_weight=1.0): super().__init__() self.use_target_weight = use_target_weight self.loss_weight = loss_weight @@ -526,8 +475,7 @@ class MPJPELoss(nn.Module): if self.use_target_weight: assert target_weight is not None - loss = torch.mean( - torch.norm((output - target) * target_weight, dim=-1)) + loss = torch.mean(torch.norm((output - target) * target_weight, dim=-1)) else: loss = torch.mean(torch.norm(output - target, dim=-1)) @@ -538,15 +486,12 @@ class MPJPELoss(nn.Module): class L1Loss(nn.Module): """L1Loss loss.""" - def __init__(self, - reduction='mean', - use_target_weight=False, - loss_weight=1.): + def __init__(self, reduction="mean", use_target_weight=False, loss_weight=1.0): super().__init__() - assert reduction in ('mean', 'sum', 'none'), f'the argument ' \ - f'`reduction` should be either \'mean\', \'sum\' or \'none\', ' \ - f'but got {reduction}' + assert reduction in ("mean", "sum", "none"), ( + f"the argument " f"`reduction` should be either 'mean', 'sum' or 'none', " f"but got {reduction}" + ) self.criterion = partial(F.l1_loss, reduction=reduction) self.use_target_weight = use_target_weight @@ -569,8 +514,7 @@ class L1Loss(nn.Module): assert target_weight is not None for _ in range(target.ndim - target_weight.ndim): target_weight = target_weight.unsqueeze(-1) - loss = self.criterion(output * target_weight, - target * target_weight) + loss = self.criterion(output * target_weight, target * target_weight) else: loss = self.criterion(output, target) @@ -581,7 +525,7 @@ class L1Loss(nn.Module): class MSELoss(nn.Module): """MSE loss for coordinate regression.""" - def __init__(self, use_target_weight=False, loss_weight=1.): + def __init__(self, use_target_weight=False, loss_weight=1.0): super().__init__() self.criterion = F.mse_loss self.use_target_weight = use_target_weight @@ -603,8 +547,7 @@ class MSELoss(nn.Module): if self.use_target_weight: assert target_weight is not None - loss = self.criterion(output * target_weight, - target * target_weight) + loss = self.criterion(output * target_weight, target * target_weight) else: loss = self.criterion(output, target) @@ -622,7 +565,7 @@ class BoneLoss(nn.Module): loss_weight (float): Weight of the loss. Default: 1.0. """ - def __init__(self, joint_parents, use_target_weight=False, loss_weight=1.): + def __init__(self, joint_parents, use_target_weight=False, loss_weight=1.0): super().__init__() self.joint_parents = joint_parents self.use_target_weight = use_target_weight @@ -647,20 +590,13 @@ class BoneLoss(nn.Module): target_weight (torch.Tensor[N, K-1]): Weights across different bone types. 
""" - output_bone = torch.norm( - output - output[:, self.joint_parents, :], - dim=-1)[:, self.non_root_indices] - target_bone = torch.norm( - target - target[:, self.joint_parents, :], - dim=-1)[:, self.non_root_indices] + output_bone = torch.norm(output - output[:, self.joint_parents, :], dim=-1)[:, self.non_root_indices] + target_bone = torch.norm(target - target[:, self.joint_parents, :], dim=-1)[:, self.non_root_indices] if self.use_target_weight: assert target_weight is not None - loss = torch.mean( - torch.abs((output_bone * target_weight).mean(dim=0) - - (target_bone * target_weight).mean(dim=0))) + loss = torch.mean(torch.abs((output_bone * target_weight).mean(dim=0) - (target_bone * target_weight).mean(dim=0))) else: - loss = torch.mean( - torch.abs(output_bone.mean(dim=0) - target_bone.mean(dim=0))) + loss = torch.mean(torch.abs(output_bone.mean(dim=0) - target_bone.mean(dim=0))) return loss * self.loss_weight @@ -688,16 +624,10 @@ class SemiSupervisionLoss(nn.Module): * warmup_epochs """ - def __init__(self, - joint_parents, - projection_loss_weight=1., - bone_loss_weight=1., - warmup_iterations=0): + def __init__(self, joint_parents, projection_loss_weight=1.0, bone_loss_weight=1.0, warmup_iterations=0): super().__init__() - self.criterion_projection = MPJPELoss( - loss_weight=projection_loss_weight) - self.criterion_bone = BoneLoss( - joint_parents, loss_weight=bone_loss_weight) + self.criterion_projection = MPJPELoss(loss_weight=projection_loss_weight) + self.criterion_bone = BoneLoss(joint_parents, loss_weight=bone_loss_weight) self.warmup_iterations = warmup_iterations self.num_iterations = 0 @@ -720,11 +650,8 @@ class SemiSupervisionLoss(nn.Module): k = intrinsics[..., 4:7] p = intrinsics[..., 7:9] - r2 = torch.sum(_x[:, :, :2]**2, dim=-1, keepdim=True) - radial = 1 + torch.sum( - k * torch.cat((r2, r2**2, r2**3), dim=-1), - dim=-1, - keepdim=True) + r2 = torch.sum(_x[:, :, :2] ** 2, dim=-1, keepdim=True) + radial = 1 + torch.sum(k * torch.cat((r2, r2**2, r2**3), dim=-1), dim=-1, keepdim=True) tan = torch.sum(p * _x, dim=-1, keepdim=True) _x = _x * (radial + tan) + p * r2 _x = f * _x + c @@ -737,22 +664,21 @@ class SemiSupervisionLoss(nn.Module): if self.num_iterations <= self.warmup_iterations: return losses - labeled_pose = output['labeled_pose'] - unlabeled_pose = output['unlabeled_pose'] - unlabeled_traj = output['unlabeled_traj'] - unlabeled_target_2d = target['unlabeled_target_2d'] - intrinsics = target['intrinsics'] + labeled_pose = output["labeled_pose"] + unlabeled_pose = output["unlabeled_pose"] + unlabeled_traj = output["unlabeled_traj"] + unlabeled_target_2d = target["unlabeled_target_2d"] + intrinsics = target["intrinsics"] # projection loss unlabeled_output = unlabeled_pose + unlabeled_traj unlabeled_output_2d = self.project_joints(unlabeled_output, intrinsics) - loss_proj = self.criterion_projection(unlabeled_output_2d, - unlabeled_target_2d, None) - losses['proj_loss'] = loss_proj + loss_proj = self.criterion_projection(unlabeled_output_2d, unlabeled_target_2d, None) + losses["proj_loss"] = loss_proj # bone loss loss_bone = self.criterion_bone(unlabeled_pose, labeled_pose, None) - losses['bone_loss'] = loss_bone + losses["bone_loss"] = loss_bone return losses @@ -784,22 +710,18 @@ class OKSLoss(nn.Module): with number of visible keypoints. Defaults to False. 
""" - def __init__(self, - metainfo: Optional[str] = None, - reduction='mean', - mode='linear', - eps=1e-8, - norm_target_weight=False, - loss_weight=1.): + def __init__( + self, metainfo: Optional[str] = None, reduction="mean", mode="linear", eps=1e-8, norm_target_weight=False, loss_weight=1.0 + ): super().__init__() - assert reduction in ('mean', 'sum', 'none'), f'the argument ' \ - f'`reduction` should be either \'mean\', \'sum\' or \'none\', ' \ - f'but got {reduction}' + assert reduction in ("mean", "sum", "none"), ( + f"the argument " f"`reduction` should be either 'mean', 'sum' or 'none', " f"but got {reduction}" + ) - assert mode in ('linear', 'square', 'log'), f'the argument ' \ - f'`reduction` should be either \'linear\', \'square\' or ' \ - f'\'log\', but got {mode}' + assert mode in ("linear", "square", "log"), ( + f"the argument " f"`reduction` should be either 'linear', 'square' or " f"'log', but got {mode}" + ) self.reduction = reduction self.loss_weight = loss_weight @@ -809,9 +731,9 @@ class OKSLoss(nn.Module): if metainfo is not None: metainfo = parse_pose_metainfo(dict(from_file=metainfo)) - sigmas = metainfo.get('sigmas', None) + sigmas = metainfo.get("sigmas", None) if sigmas is not None: - self.register_buffer('sigmas', torch.as_tensor(sigmas)) + self.register_buffer("sigmas", torch.as_tensor(sigmas)) def forward(self, output, target, target_weight=None, areas=None): """Forward function. @@ -830,33 +752,32 @@ class OKSLoss(nn.Module): dist = torch.norm(output - target, dim=-1) if areas is not None: dist = dist / areas.pow(0.5).clip(min=self.eps).unsqueeze(-1) - if hasattr(self, 'sigmas'): - sigmas = self.sigmas.reshape(*((1, ) * (dist.ndim - 1)), -1) + if hasattr(self, "sigmas"): + sigmas = self.sigmas.reshape(*((1,) * (dist.ndim - 1)), -1) dist = dist / (sigmas * 2) oks = torch.exp(-dist.pow(2) / 2) if target_weight is not None: if self.norm_target_weight: - target_weight = target_weight / target_weight.sum( - dim=-1, keepdims=True).clip(min=self.eps) + target_weight = target_weight / target_weight.sum(dim=-1, keepdims=True).clip(min=self.eps) else: target_weight = target_weight / target_weight.size(-1) oks = oks * target_weight oks = oks.sum(dim=-1) - if self.mode == 'linear': + if self.mode == "linear": loss = 1 - oks - elif self.mode == 'square': + elif self.mode == "square": loss = 1 - oks.pow(2) - elif self.mode == 'log': + elif self.mode == "log": loss = -oks.log() else: raise NotImplementedError() - if self.reduction == 'sum': + if self.reduction == "sum": loss = loss.sum() - elif self.reduction == 'mean': + elif self.reduction == "mean": loss = loss.mean() return loss * self.loss_weight diff --git a/mmpose/models/necks/__init__.py b/mmpose/models/necks/__init__.py index 90d68013d5f3ce92b61372430a4b7f02f1bedcd0..ecd9a50902084975b06c5b769b889b0940442901 100644 --- a/mmpose/models/necks/__init__.py +++ b/mmpose/models/necks/__init__.py @@ -9,6 +9,12 @@ from .posewarper_neck import PoseWarperNeck from .yolox_pafpn import YOLOXPAFPN __all__ = [ - 'GlobalAveragePooling', 'PoseWarperNeck', 'FPN', 'FeatureMapProcessor', - 'ChannelMapper', 'YOLOXPAFPN', 'CSPNeXtPAFPN', 'HybridEncoder' + "GlobalAveragePooling", + "PoseWarperNeck", + "FPN", + "FeatureMapProcessor", + "ChannelMapper", + "YOLOXPAFPN", + "CSPNeXtPAFPN", + "HybridEncoder", ] diff --git a/mmpose/models/necks/channel_mapper.py b/mmpose/models/necks/channel_mapper.py index 4d4148a08903f94a18abaf8aec804aceb9e2ea21..ce63e2fa416804102a512412542cb92c15594e71 100644 --- a/mmpose/models/necks/channel_mapper.py +++ 
b/mmpose/models/necks/channel_mapper.py @@ -54,11 +54,10 @@ class ChannelMapper(BaseModule): kernel_size: int = 3, conv_cfg: OptConfigType = None, norm_cfg: OptConfigType = None, - act_cfg: OptConfigType = dict(type='ReLU'), + act_cfg: OptConfigType = dict(type="ReLU"), num_outs: int = None, - bias: Union[bool, str] = 'auto', - init_cfg: OptMultiConfig = dict( - type='Xavier', layer='Conv2d', distribution='uniform') + bias: Union[bool, str] = "auto", + init_cfg: OptMultiConfig = dict(type="Xavier", layer="Conv2d", distribution="uniform"), ) -> None: super().__init__(init_cfg=init_cfg) assert isinstance(in_channels, list) @@ -76,7 +75,9 @@ class ChannelMapper(BaseModule): padding=(kernel_size - 1) // 2, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg)) + act_cfg=act_cfg, + ) + ) if num_outs > len(in_channels): self.extra_convs = nn.ModuleList() for i in range(len(in_channels), num_outs): @@ -86,15 +87,9 @@ class ChannelMapper(BaseModule): in_channel = out_channels self.extra_convs.append( ConvModule( - in_channel, - out_channels, - 3, - stride=2, - padding=1, - bias=bias, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) + in_channel, out_channels, 3, stride=2, padding=1, bias=bias, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg + ) + ) def forward(self, inputs: Tuple[Tensor]) -> Tuple[Tensor]: """Forward function.""" diff --git a/mmpose/models/necks/cspnext_pafpn.py b/mmpose/models/necks/cspnext_pafpn.py index 35f4dc2f10df1f36fe0fef9caffaae59edb66c5d..64dec46d006495d7da865c3e61360ab08108c779 100644 --- a/mmpose/models/necks/cspnext_pafpn.py +++ b/mmpose/models/necks/cspnext_pafpn.py @@ -10,6 +10,7 @@ from torch import Tensor from mmpose.registry import MODELS from mmpose.utils.typing import ConfigType, OptMultiConfig + from ..utils import CSPLayer @@ -51,17 +52,13 @@ class CSPNeXtPAFPN(BaseModule): num_csp_blocks: int = 3, use_depthwise: bool = False, expand_ratio: float = 0.5, - upsample_cfg: ConfigType = dict(scale_factor=2, mode='nearest'), + upsample_cfg: ConfigType = dict(scale_factor=2, mode="nearest"), conv_cfg: bool = None, - norm_cfg: ConfigType = dict(type='BN', momentum=0.03, eps=0.001), - act_cfg: ConfigType = dict(type='Swish'), + norm_cfg: ConfigType = dict(type="BN", momentum=0.03, eps=0.001), + act_cfg: ConfigType = dict(type="Swish"), init_cfg: OptMultiConfig = dict( - type='Kaiming', - layer='Conv2d', - a=math.sqrt(5), - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu') + type="Kaiming", layer="Conv2d", a=math.sqrt(5), distribution="uniform", mode="fan_in", nonlinearity="leaky_relu" + ), ) -> None: super().__init__(init_cfg) self.in_channels = in_channels @@ -76,13 +73,8 @@ class CSPNeXtPAFPN(BaseModule): self.top_down_blocks = nn.ModuleList() for idx in range(len(in_channels) - 1, 0, -1): self.reduce_layers.append( - ConvModule( - in_channels[idx], - in_channels[idx - 1], - 1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) + ConvModule(in_channels[idx], in_channels[idx - 1], 1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + ) self.top_down_blocks.append( CSPLayer( in_channels[idx - 1] * 2, @@ -94,22 +86,17 @@ class CSPNeXtPAFPN(BaseModule): expand_ratio=expand_ratio, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg)) + act_cfg=act_cfg, + ) + ) # build bottom-up blocks self.downsamples = nn.ModuleList() self.bottom_up_blocks = nn.ModuleList() for idx in range(len(in_channels) - 1): self.downsamples.append( - conv( - in_channels[idx], - in_channels[idx], - 3, - stride=2, - padding=1, - 
conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) + conv(in_channels[idx], in_channels[idx], 3, stride=2, padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + ) self.bottom_up_blocks.append( CSPLayer( in_channels[idx] * 2, @@ -121,28 +108,17 @@ class CSPNeXtPAFPN(BaseModule): expand_ratio=expand_ratio, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg)) + act_cfg=act_cfg, + ) + ) if self.out_channels is not None: self.out_convs = nn.ModuleList() for i in range(len(in_channels)): self.out_convs.append( - conv( - in_channels[i], - out_channels, - 3, - padding=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) - self.out_convs = conv( - in_channels[-1], - out_channels, - 3, - padding=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) + conv(in_channels[i], out_channels, 3, padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + ) + self.out_convs = conv(in_channels[-1], out_channels, 3, padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) def forward(self, inputs: Tuple[Tensor, ...]) -> Tuple[Tensor, ...]: """ @@ -159,14 +135,12 @@ class CSPNeXtPAFPN(BaseModule): for idx in range(len(self.in_channels) - 1, 0, -1): feat_high = inner_outs[0] feat_low = inputs[idx - 1] - feat_high = self.reduce_layers[len(self.in_channels) - 1 - idx]( - feat_high) + feat_high = self.reduce_layers[len(self.in_channels) - 1 - idx](feat_high) inner_outs[0] = feat_high upsample_feat = self.upsample(feat_high) - inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx]( - torch.cat([upsample_feat, feat_low], 1)) + inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx](torch.cat([upsample_feat, feat_low], 1)) inner_outs.insert(0, inner_out) # bottom-up path @@ -175,8 +149,7 @@ class CSPNeXtPAFPN(BaseModule): feat_low = outs[-1] feat_high = inner_outs[idx + 1] downsample_feat = self.downsamples[idx](feat_low) - out = self.bottom_up_blocks[idx]( - torch.cat([downsample_feat, feat_high], 1)) + out = self.bottom_up_blocks[idx](torch.cat([downsample_feat, feat_high], 1)) outs.append(out) if self.out_channels is not None: diff --git a/mmpose/models/necks/fmap_proc_neck.py b/mmpose/models/necks/fmap_proc_neck.py index 2c3a4d7bf44ab07641a4968f143e17c19b24743b..289f3f9ea82435be696b843beab9ce1c41a2e36e 100644 --- a/mmpose/models/necks/fmap_proc_neck.py +++ b/mmpose/models/necks/fmap_proc_neck.py @@ -40,20 +40,16 @@ class FeatureMapProcessor(nn.Module): super().__init__() if isinstance(select_index, int): - select_index = (select_index, ) + select_index = (select_index,) self.select_index = select_index self.concat = concat - assert ( - scale_factor > 0 - ), f'the argument `scale_factor` must be positive, ' \ - f'but got {scale_factor}' + assert scale_factor > 0, f"the argument `scale_factor` must be positive, " f"but got {scale_factor}" self.scale_factor = scale_factor self.apply_relu = apply_relu self.align_corners = align_corners - def forward(self, inputs: Union[Tensor, Sequence[Tensor]] - ) -> Union[Tensor, List[Tensor]]: + def forward(self, inputs: Union[Tensor, Sequence[Tensor]]) -> Union[Tensor, List[Tensor]]: if not isinstance(inputs, (tuple, list)): sequential_input = False @@ -80,13 +76,7 @@ class FeatureMapProcessor(nn.Module): def _concat(self, inputs: Sequence[Tensor]) -> List[Tensor]: size = inputs[0].shape[-2:] - resized_inputs = [ - resize( - x, - size=size, - mode='bilinear', - align_corners=self.align_corners) for x in inputs - ] + resized_inputs = [resize(x, size=size, mode="bilinear", 
align_corners=self.align_corners) for x in inputs] return [torch.cat(resized_inputs, dim=1)] def _rescale(self, inputs: Sequence[Tensor]) -> List[Tensor]: @@ -94,8 +84,9 @@ class FeatureMapProcessor(nn.Module): resize( x, scale_factor=self.scale_factor, - mode='bilinear', + mode="bilinear", align_corners=self.align_corners, - ) for x in inputs + ) + for x in inputs ] return rescaled_inputs diff --git a/mmpose/models/necks/fpn.py b/mmpose/models/necks/fpn.py index d4d3311bda792898dd1bc7ef9b9462db7b01ce05..62cc136c7c05589ac0bb90f4b5d29f9bde4c737c 100644 --- a/mmpose/models/necks/fpn.py +++ b/mmpose/models/necks/fpn.py @@ -58,19 +58,21 @@ class FPN(nn.Module): outputs[3].shape = torch.Size([1, 11, 43, 43]) """ - def __init__(self, - in_channels, - out_channels, - num_outs, - start_level=0, - end_level=-1, - add_extra_convs=False, - relu_before_extra_convs=False, - no_norm_on_lateral=False, - conv_cfg=None, - norm_cfg=None, - act_cfg=None, - upsample_cfg=dict(mode='nearest')): + def __init__( + self, + in_channels, + out_channels, + num_outs, + start_level=0, + end_level=-1, + add_extra_convs=False, + relu_before_extra_convs=False, + no_norm_on_lateral=False, + conv_cfg=None, + norm_cfg=None, + act_cfg=None, + upsample_cfg=dict(mode="nearest"), + ): super().__init__() assert isinstance(in_channels, list) self.in_channels = in_channels @@ -96,9 +98,9 @@ class FPN(nn.Module): assert isinstance(add_extra_convs, (str, bool)) if isinstance(add_extra_convs, str): # Extra_convs_source choices: 'on_input', 'on_lateral', 'on_output' - assert add_extra_convs in ('on_input', 'on_lateral', 'on_output') + assert add_extra_convs in ("on_input", "on_lateral", "on_output") elif add_extra_convs: # True - self.add_extra_convs = 'on_input' + self.add_extra_convs = "on_input" self.lateral_convs = nn.ModuleList() self.fpn_convs = nn.ModuleList() @@ -111,16 +113,11 @@ class FPN(nn.Module): conv_cfg=conv_cfg, norm_cfg=norm_cfg if not self.no_norm_on_lateral else None, act_cfg=act_cfg, - inplace=False) + inplace=False, + ) fpn_conv = ConvModule( - out_channels, - out_channels, - 3, - padding=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg, - inplace=False) + out_channels, out_channels, 3, padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg, inplace=False + ) self.lateral_convs.append(l_conv) self.fpn_convs.append(fpn_conv) @@ -129,57 +126,43 @@ class FPN(nn.Module): extra_levels = num_outs - self.backbone_end_level + self.start_level if self.add_extra_convs and extra_levels >= 1: for i in range(extra_levels): - if i == 0 and self.add_extra_convs == 'on_input': + if i == 0 and self.add_extra_convs == "on_input": in_channels = self.in_channels[self.backbone_end_level - 1] else: in_channels = out_channels extra_fpn_conv = ConvModule( - in_channels, - out_channels, - 3, - stride=2, - padding=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg, - inplace=False) + in_channels, out_channels, 3, stride=2, padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg, inplace=False + ) self.fpn_convs.append(extra_fpn_conv) def init_weights(self): """Initialize model weights.""" for m in self.modules(): if isinstance(m, nn.Conv2d): - xavier_init(m, distribution='uniform') + xavier_init(m, distribution="uniform") def forward(self, inputs): """Forward function.""" assert len(inputs) == len(self.in_channels) # build laterals - laterals = [ - lateral_conv(inputs[i + self.start_level]) - for i, lateral_conv in enumerate(self.lateral_convs) - ] + laterals = [lateral_conv(inputs[i + 
self.start_level]) for i, lateral_conv in enumerate(self.lateral_convs)] # build top-down path used_backbone_levels = len(laterals) for i in range(used_backbone_levels - 1, 0, -1): # In some cases, fixing `scale factor` (e.g. 2) is preferred, but # it cannot co-exist with `size` in `F.interpolate`. - if 'scale_factor' in self.upsample_cfg: + if "scale_factor" in self.upsample_cfg: # fix runtime error of "+=" inplace operation in PyTorch 1.10 - laterals[i - 1] = laterals[i - 1] + F.interpolate( - laterals[i], **self.upsample_cfg) + laterals[i - 1] = laterals[i - 1] + F.interpolate(laterals[i], **self.upsample_cfg) else: prev_shape = laterals[i - 1].shape[2:] - laterals[i - 1] = laterals[i - 1] + F.interpolate( - laterals[i], size=prev_shape, **self.upsample_cfg) + laterals[i - 1] = laterals[i - 1] + F.interpolate(laterals[i], size=prev_shape, **self.upsample_cfg) # build outputs # part 1: from original levels - outs = [ - self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels) - ] + outs = [self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels)] # part 2: add extra levels if self.num_outs > len(outs): # use max pool to get more levels on top of outputs @@ -189,11 +172,11 @@ class FPN(nn.Module): outs.append(F.max_pool2d(outs[-1], 1, stride=2)) # add conv layers on top of original feature maps (RetinaNet) else: - if self.add_extra_convs == 'on_input': + if self.add_extra_convs == "on_input": extra_source = inputs[self.backbone_end_level - 1] - elif self.add_extra_convs == 'on_lateral': + elif self.add_extra_convs == "on_lateral": extra_source = laterals[-1] - elif self.add_extra_convs == 'on_output': + elif self.add_extra_convs == "on_output": extra_source = outs[-1] else: raise NotImplementedError diff --git a/mmpose/models/necks/gap_neck.py b/mmpose/models/necks/gap_neck.py index 58ce5d939ffdeb912a02e8b1823ab073cbc3d9e3..4ee790353e02b3073b3eb2d4566e3982c0526b45 100644 --- a/mmpose/models/necks/gap_neck.py +++ b/mmpose/models/necks/gap_neck.py @@ -26,8 +26,7 @@ class GlobalAveragePooling(nn.Module): if isinstance(inputs, tuple): outs = tuple([self.gap(x) for x in inputs]) - outs = tuple( - [out.view(x.size(0), -1) for out, x in zip(outs, inputs)]) + outs = tuple([out.view(x.size(0), -1) for out, x in zip(outs, inputs)]) elif isinstance(inputs, list): outs = [self.gap(x) for x in inputs] outs = [out.view(x.size(0), -1) for out, x in zip(outs, inputs)] @@ -35,5 +34,5 @@ class GlobalAveragePooling(nn.Module): outs = self.gap(inputs) outs = outs.view(inputs.size(0), -1) else: - raise TypeError('neck inputs should be tuple or torch.tensor') + raise TypeError("neck inputs should be tuple or torch.tensor") return outs diff --git a/mmpose/models/necks/hybrid_encoder.py b/mmpose/models/necks/hybrid_encoder.py index 6d9db8d1b8855ed5acf49ce22b46de4d2804b489..d11b382f32c0f0d07638e9b7f22fb930107f6bb1 100644 --- a/mmpose/models/necks/hybrid_encoder.py +++ b/mmpose/models/necks/hybrid_encoder.py @@ -8,8 +8,7 @@ from mmcv.cnn import ConvModule from mmengine.model import BaseModule, ModuleList from torch import Tensor -from mmpose.models.utils import (DetrTransformerEncoder, RepVGGBlock, - SinePositionalEncoding) +from mmpose.models.utils import DetrTransformerEncoder, RepVGGBlock, SinePositionalEncoding from mmpose.registry import MODELS from mmpose.utils.typing import ConfigType, OptConfigType @@ -32,39 +31,23 @@ class CSPRepLayer(BaseModule): Defaults to SiLU (Swish) with in-place operation. 
""" - def __init__(self, - in_channels: int, - out_channels: int, - num_blocks: int = 3, - widen_factor: float = 1.0, - norm_cfg: OptConfigType = dict(type='BN', requires_grad=True), - act_cfg: OptConfigType = dict(type='SiLU', inplace=True)): + def __init__( + self, + in_channels: int, + out_channels: int, + num_blocks: int = 3, + widen_factor: float = 1.0, + norm_cfg: OptConfigType = dict(type="BN", requires_grad=True), + act_cfg: OptConfigType = dict(type="SiLU", inplace=True), + ): super(CSPRepLayer, self).__init__() hidden_channels = int(out_channels * widen_factor) - self.conv1 = ConvModule( - in_channels, - hidden_channels, - kernel_size=1, - norm_cfg=norm_cfg, - act_cfg=act_cfg) - self.conv2 = ConvModule( - in_channels, - hidden_channels, - kernel_size=1, - norm_cfg=norm_cfg, - act_cfg=act_cfg) - - self.bottlenecks = nn.Sequential(*[ - RepVGGBlock(hidden_channels, hidden_channels, act_cfg=act_cfg) - for _ in range(num_blocks) - ]) + self.conv1 = ConvModule(in_channels, hidden_channels, kernel_size=1, norm_cfg=norm_cfg, act_cfg=act_cfg) + self.conv2 = ConvModule(in_channels, hidden_channels, kernel_size=1, norm_cfg=norm_cfg, act_cfg=act_cfg) + + self.bottlenecks = nn.Sequential(*[RepVGGBlock(hidden_channels, hidden_channels, act_cfg=act_cfg) for _ in range(num_blocks)]) if hidden_channels != out_channels: - self.conv3 = ConvModule( - hidden_channels, - out_channels, - kernel_size=1, - norm_cfg=norm_cfg, - act_cfg=act_cfg) + self.conv3 = ConvModule(hidden_channels, out_channels, kernel_size=1, norm_cfg=norm_cfg, act_cfg=act_cfg) else: self.conv3 = nn.Identity() @@ -121,21 +104,23 @@ class HybridEncoder(BaseModule): .. _`RT-DETR`: https://arxiv.org/abs/2304.08069 """ - def __init__(self, - encoder_cfg: ConfigType = dict(), - projector: OptConfigType = None, - num_encoder_layers: int = 1, - in_channels: List[int] = [512, 1024, 2048], - feat_strides: List[int] = [8, 16, 32], - hidden_dim: int = 256, - use_encoder_idx: List[int] = [2], - pe_temperature: int = 10000, - widen_factor: float = 1.0, - deepen_factor: float = 1.0, - spe_learnable: bool = False, - output_indices: Optional[List[int]] = None, - norm_cfg: OptConfigType = dict(type='BN', requires_grad=True), - act_cfg: OptConfigType = dict(type='SiLU', inplace=True)): + def __init__( + self, + encoder_cfg: ConfigType = dict(), + projector: OptConfigType = None, + num_encoder_layers: int = 1, + in_channels: List[int] = [512, 1024, 2048], + feat_strides: List[int] = [8, 16, 32], + hidden_dim: int = 256, + use_encoder_idx: List[int] = [2], + pe_temperature: int = 10000, + widen_factor: float = 1.0, + deepen_factor: float = 1.0, + spe_learnable: bool = False, + output_indices: Optional[List[int]] = None, + norm_cfg: OptConfigType = dict(type="BN", requires_grad=True), + act_cfg: OptConfigType = dict(type="SiLU", inplace=True), + ): super(HybridEncoder, self).__init__() self.in_channels = in_channels self.feat_strides = feat_strides @@ -148,48 +133,21 @@ class HybridEncoder(BaseModule): # channel projection self.input_proj = ModuleList() for in_channel in in_channels: - self.input_proj.append( - ConvModule( - in_channel, - hidden_dim, - kernel_size=1, - padding=0, - norm_cfg=norm_cfg, - act_cfg=None)) + self.input_proj.append(ConvModule(in_channel, hidden_dim, kernel_size=1, padding=0, norm_cfg=norm_cfg, act_cfg=None)) # encoder transformer if len(use_encoder_idx) > 0: pos_enc_dim = self.hidden_dim // 2 - self.encoder = ModuleList([ - DetrTransformerEncoder(num_encoder_layers, encoder_cfg) - for _ in range(len(use_encoder_idx)) - ]) + 
self.encoder = ModuleList([DetrTransformerEncoder(num_encoder_layers, encoder_cfg) for _ in range(len(use_encoder_idx))]) - self.sincos_pos_enc = SinePositionalEncoding( - pos_enc_dim, - learnable=spe_learnable, - temperature=self.pe_temperature, - spatial_dim=2) + self.sincos_pos_enc = SinePositionalEncoding(pos_enc_dim, learnable=spe_learnable, temperature=self.pe_temperature, spatial_dim=2) # top-down fpn lateral_convs = list() fpn_blocks = list() for idx in range(len(in_channels) - 1, 0, -1): - lateral_convs.append( - ConvModule( - hidden_dim, - hidden_dim, - 1, - 1, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) - fpn_blocks.append( - CSPRepLayer( - hidden_dim * 2, - hidden_dim, - round(3 * deepen_factor), - act_cfg=act_cfg, - widen_factor=widen_factor)) + lateral_convs.append(ConvModule(hidden_dim, hidden_dim, 1, 1, norm_cfg=norm_cfg, act_cfg=act_cfg)) + fpn_blocks.append(CSPRepLayer(hidden_dim * 2, hidden_dim, round(3 * deepen_factor), act_cfg=act_cfg, widen_factor=widen_factor)) self.lateral_convs = ModuleList(lateral_convs) self.fpn_blocks = ModuleList(fpn_blocks) @@ -197,22 +155,8 @@ class HybridEncoder(BaseModule): downsample_convs = list() pan_blocks = list() for idx in range(len(in_channels) - 1): - downsample_convs.append( - ConvModule( - hidden_dim, - hidden_dim, - 3, - stride=2, - padding=1, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) - pan_blocks.append( - CSPRepLayer( - hidden_dim * 2, - hidden_dim, - round(3 * deepen_factor), - act_cfg=act_cfg, - widen_factor=widen_factor)) + downsample_convs.append(ConvModule(hidden_dim, hidden_dim, 3, stride=2, padding=1, norm_cfg=norm_cfg, act_cfg=act_cfg)) + pan_blocks.append(CSPRepLayer(hidden_dim * 2, hidden_dim, round(3 * deepen_factor), act_cfg=act_cfg, widen_factor=widen_factor)) self.downsample_convs = ModuleList(downsample_convs) self.pan_blocks = ModuleList(pan_blocks) @@ -225,41 +169,33 @@ class HybridEncoder(BaseModule): """Forward function.""" assert len(inputs) == len(self.in_channels) - proj_feats = [ - self.input_proj[i](inputs[i]) for i in range(len(inputs)) - ] + proj_feats = [self.input_proj[i](inputs[i]) for i in range(len(inputs))] # encoder if self.num_encoder_layers > 0: for i, enc_ind in enumerate(self.use_encoder_idx): h, w = proj_feats[enc_ind].shape[2:] # flatten [B, C, H, W] to [B, HxW, C] - src_flatten = proj_feats[enc_ind].flatten(2).permute( - 0, 2, 1).contiguous() + src_flatten = proj_feats[enc_ind].flatten(2).permute(0, 2, 1).contiguous() if torch.onnx.is_in_onnx_export(): - pos_enc = getattr(self, f'pos_enc_{i}') + pos_enc = getattr(self, f"pos_enc_{i}") else: pos_enc = self.sincos_pos_enc(size=(h, w)) pos_enc = pos_enc.transpose(-1, -2).reshape(1, h * w, -1) - memory = self.encoder[i]( - src_flatten, query_pos=pos_enc, key_padding_mask=None) + memory = self.encoder[i](src_flatten, query_pos=pos_enc, key_padding_mask=None) - proj_feats[enc_ind] = memory.permute( - 0, 2, 1).contiguous().view([-1, self.hidden_dim, h, w]) + proj_feats[enc_ind] = memory.permute(0, 2, 1).contiguous().view([-1, self.hidden_dim, h, w]) # top-down fpn inner_outs = [proj_feats[-1]] for idx in range(len(self.in_channels) - 1, 0, -1): feat_high = inner_outs[0] feat_low = proj_feats[idx - 1] - feat_high = self.lateral_convs[len(self.in_channels) - 1 - idx]( - feat_high) + feat_high = self.lateral_convs[len(self.in_channels) - 1 - idx](feat_high) inner_outs[0] = feat_high - upsample_feat = F.interpolate( - feat_high, scale_factor=2., mode='nearest') - inner_out = self.fpn_blocks[len(self.in_channels) - 1 - idx]( - torch.cat([upsample_feat, 
feat_low], axis=1)) + upsample_feat = F.interpolate(feat_high, scale_factor=2.0, mode="nearest") + inner_out = self.fpn_blocks[len(self.in_channels) - 1 - idx](torch.cat([upsample_feat, feat_low], axis=1)) inner_outs.insert(0, inner_out) # bottom-up pan @@ -268,8 +204,7 @@ class HybridEncoder(BaseModule): feat_low = outs[-1] feat_high = inner_outs[idx + 1] downsample_feat = self.downsample_convs[idx](feat_low) # Conv - out = self.pan_blocks[idx]( # CSPRepLayer - torch.cat([downsample_feat, feat_high], axis=1)) + out = self.pan_blocks[idx](torch.cat([downsample_feat, feat_high], axis=1)) # CSPRepLayer outs.append(out) if self.output_indices is not None: @@ -283,16 +218,16 @@ class HybridEncoder(BaseModule): def switch_to_deploy(self, test_cfg): """Switch to deploy mode.""" - if getattr(self, 'deploy', False): + if getattr(self, "deploy", False): return if self.num_encoder_layers > 0: for i, enc_ind in enumerate(self.use_encoder_idx): - h, w = test_cfg['input_size'] - h = int(h / 2**(3 + enc_ind)) - w = int(w / 2**(3 + enc_ind)) + h, w = test_cfg["input_size"] + h = int(h / 2 ** (3 + enc_ind)) + w = int(w / 2 ** (3 + enc_ind)) pos_enc = self.sincos_pos_enc(size=(h, w)) pos_enc = pos_enc.transpose(-1, -2).reshape(1, h * w, -1) - self.register_buffer(f'pos_enc_{i}', pos_enc) + self.register_buffer(f"pos_enc_{i}", pos_enc) self.deploy = True diff --git a/mmpose/models/necks/posewarper_neck.py b/mmpose/models/necks/posewarper_neck.py index 517fabd2e839878e7cf692c91adad450f432e8f0..f3eb0b267dd520902669cae3c34ff5feb119a223 100644 --- a/mmpose/models/necks/posewarper_neck.py +++ b/mmpose/models/necks/posewarper_neck.py @@ -9,10 +9,12 @@ from torch.nn.modules.batchnorm import _BatchNorm from mmpose.models.utils.ops import resize from mmpose.registry import MODELS + from ..backbones.resnet import BasicBlock, Bottleneck try: from mmcv.ops import DeformConv2d + has_mmcv_full = True except (ImportError, ModuleNotFoundError): has_mmcv_full = False @@ -62,24 +64,27 @@ class PoseWarperNeck(nn.Module): im2col_step (int): the argument `im2col_step` in deformable conv, Default: 80. 
""" - blocks_dict = {'BASIC': BasicBlock, 'BOTTLENECK': Bottleneck} - minimum_mmcv_version = '1.3.17' - - def __init__(self, - in_channels, - out_channels, - inner_channels, - deform_groups=17, - dilations=(3, 6, 12, 18, 24), - trans_conv_kernel=1, - res_blocks_cfg=None, - offsets_kernel=3, - deform_conv_kernel=3, - in_index=0, - input_transform=None, - freeze_trans_layer=True, - norm_eval=False, - im2col_step=80): + + blocks_dict = {"BASIC": BasicBlock, "BOTTLENECK": Bottleneck} + minimum_mmcv_version = "1.3.17" + + def __init__( + self, + in_channels, + out_channels, + inner_channels, + deform_groups=17, + dilations=(3, 6, 12, 18, 24), + trans_conv_kernel=1, + res_blocks_cfg=None, + offsets_kernel=3, + deform_conv_kernel=3, + in_index=0, + input_transform=None, + freeze_trans_layer=True, + norm_eval=False, + im2col_step=80, + ): super().__init__() self.in_channels = in_channels self.out_channels = out_channels @@ -112,41 +117,35 @@ class PoseWarperNeck(nn.Module): self.trans_layer = nn.Identity() else: self.trans_layer = build_conv_layer( - cfg=dict(type='Conv2d'), + cfg=dict(type="Conv2d"), in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=1, - padding=padding) + padding=padding, + ) # build chain of residual blocks if res_blocks_cfg is not None and not isinstance(res_blocks_cfg, dict): - raise TypeError('res_blocks_cfg should be dict or None.') + raise TypeError("res_blocks_cfg should be dict or None.") if res_blocks_cfg is None: - block_type = 'BASIC' + block_type = "BASIC" num_blocks = 20 else: - block_type = res_blocks_cfg.get('block', 'BASIC') - num_blocks = res_blocks_cfg.get('num_blocks', 20) + block_type = res_blocks_cfg.get("block", "BASIC") + num_blocks = res_blocks_cfg.get("num_blocks", 20) block = self.blocks_dict[block_type] res_layers = [] downsample = nn.Sequential( build_conv_layer( - cfg=dict(type='Conv2d'), - in_channels=out_channels, - out_channels=inner_channels, - kernel_size=1, - stride=1, - bias=False), - build_norm_layer(dict(type='BN'), inner_channels)[1]) - res_layers.append( - block( - in_channels=out_channels, - out_channels=inner_channels, - downsample=downsample)) + cfg=dict(type="Conv2d"), in_channels=out_channels, out_channels=inner_channels, kernel_size=1, stride=1, bias=False + ), + build_norm_layer(dict(type="BN"), inner_channels)[1], + ) + res_layers.append(block(in_channels=out_channels, out_channels=inner_channels, downsample=downsample)) for _ in range(1, num_blocks): res_layers.append(block(inner_channels, inner_channels)) @@ -154,14 +153,13 @@ class PoseWarperNeck(nn.Module): # build offset layers self.num_offset_layers = len(dilations) - assert self.num_offset_layers > 0, 'Number of offset layers ' \ - 'should be larger than 0.' + assert self.num_offset_layers > 0, "Number of offset layers " "should be larger than 0." 
target_offset_channels = 2 * offsets_kernel**2 * deform_groups offset_layers = [ build_conv_layer( - cfg=dict(type='Conv2d'), + cfg=dict(type="Conv2d"), in_channels=inner_channels, out_channels=target_offset_channels, kernel_size=offsets_kernel, @@ -169,17 +167,18 @@ class PoseWarperNeck(nn.Module): dilation=dilations[i], padding=dilations[i], bias=False, - ) for i in range(self.num_offset_layers) + ) + for i in range(self.num_offset_layers) ] self.offset_layers = nn.ModuleList(offset_layers) # build deformable conv layers - assert digit_version(mmcv.__version__) >= \ - digit_version(self.minimum_mmcv_version), \ - f'Current MMCV version: {mmcv.__version__}, ' \ - f'but MMCV >= {self.minimum_mmcv_version} is required, see ' \ - f'https://github.com/open-mmlab/mmcv/issues/1440, ' \ - f'Please install the latest MMCV.' + assert digit_version(mmcv.__version__) >= digit_version(self.minimum_mmcv_version), ( + f"Current MMCV version: {mmcv.__version__}, " + f"but MMCV >= {self.minimum_mmcv_version} is required, see " + f"https://github.com/open-mmlab/mmcv/issues/1440, " + f"Please install the latest MMCV." + ) if has_mmcv_full: deform_conv_layers = [ @@ -192,11 +191,11 @@ class PoseWarperNeck(nn.Module): dilation=dilations[i], deform_groups=deform_groups, im2col_step=self.im2col_step, - ) for i in range(self.num_offset_layers) + ) + for i in range(self.num_offset_layers) ] else: - raise ImportError('Please install the full version of mmcv ' - 'to use `DeformConv2d`.') + raise ImportError("Please install the full version of mmcv " "to use `DeformConv2d`.") self.deform_conv_layers = nn.ModuleList(deform_conv_layers) @@ -216,18 +215,11 @@ class PoseWarperNeck(nn.Module): elif isinstance(m, (_BatchNorm, nn.GroupNorm)): constant_init(m, 1) elif isinstance(m, DeformConv2d): - filler = torch.zeros([ - m.weight.size(0), - m.weight.size(1), - m.weight.size(2), - m.weight.size(3) - ], - dtype=torch.float32, - device=m.weight.device) + filler = torch.zeros( + [m.weight.size(0), m.weight.size(1), m.weight.size(2), m.weight.size(3)], dtype=torch.float32, device=m.weight.device + ) for k in range(m.weight.size(0)): - filler[k, k, - int(m.weight.size(2) / 2), - int(m.weight.size(3) / 2)] = 1.0 + filler[k, k, int(m.weight.size(2) / 2), int(m.weight.size(3) / 2)] = 1.0 m.weight = torch.nn.Parameter(filler) m.weight.requires_grad = True @@ -247,17 +239,13 @@ class PoseWarperNeck(nn.Module): if not isinstance(inputs, list): return inputs - if self.input_transform == 'resize_concat': + if self.input_transform == "resize_concat": inputs = [inputs[i] for i in self.in_index] upsampled_inputs = [ - resize( - input=x, - size=inputs[0].shape[2:], - mode='bilinear', - align_corners=self.align_corners) for x in inputs + resize(input=x, size=inputs[0].shape[2:], mode="bilinear", align_corners=self.align_corners) for x in inputs ] inputs = torch.cat(upsampled_inputs, dim=1) - elif self.input_transform == 'multiple_select': + elif self.input_transform == "multiple_select": inputs = [inputs[i] for i in self.in_index] else: inputs = inputs[self.in_index] @@ -265,9 +253,9 @@ class PoseWarperNeck(nn.Module): return inputs def forward(self, inputs, frame_weight): - assert isinstance(inputs, (list, tuple)), 'PoseWarperNeck inputs ' \ - 'should be list or tuple, even though the length is 1, ' \ - 'for unified processing.' + assert isinstance(inputs, (list, tuple)), ( + "PoseWarperNeck inputs " "should be list or tuple, even though the length is 1, " "for unified processing." 
+ ) output_heatmap = 0 if len(inputs) > 1: @@ -275,20 +263,16 @@ class PoseWarperNeck(nn.Module): inputs = [self.trans_layer(input) for input in inputs] # calculate difference features - diff_features = [ - self.offset_feats(inputs[0] - input) for input in inputs - ] + diff_features = [self.offset_feats(inputs[0] - input) for input in inputs] for i in range(len(inputs)): if frame_weight[i] == 0: continue warped_heatmap = 0 for j in range(self.num_offset_layers): - offset = (self.offset_layers[j](diff_features[i])) - warped_heatmap_tmp = self.deform_conv_layers[j](inputs[i], - offset) - warped_heatmap += warped_heatmap_tmp / \ - self.num_offset_layers + offset = self.offset_layers[j](diff_features[i]) + warped_heatmap_tmp = self.deform_conv_layers[j](inputs[i], offset) + warped_heatmap += warped_heatmap_tmp / self.num_offset_layers output_heatmap += warped_heatmap * frame_weight[i] @@ -314,8 +298,7 @@ class PoseWarperNeck(nn.Module): for i in range(num_frames): if frame_weight[i] == 0: continue - output_heatmap += warped_heatmap[i * batch_size:(i + 1) * - batch_size] * frame_weight[i] + output_heatmap += warped_heatmap[i * batch_size : (i + 1) * batch_size] * frame_weight[i] return output_heatmap diff --git a/mmpose/models/necks/yolox_pafpn.py b/mmpose/models/necks/yolox_pafpn.py index adc4cfffa304fce6c62d969f2bf66ef83b4f626d..084b168653d473c2bea19527d781099383aa3597 100644 --- a/mmpose/models/necks/yolox_pafpn.py +++ b/mmpose/models/necks/yolox_pafpn.py @@ -7,6 +7,7 @@ from mmcv.cnn import ConvModule, DepthwiseSeparableConvModule from mmengine.model import BaseModule from mmpose.registry import MODELS + from ..utils import CSPLayer @@ -32,22 +33,18 @@ class YOLOXPAFPN(BaseModule): Default: None. """ - def __init__(self, - in_channels, - out_channels, - num_csp_blocks=3, - use_depthwise=False, - upsample_cfg=dict(scale_factor=2, mode='nearest'), - conv_cfg=None, - norm_cfg=dict(type='BN', momentum=0.03, eps=0.001), - act_cfg=dict(type='Swish'), - init_cfg=dict( - type='Kaiming', - layer='Conv2d', - a=math.sqrt(5), - distribution='uniform', - mode='fan_in', - nonlinearity='leaky_relu')): + def __init__( + self, + in_channels, + out_channels, + num_csp_blocks=3, + use_depthwise=False, + upsample_cfg=dict(scale_factor=2, mode="nearest"), + conv_cfg=None, + norm_cfg=dict(type="BN", momentum=0.03, eps=0.001), + act_cfg=dict(type="Swish"), + init_cfg=dict(type="Kaiming", layer="Conv2d", a=math.sqrt(5), distribution="uniform", mode="fan_in", nonlinearity="leaky_relu"), + ): super(YOLOXPAFPN, self).__init__(init_cfg) self.in_channels = in_channels self.out_channels = out_channels @@ -60,13 +57,8 @@ class YOLOXPAFPN(BaseModule): self.top_down_blocks = nn.ModuleList() for idx in range(len(in_channels) - 1, 0, -1): self.reduce_layers.append( - ConvModule( - in_channels[idx], - in_channels[idx - 1], - 1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) + ConvModule(in_channels[idx], in_channels[idx - 1], 1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + ) self.top_down_blocks.append( CSPLayer( in_channels[idx - 1] * 2, @@ -76,22 +68,17 @@ class YOLOXPAFPN(BaseModule): use_depthwise=use_depthwise, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg)) + act_cfg=act_cfg, + ) + ) # build bottom-up blocks self.downsamples = nn.ModuleList() self.bottom_up_blocks = nn.ModuleList() for idx in range(len(in_channels) - 1): self.downsamples.append( - conv( - in_channels[idx], - in_channels[idx], - 3, - stride=2, - padding=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) 
+ conv(in_channels[idx], in_channels[idx], 3, stride=2, padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + ) self.bottom_up_blocks.append( CSPLayer( in_channels[idx] * 2, @@ -101,18 +88,13 @@ class YOLOXPAFPN(BaseModule): use_depthwise=use_depthwise, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg)) + act_cfg=act_cfg, + ) + ) self.out_convs = nn.ModuleList() for i in range(len(in_channels)): - self.out_convs.append( - ConvModule( - in_channels[i], - out_channels, - 1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg)) + self.out_convs.append(ConvModule(in_channels[i], out_channels, 1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg)) def forward(self, inputs): """ @@ -129,14 +111,12 @@ class YOLOXPAFPN(BaseModule): for idx in range(len(self.in_channels) - 1, 0, -1): feat_heigh = inner_outs[0] feat_low = inputs[idx - 1] - feat_heigh = self.reduce_layers[len(self.in_channels) - 1 - idx]( - feat_heigh) + feat_heigh = self.reduce_layers[len(self.in_channels) - 1 - idx](feat_heigh) inner_outs[0] = feat_heigh upsample_feat = self.upsample(feat_heigh) - inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx]( - torch.cat([upsample_feat, feat_low], 1)) + inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx](torch.cat([upsample_feat, feat_low], 1)) inner_outs.insert(0, inner_out) # bottom-up path @@ -145,8 +125,7 @@ class YOLOXPAFPN(BaseModule): feat_low = outs[-1] feat_height = inner_outs[idx + 1] downsample_feat = self.downsamples[idx](feat_low) - out = self.bottom_up_blocks[idx]( - torch.cat([downsample_feat, feat_height], 1)) + out = self.bottom_up_blocks[idx](torch.cat([downsample_feat, feat_height], 1)) outs.append(out) # out convs diff --git a/mmpose/models/pose_estimators/__init__.py b/mmpose/models/pose_estimators/__init__.py index c5287e0c2caa617f88ec5a0ff538478e5e562a0b..cef6285ea8321a81455abb374dbbaa9ddcb2f5fa 100644 --- a/mmpose/models/pose_estimators/__init__.py +++ b/mmpose/models/pose_estimators/__init__.py @@ -3,4 +3,4 @@ from .bottomup import BottomupPoseEstimator from .pose_lifter import PoseLifter from .topdown import TopdownPoseEstimator -__all__ = ['TopdownPoseEstimator', 'BottomupPoseEstimator', 'PoseLifter'] +__all__ = ["TopdownPoseEstimator", "BottomupPoseEstimator", "PoseLifter"] diff --git a/mmpose/models/pose_estimators/base.py b/mmpose/models/pose_estimators/base.py index 216f592fda1be26a6d7441aec339a96956ee19bb..784e6234d9b02b7ac1c0fad2b72de76f0c52a8b8 100644 --- a/mmpose/models/pose_estimators/base.py +++ b/mmpose/models/pose_estimators/base.py @@ -11,9 +11,7 @@ from torch import Tensor from mmpose.datasets.datasets.utils import parse_pose_metainfo from mmpose.models.utils import check_and_update_config from mmpose.registry import MODELS -from mmpose.utils.typing import (ConfigType, ForwardResults, OptConfigType, - Optional, OptMultiConfig, OptSampleList, - SampleList) +from mmpose.utils.typing import ConfigType, ForwardResults, OptConfigType, Optional, OptMultiConfig, OptSampleList, SampleList class BasePoseEstimator(BaseModel, metaclass=ABCMeta): @@ -32,20 +30,22 @@ class BasePoseEstimator(BaseModel, metaclass=ABCMeta): prepare_datasets.html#create-a-custom-dataset-info- config-file-for-the-dataset. 
Defaults to ``None`` """ + _version = 2 - def __init__(self, - backbone: ConfigType, - neck: OptConfigType = None, - head: OptConfigType = None, - train_cfg: OptConfigType = None, - test_cfg: OptConfigType = None, - data_preprocessor: OptConfigType = None, - use_syncbn: bool = False, - init_cfg: OptMultiConfig = None, - metainfo: Optional[dict] = None): - super().__init__( - data_preprocessor=data_preprocessor, init_cfg=init_cfg) + def __init__( + self, + backbone: ConfigType, + neck: OptConfigType = None, + head: OptConfigType = None, + train_cfg: OptConfigType = None, + test_cfg: OptConfigType = None, + data_preprocessor: OptConfigType = None, + use_syncbn: bool = False, + init_cfg: OptMultiConfig = None, + metainfo: Optional[dict] = None, + ): + super().__init__(data_preprocessor=data_preprocessor, init_cfg=init_cfg) self.metainfo = self._load_metainfo(metainfo) self.train_cfg = train_cfg if train_cfg else {} self.test_cfg = test_cfg if test_cfg else {} @@ -71,27 +71,26 @@ class BasePoseEstimator(BaseModel, metaclass=ABCMeta): # TODO: Waiting for mmengine support if use_syncbn and get_world_size() > 1: torch.nn.SyncBatchNorm.convert_sync_batchnorm(self) - print_log('Using SyncBatchNorm()', 'current') + print_log("Using SyncBatchNorm()", "current") def switch_to_deploy(self): """Switch the sub-modules to deploy mode.""" for name, layer in self.named_modules(): if layer == self: continue - if callable(getattr(layer, 'switch_to_deploy', None)): - print_log(f'module {name} has been switched to deploy mode', - 'current') + if callable(getattr(layer, "switch_to_deploy", None)): + print_log(f"module {name} has been switched to deploy mode", "current") layer.switch_to_deploy(self.test_cfg) @property def with_neck(self) -> bool: """bool: whether the pose estimator has a neck.""" - return hasattr(self, 'neck') and self.neck is not None + return hasattr(self, "neck") and self.neck is not None @property def with_head(self) -> bool: """bool: whether the pose estimator has a head.""" - return hasattr(self, 'head') and self.head is not None + return hasattr(self, "head") and self.head is not None @staticmethod def _load_metainfo(metainfo: dict = None) -> dict: @@ -108,16 +107,12 @@ class BasePoseEstimator(BaseModel, metaclass=ABCMeta): return None if not isinstance(metainfo, dict): - raise TypeError( - f'metainfo should be a dict, but got {type(metainfo)}') + raise TypeError(f"metainfo should be a dict, but got {type(metainfo)}") metainfo = parse_pose_metainfo(metainfo) return metainfo - def forward(self, - inputs: torch.Tensor, - data_samples: OptSampleList, - mode: str = 'tensor') -> ForwardResults: + def forward(self, inputs: torch.Tensor, data_samples: OptSampleList, mode: str = "tensor") -> ForwardResults: """The unified entry for a forward process in both training and test. The method should accept three modes: 'tensor', 'predict' and 'loss': @@ -151,19 +146,18 @@ class BasePoseEstimator(BaseModel, metaclass=ABCMeta): """ if isinstance(inputs, list): inputs = torch.stack(inputs) - if mode == 'loss': + if mode == "loss": return self.loss(inputs, data_samples) - elif mode == 'predict': + elif mode == "predict": # use custom metainfo to override the default metainfo if self.metainfo is not None: for data_sample in data_samples: data_sample.set_metainfo(self.metainfo) return self.predict(inputs, data_samples) - elif mode == 'tensor': + elif mode == "tensor": return self._forward(inputs) else: - raise RuntimeError(f'Invalid mode "{mode}". 
' - 'Only supports loss, predict and tensor mode.') + raise RuntimeError(f'Invalid mode "{mode}". ' "Only supports loss, predict and tensor mode.") @abstractmethod def loss(self, inputs: Tensor, data_samples: SampleList) -> dict: @@ -174,10 +168,7 @@ class BasePoseEstimator(BaseModel, metaclass=ABCMeta): """Predict results from a batch of inputs and data samples with post- processing.""" - def _forward(self, - inputs: Tensor, - data_samples: OptSampleList = None - ) -> Union[Tensor, Tuple[Tensor]]: + def _forward(self, inputs: Tensor, data_samples: OptSampleList = None) -> Union[Tensor, Tuple[Tensor]]: """Network forward process. Usually includes backbone, neck and head forward without any post-processing. @@ -210,8 +201,7 @@ class BasePoseEstimator(BaseModel, metaclass=ABCMeta): return x - def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, - **kwargs): + def _load_state_dict_pre_hook(self, state_dict, prefix, local_meta, *args, **kwargs): """A hook function to. 1) convert old-version state dict of @@ -230,16 +220,16 @@ class BasePoseEstimator(BaseModel, metaclass=ABCMeta): # remove the keys in data_preprocessor to avoid warning for k in keys: - if k in ('data_preprocessor.mean', 'data_preprocessor.std'): + if k in ("data_preprocessor.mean", "data_preprocessor.std"): del state_dict[k] - version = local_meta.get('version', None) + version = local_meta.get("version", None) if version and version >= self._version: return # convert old-version state dict for k in keys: - if 'keypoint_head' in k: + if "keypoint_head" in k: v = state_dict.pop(k) - k = k.replace('keypoint_head', 'head') + k = k.replace("keypoint_head", "head") state_dict[k] = v diff --git a/mmpose/models/pose_estimators/bottomup.py b/mmpose/models/pose_estimators/bottomup.py index 7b82980a13a286f480f1616c9fc89d3ad2577196..6531d0cd2075f04494b1a88abca0f737edc5fdc3 100644 --- a/mmpose/models/pose_estimators/bottomup.py +++ b/mmpose/models/pose_estimators/bottomup.py @@ -6,8 +6,8 @@ from mmengine.utils import is_list_of from torch import Tensor from mmpose.registry import MODELS -from mmpose.utils.typing import (ConfigType, InstanceList, OptConfigType, - OptMultiConfig, PixelDataList, SampleList) +from mmpose.utils.typing import ConfigType, InstanceList, OptConfigType, OptMultiConfig, PixelDataList, SampleList + from .base import BasePoseEstimator @@ -31,15 +31,17 @@ class BottomupPoseEstimator(BasePoseEstimator): Defaults to ``None`` """ - def __init__(self, - backbone: ConfigType, - neck: OptConfigType = None, - head: OptConfigType = None, - train_cfg: OptConfigType = None, - test_cfg: OptConfigType = None, - use_syncbn: bool = False, - data_preprocessor: OptConfigType = None, - init_cfg: OptMultiConfig = None): + def __init__( + self, + backbone: ConfigType, + neck: OptConfigType = None, + head: OptConfigType = None, + train_cfg: OptConfigType = None, + test_cfg: OptConfigType = None, + use_syncbn: bool = False, + data_preprocessor: OptConfigType = None, + init_cfg: OptMultiConfig = None, + ): super().__init__( backbone=backbone, neck=neck, @@ -48,7 +50,8 @@ class BottomupPoseEstimator(BasePoseEstimator): test_cfg=test_cfg, use_syncbn=use_syncbn, data_preprocessor=data_preprocessor, - init_cfg=init_cfg) + init_cfg=init_cfg, + ) def loss(self, inputs: Tensor, data_samples: SampleList) -> dict: """Calculate losses from a batch of inputs and data samples. 
@@ -66,13 +69,11 @@ class BottomupPoseEstimator(BasePoseEstimator): losses = dict() if self.with_head: - losses.update( - self.head.loss(feats, data_samples, train_cfg=self.train_cfg)) + losses.update(self.head.loss(feats, data_samples, train_cfg=self.train_cfg)) return losses - def predict(self, inputs: Union[Tensor, List[Tensor]], - data_samples: SampleList) -> SampleList: + def predict(self, inputs: Union[Tensor, List[Tensor]], data_samples: SampleList) -> SampleList: """Predict results from a batch of inputs and data samples with post- processing. @@ -95,14 +96,13 @@ class BottomupPoseEstimator(BasePoseEstimator): - keypoint_scores (Tensor): predicted keypoint scores in shape (num_instances, K) """ - assert self.with_head, ( - 'The model must have head to perform prediction.') + assert self.with_head, "The model must have head to perform prediction." - multiscale_test = self.test_cfg.get('multiscale_test', False) - flip_test = self.test_cfg.get('flip_test', False) + multiscale_test = self.test_cfg.get("multiscale_test", False) + flip_test = self.test_cfg.get("flip_test", False) # enable multi-scale test - aug_scales = data_samples[0].metainfo.get('aug_scales', None) + aug_scales = data_samples[0].metainfo.get("aug_scales", None) if multiscale_test: assert isinstance(aug_scales, list) assert is_list_of(inputs, Tensor) @@ -135,14 +135,13 @@ class BottomupPoseEstimator(BasePoseEstimator): batch_pred_instances = preds batch_pred_fields = None - results = self.add_pred_to_datasample(batch_pred_instances, - batch_pred_fields, data_samples) + results = self.add_pred_to_datasample(batch_pred_instances, batch_pred_fields, data_samples) return results - def add_pred_to_datasample(self, batch_pred_instances: InstanceList, - batch_pred_fields: Optional[PixelDataList], - batch_data_samples: SampleList) -> SampleList: + def add_pred_to_datasample( + self, batch_pred_instances: InstanceList, batch_pred_fields: Optional[PixelDataList], batch_data_samples: SampleList + ) -> SampleList: """Add predictions into data samples. 
Args: @@ -162,26 +161,21 @@ class BottomupPoseEstimator(BasePoseEstimator): if batch_pred_fields is None: batch_pred_fields = [] - for pred_instances, pred_fields, data_sample in zip_longest( - batch_pred_instances, batch_pred_fields, batch_data_samples): + for pred_instances, pred_fields, data_sample in zip_longest(batch_pred_instances, batch_pred_fields, batch_data_samples): - input_size = data_sample.metainfo['input_size'] - input_center = data_sample.metainfo['input_center'] - input_scale = data_sample.metainfo['input_scale'] + input_size = data_sample.metainfo["input_size"] + input_center = data_sample.metainfo["input_center"] + input_scale = data_sample.metainfo["input_scale"] # convert keypoint coordinates from input space to image space - pred_instances.keypoints = pred_instances.keypoints / input_size \ - * input_scale + input_center - 0.5 * input_scale - if 'keypoints_visible' not in pred_instances: - pred_instances.keypoints_visible = \ - pred_instances.keypoint_scores + pred_instances.keypoints = pred_instances.keypoints / input_size * input_scale + input_center - 0.5 * input_scale + if "keypoints_visible" not in pred_instances: + pred_instances.keypoints_visible = pred_instances.keypoint_scores # convert bbox coordinates from input space to image space - if 'bboxes' in pred_instances: - bboxes = pred_instances.bboxes.reshape( - pred_instances.bboxes.shape[0], 2, 2) - bboxes = bboxes / input_size * input_scale + input_center \ - - 0.5 * input_scale + if "bboxes" in pred_instances: + bboxes = pred_instances.bboxes.reshape(pred_instances.bboxes.shape[0], 2, 2) + bboxes = bboxes / input_size * input_scale + input_center - 0.5 * input_scale pred_instances.bboxes = bboxes.reshape(bboxes.shape[0], 4) data_sample.pred_instances = pred_instances diff --git a/mmpose/models/pose_estimators/pose_lifter.py b/mmpose/models/pose_estimators/pose_lifter.py index ec8401d1a2bf2e425e2e106763d115a432de50fd..7e756fe9edbc73e55ea47d860b52c9c932459bff 100644 --- a/mmpose/models/pose_estimators/pose_lifter.py +++ b/mmpose/models/pose_estimators/pose_lifter.py @@ -8,9 +8,8 @@ from torch import Tensor from mmpose.models.utils import check_and_update_config from mmpose.models.utils.tta import flip_coordinates from mmpose.registry import MODELS -from mmpose.utils.typing import (ConfigType, InstanceList, OptConfigType, - Optional, OptMultiConfig, OptSampleList, - PixelDataList, SampleList) +from mmpose.utils.typing import ConfigType, InstanceList, OptConfigType, Optional, OptMultiConfig, OptSampleList, PixelDataList, SampleList + from .base import BasePoseEstimator @@ -47,19 +46,21 @@ class PoseLifter(BasePoseEstimator): config-file-for-the-dataset. 
Defaults to ``None`` """ - def __init__(self, - backbone: ConfigType, - neck: OptConfigType = None, - head: OptConfigType = None, - traj_backbone: OptConfigType = None, - traj_neck: OptConfigType = None, - traj_head: OptConfigType = None, - semi_loss: OptConfigType = None, - train_cfg: OptConfigType = None, - test_cfg: OptConfigType = None, - data_preprocessor: OptConfigType = None, - init_cfg: OptMultiConfig = None, - metainfo: Optional[dict] = None): + def __init__( + self, + backbone: ConfigType, + neck: OptConfigType = None, + head: OptConfigType = None, + traj_backbone: OptConfigType = None, + traj_neck: OptConfigType = None, + traj_head: OptConfigType = None, + semi_loss: OptConfigType = None, + train_cfg: OptConfigType = None, + test_cfg: OptConfigType = None, + data_preprocessor: OptConfigType = None, + init_cfg: OptMultiConfig = None, + metainfo: Optional[dict] = None, + ): super().__init__( backbone=backbone, neck=neck, @@ -68,7 +69,8 @@ class PoseLifter(BasePoseEstimator): test_cfg=test_cfg, data_preprocessor=data_preprocessor, init_cfg=init_cfg, - metainfo=metainfo) + metainfo=metainfo, + ) # trajectory model self.share_backbone = False @@ -82,8 +84,7 @@ class PoseLifter(BasePoseEstimator): # The following function automatically detects outdated # configurations and updates them accordingly, while also providing # clear and concise information on the changes made. - traj_neck, traj_head = check_and_update_config( - traj_neck, traj_head) + traj_neck, traj_head = check_and_update_config(traj_neck, traj_head) if traj_neck is not None: self.traj_neck = MODELS.build(traj_neck) @@ -99,28 +100,27 @@ class PoseLifter(BasePoseEstimator): @property def with_traj_backbone(self): """bool: Whether the pose lifter has trajectory backbone.""" - return hasattr(self, 'traj_backbone') and \ - self.traj_backbone is not None + return hasattr(self, "traj_backbone") and self.traj_backbone is not None @property def with_traj_neck(self): """bool: Whether the pose lifter has trajectory neck.""" - return hasattr(self, 'traj_neck') and self.traj_neck is not None + return hasattr(self, "traj_neck") and self.traj_neck is not None @property def with_traj(self): """bool: Whether the pose lifter has trajectory head.""" - return hasattr(self, 'traj_head') + return hasattr(self, "traj_head") @property def causal(self): """bool: Whether the pose lifter is causal.""" - if hasattr(self.backbone, 'causal'): + if hasattr(self.backbone, "causal"): return self.backbone.causal else: - raise AttributeError('A PoseLifter\'s backbone should have ' - 'the bool attribute "causal" to indicate if' - 'it performs causal inference.') + raise AttributeError( + "A PoseLifter's backbone should have " 'the bool attribute "causal" to indicate if ' "it performs causal inference." + ) def extract_feat(self, inputs: Tensor) -> Tuple[Tensor]: """Extract features. @@ -151,10 +151,7 @@ class PoseLifter(BasePoseEstimator): else: return feats - def _forward(self, - inputs: Tensor, - data_samples: OptSampleList = None - ) -> Union[Tensor, Tuple[Tensor]]: + def _forward(self, inputs: Tensor, data_samples: OptSampleList = None) -> Union[Tensor, Tuple[Tensor]]: """Network forward process. Usually includes backbone, neck and head forward without any post-processing. 
@@ -199,16 +196,13 @@ class PoseLifter(BasePoseEstimator): if self.with_traj: x, traj_x = feats # loss of trajectory model - losses.update( - self.traj_head.loss( - traj_x, data_samples, train_cfg=self.train_cfg)) + losses.update(self.traj_head.loss(traj_x, data_samples, train_cfg=self.train_cfg)) else: x = feats if self.with_head: # loss of pose model - losses.update( - self.head.loss(x, data_samples, train_cfg=self.train_cfg)) + losses.update(self.head.loss(x, data_samples, train_cfg=self.train_cfg)) # TODO: support semi-supervised learning if self.semi_supervised: @@ -243,21 +237,22 @@ class PoseLifter(BasePoseEstimator): - keypoint_scores (Tensor): predicted keypoint scores in shape (num_instances, K) """ - assert self.with_head, ( - 'The model must have head to perform prediction.') + assert self.with_head, "The model must have head to perform prediction." - if self.test_cfg.get('flip_test', False): - flip_indices = data_samples[0].metainfo['flip_indices'] + if self.test_cfg.get("flip_test", False): + flip_indices = data_samples[0].metainfo["flip_indices"] _feats = self.extract_feat(inputs) _feats_flip = self.extract_feat( - torch.stack([ - flip_coordinates( - _input, - flip_indices=flip_indices, - shift_coords=self.test_cfg.get('shift_coords', True), - input_size=(1, 1)) for _input in inputs - ], - dim=0)) + torch.stack( + [ + flip_coordinates( + _input, flip_indices=flip_indices, shift_coords=self.test_cfg.get("shift_coords", True), input_size=(1, 1) + ) + for _input in inputs + ], + dim=0, + ) + ) feats = [_feats, _feats_flip] else: @@ -267,14 +262,12 @@ class PoseLifter(BasePoseEstimator): traj_preds, batch_traj_instances, batch_traj_fields = None, None, None if self.with_traj: x, traj_x = feats - traj_preds = self.traj_head.predict( - traj_x, data_samples, test_cfg=self.test_cfg) + traj_preds = self.traj_head.predict(traj_x, data_samples, test_cfg=self.test_cfg) else: x = feats if self.with_head: - pose_preds = self.head.predict( - x, data_samples, test_cfg=self.test_cfg) + pose_preds = self.head.predict(x, data_samples, test_cfg=self.test_cfg) if isinstance(pose_preds, tuple): batch_pred_instances, batch_pred_fields = pose_preds @@ -286,10 +279,9 @@ class PoseLifter(BasePoseEstimator): else: batch_traj_instances = traj_preds - results = self.add_pred_to_datasample(batch_pred_instances, - batch_pred_fields, - batch_traj_instances, - batch_traj_fields, data_samples) + results = self.add_pred_to_datasample( + batch_pred_instances, batch_pred_fields, batch_traj_instances, batch_traj_fields, data_samples + ) return results @@ -323,23 +315,18 @@ class PoseLifter(BasePoseEstimator): batch_pred_fields, batch_traj_fields = [], [] if batch_traj_instances is None: batch_traj_instances = [] - output_keypoint_indices = self.test_cfg.get('output_keypoint_indices', - None) + output_keypoint_indices = self.test_cfg.get("output_keypoint_indices", None) - for (pred_instances, pred_fields, traj_instances, traj_fields, - data_sample) in zip_longest(batch_pred_instances, - batch_pred_fields, - batch_traj_instances, - batch_traj_fields, - batch_data_samples): + for pred_instances, pred_fields, traj_instances, traj_fields, data_sample in zip_longest( + batch_pred_instances, batch_pred_fields, batch_traj_instances, batch_traj_fields, batch_data_samples + ): if output_keypoint_indices is not None: # select output keypoints with given indices num_keypoints = pred_instances.keypoints.shape[1] for key, value in pred_instances.all_items(): - if key.startswith('keypoint'): - pred_instances.set_field( - 
value[:, output_keypoint_indices], key) + if key.startswith("keypoint"): + pred_instances.set_field(value[:, output_keypoint_indices], key) data_sample.pred_instances = pred_instances @@ -350,8 +337,7 @@ class PoseLifter(BasePoseEstimator): for key, value in pred_fields.all_items(): if value.shape[0] != num_keypoints: continue - pred_fields.set_field(value[output_keypoint_indices], - key) + pred_fields.set_field(value[output_keypoint_indices], key) data_sample.pred_fields = pred_fields return batch_data_samples diff --git a/mmpose/models/pose_estimators/topdown.py b/mmpose/models/pose_estimators/topdown.py index ce458bc6cfc276e978537c5239bbd853a04e7686..f19e8409ffb13b573450dbd7a94126039dc91ac3 100644 --- a/mmpose/models/pose_estimators/topdown.py +++ b/mmpose/models/pose_estimators/topdown.py @@ -5,8 +5,8 @@ from typing import Optional from torch import Tensor from mmpose.registry import MODELS -from mmpose.utils.typing import (ConfigType, InstanceList, OptConfigType, - OptMultiConfig, PixelDataList, SampleList) +from mmpose.utils.typing import ConfigType, InstanceList, OptConfigType, OptMultiConfig, PixelDataList, SampleList + from .base import BasePoseEstimator @@ -35,16 +35,18 @@ class TopdownPoseEstimator(BasePoseEstimator): config-file-for-the-dataset. Defaults to ``None`` """ - def __init__(self, - backbone: ConfigType, - neck: OptConfigType = None, - head: OptConfigType = None, - train_cfg: OptConfigType = None, - test_cfg: OptConfigType = None, - data_preprocessor: OptConfigType = None, - init_cfg: OptMultiConfig = None, - metainfo: Optional[dict] = None, - freeze_backbone: bool = False): + def __init__( + self, + backbone: ConfigType, + neck: OptConfigType = None, + head: OptConfigType = None, + train_cfg: OptConfigType = None, + test_cfg: OptConfigType = None, + data_preprocessor: OptConfigType = None, + init_cfg: OptMultiConfig = None, + metainfo: Optional[dict] = None, + freeze_backbone: bool = False, + ): super().__init__( backbone=backbone, neck=neck, @@ -53,15 +55,14 @@ class TopdownPoseEstimator(BasePoseEstimator): test_cfg=test_cfg, data_preprocessor=data_preprocessor, init_cfg=init_cfg, - metainfo=metainfo) - + metainfo=metainfo, + ) + # Freeze all params of the backbone if freeze_backbone: - print("Freezing backbone!") for param in self.backbone.parameters(): param.requires_grad = False - def loss(self, inputs: Tensor, data_samples: SampleList) -> dict: """Calculate losses from a batch of inputs and data samples. @@ -78,8 +79,7 @@ class TopdownPoseEstimator(BasePoseEstimator): losses = dict() if self.with_head: - losses.update( - self.head.loss(feats, data_samples, train_cfg=self.train_cfg)) + losses.update(self.head.loss(feats, data_samples, train_cfg=self.train_cfg)) return losses @@ -104,10 +104,9 @@ class TopdownPoseEstimator(BasePoseEstimator): - keypoint_scores (Tensor): predicted keypoint scores in shape (num_instances, K) """ - assert self.with_head, ( - 'The model must have head to perform prediction.') + assert self.with_head, "The model must have head to perform prediction." 
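+ # Flip test (a sketch of the test-time augmentation): features are extracted from both the image and its horizontal flip, and the head is expected to merge the two decoded poses (using the dataset's `flip_indices`) into a single, less left/right-biased prediction.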
- if self.test_cfg.get('flip_test', False): + if self.test_cfg.get("flip_test", False): _feats = self.extract_feat(inputs) _feats_flip = self.extract_feat(inputs.flip(-1)) feats = [_feats, _feats_flip] @@ -122,14 +121,13 @@ class TopdownPoseEstimator(BasePoseEstimator): batch_pred_instances = preds batch_pred_fields = None - results = self.add_pred_to_datasample(batch_pred_instances, - batch_pred_fields, data_samples) + results = self.add_pred_to_datasample(batch_pred_instances, batch_pred_fields, data_samples) return results - def add_pred_to_datasample(self, batch_pred_instances: InstanceList, - batch_pred_fields: Optional[PixelDataList], - batch_data_samples: SampleList) -> SampleList: + def add_pred_to_datasample( + self, batch_pred_instances: InstanceList, batch_pred_fields: Optional[PixelDataList], batch_data_samples: SampleList + ) -> SampleList: """Add predictions into data samples. Args: @@ -146,35 +144,31 @@ class TopdownPoseEstimator(BasePoseEstimator): assert len(batch_pred_instances) == len(batch_data_samples) if batch_pred_fields is None: batch_pred_fields = [] - output_keypoint_indices = self.test_cfg.get('output_keypoint_indices', - None) + output_keypoint_indices = self.test_cfg.get("output_keypoint_indices", None) - for pred_instances, pred_fields, data_sample in zip_longest( - batch_pred_instances, batch_pred_fields, batch_data_samples): + for pred_instances, pred_fields, data_sample in zip_longest(batch_pred_instances, batch_pred_fields, batch_data_samples): if pred_instances is None: continue gt_instances = data_sample.gt_instances # convert keypoint coordinates from input space to image space - input_center = data_sample.metainfo['input_center'] - input_scale = data_sample.metainfo['input_scale'] - input_size = data_sample.metainfo['input_size'] + input_center = data_sample.metainfo["input_center"] + input_scale = data_sample.metainfo["input_scale"] + input_size = data_sample.metainfo["input_size"] - pred_instances.keypoints[..., :2] = \ - pred_instances.keypoints[..., :2] / input_size * input_scale \ - + input_center - 0.5 * input_scale - if 'keypoints_visible' not in pred_instances: - pred_instances.keypoints_visible = \ - pred_instances.keypoint_scores + pred_instances.keypoints[..., :2] = ( + pred_instances.keypoints[..., :2] / input_size * input_scale + input_center - 0.5 * input_scale + ) + if "keypoints_visible" not in pred_instances: + pred_instances.keypoints_visible = pred_instances.keypoint_scores if output_keypoint_indices is not None: # select output keypoints with given indices num_keypoints = pred_instances.keypoints.shape[1] for key, value in pred_instances.all_items(): - if key.startswith('keypoint'): - pred_instances.set_field( - value[:, output_keypoint_indices], key) + if key.startswith("keypoint"): + pred_instances.set_field(value[:, output_keypoint_indices], key) # add bbox information into pred_instances pred_instances.bboxes = gt_instances.bboxes @@ -189,8 +183,7 @@ class TopdownPoseEstimator(BasePoseEstimator): for key, value in pred_fields.all_items(): if value.shape[0] != num_keypoints: continue - pred_fields.set_field(value[output_keypoint_indices], - key) + pred_fields.set_field(value[output_keypoint_indices], key) data_sample.pred_fields = pred_fields return batch_data_samples diff --git a/mmpose/models/task_modules/assigners/__init__.py b/mmpose/models/task_modules/assigners/__init__.py index 7b6b006e389ccf9a8bd465d807670b9df5a60de3..e5d3efd3bb9c138c3dbf086bec026b0091c1994a 100644 --- a/mmpose/models/task_modules/assigners/__init__.py 
+++ b/mmpose/models/task_modules/assigners/__init__.py @@ -2,4 +2,4 @@ from .metric_calculators import BBoxOverlaps2D, PoseOKS from .sim_ota_assigner import SimOTAAssigner -__all__ = ['SimOTAAssigner', 'PoseOKS', 'BBoxOverlaps2D'] +__all__ = ["SimOTAAssigner", "PoseOKS", "BBoxOverlaps2D"] diff --git a/mmpose/models/task_modules/assigners/metric_calculators.py b/mmpose/models/task_modules/assigners/metric_calculators.py index ebf4333b6646a2181f103f2a6f78138aa8ec6e93..65dbb5db97278d003fe6f0a95a1cf5119ad492c7 100644 --- a/mmpose/models/task_modules/assigners/metric_calculators.py +++ b/mmpose/models/task_modules/assigners/metric_calculators.py @@ -9,8 +9,8 @@ from mmpose.registry import TASK_UTILS from mmpose.structures.bbox import bbox_overlaps -def cast_tensor_type(x, scale=1., dtype=None): - if dtype == 'fp16': +def cast_tensor_type(x, scale=1.0, dtype=None): + if dtype == "fp16": # scale is for preventing overflows x = (x / scale).half() return x @@ -20,12 +20,12 @@ def cast_tensor_type(x, scale=1., dtype=None): class BBoxOverlaps2D: """2D Overlaps (e.g. IoUs, GIoUs) Calculator.""" - def __init__(self, scale=1., dtype=None): + def __init__(self, scale=1.0, dtype=None): self.scale = scale self.dtype = dtype @torch.no_grad() - def __call__(self, bboxes1, bboxes2, mode='iou', is_aligned=False): + def __call__(self, bboxes1, bboxes2, mode="iou", is_aligned=False): """Calculate IoU between 2D bboxes. Args: @@ -52,7 +52,7 @@ class BBoxOverlaps2D: if bboxes1.size(-1) == 5: bboxes1 = bboxes1[..., :4] - if self.dtype == 'fp16': + if self.dtype == "fp16": # change tensor type to save cpu and cuda memory and keep speed bboxes1 = cast_tensor_type(bboxes1, self.scale, self.dtype) bboxes2 = cast_tensor_type(bboxes2, self.scale, self.dtype) @@ -66,8 +66,7 @@ class BBoxOverlaps2D: def __repr__(self): """str: a string describing the module""" - repr_str = self.__class__.__name__ + f'(' \ - f'scale={self.scale}, dtype={self.dtype})' + repr_str = self.__class__.__name__ + f"(" f"scale={self.scale}, dtype={self.dtype})" return repr_str @@ -75,34 +74,27 @@ class BBoxOverlaps2D: class PoseOKS: """OKS score Calculator.""" - def __init__(self, - metainfo: Optional[str] = 'configs/_base_/datasets/coco.py'): + def __init__(self, metainfo: Optional[str] = "configs/_base_/datasets/coco.py"): if metainfo is not None: metainfo = parse_pose_metainfo(dict(from_file=metainfo)) - sigmas = metainfo.get('sigmas', None) + sigmas = metainfo.get("sigmas", None) if sigmas is not None: self.sigmas = torch.as_tensor(sigmas) @torch.no_grad() - def __call__(self, - output: Tensor, - target: Tensor, - target_weights: Tensor, - areas: Tensor, - eps: float = 1e-8) -> Tensor: + def __call__(self, output: Tensor, target: Tensor, target_weights: Tensor, areas: Tensor, eps: float = 1e-8) -> Tensor: dist = torch.norm(output - target, dim=-1) - areas = areas.reshape(*((1, ) * (dist.ndim - 2)), -1, 1) + areas = areas.reshape(*((1,) * (dist.ndim - 2)), -1, 1) dist = dist / areas.pow(0.5).clip(min=eps) - if hasattr(self, 'sigmas'): + if hasattr(self, "sigmas"): if self.sigmas.device != dist.device: self.sigmas = self.sigmas.to(dist.device) - sigmas = self.sigmas.reshape(*((1, ) * (dist.ndim - 1)), -1) + sigmas = self.sigmas.reshape(*((1,) * (dist.ndim - 1)), -1) dist = dist / (sigmas * 2) - target_weights = target_weights / target_weights.sum( - dim=-1, keepdims=True).clip(min=eps) + target_weights = target_weights / target_weights.sum(dim=-1, keepdims=True).clip(min=eps) oks = (torch.exp(-dist.pow(2) / 2) * target_weights).sum(dim=-1) return 
oks diff --git a/mmpose/models/task_modules/assigners/sim_ota_assigner.py b/mmpose/models/task_modules/assigners/sim_ota_assigner.py index b43851cf15a6e2116383304f6becd870fe9b4a27..06e015c44e46ee99c10e35c9d34ee9efdab59c15 100644 --- a/mmpose/models/task_modules/assigners/sim_ota_assigner.py +++ b/mmpose/models/task_modules/assigners/sim_ota_assigner.py @@ -36,34 +36,35 @@ class SimOTAAssigner: Defaults to dict(type='PoseOKS'). """ - def __init__(self, - center_radius: float = 2.5, - candidate_topk: int = 10, - iou_weight: float = 3.0, - cls_weight: float = 1.0, - oks_weight: float = 3.0, - vis_weight: float = 0.0, - dynamic_k_indicator: str = 'iou', - use_keypoints_for_center: bool = False, - iou_calculator: ConfigType = dict(type='BBoxOverlaps2D'), - oks_calculator: ConfigType = dict(type='PoseOKS')): + def __init__( + self, + center_radius: float = 2.5, + candidate_topk: int = 10, + iou_weight: float = 3.0, + cls_weight: float = 1.0, + oks_weight: float = 3.0, + vis_weight: float = 0.0, + dynamic_k_indicator: str = "iou", + use_keypoints_for_center: bool = False, + iou_calculator: ConfigType = dict(type="BBoxOverlaps2D"), + oks_calculator: ConfigType = dict(type="PoseOKS"), + ): self.center_radius = center_radius self.candidate_topk = candidate_topk self.iou_weight = iou_weight self.cls_weight = cls_weight self.oks_weight = oks_weight self.vis_weight = vis_weight - assert dynamic_k_indicator in ('iou', 'oks'), f'the argument ' \ - f'`dynamic_k_indicator` should be either \'iou\' or \'oks\', ' \ - f'but got {dynamic_k_indicator}' + assert dynamic_k_indicator in ("iou", "oks"), ( + f"the argument " f"`dynamic_k_indicator` should be either 'iou' or 'oks', " f"but got {dynamic_k_indicator}" + ) self.dynamic_k_indicator = dynamic_k_indicator self.use_keypoints_for_center = use_keypoints_for_center self.iou_calculator = TASK_UTILS.build(iou_calculator) self.oks_calculator = TASK_UTILS.build(oks_calculator) - def assign(self, pred_instances: InstanceData, gt_instances: InstanceData, - **kwargs) -> dict: + def assign(self, pred_instances: InstanceData, gt_instances: InstanceData, **kwargs) -> dict: """Assign gt to priors using SimOTA. 
Args: @@ -96,23 +97,14 @@ class SimOTAAssigner: num_bboxes = decoded_bboxes.size(0) # assign 0 by default - assigned_gt_inds = decoded_bboxes.new_full((num_bboxes, ), - 0, - dtype=torch.long) + assigned_gt_inds = decoded_bboxes.new_full((num_bboxes,), 0, dtype=torch.long) if num_gt == 0 or num_bboxes == 0: # No ground truth or boxes, return empty assignment - max_overlaps = decoded_bboxes.new_zeros((num_bboxes, )) - assigned_labels = decoded_bboxes.new_full((num_bboxes, ), - -1, - dtype=torch.long) - return dict( - num_gts=num_gt, - gt_inds=assigned_gt_inds, - max_overlaps=max_overlaps, - labels=assigned_labels) - - valid_mask, is_in_boxes_and_center = self.get_in_gt_and_in_center_info( - priors, gt_bboxes, gt_keypoints, gt_keypoints_visible) + max_overlaps = decoded_bboxes.new_zeros((num_bboxes,)) + assigned_labels = decoded_bboxes.new_full((num_bboxes,), -1, dtype=torch.long) + return dict(num_gts=num_gt, gt_inds=assigned_gt_inds, max_overlaps=max_overlaps, labels=assigned_labels) + + valid_mask, is_in_boxes_and_center = self.get_in_gt_and_in_center_info(priors, gt_bboxes, gt_keypoints, gt_keypoints_visible) valid_decoded_bbox = decoded_bboxes[valid_mask] valid_pred_scores = pred_scores[valid_mask] valid_pred_kpts = keypoints[valid_mask] @@ -121,15 +113,9 @@ class SimOTAAssigner: num_valid = valid_decoded_bbox.size(0) if num_valid == 0: # No valid bboxes, return empty assignment - max_overlaps = decoded_bboxes.new_zeros((num_bboxes, )) - assigned_labels = decoded_bboxes.new_full((num_bboxes, ), - -1, - dtype=torch.long) - return dict( - num_gts=num_gt, - gt_inds=assigned_gt_inds, - max_overlaps=max_overlaps, - labels=assigned_labels) + max_overlaps = decoded_bboxes.new_zeros((num_bboxes,)) + assigned_labels = decoded_bboxes.new_full((num_bboxes,), -1, dtype=torch.long) + return dict(num_gts=num_gt, gt_inds=assigned_gt_inds, max_overlaps=max_overlaps, labels=assigned_labels) cost_matrix = (~is_in_boxes_and_center) * INF @@ -140,12 +126,11 @@ class SimOTAAssigner: cost_matrix = cost_matrix + iou_cost * self.iou_weight # calculate oks - if self.oks_weight > 0 or self.dynamic_k_indicator == 'oks': + if self.oks_weight > 0 or self.dynamic_k_indicator == "oks": pairwise_oks = self.oks_calculator( valid_pred_kpts.unsqueeze(1), # [num_valid, 1, k, 2] target=gt_keypoints.unsqueeze(0), # [1, num_gt, k, 2] - target_weights=gt_keypoints_visible.unsqueeze( - 0), # [1, num_gt, k] + target_weights=gt_keypoints_visible.unsqueeze(0), # [1, num_gt, k] areas=gt_areas.unsqueeze(0), # [1, num_gt] ) # -> [num_valid, num_gt] @@ -154,58 +139,48 @@ class SimOTAAssigner: # calculate cls if self.cls_weight > 0: - gt_onehot_label = ( - F.one_hot(gt_labels.to(torch.int64), - pred_scores.shape[-1]).float().unsqueeze(0).repeat( - num_valid, 1, 1)) - valid_pred_scores = valid_pred_scores.unsqueeze(1).repeat( - 1, num_gt, 1) + gt_onehot_label = F.one_hot(gt_labels.to(torch.int64), pred_scores.shape[-1]).float().unsqueeze(0).repeat(num_valid, 1, 1) + valid_pred_scores = valid_pred_scores.unsqueeze(1).repeat(1, num_gt, 1) # disable AMP autocast to avoid overflow with torch.cuda.amp.autocast(enabled=False): cls_cost = ( F.binary_cross_entropy( valid_pred_scores.to(dtype=torch.float32), gt_onehot_label, - reduction='none', - ).sum(-1).to(dtype=valid_pred_scores.dtype)) + reduction="none", + ) + .sum(-1) + .to(dtype=valid_pred_scores.dtype) + ) cost_matrix = cost_matrix + cls_cost * self.cls_weight # calculate vis if self.vis_weight > 0: - valid_pred_kpts_vis = valid_pred_kpts_vis.unsqueeze(1).repeat( - 1, num_gt, 1) # 
[num_valid, 1, k] - gt_kpt_vis = gt_keypoints_visible.unsqueeze( - 0).float() # [1, num_gt, k] + valid_pred_kpts_vis = valid_pred_kpts_vis.unsqueeze(1).repeat(1, num_gt, 1) # [num_valid, num_gt, k] + gt_kpt_vis = gt_keypoints_visible.unsqueeze(0).float() # [1, num_gt, k] with torch.cuda.amp.autocast(enabled=False): vis_cost = ( F.binary_cross_entropy( valid_pred_kpts_vis.to(dtype=torch.float32), gt_kpt_vis.repeat(num_valid, 1, 1), - reduction='none', - ).sum(-1).to(dtype=valid_pred_kpts_vis.dtype)) + reduction="none", + ) + .sum(-1) + .to(dtype=valid_pred_kpts_vis.dtype) + ) cost_matrix = cost_matrix + vis_cost * self.vis_weight - if self.dynamic_k_indicator == 'iou': - matched_pred_ious, matched_gt_inds = \ - self.dynamic_k_matching( - cost_matrix, pairwise_ious, num_gt, valid_mask) - elif self.dynamic_k_indicator == 'oks': - matched_pred_ious, matched_gt_inds = \ - self.dynamic_k_matching( - cost_matrix, pairwise_oks, num_gt, valid_mask) + if self.dynamic_k_indicator == "iou": + matched_pred_ious, matched_gt_inds = self.dynamic_k_matching(cost_matrix, pairwise_ious, num_gt, valid_mask) + elif self.dynamic_k_indicator == "oks": + matched_pred_ious, matched_gt_inds = self.dynamic_k_matching(cost_matrix, pairwise_oks, num_gt, valid_mask) # convert to AssignResult format assigned_gt_inds[valid_mask] = matched_gt_inds + 1 - assigned_labels = assigned_gt_inds.new_full((num_bboxes, ), -1) + assigned_labels = assigned_gt_inds.new_full((num_bboxes,), -1) assigned_labels[valid_mask] = gt_labels[matched_gt_inds].long() - max_overlaps = assigned_gt_inds.new_full((num_bboxes, ), - -INF, - dtype=torch.float32) + max_overlaps = assigned_gt_inds.new_full((num_bboxes,), -INF, dtype=torch.float32) max_overlaps[valid_mask] = matched_pred_ious.to(max_overlaps) - return dict( - num_gts=num_gt, - gt_inds=assigned_gt_inds, - max_overlaps=max_overlaps, - labels=assigned_labels) + return dict(num_gts=num_gt, gt_inds=assigned_gt_inds, max_overlaps=max_overlaps, labels=assigned_labels) def get_in_gt_and_in_center_info( self, @@ -237,9 +212,9 @@ class SimOTAAssigner: gt_cxs = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0 gt_cys = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0 if self.use_keypoints_for_center and gt_keypoints_visible is not None: - gt_kpts_cts = (gt_keypoints * gt_keypoints_visible.unsqueeze(-1) - ).sum(dim=-2) / gt_keypoints_visible.sum( - dim=-1, keepdims=True).clip(min=0) + gt_kpts_cts = (gt_keypoints * gt_keypoints_visible.unsqueeze(-1)).sum(dim=-2) / gt_keypoints_visible.sum( + dim=-1, keepdims=True + ).clip(min=0) gt_kpts_cts = gt_kpts_cts.to(gt_bboxes) valid_mask = gt_keypoints_visible.sum(-1) > 0 gt_cxs[valid_mask] = gt_kpts_cts[valid_mask][..., 0] @@ -263,14 +238,10 @@ class SimOTAAssigner: is_in_gts_or_centers = is_in_gts_all | is_in_cts_all # both in boxes and centers, shape: [num_fg, num_gt] - is_in_boxes_and_centers = ( - is_in_gts[is_in_gts_or_centers, :] - & is_in_cts[is_in_gts_or_centers, :]) + is_in_boxes_and_centers = is_in_gts[is_in_gts_or_centers, :] & is_in_cts[is_in_gts_or_centers, :] return is_in_gts_or_centers, is_in_boxes_and_centers - def dynamic_k_matching(self, cost: Tensor, pairwise_ious: Tensor, - num_gt: int, - valid_mask: Tensor) -> Tuple[Tensor, Tensor]: + def dynamic_k_matching(self, cost: Tensor, pairwise_ious: Tensor, num_gt: int, valid_mask: Tensor) -> Tuple[Tensor, Tensor]: """Use IoU and matching cost to calculate the dynamic top-k positive targets.""" matching_matrix = torch.zeros_like(cost, dtype=torch.uint8) @@ -280,16 +251,14 @@ class SimOTAAssigner: # calculate dynamic 
k for each gt dynamic_ks = torch.clamp(topk_ious.sum(0).int(), min=1) for gt_idx in range(num_gt): - _, pos_idx = torch.topk( - cost[:, gt_idx], k=dynamic_ks[gt_idx], largest=False) + _, pos_idx = torch.topk(cost[:, gt_idx], k=dynamic_ks[gt_idx], largest=False) matching_matrix[:, gt_idx][pos_idx] = 1 del topk_ious, dynamic_ks, pos_idx prior_match_gt_mask = matching_matrix.sum(1) > 1 if prior_match_gt_mask.sum() > 0: - cost_min, cost_argmin = torch.min( - cost[prior_match_gt_mask, :], dim=1) + cost_min, cost_argmin = torch.min(cost[prior_match_gt_mask, :], dim=1) matching_matrix[prior_match_gt_mask, :] *= 0 matching_matrix[prior_match_gt_mask, cost_argmin] = 1 # get foreground mask inside box and center prior @@ -297,6 +266,5 @@ class SimOTAAssigner: valid_mask[valid_mask.clone()] = fg_mask_inboxes matched_gt_inds = matching_matrix[fg_mask_inboxes, :].argmax(1) - matched_pred_ious = (matching_matrix * - pairwise_ious).sum(1)[fg_mask_inboxes] + matched_pred_ious = (matching_matrix * pairwise_ious).sum(1)[fg_mask_inboxes] return matched_pred_ious, matched_gt_inds diff --git a/mmpose/models/task_modules/prior_generators/mlvl_point_generator.py b/mmpose/models/task_modules/prior_generators/mlvl_point_generator.py index aed01af7342e152b35b0ba2accbceb076f1cffbe..dcfe234a28b2b866d170a5930ba5a288f768f69a 100644 --- a/mmpose/models/task_modules/prior_generators/mlvl_point_generator.py +++ b/mmpose/models/task_modules/prior_generators/mlvl_point_generator.py @@ -25,10 +25,7 @@ class MlvlPointGenerator: the center of anchors. Defaults to False. """ - def __init__(self, - strides: Union[List[int], List[Tuple[int, int]]], - offset: float = 0.5, - centralize_points: bool = False) -> None: + def __init__(self, strides: Union[List[int], List[Tuple[int, int]]], offset: float = 0.5, centralize_points: bool = False) -> None: self.strides = [_pair(stride) for stride in strides] self.centralize_points = centralize_points self.offset = offset if not centralize_points else 0.0 @@ -44,10 +41,7 @@ class MlvlPointGenerator: on the feature grid""" return [1 for _ in range(len(self.strides))] - def _meshgrid(self, - x: Tensor, - y: Tensor, - row_major: bool = True) -> Tuple[Tensor, Tensor]: + def _meshgrid(self, x: Tensor, y: Tensor, row_major: bool = True) -> Tuple[Tensor, Tensor]: yy, xx = torch.meshgrid(y, x) if row_major: # warning .flatten() would cause error in ONNX exporting @@ -57,11 +51,9 @@ class MlvlPointGenerator: else: return yy.reshape(-1), xx.reshape(-1) - def grid_priors(self, - featmap_sizes: List[Tuple], - dtype: torch.dtype = torch.float32, - device: DeviceType = 'cuda', - with_stride: bool = False) -> List[Tensor]: + def grid_priors( + self, featmap_sizes: List[Tuple], dtype: torch.dtype = torch.float32, device: DeviceType = "cuda", with_stride: bool = False + ) -> List[Tensor]: """Generate grid points of multiple feature levels. 
Args: @@ -88,21 +80,18 @@ class MlvlPointGenerator: assert self.num_levels == len(featmap_sizes) multi_level_priors = [] for i in range(self.num_levels): - priors = self.single_level_grid_priors( - featmap_sizes[i], - level_idx=i, - dtype=dtype, - device=device, - with_stride=with_stride) + priors = self.single_level_grid_priors(featmap_sizes[i], level_idx=i, dtype=dtype, device=device, with_stride=with_stride) multi_level_priors.append(priors) return multi_level_priors - def single_level_grid_priors(self, - featmap_size: Tuple[int], - level_idx: int, - dtype: torch.dtype = torch.float32, - device: DeviceType = 'cuda', - with_stride: bool = False) -> Tensor: + def single_level_grid_priors( + self, + featmap_size: Tuple[int], + level_idx: int, + dtype: torch.dtype = torch.float32, + device: DeviceType = "cuda", + with_stride: bool = False, + ) -> Tensor: """Generate grid Points of a single level. Note: @@ -130,14 +119,12 @@ class MlvlPointGenerator: """ feat_h, feat_w = featmap_size stride_w, stride_h = self.strides[level_idx] - shift_x = (torch.arange(0, feat_w, device=device) + - self.offset) * stride_w + shift_x = (torch.arange(0, feat_w, device=device) + self.offset) * stride_w # keep featmap_size as Tensor instead of int, so that we # can convert to ONNX correctly shift_x = shift_x.to(dtype) - shift_y = (torch.arange(0, feat_h, device=device) + - self.offset) * stride_h + shift_y = (torch.arange(0, feat_h, device=device) + self.offset) * stride_h # keep featmap_size as Tensor instead of int, so that we # can convert to ONNX correctly shift_y = shift_y.to(dtype) @@ -151,19 +138,13 @@ class MlvlPointGenerator: shifts = torch.stack([shift_xx, shift_yy], dim=-1) else: # use `shape[0]` instead of `len(shift_xx)` for ONNX export - stride_w = shift_xx.new_full((shift_xx.shape[0], ), - stride_w).to(dtype) - stride_h = shift_xx.new_full((shift_yy.shape[0], ), - stride_h).to(dtype) - shifts = torch.stack([shift_xx, shift_yy, stride_w, stride_h], - dim=-1) + stride_w = shift_xx.new_full((shift_xx.shape[0],), stride_w).to(dtype) + stride_h = shift_xx.new_full((shift_yy.shape[0],), stride_h).to(dtype) + shifts = torch.stack([shift_xx, shift_yy, stride_w, stride_h], dim=-1) all_points = shifts.to(device) return all_points - def valid_flags(self, - featmap_sizes: List[Tuple[int, int]], - pad_shape: Tuple[int], - device: DeviceType = 'cuda') -> List[Tensor]: + def valid_flags(self, featmap_sizes: List[Tuple[int, int]], pad_shape: Tuple[int], device: DeviceType = "cuda") -> List[Tensor]: """Generate valid flags of points of multiple feature levels. Args: @@ -186,16 +167,11 @@ class MlvlPointGenerator: h, w = pad_shape[:2] valid_feat_h = min(int(np.ceil(h / point_stride[1])), feat_h) valid_feat_w = min(int(np.ceil(w / point_stride[0])), feat_w) - flags = self.single_level_valid_flags((feat_h, feat_w), - (valid_feat_h, valid_feat_w), - device=device) + flags = self.single_level_valid_flags((feat_h, feat_w), (valid_feat_h, valid_feat_w), device=device) multi_level_flags.append(flags) return multi_level_flags - def single_level_valid_flags(self, - featmap_size: Tuple[int, int], - valid_size: Tuple[int, int], - device: DeviceType = 'cuda') -> Tensor: + def single_level_valid_flags(self, featmap_size: Tuple[int, int], valid_size: Tuple[int, int], device: DeviceType = "cuda") -> Tensor: """Generate the valid flags of points of a single feature map. 
Args: @@ -221,12 +197,9 @@ class MlvlPointGenerator: valid = valid_xx & valid_yy return valid - def sparse_priors(self, - prior_idxs: Tensor, - featmap_size: Tuple[int], - level_idx: int, - dtype: torch.dtype = torch.float32, - device: DeviceType = 'cuda') -> Tensor: + def sparse_priors( + self, prior_idxs: Tensor, featmap_size: Tuple[int], level_idx: int, dtype: torch.dtype = torch.float32, device: DeviceType = "cuda" + ) -> Tensor: """Generate sparse points according to the ``prior_idxs``. Args: @@ -246,8 +219,7 @@ class MlvlPointGenerator: """ height, width = featmap_size x = (prior_idxs % width + self.offset) * self.strides[level_idx][0] - y = ((prior_idxs // width) % height + - self.offset) * self.strides[level_idx][1] + y = ((prior_idxs // width) % height + self.offset) * self.strides[level_idx][1] prioris = torch.stack([x, y], 1).to(dtype) prioris = prioris.to(device) return prioris diff --git a/mmpose/models/utils/__init__.py b/mmpose/models/utils/__init__.py index 92ad02b36f7da28edae56a76f33109f02d4b68cd..881370b9e1c915b1a3da90ff44a07c7b6421987a 100644 --- a/mmpose/models/utils/__init__.py +++ b/mmpose/models/utils/__init__.py @@ -6,12 +6,22 @@ from .misc import filter_scores_and_topk from .ops import FrozenBatchNorm2d, inverse_sigmoid from .reparam_layers import RepVGGBlock from .rtmcc_block import RTMCCBlock, rope -from .transformer import (DetrTransformerEncoder, GAUEncoder, PatchEmbed, - SinePositionalEncoding, nchw_to_nlc, nlc_to_nchw) +from .transformer import DetrTransformerEncoder, GAUEncoder, PatchEmbed, SinePositionalEncoding, nchw_to_nlc, nlc_to_nchw __all__ = [ - 'PatchEmbed', 'nchw_to_nlc', 'nlc_to_nchw', 'pvt_convert', 'RTMCCBlock', - 'rope', 'check_and_update_config', 'filter_scores_and_topk', 'CSPLayer', - 'FrozenBatchNorm2d', 'inverse_sigmoid', 'GAUEncoder', - 'SinePositionalEncoding', 'RepVGGBlock', 'DetrTransformerEncoder' + "PatchEmbed", + "nchw_to_nlc", + "nlc_to_nchw", + "pvt_convert", + "RTMCCBlock", + "rope", + "check_and_update_config", + "filter_scores_and_topk", + "CSPLayer", + "FrozenBatchNorm2d", + "inverse_sigmoid", + "GAUEncoder", + "SinePositionalEncoding", + "RepVGGBlock", + "DetrTransformerEncoder", ] diff --git a/mmpose/models/utils/check_and_update_config.py b/mmpose/models/utils/check_and_update_config.py index 4cd1efa39b584a08055d470343549349907c1a5c..ad1bde09f3ff41101e78d5d07cb8c2dd5998691c 100644 --- a/mmpose/models/utils/check_and_update_config.py +++ b/mmpose/models/utils/check_and_update_config.py @@ -8,91 +8,89 @@ from mmengine.logging import MMLogger ConfigType = Union[Config, ConfigDict] -def process_input_transform(input_transform: str, head: Dict, head_new: Dict, - head_deleted_dict: Dict, head_append_dict: Dict, - neck_new: Dict, input_index: Tuple[int], - align_corners: bool) -> None: +def process_input_transform( + input_transform: str, + head: Dict, + head_new: Dict, + head_deleted_dict: Dict, + head_append_dict: Dict, + neck_new: Dict, + input_index: Tuple[int], + align_corners: bool, +) -> None: """Process the input_transform field and update head and neck dictionaries.""" - if input_transform == 'resize_concat': - in_channels = head_new.pop('in_channels') - head_deleted_dict['in_channels'] = str(in_channels) + if input_transform == "resize_concat": + in_channels = head_new.pop("in_channels") + head_deleted_dict["in_channels"] = str(in_channels) in_channels = sum([in_channels[i] for i in input_index]) - head_new['in_channels'] = in_channels - head_append_dict['in_channels'] = str(in_channels) + head_new["in_channels"] = 
in_channels + head_append_dict["in_channels"] = str(in_channels) neck_new.update( dict( - type='FeatureMapProcessor', + type="FeatureMapProcessor", concat=True, select_index=input_index, - )) + ) + ) if align_corners: - neck_new['align_corners'] = align_corners - - elif input_transform == 'select': - if input_index != (-1, ): - neck_new.update( - dict(type='FeatureMapProcessor', select_index=input_index)) - if isinstance(head['in_channels'], tuple): - in_channels = head_new.pop('in_channels') - head_deleted_dict['in_channels'] = str(in_channels) + neck_new["align_corners"] = align_corners + + elif input_transform == "select": + if input_index != (-1,): + neck_new.update(dict(type="FeatureMapProcessor", select_index=input_index)) + if isinstance(head["in_channels"], tuple): + in_channels = head_new.pop("in_channels") + head_deleted_dict["in_channels"] = str(in_channels) if isinstance(input_index, int): in_channels = in_channels[input_index] else: in_channels = tuple([in_channels[i] for i in input_index]) - head_new['in_channels'] = in_channels - head_append_dict['in_channels'] = str(in_channels) + head_new["in_channels"] = in_channels + head_append_dict["in_channels"] = str(in_channels) if align_corners: - neck_new['align_corners'] = align_corners + neck_new["align_corners"] = align_corners else: - raise ValueError(f'model.head get invalid value for argument ' - f'input_transform: {input_transform}') + raise ValueError(f"model.head get invalid value for argument " f"input_transform: {input_transform}") -def process_extra_field(extra: Dict, head_new: Dict, head_deleted_dict: Dict, - head_append_dict: Dict, neck_new: Dict) -> None: +def process_extra_field(extra: Dict, head_new: Dict, head_deleted_dict: Dict, head_append_dict: Dict, neck_new: Dict) -> None: """Process the extra field and update head and neck dictionaries.""" - head_deleted_dict['extra'] = 'dict(' + head_deleted_dict["extra"] = "dict(" for key, value in extra.items(): - head_deleted_dict['extra'] += f'{key}={value},' - head_deleted_dict['extra'] = head_deleted_dict['extra'][:-1] + ')' - if 'final_conv_kernel' in extra: - kernel_size = extra['final_conv_kernel'] + head_deleted_dict["extra"] += f"{key}={value}," + head_deleted_dict["extra"] = head_deleted_dict["extra"][:-1] + ")" + if "final_conv_kernel" in extra: + kernel_size = extra["final_conv_kernel"] if kernel_size > 1: padding = kernel_size // 2 - head_new['final_layer'] = dict( - kernel_size=kernel_size, padding=padding) - head_append_dict[ - 'final_layer'] = f'dict(kernel_size={kernel_size}, ' \ - f'padding={padding})' + head_new["final_layer"] = dict(kernel_size=kernel_size, padding=padding) + head_append_dict["final_layer"] = f"dict(kernel_size={kernel_size}, " f"padding={padding})" else: - head_new['final_layer'] = dict(kernel_size=kernel_size) - head_append_dict[ - 'final_layer'] = f'dict(kernel_size={kernel_size})' - if 'upsample' in extra: + head_new["final_layer"] = dict(kernel_size=kernel_size) + head_append_dict["final_layer"] = f"dict(kernel_size={kernel_size})" + if "upsample" in extra: neck_new.update( dict( - type='FeatureMapProcessor', - scale_factor=float(extra['upsample']), + type="FeatureMapProcessor", + scale_factor=float(extra["upsample"]), apply_relu=True, - )) + ) + ) -def process_has_final_layer(has_final_layer: bool, head_new: Dict, - head_deleted_dict: Dict, - head_append_dict: Dict) -> None: +def process_has_final_layer(has_final_layer: bool, head_new: Dict, head_deleted_dict: Dict, head_append_dict: Dict) -> None: """Process the has_final_layer 
field and update the head dictionary.""" - head_deleted_dict['has_final_layer'] = str(has_final_layer) + head_deleted_dict["has_final_layer"] = str(has_final_layer) if not has_final_layer: - if 'final_layer' not in head_new: - head_new['final_layer'] = None - head_append_dict['final_layer'] = 'None' + if "final_layer" not in head_new: + head_new["final_layer"] = None + head_append_dict["final_layer"] = "None" -def check_and_update_config(neck: Optional[ConfigType], - head: ConfigType) -> Tuple[Optional[Dict], Dict]: +def check_and_update_config(neck: Optional[ConfigType], head: ConfigType) -> Tuple[Optional[Dict], Dict]: """Check and update the configuration of the head and neck components. Args: neck (Optional[ConfigType]): Configuration for the neck component. @@ -102,41 +100,36 @@ def check_and_update_config(neck: Optional[ConfigType], Tuple[Optional[Dict], Dict]: Updated configurations for the neck and head components. """ - head_new, neck_new = head.copy(), neck.copy() if isinstance(neck, - dict) else {} + head_new, neck_new = head.copy(), neck.copy() if isinstance(neck, dict) else {} head_deleted_dict, head_append_dict = {}, {} - if 'input_transform' in head: - input_transform = head_new.pop('input_transform') - head_deleted_dict['input_transform'] = f'\'{input_transform}\'' + if "input_transform" in head: + input_transform = head_new.pop("input_transform") + head_deleted_dict["input_transform"] = f"'{input_transform}'" else: - input_transform = 'select' + input_transform = "select" - if 'input_index' in head: - input_index = head_new.pop('input_index') - head_deleted_dict['input_index'] = str(input_index) + if "input_index" in head: + input_index = head_new.pop("input_index") + head_deleted_dict["input_index"] = str(input_index) else: - input_index = (-1, ) + input_index = (-1,) - if 'align_corners' in head: - align_corners = head_new.pop('align_corners') - head_deleted_dict['align_corners'] = str(align_corners) + if "align_corners" in head: + align_corners = head_new.pop("align_corners") + head_deleted_dict["align_corners"] = str(align_corners) else: align_corners = False - process_input_transform(input_transform, head, head_new, head_deleted_dict, - head_append_dict, neck_new, input_index, - align_corners) + process_input_transform(input_transform, head, head_new, head_deleted_dict, head_append_dict, neck_new, input_index, align_corners) - if 'extra' in head: - extra = head_new.pop('extra') - process_extra_field(extra, head_new, head_deleted_dict, - head_append_dict, neck_new) + if "extra" in head: + extra = head_new.pop("extra") + process_extra_field(extra, head_new, head_deleted_dict, head_append_dict, neck_new) - if 'has_final_layer' in head: - has_final_layer = head_new.pop('has_final_layer') - process_has_final_layer(has_final_layer, head_new, head_deleted_dict, - head_append_dict) + if "has_final_layer" in head: + has_final_layer = head_new.pop("has_final_layer") + process_has_final_layer(has_final_layer, head_new, head_deleted_dict, head_append_dict) display_modifications(head_deleted_dict, head_append_dict, neck_new) @@ -145,8 +138,7 @@ def check_and_update_config(neck: Optional[ConfigType], @master_only -def display_modifications(head_deleted_dict: Dict, head_append_dict: Dict, - neck: Dict) -> None: +def display_modifications(head_deleted_dict: Dict, head_append_dict: Dict, neck: Dict) -> None: """Display the modifications made to the head and neck configurations. 
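
In effect, `check_and_update_config` migrates legacy head options (`input_transform`, `input_index`, `align_corners`, `extra`, `has_final_layer`) into an explicit `FeatureMapProcessor` neck. A hypothetical before/after for the `'resize_concat'` path (channel numbers invented for illustration):

```python
# legacy config: the head both selects and concatenates backbone outputs
old = dict(
    head=dict(in_channels=(256, 512),
              input_transform="resize_concat",
              input_index=(0, 1)))

# migrated config: selection/concatenation moves into an explicit neck,
# and the head sees the summed channel count 256 + 512 = 768
new = dict(
    neck=dict(type="FeatureMapProcessor", concat=True, select_index=(0, 1)),
    head=dict(in_channels=768),
)
```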
Args: @@ -157,24 +149,21 @@ def display_modifications(head_deleted_dict: Dict, head_append_dict: Dict, if len(head_deleted_dict) + len(head_append_dict) == 0: return - old_model_info, new_model_info = build_model_info(head_deleted_dict, - head_append_dict, neck) + old_model_info, new_model_info = build_model_info(head_deleted_dict, head_append_dict, neck) - total_info = '\nThe config you are using is outdated. '\ - 'The following section of the config:\n```\n' + total_info = "\nThe config you are using is outdated. " "The following section of the config:\n```\n" total_info += old_model_info - total_info += '```\nshould be updated to\n```\n' + total_info += "```\nshould be updated to\n```\n" total_info += new_model_info - total_info += '```\nFor more information, please refer to '\ - 'https://mmpose.readthedocs.io/en/latest/' \ - 'guide_to_framework.html#step3-model' + total_info += ( + "```\nFor more information, please refer to " "https://mmpose.readthedocs.io/en/latest/" "guide_to_framework.html#step3-model" + ) logger: MMLogger = MMLogger.get_current_instance() logger.warning(total_info) -def build_model_info(head_deleted_dict: Dict, head_append_dict: Dict, - neck: Dict) -> Tuple[str, str]: +def build_model_info(head_deleted_dict: Dict, head_append_dict: Dict, neck: Dict) -> Tuple[str, str]: """Build the old and new model information strings. Args: head_deleted_dict (Dict): Dictionary of deleted fields in the head. @@ -188,9 +177,8 @@ def build_model_info(head_deleted_dict: Dict, head_append_dict: Dict, new_head_info = build_head_info(head_append_dict) neck_info = build_neck_info(neck) - old_model_info = 'model=dict(\n' + ' ' * 4 + '...,\n' + old_head_info - new_model_info = 'model=dict(\n' + ' ' * 4 + '...,\n' \ - + neck_info + new_head_info + old_model_info = "model=dict(\n" + " " * 4 + "...,\n" + old_head_info + new_model_info = "model=dict(\n" + " " * 4 + "...,\n" + neck_info + new_head_info return old_model_info, new_model_info @@ -203,10 +191,10 @@ def build_head_info(head_dict: Dict) -> str: Returns: str: Head information string. 
""" - head_info = ' ' * 4 + 'head=dict(\n' + head_info = " " * 4 + "head=dict(\n" for key, value in head_dict.items(): - head_info += ' ' * 8 + f'{key}={value},\n' - head_info += ' ' * 8 + '...),\n' + head_info += " " * 8 + f"{key}={value},\n" + head_info += " " * 8 + "...),\n" return head_info @@ -220,11 +208,10 @@ def build_neck_info(neck: Dict) -> str: """ if len(neck) > 0: neck = neck.copy() - neck_info = ' ' * 4 + 'neck=dict(\n' + ' ' * 8 + \ - f'type=\'{neck.pop("type")}\',\n' + neck_info = " " * 4 + "neck=dict(\n" + " " * 8 + f'type=\'{neck.pop("type")}\',\n' for key, value in neck.items(): - neck_info += ' ' * 8 + f'{key}={str(value)},\n' - neck_info += ' ' * 4 + '),\n' + neck_info += " " * 8 + f"{key}={str(value)},\n" + neck_info += " " * 4 + "),\n" else: - neck_info = '' + neck_info = "" return neck_info diff --git a/mmpose/models/utils/ckpt_convert.py b/mmpose/models/utils/ckpt_convert.py index 05f5cdb4a3cdf32ac2b6b7a8888c5a772e582f14..74d57d9e6133283795a860b6d25e9416e7d96b44 100644 --- a/mmpose/models/utils/ckpt_convert.py +++ b/mmpose/models/utils/ckpt_convert.py @@ -15,64 +15,61 @@ def pvt_convert(ckpt): use_abs_pos_embed = False use_conv_ffn = False for k in ckpt.keys(): - if k.startswith('pos_embed'): + if k.startswith("pos_embed"): use_abs_pos_embed = True - if k.find('dwconv') >= 0: + if k.find("dwconv") >= 0: use_conv_ffn = True for k, v in ckpt.items(): - if k.startswith('head'): + if k.startswith("head"): continue - if k.startswith('norm.'): + if k.startswith("norm."): continue - if k.startswith('cls_token'): + if k.startswith("cls_token"): continue - if k.startswith('pos_embed'): - stage_i = int(k.replace('pos_embed', '')) - new_k = k.replace(f'pos_embed{stage_i}', - f'layers.{stage_i - 1}.1.0.pos_embed') + if k.startswith("pos_embed"): + stage_i = int(k.replace("pos_embed", "")) + new_k = k.replace(f"pos_embed{stage_i}", f"layers.{stage_i - 1}.1.0.pos_embed") if stage_i == 4 and v.size(1) == 50: # 1 (cls token) + 7 * 7 new_v = v[:, 1:, :] # remove cls token else: new_v = v - elif k.startswith('patch_embed'): - stage_i = int(k.split('.')[0].replace('patch_embed', '')) - new_k = k.replace(f'patch_embed{stage_i}', - f'layers.{stage_i - 1}.0') + elif k.startswith("patch_embed"): + stage_i = int(k.split(".")[0].replace("patch_embed", "")) + new_k = k.replace(f"patch_embed{stage_i}", f"layers.{stage_i - 1}.0") new_v = v - if 'proj.' in new_k: - new_k = new_k.replace('proj.', 'projection.') - elif k.startswith('block'): - stage_i = int(k.split('.')[0].replace('block', '')) - layer_i = int(k.split('.')[1]) + if "proj." in new_k: + new_k = new_k.replace("proj.", "projection.") + elif k.startswith("block"): + stage_i = int(k.split(".")[0].replace("block", "")) + layer_i = int(k.split(".")[1]) new_layer_i = layer_i + use_abs_pos_embed - new_k = k.replace(f'block{stage_i}.{layer_i}', - f'layers.{stage_i - 1}.1.{new_layer_i}') + new_k = k.replace(f"block{stage_i}.{layer_i}", f"layers.{stage_i - 1}.1.{new_layer_i}") new_v = v - if 'attn.q.' in new_k: - sub_item_k = k.replace('q.', 'kv.') - new_k = new_k.replace('q.', 'attn.in_proj_') + if "attn.q." in new_k: + sub_item_k = k.replace("q.", "kv.") + new_k = new_k.replace("q.", "attn.in_proj_") new_v = torch.cat([v, ckpt[sub_item_k]], dim=0) - elif 'attn.kv.' in new_k: + elif "attn.kv." in new_k: continue - elif 'attn.proj.' in new_k: - new_k = new_k.replace('proj.', 'attn.out_proj.') - elif 'attn.sr.' in new_k: - new_k = new_k.replace('sr.', 'sr.') - elif 'mlp.' 
in new_k: - string = f'{new_k}-' - new_k = new_k.replace('mlp.', 'ffn.layers.') - if 'fc1.weight' in new_k or 'fc2.weight' in new_k: + elif "attn.proj." in new_k: + new_k = new_k.replace("proj.", "attn.out_proj.") + elif "attn.sr." in new_k: + new_k = new_k.replace("sr.", "sr.") + elif "mlp." in new_k: + string = f"{new_k}-" + new_k = new_k.replace("mlp.", "ffn.layers.") + if "fc1.weight" in new_k or "fc2.weight" in new_k: new_v = v.reshape((*v.shape, 1, 1)) - new_k = new_k.replace('fc1.', '0.') - new_k = new_k.replace('dwconv.dwconv.', '1.') + new_k = new_k.replace("fc1.", "0.") + new_k = new_k.replace("dwconv.dwconv.", "1.") if use_conv_ffn: - new_k = new_k.replace('fc2.', '4.') + new_k = new_k.replace("fc2.", "4.") else: - new_k = new_k.replace('fc2.', '3.') - string += f'{new_k} {v.shape}-{new_v.shape}' - elif k.startswith('norm'): + new_k = new_k.replace("fc2.", "3.") + string += f"{new_k} {v.shape}-{new_v.shape}" + elif k.startswith("norm"): stage_i = int(k[4]) - new_k = k.replace(f'norm{stage_i}', f'layers.{stage_i - 1}.2') + new_k = k.replace(f"norm{stage_i}", f"layers.{stage_i - 1}.2") new_v = v else: new_k = k diff --git a/mmpose/models/utils/csp_layer.py b/mmpose/models/utils/csp_layer.py index 071e1209a2b4b0e1acb722063bfbd9b248fb8b5c..631f8554b4c254ed8537969ee6fca1b62f5a7a82 100644 --- a/mmpose/models/utils/csp_layer.py +++ b/mmpose/models/utils/csp_layer.py @@ -61,38 +61,24 @@ class DarknetBottleneck(BaseModule): Defaults to dict(type='Swish'). """ - def __init__(self, - in_channels: int, - out_channels: int, - expansion: float = 0.5, - add_identity: bool = True, - use_depthwise: bool = False, - conv_cfg: OptConfigType = None, - norm_cfg: ConfigType = dict( - type='BN', momentum=0.03, eps=0.001), - act_cfg: ConfigType = dict(type='Swish'), - init_cfg: OptMultiConfig = None) -> None: + def __init__( + self, + in_channels: int, + out_channels: int, + expansion: float = 0.5, + add_identity: bool = True, + use_depthwise: bool = False, + conv_cfg: OptConfigType = None, + norm_cfg: ConfigType = dict(type="BN", momentum=0.03, eps=0.001), + act_cfg: ConfigType = dict(type="Swish"), + init_cfg: OptMultiConfig = None, + ) -> None: super().__init__(init_cfg=init_cfg) hidden_channels = int(out_channels * expansion) conv = DepthwiseSeparableConvModule if use_depthwise else ConvModule - self.conv1 = ConvModule( - in_channels, - hidden_channels, - 1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) - self.conv2 = conv( - hidden_channels, - out_channels, - 3, - stride=1, - padding=1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) - self.add_identity = \ - add_identity and in_channels == out_channels + self.conv1 = ConvModule(in_channels, hidden_channels, 1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + self.conv2 = conv(hidden_channels, out_channels, 3, stride=1, padding=1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + self.add_identity = add_identity and in_channels == out_channels def forward(self, x: Tensor) -> Tensor: """Forward function.""" @@ -130,29 +116,23 @@ class CSPNeXtBlock(BaseModule): Defaults to None. 
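
The `pvt_convert` hunk above renames official PVT checkpoint keys into MMPose's layer layout; stage `i` in the checkpoint becomes `layers.(i-1)` in the model. The main cases, shown on illustrative keys (assuming absolute position embeddings are present, which shifts block indices by one):

```python
examples = {
    "patch_embed1.proj.weight": "layers.0.0.projection.weight",
    "pos_embed1":               "layers.0.1.0.pos_embed",
    # block index j -> j + 1 because the pos_embed occupies slot 0
    "block1.0.mlp.fc1.weight":  "layers.0.1.1.ffn.layers.0.weight",
    "norm1.weight":             "layers.0.2.weight",
}
```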
""" - def __init__(self, - in_channels: int, - out_channels: int, - expansion: float = 0.5, - add_identity: bool = True, - use_depthwise: bool = False, - kernel_size: int = 5, - conv_cfg: OptConfigType = None, - norm_cfg: ConfigType = dict( - type='BN', momentum=0.03, eps=0.001), - act_cfg: ConfigType = dict(type='SiLU'), - init_cfg: OptMultiConfig = None) -> None: + def __init__( + self, + in_channels: int, + out_channels: int, + expansion: float = 0.5, + add_identity: bool = True, + use_depthwise: bool = False, + kernel_size: int = 5, + conv_cfg: OptConfigType = None, + norm_cfg: ConfigType = dict(type="BN", momentum=0.03, eps=0.001), + act_cfg: ConfigType = dict(type="SiLU"), + init_cfg: OptMultiConfig = None, + ) -> None: super().__init__(init_cfg=init_cfg) hidden_channels = int(out_channels * expansion) conv = DepthwiseSeparableConvModule if use_depthwise else ConvModule - self.conv1 = conv( - in_channels, - hidden_channels, - 3, - stride=1, - padding=1, - norm_cfg=norm_cfg, - act_cfg=act_cfg) + self.conv1 = conv(in_channels, hidden_channels, 3, stride=1, padding=1, norm_cfg=norm_cfg, act_cfg=act_cfg) self.conv2 = DepthwiseSeparableConvModule( hidden_channels, out_channels, @@ -161,9 +141,9 @@ class CSPNeXtBlock(BaseModule): padding=kernel_size // 2, conv_cfg=conv_cfg, norm_cfg=norm_cfg, - act_cfg=act_cfg) - self.add_identity = \ - add_identity and in_channels == out_channels + act_cfg=act_cfg, + ) + self.add_identity = add_identity and in_channels == out_channels def forward(self, x: Tensor) -> Tensor: """Forward function.""" @@ -205,57 +185,35 @@ class CSPLayer(BaseModule): Defaults to None. """ - def __init__(self, - in_channels: int, - out_channels: int, - expand_ratio: float = 0.5, - num_blocks: int = 1, - add_identity: bool = True, - use_depthwise: bool = False, - use_cspnext_block: bool = False, - channel_attention: bool = False, - conv_cfg: OptConfigType = None, - norm_cfg: ConfigType = dict( - type='BN', momentum=0.03, eps=0.001), - act_cfg: ConfigType = dict(type='Swish'), - init_cfg: OptMultiConfig = None) -> None: + def __init__( + self, + in_channels: int, + out_channels: int, + expand_ratio: float = 0.5, + num_blocks: int = 1, + add_identity: bool = True, + use_depthwise: bool = False, + use_cspnext_block: bool = False, + channel_attention: bool = False, + conv_cfg: OptConfigType = None, + norm_cfg: ConfigType = dict(type="BN", momentum=0.03, eps=0.001), + act_cfg: ConfigType = dict(type="Swish"), + init_cfg: OptMultiConfig = None, + ) -> None: super().__init__(init_cfg=init_cfg) block = CSPNeXtBlock if use_cspnext_block else DarknetBottleneck mid_channels = int(out_channels * expand_ratio) self.channel_attention = channel_attention - self.main_conv = ConvModule( - in_channels, - mid_channels, - 1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) - self.short_conv = ConvModule( - in_channels, - mid_channels, - 1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) - self.final_conv = ConvModule( - 2 * mid_channels, - out_channels, - 1, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) - - self.blocks = nn.Sequential(*[ - block( - mid_channels, - mid_channels, - 1.0, - add_identity, - use_depthwise, - conv_cfg=conv_cfg, - norm_cfg=norm_cfg, - act_cfg=act_cfg) for _ in range(num_blocks) - ]) + self.main_conv = ConvModule(in_channels, mid_channels, 1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + self.short_conv = ConvModule(in_channels, mid_channels, 1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + self.final_conv = 
ConvModule(2 * mid_channels, out_channels, 1, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + + self.blocks = nn.Sequential( + *[ + block(mid_channels, mid_channels, 1.0, add_identity, use_depthwise, conv_cfg=conv_cfg, norm_cfg=norm_cfg, act_cfg=act_cfg) + for _ in range(num_blocks) + ] + ) if channel_attention: self.attention = ChannelAttention(2 * mid_channels) diff --git a/mmpose/models/utils/geometry.py b/mmpose/models/utils/geometry.py index 0ceadaec30cd2c9bb3fbada132e1ea674f2e8754..ef329b6ca9f961c31d635dcb7ab928bc651acd6e 100644 --- a/mmpose/models/utils/geometry.py +++ b/mmpose/models/utils/geometry.py @@ -17,7 +17,7 @@ def rot6d_to_rotmat(x): a1 = x[:, :, 0] a2 = x[:, :, 1] b1 = F.normalize(a1) - b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1) + b2 = F.normalize(a2 - torch.einsum("bi,bi->b", b1, a2).unsqueeze(-1) * b1) b3 = torch.cross(b1, b2) return torch.stack((b1, b2, b3), dim=-1) @@ -50,8 +50,7 @@ def quat_to_rotmat(quat): """ norm_quat = quat norm_quat = norm_quat / norm_quat.norm(p=2, dim=1, keepdim=True) - w, x, y, z = norm_quat[:, 0], norm_quat[:, 1],\ - norm_quat[:, 2], norm_quat[:, 3] + w, x, y, z = norm_quat[:, 0], norm_quat[:, 1], norm_quat[:, 2], norm_quat[:, 3] B = quat.size(0) @@ -59,10 +58,18 @@ def quat_to_rotmat(quat): wx, wy, wz = w * x, w * y, w * z xy, xz, yz = x * y, x * z, y * z - rotMat = torch.stack([ - w2 + x2 - y2 - z2, 2 * xy - 2 * wz, 2 * wy + 2 * xz, 2 * wz + 2 * xy, - w2 - x2 + y2 - z2, 2 * yz - 2 * wx, 2 * xz - 2 * wy, 2 * wx + 2 * yz, - w2 - x2 - y2 + z2 - ], - dim=1).view(B, 3, 3) + rotMat = torch.stack( + [ + w2 + x2 - y2 - z2, + 2 * xy - 2 * wz, + 2 * wy + 2 * xz, + 2 * wz + 2 * xy, + w2 - x2 + y2 - z2, + 2 * yz - 2 * wx, + 2 * xz - 2 * wy, + 2 * wx + 2 * yz, + w2 - x2 - y2 + z2, + ], + dim=1, + ).view(B, 3, 3) return rotMat diff --git a/mmpose/models/utils/misc.py b/mmpose/models/utils/misc.py index 347c5217092b0feadaef6e0534b4d77b51d3b190..fa34045300ad23a88ee910b6800ebeca63eff72d 100644 --- a/mmpose/models/utils/misc.py +++ b/mmpose/models/utils/misc.py @@ -71,6 +71,5 @@ def filter_scores_and_topk(scores, score_thr, topk, results=None): elif isinstance(results, torch.Tensor): filtered_results = results[keep_idxs] else: - raise NotImplementedError(f'Only supports dict or list or Tensor, ' - f'but get {type(results)}.') + raise NotImplementedError(f"Only supports dict or list or Tensor, " f"but get {type(results)}.") return scores, labels, keep_idxs, filtered_results diff --git a/mmpose/models/utils/ops.py b/mmpose/models/utils/ops.py index d1ba0cf37c3293e575d41ba47034ee08331a4fa9..d2a7485fcfcc09ae2a9a2410e41b4839f369a4a3 100644 --- a/mmpose/models/utils/ops.py +++ b/mmpose/models/utils/ops.py @@ -9,12 +9,14 @@ from torch.nn import functional as F from mmpose.registry import MODELS -def resize(input: torch.Tensor, - size: Optional[Union[Tuple[int, int], torch.Size]] = None, - scale_factor: Optional[float] = None, - mode: str = 'nearest', - align_corners: Optional[bool] = None, - warning: bool = True) -> torch.Tensor: +def resize( + input: torch.Tensor, + size: Optional[Union[Tuple[int, int], torch.Size]] = None, + scale_factor: Optional[float] = None, + mode: str = "nearest", + align_corners: Optional[bool] = None, + warning: bool = True, +) -> torch.Tensor: """Resize a given input tensor using specified size or scale_factor. 
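
The `geometry.py` hunk converts the 6-D continuous rotation representation to a rotation matrix by Gram-Schmidt: normalize the first column, orthogonalize the second against it, and take their cross product as the third. A compact sketch equivalent to the patched function:

```python
import torch
import torch.nn.functional as F

def rot6d_to_rotmat(x):
    """(B, 6) continuous rotation representation -> (B, 3, 3) matrices."""
    x = x.view(-1, 3, 2)
    a1, a2 = x[..., 0], x[..., 1]
    b1 = F.normalize(a1, dim=-1)
    # remove the component of a2 along b1, then normalize
    b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)
    return torch.stack((b1, b2, b3), dim=-1)
```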
Args: @@ -38,14 +40,17 @@ def resize(input: torch.Tensor, input_h, input_w = tuple(int(x) for x in input.shape[2:]) output_h, output_w = tuple(int(x) for x in size) if output_h > input_h or output_w > output_h: - if ((output_h > 1 and output_w > 1 and input_h > 1 - and input_w > 1) and (output_h - 1) % (input_h - 1) - and (output_w - 1) % (input_w - 1)): + if ( + (output_h > 1 and output_w > 1 and input_h > 1 and input_w > 1) + and (output_h - 1) % (input_h - 1) + and (output_w - 1) % (input_w - 1) + ): warnings.warn( - f'When align_corners={align_corners}, ' - 'the output would be more aligned if ' - f'input size {(input_h, input_w)} is `x+1` and ' - f'out size {(output_h, output_w)} is `nx+1`') + f"When align_corners={align_corners}, " + "the output would be more aligned if " + f"input size {(input_h, input_w)} is `x+1` and " + f"out size {(output_h, output_w)} is `nx+1`" + ) # Convert torch.Size to tuple if necessary if isinstance(size, torch.Size): @@ -67,22 +72,20 @@ class FrozenBatchNorm2d(torch.nn.Module): def __init__(self, n, eps: int = 1e-5): super(FrozenBatchNorm2d, self).__init__() - self.register_buffer('weight', torch.ones(n)) - self.register_buffer('bias', torch.zeros(n)) - self.register_buffer('running_mean', torch.zeros(n)) - self.register_buffer('running_var', torch.ones(n)) + self.register_buffer("weight", torch.ones(n)) + self.register_buffer("bias", torch.zeros(n)) + self.register_buffer("running_mean", torch.zeros(n)) + self.register_buffer("running_var", torch.ones(n)) self.eps = eps - def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict, - missing_keys, unexpected_keys, error_msgs): - num_batches_tracked_key = prefix + 'num_batches_tracked' + def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs): + num_batches_tracked_key = prefix + "num_batches_tracked" if num_batches_tracked_key in state_dict: del state_dict[num_batches_tracked_key] - super(FrozenBatchNorm2d, - self)._load_from_state_dict(state_dict, prefix, local_metadata, - strict, missing_keys, - unexpected_keys, error_msgs) + super(FrozenBatchNorm2d, self)._load_from_state_dict( + state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs + ) def forward(self, x): w = self.weight.reshape(1, -1, 1, 1) diff --git a/mmpose/models/utils/realnvp.py b/mmpose/models/utils/realnvp.py index 911953e8f9d1056d44a2d3538d750e89b9bd6a7a..63fa23637c46e4285b15ccbfb2a0957e29588db2 100644 --- a/mmpose/models/utils/realnvp.py +++ b/mmpose/models/utils/realnvp.py @@ -20,16 +20,12 @@ class RealNVP(nn.Module): @staticmethod def get_scale_net(): """Get the scale model in a single invertable mapping.""" - return nn.Sequential( - nn.Linear(2, 64), nn.LeakyReLU(), nn.Linear(64, 64), - nn.LeakyReLU(), nn.Linear(64, 2), nn.Tanh()) + return nn.Sequential(nn.Linear(2, 64), nn.LeakyReLU(), nn.Linear(64, 64), nn.LeakyReLU(), nn.Linear(64, 2), nn.Tanh()) @staticmethod def get_trans_net(): """Get the translation model in a single invertable mapping.""" - return nn.Sequential( - nn.Linear(2, 64), nn.LeakyReLU(), nn.Linear(64, 64), - nn.LeakyReLU(), nn.Linear(64, 2)) + return nn.Sequential(nn.Linear(2, 64), nn.LeakyReLU(), nn.Linear(64, 64), nn.LeakyReLU(), nn.Linear(64, 2)) @property def prior(self): @@ -39,15 +35,12 @@ class RealNVP(nn.Module): def __init__(self): super(RealNVP, self).__init__() - self.register_buffer('loc', torch.zeros(2)) - self.register_buffer('cov', torch.eye(2)) - self.register_buffer( - 'mask', 
torch.tensor([[0, 1], [1, 0]] * 3, dtype=torch.float32)) + self.register_buffer("loc", torch.zeros(2)) + self.register_buffer("cov", torch.eye(2)) + self.register_buffer("mask", torch.tensor([[0, 1], [1, 0]] * 3, dtype=torch.float32)) - self.s = torch.nn.ModuleList( - [self.get_scale_net() for _ in range(len(self.mask))]) - self.t = torch.nn.ModuleList( - [self.get_trans_net() for _ in range(len(self.mask))]) + self.s = torch.nn.ModuleList([self.get_scale_net() for _ in range(len(self.mask))]) + self.t = torch.nn.ModuleList([self.get_trans_net() for _ in range(len(self.mask))]) self.init_weights() def init_weights(self): diff --git a/mmpose/models/utils/regularizations.py b/mmpose/models/utils/regularizations.py index d8c7449038066016f6efb60e126111ace962fe98..db3036b7066b638f2cf925a64848a3eecfa0e79d 100644 --- a/mmpose/models/utils/regularizations.py +++ b/mmpose/models/utils/regularizations.py @@ -41,14 +41,14 @@ class PytorchModuleHook(metaclass=ABCMeta): """ assert isinstance(module, torch.nn.Module) - if self.hook_type == 'forward': + if self.hook_type == "forward": h = module.register_forward_hook(self.hook) - elif self.hook_type == 'forward_pre': + elif self.hook_type == "forward_pre": h = module.register_forward_pre_hook(self.hook) - elif self.hook_type == 'backward': + elif self.hook_type == "backward": h = module.register_backward_hook(self.hook) else: - raise ValueError(f'Invalid hook type {self.hook}') + raise ValueError(f"Invalid hook type {self.hook}") return h @@ -65,19 +65,17 @@ class WeightNormClipHook(PytorchModuleHook): apply weight norm clip. """ - def __init__(self, max_norm=1.0, module_param_names='weight'): - self.module_param_names = module_param_names if isinstance( - module_param_names, list) else [module_param_names] + def __init__(self, max_norm=1.0, module_param_names="weight"): + self.module_param_names = module_param_names if isinstance(module_param_names, list) else [module_param_names] self.max_norm = max_norm @property def hook_type(self): - return 'forward_pre' + return "forward_pre" def hook(self, module, _input): for name in self.module_param_names: - assert name in module._parameters, f'{name} is not a parameter' \ - f' of the module {type(module)}' + assert name in module._parameters, f"{name} is not a parameter" f" of the module {type(module)}" param = module._parameters[name] with torch.no_grad(): diff --git a/mmpose/models/utils/reparam_layers.py b/mmpose/models/utils/reparam_layers.py index 3ba196294f3cefe7702d053db953a83bfbde8db4..558de11f65d7057db8b25ec4312e8ce1aca2673d 100644 --- a/mmpose/models/utils/reparam_layers.py +++ b/mmpose/models/utils/reparam_layers.py @@ -36,18 +36,20 @@ class RepVGGBlock(BaseModule): init_cfg (dict): The config dict for initialization. Defaults to None. 
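
The RealNVP hunk only touches buffer registration and module lists; for context, each (scale, translation) pair implements one affine coupling layer over 2-D residuals with alternating masks `[0, 1]` / `[1, 0]`. A minimal sketch of a single coupling step, assuming the standard RealNVP formulation (the actual flow body is outside this hunk):

```python
import torch

def coupling_forward(x, mask, s_net, t_net):
    """One affine coupling: the masked half conditions the other half."""
    x_masked = x * mask
    s = s_net(x_masked) * (1 - mask)  # log-scale, Tanh-bounded in the model
    t = t_net(x_masked) * (1 - mask)
    y = x_masked + (1 - mask) * (x * torch.exp(s) + t)
    log_det = s.sum(dim=-1)  # per-sample log-determinant contribution
    return y, log_det
```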
""" - def __init__(self, - in_channels: int, - out_channels: int, - stride: int = 1, - padding: int = 1, - dilation: int = 1, - groups: int = 1, - padding_mode: str = 'zeros', - norm_cfg: OptConfigType = dict(type='BN'), - act_cfg: OptConfigType = dict(type='ReLU'), - without_branch_norm: bool = True, - init_cfg: OptConfigType = None): + def __init__( + self, + in_channels: int, + out_channels: int, + stride: int = 1, + padding: int = 1, + dilation: int = 1, + groups: int = 1, + padding_mode: str = "zeros", + norm_cfg: OptConfigType = dict(type="BN"), + act_cfg: OptConfigType = dict(type="ReLU"), + without_branch_norm: bool = True, + init_cfg: OptConfigType = None, + ): super(RepVGGBlock, self).__init__(init_cfg) self.in_channels = in_channels @@ -62,8 +64,7 @@ class RepVGGBlock(BaseModule): # judge if input shape and output shape are the same. # If true, add a normalized identity shortcut. self.branch_norm = None - if out_channels == in_channels and stride == 1 and \ - padding == dilation and not without_branch_norm: + if out_channels == in_channels and stride == 1 and padding == dilation and not without_branch_norm: self.branch_norm = build_norm_layer(norm_cfg, in_channels)[1] self.branch_3x3 = ConvModule( @@ -75,15 +76,10 @@ class RepVGGBlock(BaseModule): groups=self.groups, dilation=self.dilation, norm_cfg=self.norm_cfg, - act_cfg=None) + act_cfg=None, + ) - self.branch_1x1 = ConvModule( - self.in_channels, - self.out_channels, - 1, - groups=self.groups, - norm_cfg=self.norm_cfg, - act_cfg=None) + self.branch_1x1 = ConvModule(self.in_channels, self.out_channels, 1, groups=self.groups, norm_cfg=self.norm_cfg, act_cfg=None) self.act = build_activation_layer(act_cfg) @@ -147,14 +143,12 @@ class RepVGGBlock(BaseModule): eps = branch.bn.eps else: assert isinstance(branch, (nn.SyncBatchNorm, nn.BatchNorm2d)) - if not hasattr(self, 'id_tensor'): + if not hasattr(self, "id_tensor"): input_dim = self.in_channels // self.groups - kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), - dtype=np.float32) + kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32) for i in range(self.in_channels): kernel_value[i, i % input_dim, 1, 1] = 1 - self.id_tensor = torch.from_numpy(kernel_value).to( - branch.weight.device) + self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device) kernel = self.id_tensor running_mean = branch.running_mean running_var = branch.running_var @@ -174,11 +168,9 @@ class RepVGGBlock(BaseModule): """ kernel3x3, bias3x3 = self._fuse_bn_tensor(self.branch_3x3) kernel1x1, bias1x1 = self._fuse_bn_tensor(self.branch_1x1) - kernelid, biasid = (0, 0) if self.branch_norm is None else \ - self._fuse_bn_tensor(self.branch_norm) + kernelid, biasid = (0, 0) if self.branch_norm is None else self._fuse_bn_tensor(self.branch_norm) - return (kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, - bias3x3 + bias1x1 + biasid) + return (kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid) def switch_to_deploy(self, test_cfg: Optional[Dict] = None): """Switches the block to deployment mode. @@ -187,7 +179,7 @@ class RepVGGBlock(BaseModule): derived from the equivalent kernel and bias, replacing the original branches. This reduces computational complexity during inference. 
""" - if getattr(self, 'deploy', False): + if getattr(self, "deploy", False): return kernel, bias = self.get_equivalent_kernel_bias() @@ -199,15 +191,16 @@ class RepVGGBlock(BaseModule): padding=self.branch_3x3.conv.padding, dilation=self.branch_3x3.conv.dilation, groups=self.branch_3x3.conv.groups, - bias=True) + bias=True, + ) self.conv_reparam.weight.data = kernel self.conv_reparam.bias.data = bias for para in self.parameters(): para.detach_() - self.__delattr__('branch_3x3') - self.__delattr__('branch_1x1') - if hasattr(self, 'branch_norm'): - self.__delattr__('branch_norm') + self.__delattr__("branch_3x3") + self.__delattr__("branch_1x1") + if hasattr(self, "branch_norm"): + self.__delattr__("branch_norm") def _forward(self, x): return self.act(self.conv_reparam(x)) diff --git a/mmpose/models/utils/rtmcc_block.py b/mmpose/models/utils/rtmcc_block.py index 0a16701c0f753d7e60dd02d081f377c9dcf74108..cbf111ade29a80a7e0fbf2680a05d5ee33787828 100644 --- a/mmpose/models/utils/rtmcc_block.py +++ b/mmpose/models/utils/rtmcc_block.py @@ -36,16 +36,13 @@ def rope(x, dim): for i in spatial_shape: total_len *= i - position = torch.reshape( - torch.arange(total_len, dtype=torch.int, device=x.device), - spatial_shape) + position = torch.reshape(torch.arange(total_len, dtype=torch.int, device=x.device), spatial_shape) for i in range(dim[-1] + 1, len(shape) - 1, 1): position = torch.unsqueeze(position, dim=-1) half_size = shape[-1] // 2 - freq_seq = -torch.arange( - half_size, dtype=torch.int, device=x.device) / float(half_size) + freq_seq = -torch.arange(half_size, dtype=torch.int, device=x.device) / float(half_size) inv_freq = 10000**-freq_seq sinusoid = position[..., None] * inv_freq[None, None, :] @@ -68,10 +65,9 @@ class Scale(nn.Module): Defaults to True. """ - def __init__(self, dim, init_value=1., trainable=True): + def __init__(self, dim, init_value=1.0, trainable=True): super().__init__() - self.scale = nn.Parameter( - init_value * torch.ones(dim), requires_grad=trainable) + self.scale = nn.Parameter(init_value * torch.ones(dim), requires_grad=trainable) def forward(self, x): """Forward function.""" @@ -119,20 +115,22 @@ class RTMCCBlock(nn.Module): `_ """ - def __init__(self, - num_token, - in_token_dims, - out_token_dims, - expansion_factor=2, - s=128, - eps=1e-5, - dropout_rate=0., - drop_path=0., - attn_type='self-attn', - act_fn='SiLU', - bias=False, - use_rel_bias=True, - pos_enc=False): + def __init__( + self, + num_token, + in_token_dims, + out_token_dims, + expansion_factor=2, + s=128, + eps=1e-5, + dropout_rate=0.0, + drop_path=0.0, + attn_type="self-attn", + act_fn="SiLU", + bias=False, + use_rel_bias=True, + pos_enc=False, + ): super(RTMCCBlock, self).__init__() self.s = s @@ -140,20 +138,18 @@ class RTMCCBlock(nn.Module): self.use_rel_bias = use_rel_bias self.attn_type = attn_type self.pos_enc = pos_enc - self.drop_path = DropPath(drop_path) \ - if drop_path > 0. 
else nn.Identity() + self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity() self.e = int(in_token_dims * expansion_factor) if use_rel_bias: - if attn_type == 'self-attn': - self.w = nn.Parameter( - torch.rand([2 * num_token - 1], dtype=torch.float)) + if attn_type == "self-attn": + self.w = nn.Parameter(torch.rand([2 * num_token - 1], dtype=torch.float)) else: self.a = nn.Parameter(torch.rand([1, s], dtype=torch.float)) self.b = nn.Parameter(torch.rand([1, s], dtype=torch.float)) self.o = nn.Linear(self.e, out_token_dims, bias=bias) - if attn_type == 'self-attn': + if attn_type == "self-attn": self.uv = nn.Linear(in_token_dims, 2 * self.e + self.s, bias=bias) self.gamma = nn.Parameter(torch.rand((2, self.s))) self.beta = nn.Parameter(torch.rand((2, self.s))) @@ -168,12 +164,11 @@ class RTMCCBlock(nn.Module): nn.init.xavier_uniform_(self.uv.weight) - if act_fn == 'SiLU' or act_fn == nn.SiLU: - assert digit_version(TORCH_VERSION) >= digit_version('1.7.0'), \ - 'SiLU activation requires PyTorch version >= 1.7' + if act_fn == "SiLU" or act_fn == nn.SiLU: + assert digit_version(TORCH_VERSION) >= digit_version("1.7.0"), "SiLU activation requires PyTorch version >= 1.7" self.act_fn = nn.SiLU(True) - elif act_fn == 'ReLU' or act_fn == nn.ReLU: + elif act_fn == "ReLU" or act_fn == nn.ReLU: self.act_fn = nn.ReLU(True) else: raise NotImplementedError @@ -188,14 +183,14 @@ class RTMCCBlock(nn.Module): self.dropout_rate = dropout_rate - if dropout_rate > 0.: + if dropout_rate > 0.0: self.dropout = nn.Dropout(dropout_rate) def rel_pos_bias(self, seq_len, k_len=None): """Add relative position bias.""" - if self.attn_type == 'self-attn': - t = F.pad(self.w[:2 * seq_len - 1], [0, seq_len]).repeat(seq_len) + if self.attn_type == "self-attn": + t = F.pad(self.w[: 2 * seq_len - 1], [0, seq_len]).repeat(seq_len) t = t[..., :-seq_len].reshape(-1, seq_len, 3 * seq_len - 2) r = (2 * seq_len - 1) // 2 t = t[..., r:-r] @@ -208,7 +203,7 @@ class RTMCCBlock(nn.Module): def _forward(self, inputs): """GAU Forward function.""" - if self.attn_type == 'self-attn': + if self.attn_type == "self-attn": x = inputs else: x, k, v = inputs @@ -219,7 +214,7 @@ class RTMCCBlock(nn.Module): uv = self.uv(x) uv = self.act_fn(uv) - if self.attn_type == 'self-attn': + if self.attn_type == "self-attn": # [B, K, e + e + s] -> [B, K, e], [B, K, e], [B, K, s] u, v, base = torch.split(uv, [self.e, self.e, self.s], dim=2) # [B, K, 1, s] * [1, 1, 2, s] + [2, s] -> [B, K, 2, s] @@ -246,15 +241,15 @@ class RTMCCBlock(nn.Module): qk = torch.bmm(q, k.permute(0, 2, 1)) if self.use_rel_bias: - if self.attn_type == 'self-attn': + if self.attn_type == "self-attn": bias = self.rel_pos_bias(q.size(1)) else: bias = self.rel_pos_bias(q.size(1), k.size(1)) - qk += bias[:, :q.size(1), :k.size(1)] + qk += bias[:, : q.size(1), : k.size(1)] # [B, K, K] kernel = torch.square(F.relu(qk / self.sqrt_s)) - if self.dropout_rate > 0.: + if self.dropout_rate > 0.0: kernel = self.dropout(kernel) # [B, K, K] x [B, K, e] -> [B, K, e] x = u * torch.bmm(kernel, v) @@ -267,7 +262,7 @@ class RTMCCBlock(nn.Module): """Forward function.""" if self.shortcut: - if self.attn_type == 'cross-attn': + if self.attn_type == "cross-attn": res_shortcut = x[0] else: res_shortcut = x diff --git a/mmpose/models/utils/transformer.py b/mmpose/models/utils/transformer.py index 987b8658083dd03e97e908f9700405911357d2f9..d54d922c9555be1aa4f1a90b10dcae2def4e84bc 100644 --- a/mmpose/models/utils/transformer.py +++ b/mmpose/models/utils/transformer.py @@ -34,7 +34,7 @@ def 
nlc_to_nchw(x, hw_shape): H, W = hw_shape assert len(x.shape) == 3 B, L, C = x.shape - assert L == H * W, 'The seq_len does not match H, W' + assert L == H * W, "The seq_len does not match H, W" return x.transpose(1, 2).reshape(B, C, H, W).contiguous() @@ -82,11 +82,11 @@ class AdaptivePadding(nn.Module): >>> assert (out.shape[2], out.shape[3]) == (16, 32) """ - def __init__(self, kernel_size=1, stride=1, dilation=1, padding='corner'): + def __init__(self, kernel_size=1, stride=1, dilation=1, padding="corner"): super(AdaptivePadding, self).__init__() - assert padding in ('same', 'corner') + assert padding in ("same", "corner") kernel_size = to_2tuple(kernel_size) stride = to_2tuple(stride) @@ -106,10 +106,8 @@ class AdaptivePadding(nn.Module): stride_h, stride_w = self.stride output_h = math.ceil(input_h / stride_h) output_w = math.ceil(input_w / stride_w) - pad_h = max((output_h - 1) * stride_h + - (kernel_h - 1) * self.dilation[0] + 1 - input_h, 0) - pad_w = max((output_w - 1) * stride_w + - (kernel_w - 1) * self.dilation[1] + 1 - input_w, 0) + pad_h = max((output_h - 1) * stride_h + (kernel_h - 1) * self.dilation[0] + 1 - input_h, 0) + pad_w = max((output_w - 1) * stride_w + (kernel_w - 1) * self.dilation[1] + 1 - input_w, 0) return pad_h, pad_w def forward(self, x): @@ -117,13 +115,10 @@ class AdaptivePadding(nn.Module): pad_h, pad_w = self.get_pad_shape(x.size()[-2:]) if pad_h > 0 or pad_w > 0: - if self.padding == 'corner': + if self.padding == "corner": x = F.pad(x, [0, pad_w, 0, pad_h]) - elif self.padding == 'same': - x = F.pad(x, [ - pad_w // 2, pad_w - pad_w // 2, pad_h // 2, - pad_h - pad_h // 2 - ]) + elif self.padding == "same": + x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2]) return x @@ -159,10 +154,10 @@ class PatchEmbed(BaseModule): self, in_channels=3, embed_dims=768, - conv_type='Conv2d', + conv_type="Conv2d", kernel_size=16, stride=16, - padding='corner', + padding="corner", dilation=1, bias=True, norm_cfg=None, @@ -180,11 +175,7 @@ class PatchEmbed(BaseModule): dilation = to_2tuple(dilation) if isinstance(padding, str): - self.adap_padding = AdaptivePadding( - kernel_size=kernel_size, - stride=stride, - dilation=dilation, - padding=padding) + self.adap_padding = AdaptivePadding(kernel_size=kernel_size, stride=stride, dilation=dilation, padding=padding) # disable the padding of conv padding = 0 else: @@ -199,7 +190,8 @@ class PatchEmbed(BaseModule): stride=stride, padding=padding, dilation=dilation, - bias=bias) + bias=bias, + ) if norm_cfg is not None: self.norm = build_norm_layer(norm_cfg, embed_dims)[1] @@ -220,10 +212,8 @@ class PatchEmbed(BaseModule): input_size = (input_h, input_w) # https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html - h_out = (input_size[0] + 2 * padding[0] - dilation[0] * - (kernel_size[0] - 1) - 1) // stride[0] + 1 - w_out = (input_size[1] + 2 * padding[1] - dilation[1] * - (kernel_size[1] - 1) - 1) // stride[1] + 1 + h_out = (input_size[0] + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1) // stride[0] + 1 + w_out = (input_size[1] + 2 * padding[1] - dilation[1] * (kernel_size[1] - 1) - 1) // stride[1] + 1 self.init_out_size = (h_out, w_out) else: self.init_input_size = None @@ -284,16 +274,18 @@ class PatchMerging(BaseModule): Default: None. 
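
`AdaptivePadding` computes, per forward pass, how much padding lets the convolution cover the input exactly (`'corner'` pads bottom/right only, `'same'` splits the padding). The arithmetic, with a worked case:

```python
import math

def pad_shape(input_h, input_w, kernel, stride, dilation=1):
    out_h, out_w = math.ceil(input_h / stride), math.ceil(input_w / stride)
    pad_h = max((out_h - 1) * stride + (kernel - 1) * dilation + 1 - input_h, 0)
    pad_w = max((out_w - 1) * stride + (kernel - 1) * dilation + 1 - input_w, 0)
    return pad_h, pad_w

# 7x7 input, 3x3 kernel, stride 2 -> 4x4 output needs 2 extra rows and cols
print(pad_shape(7, 7, kernel=3, stride=2))  # (2, 2)
```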
""" - def __init__(self, - in_channels, - out_channels, - kernel_size=2, - stride=None, - padding='corner', - dilation=1, - bias=False, - norm_cfg=dict(type='LN'), - init_cfg=None): + def __init__( + self, + in_channels, + out_channels, + kernel_size=2, + stride=None, + padding="corner", + dilation=1, + bias=False, + norm_cfg=dict(type="LN"), + init_cfg=None, + ): super().__init__(init_cfg=init_cfg) self.in_channels = in_channels self.out_channels = out_channels @@ -307,22 +299,14 @@ class PatchMerging(BaseModule): dilation = to_2tuple(dilation) if isinstance(padding, str): - self.adap_padding = AdaptivePadding( - kernel_size=kernel_size, - stride=stride, - dilation=dilation, - padding=padding) + self.adap_padding = AdaptivePadding(kernel_size=kernel_size, stride=stride, dilation=dilation, padding=padding) # disable the padding of unfold padding = 0 else: self.adap_padding = None padding = to_2tuple(padding) - self.sampler = nn.Unfold( - kernel_size=kernel_size, - dilation=dilation, - padding=padding, - stride=stride) + self.sampler = nn.Unfold(kernel_size=kernel_size, dilation=dilation, padding=padding, stride=stride) sample_dim = kernel_size[0] * kernel_size[1] * in_channels @@ -348,13 +332,10 @@ class PatchMerging(BaseModule): (Merged_H, Merged_W). """ B, L, C = x.shape - assert isinstance(input_size, Sequence), f'Expect ' \ - f'input_size is ' \ - f'`Sequence` ' \ - f'but get {input_size}' + assert isinstance(input_size, Sequence), f"Expect " f"input_size is " f"`Sequence` " f"but get {input_size}" H, W = input_size - assert L == H * W, 'input feature has wrong size' + assert L == H * W, "input feature has wrong size" x = x.view(B, H, W, C).permute([0, 3, 1, 2]) # B, C, H, W # Use nn.Unfold to merge patch. About 25% faster than original method, @@ -367,12 +348,12 @@ class PatchMerging(BaseModule): x = self.sampler(x) # if kernel_size=2 and stride=2, x should has shape (B, 4*C, H/2*W/2) - out_h = (H + 2 * self.sampler.padding[0] - self.sampler.dilation[0] * - (self.sampler.kernel_size[0] - 1) - - 1) // self.sampler.stride[0] + 1 - out_w = (W + 2 * self.sampler.padding[1] - self.sampler.dilation[1] * - (self.sampler.kernel_size[1] - 1) - - 1) // self.sampler.stride[1] + 1 + out_h = (H + 2 * self.sampler.padding[0] - self.sampler.dilation[0] * (self.sampler.kernel_size[0] - 1) - 1) // self.sampler.stride[ + 0 + ] + 1 + out_w = (W + 2 * self.sampler.padding[1] - self.sampler.dilation[1] * (self.sampler.kernel_size[1] - 1) - 1) // self.sampler.stride[ + 1 + ] + 1 output_size = (out_h, out_w) x = x.transpose(1, 2) # B, H/2*W/2, 4*C @@ -409,8 +390,7 @@ class ScaleNorm(nn.Module): torch.Tensor: The tensor after applying scale norm. 
""" - if torch.onnx.is_in_onnx_export() and \ - digit_version(TORCH_VERSION) >= digit_version('1.12'): + if torch.onnx.is_in_onnx_export() and digit_version(TORCH_VERSION) >= digit_version("1.12"): norm = torch.linalg.norm(x, dim=-1, keepdim=True) @@ -462,26 +442,24 @@ class SinePositionalEncoding(nn.Module): pos_dim = out_channels // 2 dim_t = torch.arange(pos_dim, dtype=torch.float32) / pos_dim - dim_t = self.temperature**(dim_t) + dim_t = self.temperature ** (dim_t) if not learnable: - self.register_buffer('dim_t', dim_t) + self.register_buffer("dim_t", dim_t) else: self.dim_t = nn.Parameter(dim_t.detach()) # set parameters if eval_size: - if hasattr(self, f'pos_enc_{eval_size}'): - delattr(self, f'pos_enc_{eval_size}') + if hasattr(self, f"pos_enc_{eval_size}"): + delattr(self, f"pos_enc_{eval_size}") pos_enc = self.generate_pos_encoding(size=eval_size) - self.register_buffer(f'pos_enc_{eval_size}', pos_enc) + self.register_buffer(f"pos_enc_{eval_size}", pos_enc) def forward(self, *args, **kwargs): return self.generate_pos_encoding(*args, **kwargs) - def generate_pos_encoding(self, - size: Union[int, Sequence[int]] = None, - position: Optional[Tensor] = None): + def generate_pos_encoding(self, size: Union[int, Sequence[int]] = None, position: Optional[Tensor] = None): """Generate positional encoding for input features. Args: @@ -493,19 +471,16 @@ class SinePositionalEncoding(nn.Module): assert (size is not None) ^ (position is not None) - if (not (self.learnable - and self.training)) and size is not None and hasattr( - self, f'pos_enc_{size}'): - return getattr(self, f'pos_enc_{size}') + if (not (self.learnable and self.training)) and size is not None and hasattr(self, f"pos_enc_{size}"): + return getattr(self, f"pos_enc_{size}") if self.spatial_dim == 1: if size is not None: if isinstance(size, (tuple, list)): size = size[0] - position = torch.arange( - size, dtype=torch.float32, device=self.dim_t.device) + position = torch.arange(size, dtype=torch.float32, device=self.dim_t.device) - dim_t = self.dim_t.reshape(*((1, ) * position.ndim), -1) + dim_t = self.dim_t.reshape(*((1,) * position.ndim), -1) freq = position.unsqueeze(-1) / dim_t pos_enc = torch.cat((freq.cos(), freq.sin()), dim=-1) @@ -516,18 +491,17 @@ class SinePositionalEncoding(nn.Module): elif isinstance(size, (int, float)): h, w = int(size), int(size) else: - raise ValueError(f'got invalid type {type(size)} for size') + raise ValueError(f"got invalid type {type(size)} for size") grid_h, grid_w = torch.meshgrid( - torch.arange( - int(h), dtype=torch.float32, device=self.dim_t.device), - torch.arange( - int(w), dtype=torch.float32, device=self.dim_t.device)) + torch.arange(int(h), dtype=torch.float32, device=self.dim_t.device), + torch.arange(int(w), dtype=torch.float32, device=self.dim_t.device), + ) grid_h, grid_w = grid_h.flatten(), grid_w.flatten() else: assert position.size(-1) == 2 grid_h, grid_w = torch.unbind(position, dim=-1) - dim_t = self.dim_t.reshape(*((1, ) * grid_h.ndim), -1) + dim_t = self.dim_t.reshape(*((1,) * grid_h.ndim), -1) freq_h = grid_h.unsqueeze(-1) / dim_t freq_w = grid_w.unsqueeze(-1) / dim_t pos_enc_h = torch.cat((freq_h.cos(), freq_h.sin()), dim=-1) @@ -537,9 +511,7 @@ class SinePositionalEncoding(nn.Module): return pos_enc @staticmethod - def apply_additional_pos_enc(feature: Tensor, - pos_enc: Tensor, - spatial_dim: int = 1): + def apply_additional_pos_enc(feature: Tensor, pos_enc: Tensor, spatial_dim: int = 1): """Apply additional positional encoding to input features. 
Args: @@ -548,8 +520,7 @@ class SinePositionalEncoding(nn.Module): spatial_dim (int): Spatial dimension of input features. """ - assert spatial_dim in (1, 2), f'the argument spatial_dim must be ' \ - f'either 1 or 2, but got {spatial_dim}' + assert spatial_dim in (1, 2), f"the argument spatial_dim must be " f"either 1 or 2, but got {spatial_dim}" if spatial_dim == 2: pos_enc = pos_enc.flatten(-2) for _ in range(feature.ndim - pos_enc.ndim): @@ -557,9 +528,7 @@ class SinePositionalEncoding(nn.Module): return feature + pos_enc @staticmethod - def apply_rotary_pos_enc(feature: Tensor, - pos_enc: Tensor, - spatial_dim: int = 1): + def apply_rotary_pos_enc(feature: Tensor, pos_enc: Tensor, spatial_dim: int = 1): """Apply rotary positional encoding to input features. Args: @@ -568,8 +537,7 @@ class SinePositionalEncoding(nn.Module): spatial_dim (int): Spatial dimension of input features. """ - assert spatial_dim in (1, 2), f'the argument spatial_dim must be ' \ - f'either 1 or 2, but got {spatial_dim}' + assert spatial_dim in (1, 2), f"the argument spatial_dim must be " f"either 1 or 2, but got {spatial_dim}" for _ in range(feature.ndim - pos_enc.ndim + spatial_dim - 1): pos_enc = pos_enc.unsqueeze(0) @@ -577,14 +545,12 @@ class SinePositionalEncoding(nn.Module): x1, x2 = torch.chunk(feature, 2, dim=-1) if spatial_dim == 1: cos, sin = torch.chunk(pos_enc, 2, dim=-1) - feature = torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), - dim=-1) + feature = torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1) elif spatial_dim == 2: pos_enc_h, pos_enc_w = torch.unbind(pos_enc, dim=-1) cos_h, sin_h = torch.chunk(pos_enc_h, 2, dim=-1) cos_w, sin_w = torch.chunk(pos_enc_w, 2, dim=-1) - feature = torch.cat( - (x1 * cos_h - x2 * sin_h, x1 * cos_w + x2 * sin_w), dim=-1) + feature = torch.cat((x1 * cos_h - x2 * sin_h, x1 * cos_w + x2 * sin_w), dim=-1) return feature @@ -600,10 +566,9 @@ class ChannelWiseScale(nn.Module): Defaults to True. """ - def __init__(self, dim, init_value=1., trainable=True): + def __init__(self, dim, init_value=1.0, trainable=True): super().__init__() - self.scale = nn.Parameter( - init_value * torch.ones(dim), requires_grad=trainable) + self.scale = nn.Parameter(init_value * torch.ones(dim), requires_grad=trainable) def forward(self, x): """Forward function.""" @@ -642,18 +607,20 @@ class GAUEncoder(BaseModule): `_ """ - def __init__(self, - in_token_dims, - out_token_dims, - expansion_factor=2, - s=128, - eps=1e-5, - dropout_rate=0., - drop_path=0., - act_fn='SiLU', - bias=False, - pos_enc: str = 'none', - spatial_dim: int = 1): + def __init__( + self, + in_token_dims, + out_token_dims, + expansion_factor=2, + s=128, + eps=1e-5, + dropout_rate=0.0, + drop_path=0.0, + act_fn="SiLU", + bias=False, + pos_enc: str = "none", + spatial_dim: int = 1, + ): super(GAUEncoder, self).__init__() self.s = s @@ -661,8 +628,7 @@ class GAUEncoder(BaseModule): self.pos_enc = pos_enc self.in_token_dims = in_token_dims self.spatial_dim = spatial_dim - self.drop_path = DropPath(drop_path) \ - if drop_path > 0. 
else nn.Identity() + self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity() self.e = int(in_token_dims * expansion_factor) self.o = nn.Linear(self.e, out_token_dims, bias=bias) @@ -673,9 +639,8 @@ class GAUEncoder(BaseModule): nn.init.xavier_uniform_(self.uv.weight) - if act_fn == 'SiLU': - assert digit_version(TORCH_VERSION) >= digit_version('1.7.0'), \ - 'SiLU activation requires PyTorch version >= 1.7' + if act_fn == "SiLU": + assert digit_version(TORCH_VERSION) >= digit_version("1.7.0"), "SiLU activation requires PyTorch version >= 1.7" self.act_fn = nn.SiLU(True) else: @@ -690,12 +655,11 @@ class GAUEncoder(BaseModule): self.sqrt_s = math.sqrt(s) self.dropout_rate = dropout_rate - if dropout_rate > 0.: + if dropout_rate > 0.0: self.dropout = nn.Dropout(dropout_rate) def _build_layers(self): - self.uv = nn.Linear( - self.in_token_dims, 2 * self.e + self.s, bias=self.bias) + self.uv = nn.Linear(self.in_token_dims, 2 * self.e + self.s, bias=self.bias) self.gamma = nn.Parameter(torch.rand((2, self.s))) self.beta = nn.Parameter(torch.rand((2, self.s))) @@ -712,20 +676,17 @@ class GAUEncoder(BaseModule): u, v, base = torch.split(uv, [self.e, self.e, self.s], dim=-1) # [B, K, 1, s] * [1, 1, 2, s] + [2, s] -> [B, K, 2, s] dim = base.ndim - self.gamma.ndim + 1 - gamma = self.gamma.view(*((1, ) * dim), *self.gamma.size()) - beta = self.beta.view(*((1, ) * dim), *self.beta.size()) + gamma = self.gamma.view(*((1,) * dim), *self.gamma.size()) + beta = self.beta.view(*((1,) * dim), *self.beta.size()) base = base.unsqueeze(-2) * gamma + beta # [B, K, 2, s] -> [B, K, s], [B, K, s] q, k = torch.unbind(base, dim=-2) - if self.pos_enc == 'rope': - q = SinePositionalEncoding.apply_rotary_pos_enc( - q, pos_enc, self.spatial_dim) - k = SinePositionalEncoding.apply_rotary_pos_enc( - k, pos_enc, self.spatial_dim) - elif self.pos_enc == 'add': - pos_enc = pos_enc.reshape(*((1, ) * (q.ndim - 2)), q.size(-2), - q.size(-1)) + if self.pos_enc == "rope": + q = SinePositionalEncoding.apply_rotary_pos_enc(q, pos_enc, self.spatial_dim) + k = SinePositionalEncoding.apply_rotary_pos_enc(k, pos_enc, self.spatial_dim) + elif self.pos_enc == "add": + pos_enc = pos_enc.reshape(*((1,) * (q.ndim - 2)), q.size(-2), q.size(-1)) q = q + pos_enc k = k + pos_enc @@ -739,7 +700,7 @@ class GAUEncoder(BaseModule): if mask is not None: kernel = kernel * mask - if self.dropout_rate > 0.: + if self.dropout_rate > 0.0: kernel = self.dropout(kernel) # [B, K, K] x [B, K, e] -> [B, K, e] @@ -771,11 +732,7 @@ class DetrTransformerEncoder(BaseModule): the initialization. Defaults to None. 
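
Both `RTMCCBlock` and `GAUEncoder` use the gated attention unit from "Transformer Quality in Linear Time": one shared projection yields `u`, `v`, and a base split into `q`/`k` by per-branch scale and offset, and the attention kernel is a squared ReLU instead of a softmax. The core computation (relative bias and gamma/beta omitted), as a sketch:

```python
import math
import torch
import torch.nn.functional as F

def gau_kernel(q, k, v, u, s=128):
    """Gated attention: squared-ReLU kernel, then gating by u."""
    qk = torch.bmm(q, k.permute(0, 2, 1))             # (B, K, K)
    kernel = torch.square(F.relu(qk / math.sqrt(s)))  # no softmax
    return u * torch.bmm(kernel, v)                   # (B, K, e)
```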
""" - def __init__(self, - num_layers: int, - layer_cfg: ConfigType, - num_cp: int = -1, - init_cfg: OptConfigType = None) -> None: + def __init__(self, num_layers: int, layer_cfg: ConfigType, num_cp: int = -1, init_cfg: OptConfigType = None) -> None: super().__init__(init_cfg=init_cfg) self.num_layers = num_layers @@ -786,24 +743,21 @@ class DetrTransformerEncoder(BaseModule): def _init_layers(self) -> None: """Initialize encoder layers.""" - self.layers = ModuleList([ - DetrTransformerEncoderLayer(**self.layer_cfg) - for _ in range(self.num_layers) - ]) + self.layers = ModuleList([DetrTransformerEncoderLayer(**self.layer_cfg) for _ in range(self.num_layers)]) if self.num_cp > 0: if checkpoint_wrapper is None: raise NotImplementedError( - 'If you want to reduce GPU memory usage, \ + "If you want to reduce GPU memory usage, \ please install fairscale by executing the \ - following command: pip install fairscale.') + following command: pip install fairscale." + ) for i in range(self.num_cp): self.layers[i] = checkpoint_wrapper(self.layers[i]) self.embed_dims = self.layers[0].embed_dims - def forward(self, query: Tensor, query_pos: Tensor, - key_padding_mask: Tensor, **kwargs) -> Tensor: + def forward(self, query: Tensor, query_pos: Tensor, key_padding_mask: Tensor, **kwargs) -> Tensor: """Forward function of encoder. Args: @@ -837,27 +791,27 @@ class DetrTransformerEncoderLayer(BaseModule): the initialization. Defaults to None. """ - def __init__(self, - self_attn_cfg: OptConfigType = dict( - embed_dims=256, num_heads=8, dropout=0.0), - ffn_cfg: OptConfigType = dict( - embed_dims=256, - feedforward_channels=1024, - num_fcs=2, - ffn_drop=0., - act_cfg=dict(type='ReLU', inplace=True)), - norm_cfg: OptConfigType = dict(type='LN'), - init_cfg: OptConfigType = None) -> None: + def __init__( + self, + self_attn_cfg: OptConfigType = dict(embed_dims=256, num_heads=8, dropout=0.0), + ffn_cfg: OptConfigType = dict( + embed_dims=256, feedforward_channels=1024, num_fcs=2, ffn_drop=0.0, act_cfg=dict(type="ReLU", inplace=True) + ), + norm_cfg: OptConfigType = dict(type="LN"), + init_cfg: OptConfigType = None, + ) -> None: super().__init__(init_cfg=init_cfg) self.self_attn_cfg = self_attn_cfg - if 'batch_first' not in self.self_attn_cfg: - self.self_attn_cfg['batch_first'] = True + if "batch_first" not in self.self_attn_cfg: + self.self_attn_cfg["batch_first"] = True else: - assert self.self_attn_cfg['batch_first'] is True, 'First \ + assert ( + self.self_attn_cfg["batch_first"] is True + ), "First \ dimension of all DETRs in mmdet is `batch`, \ - please set `batch_first` flag.' + please set `batch_first` flag." self.ffn_cfg = ffn_cfg self.norm_cfg = norm_cfg @@ -868,14 +822,10 @@ class DetrTransformerEncoderLayer(BaseModule): self.self_attn = MultiheadAttention(**self.self_attn_cfg) self.embed_dims = self.self_attn.embed_dims self.ffn = FFN(**self.ffn_cfg) - norms_list = [ - build_norm_layer(self.norm_cfg, self.embed_dims)[1] - for _ in range(2) - ] + norms_list = [build_norm_layer(self.norm_cfg, self.embed_dims)[1] for _ in range(2)] self.norms = ModuleList(norms_list) - def forward(self, query: Tensor, query_pos: Tensor, - key_padding_mask: Tensor, **kwargs) -> Tensor: + def forward(self, query: Tensor, query_pos: Tensor, key_padding_mask: Tensor, **kwargs) -> Tensor: """Forward function of an encoder layer. Args: @@ -888,13 +838,8 @@ class DetrTransformerEncoderLayer(BaseModule): Tensor: forwarded results, has shape (bs, num_queries, dim). 
""" query = self.self_attn( - query=query, - key=query, - value=query, - query_pos=query_pos, - key_pos=query_pos, - key_padding_mask=key_padding_mask, - **kwargs) + query=query, key=query, value=query, query_pos=query_pos, key_pos=query_pos, key_padding_mask=key_padding_mask, **kwargs + ) query = self.norms[0](query) query = self.ffn(query) query = self.norms[1](query) diff --git a/mmpose/models/utils/tta.py b/mmpose/models/utils/tta.py index 41d2f2fd47986797aeef3b688ac519e15de1a674..5d8025dcfb14be27c5df7564492b23f28a172fc8 100644 --- a/mmpose/models/utils/tta.py +++ b/mmpose/models/utils/tta.py @@ -6,10 +6,7 @@ import torch.nn.functional as F from torch import Tensor -def flip_heatmaps(heatmaps: Tensor, - flip_indices: Optional[List[int]] = None, - flip_mode: str = 'heatmap', - shift_heatmap: bool = True): +def flip_heatmaps(heatmaps: Tensor, flip_indices: Optional[List[int]] = None, flip_mode: str = "heatmap", shift_heatmap: bool = True): """Flip heatmaps for test-time augmentation. Args: @@ -33,12 +30,12 @@ def flip_heatmaps(heatmaps: Tensor, Tensor: flipped heatmaps in shape [B, C, H, W] """ - if flip_mode == 'heatmap': + if flip_mode == "heatmap": heatmaps = heatmaps.flip(-1) if flip_indices is not None: assert len(flip_indices) == heatmaps.shape[1] heatmaps = heatmaps[:, flip_indices] - elif flip_mode == 'udp_combined': + elif flip_mode == "udp_combined": B, C, H, W = heatmaps.shape heatmaps = heatmaps.view(B, C // 3, 3, H, W) heatmaps = heatmaps.flip(-1) @@ -48,7 +45,7 @@ def flip_heatmaps(heatmaps: Tensor, heatmaps[:, :, 1] = -heatmaps[:, :, 1] heatmaps = heatmaps.view(B, C, H, W) - elif flip_mode == 'offset': + elif flip_mode == "offset": B, C, H, W = heatmaps.shape heatmaps = heatmaps.view(B, C // 2, -1, H, W) heatmaps = heatmaps.flip(-1) @@ -80,16 +77,14 @@ def flip_vectors(x_labels: Tensor, y_labels: Tensor, flip_indices: List[int]): keypoint """ assert x_labels.ndim == 3 and y_labels.ndim == 3 - assert len(flip_indices) == x_labels.shape[1] and len( - flip_indices) == y_labels.shape[1] + assert len(flip_indices) == x_labels.shape[1] and len(flip_indices) == y_labels.shape[1] x_labels = x_labels[:, flip_indices].flip(-1) y_labels = y_labels[:, flip_indices] return x_labels, y_labels -def flip_coordinates(coords: Tensor, flip_indices: List[int], - shift_coords: bool, input_size: Tuple[int, int]): +def flip_coordinates(coords: Tensor, flip_indices: List[int], shift_coords: bool, input_size: Tuple[int, int]): """Flip normalized coordinates for test-time augmentation. Args: @@ -129,10 +124,7 @@ def flip_visibility(vis: Tensor, flip_indices: List[int]): return vis -def aggregate_heatmaps(heatmaps: List[Tensor], - size: Optional[Tuple[int, int]], - align_corners: bool = False, - mode: str = 'average'): +def aggregate_heatmaps(heatmaps: List[Tensor], size: Optional[Tuple[int, int]], align_corners: bool = False, mode: str = "average"): """Aggregate multiple heatmaps. 
Args: @@ -151,8 +143,8 @@ def aggregate_heatmaps(heatmaps: List[Tensor], - ``'concat'``: Concate the heatmaps at the channel dim """ - if mode not in {'average', 'concat'}: - raise ValueError(f'Invalid aggregation mode `{mode}`') + if mode not in {"average", "concat"}: + raise ValueError(f"Invalid aggregation mode `{mode}`") if size is None: h, w = heatmaps[0].shape[2:4] @@ -161,21 +153,17 @@ def aggregate_heatmaps(heatmaps: List[Tensor], for i, _heatmaps in enumerate(heatmaps): assert _heatmaps.ndim == 4 - if mode == 'average': + if mode == "average": assert _heatmaps.shape[:2] == heatmaps[0].shape[:2] else: assert _heatmaps.shape[0] == heatmaps[0].shape[0] if _heatmaps.shape[2:4] != (h, w): - heatmaps[i] = F.interpolate( - _heatmaps, - size=(h, w), - mode='bilinear', - align_corners=align_corners) + heatmaps[i] = F.interpolate(_heatmaps, size=(h, w), mode="bilinear", align_corners=align_corners) - if mode == 'average': + if mode == "average": output = sum(heatmaps).div(len(heatmaps)) - elif mode == 'concat': + elif mode == "concat": output = torch.cat(heatmaps, dim=1) else: raise ValueError() diff --git a/mmpose/registry.py b/mmpose/registry.py index 84903eaf2deab3711b1ff87cf93fc11fcda88730..f3d6bc86691168843eba8eab44139d5a2056ffab 100644 --- a/mmpose/registry.py +++ b/mmpose/registry.py @@ -7,130 +7,87 @@ More details can be found at https://mmengine.readthedocs.io/en/latest/tutorials/registry.html. """ -from mmengine.registry import DATA_SAMPLERS as MMENGINE_DATA_SAMPLERS -from mmengine.registry import DATASETS as MMENGINE_DATASETS -from mmengine.registry import EVALUATOR as MMENGINE_EVALUATOR -from mmengine.registry import HOOKS as MMENGINE_HOOKS -from mmengine.registry import INFERENCERS as MMENGINE_INFERENCERS -from mmengine.registry import LOG_PROCESSORS as MMENGINE_LOG_PROCESSORS -from mmengine.registry import LOOPS as MMENGINE_LOOPS -from mmengine.registry import METRICS as MMENGINE_METRICS -from mmengine.registry import MODEL_WRAPPERS as MMENGINE_MODEL_WRAPPERS -from mmengine.registry import MODELS as MMENGINE_MODELS -from mmengine.registry import \ - OPTIM_WRAPPER_CONSTRUCTORS as MMENGINE_OPTIM_WRAPPER_CONSTRUCTORS -from mmengine.registry import OPTIM_WRAPPERS as MMENGINE_OPTIM_WRAPPERS -from mmengine.registry import OPTIMIZERS as MMENGINE_OPTIMIZERS -from mmengine.registry import PARAM_SCHEDULERS as MMENGINE_PARAM_SCHEDULERS -from mmengine.registry import \ - RUNNER_CONSTRUCTORS as MMENGINE_RUNNER_CONSTRUCTORS -from mmengine.registry import RUNNERS as MMENGINE_RUNNERS -from mmengine.registry import TASK_UTILS as MMENGINE_TASK_UTILS -from mmengine.registry import TRANSFORMS as MMENGINE_TRANSFORMS -from mmengine.registry import VISBACKENDS as MMENGINE_VISBACKENDS -from mmengine.registry import VISUALIZERS as MMENGINE_VISUALIZERS -from mmengine.registry import \ - WEIGHT_INITIALIZERS as MMENGINE_WEIGHT_INITIALIZERS -from mmengine.registry import Registry +from mmengine.registry import ( + DATA_SAMPLERS as MMENGINE_DATA_SAMPLERS, + DATASETS as MMENGINE_DATASETS, + EVALUATOR as MMENGINE_EVALUATOR, + HOOKS as MMENGINE_HOOKS, + INFERENCERS as MMENGINE_INFERENCERS, + LOG_PROCESSORS as MMENGINE_LOG_PROCESSORS, + LOOPS as MMENGINE_LOOPS, + METRICS as MMENGINE_METRICS, + MODEL_WRAPPERS as MMENGINE_MODEL_WRAPPERS, + MODELS as MMENGINE_MODELS, + OPTIM_WRAPPER_CONSTRUCTORS as MMENGINE_OPTIM_WRAPPER_CONSTRUCTORS, + OPTIM_WRAPPERS as MMENGINE_OPTIM_WRAPPERS, + OPTIMIZERS as MMENGINE_OPTIMIZERS, + PARAM_SCHEDULERS as MMENGINE_PARAM_SCHEDULERS, + RUNNER_CONSTRUCTORS as 
MMENGINE_RUNNER_CONSTRUCTORS, + RUNNERS as MMENGINE_RUNNERS, + TASK_UTILS as MMENGINE_TASK_UTILS, + TRANSFORMS as MMENGINE_TRANSFORMS, + VISBACKENDS as MMENGINE_VISBACKENDS, + VISUALIZERS as MMENGINE_VISUALIZERS, + WEIGHT_INITIALIZERS as MMENGINE_WEIGHT_INITIALIZERS, + Registry, +) # Registries For Runner and the related # manage all kinds of runners like `EpochBasedRunner` and `IterBasedRunner` -RUNNERS = Registry('runner', parent=MMENGINE_RUNNERS) +RUNNERS = Registry("runner", parent=MMENGINE_RUNNERS) # manage runner constructors that define how to initialize runners -RUNNER_CONSTRUCTORS = Registry( - 'runner constructor', parent=MMENGINE_RUNNER_CONSTRUCTORS) +RUNNER_CONSTRUCTORS = Registry("runner constructor", parent=MMENGINE_RUNNER_CONSTRUCTORS) # manage all kinds of loops like `EpochBasedTrainLoop` -LOOPS = Registry('loop', parent=MMENGINE_LOOPS) +LOOPS = Registry("loop", parent=MMENGINE_LOOPS) # manage all kinds of hooks like `CheckpointHook` -HOOKS = Registry( - 'hook', parent=MMENGINE_HOOKS, locations=['mmpose.engine.hooks']) +HOOKS = Registry("hook", parent=MMENGINE_HOOKS, locations=["mmpose.engine.hooks"]) # Registries For Data and the related # manage data-related modules -DATASETS = Registry( - 'dataset', parent=MMENGINE_DATASETS, locations=['mmpose.datasets']) -DATA_SAMPLERS = Registry( - 'data sampler', - parent=MMENGINE_DATA_SAMPLERS, - locations=['mmpose.datasets.samplers']) -TRANSFORMS = Registry( - 'transform', - parent=MMENGINE_TRANSFORMS, - locations=['mmpose.datasets.transforms']) +DATASETS = Registry("dataset", parent=MMENGINE_DATASETS, locations=["mmpose.datasets"]) +DATA_SAMPLERS = Registry("data sampler", parent=MMENGINE_DATA_SAMPLERS, locations=["mmpose.datasets.samplers"]) +TRANSFORMS = Registry("transform", parent=MMENGINE_TRANSFORMS, locations=["mmpose.datasets.transforms"]) # manage all kinds of modules inheriting `nn.Module` -MODELS = Registry('model', parent=MMENGINE_MODELS, locations=['mmpose.models']) +MODELS = Registry("model", parent=MMENGINE_MODELS, locations=["mmpose.models"]) # manage all kinds of model wrappers like 'MMDistributedDataParallel' -MODEL_WRAPPERS = Registry( - 'model_wrapper', - parent=MMENGINE_MODEL_WRAPPERS, - locations=['mmpose.models']) +MODEL_WRAPPERS = Registry("model_wrapper", parent=MMENGINE_MODEL_WRAPPERS, locations=["mmpose.models"]) # manage all kinds of weight initialization modules like `Uniform` -WEIGHT_INITIALIZERS = Registry( - 'weight initializer', - parent=MMENGINE_WEIGHT_INITIALIZERS, - locations=['mmpose.models']) +WEIGHT_INITIALIZERS = Registry("weight initializer", parent=MMENGINE_WEIGHT_INITIALIZERS, locations=["mmpose.models"]) # manage all kinds of batch augmentations like Mixup and CutMix. -BATCH_AUGMENTS = Registry('batch augment', locations=['mmpose.models']) +BATCH_AUGMENTS = Registry("batch augment", locations=["mmpose.models"]) # Registries For Optimizer and the related # manage all kinds of optimizers like `SGD` and `Adam` -OPTIMIZERS = Registry( - 'optimizer', parent=MMENGINE_OPTIMIZERS, locations=['mmpose.engine']) +OPTIMIZERS = Registry("optimizer", parent=MMENGINE_OPTIMIZERS, locations=["mmpose.engine"]) # manage optimizer wrapper -OPTIM_WRAPPERS = Registry( - 'optimizer_wrapper', - parent=MMENGINE_OPTIM_WRAPPERS, - locations=['mmpose.engine']) +OPTIM_WRAPPERS = Registry("optimizer_wrapper", parent=MMENGINE_OPTIM_WRAPPERS, locations=["mmpose.engine"]) # manage constructors that customize the optimization hyperparameters. 
OPTIM_WRAPPER_CONSTRUCTORS = Registry( - 'optimizer wrapper constructor', - parent=MMENGINE_OPTIM_WRAPPER_CONSTRUCTORS, - locations=['mmpose.engine.optim_wrappers']) + "optimizer wrapper constructor", parent=MMENGINE_OPTIM_WRAPPER_CONSTRUCTORS, locations=["mmpose.engine.optim_wrappers"] +) # manage all kinds of parameter schedulers like `MultiStepLR` -PARAM_SCHEDULERS = Registry( - 'parameter scheduler', - parent=MMENGINE_PARAM_SCHEDULERS, - locations=['mmpose.engine.schedulers']) +PARAM_SCHEDULERS = Registry("parameter scheduler", parent=MMENGINE_PARAM_SCHEDULERS, locations=["mmpose.engine.schedulers"]) # manage all kinds of metrics -METRICS = Registry( - 'metric', parent=MMENGINE_METRICS, locations=['mmpose.evaluation.metrics']) +METRICS = Registry("metric", parent=MMENGINE_METRICS, locations=["mmpose.evaluation.metrics"]) # manage all kinds of evaluators -EVALUATORS = Registry( - 'evaluator', - parent=MMENGINE_EVALUATOR, - locations=['mmpose.evaluation.evaluators']) +EVALUATORS = Registry("evaluator", parent=MMENGINE_EVALUATOR, locations=["mmpose.evaluation.evaluators"]) # manage task-specific modules like anchor generators and box coders -TASK_UTILS = Registry( - 'task util', - parent=MMENGINE_TASK_UTILS, - locations=['mmpose.models.task_modules']) +TASK_UTILS = Registry("task util", parent=MMENGINE_TASK_UTILS, locations=["mmpose.models.task_modules"]) # Registries For Visualizer and the related # manage visualizer -VISUALIZERS = Registry( - 'visualizer', - parent=MMENGINE_VISUALIZERS, - locations=['mmpose.visualization']) +VISUALIZERS = Registry("visualizer", parent=MMENGINE_VISUALIZERS, locations=["mmpose.visualization"]) # manage visualizer backend -VISBACKENDS = Registry( - 'vis_backend', - parent=MMENGINE_VISBACKENDS, - locations=['mmpose.visualization']) +VISBACKENDS = Registry("vis_backend", parent=MMENGINE_VISBACKENDS, locations=["mmpose.visualization"]) # manage all kinds log processors -LOG_PROCESSORS = Registry( - 'log processor', - parent=MMENGINE_LOG_PROCESSORS, - locations=['mmpose.visualization']) +LOG_PROCESSORS = Registry("log processor", parent=MMENGINE_LOG_PROCESSORS, locations=["mmpose.visualization"]) # manager keypoint encoder/decoder -KEYPOINT_CODECS = Registry('KEYPOINT_CODECS', locations=['mmpose.codecs']) +KEYPOINT_CODECS = Registry("KEYPOINT_CODECS", locations=["mmpose.codecs"]) # manage inferencer -INFERENCERS = Registry( - 'inferencer', - parent=MMENGINE_INFERENCERS, - locations=['mmpose.apis.inferencers']) +INFERENCERS = Registry("inferencer", parent=MMENGINE_INFERENCERS, locations=["mmpose.apis.inferencers"]) diff --git a/mmpose/structures/__init__.py b/mmpose/structures/__init__.py index 56a7dd725e06943c45a047b4da2f2ddd386ab16f..f8f1def9233139e95ba50b02246a93a5e27a8036 100644 --- a/mmpose/structures/__init__.py +++ b/mmpose/structures/__init__.py @@ -1,19 +1,45 @@ # Copyright (c) OpenMMLab. All rights reserved. 
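For orientation, here is a minimal sketch of how the registries defined above are consumed downstream, following standard mmengine conventions (`ToyHead` is a hypothetical module used only for illustration, not part of this patch):

```python
import torch.nn as nn

from mmpose.registry import MODELS


@MODELS.register_module()
class ToyHead(nn.Module):
    """Hypothetical head used only to illustrate the registry pattern."""

    def __init__(self, in_channels: int = 256, num_keypoints: int = 17):
        super().__init__()
        self.fc = nn.Linear(in_channels, num_keypoints)

    def forward(self, x):
        return self.fc(x)


# Configs reference registered classes by name; the registry instantiates them.
head = MODELS.build(dict(type="ToyHead", in_channels=256))
```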
-from .bbox import (bbox_clip_border, bbox_corner2xyxy, bbox_cs2xywh, - bbox_cs2xyxy, bbox_xywh2cs, bbox_xywh2xyxy, - bbox_xyxy2corner, bbox_xyxy2cs, bbox_xyxy2xywh, flip_bbox, - get_pers_warp_matrix, get_udp_warp_matrix, get_warp_matrix) -from .keypoint import flip_keypoints, keypoint_clip_border, find_min_padding_exact, fix_bbox_aspect_ratio +from .bbox import ( + bbox_clip_border, + bbox_corner2xyxy, + bbox_cs2xywh, + bbox_cs2xyxy, + bbox_xywh2cs, + bbox_xywh2xyxy, + bbox_xyxy2corner, + bbox_xyxy2cs, + bbox_xyxy2xywh, + flip_bbox, + get_pers_warp_matrix, + get_udp_warp_matrix, + get_warp_matrix, +) +from .keypoint import find_min_padding_exact, fix_bbox_aspect_ratio, flip_keypoints, keypoint_clip_border from .multilevel_pixel_data import MultilevelPixelData from .pose_data_sample import PoseDataSample from .utils import merge_data_samples, revert_heatmap, split_instances __all__ = [ - 'PoseDataSample', 'MultilevelPixelData', 'bbox_cs2xywh', 'bbox_cs2xyxy', - 'bbox_xywh2cs', 'bbox_xywh2xyxy', 'bbox_xyxy2cs', 'bbox_xyxy2xywh', - 'flip_bbox', 'get_udp_warp_matrix', 'get_warp_matrix', 'flip_keypoints', - 'merge_data_samples', 'revert_heatmap', 'split_instances', - 'keypoint_clip_border', 'bbox_clip_border', 'bbox_xyxy2corner', - 'bbox_corner2xyxy', 'get_pers_warp_matrix', - 'find_min_padding_exact', 'fix_bbox_aspect_ratio' + "PoseDataSample", + "MultilevelPixelData", + "bbox_cs2xywh", + "bbox_cs2xyxy", + "bbox_xywh2cs", + "bbox_xywh2xyxy", + "bbox_xyxy2cs", + "bbox_xyxy2xywh", + "flip_bbox", + "get_udp_warp_matrix", + "get_warp_matrix", + "flip_keypoints", + "merge_data_samples", + "revert_heatmap", + "split_instances", + "keypoint_clip_border", + "bbox_clip_border", + "bbox_xyxy2corner", + "bbox_corner2xyxy", + "get_pers_warp_matrix", + "find_min_padding_exact", + "fix_bbox_aspect_ratio", ] diff --git a/mmpose/structures/bbox/__init__.py b/mmpose/structures/bbox/__init__.py index abd3d5f2d9534842327cfd3a5b8a4fb225fda68d..4bb2749e979e6a4d585caf2e26c33f75c29439eb 100644 --- a/mmpose/structures/bbox/__init__.py +++ b/mmpose/structures/bbox/__init__.py @@ -1,14 +1,34 @@ # Copyright (c) OpenMMLab. All rights reserved. 
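The bbox helpers re-exported here round-trip between corner and center/scale formats; a minimal sketch assuming the padding semantics shown in `transforms.py` below (values are illustrative):

```python
import numpy as np

from mmpose.structures.bbox import bbox_cs2xyxy, bbox_xyxy2cs

bbox = np.array([10.0, 20.0, 110.0, 220.0])  # (x1, y1, x2, y2)
center, scale = bbox_xyxy2cs(bbox, padding=1.25)  # scale = (w, h) * padding
restored = bbox_cs2xyxy(center, scale, padding=1.25)
assert np.allclose(restored, bbox)  # the conversion round-trips
```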
from .bbox_overlaps import bbox_overlaps -from .transforms import (bbox_clip_border, bbox_corner2xyxy, bbox_cs2xywh, - bbox_cs2xyxy, bbox_xywh2cs, bbox_xywh2xyxy, - bbox_xyxy2corner, bbox_xyxy2cs, bbox_xyxy2xywh, - flip_bbox, get_pers_warp_matrix, get_udp_warp_matrix, - get_warp_matrix) +from .transforms import ( + bbox_clip_border, + bbox_corner2xyxy, + bbox_cs2xywh, + bbox_cs2xyxy, + bbox_xywh2cs, + bbox_xywh2xyxy, + bbox_xyxy2corner, + bbox_xyxy2cs, + bbox_xyxy2xywh, + flip_bbox, + get_pers_warp_matrix, + get_udp_warp_matrix, + get_warp_matrix, +) __all__ = [ - 'bbox_cs2xywh', 'bbox_cs2xyxy', 'bbox_xywh2cs', 'bbox_xywh2xyxy', - 'bbox_xyxy2cs', 'bbox_xyxy2xywh', 'flip_bbox', 'get_udp_warp_matrix', - 'get_warp_matrix', 'bbox_overlaps', 'bbox_clip_border', 'bbox_xyxy2corner', - 'bbox_corner2xyxy', 'get_pers_warp_matrix' + "bbox_cs2xywh", + "bbox_cs2xyxy", + "bbox_xywh2cs", + "bbox_xywh2xyxy", + "bbox_xyxy2cs", + "bbox_xyxy2xywh", + "flip_bbox", + "get_udp_warp_matrix", + "get_warp_matrix", + "bbox_overlaps", + "bbox_clip_border", + "bbox_xyxy2corner", + "bbox_corner2xyxy", + "get_pers_warp_matrix", ] diff --git a/mmpose/structures/bbox/bbox_overlaps.py b/mmpose/structures/bbox/bbox_overlaps.py index 682008c3378833dff45f99b41a00b56ad0d24710..c27441c478ba3b7ff04cfa870a2c4c496864702c 100644 --- a/mmpose/structures/bbox/bbox_overlaps.py +++ b/mmpose/structures/bbox/bbox_overlaps.py @@ -8,11 +8,7 @@ def fp16_clamp(x, min_val=None, max_val=None): return x.clamp(min_val, max_val) -def bbox_overlaps(bboxes1, - bboxes2, - mode='iou', - is_aligned=False, - eps=1e-6) -> torch.Tensor: +def bbox_overlaps(bboxes1, bboxes2, mode="iou", is_aligned=False, eps=1e-6) -> torch.Tensor: """Calculate overlap between two sets of bounding boxes. Args: @@ -47,9 +43,9 @@ def bbox_overlaps(bboxes1, >>> overlaps = bbox_overlaps(bboxes1, bboxes2, is_aligned=True) >>> assert overlaps.shape == (3, ) """ - assert mode in ['iou', 'iof', 'giou'], f'Unsupported mode {mode}' - assert (bboxes1.size(-1) == 4 or bboxes1.size(0) == 0) - assert (bboxes2.size(-1) == 4 or bboxes2.size(0) == 0) + assert mode in ["iou", "iof", "giou"], f"Unsupported mode {mode}" + assert bboxes1.size(-1) == 4 or bboxes1.size(0) == 0 + assert bboxes2.size(-1) == 4 or bboxes2.size(0) == 0 if bboxes1.ndim == 1: bboxes1 = bboxes1.unsqueeze(0) @@ -66,14 +62,12 @@ def bbox_overlaps(bboxes1, if rows * cols == 0: if is_aligned: - return bboxes1.new(batch_shape + (rows, )) + return bboxes1.new(batch_shape + (rows,)) else: return bboxes1.new(batch_shape + (rows, cols)) - area1 = (bboxes1[..., 2] - bboxes1[..., 0]) * ( - bboxes1[..., 3] - bboxes1[..., 1]) - area2 = (bboxes2[..., 2] - bboxes2[..., 0]) * ( - bboxes2[..., 3] - bboxes2[..., 1]) + area1 = (bboxes1[..., 2] - bboxes1[..., 0]) * (bboxes1[..., 3] - bboxes1[..., 1]) + area2 = (bboxes2[..., 2] - bboxes2[..., 0]) * (bboxes2[..., 3] - bboxes2[..., 1]) if is_aligned: lt = torch.max(bboxes1[..., :2], bboxes2[..., :2]) @@ -81,11 +75,11 @@ def bbox_overlaps(bboxes1, wh = fp16_clamp(rb - lt, min_val=0) overlap = wh[..., 0] * wh[..., 1] - if mode in ['iou', 'giou']: + if mode in ["iou", "giou"]: union = area1 + area2 - overlap else: union = area1 - if mode == 'giou': + if mode == "giou": enclosed_lt = torch.min(bboxes1[..., :2], bboxes2[..., :2]) enclosed_rb = torch.max(bboxes1[..., 2:], bboxes2[..., 2:]) else: @@ -94,22 +88,20 @@ def bbox_overlaps(bboxes1, wh = fp16_clamp(rb - lt, min_val=0) overlap = wh[..., 0] * wh[..., 1] - if mode in ['iou', 'giou']: + if mode in ["iou", "giou"]: union = area1[..., None] + 
area2[..., None, :] - overlap else: union = area1[..., None] - if mode == 'giou': - enclosed_lt = torch.min(bboxes1[..., :, None, :2], - bboxes2[..., None, :, :2]) - enclosed_rb = torch.max(bboxes1[..., :, None, 2:], - bboxes2[..., None, :, 2:]) + if mode == "giou": + enclosed_lt = torch.min(bboxes1[..., :, None, :2], bboxes2[..., None, :, :2]) + enclosed_rb = torch.max(bboxes1[..., :, None, 2:], bboxes2[..., None, :, 2:]) eps_tensor = union.new_tensor([eps]) union = torch.max(union, eps_tensor) ious = overlap / union - if mode in ['iou', 'iof']: + if mode in ["iou", "iof"]: return ious - elif mode == 'giou': + elif mode == "giou": enclose_wh = fp16_clamp(enclosed_rb - enclosed_lt, min_val=0) enclose_area = enclose_wh[..., 0] * enclose_wh[..., 1] enclose_area = torch.max(enclose_area, eps_tensor) diff --git a/mmpose/structures/bbox/transforms.py b/mmpose/structures/bbox/transforms.py index 88db311c274a92ca36bdc085fe43d694d50c0cf1..30524779017e3be4190e0189979a82ebacfd5c89 100644 --- a/mmpose/structures/bbox/transforms.py +++ b/mmpose/structures/bbox/transforms.py @@ -41,8 +41,7 @@ def bbox_xywh2xyxy(bbox_xywh: np.ndarray) -> np.ndarray: return bbox_xyxy -def bbox_xyxy2cs(bbox: np.ndarray, - padding: float = 1.) -> Tuple[np.ndarray, np.ndarray]: +def bbox_xyxy2cs(bbox: np.ndarray, padding: float = 1.0) -> Tuple[np.ndarray, np.ndarray]: """Transform the bbox format from (x,y,w,h) into (center, scale) Args: @@ -73,8 +72,7 @@ def bbox_xyxy2cs(bbox: np.ndarray, return center, scale -def bbox_xywh2cs(bbox: np.ndarray, - padding: float = 1.) -> Tuple[np.ndarray, np.ndarray]: +def bbox_xywh2cs(bbox: np.ndarray, padding: float = 1.0) -> Tuple[np.ndarray, np.ndarray]: """Transform the bbox format from (x,y,w,h) into (center, scale) Args: @@ -107,9 +105,7 @@ def bbox_xywh2cs(bbox: np.ndarray, return center, scale -def bbox_cs2xyxy(center: np.ndarray, - scale: np.ndarray, - padding: float = 1.) -> np.ndarray: +def bbox_cs2xyxy(center: np.ndarray, scale: np.ndarray, padding: float = 1.0) -> np.ndarray: """Transform the bbox format from (center, scale) to (x1,y1,x2,y2). Args: @@ -139,9 +135,7 @@ def bbox_cs2xyxy(center: np.ndarray, return bbox -def bbox_cs2xywh(center: np.ndarray, - scale: np.ndarray, - padding: float = 1.) -> np.ndarray: +def bbox_cs2xywh(center: np.ndarray, scale: np.ndarray, padding: float = 1.0) -> np.ndarray: """Transform the bbox format from (center, scale) to (x,y,w,h). Args: @@ -268,10 +262,7 @@ def bbox_clip_border(bbox: np.ndarray, shape: Tuple[int, int]) -> np.ndarray: return bbox -def flip_bbox(bbox: np.ndarray, - image_size: Tuple[int, int], - bbox_format: str = 'xywh', - direction: str = 'horizontal') -> np.ndarray: +def flip_bbox(bbox: np.ndarray, image_size: Tuple[int, int], bbox_format: str = "xywh", direction: str = "horizontal") -> np.ndarray: """Flip the bbox in the given direction. Args: @@ -287,37 +278,32 @@ def flip_bbox(bbox: np.ndarray, Returns: np.ndarray: The flipped bounding boxes. """ - direction_options = {'horizontal', 'vertical', 'diagonal'} - assert direction in direction_options, ( - f'Invalid flipping direction "{direction}". ' - f'Options are {direction_options}') + direction_options = {"horizontal", "vertical", "diagonal"} + assert direction in direction_options, f'Invalid flipping direction "{direction}". ' f"Options are {direction_options}" - format_options = {'xywh', 'xyxy', 'center'} - assert bbox_format in format_options, ( - f'Invalid bbox format "{bbox_format}". 
' - f'Options are {format_options}') + format_options = {"xywh", "xyxy", "center"} + assert bbox_format in format_options, f'Invalid bbox format "{bbox_format}". ' f"Options are {format_options}" bbox_flipped = bbox.copy() w, h = image_size # TODO: consider using "integer corner" coordinate system - if direction == 'horizontal': - if bbox_format == 'xywh' or bbox_format == 'center': + if direction == "horizontal": + if bbox_format == "xywh" or bbox_format == "center": bbox_flipped[..., 0] = w - bbox[..., 0] - 1 - elif bbox_format == 'xyxy': + elif bbox_format == "xyxy": bbox_flipped[..., ::2] = w - bbox[..., -2::-2] - 1 - elif direction == 'vertical': - if bbox_format == 'xywh' or bbox_format == 'center': + elif direction == "vertical": + if bbox_format == "xywh" or bbox_format == "center": bbox_flipped[..., 1] = h - bbox[..., 1] - 1 - elif bbox_format == 'xyxy': + elif bbox_format == "xyxy": bbox_flipped[..., 1::2] = h - bbox[..., ::-2] - 1 - elif direction == 'diagonal': - if bbox_format == 'xywh' or bbox_format == 'center': + elif direction == "diagonal": + if bbox_format == "xywh" or bbox_format == "center": bbox_flipped[..., :2] = [w, h] - bbox[..., :2] - 1 - elif bbox_format == 'xyxy': + elif bbox_format == "xyxy": bbox_flipped[...] = [w, h, w, h] - bbox - 1 - bbox_flipped = np.concatenate( - (bbox_flipped[..., 2:], bbox_flipped[..., :2]), axis=-1) + bbox_flipped = np.concatenate((bbox_flipped[..., 2:], bbox_flipped[..., :2]), axis=-1) return bbox_flipped @@ -358,14 +344,10 @@ def get_udp_warp_matrix( scale_y = (output_size[1] - 1) / scale[1] warp_mat[0, 0] = math.cos(rot_rad) * scale_x warp_mat[0, 1] = -math.sin(rot_rad) * scale_x - warp_mat[0, 2] = scale_x * (-0.5 * input_size[0] * math.cos(rot_rad) + - 0.5 * input_size[1] * math.sin(rot_rad) + - 0.5 * scale[0]) + warp_mat[0, 2] = scale_x * (-0.5 * input_size[0] * math.cos(rot_rad) + 0.5 * input_size[1] * math.sin(rot_rad) + 0.5 * scale[0]) warp_mat[1, 0] = math.sin(rot_rad) * scale_y warp_mat[1, 1] = math.cos(rot_rad) * scale_y - warp_mat[1, 2] = scale_y * (-0.5 * input_size[0] * math.sin(rot_rad) - - 0.5 * input_size[1] * math.cos(rot_rad) + - 0.5 * scale[1]) + warp_mat[1, 2] = scale_y * (-0.5 * input_size[0] * math.sin(rot_rad) - 0.5 * input_size[1] * math.cos(rot_rad) + 0.5 * scale[1]) return warp_mat @@ -374,7 +356,7 @@ def get_warp_matrix( scale: np.ndarray, rot: float, output_size: Tuple[int, int], - shift: Tuple[float, float] = (0., 0.), + shift: Tuple[float, float] = (0.0, 0.0), inv: bool = False, fix_aspect_ratio: bool = True, ) -> np.ndarray: @@ -408,8 +390,8 @@ def get_warp_matrix( dst_w, dst_h = output_size[:2] rot_rad = np.deg2rad(rot) - src_dir = _rotate_point(np.array([src_w * -0.5, 0.]), rot_rad) - dst_dir = np.array([dst_w * -0.5, 0.]) + src_dir = _rotate_point(np.array([src_w * -0.5, 0.0]), rot_rad) + dst_dir = np.array([dst_w * -0.5, 0.0]) src = np.zeros((3, 2), dtype=np.float32) src[0, :] = center + scale * shift @@ -423,8 +405,8 @@ def get_warp_matrix( src[2, :] = _get_3rd_point(src[0, :], src[1, :]) dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :]) else: - src_dir_2 = _rotate_point(np.array([0., src_h * -0.5]), rot_rad) - dst_dir_2 = np.array([0., dst_h * -0.5]) + src_dir_2 = _rotate_point(np.array([0.0, src_h * -0.5]), rot_rad) + dst_dir_2 = np.array([0.0, dst_h * -0.5]) src[2, :] = center + src_dir_2 + scale * shift dst[2, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir_2 @@ -435,9 +417,7 @@ def get_warp_matrix( return warp_mat -def get_pers_warp_matrix(center: np.ndarray, translate: np.ndarray, - scale: 
float, rot: float, - shear: np.ndarray) -> np.ndarray: +def get_pers_warp_matrix(center: np.ndarray, translate: np.ndarray, scale: float, rot: float, shear: np.ndarray) -> np.ndarray: """Compute a perspective warp matrix based on specified transformations. Args: @@ -459,33 +439,22 @@ def get_pers_warp_matrix(center: np.ndarray, translate: np.ndarray, >>> warp_matrix = get_pers_warp_matrix(center, translate, scale, rot, shear) """ - translate_mat = np.array([[1, 0, translate[0] + center[0]], - [0, 1, translate[1] + center[1]], [0, 0, 1]], - dtype=np.float32) + translate_mat = np.array([[1, 0, translate[0] + center[0]], [0, 1, translate[1] + center[1]], [0, 0, 1]], dtype=np.float32) shear_x = math.radians(shear[0]) shear_y = math.radians(shear[1]) - shear_mat = np.array([[1, np.tan(shear_x), 0], [np.tan(shear_y), 1, 0], - [0, 0, 1]], - dtype=np.float32) + shear_mat = np.array([[1, np.tan(shear_x), 0], [np.tan(shear_y), 1, 0], [0, 0, 1]], dtype=np.float32) rotate_angle = math.radians(rot) - rotate_mat = np.array([[np.cos(rotate_angle), -np.sin(rotate_angle), 0], - [np.sin(rotate_angle), - np.cos(rotate_angle), 0], [0, 0, 1]], - dtype=np.float32) - - scale_mat = np.array([[scale, 0, 0], [0, scale, 0], [0, 0, 1]], - dtype=np.float32) - - recover_center_mat = np.array([[1, 0, -center[0]], [0, 1, -center[1]], - [0, 0, 1]], - dtype=np.float32) - - warp_matrix = np.dot( - np.dot( - np.dot(np.dot(translate_mat, shear_mat), rotate_mat), scale_mat), - recover_center_mat) + rotate_mat = np.array( + [[np.cos(rotate_angle), -np.sin(rotate_angle), 0], [np.sin(rotate_angle), np.cos(rotate_angle), 0], [0, 0, 1]], dtype=np.float32 + ) + + scale_mat = np.array([[scale, 0, 0], [0, scale, 0], [0, 0, 1]], dtype=np.float32) + + recover_center_mat = np.array([[1, 0, -center[0]], [0, 1, -center[1]], [0, 0, 1]], dtype=np.float32) + + warp_matrix = np.dot(np.dot(np.dot(np.dot(translate_mat, shear_mat), rotate_mat), scale_mat), recover_center_mat) return warp_matrix diff --git a/mmpose/structures/keypoint/__init__.py b/mmpose/structures/keypoint/__init__.py index e662a9287a1cb0ad3098a1b0187f7512aafa281e..f27ee39e2b4bb4e07e2e1bf4c633b8b4ffd38920 100644 --- a/mmpose/structures/keypoint/__init__.py +++ b/mmpose/structures/keypoint/__init__.py @@ -1,10 +1,6 @@ # Copyright (c) OpenMMLab. All rights reserved. -from .transforms import (flip_keypoints, flip_keypoints_custom_center, - keypoint_clip_border) -from .keypoints_min_padding import fix_bbox_aspect_ratio, find_min_padding_exact +from .keypoints_min_padding import find_min_padding_exact, fix_bbox_aspect_ratio +from .transforms import flip_keypoints, flip_keypoints_custom_center, keypoint_clip_border -__all__ = [ - 'flip_keypoints', 'flip_keypoints_custom_center', 'keypoint_clip_border', - 'fix_bbox_aspect_ratio', 'find_min_padding_exact' -] +__all__ = ["flip_keypoints", "flip_keypoints_custom_center", "keypoint_clip_border", "fix_bbox_aspect_ratio", "find_min_padding_exact"] diff --git a/mmpose/structures/keypoint/keypoints_min_padding.py b/mmpose/structures/keypoint/keypoints_min_padding.py index df93646716de71c5c86c1b10bd0be80cc938e66b..f799a9a70fe432f458ae1a306937b77ce22b54d7 100644 --- a/mmpose/structures/keypoint/keypoints_min_padding.py +++ b/mmpose/structures/keypoint/keypoints_min_padding.py @@ -1,8 +1,23 @@ +# Copyright (c) Miroslav Purkrabek, ProbPose. All rights reserved. 
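The new `keypoints_min_padding.py` module below introduces two helpers; a hedged usage sketch (values are illustrative, and the exact padding semantics are defined by the functions that follow):

```python
import numpy as np

from mmpose.structures import find_min_padding_exact, fix_bbox_aspect_ratio

bbox = np.array([0.0, 0.0, 100.0, 100.0])       # xywh
kpts = np.array([[50.0, 50.0], [120.0, 40.0]])  # the second point lies outside the box

# Per-keypoint padding factors needed to bring each point inside the
# aspect-ratio-corrected box (keypoints flagged invisible come back as -1).
padding = find_min_padding_exact(bbox, kpts, aspect_ratio=3 / 4, bbox_format="xywh")

# Independently, fix the box to a 3:4 aspect ratio with the default 1.25 padding.
padded_bbox = fix_bbox_aspect_ratio(bbox, aspect_ratio=3 / 4, padding=1.25, bbox_format="xywh")
```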
import numpy as np -def find_min_padding_exact(bbox, kpts, aspect_ratio=3/4, bbox_format='xywh'): - '''Find the minimum padding to make keypoint inside bbox''' - assert bbox_format.lower() in ['xywh', 'xyxy'], f"Invalid bbox format {bbox_format}. Only 'xyxy' or 'xywh' are supported." + +def find_min_padding_exact(bbox, kpts, aspect_ratio=3 / 4, bbox_format="xywh"): + """ + Find the minimum padding factor to make all keypoints lie inside the bbox. + + Args: + bbox (np.ndarray): Bounding box in the specified format. + kpts (np.ndarray): Keypoints array of shape (K, 2) or (K, 3). + aspect_ratio (float or None): Target aspect ratio applied before padding check. + Pass None to skip aspect-ratio adjustment. + bbox_format (str): Bounding box format, either 'xywh' or 'xyxy'. + + Returns: + np.ndarray: Per-keypoint padding factors (shape (K,)). Values <= 0 indicate + invisible keypoints (visibility <= 0). + """ + assert bbox_format.lower() in ["xywh", "xyxy"], f"Invalid bbox format {bbox_format}. Only 'xyxy' or 'xywh' are supported." if kpts.size % 2 == 0: kpts = kpts.reshape(-1, 2) @@ -12,32 +27,44 @@ def find_min_padding_exact(bbox, kpts, aspect_ratio=3/4, bbox_format='xywh'): vis = kpts[:, 2].flatten() kpts = kpts[:, :2] else: - raise ValueError('Keypoints should have 2 or 3 values each') - - if bbox_format.lower() == 'xyxy': - bbox = np.array([ - bbox[0], - bbox[1], - bbox[2] - bbox[0], - bbox[3] - bbox[1], - ]) + raise ValueError("Keypoints should have 2 or 3 values each") + + if bbox_format.lower() == "xyxy": + bbox = np.array( + [ + bbox[0], + bbox[1], + bbox[2] - bbox[0], + bbox[3] - bbox[1], + ] + ) if aspect_ratio is not None: # Fix the aspect ratio of the bounding box - bbox = fix_bbox_aspect_ratio(bbox, aspect_ratio=aspect_ratio, padding=1.0, bbox_format='xywh') - + bbox = fix_bbox_aspect_ratio(bbox, aspect_ratio=aspect_ratio, padding=1.0, bbox_format="xywh") + x0, y0, w, h = np.hsplit(bbox, [1, 2, 3]) x1 = x0 + w y1 = y0 + h - x_bbox_distances = np.max(np.stack([ - np.clip(x0 - kpts[:, 0], a_min=0, a_max=None), - np.clip(kpts[:, 0] - x1, a_min=0, a_max=None), - ]), axis=0) - y_bbox_distances = np.max(np.stack([ - np.clip(y0 - kpts[:, 1], a_min=0, a_max=None), - np.clip(kpts[:, 1] - y1, a_min=0, a_max=None), - ]), axis=0) + x_bbox_distances = np.max( + np.stack( + [ + np.clip(x0 - kpts[:, 0], a_min=0, a_max=None), + np.clip(kpts[:, 0] - x1, a_min=0, a_max=None), + ] + ), + axis=0, + ) + y_bbox_distances = np.max( + np.stack( + [ + np.clip(y0 - kpts[:, 1], a_min=0, a_max=None), + np.clip(kpts[:, 1] - y1, a_min=0, a_max=None), + ] + ), + axis=0, + ) padding_x = 2 * x_bbox_distances / w padding_y = 2 * y_bbox_distances / h @@ -45,37 +72,52 @@ def find_min_padding_exact(bbox, kpts, aspect_ratio=3/4, bbox_format='xywh'): padding = np.array(padding).flatten() padding[vis <= 0] = -1.0 - + return padding -def fix_bbox_aspect_ratio(bbox, aspect_ratio=3/4, padding=1.25, bbox_format='xywh'): - assert bbox_format.lower() in ['xywh', 'xyxy'], f"Invalid bbox format {bbox_format}. Only 'xyxy' or 'xywh' are supported." + +def fix_bbox_aspect_ratio(bbox, aspect_ratio=3 / 4, padding=1.25, bbox_format="xywh"): + """ + Adjust bounding box dimensions to match a target aspect ratio with optional padding. + + Args: + bbox (np.ndarray): Bounding box or array of boxes. + aspect_ratio (float): Target width/height ratio. Defaults to 3/4. + padding (float): Scale factor applied to the adjusted box. Defaults to 1.25. + bbox_format (str): Input format, either 'xywh' or 'xyxy'. Defaults to 'xywh'. 
+ + Returns: + np.ndarray: Adjusted bounding box(es) in the same format as input. + """ + assert bbox_format.lower() in ["xywh", "xyxy"], f"Invalid bbox format {bbox_format}. Only 'xyxy' or 'xywh' are supported." in_shape = bbox.shape bbox = bbox.reshape((-1, 4)) - if bbox_format.lower() == 'xywh': - bbox_xyxy = np.array([ - bbox[:, 0], - bbox[:, 1], - bbox[:, 0] + bbox[:, 2], - bbox[:, 1] + bbox[:, 3], - ]).T + if bbox_format.lower() == "xywh": + bbox_xyxy = np.array( + [ + bbox[:, 0], + bbox[:, 1], + bbox[:, 0] + bbox[:, 2], + bbox[:, 1] + bbox[:, 3], + ] + ).T else: bbox_xyxy = np.array(bbox) - + centers = bbox_xyxy[:, :2] + (bbox_xyxy[:, 2:] - bbox_xyxy[:, :2]) / 2 widths = bbox_xyxy[:, 2] - bbox_xyxy[:, 0] heights = bbox_xyxy[:, 3] - bbox_xyxy[:, 1] - + new_widths = widths.copy().astype(np.float32) new_heights = heights.copy().astype(np.float32) for i in range(bbox_xyxy.shape[0]): if widths[i] == 0: - widths[i] =+ 1 + widths[i] = 1 if heights[i] == 0: - heights[i] =+ 1 + heights[i] = 1 if widths[i] / heights[i] > aspect_ratio: new_heights[i] = widths[i] / aspect_ratio @@ -84,24 +126,27 @@ def fix_bbox_aspect_ratio(bbox, aspect_ratio=3/4, padding=1.25, bbox_format='xyw
Note: @@ -38,27 +39,24 @@ def flip_keypoints(keypoints: np.ndarray, """ ndim = keypoints.ndim - assert keypoints.shape[:-1] == keypoints_visible.shape[:ndim - 1], ( - f'Mismatched shapes of keypoints {keypoints.shape} and ' - f'keypoints_visible {keypoints_visible.shape}') + assert keypoints.shape[:-1] == keypoints_visible.shape[: ndim - 1], ( + f"Mismatched shapes of keypoints {keypoints.shape} and " f"keypoints_visible {keypoints_visible.shape}" + ) - direction_options = {'horizontal', 'vertical', 'diagonal'} - assert direction in direction_options, ( - f'Invalid flipping direction "{direction}". ' - f'Options are {direction_options}') + direction_options = {"horizontal", "vertical", "diagonal"} + assert direction in direction_options, f'Invalid flipping direction "{direction}". ' f"Options are {direction_options}" # swap the symmetric keypoint pairs - if direction == 'horizontal' or direction == 'vertical': + if direction == "horizontal" or direction == "vertical": keypoints = keypoints.take(flip_indices, axis=ndim - 2) if keypoints_visible is not None: - keypoints_visible = keypoints_visible.take( - flip_indices, axis=ndim - 2) + keypoints_visible = keypoints_visible.take(flip_indices, axis=ndim - 2) # flip the keypoints w, h = image_size - if direction == 'horizontal': + if direction == "horizontal": keypoints[..., 0] = w - 1 - keypoints[..., 0] - elif direction == 'vertical': + elif direction == "vertical": keypoints[..., 1] = h - 1 - keypoints[..., 1] else: keypoints = [w, h] - keypoints - 1 @@ -66,12 +64,14 @@ def flip_keypoints(keypoints: np.ndarray, return keypoints, keypoints_visible -def flip_keypoints_custom_center(keypoints: np.ndarray, - keypoints_visible: np.ndarray, - flip_indices: List[int], - center_mode: str = 'static', - center_x: float = 0.5, - center_index: Union[int, List] = 0): +def flip_keypoints_custom_center( + keypoints: np.ndarray, + keypoints_visible: np.ndarray, + flip_indices: List[int], + center_mode: str = "static", + center_x: float = 0.5, + center_index: Union[int, List] = 0, +): """Flip human joints horizontally. Note: @@ -99,17 +99,15 @@ def flip_keypoints_custom_center(keypoints: np.ndarray, np.ndarray([..., K, C]): Flipped joints. 
""" - assert keypoints.ndim >= 2, f'Invalid pose shape {keypoints.shape}' + assert keypoints.ndim >= 2, f"Invalid pose shape {keypoints.shape}" - allowed_center_mode = {'static', 'root'} - assert center_mode in allowed_center_mode, 'Get invalid center_mode ' \ - f'{center_mode}, allowed choices are {allowed_center_mode}' + allowed_center_mode = {"static", "root"} + assert center_mode in allowed_center_mode, "Get invalid center_mode " f"{center_mode}, allowed choices are {allowed_center_mode}" - if center_mode == 'static': + if center_mode == "static": x_c = center_x - elif center_mode == 'root': - center_index = [center_index] if isinstance(center_index, int) else \ - center_index + elif center_mode == "root": + center_index = [center_index] if isinstance(center_index, int) else center_index assert keypoints.shape[-2] > max(center_index) x_c = keypoints[..., center_index, 0].mean(axis=-1) @@ -125,9 +123,7 @@ def flip_keypoints_custom_center(keypoints: np.ndarray, return keypoints_flipped, keypoints_visible_flipped -def keypoint_clip_border(keypoints: np.ndarray, keypoints_visible: np.ndarray, - shape: Tuple[int, - int]) -> Tuple[np.ndarray, np.ndarray]: +def keypoint_clip_border(keypoints: np.ndarray, keypoints_visible: np.ndarray, shape: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]: """Set the visibility values for keypoints outside the image border. Args: @@ -143,8 +139,7 @@ def keypoint_clip_border(keypoints: np.ndarray, keypoints_visible: np.ndarray, width, height = shape[:2] # Create a mask for keypoints outside the frame - outside_mask = ((keypoints[..., 0] > width) | (keypoints[..., 0] < 0) | - (keypoints[..., 1] > height) | (keypoints[..., 1] < 0)) + outside_mask = (keypoints[..., 0] > width) | (keypoints[..., 0] < 0) | (keypoints[..., 1] > height) | (keypoints[..., 1] < 0) # Update visibility values for keypoints outside the frame if keypoints_visible.ndim == 2: diff --git a/mmpose/structures/multilevel_pixel_data.py b/mmpose/structures/multilevel_pixel_data.py index bea191e7297c233cc129f2da09ab5a4c6793fa0f..041f4d145a07a8db6970f896423baaed713c8a6b 100644 --- a/mmpose/structures/multilevel_pixel_data.py +++ b/mmpose/structures/multilevel_pixel_data.py @@ -7,9 +7,7 @@ import torch from mmengine.structures import BaseDataElement, PixelData from mmengine.utils import is_list_of -IndexType = Union[str, slice, int, list, torch.LongTensor, - torch.cuda.LongTensor, torch.BoolTensor, - torch.cuda.BoolTensor, np.ndarray] +IndexType = Union[str, slice, int, list, torch.LongTensor, torch.cuda.LongTensor, torch.BoolTensor, torch.cuda.BoolTensor, np.ndarray] class MultilevelPixelData(BaseDataElement): @@ -51,7 +49,7 @@ class MultilevelPixelData(BaseDataElement): """ def __init__(self, *, metainfo: Optional[dict] = None, **kwargs) -> None: - object.__setattr__(self, '_nlevel', None) + object.__setattr__(self, "_nlevel", None) super().__init__(metainfo=metainfo, **kwargs) @property @@ -64,13 +62,11 @@ class MultilevelPixelData(BaseDataElement): """ return self._nlevel - def __getitem__(self, item: Union[int, str, list, - slice]) -> Union[PixelData, Sequence]: + def __getitem__(self, item: Union[int, str, list, slice]) -> Union[PixelData, Sequence]: if isinstance(item, int): if self.nlevel is None or item >= self.nlevel: - raise IndexError( - f'Lcale index {item} out of range ({self.nlevel})') - return self.get(f'_level_{item}') + raise IndexError(f"Lcale index {item} out of range ({self.nlevel})") + return self.get(f"_level_{item}") if isinstance(item, str): if item not in self: @@ -78,9 
+74,7 @@ class MultilevelPixelData(BaseDataElement): return getattr(self, item) # TODO: support indexing by list and slice over levels - raise NotImplementedError( - f'{self.__class__.__name__} does not support index type ' - f'{type(item)}') + raise NotImplementedError(f"{self.__class__.__name__} does not support index type " f"{type(item)}") def levels(self) -> List[PixelData]: if self.nlevel: @@ -108,66 +102,51 @@ class MultilevelPixelData(BaseDataElement): data (dict): A dict contains annotations of image or model predictions. """ - assert isinstance(data, - dict), f'meta should be a `dict` but got {data}' + assert isinstance(data, dict), f"meta should be a `dict` but got {data}" for k, v in data.items(): - self.set_field(v, k, field_type='data') + self.set_field(v, k, field_type="data") - def set_field(self, - value: Any, - name: str, - dtype: Optional[Union[Type, Tuple[Type, ...]]] = None, - field_type: str = 'data') -> None: + def set_field(self, value: Any, name: str, dtype: Optional[Union[Type, Tuple[Type, ...]]] = None, field_type: str = "data") -> None: """Special method for set union field, used as property.setter functions.""" - assert field_type in ['metainfo', 'data'] + assert field_type in ["metainfo", "data"] if dtype is not None: - assert isinstance( - value, - dtype), f'{value} should be a {dtype} but got {type(value)}' + assert isinstance(value, dtype), f"{value} should be a {dtype} but got {type(value)}" - if name.startswith('_level_'): - raise AttributeError( - f'Cannot set {name} to be a field because the pattern ' - '<_level_{n}> is reserved for inner data field') + if name.startswith("_level_"): + raise AttributeError(f"Cannot set {name} to be a field because the pattern " "<_level_{n}> is reserved for inner data field") - if field_type == 'metainfo': + if field_type == "metainfo": if name in self._data_fields: - raise AttributeError( - f'Cannot set {name} to be a field of metainfo ' - f'because {name} is already a data field') + raise AttributeError(f"Cannot set {name} to be a field of metainfo " f"because {name} is already a data field") self._metainfo_fields.add(name) else: if name in self._metainfo_fields: - raise AttributeError( - f'Cannot set {name} to be a field of data ' - f'because {name} is already a metainfo field') + raise AttributeError(f"Cannot set {name} to be a field of data " f"because {name} is already a metainfo field") if not isinstance(value, abc.Sequence): - raise TypeError( - 'The value should be a sequence (of numpy.ndarray or' - f'torch.Tesnor), but got a {type(value)}') + raise TypeError("The value should be a sequence (of numpy.ndarray or " f"torch.Tensor), but got a {type(value)}") if len(value) == 0: - raise ValueError('Setting empty value is not allowed') + raise ValueError("Setting empty value is not allowed") if not isinstance(value[0], (torch.Tensor, np.ndarray)): raise TypeError( - 'The value should be a sequence of numpy.ndarray or' - f'torch.Tesnor, but got a sequence of {type(value[0])}') + "The value should be a sequence of numpy.ndarray or " f"torch.Tensor, but got a sequence of {type(value[0])}" + ) if self.nlevel is not None: assert len(value) == self.nlevel, ( - f'The length of the value ({len(value)}) should match the' - f'number of the levels ({self.nlevel})') + f"The length of the value ({len(value)}) should match the " f"number of the levels ({self.nlevel})" + ) else: - object.__setattr__(self, '_nlevel', len(value)) + object.__setattr__(self, "_nlevel", len(value)) for i in range(self.nlevel): - object.__setattr__(self,
f'_level_{i}', PixelData()) + object.__setattr__(self, f"_level_{i}", PixelData()) for i, v in enumerate(value): - self[i].set_field(v, name, field_type='data') + self[i].set_field(v, name, field_type="data") self._data_fields.add(name) @@ -179,9 +158,8 @@ class MultilevelPixelData(BaseDataElement): Args: item (str): The key to delete. """ - if item in ('_metainfo_fields', '_data_fields'): - raise AttributeError(f'{item} has been used as a ' - 'private attribute, which is immutable. ') + if item in ("_metainfo_fields", "_data_fields"): + raise AttributeError(f"{item} has been used as a " "private attribute, which is immutable. ") if item in self._metainfo_fields: super().__delattr__(item) @@ -191,17 +169,14 @@ class MultilevelPixelData(BaseDataElement): self._data_fields.remove(item) def __getattr__(self, name): - if name in {'_data_fields', '_metainfo_fields' - } or name not in self._data_fields: - raise AttributeError( - f'\'{self.__class__.__name__}\' object has no attribute ' - f'\'{name}\'') + if name in {"_data_fields", "_metainfo_fields"} or name not in self._data_fields: + raise AttributeError(f"'{self.__class__.__name__}' object has no attribute " f"'{name}'") return [getattr(level, name) for level in self.levels()] def pop(self, *args) -> Any: """pop property in data and metainfo as the same as python.""" - assert len(args) < 3, '``pop`` get more than 2 arguments' + assert len(args) < 3, "``pop`` get more than 2 arguments" name = args[0] if name in self._metainfo_fields: self._metainfo_fields.remove(name) @@ -217,10 +192,9 @@ class MultilevelPixelData(BaseDataElement): else: # don't just use 'self.__dict__.pop(*args)' for only popping key in # metainfo or data - raise KeyError(f'{args[0]} is not contained in metainfo or data') + raise KeyError(f"{args[0]} is not contained in metainfo or data") - def _convert(self, apply_to: Type, - func: Callable[[Any], Any]) -> 'MultilevelPixelData': + def _convert(self, apply_to: Type, func: Callable[[Any], Any]) -> "MultilevelPixelData": """Convert data items with the given function. 
Args: @@ -239,34 +213,32 @@ class MultilevelPixelData(BaseDataElement): new_data.set_data(data) return new_data - def cpu(self) -> 'MultilevelPixelData': + def cpu(self) -> "MultilevelPixelData": """Convert all tensors to CPU in data.""" return self._convert(apply_to=torch.Tensor, func=lambda x: x.cpu()) - def cuda(self) -> 'MultilevelPixelData': + def cuda(self) -> "MultilevelPixelData": """Convert all tensors to GPU in data.""" return self._convert(apply_to=torch.Tensor, func=lambda x: x.cuda()) - def detach(self) -> 'MultilevelPixelData': + def detach(self) -> "MultilevelPixelData": """Detach all tensors in data.""" return self._convert(apply_to=torch.Tensor, func=lambda x: x.detach()) - def numpy(self) -> 'MultilevelPixelData': + def numpy(self) -> "MultilevelPixelData": """Convert all tensor to np.narray in data.""" - return self._convert( - apply_to=torch.Tensor, func=lambda x: x.detach().cpu().numpy()) + return self._convert(apply_to=torch.Tensor, func=lambda x: x.detach().cpu().numpy()) - def to_tensor(self) -> 'MultilevelPixelData': + def to_tensor(self) -> "MultilevelPixelData": """Convert all tensor to np.narray in data.""" - return self._convert( - apply_to=np.ndarray, func=lambda x: torch.from_numpy(x)) + return self._convert(apply_to=np.ndarray, func=lambda x: torch.from_numpy(x)) # Tensor-like methods - def to(self, *args, **kwargs) -> 'MultilevelPixelData': + def to(self, *args, **kwargs) -> "MultilevelPixelData": """Apply same name function to all tensors in data_fields.""" new_data = self.new() for k, v in self.items(): - if hasattr(v[0], 'to'): + if hasattr(v[0], "to"): v = [v_.to(*args, **kwargs) for v_ in v] data = {k: v} new_data.set_data(data) diff --git a/mmpose/structures/pose_data_sample.py b/mmpose/structures/pose_data_sample.py index 53f6e8990e96206844b18b4983e416a534fd5afc..d8636409ea2c44d7ad17ea941f2f84592191bfc3 100644 --- a/mmpose/structures/pose_data_sample.py +++ b/mmpose/structures/pose_data_sample.py @@ -49,7 +49,7 @@ class PoseDataSample(BaseDataElement): @gt_instances.setter def gt_instances(self, value: InstanceData): - self.set_field(value, '_gt_instances', dtype=InstanceData) + self.set_field(value, "_gt_instances", dtype=InstanceData) @gt_instances.deleter def gt_instances(self): @@ -61,7 +61,7 @@ class PoseDataSample(BaseDataElement): @gt_instance_labels.setter def gt_instance_labels(self, value: InstanceData): - self.set_field(value, '_gt_instance_labels', dtype=InstanceData) + self.set_field(value, "_gt_instance_labels", dtype=InstanceData) @gt_instance_labels.deleter def gt_instance_labels(self): @@ -73,7 +73,7 @@ class PoseDataSample(BaseDataElement): @pred_instances.setter def pred_instances(self, value: InstanceData): - self.set_field(value, '_pred_instances', dtype=InstanceData) + self.set_field(value, "_pred_instances", dtype=InstanceData) @pred_instances.deleter def pred_instances(self): @@ -85,7 +85,7 @@ class PoseDataSample(BaseDataElement): @gt_fields.setter def gt_fields(self, value: Union[PixelData, MultilevelPixelData]): - self.set_field(value, '_gt_fields', dtype=type(value)) + self.set_field(value, "_gt_fields", dtype=type(value)) @gt_fields.deleter def gt_fields(self): @@ -97,7 +97,7 @@ class PoseDataSample(BaseDataElement): @pred_fields.setter def pred_fields(self, value: PixelData): - self.set_field(value, '_pred_heatmaps', dtype=PixelData) + self.set_field(value, "_pred_heatmaps", dtype=PixelData) @pred_fields.deleter def pred_fields(self): diff --git a/mmpose/structures/utils.py b/mmpose/structures/utils.py index 
11abd9ee6c0627093d29212fceb645575b6f022f..a28ddc5c78b16b118fddc8bcc474e5459190107d 100644 --- a/mmpose/structures/utils.py +++ b/mmpose/structures/utils.py @@ -7,12 +7,11 @@ import numpy as np import torch from mmengine.structures import InstanceData, PixelData from mmengine.utils import is_list_of +from pycocotools import mask as Mask from .bbox.transforms import get_warp_matrix from .pose_data_sample import PoseDataSample -from pycocotools import mask as Mask - def merge_data_samples(data_samples: List[PoseDataSample]) -> PoseDataSample: """Merge the given data samples into a single data sample. @@ -31,29 +30,23 @@ def merge_data_samples(data_samples: List[PoseDataSample]) -> PoseDataSample: """ if not is_list_of(data_samples, PoseDataSample): - raise ValueError('Invalid input type, should be a list of ' - ':obj:`PoseDataSample`') + raise ValueError("Invalid input type, should be a list of " ":obj:`PoseDataSample`") if len(data_samples) == 0: - warnings.warn('Try to merge an empty list of data samples.') + warnings.warn("Try to merge an empty list of data samples.") return PoseDataSample() merged = PoseDataSample(metainfo=data_samples[0].metainfo) - if 'gt_instances' in data_samples[0]: - merged.gt_instances = InstanceData.cat( - [d.gt_instances for d in data_samples]) + if "gt_instances" in data_samples[0]: + merged.gt_instances = InstanceData.cat([d.gt_instances for d in data_samples]) - if 'pred_instances' in data_samples[0]: - merged.pred_instances = InstanceData.cat( - [d.pred_instances for d in data_samples]) + if "pred_instances" in data_samples[0]: + merged.pred_instances = InstanceData.cat([d.pred_instances for d in data_samples]) - if 'pred_fields' in data_samples[0] and 'heatmaps' in data_samples[ - 0].pred_fields: + if "pred_fields" in data_samples[0] and "heatmaps" in data_samples[0].pred_fields: reverted_heatmaps = [ - revert_heatmap(data_sample.pred_fields.heatmaps, - data_sample.input_center, data_sample.input_scale, - data_sample.ori_shape) + revert_heatmap(data_sample.pred_fields.heatmaps, data_sample.input_center, data_sample.input_scale, data_sample.ori_shape) for data_sample in data_samples ] @@ -62,12 +55,9 @@ def merge_data_samples(data_samples: List[PoseDataSample]) -> PoseDataSample: pred_fields.set_data(dict(heatmaps=merged_heatmaps)) merged.pred_fields = pred_fields - if 'gt_fields' in data_samples[0] and 'heatmaps' in data_samples[ - 0].gt_fields: + if "gt_fields" in data_samples[0] and "heatmaps" in data_samples[0].gt_fields: reverted_heatmaps = [ - revert_heatmap(data_sample.gt_fields.heatmaps, - data_sample.input_center, data_sample.input_scale, - data_sample.ori_shape) + revert_heatmap(data_sample.gt_fields.heatmaps, data_sample.input_center, data_sample.input_scale, data_sample.ori_shape) for data_sample in data_samples ] @@ -98,15 +88,9 @@ def revert_heatmap(heatmap, input_center, input_scale, img_shape): hm_h, hm_w = heatmap.shape[:2] img_h, img_w = img_shape - warp_mat = get_warp_matrix( - input_center.reshape((2, )), - input_scale.reshape((2, )), - rot=0, - output_size=(hm_w, hm_h), - inv=True) + warp_mat = get_warp_matrix(input_center.reshape((2,)), input_scale.reshape((2,)), rot=0, output_size=(hm_w, hm_h), inv=True) - heatmap = cv2.warpAffine( - heatmap, warp_mat, (img_w, img_h), flags=cv2.INTER_LINEAR) + heatmap = cv2.warpAffine(heatmap, warp_mat, (img_w, img_h), flags=cv2.INTER_LINEAR) # [H, W, K] -> [K, H, W] if ndim == 3: @@ -129,11 +113,11 @@ def split_instances(instances: InstanceData) -> List[InstanceData]: 
keypoints=instances.keypoints[i].tolist(), keypoint_scores=instances.keypoint_scores[i].tolist(), ) - if 'bboxes' in instances: - result['bbox'] = instances.bboxes[i].flatten().tolist() - if 'bbox_scores' in instances: - result['bbox_score'] = instances.bbox_scores[i] - if 'masks' in instances: + if "bboxes" in instances: + result["bbox"] = instances.bboxes[i].flatten().tolist() + if "bbox_scores" in instances: + result["bbox_score"] = instances.bbox_scores[i] + if "masks" in instances: # Convert mask from binary to COCO polygon format mask = instances.masks[i].astype(np.uint8) contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE) @@ -143,7 +127,7 @@ def split_instances(instances: InstanceData) -> List[InstanceData]: # Valid polygons have >= 6 coordinates (3 points) if contour.size >= 6: segmentation.append(contour.flatten().tolist()) - result['segmentation'] = segmentation + result["segmentation"] = segmentation results.append(result) return results diff --git a/mmpose/testing/__init__.py b/mmpose/testing/__init__.py index 5612dac6c66e3bf7c2bad86154ac62c9d5e9529a..728eb83807b9cf24019940dc1a3779322ecbb314 100644 --- a/mmpose/testing/__init__.py +++ b/mmpose/testing/__init__.py @@ -1,8 +1,4 @@ # Copyright (c) OpenMMLab. All rights reserved. -from ._utils import (get_coco_sample, get_config_file, get_packed_inputs, - get_pose_estimator_cfg, get_repo_dir) +from ._utils import get_coco_sample, get_config_file, get_packed_inputs, get_pose_estimator_cfg, get_repo_dir -__all__ = [ 'get_packed_inputs', 'get_coco_sample', 'get_config_file', 'get_pose_estimator_cfg', 'get_repo_dir' ] +__all__ = ["get_packed_inputs", "get_coco_sample", "get_config_file", "get_pose_estimator_cfg", "get_repo_dir"] diff --git a/mmpose/testing/_utils.py b/mmpose/testing/_utils.py index 2a2dd023484cf1a3ed54e8115b3694f9dd8cb9c8..df20d733c7bbbe2a84f9fef4dab877b9667913fa 100644 --- a/mmpose/testing/_utils.py +++ b/mmpose/testing/_utils.py @@ -14,13 +14,14 @@ from mmpose.structures.bbox import bbox_xyxy2cs def get_coco_sample( - img_shape=(240, 320), - img_fill: Optional[int] = None, - num_instances=1, - with_bbox_cs=True, - with_img_mask=False, - random_keypoints_visible=False, - non_occlusion=False): + img_shape=(240, 320), + img_fill: Optional[int] = None, + num_instances=1, + with_bbox_cs=True, + with_img_mask=False, + random_keypoints_visible=False, + non_occlusion=False, +): """Create a dummy data sample in COCO style.""" rng = np.random.RandomState(0) h, w = img_shape @@ -38,57 +39,72 @@ def get_coco_sample( keypoints = _rand_keypoints(rng, bbox, 17) if random_keypoints_visible: - keypoints_visible = np.random.randint(0, 2, (num_instances, - 17)).astype(np.float32) + keypoints_visible = np.random.randint(0, 2, (num_instances, 17)).astype(np.float32) else: keypoints_visible = np.full((num_instances, 17), 1, dtype=np.float32) upper_body_ids = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] lower_body_ids = [11, 12, 13, 14, 15, 16] - flip_pairs = [[2, 1], [1, 2], [4, 3], [3, 4], [6, 5], [5, 6], [8, 7], - [7, 8], [10, 9], [9, 10], [12, 11], [11, 12], [14, 13], - [13, 14], [16, 15], [15, 16]] + flip_pairs = [ + [2, 1], + [1, 2], + [4, 3], + [3, 4], + [6, 5], + [5, 6], + [8, 7], + [7, 8], + [10, 9], + [9, 10], + [12, 11], + [11, 12], + [14, 13], + [13, 14], + [16, 15], + [15, 16], + ] flip_indices = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15] - dataset_keypoint_weights = np.array([ - 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5, - 1.5 - ]).astype(np.float32) +
dataset_keypoint_weights = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5]).astype( + np.float32 + ) data = { - 'img': img, - 'img_shape': img_shape, - 'ori_shape': img_shape, - 'bbox': bbox, - 'keypoints': keypoints, - 'keypoints_visible': keypoints_visible, - 'upper_body_ids': upper_body_ids, - 'lower_body_ids': lower_body_ids, - 'flip_pairs': flip_pairs, - 'flip_indices': flip_indices, - 'dataset_keypoint_weights': dataset_keypoint_weights, - 'invalid_segs': [], + "img": img, + "img_shape": img_shape, + "ori_shape": img_shape, + "bbox": bbox, + "keypoints": keypoints, + "keypoints_visible": keypoints_visible, + "upper_body_ids": upper_body_ids, + "lower_body_ids": lower_body_ids, + "flip_pairs": flip_pairs, + "flip_indices": flip_indices, + "dataset_keypoint_weights": dataset_keypoint_weights, + "invalid_segs": [], } if with_bbox_cs: - data['bbox_center'], data['bbox_scale'] = bbox_xyxy2cs(data['bbox']) + data["bbox_center"], data["bbox_scale"] = bbox_xyxy2cs(data["bbox"]) if with_img_mask: - data['img_mask'] = np.random.randint(0, 2, (h, w), dtype=np.uint8) + data["img_mask"] = np.random.randint(0, 2, (h, w), dtype=np.uint8) return data -def get_packed_inputs(batch_size=2, - num_instances=1, - num_keypoints=17, - num_levels=1, - img_shape=(256, 192), - input_size=(192, 256), - heatmap_size=(48, 64), - simcc_split_ratio=2.0, - with_heatmap=True, - with_reg_label=True, - with_simcc_label=True): +def get_packed_inputs( + batch_size=2, + num_instances=1, + num_keypoints=17, + num_levels=1, + img_shape=(256, 192), + input_size=(192, 256), + heatmap_size=(48, 64), + simcc_split_ratio=2.0, + with_heatmap=True, + with_reg_label=True, + with_simcc_label=True, +): """Create a dummy batch of model inputs and data samples.""" rng = np.random.RandomState(0) @@ -99,31 +115,30 @@ def get_packed_inputs(batch_size=2, # input h, w = img_shape image = rng.randint(0, 255, size=(3, h, w), dtype=np.uint8) - inputs['inputs'] = torch.from_numpy(image) + inputs["inputs"] = torch.from_numpy(image) # attributes bboxes = _rand_bboxes(rng, num_instances, w, h) bbox_centers, bbox_scales = bbox_xyxy2cs(bboxes) keypoints = _rand_keypoints(rng, bboxes, num_keypoints) - keypoints_visible = np.ones((num_instances, num_keypoints), - dtype=np.float32) + keypoints_visible = np.ones((num_instances, num_keypoints), dtype=np.float32) # meta img_meta = { - 'id': idx, - 'img_id': idx, - 'img_path': '.png', - 'img_shape': img_shape, - 'input_size': input_size, - 'input_center': bbox_centers, - 'input_scale': bbox_scales, - 'flip': False, - 'flip_direction': None, - 'flip_indices': list(range(num_keypoints)) + "id": idx, + "img_id": idx, + "img_path": ".png", + "img_shape": img_shape, + "input_size": input_size, + "input_center": bbox_centers, + "input_scale": bbox_scales, + "flip": False, + "flip_direction": None, + "flip_indices": list(range(num_keypoints)), } - np.random.shuffle(img_meta['flip_indices']) + np.random.shuffle(img_meta["flip_indices"]) data_sample = PoseDataSample(metainfo=img_meta) # gt_instance @@ -133,32 +148,27 @@ def get_packed_inputs(batch_size=2, # [N, K] -> [N, num_levels, K] # keep the first dimension as the num_instances if num_levels > 1: - keypoint_weights = np.tile(keypoints_visible[:, None], - (1, num_levels, 1)) + keypoint_weights = np.tile(keypoints_visible[:, None], (1, num_levels, 1)) else: keypoint_weights = keypoints_visible.copy() gt_instances.bboxes = bboxes gt_instances.bbox_centers = bbox_centers gt_instances.bbox_scales = bbox_scales - 
gt_instances.bbox_scores = np.ones((num_instances, ), dtype=np.float32) + gt_instances.bbox_scores = np.ones((num_instances,), dtype=np.float32) gt_instances.keypoints = keypoints gt_instances.keypoints_visible = keypoints_visible - gt_instance_labels.keypoint_weights = torch.FloatTensor( - keypoint_weights) + gt_instance_labels.keypoint_weights = torch.FloatTensor(keypoint_weights) if with_reg_label: - gt_instance_labels.keypoint_labels = torch.FloatTensor(keypoints / - input_size) + gt_instance_labels.keypoint_labels = torch.FloatTensor(keypoints / input_size) if with_simcc_label: len_x = np.around(input_size[0] * simcc_split_ratio) len_y = np.around(input_size[1] * simcc_split_ratio) - gt_instance_labels.keypoint_x_labels = torch.FloatTensor( - _rand_simcc_label(rng, num_instances, num_keypoints, len_x)) - gt_instance_labels.keypoint_y_labels = torch.FloatTensor( - _rand_simcc_label(rng, num_instances, num_keypoints, len_y)) + gt_instance_labels.keypoint_x_labels = torch.FloatTensor(_rand_simcc_label(rng, num_instances, num_keypoints, len_x)) + gt_instance_labels.keypoint_y_labels = torch.FloatTensor(_rand_simcc_label(rng, num_instances, num_keypoints, len_y)) # gt_fields if with_heatmap: @@ -183,7 +193,7 @@ def get_packed_inputs(batch_size=2, data_sample.gt_instances = gt_instances data_sample.gt_instance_labels = gt_instance_labels - inputs['data_samples'] = data_sample + inputs["data_samples"] = data_sample inputs_list.append(inputs) packed_inputs = pseudo_collate(inputs_list) @@ -193,8 +203,7 @@ def get_packed_inputs(batch_size=2, def _rand_keypoints(rng, bboxes, num_keypoints): n = bboxes.shape[0] relative_pos = rng.rand(n, num_keypoints, 2) - keypoints = relative_pos * bboxes[:, None, :2] + ( - 1 - relative_pos) * bboxes[:, None, 2:4] + keypoints = relative_pos * bboxes[:, None, :2] + (1 - relative_pos) * bboxes[:, None, 2:4] return keypoints @@ -225,6 +234,7 @@ def get_repo_dir(): except NameError: # For IPython development when __file__ is not defined import mmpose + repo_dir = osp.dirname(osp.dirname(mmpose.__file__)) return repo_dir @@ -233,13 +243,13 @@ def get_repo_dir(): def get_config_file(fn: str): """Return full path of a config file from the given relative path.""" repo_dir = get_repo_dir() - if fn.startswith('configs'): + if fn.startswith("configs"): fn_config = osp.join(repo_dir, fn) else: - fn_config = osp.join(repo_dir, 'configs', fn) + fn_config = osp.join(repo_dir, "configs", fn) if not osp.isfile(fn_config): - raise FileNotFoundError(f'Cannot find config file {fn_config}') + raise FileNotFoundError(f"Cannot find config file {fn_config}") return fn_config diff --git a/mmpose/utils/__init__.py b/mmpose/utils/__init__.py index fb9c018ed0c48c56d3ef17a3783c15f37f0292a4..6c39f08f6bee07245454b5904668141ed07bf99f 100644 --- a/mmpose/utils/__init__.py +++ b/mmpose/utils/__init__.py @@ -8,7 +8,13 @@ from .setup_env import register_all_modules, setup_multi_processes from .timer import StopWatch __all__ = [ - 'get_root_logger', 'collect_env', 'StopWatch', 'setup_multi_processes', - 'register_all_modules', 'SimpleCamera', 'SimpleCameraTorch', - 'adapt_mmdet_pipeline', 'reduce_mean' + "get_root_logger", + "collect_env", + "StopWatch", + "setup_multi_processes", + "register_all_modules", + "SimpleCamera", + "SimpleCameraTorch", + "adapt_mmdet_pipeline", + "reduce_mean", ] diff --git a/mmpose/utils/camera.py b/mmpose/utils/camera.py index a7759d308f38fda99fcf56910b09251d24eccbed..8bc977fe87409afae3eb873fecd19ceb7e978ebf 100644 --- a/mmpose/utils/camera.py +++ 
b/mmpose/utils/camera.py @@ -5,7 +5,7 @@ import numpy as np import torch from mmengine.registry import Registry -CAMERAS = Registry('camera') +CAMERAS = Registry("camera") class SingleCameraBase(metaclass=ABCMeta): @@ -85,55 +85,54 @@ class SimpleCamera(SingleCameraBase): self.param = {} # extrinsic param - R = np.array(param['R'], dtype=np.float32) - T = np.array(param['T'], dtype=np.float32) + R = np.array(param["R"], dtype=np.float32) + T = np.array(param["T"], dtype=np.float32) assert R.shape == (3, 3) assert T.shape == (3, 1) # The camera matrices are transposed in advance because the joint # coordinates are stored as row vectors. - self.param['R_c2w'] = R.T - self.param['T_c2w'] = T.T - self.param['R_w2c'] = R - self.param['T_w2c'] = -self.param['T_c2w'] @ self.param['R_w2c'] + self.param["R_c2w"] = R.T + self.param["T_c2w"] = T.T + self.param["R_w2c"] = R + self.param["T_w2c"] = -self.param["T_c2w"] @ self.param["R_w2c"] # intrinsic param - if 'K' in param: - K = np.array(param['K'], dtype=np.float32) + if "K" in param: + K = np.array(param["K"], dtype=np.float32) assert K.shape == (2, 3) - self.param['K'] = K.T - self.param['f'] = np.array([K[0, 0], K[1, 1]])[:, np.newaxis] - self.param['c'] = np.array([K[0, 2], K[1, 2]])[:, np.newaxis] - elif 'f' in param and 'c' in param: - f = np.array(param['f'], dtype=np.float32) - c = np.array(param['c'], dtype=np.float32) + self.param["K"] = K.T + self.param["f"] = np.array([K[0, 0], K[1, 1]])[:, np.newaxis] + self.param["c"] = np.array([K[0, 2], K[1, 2]])[:, np.newaxis] + elif "f" in param and "c" in param: + f = np.array(param["f"], dtype=np.float32) + c = np.array(param["c"], dtype=np.float32) assert f.shape == (2, 1) assert c.shape == (2, 1) - self.param['K'] = np.concatenate((np.diagflat(f), c), axis=-1).T - self.param['f'] = f - self.param['c'] = c + self.param["K"] = np.concatenate((np.diagflat(f), c), axis=-1).T + self.param["f"] = f + self.param["c"] = c else: - raise ValueError('Camera intrinsic parameters are missing. ' - 'Either "K" or "f"&"c" should be provided.') + raise ValueError("Camera intrinsic parameters are missing. 
" 'Either "K" or "f"&"c" should be provided.') # distortion param - if 'k' in param and 'p' in param: + if "k" in param and "p" in param: self.undistortion = True - self.param['k'] = np.array(param['k'], dtype=np.float32).flatten() - self.param['p'] = np.array(param['p'], dtype=np.float32).flatten() - assert self.param['k'].size in {3, 6} - assert self.param['p'].size == 2 + self.param["k"] = np.array(param["k"], dtype=np.float32).flatten() + self.param["p"] = np.array(param["p"], dtype=np.float32).flatten() + assert self.param["k"].size in {3, 6} + assert self.param["p"].size == 2 else: self.undistortion = False def world_to_camera(self, X): assert isinstance(X, np.ndarray) assert X.ndim >= 2 and X.shape[-1] == 3 - return X @ self.param['R_w2c'] + self.param['T_w2c'] + return X @ self.param["R_w2c"] + self.param["T_w2c"] def camera_to_world(self, X): assert isinstance(X, np.ndarray) assert X.ndim >= 2 and X.shape[-1] == 3 - return X @ self.param['R_c2w'] + self.param['T_c2w'] + return X @ self.param["R_c2w"] + self.param["T_c2w"] def camera_to_pixel(self, X): assert isinstance(X, np.ndarray) @@ -142,27 +141,24 @@ class SimpleCamera(SingleCameraBase): _X = X / X[..., 2:] if self.undistortion: - k = self.param['k'] - p = self.param['p'] + k = self.param["k"] + p = self.param["p"] _X_2d = _X[..., :2] r2 = (_X_2d**2).sum(-1) - radial = 1 + sum(ki * r2**(i + 1) for i, ki in enumerate(k[:3])) + radial = 1 + sum(ki * r2 ** (i + 1) for i, ki in enumerate(k[:3])) if k.size == 6: - radial /= 1 + sum( - (ki * r2**(i + 1) for i, ki in enumerate(k[3:]))) + radial /= 1 + sum((ki * r2 ** (i + 1) for i, ki in enumerate(k[3:]))) tangential = 2 * (p[1] * _X[..., 0] + p[0] * _X[..., 1]) - _X[..., :2] = _X_2d * (radial + tangential)[..., None] + np.outer( - r2, p[::-1]).reshape(_X_2d.shape) - return _X @ self.param['K'] + _X[..., :2] = _X_2d * (radial + tangential)[..., None] + np.outer(r2, p[::-1]).reshape(_X_2d.shape) + return _X @ self.param["K"] def pixel_to_camera(self, X): assert isinstance(X, np.ndarray) assert X.ndim >= 2 and X.shape[-1] == 3 _X = X.copy() - _X[:, :2] = (X[:, :2] - self.param['c'].T) / self.param['f'].T * X[:, - [2]] + _X[:, :2] = (X[:, :2] - self.param["c"].T) / self.param["f"].T * X[:, [2]] return _X @@ -204,58 +200,55 @@ class SimpleCameraTorch(SingleCameraBase): self.param = {} # extrinsic param - R = torch.tensor(param['R'], device=device) - T = torch.tensor(param['T'], device=device) + R = torch.tensor(param["R"], device=device) + T = torch.tensor(param["T"], device=device) assert R.shape == (3, 3) assert T.shape == (3, 1) # The camera matrices are transposed in advance because the joint # coordinates are stored as row vectors. 
- self.param['R_c2w'] = R.T - self.param['T_c2w'] = T.T - self.param['R_w2c'] = R - self.param['T_w2c'] = -self.param['T_c2w'] @ self.param['R_w2c'] + self.param["R_c2w"] = R.T + self.param["T_c2w"] = T.T + self.param["R_w2c"] = R + self.param["T_w2c"] = -self.param["T_c2w"] @ self.param["R_w2c"] # intrinsic param - if 'K' in param: - K = torch.tensor(param['K'], device=device) + if "K" in param: + K = torch.tensor(param["K"], device=device) assert K.shape == (2, 3) - self.param['K'] = K.T - self.param['f'] = torch.tensor([[K[0, 0]], [K[1, 1]]], - device=device) - self.param['c'] = torch.tensor([[K[0, 2]], [K[1, 2]]], - device=device) - elif 'f' in param and 'c' in param: - f = torch.tensor(param['f'], device=device) - c = torch.tensor(param['c'], device=device) + self.param["K"] = K.T + self.param["f"] = torch.tensor([[K[0, 0]], [K[1, 1]]], device=device) + self.param["c"] = torch.tensor([[K[0, 2]], [K[1, 2]]], device=device) + elif "f" in param and "c" in param: + f = torch.tensor(param["f"], device=device) + c = torch.tensor(param["c"], device=device) assert f.shape == (2, 1) assert c.shape == (2, 1) - self.param['K'] = torch.cat([torch.diagflat(f), c], dim=-1).T - self.param['f'] = f - self.param['c'] = c + self.param["K"] = torch.cat([torch.diagflat(f), c], dim=-1).T + self.param["f"] = f + self.param["c"] = c else: - raise ValueError('Camera intrinsic parameters are missing. ' - 'Either "K" or "f"&"c" should be provided.') + raise ValueError("Camera intrinsic parameters are missing. " 'Either "K" or "f"&"c" should be provided.') # distortion param - if 'k' in param and 'p' in param: + if "k" in param and "p" in param: self.undistortion = True - self.param['k'] = torch.tensor(param['k'], device=device).view(-1) - self.param['p'] = torch.tensor(param['p'], device=device).view(-1) - assert len(self.param['k']) in {3, 6} - assert len(self.param['p']) == 2 + self.param["k"] = torch.tensor(param["k"], device=device).view(-1) + self.param["p"] = torch.tensor(param["p"], device=device).view(-1) + assert len(self.param["k"]) in {3, 6} + assert len(self.param["p"]) == 2 else: self.undistortion = False def world_to_camera(self, X): assert isinstance(X, torch.Tensor) assert X.ndim >= 2 and X.shape[-1] == 3 - return X @ self.param['R_w2c'] + self.param['T_w2c'] + return X @ self.param["R_w2c"] + self.param["T_w2c"] def camera_to_world(self, X): assert isinstance(X, torch.Tensor) assert X.ndim >= 2 and X.shape[-1] == 3 - return X @ self.param['R_c2w'] + self.param['T_c2w'] + return X @ self.param["R_c2w"] + self.param["T_c2w"] def camera_to_pixel(self, X): assert isinstance(X, torch.Tensor) @@ -264,17 +257,15 @@ class SimpleCameraTorch(SingleCameraBase): _X = X / X[..., 2:] if self.undistortion: - k = self.param['k'] - p = self.param['p'] + k = self.param["k"] + p = self.param["p"] _X_2d = _X[..., :2] r2 = (_X_2d**2).sum(-1) - radial = 1 + sum(ki * r2**(i + 1) for i, ki in enumerate(k[:3])) + radial = 1 + sum(ki * r2 ** (i + 1) for i, ki in enumerate(k[:3])) - if k.size == 6: + # Tensor.size is a method, so `k.size == 6` was always False; use numel() + if k.numel() == 6: - radial /= 1 + sum( - (ki * r2**(i + 1) for i, ki in enumerate(k[3:]))) + radial /= 1 + sum((ki * r2 ** (i + 1) for i, ki in enumerate(k[3:]))) tangential = 2 * (p[1] * _X[..., 0] + p[0] * _X[..., 1]) - _X[..., :2] = _X_2d * (radial + tangential)[..., None] + torch.ger( - r2, p.flip([0])).reshape(_X_2d.shape) - return _X @ self.param['K'] + _X[..., :2] = _X_2d * (radial + tangential)[..., None] + torch.ger(r2, p.flip([0])).reshape(_X_2d.shape) + return _X @ self.param["K"] diff --git a/mmpose/utils/collect_env.py
b/mmpose/utils/collect_env.py index e8fb5f35e10fe6535b49b7eb7def1459b28835e3..b15c9079abe08a67bde36967e6e616cfd46502b6 100644 --- a/mmpose/utils/collect_env.py +++ b/mmpose/utils/collect_env.py @@ -7,10 +7,10 @@ import mmpose def collect_env(): env_info = collect_base_env() - env_info['MMPose'] = (mmpose.__version__ + '+' + get_git_hash(digits=7)) + env_info["MMPose"] = mmpose.__version__ + "+" + get_git_hash(digits=7) return env_info -if __name__ == '__main__': +if __name__ == "__main__": for name, val in collect_env().items(): - print(f'{name}: {val}') + print(f"{name}: {val}") diff --git a/mmpose/utils/config_utils.py b/mmpose/utils/config_utils.py index 2f54d2ef24093a77933dbf8026465e3cdaf5e839..10f48d6a9d41f4afd55ac3363defe9a16381c47f 100644 --- a/mmpose/utils/config_utils.py +++ b/mmpose/utils/config_utils.py @@ -15,12 +15,12 @@ def adapt_mmdet_pipeline(cfg: ConfigDict) -> ConfigDict: # use lazy import to avoid hard dependence on mmdet from mmdet.datasets import transforms - if 'test_dataloader' not in cfg: + if "test_dataloader" not in cfg: return cfg pipeline = cfg.test_dataloader.dataset.pipeline for trans in pipeline: - if trans['type'] in dir(transforms): - trans['type'] = 'mmdet.' + trans['type'] + if trans["type"] in dir(transforms): + trans["type"] = "mmdet." + trans["type"] return cfg diff --git a/mmpose/utils/dist_utils.py b/mmpose/utils/dist_utils.py index 915f92585a3780576490e92199a0baebe0cb7e7d..0dbb537f9baa5550229e1008ef22615516799a5b 100644 --- a/mmpose/utils/dist_utils.py +++ b/mmpose/utils/dist_utils.py @@ -3,7 +3,7 @@ import torch.distributed as dist def reduce_mean(tensor): - """"Obtain the mean of tensor on different GPUs.""" + """Obtain the mean of tensor on different GPUs.""" if not (dist.is_available() and dist.is_initialized()): return tensor tensor = tensor.clone() diff --git a/mmpose/utils/hooks.py b/mmpose/utils/hooks.py index 4a2eb8aea29646d5de3587a43f2746bd8a64e30f..f476bc4bd565d479f83cd8ef5561c271cd136d4a 100644 --- a/mmpose/utils/hooks.py +++ b/mmpose/utils/hooks.py @@ -19,12 +19,9 @@ class OutputHook: self.layer_outputs[name] = output else: if isinstance(output, list): - self.layer_outputs[name] = [ out.detach().cpu().numpy() for out in output ] + self.layer_outputs[name] = [out.detach().cpu().numpy() for out in output] else: - self.layer_outputs[name] = output.detach().cpu().numpy( ) + self.layer_outputs[name] = output.detach().cpu().numpy() return hook @@ -35,8 +32,7 @@ class OutputHook: layer = rgetattr(module, name) h = layer.register_forward_hook(hook_wrapper(name)) except ModuleNotFoundError as module_not_found: - raise ModuleNotFoundError( f'Module {name} not found') from module_not_found + raise ModuleNotFoundError(f"Module {name} not found") from module_not_found self.handles.append(h) def remove(self): @@ -65,7 +61,7 @@ def rsetattr(obj, attr, val): attr (str): The attribute path in dot notation (e.g., 'x.y.z'). val (any): The value to set at the specified attribute path.
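Example (illustrative; ``cfg`` stands for any nested object)::

    rsetattr(cfg, 'model.backbone.depth', 50)
    # equivalent to: setattr(cfg.model.backbone, 'depth', 50)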
""" - pre, _, post = attr.rpartition('.') + pre, _, post = attr.rpartition(".") return setattr(rgetattr(obj, pre) if pre else obj, post, val) @@ -87,4 +83,4 @@ def rgetattr(obj, attr, *args): def _getattr(obj, attr): return getattr(obj, attr, *args) - return functools.reduce(_getattr, [obj] + attr.split('.')) + return functools.reduce(_getattr, [obj] + attr.split(".")) diff --git a/mmpose/utils/logger.py b/mmpose/utils/logger.py index f67e56efeb998cf966e3729c90791b4a70f2bb84..3b98af740506815e277e9c426270522d755af131 100644 --- a/mmpose/utils/logger.py +++ b/mmpose/utils/logger.py @@ -22,4 +22,4 @@ def get_root_logger(log_file=None, log_level=logging.INFO): Returns: logging.Logger: The root logger. """ - return MMLogger('MMLogger', __name__.split('.')[0], log_file, log_level) + return MMLogger("MMLogger", __name__.split(".")[0], log_file, log_level) diff --git a/mmpose/utils/setup_env.py b/mmpose/utils/setup_env.py index ff299539ef8cc83a17a24e41498c01ff4f26667f..8810f037f7cba412ae8610e4dd75c3942eda08d1 100644 --- a/mmpose/utils/setup_env.py +++ b/mmpose/utils/setup_env.py @@ -12,41 +12,44 @@ from mmengine import DefaultScope def setup_multi_processes(cfg): """Setup multi-processing environment variables.""" # set multi-process start method as `fork` to speed up the training - if platform.system() != 'Windows': - mp_start_method = cfg.get('mp_start_method', 'fork') + if platform.system() != "Windows": + mp_start_method = cfg.get("mp_start_method", "fork") current_method = mp.get_start_method(allow_none=True) if current_method is not None and current_method != mp_start_method: warnings.warn( - f'Multi-processing start method `{mp_start_method}` is ' - f'different from the previous setting `{current_method}`.' - f'It will be force set to `{mp_start_method}`. You can change ' - f'this behavior by changing `mp_start_method` in your config.') + f"Multi-processing start method `{mp_start_method}` is " + f"different from the previous setting `{current_method}`." + f"It will be force set to `{mp_start_method}`. You can change " + f"this behavior by changing `mp_start_method` in your config." + ) mp.set_start_method(mp_start_method, force=True) # disable opencv multithreading to avoid system being overloaded - opencv_num_threads = cfg.get('opencv_num_threads', 0) + opencv_num_threads = cfg.get("opencv_num_threads", 0) cv2.setNumThreads(opencv_num_threads) # setup OMP threads # This code is referred from https://github.com/pytorch/pytorch/blob/master/torch/distributed/run.py # noqa - if 'OMP_NUM_THREADS' not in os.environ and cfg.data.workers_per_gpu > 1: + if "OMP_NUM_THREADS" not in os.environ and cfg.data.workers_per_gpu > 1: omp_num_threads = 1 warnings.warn( - f'Setting OMP_NUM_THREADS environment variable for each process ' - f'to be {omp_num_threads} in default, to avoid your system being ' - f'overloaded, please further tune the variable for optimal ' - f'performance in your application as needed.') - os.environ['OMP_NUM_THREADS'] = str(omp_num_threads) + f"Setting OMP_NUM_THREADS environment variable for each process " + f"to be {omp_num_threads} in default, to avoid your system being " + f"overloaded, please further tune the variable for optimal " + f"performance in your application as needed." 
+ ) + os.environ["OMP_NUM_THREADS"] = str(omp_num_threads) # setup MKL threads - if 'MKL_NUM_THREADS' not in os.environ and cfg.data.workers_per_gpu > 1: + if "MKL_NUM_THREADS" not in os.environ and cfg.data.workers_per_gpu > 1: mkl_num_threads = 1 warnings.warn( - f'Setting MKL_NUM_THREADS environment variable for each process ' - f'to be {mkl_num_threads} in default, to avoid your system being ' - f'overloaded, please further tune the variable for optimal ' - f'performance in your application as needed.') - os.environ['MKL_NUM_THREADS'] = str(mkl_num_threads) + f"Setting MKL_NUM_THREADS environment variable for each process " + f"to be {mkl_num_threads} by default, to avoid your system being " + f"overloaded, please further tune the variable for optimal " + f"performance in your application as needed." + ) + os.environ["MKL_NUM_THREADS"] = str(mkl_num_threads) def register_all_modules(init_default_scope: bool = True) -> None: @@ -69,18 +72,19 @@ def register_all_modules(init_default_scope: bool = True) -> None: import mmpose.visualization # noqa: F401,F403 if init_default_scope: - never_created = DefaultScope.get_current_instance() is None \ or not DefaultScope.check_instance_created('mmpose') + never_created = DefaultScope.get_current_instance() is None or not DefaultScope.check_instance_created("mmpose") if never_created: - DefaultScope.get_instance('mmpose', scope_name='mmpose') + DefaultScope.get_instance("mmpose", scope_name="mmpose") return current_scope = DefaultScope.get_current_instance() - if current_scope.scope_name != 'mmpose': - warnings.warn('The current default scope ' f'"{current_scope.scope_name}" is not "mmpose", ' '`register_all_modules` will force the current' 'default scope to be "mmpose". If this is not ' 'expected, please set `init_default_scope=False`.') + if current_scope.scope_name != "mmpose": + warnings.warn( + "The current default scope " + f'"{current_scope.scope_name}" is not "mmpose", ' + "`register_all_modules` will force the current " + 'default scope to be "mmpose". If this is not ' + "expected, please set `init_default_scope=False`." + ) # avoid name conflict - new_instance_name = f'mmpose-{datetime.datetime.now()}' - DefaultScope.get_instance(new_instance_name, scope_name='mmpose') + new_instance_name = f"mmpose-{datetime.datetime.now()}" + DefaultScope.get_instance(new_instance_name, scope_name="mmpose") diff --git a/mmpose/utils/tensor_utils.py b/mmpose/utils/tensor_utils.py index 755e26854cb379d6218d1ac2bfad15039330df42..1b9cd737dd25dc35d6c88ab0c19ab1f542ac7172 100644 --- a/mmpose/utils/tensor_utils.py +++ b/mmpose/utils/tensor_utils.py @@ -8,9 +8,7 @@ from mmengine.utils import is_seq_of from torch import Tensor -def to_numpy(x: Union[Tensor, Sequence[Tensor]], - return_device: bool = False, - unzip: bool = False) -> Union[np.ndarray, tuple]: +def to_numpy(x: Union[Tensor, Sequence[Tensor]], return_device: bool = False, unzip: bool = False) -> Union[np.ndarray, tuple]: """Convert torch tensor to numpy.ndarray. Args: @@ -31,21 +29,18 @@ def to_numpy(x: Union[Tensor, Sequence[Tensor]], device = x.device elif isinstance(x, np.ndarray) or is_seq_of(x, np.ndarray): arrays = x - device = 'cpu' + device = "cpu" elif is_seq_of(x, Tensor): if unzip: # convert (A, B) -> [(A[0], B[0]), (A[1], B[1]), ...]
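# e.g. (illustrative) for tensors A, B of shape (N, K), unzip=True yields
#   [(A[0:1].numpy(), B[0:1].numpy()), ..., (A[N-1:N].numpy(), B[N-1:N].numpy())]
# i.e. one tuple per sample, keeping a singleton batch dim on each array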
- arrays = [ - tuple(to_numpy(_x[None, :]) for _x in _each) - for _each in zip(*x) - ] + arrays = [tuple(to_numpy(_x[None, :]) for _x in _each) for _each in zip(*x)] else: arrays = [to_numpy(_x) for _x in x] device = x[0].device else: - raise ValueError(f'Invalid input type {type(x)}') + raise ValueError(f"Invalid input type {type(x)}") if return_device: return arrays, device @@ -53,8 +48,7 @@ def to_numpy(x: Union[Tensor, Sequence[Tensor]], return arrays -def to_tensor(x: Union[np.ndarray, Sequence[np.ndarray]], - device: Optional[Any] = None) -> Union[Tensor, Sequence[Tensor]]: +def to_tensor(x: Union[np.ndarray, Sequence[np.ndarray]], device: Optional[Any] = None) -> Union[Tensor, Sequence[Tensor]]: """Convert numpy.ndarray to torch tensor. Args: @@ -71,4 +65,4 @@ def to_tensor(x: Union[np.ndarray, Sequence[np.ndarray]], elif is_seq_of(x, np.ndarray): return [to_tensor(_x, device=device) for _x in x] else: - raise ValueError(f'Invalid input type {type(x)}') + raise ValueError(f"Invalid input type {type(x)}") diff --git a/mmpose/utils/timer.py b/mmpose/utils/timer.py index c219c04069d239605a7854b06a370876dbe8fd58..2bd0f49051c810708cd9987e141f6bd6895aeab8 100644 --- a/mmpose/utils/timer.py +++ b/mmpose/utils/timer.py @@ -7,7 +7,7 @@ import numpy as np from mmengine import Timer -class RunningAverage(): +class RunningAverage: r"""A helper class to calculate running average in a sliding window. Args: @@ -21,7 +21,7 @@ class RunningAverage(): def update(self, value): """Update a new data sample.""" self._data.append(value) - self._data = self._data[-self.window:] + self._data = self._data[-self.window :] def average(self): """Get the average value of current window.""" @@ -57,7 +57,7 @@ class StopWatch: self._timer_stack = [] @contextmanager - def timeit(self, timer_name='_FPS_'): + def timeit(self, timer_name="_FPS_"): """Timing a code snippet with an assigned name. Args: @@ -84,13 +84,10 @@ class StopWatch: dict: The key is the timer name and the value is the \ corresponding average time consuming. """ - result = { - name: r.average() * 1000. - for name, r in self._record.items() - } + result = {name: r.average() * 1000.0 for name, r in self._record.items()} - if '_FPS_' in result: - result['_FPS_'] = 1000. 
/ result.pop('_FPS_') + if "_FPS_" in result: + result["_FPS_"] = 1000.0 / result.pop("_FPS_") if key is None: return result @@ -107,9 +104,9 @@ class StopWatch: """ result = self.report() strings = [] - if '_FPS_' in result: + if "_FPS_" in result: strings.append(f'FPS: {result["_FPS_"]:>5.1f}') - strings += [f'{name}: {val:>3.0f}' for name, val in result.items()] + strings += [f"{name}: {val:>3.0f}" for name, val in result.items()] return strings def reset(self): diff --git a/mmpose/utils/typing.py b/mmpose/utils/typing.py index 557891b3b92e657de43eb50d4b5fbce7d369e7ee..47240432e8abab975a995e89e23f7b6a78634911 100644 --- a/mmpose/utils/typing.py +++ b/mmpose/utils/typing.py @@ -20,8 +20,7 @@ InstanceList = List[InstanceData] PixelDataList = List[PixelData] Predictions = Union[InstanceList, Tuple[InstanceList, PixelDataList]] # Type hint of model outputs -ForwardResults = Union[Dict[str, Tensor], List[PoseDataSample], Tuple[Tensor], - Tensor] +ForwardResults = Union[Dict[str, Tensor], List[PoseDataSample], Tuple[Tensor], Tensor] # Type hint of features # - Tuple[Tensor]: multi-level features extracted by the network # - List[Tuple[Tensor]]: multiple feature pyramids for TTA diff --git a/mmpose/version.py b/mmpose/version.py index 39bc36f2bb2fcd1f52e62b4453acb4f4b50e9d8f..8222e0c151ff3b708624aefd95468a15f74a1e1f 100644 --- a/mmpose/version.py +++ b/mmpose/version.py @@ -1,6 +1,6 @@ # Copyright (c) Open-MMLab. All rights reserved. -__version__ = '1.3.1' +__version__ = "1.3.1" short_version = __version__ @@ -14,17 +14,17 @@ def parse_version_info(version_str): (1, 3, 0), and "2.0.0rc1" is parsed into (2, 0, 0, 'rc1'). """ version_info = [] - for x in version_str.split('.'): + for x in version_str.split("."): if x.isdigit(): version_info.append(int(x)) - elif x.find('rc') != -1: - patch_version = x.split('rc') + elif x.find("rc") != -1: + patch_version = x.split("rc") version_info.append(int(patch_version[0])) - version_info.append(f'rc{patch_version[1]}') - elif x.find('b') != -1: - patch_version = x.split('b') + version_info.append(f"rc{patch_version[1]}") + elif x.find("b") != -1: + patch_version = x.split("b") version_info.append(int(patch_version[0])) - version_info.append(f'b{patch_version[1]}') + version_info.append(f"b{patch_version[1]}") return tuple(version_info) diff --git a/mmpose/visualization/__init__.py b/mmpose/visualization/__init__.py index 4a18e8bc5b4fa8d58adee30576013bb780bd9a19..025a1d77a4a152fb3067879fe25a7f47e110fad5 100644 --- a/mmpose/visualization/__init__.py +++ b/mmpose/visualization/__init__.py @@ -3,4 +3,4 @@ from .fast_visualizer import FastVisualizer from .local_visualizer import PoseLocalVisualizer from .local_visualizer_3d import Pose3dLocalVisualizer -__all__ = ['PoseLocalVisualizer', 'FastVisualizer', 'Pose3dLocalVisualizer'] +__all__ = ["PoseLocalVisualizer", "FastVisualizer", "Pose3dLocalVisualizer"] diff --git a/mmpose/visualization/fast_visualizer.py b/mmpose/visualization/fast_visualizer.py index fa0cb385270832f12a9d12fac892e920f32c2002..0e022d420baf67c35ba2a8d59661eecff8b2b723 100644 --- a/mmpose/visualization/fast_visualizer.py +++ b/mmpose/visualization/fast_visualizer.py @@ -23,11 +23,11 @@ class FastVisualizer: self.line_width = line_width self.kpt_thr = kpt_thr - self.keypoint_id2name = metainfo['keypoint_id2name'] - self.keypoint_name2id = metainfo['keypoint_name2id'] - self.keypoint_colors = metainfo['keypoint_colors'] - self.skeleton_links = metainfo['skeleton_links'] - self.skeleton_link_colors = metainfo['skeleton_link_colors'] + 
self.keypoint_id2name = metainfo["keypoint_id2name"] + self.keypoint_name2id = metainfo["keypoint_name2id"] + self.keypoint_colors = metainfo["keypoint_colors"] + self.skeleton_links = metainfo["skeleton_links"] + self.skeleton_link_colors = metainfo["skeleton_link_colors"] def draw_pose(self, img, instances): """Draw pose estimations on the given image. @@ -46,7 +46,7 @@ class FastVisualizer: """ if instances is None: - print('no instance detected') + print("no instance detected") return keypoints = instances.keypoints @@ -72,7 +72,5 @@ class FastVisualizer: x_coord, y_coord = int(kpt[0]), int(kpt[1]) color = self.keypoint_colors[kid].tolist() - cv2.circle(img, (int(x_coord), int(y_coord)), self.radius, - color, -1) - cv2.circle(img, (int(x_coord), int(y_coord)), self.radius, - (255, 255, 255)) + cv2.circle(img, (int(x_coord), int(y_coord)), self.radius, color, -1) + cv2.circle(img, (int(x_coord), int(y_coord)), self.radius, (255, 255, 255)) diff --git a/mmpose/visualization/local_visualizer.py b/mmpose/visualization/local_visualizer.py index f147919457b9b75177b511b034b7710f3dccd239..fd0945754dcaf92a6fb2fb0f92371b78cdf8b9a5 100644 --- a/mmpose/visualization/local_visualizer.py +++ b/mmpose/visualization/local_visualizer.py @@ -13,23 +13,19 @@ from mmengine.structures import InstanceData, PixelData from mmpose.datasets.datasets.utils import parse_pose_metainfo from mmpose.registry import VISUALIZERS from mmpose.structures import PoseDataSample -from .opencv_backend_visualizer import OpencvBackendVisualizer -from .simcc_vis import SimCCVisualizer - from mmpose.structures.keypoint import fix_bbox_aspect_ratio -import cv2 +from .opencv_backend_visualizer import OpencvBackendVisualizer +from .simcc_vis import SimCCVisualizer try: - POSEVIS=True + POSEVIS = True from posevis import pose_visualization except ImportError: - POSEVIS=False + POSEVIS = False -def _get_adaptive_scales(areas: np.ndarray, - min_area: int = 800, - max_area: int = 30000) -> np.ndarray: +def _get_adaptive_scales(areas: np.ndarray, min_area: int = 800, max_area: int = 30000) -> np.ndarray: """Get adaptive scales according to areas. The scale range is [0.5, 1.0]. When the area is less than @@ -112,34 +108,27 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): ... 
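>>> # illustrative: draw predictions only, with bounding boxes
>>> pose_local_visualizer.add_datasample('image', image,
...                                      pred_pose_data_sample,
...                                      draw_gt=False, draw_bbox=True)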
pred_pose_data_sample) """ - def __init__(self, - name: str = 'visualizer', - image: Optional[np.ndarray] = None, - vis_backends: Optional[Dict] = None, - save_dir: Optional[str] = None, - bbox_color: Optional[Union[str, Tuple[int]]] = 'green', - kpt_color: Optional[Union[str, Tuple[Tuple[int]]]] = 'red', - link_color: Optional[Union[str, Tuple[Tuple[int]]]] = None, - text_color: Optional[Union[str, - Tuple[int]]] = (255, 255, 255), - skeleton: Optional[Union[List, Tuple]] = None, - line_width: Union[int, float] = 1, - radius: Union[int, float] = 3, - show_keypoint_weight: bool = False, - backend: str = 'opencv', - alpha: float = 1.0): - - warnings.filterwarnings( - 'ignore', - message='.*please provide the `save_dir` argument.*', - category=UserWarning) - - super().__init__( - name=name, - image=image, - vis_backends=vis_backends, - save_dir=save_dir, - backend=backend) + def __init__( + self, + name: str = "visualizer", + image: Optional[np.ndarray] = None, + vis_backends: Optional[Dict] = None, + save_dir: Optional[str] = None, + bbox_color: Optional[Union[str, Tuple[int]]] = "green", + kpt_color: Optional[Union[str, Tuple[Tuple[int]]]] = "red", + link_color: Optional[Union[str, Tuple[Tuple[int]]]] = None, + text_color: Optional[Union[str, Tuple[int]]] = (255, 255, 255), + skeleton: Optional[Union[List, Tuple]] = None, + line_width: Union[int, float] = 1, + radius: Union[int, float] = 3, + show_keypoint_weight: bool = False, + backend: str = "opencv", + alpha: float = 1.0, + ): + + warnings.filterwarnings("ignore", message=".*please provide the `save_dir` argument.*", category=UserWarning) + + super().__init__(name=name, image=image, vis_backends=vis_backends, save_dir=save_dir, backend=backend) self.bbox_color = bbox_color self.kpt_color = kpt_color @@ -155,44 +144,34 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): # it will override the default value. self.dataset_meta = {} - def set_dataset_meta(self, - dataset_meta: Dict, - skeleton_style: str = 'mmpose'): + def set_dataset_meta(self, dataset_meta: Dict, skeleton_style: str = "mmpose"): """Assign dataset_meta to the visualizer. The default visualization settings will be overridden. Args: dataset_meta (dict): meta information of dataset. 
""" - if skeleton_style == 'openpose': - dataset_name = dataset_meta['dataset_name'] - if dataset_name == 'coco': - dataset_meta = parse_pose_metainfo( - dict(from_file='configs/_base_/datasets/coco_openpose.py')) - elif dataset_name == 'coco_wholebody': - dataset_meta = parse_pose_metainfo( - dict(from_file='configs/_base_/datasets/' - 'coco_wholebody_openpose.py')) + if skeleton_style == "openpose": + dataset_name = dataset_meta["dataset_name"] + if dataset_name == "coco": + dataset_meta = parse_pose_metainfo(dict(from_file="configs/_base_/datasets/coco_openpose.py")) + elif dataset_name == "coco_wholebody": + dataset_meta = parse_pose_metainfo(dict(from_file="configs/_base_/datasets/" "coco_wholebody_openpose.py")) else: - raise NotImplementedError( - f'openpose style has not been ' - f'supported for {dataset_name} dataset') + raise NotImplementedError(f"openpose style has not been " f"supported for {dataset_name} dataset") if isinstance(dataset_meta, dict): self.dataset_meta = dataset_meta.copy() - self.bbox_color = dataset_meta.get('bbox_color', self.bbox_color) - self.kpt_color = dataset_meta.get('keypoint_colors', - self.kpt_color) - self.link_color = dataset_meta.get('skeleton_link_colors', - self.link_color) - self.skeleton = dataset_meta.get('skeleton_links', self.skeleton) + self.bbox_color = dataset_meta.get("bbox_color", self.bbox_color) + self.kpt_color = dataset_meta.get("keypoint_colors", self.kpt_color) + self.link_color = dataset_meta.get("skeleton_link_colors", self.link_color) + self.skeleton = dataset_meta.get("skeleton_links", self.skeleton) # sometimes self.dataset_meta is manually set, which might be None. # it should be converted to a dict at these times if self.dataset_meta is None: self.dataset_meta = {} - def _draw_instances_bbox(self, image: np.ndarray, - instances: InstanceData) -> np.ndarray: + def _draw_instances_bbox(self, image: np.ndarray, instances: InstanceData) -> np.ndarray: """Draw bounding boxes and corresponding labels of GT or prediction. 
Args: @@ -205,31 +184,24 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): """ self.set_image(image) - if 'bboxes' in instances: + if "bboxes" in instances: bboxes = instances.bboxes - self.draw_bboxes( - bboxes, - edge_colors=self.bbox_color, - alpha=self.alpha, - line_widths=self.line_width) + self.draw_bboxes(bboxes, edge_colors=self.bbox_color, alpha=self.alpha, line_widths=self.line_width) else: return self.get_image() - if 'labels' in instances and self.text_color is not None: - classes = self.dataset_meta.get('classes', None) + if "labels" in instances and self.text_color is not None: + classes = self.dataset_meta.get("classes", None) labels = instances.labels positions = bboxes[:, :2] - areas = (bboxes[:, 3] - bboxes[:, 1]) * ( - bboxes[:, 2] - bboxes[:, 0]) + areas = (bboxes[:, 3] - bboxes[:, 1]) * (bboxes[:, 2] - bboxes[:, 0]) scales = _get_adaptive_scales(areas) for i, (pos, label) in enumerate(zip(positions, labels)): - label_text = classes[ - label] if classes is not None else f'class {label}' + label_text = classes[label] if classes is not None else f"class {label}" - if isinstance(self.bbox_color, - tuple) and max(self.bbox_color) > 1: + if isinstance(self.bbox_color, tuple) and max(self.bbox_color) > 1: facecolor = [c / 255.0 for c in self.bbox_color] else: facecolor = self.bbox_color @@ -239,22 +211,15 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): pos, colors=self.text_color, font_sizes=int(13 * scales[i]), - vertical_alignments='bottom', - bboxes=[{ - 'facecolor': facecolor, - 'alpha': 0.8, - 'pad': 0.7, - 'edgecolor': 'none' - }]) + vertical_alignments="bottom", + bboxes=[{"facecolor": facecolor, "alpha": 0.8, "pad": 0.7, "edgecolor": "none"}], + ) return self.get_image() - def _draw_instances_kpts(self, - image: np.ndarray, - instances: InstanceData, - kpt_thr: float = 0.3, - show_kpt_idx: bool = False, - skeleton_style: str = 'mmpose'): + def _draw_instances_kpts( + self, image: np.ndarray, instances: InstanceData, kpt_thr: float = 0.3, show_kpt_idx: bool = False, skeleton_style: str = "mmpose" + ): """Draw keypoints and skeletons (optional) of GT or prediction. Args: @@ -272,18 +237,16 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): np.ndarray: the drawn image which channel is RGB. 
""" - if skeleton_style == 'openpose': - return self._draw_instances_kpts_openpose(image, instances, - kpt_thr) + if skeleton_style == "openpose": + return self._draw_instances_kpts_openpose(image, instances, kpt_thr) self.set_image(image) img_h, img_w, _ = image.shape - if 'keypoints' in instances: - keypoints = instances.get('transformed_keypoints', - instances.keypoints) + if "keypoints" in instances: + keypoints = instances.get("transformed_keypoints", instances.keypoints) - if 'keypoints_visible' in instances: + if "keypoints_visible" in instances: keypoints_visible = instances.keypoints_visible else: keypoints_visible = np.ones(keypoints.shape[:-1]) @@ -297,33 +260,39 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): kpt_color = self.kpt_color else: raise ValueError( - f'the length of kpt_color ' - f'({len(self.kpt_color)}) does not matches ' - f'that of keypoints ({len(kpts)})') + f"the length of kpt_color " f"({len(self.kpt_color)}) does not matches " f"that of keypoints ({len(kpts)})" + ) # draw links if self.skeleton is not None and self.link_color is not None: - if self.link_color is None or isinstance( - self.link_color, str): + if self.link_color is None or isinstance(self.link_color, str): link_color = [self.link_color] * len(self.skeleton) elif len(self.link_color) == len(self.skeleton): link_color = self.link_color else: raise ValueError( - f'the length of link_color ' - f'({len(self.link_color)}) does not matches ' - f'that of skeleton ({len(self.skeleton)})') + f"the length of link_color " + f"({len(self.link_color)}) does not matches " + f"that of skeleton ({len(self.skeleton)})" + ) for sk_id, sk in enumerate(self.skeleton): pos1 = (int(kpts[sk[0], 0]), int(kpts[sk[0], 1])) pos2 = (int(kpts[sk[1], 0]), int(kpts[sk[1], 1])) - if (pos1[0] <= 0 or pos1[0] >= img_w or pos1[1] <= 0 - or pos1[1] >= img_h or pos2[0] <= 0 - or pos2[0] >= img_w or pos2[1] <= 0 - or pos2[1] >= img_h or visible[sk[0]] < kpt_thr - or visible[sk[1]] < kpt_thr - or link_color[sk_id] is None): + if ( + pos1[0] <= 0 + or pos1[0] >= img_w + or pos1[1] <= 0 + or pos1[1] >= img_h + or pos2[0] <= 0 + or pos2[0] >= img_w + or pos2[1] <= 0 + or pos2[1] >= img_h + or visible[sk[0]] < kpt_thr + or visible[sk[1]] < kpt_thr + or link_color[sk_id] is None + ): # skip the link that should not be drawn continue @@ -334,13 +303,9 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): color = tuple(int(c) for c in color) transparency = self.alpha if self.show_keypoint_weight: - transparency *= max( - 0, - min(1, - 0.5 * (visible[sk[0]] + visible[sk[1]]))) + transparency *= max(0, min(1, 0.5 * (visible[sk[0]] + visible[sk[1]]))) - self.draw_lines( - X, Y, color, line_widths=self.line_width) + self.draw_lines(X, Y, color, line_widths=self.line_width) # draw each point on image for kid, kpt in enumerate(kpts): @@ -360,7 +325,8 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): face_colors=color, edge_colors=color, alpha=transparency, - line_widths=self.radius) + line_widths=self.radius, + ) if show_kpt_idx: kpt_idx_coords = kpt + [self.radius, -self.radius] self.draw_texts( @@ -368,15 +334,13 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): kpt_idx_coords, colors=color, font_sizes=self.radius * 3, - vertical_alignments='bottom', - horizontal_alignments='center') + vertical_alignments="bottom", + horizontal_alignments="center", + ) return self.get_image() - def _draw_instances_kpts_openpose(self, - image: np.ndarray, - instances: InstanceData, - kpt_thr: float = 0.3): + def 
_draw_instances_kpts_openpose(self, image: np.ndarray, instances: InstanceData, kpt_thr: float = 0.3): """Draw keypoints and skeletons (optional) of GT or prediction in openpose style. @@ -394,33 +358,27 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): self.set_image(image) img_h, img_w, _ = image.shape - if 'keypoints' in instances: - keypoints = instances.get('transformed_keypoints', - instances.keypoints) + if "keypoints" in instances: + keypoints = instances.get("transformed_keypoints", instances.keypoints) - if 'keypoints_visible' in instances: + if "keypoints_visible" in instances: keypoints_visible = instances.keypoints_visible else: keypoints_visible = np.ones(keypoints.shape[:-1]) - keypoints_info = np.concatenate( (keypoints, keypoints_visible[..., None]), axis=-1) + keypoints_info = np.concatenate((keypoints, keypoints_visible[..., None]), axis=-1) # compute neck joint neck = np.mean(keypoints_info[:, [5, 6]], axis=1) # neck score when visualizing pred - neck[:, 2:3] = np.logical_and( keypoints_info[:, 5, 2:3] > kpt_thr, keypoints_info[:, 6, 2:3] > kpt_thr).astype(int) + neck[:, 2:3] = np.logical_and(keypoints_info[:, 5, 2:3] > kpt_thr, keypoints_info[:, 6, 2:3] > kpt_thr).astype(int) new_keypoints_info = np.insert(keypoints_info, 17, neck, axis=1) mmpose_idx = [17, 6, 8, 10, 7, 9, 12, 14, 16, 13, 15, 2, 1, 4, 3] openpose_idx = [1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17] - new_keypoints_info[:, openpose_idx] = \ new_keypoints_info[:, mmpose_idx] + new_keypoints_info[:, openpose_idx] = new_keypoints_info[:, mmpose_idx] keypoints_info = new_keypoints_info - keypoints, keypoints_visible = keypoints_info[ ..., :2], keypoints_info[..., 2] + keypoints, keypoints_visible = keypoints_info[..., :2], keypoints_info[..., 2] for kpts, visible in zip(keypoints, keypoints_visible): kpts = np.array(kpts, copy=False) @@ -431,33 +389,39 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): kpt_color = self.kpt_color else: raise ValueError( - f'the length of kpt_color ' - f'({len(self.kpt_color)}) does not matches ' - f'that of keypoints ({len(kpts)})') + f"the length of kpt_color " f"({len(self.kpt_color)}) does not match " f"that of keypoints ({len(kpts)})" + ) # draw links if self.skeleton is not None and self.link_color is not None: - if self.link_color is None or isinstance( - self.link_color, str): + if self.link_color is None or isinstance(self.link_color, str): link_color = [self.link_color] * len(self.skeleton) elif len(self.link_color) == len(self.skeleton): link_color = self.link_color else: raise ValueError( - f'the length of link_color ' - f'({len(self.link_color)}) does not matches ' - f'that of skeleton ({len(self.skeleton)})') + f"the length of link_color " + f"({len(self.link_color)}) does not match " + f"that of skeleton ({len(self.skeleton)})" + ) for sk_id, sk in enumerate(self.skeleton): pos1 = (int(kpts[sk[0], 0]), int(kpts[sk[0], 1])) pos2 = (int(kpts[sk[1], 0]), int(kpts[sk[1], 1])) - if (pos1[0] <= 0 or pos1[0] >= img_w or pos1[1] <= 0 - or pos1[1] >= img_h or pos2[0] <= 0 - or pos2[0] >= img_w or pos2[1] <= 0 - or pos2[1] >= img_h or visible[sk[0]] < kpt_thr - or visible[sk[1]] < kpt_thr - or link_color[sk_id] is None): + if ( + pos1[0] <= 0 + or pos1[0] >= img_w + or pos1[1] <= 0 + or pos1[1] >= img_h + or pos2[0] <= 0 + or pos2[0] >= img_w + or pos2[1] <= 0 + or pos2[1] >= img_h + or visible[sk[0]] < kpt_thr + or visible[sk[1]] < kpt_thr + or link_color[sk_id] is None + ): # skip the link that should not be drawn continue @@ -468,29
+432,18 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): color = tuple(int(c) for c in color) transparency = self.alpha if self.show_keypoint_weight: - transparency *= max( - 0, - min(1, - 0.5 * (visible[sk[0]] + visible[sk[1]]))) + transparency *= max(0, min(1, 0.5 * (visible[sk[0]] + visible[sk[1]]))) if sk_id <= 16: # body part mX = np.mean(X) mY = np.mean(Y) - length = ((Y[0] - Y[1])**2 + (X[0] - X[1])**2)**0.5 + length = ((Y[0] - Y[1]) ** 2 + (X[0] - X[1]) ** 2) ** 0.5 transparency = 0.6 - angle = math.degrees( - math.atan2(Y[0] - Y[1], X[0] - X[1])) - polygons = cv2.ellipse2Poly( - (int(mX), int(mY)), - (int(length / 2), int(self.line_width)), - int(angle), 0, 360, 1) - - self.draw_polygons( - polygons, - edge_colors=color, - face_colors=color, - alpha=transparency) + angle = math.degrees(math.atan2(Y[0] - Y[1], X[0] - X[1])) + polygons = cv2.ellipse2Poly((int(mX), int(mY)), (int(length / 2), int(self.line_width)), int(angle), 0, 360, 1) + + self.draw_polygons(polygons, edge_colors=color, face_colors=color, alpha=transparency) else: # hand part @@ -498,8 +451,7 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): # draw each point on image for kid, kpt in enumerate(kpts): - if visible[kid] < kpt_thr or kpt_color[ - kid] is None or kpt_color[kid].sum() == 0: + if visible[kid] < kpt_thr or kpt_color[kid] is None or kpt_color[kid].sum() == 0: # skip the point that should not be drawn continue @@ -514,12 +466,8 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): radius = self.radius // 2 if kid > 17 else self.radius self.draw_circles( - kpt, - radius=np.array([radius]), - face_colors=color, - edge_colors=color, - alpha=transparency, - line_widths=radius) + kpt, radius=np.array([radius]), face_colors=color, edge_colors=color, alpha=transparency, line_widths=radius + ) return self.get_image() @@ -538,7 +486,7 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): Returns: np.ndarray: the drawn image which channel is RGB. """ - if 'heatmaps' not in fields: + if "heatmaps" not in fields: return None heatmaps = fields.heatmaps if isinstance(heatmaps, np.ndarray): @@ -570,33 +518,34 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): Returns: np.ndarray: the drawn image which channel is RGB. """ - if 'heatmaps' not in fields: + if "heatmaps" not in fields: return None heatmaps = fields.heatmaps _, h, w = heatmaps.shape if isinstance(heatmaps, np.ndarray): heatmaps = torch.from_numpy(heatmaps) - out_image = SimCCVisualizer().draw_instance_xy_heatmap( - heatmaps, overlaid_image, n) + out_image = SimCCVisualizer().draw_instance_xy_heatmap(heatmaps, overlaid_image, n) out_image = cv2.resize(out_image[:, :, ::-1], (w, h)) return out_image @master_only - def add_datasample(self, - name: str, - image: np.ndarray, - data_sample: PoseDataSample, - draw_gt: bool = True, - draw_pred: bool = True, - draw_heatmap: bool = True, - draw_bbox: bool = False, - show_kpt_idx: bool = False, - skeleton_style: str = 'mmpose', - show: bool = False, - wait_time: float = 0, - out_file: Optional[str] = None, - kpt_thr: float = 0.3, - step: int = 0) -> None: + def add_datasample( + self, + name: str, + image: np.ndarray, + data_sample: PoseDataSample, + draw_gt: bool = True, + draw_pred: bool = True, + draw_heatmap: bool = True, + draw_bbox: bool = False, + show_kpt_idx: bool = False, + skeleton_style: str = "mmpose", + show: bool = False, + wait_time: float = 0, + out_file: Optional[str] = None, + kpt_thr: float = 0.3, + step: int = 0, + ) -> None: """Draw datasample and save to all backends. 
- If GT and prediction are plotted at the same time, they are @@ -642,35 +591,30 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): gt_img_heatmap = None # draw bboxes & keypoints - if 'gt_instances' in data_sample: - gt_img_data = self._draw_instances_kpts( - gt_img_data, data_sample.gt_instances, kpt_thr, - show_kpt_idx, skeleton_style) + if "gt_instances" in data_sample: + gt_img_data = self._draw_instances_kpts(gt_img_data, data_sample.gt_instances, kpt_thr, show_kpt_idx, skeleton_style) if draw_bbox: - gt_img_data = self._draw_instances_bbox( - gt_img_data, data_sample.gt_instances) + gt_img_data = self._draw_instances_bbox(gt_img_data, data_sample.gt_instances) # draw heatmaps - if 'gt_fields' in data_sample and draw_heatmap: - gt_img_heatmap = self._draw_instance_heatmap( - data_sample.gt_fields, image) - + if "gt_fields" in data_sample and draw_heatmap: + gt_img_heatmap = self._draw_instance_heatmap(data_sample.gt_fields, image) + # Draw abox over heatmap bbox_xyxy = data_sample.gt_instances.bboxes.squeeze() - abox_xyxy = fix_bbox_aspect_ratio(bbox_xyxy, aspect_ratio=3/4, padding=1.25, bbox_format='xyxy') + abox_xyxy = fix_bbox_aspect_ratio(bbox_xyxy, aspect_ratio=3 / 4, padding=1.25, bbox_format="xyxy") abox_xyxy = abox_xyxy.flatten().astype(int) gt_img_heatmap = cv2.rectangle(gt_img_heatmap, (abox_xyxy[0], abox_xyxy[1]), (abox_xyxy[2], abox_xyxy[3]), (0, 255, 0), 2) if gt_img_heatmap is not None: - gt_img_data = np.concatenate((gt_img_data, gt_img_heatmap), - axis=0) + gt_img_data = np.concatenate((gt_img_data, gt_img_heatmap), axis=0) if draw_pred: pred_img_data = image.copy() pred_img_heatmap = None # draw bboxes & keypoints - if 'pred_instances' in data_sample: + if "pred_instances" in data_sample: if POSEVIS: pred_samples = [] for i in range(data_sample.pred_instances.keypoints.shape[0]): @@ -684,16 +628,20 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): kpts = np.concatenate([kpts, vis], axis=1) kpts[kpts[:, -1] < 1e-6, :] = 0 bbox_xyxy = data_sample.pred_instances.bboxes[i].squeeze() - bbox_xywh = np.array([ - bbox_xyxy[0], - bbox_xyxy[1], - bbox_xyxy[2] - bbox_xyxy[0], - bbox_xyxy[3] - bbox_xyxy[1], - ]) - pred_samples.append({ - 'keypoints': kpts[:17, :], - 'bbox': bbox_xywh, - }) + bbox_xywh = np.array( + [ + bbox_xyxy[0], + bbox_xyxy[1], + bbox_xyxy[2] - bbox_xyxy[0], + bbox_xyxy[3] - bbox_xyxy[1], + ] + ) + pred_samples.append( + { + "keypoints": kpts[:17, :], + "bbox": bbox_xywh, + } + ) pred_img_data = pose_visualization( pred_img_data, @@ -732,31 +680,29 @@ class PoseLocalVisualizer(OpencvBackendVisualizer): else: pred_img_data = self._draw_instances_kpts( - pred_img_data, data_sample.pred_instances, kpt_thr, - show_kpt_idx, skeleton_style) + pred_img_data, data_sample.pred_instances, kpt_thr, show_kpt_idx, skeleton_style + ) if draw_bbox: - pred_img_data = self._draw_instances_bbox( - pred_img_data, data_sample.pred_instances) + pred_img_data = self._draw_instances_bbox(pred_img_data, data_sample.pred_instances) # draw heatmaps - if 'pred_fields' in data_sample and draw_heatmap: - if 'keypoint_x_labels' in data_sample.pred_instances: - pred_img_heatmap = self._draw_instance_xy_heatmap( - data_sample.pred_fields, image) + if "pred_fields" in data_sample and draw_heatmap: + if "keypoint_x_labels" in data_sample.pred_instances: + pred_img_heatmap = self._draw_instance_xy_heatmap(data_sample.pred_fields, image) else: - pred_img_heatmap = self._draw_instance_heatmap( - data_sample.pred_fields, image) - - # Draw abox over heatmap + pred_img_heatmap = 
self._draw_instance_heatmap(data_sample.pred_fields, image) + + # Draw abox over heatmap bbox_xyxy = data_sample.gt_instances.bboxes.squeeze() - abox_xyxy = fix_bbox_aspect_ratio(bbox_xyxy, aspect_ratio=3/4, padding=1.25, bbox_format='xyxy') + abox_xyxy = fix_bbox_aspect_ratio(bbox_xyxy, aspect_ratio=3 / 4, padding=1.25, bbox_format="xyxy") abox_xyxy = abox_xyxy.flatten().astype(int) - pred_img_heatmap = cv2.rectangle(pred_img_heatmap, (abox_xyxy[0], abox_xyxy[1]), (abox_xyxy[2], abox_xyxy[3]), (0, 255, 0), 1) + pred_img_heatmap = cv2.rectangle( + pred_img_heatmap, (abox_xyxy[0], abox_xyxy[1]), (abox_xyxy[2], abox_xyxy[3]), (0, 255, 0), 1 + ) if pred_img_heatmap is not None: pred_img_heatmap = cv2.resize(pred_img_heatmap, (pred_img_data.shape[:2][::-1])) - pred_img_data = np.concatenate( - (pred_img_data, pred_img_heatmap), axis=0) + pred_img_data = np.concatenate((pred_img_data, pred_img_heatmap), axis=0) # merge visualization results if gt_img_data is not None and pred_img_data is not None: diff --git a/mmpose/visualization/local_visualizer_3d.py b/mmpose/visualization/local_visualizer_3d.py index 09603dba8064fab9c35571ebfb4b094ce2a819eb..ca09e2f8b65ca84ce92b15c5a7bdc068becb1a6e 100644 --- a/mmpose/visualization/local_visualizer_3d.py +++ b/mmpose/visualization/local_visualizer_3d.py @@ -12,6 +12,7 @@ from mmengine.structures import InstanceData from mmpose.apis import convert_keypoint_definition from mmpose.registry import VISUALIZERS from mmpose.structures import PoseDataSample + from . import PoseLocalVisualizer @@ -48,45 +49,59 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): """ def __init__( - self, - name: str = 'visualizer', - image: Optional[np.ndarray] = None, - vis_backends: Optional[Dict] = None, - save_dir: Optional[str] = None, - bbox_color: Optional[Union[str, Tuple[int]]] = 'green', - kpt_color: Optional[Union[str, Tuple[Tuple[int]]]] = 'red', - link_color: Optional[Union[str, Tuple[Tuple[int]]]] = None, - text_color: Optional[Union[str, Tuple[int]]] = (255, 255, 255), - skeleton: Optional[Union[List, Tuple]] = None, - line_width: Union[int, float] = 1, - radius: Union[int, float] = 3, - show_keypoint_weight: bool = False, - backend: str = 'opencv', - alpha: float = 0.8, - det_kpt_color: Optional[Union[str, Tuple[Tuple[int]]]] = None, - det_dataset_skeleton: Optional[Union[str, - Tuple[Tuple[int]]]] = None, - det_dataset_link_color: Optional[np.ndarray] = None): - super().__init__(name, image, vis_backends, save_dir, bbox_color, - kpt_color, link_color, text_color, skeleton, - line_width, radius, show_keypoint_weight, backend, - alpha) + self, + name: str = "visualizer", + image: Optional[np.ndarray] = None, + vis_backends: Optional[Dict] = None, + save_dir: Optional[str] = None, + bbox_color: Optional[Union[str, Tuple[int]]] = "green", + kpt_color: Optional[Union[str, Tuple[Tuple[int]]]] = "red", + link_color: Optional[Union[str, Tuple[Tuple[int]]]] = None, + text_color: Optional[Union[str, Tuple[int]]] = (255, 255, 255), + skeleton: Optional[Union[List, Tuple]] = None, + line_width: Union[int, float] = 1, + radius: Union[int, float] = 3, + show_keypoint_weight: bool = False, + backend: str = "opencv", + alpha: float = 0.8, + det_kpt_color: Optional[Union[str, Tuple[Tuple[int]]]] = None, + det_dataset_skeleton: Optional[Union[str, Tuple[Tuple[int]]]] = None, + det_dataset_link_color: Optional[np.ndarray] = None, + ): + super().__init__( + name, + image, + vis_backends, + save_dir, + bbox_color, + kpt_color, + link_color, + text_color, + skeleton, + line_width, + 
radius, + show_keypoint_weight, + backend, + alpha, + ) self.det_kpt_color = det_kpt_color self.det_dataset_skeleton = det_dataset_skeleton self.det_dataset_link_color = det_dataset_link_color - def _draw_3d_data_samples(self, - image: np.ndarray, - pose_samples: PoseDataSample, - draw_gt: bool = True, - kpt_thr: float = 0.3, - num_instances=-1, - axis_azimuth: float = 70, - axis_limit: float = 1.7, - axis_dist: float = 10.0, - axis_elev: float = 15.0, - show_kpt_idx: bool = False, - scores_2d: Optional[np.ndarray] = None): + def _draw_3d_data_samples( + self, + image: np.ndarray, + pose_samples: PoseDataSample, + draw_gt: bool = True, + kpt_thr: float = 0.3, + num_instances=-1, + axis_azimuth: float = 70, + axis_limit: float = 1.7, + axis_dist: float = 10.0, + axis_elev: float = 15.0, + show_kpt_idx: bool = False, + scores_2d: Optional[np.ndarray] = None, + ): """Draw keypoints and skeletons (optional) of GT or prediction. Args: @@ -121,12 +136,12 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): vis_width = max(image.shape) vis_height = vis_width - if 'pred_instances' in pose_samples: + if "pred_instances" in pose_samples: pred_instances = pose_samples.pred_instances else: pred_instances = InstanceData() if num_instances < 0: - if 'keypoints' in pred_instances: + if "keypoints" in pred_instances: num_instances = len(pred_instances) else: num_instances = 0 @@ -146,28 +161,18 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): num_fig *= 2 plt.ioff() - fig = plt.figure( - figsize=(vis_width * num_instances * 0.01, vis_height * 0.01)) + fig = plt.figure(figsize=(vis_width * num_instances * 0.01, vis_height * 0.01)) - def _draw_3d_instances_kpts(keypoints, - scores, - scores_2d, - keypoints_visible, - fig_idx, - show_kpt_idx, - title=None): + def _draw_3d_instances_kpts(keypoints, scores, scores_2d, keypoints_visible, fig_idx, show_kpt_idx, title=None): - for idx, (kpts, score, score_2d) in enumerate( - zip(keypoints, scores, scores_2d)): + for idx, (kpts, score, score_2d) in enumerate(zip(keypoints, scores, scores_2d)): - valid = np.logical_and(score >= kpt_thr, score_2d >= kpt_thr, - np.any(~np.isnan(kpts), axis=-1)) + valid = np.logical_and(score >= kpt_thr, score_2d >= kpt_thr, np.any(~np.isnan(kpts), axis=-1)) kpts_valid = kpts[valid] - ax = fig.add_subplot( - 1, num_fig, fig_idx * (idx + 1), projection='3d') + ax = fig.add_subplot(1, num_fig, fig_idx * (idx + 1), projection="3d") ax.view_init(elev=axis_elev, azim=axis_azimuth) - ax.set_aspect('auto') + ax.set_aspect("auto") ax.set_xticks([]) ax.set_yticks([]) ax.set_zticks([]) @@ -175,7 +180,7 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): ax.set_yticklabels([]) ax.set_zticklabels([]) if title: - ax.set_title(f'{title} ({idx})') + ax.set_title(f"{title} ({idx})") ax.dist = axis_dist x_c = np.mean(kpts_valid[:, 0]) if valid.any() else 0 @@ -184,8 +189,7 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): ax.set_xlim3d([x_c - axis_limit / 2, x_c + axis_limit / 2]) ax.set_ylim3d([y_c - axis_limit / 2, y_c + axis_limit / 2]) - ax.set_zlim3d( - [min(0, z_c - axis_limit / 2), z_c + axis_limit / 2]) + ax.set_zlim3d([min(0, z_c - axis_limit / 2), z_c + axis_limit / 2]) if self.kpt_color is None or isinstance(self.kpt_color, str): kpt_color = [self.kpt_color] * len(kpts) @@ -193,32 +197,30 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): kpt_color = self.kpt_color else: raise ValueError( - f'the length of kpt_color ' - f'({len(self.kpt_color)}) does not matches ' - f'that of keypoints ({len(kpts)})') + f"the length of kpt_color " 
f"({len(self.kpt_color)}) does not matches " f"that of keypoints ({len(kpts)})" + ) x_3d, y_3d, z_3d = np.split(kpts_valid[:, :3], [1, 2], axis=1) - kpt_color = kpt_color[valid] / 255. + kpt_color = kpt_color[valid] / 255.0 - ax.scatter(x_3d, y_3d, z_3d, marker='o', c=kpt_color) + ax.scatter(x_3d, y_3d, z_3d, marker="o", c=kpt_color) if show_kpt_idx: for kpt_idx in range(len(x_3d)): - ax.text(x_3d[kpt_idx][0], y_3d[kpt_idx][0], - z_3d[kpt_idx][0], str(kpt_idx)) + ax.text(x_3d[kpt_idx][0], y_3d[kpt_idx][0], z_3d[kpt_idx][0], str(kpt_idx)) if self.skeleton is not None and self.link_color is not None: - if self.link_color is None or isinstance( - self.link_color, str): + if self.link_color is None or isinstance(self.link_color, str): link_color = [self.link_color] * len(self.skeleton) elif len(self.link_color) == len(self.skeleton): link_color = self.link_color else: raise ValueError( - f'the length of link_color ' - f'({len(self.link_color)}) does not matches ' - f'that of skeleton ({len(self.skeleton)})') + f"the length of link_color " + f"({len(self.link_color)}) does not matches " + f"that of skeleton ({len(self.skeleton)})" + ) for sk_id, sk in enumerate(self.skeleton): sk_indices = [_i for _i in sk] @@ -227,18 +229,15 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): zs_3d = kpts[sk_indices, 2] kpt_score = score[sk_indices] kpt_score_2d = score_2d[sk_indices] - if kpt_score.min() > kpt_thr and kpt_score_2d.min( - ) > kpt_thr: + if kpt_score.min() > kpt_thr and kpt_score_2d.min() > kpt_thr: # matplotlib uses RGB color in [0, 1] value range - _color = link_color[sk_id] / 255. - ax.plot( - xs_3d, ys_3d, zs_3d, color=_color, zdir='z') + _color = link_color[sk_id] / 255.0 + ax.plot(xs_3d, ys_3d, zs_3d, color=_color, zdir="z") - if 'keypoints' in pred_instances: - keypoints = pred_instances.get('keypoints', - pred_instances.keypoints) + if "keypoints" in pred_instances: + keypoints = pred_instances.get("keypoints", pred_instances.keypoints) - if 'keypoint_scores' in pred_instances: + if "keypoint_scores" in pred_instances: scores = pred_instances.keypoint_scores else: scores = np.ones(keypoints.shape[:-1]) @@ -246,72 +245,58 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): if scores_2d is None: scores_2d = np.ones(keypoints.shape[:-1]) - if 'keypoints_visible' in pred_instances: + if "keypoints_visible" in pred_instances: keypoints_visible = pred_instances.keypoints_visible else: keypoints_visible = np.ones(keypoints.shape[:-1]) - _draw_3d_instances_kpts(keypoints, scores, scores_2d, - keypoints_visible, 1, show_kpt_idx, - 'Prediction') + _draw_3d_instances_kpts(keypoints, scores, scores_2d, keypoints_visible, 1, show_kpt_idx, "Prediction") - if draw_gt and 'gt_instances' in pose_samples: + if draw_gt and "gt_instances" in pose_samples: gt_instances = pose_samples.gt_instances - if 'lifting_target' in gt_instances: - keypoints = gt_instances.get('lifting_target', - gt_instances.lifting_target) + if "lifting_target" in gt_instances: + keypoints = gt_instances.get("lifting_target", gt_instances.lifting_target) scores = np.ones(keypoints.shape[:-1]) - if 'lifting_target_visible' in gt_instances: + if "lifting_target_visible" in gt_instances: keypoints_visible = gt_instances.lifting_target_visible else: keypoints_visible = np.ones(keypoints.shape[:-1]) - elif 'keypoints_gt' in gt_instances: - keypoints = gt_instances.get('keypoints_gt', - gt_instances.keypoints_gt) + elif "keypoints_gt" in gt_instances: + keypoints = gt_instances.get("keypoints_gt", gt_instances.keypoints_gt) scores = 
np.ones(keypoints.shape[:-1]) - if 'keypoints_visible' in gt_instances: + if "keypoints_visible" in gt_instances: keypoints_visible = gt_instances.keypoints_visible else: keypoints_visible = np.ones(keypoints.shape[:-1]) else: - raise ValueError('to visualize ground truth results, ' - 'data sample must contain ' - '"lifting_target" or "keypoints_gt"') + raise ValueError("to visualize ground truth results, " "data sample must contain " '"lifting_target" or "keypoints_gt"') if scores_2d is None: scores_2d = np.ones(keypoints.shape[:-1]) - _draw_3d_instances_kpts(keypoints, scores, scores_2d, - keypoints_visible, 2, show_kpt_idx, - 'Ground Truth') + _draw_3d_instances_kpts(keypoints, scores, scores_2d, keypoints_visible, 2, show_kpt_idx, "Ground Truth") # convert figure to numpy array fig.tight_layout() fig.canvas.draw() - pred_img_data = np.frombuffer( - fig.canvas.tostring_rgb(), dtype=np.uint8) + pred_img_data = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8) if not pred_img_data.any(): pred_img_data = np.full((vis_height, vis_width, 3), 255) else: width, height = fig.get_size_inches() * fig.get_dpi() - pred_img_data = pred_img_data.reshape( - int(height), - int(width) * num_instances, 3) + pred_img_data = pred_img_data.reshape(int(height), int(width) * num_instances, 3) plt.close(fig) return pred_img_data - def _draw_instances_kpts(self, - image: np.ndarray, - instances: InstanceData, - kpt_thr: float = 0.3, - show_kpt_idx: bool = False, - skeleton_style: str = 'mmpose'): + def _draw_instances_kpts( + self, image: np.ndarray, instances: InstanceData, kpt_thr: float = 0.3, show_kpt_idx: bool = False, skeleton_style: str = "mmpose" + ): """Draw keypoints and skeletons (optional) of GT or prediction. Args: @@ -333,53 +318,39 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): img_h, img_w, _ = image.shape scores = None - if 'keypoints' in instances: - keypoints = instances.get('transformed_keypoints', - instances.keypoints) + if "keypoints" in instances: + keypoints = instances.get("transformed_keypoints", instances.keypoints) - if 'keypoint_scores' in instances: + if "keypoint_scores" in instances: scores = instances.keypoint_scores else: scores = np.ones(keypoints.shape[:-1]) - if 'keypoints_visible' in instances: + if "keypoints_visible" in instances: keypoints_visible = instances.keypoints_visible else: keypoints_visible = np.ones(keypoints.shape[:-1]) - if skeleton_style == 'openpose': - keypoints_info = np.concatenate( - (keypoints, scores[..., None], keypoints_visible[..., - None]), - axis=-1) + if skeleton_style == "openpose": + keypoints_info = np.concatenate((keypoints, scores[..., None], keypoints_visible[..., None]), axis=-1) # compute neck joint neck = np.mean(keypoints_info[:, [5, 6]], axis=1) # neck score when visualizing pred - neck[:, 2:4] = np.logical_and( - keypoints_info[:, 5, 2:4] > kpt_thr, - keypoints_info[:, 6, 2:4] > kpt_thr).astype(int) - new_keypoints_info = np.insert( - keypoints_info, 17, neck, axis=1) - - mmpose_idx = [ - 17, 6, 8, 10, 7, 9, 12, 14, 16, 13, 15, 2, 1, 4, 3 - ] - openpose_idx = [ - 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17 - ] - new_keypoints_info[:, openpose_idx] = \ - new_keypoints_info[:, mmpose_idx] + neck[:, 2:4] = np.logical_and(keypoints_info[:, 5, 2:4] > kpt_thr, keypoints_info[:, 6, 2:4] > kpt_thr).astype(int) + new_keypoints_info = np.insert(keypoints_info, 17, neck, axis=1) + + mmpose_idx = [17, 6, 8, 10, 7, 9, 12, 14, 16, 13, 15, 2, 1, 4, 3] + openpose_idx = [1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17] + 
new_keypoints_info[:, openpose_idx] = new_keypoints_info[:, mmpose_idx] keypoints_info = new_keypoints_info - keypoints, scores, keypoints_visible = keypoints_info[ - ..., :2], keypoints_info[..., 2], keypoints_info[..., 3] + keypoints, scores, keypoints_visible = keypoints_info[..., :2], keypoints_info[..., 2], keypoints_info[..., 3] kpt_color = self.kpt_color if self.det_kpt_color is not None: kpt_color = self.det_kpt_color - for kpts, score, visible in zip(keypoints, scores, - keypoints_visible): + for kpts, score, visible in zip(keypoints, scores, keypoints_visible): kpts = np.array(kpts[..., :2], copy=False) if kpt_color is None or isinstance(kpt_color, str): @@ -387,14 +358,11 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): elif len(kpt_color) == len(kpts): kpt_color = kpt_color else: - raise ValueError(f'the length of kpt_color ' - f'({len(kpt_color)}) does not matches ' - f'that of keypoints ({len(kpts)})') + raise ValueError(f"the length of kpt_color " f"({len(kpt_color)}) does not matches " f"that of keypoints ({len(kpts)})") # draw each point on image for kid, kpt in enumerate(kpts): - if score[kid] < kpt_thr or not visible[ - kid] or kpt_color[kid] is None: + if score[kid] < kpt_thr or not visible[kid] or kpt_color[kid] is None: # skip the point that should not be drawn continue @@ -410,15 +378,17 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): face_colors=color, edge_colors=color, alpha=transparency, - line_widths=self.radius) + line_widths=self.radius, + ) if show_kpt_idx: self.draw_texts( str(kid), kpt, colors=color, font_sizes=self.radius * 3, - vertical_alignments='bottom', - horizontal_alignments='center') + vertical_alignments="bottom", + horizontal_alignments="center", + ) # draw links skeleton = self.skeleton @@ -434,9 +404,8 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): link_color = link_color else: raise ValueError( - f'the length of link_color ' - f'({len(link_color)}) does not matches ' - f'that of skeleton ({len(skeleton)})') + f"the length of link_color " f"({len(link_color)}) does not matches " f"that of skeleton ({len(skeleton)})" + ) for sk_id, sk in enumerate(skeleton): pos1 = (int(kpts[sk[0], 0]), int(kpts[sk[0], 1])) @@ -444,12 +413,19 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): if not (visible[sk[0]] and visible[sk[1]]): continue - if (pos1[0] <= 0 or pos1[0] >= img_w or pos1[1] <= 0 - or pos1[1] >= img_h or pos2[0] <= 0 - or pos2[0] >= img_w or pos2[1] <= 0 - or pos2[1] >= img_h or score[sk[0]] < kpt_thr - or score[sk[1]] < kpt_thr - or link_color[sk_id] is None): + if ( + pos1[0] <= 0 + or pos1[0] >= img_w + or pos1[1] <= 0 + or pos1[1] >= img_h + or pos2[0] <= 0 + or pos2[0] >= img_w + or pos2[1] <= 0 + or pos2[1] >= img_h + or score[sk[0]] < kpt_thr + or score[sk[1]] < kpt_thr + or link_color[sk_id] is None + ): # skip the link that should not be drawn continue X = np.array((pos1[0], pos2[0])) @@ -459,58 +435,50 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): color = tuple(int(c) for c in color) transparency = self.alpha if self.show_keypoint_weight: - transparency *= max( - 0, min(1, 0.5 * (score[sk[0]] + score[sk[1]]))) + transparency *= max(0, min(1, 0.5 * (score[sk[0]] + score[sk[1]]))) - if skeleton_style == 'openpose': + if skeleton_style == "openpose": mX = np.mean(X) mY = np.mean(Y) - length = ((Y[0] - Y[1])**2 + (X[0] - X[1])**2)**0.5 - angle = math.degrees( - math.atan2(Y[0] - Y[1], X[0] - X[1])) + length = ((Y[0] - Y[1]) ** 2 + (X[0] - X[1]) ** 2) ** 0.5 + angle = math.degrees(math.atan2(Y[0] - Y[1], X[0] - 
X[1])) stickwidth = 2 - polygons = cv2.ellipse2Poly( - (int(mX), int(mY)), - (int(length / 2), int(stickwidth)), int(angle), - 0, 360, 1) + polygons = cv2.ellipse2Poly((int(mX), int(mY)), (int(length / 2), int(stickwidth)), int(angle), 0, 360, 1) - self.draw_polygons( - polygons, - edge_colors=color, - face_colors=color, - alpha=transparency) + self.draw_polygons(polygons, edge_colors=color, face_colors=color, alpha=transparency) else: - self.draw_lines( - X, Y, color, line_widths=self.line_width) + self.draw_lines(X, Y, color, line_widths=self.line_width) return self.get_image(), scores @master_only - def add_datasample(self, - name: str, - image: np.ndarray, - data_sample: PoseDataSample, - det_data_sample: Optional[PoseDataSample] = None, - draw_gt: bool = True, - draw_pred: bool = True, - draw_2d: bool = True, - draw_bbox: bool = False, - show_kpt_idx: bool = False, - skeleton_style: str = 'mmpose', - dataset_2d: str = 'coco', - dataset_3d: str = 'h36m', - convert_keypoint: bool = True, - axis_azimuth: float = 70, - axis_limit: float = 1.7, - axis_dist: float = 10.0, - axis_elev: float = 15.0, - num_instances: int = -1, - show: bool = False, - wait_time: float = 0, - out_file: Optional[str] = None, - kpt_thr: float = 0.3, - step: int = 0) -> None: + def add_datasample( + self, + name: str, + image: np.ndarray, + data_sample: PoseDataSample, + det_data_sample: Optional[PoseDataSample] = None, + draw_gt: bool = True, + draw_pred: bool = True, + draw_2d: bool = True, + draw_bbox: bool = False, + show_kpt_idx: bool = False, + skeleton_style: str = "mmpose", + dataset_2d: str = "coco", + dataset_3d: str = "h36m", + convert_keypoint: bool = True, + axis_azimuth: float = 70, + axis_limit: float = 1.7, + axis_dist: float = 10.0, + axis_elev: float = 15.0, + num_instances: int = -1, + show: bool = False, + wait_time: float = 0, + out_file: Optional[str] = None, + kpt_thr: float = 0.3, + step: int = 0, + ) -> None: """Draw datasample and save to all backends. 
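`_draw_3d_data_samples` above lays out one 3D subplot per instance, with hidden ticks, a fixed viewing angle, and a fixed-size viewing cube centered on the pose. A hedged sketch of that matplotlib pattern with placeholder keypoints (matplotlib >= 3.2 is assumed for the auto-registered `3d` projection):

```python
# Hedged sketch of the per-instance 3D subplot in _draw_3d_data_samples.
# `kpts` is a random placeholder (K, 3) array.
import matplotlib.pyplot as plt
import numpy as np

kpts = np.random.rand(17, 3)  # placeholder 3D keypoints
axis_limit, axis_elev, axis_azimuth = 1.7, 15.0, 70

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection="3d")
ax.view_init(elev=axis_elev, azim=axis_azimuth)
for setter in (ax.set_xticks, ax.set_yticks, ax.set_zticks):
    setter([])  # hide ticks, as the visualizer does

# center a fixed-size viewing cube on the pose
x_c, y_c, z_c = kpts.mean(axis=0)
ax.set_xlim3d([x_c - axis_limit / 2, x_c + axis_limit / 2])
ax.set_ylim3d([y_c - axis_limit / 2, y_c + axis_limit / 2])
ax.set_zlim3d([min(0, z_c - axis_limit / 2), z_c + axis_limit / 2])

ax.scatter(kpts[:, 0], kpts[:, 1], kpts[:, 2], marker="o")
plt.close(fig)  # the visualizer converts the canvas to an array instead of showing it
```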
- If GT and prediction are plotted at the same time, they are @@ -576,23 +544,20 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): det_img_data = image.copy() # draw bboxes & keypoints - if (det_data_sample is not None - and 'pred_instances' in det_data_sample): + if det_data_sample is not None and "pred_instances" in det_data_sample: det_img_data, scores_2d = self._draw_instances_kpts( image=det_img_data, instances=det_data_sample.pred_instances, kpt_thr=kpt_thr, show_kpt_idx=show_kpt_idx, - skeleton_style=skeleton_style) + skeleton_style=skeleton_style, + ) if draw_bbox: - det_img_data = self._draw_instances_bbox( - det_img_data, det_data_sample.pred_instances) + det_img_data = self._draw_instances_bbox(det_img_data, det_data_sample.pred_instances) if scores_2d is not None and convert_keypoint: if scores_2d.ndim == 2: scores_2d = scores_2d[..., None] - scores_2d = np.squeeze( - convert_keypoint_definition(scores_2d, dataset_2d, dataset_3d), - axis=-1) + scores_2d = np.squeeze(convert_keypoint_definition(scores_2d, dataset_2d, dataset_3d), axis=-1) pred_img_data = self._draw_3d_data_samples( image.copy(), data_sample, @@ -603,7 +568,8 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): show_kpt_idx=show_kpt_idx, axis_dist=axis_dist, axis_elev=axis_elev, - scores_2d=scores_2d) + scores_2d=scores_2d, + ) # merge visualization results if det_img_data is not None: @@ -613,9 +579,11 @@ class Pose3dLocalVisualizer(PoseLocalVisualizer): det_img_data, height // 2, (height // 2 + 1) if height % 2 == 1 else height // 2, - width // 2, (width // 2 + 1) if width % 2 == 1 else width // 2, + width // 2, + (width // 2 + 1) if width % 2 == 1 else width // 2, cv2.BORDER_CONSTANT, - value=(255, 255, 255)) + value=(255, 255, 255), + ) drawn_img = np.concatenate((det_img_data, pred_img_data), axis=1) else: drawn_img = pred_img_data diff --git a/mmpose/visualization/opencv_backend_visualizer.py b/mmpose/visualization/opencv_backend_visualizer.py index 9604d07fead0187fc08e084f07f7714406072b93..2cdaacc271343787816e0c34516b692255a1d50f 100644 --- a/mmpose/visualization/opencv_backend_visualizer.py +++ b/mmpose/visualization/opencv_backend_visualizer.py @@ -29,15 +29,11 @@ class OpencvBackendVisualizer(Visualizer): alpha (int, float): The transparency of bboxes. Defaults to ``1.0`` """ - def __init__(self, - name='visualizer', - backend: str = 'matplotlib', - *args, - **kwargs): + def __init__(self, name="visualizer", backend: str = "matplotlib", *args, **kwargs): super().__init__(name, *args, **kwargs) - assert backend in ('opencv', 'matplotlib'), f'the argument ' \ - f'\'backend\' must be either \'opencv\' or \'matplotlib\', ' \ - f'but got \'{backend}\'.' + assert backend in ("opencv", "matplotlib"), ( + f"the argument " f"'backend' must be either 'opencv' or 'matplotlib', " f"but got '{backend}'." + ) self.backend = backend @master_only @@ -49,25 +45,19 @@ class OpencvBackendVisualizer(Visualizer): backend (str): The backend to save the image. 
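The merge step in the hunk above pads the 2D detection overlay with a white border so it can sit next to the 3D rendering. A small sketch of that `cv2.copyMakeBorder` arithmetic with placeholder arrays:

```python
# Sketch of the merge step above: pad the 2D overlay so its height matches
# the 3D rendering, then place the two side by side. The arrays stand in for
# det_img_data and pred_img_data.
import cv2
import numpy as np

det_img = np.zeros((480, 640, 3), dtype=np.uint8)   # 2D detection overlay
pred_img = np.zeros((960, 960, 3), dtype=np.uint8)  # 3D matplotlib rendering

height = pred_img.shape[0] - det_img.shape[0]
width = pred_img.shape[1] - det_img.shape[1]
det_img = cv2.copyMakeBorder(
    det_img,
    height // 2,
    (height // 2 + 1) if height % 2 == 1 else height // 2,  # odd gap: extra row at the bottom
    width // 2,
    (width // 2 + 1) if width % 2 == 1 else width // 2,
    cv2.BORDER_CONSTANT,
    value=(255, 255, 255),
)
drawn = np.concatenate((det_img, pred_img), axis=1)  # heights now match
```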
""" assert image is not None - image = image.astype('uint8') + image = image.astype("uint8") self._image = image self.width, self.height = image.shape[1], image.shape[0] - self._default_font_size = max( - np.sqrt(self.height * self.width) // 90, 10) + self._default_font_size = max(np.sqrt(self.height * self.width) // 90, 10) - if self.backend == 'matplotlib': + if self.backend == "matplotlib": # add a small 1e-2 to avoid precision lost due to matplotlib's # truncation (https://github.com/matplotlib/matplotlib/issues/15363) # noqa - self.fig_save.set_size_inches( # type: ignore - (self.width + 1e-2) / self.dpi, - (self.height + 1e-2) / self.dpi) + self.fig_save.set_size_inches((self.width + 1e-2) / self.dpi, (self.height + 1e-2) / self.dpi) # type: ignore # self.canvas = mpl.backends.backend_cairo.FigureCanvasCairo(fig) self.ax_save.cla() self.ax_save.axis(False) - self.ax_save.imshow( - image, - extent=(0, self.width, self.height, 0), - interpolation='none') + self.ax_save.imshow(image, extent=(0, self.width, self.height, 0), interpolation="none") @master_only def get_image(self) -> np.ndarray: @@ -76,20 +66,21 @@ class OpencvBackendVisualizer(Visualizer): Returns: np.ndarray: the drawn image which channel is RGB. """ - assert self._image is not None, 'Please set image using `set_image`' - if self.backend == 'matplotlib': + assert self._image is not None, "Please set image using `set_image`" + if self.backend == "matplotlib": return super().get_image() else: return self._image @master_only - def draw_circles(self, - center: Union[np.ndarray, torch.Tensor], - radius: Union[np.ndarray, torch.Tensor], - face_colors: Union[str, tuple, List[str], - List[tuple]] = 'none', - alpha: float = 1.0, - **kwargs) -> 'Visualizer': + def draw_circles( + self, + center: Union[np.ndarray, torch.Tensor], + radius: Union[np.ndarray, torch.Tensor], + face_colors: Union[str, tuple, List[str], List[tuple]] = "none", + alpha: float = 1.0, + **kwargs, + ) -> "Visualizer": """Draw single or multiple circles. Args: @@ -120,29 +111,19 @@ class OpencvBackendVisualizer(Visualizer): alpha (Union[int, float]): The transparency of circles. Defaults to 0.8. 
""" - if self.backend == 'matplotlib': - super().draw_circles( - center=center, - radius=radius, - face_colors=face_colors, - alpha=alpha, - **kwargs) - elif self.backend == 'opencv': + if self.backend == "matplotlib": + super().draw_circles(center=center, radius=radius, face_colors=face_colors, alpha=alpha, **kwargs) + elif self.backend == "opencv": if isinstance(face_colors, str): face_colors = mmcv.color_val(face_colors)[::-1] if alpha == 1.0: - self._image = cv2.circle(self._image, - (int(center[0]), int(center[1])), - int(radius), face_colors, -1) + self._image = cv2.circle(self._image, (int(center[0]), int(center[1])), int(radius), face_colors, -1) else: - img = cv2.circle(self._image.copy(), - (int(center[0]), int(center[1])), int(radius), - face_colors, -1) - self._image = cv2.addWeighted(self._image, 1 - alpha, img, - alpha, 0) + img = cv2.circle(self._image.copy(), (int(center[0]), int(center[1])), int(radius), face_colors, -1) + self._image = cv2.addWeighted(self._image, 1 - alpha, img, alpha, 0) else: - raise ValueError(f'got unsupported backend {self.backend}') + raise ValueError(f"got unsupported backend {self.backend}") @master_only def draw_texts( @@ -150,12 +131,12 @@ class OpencvBackendVisualizer(Visualizer): texts: Union[str, List[str]], positions: Union[np.ndarray, torch.Tensor], font_sizes: Optional[Union[int, List[int]]] = None, - colors: Union[str, tuple, List[str], List[tuple]] = 'g', - vertical_alignments: Union[str, List[str]] = 'top', - horizontal_alignments: Union[str, List[str]] = 'left', + colors: Union[str, tuple, List[str], List[tuple]] = "g", + vertical_alignments: Union[str, List[str]] = "top", + horizontal_alignments: Union[str, List[str]] = "left", bboxes: Optional[Union[dict, List[dict]]] = None, **kwargs, - ) -> 'Visualizer': + ) -> "Visualizer": """Draw single or multiple text boxes. 
Args: @@ -218,7 +199,7 @@ class OpencvBackendVisualizer(Visualizer): `New in version 0.6.0.` """ # noqa: E501 - if self.backend == 'matplotlib': + if self.backend == "matplotlib": super().draw_texts( texts=texts, positions=positions, @@ -227,48 +208,48 @@ class OpencvBackendVisualizer(Visualizer): vertical_alignments=vertical_alignments, horizontal_alignments=horizontal_alignments, bboxes=bboxes, - **kwargs) + **kwargs, + ) - elif self.backend == 'opencv': + elif self.backend == "opencv": font_scale = max(0.1, font_sizes / 30) thickness = max(1, font_sizes // 15) - text_size, text_baseline = cv2.getTextSize(texts, - cv2.FONT_HERSHEY_DUPLEX, - font_scale, thickness) + text_size, text_baseline = cv2.getTextSize(texts, cv2.FONT_HERSHEY_DUPLEX, font_scale, thickness) x = int(positions[0]) - if horizontal_alignments == 'right': + if horizontal_alignments == "right": x = max(0, x - text_size[0]) y = int(positions[1]) - if vertical_alignments == 'top': + if vertical_alignments == "top": y = min(self.height, y + text_size[1]) if bboxes is not None: - bbox_color = bboxes[0]['facecolor'] + bbox_color = bboxes[0]["facecolor"] if isinstance(bbox_color, str): bbox_color = mmcv.color_val(bbox_color)[::-1] y = y - text_baseline // 2 self._image = cv2.rectangle( - self._image, (x, y - text_size[1] - text_baseline // 2), - (x + text_size[0], y + text_baseline // 2), bbox_color, - cv2.FILLED) - - self._image = cv2.putText(self._image, texts, (x, y), - cv2.FONT_HERSHEY_SIMPLEX, font_scale, - colors, thickness - 1) + self._image, + (x, y - text_size[1] - text_baseline // 2), + (x + text_size[0], y + text_baseline // 2), + bbox_color, + cv2.FILLED, + ) + + self._image = cv2.putText(self._image, texts, (x, y), cv2.FONT_HERSHEY_SIMPLEX, font_scale, colors, thickness - 1) else: - raise ValueError(f'got unsupported backend {self.backend}') + raise ValueError(f"got unsupported backend {self.backend}") @master_only - def draw_bboxes(self, - bboxes: Union[np.ndarray, torch.Tensor], - edge_colors: Union[str, tuple, List[str], - List[tuple]] = 'g', - line_widths: Union[Union[int, float], - List[Union[int, float]]] = 2, - **kwargs) -> 'Visualizer': + def draw_bboxes( + self, + bboxes: Union[np.ndarray, torch.Tensor], + edge_colors: Union[str, tuple, List[str], List[tuple]] = "g", + line_widths: Union[Union[int, float], List[Union[int, float]]] = 2, + **kwargs, + ) -> "Visualizer": """Draw single or multiple bboxes. Args: @@ -297,32 +278,23 @@ class OpencvBackendVisualizer(Visualizer): alpha (Union[int, float]): The transparency of bboxes. Defaults to 0.8. 
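The OpenCV text path above measures the string with `cv2.getTextSize`, shifts the anchor for right/top alignment, and optionally paints a filled background box before the text. A standalone sketch with placeholder values:

```python
# Sketch of the OpenCV text path: measure, align, box, draw.
import cv2
import numpy as np

img = np.full((200, 400, 3), 255, dtype=np.uint8)
text, font_size = "id: 3", 30
font_scale = max(0.1, font_size / 30)
thickness = max(1, font_size // 15)

(text_w, text_h), baseline = cv2.getTextSize(text, cv2.FONT_HERSHEY_DUPLEX, font_scale, thickness)

x, y = 380, 10
x = max(0, x - text_w)             # horizontal_alignments == "right"
y = min(img.shape[0], y + text_h)  # vertical_alignments == "top"

y -= baseline // 2  # nudge so the box hugs the glyphs
img = cv2.rectangle(
    img, (x, y - text_h - baseline // 2), (x + text_w, y + baseline // 2), (0, 255, 0), cv2.FILLED
)
img = cv2.putText(img, text, (x, y), cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0, 0, 0), thickness)
```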
""" - if self.backend == 'matplotlib': - super().draw_bboxes( - bboxes=bboxes, - edge_colors=edge_colors, - line_widths=line_widths, - **kwargs) - - elif self.backend == 'opencv': - self._image = mmcv.imshow_bboxes( - self._image, - bboxes, - edge_colors, - top_k=-1, - thickness=line_widths, - show=False) + if self.backend == "matplotlib": + super().draw_bboxes(bboxes=bboxes, edge_colors=edge_colors, line_widths=line_widths, **kwargs) + + elif self.backend == "opencv": + self._image = mmcv.imshow_bboxes(self._image, bboxes, edge_colors, top_k=-1, thickness=line_widths, show=False) else: - raise ValueError(f'got unsupported backend {self.backend}') + raise ValueError(f"got unsupported backend {self.backend}") @master_only - def draw_lines(self, - x_datas: Union[np.ndarray, torch.Tensor], - y_datas: Union[np.ndarray, torch.Tensor], - colors: Union[str, tuple, List[str], List[tuple]] = 'g', - line_widths: Union[Union[int, float], - List[Union[int, float]]] = 2, - **kwargs) -> 'Visualizer': + def draw_lines( + self, + x_datas: Union[np.ndarray, torch.Tensor], + y_datas: Union[np.ndarray, torch.Tensor], + colors: Union[str, tuple, List[str], List[tuple]] = "g", + line_widths: Union[Union[int, float], List[Union[int, float]]] = 2, + **kwargs, + ) -> "Visualizer": """Draw single or multiple line segments. Args: @@ -349,33 +321,24 @@ class OpencvBackendVisualizer(Visualizer): If ``line_widths`` is single value, all the lines will have the same linewidth. Defaults to 2. """ - if self.backend == 'matplotlib': - super().draw_lines( - x_datas=x_datas, - y_datas=y_datas, - colors=colors, - line_widths=line_widths, - **kwargs) + if self.backend == "matplotlib": + super().draw_lines(x_datas=x_datas, y_datas=y_datas, colors=colors, line_widths=line_widths, **kwargs) - elif self.backend == 'opencv': + elif self.backend == "opencv": if isinstance(colors, str): colors = mmcv.color_val(colors)[::-1] - self._image = cv2.line( - self._image, (x_datas[0], y_datas[0]), - (x_datas[1], y_datas[1]), - colors, - thickness=line_widths) + self._image = cv2.line(self._image, (x_datas[0], y_datas[0]), (x_datas[1], y_datas[1]), colors, thickness=line_widths) else: - raise ValueError(f'got unsupported backend {self.backend}') + raise ValueError(f"got unsupported backend {self.backend}") @master_only - def draw_polygons(self, - polygons: Union[Union[np.ndarray, torch.Tensor], - List[Union[np.ndarray, torch.Tensor]]], - edge_colors: Union[str, tuple, List[str], - List[tuple]] = 'g', - alpha: float = 1.0, - **kwargs) -> 'Visualizer': + def draw_polygons( + self, + polygons: Union[Union[np.ndarray, torch.Tensor], List[Union[np.ndarray, torch.Tensor]]], + edge_colors: Union[str, tuple, List[str], List[tuple]] = "g", + alpha: float = 1.0, + **kwargs, + ) -> "Visualizer": """Draw single or multiple bboxes. Args: @@ -405,31 +368,20 @@ class OpencvBackendVisualizer(Visualizer): alpha (Union[int, float]): The transparency of polygons. Defaults to 0.8. 
""" - if self.backend == 'matplotlib': - super().draw_polygons( - polygons=polygons, - edge_colors=edge_colors, - alpha=alpha, - **kwargs) - - elif self.backend == 'opencv': + if self.backend == "matplotlib": + super().draw_polygons(polygons=polygons, edge_colors=edge_colors, alpha=alpha, **kwargs) + + elif self.backend == "opencv": if alpha == 1.0: - self._image = cv2.fillConvexPoly(self._image, polygons, - edge_colors) + self._image = cv2.fillConvexPoly(self._image, polygons, edge_colors) else: - img = cv2.fillConvexPoly(self._image.copy(), polygons, - edge_colors) - self._image = cv2.addWeighted(self._image, 1 - alpha, img, - alpha, 0) + img = cv2.fillConvexPoly(self._image.copy(), polygons, edge_colors) + self._image = cv2.addWeighted(self._image, 1 - alpha, img, alpha, 0) else: - raise ValueError(f'got unsupported backend {self.backend}') + raise ValueError(f"got unsupported backend {self.backend}") @master_only - def show(self, - drawn_img: Optional[np.ndarray] = None, - win_name: str = 'image', - wait_time: float = 0., - continue_key=' ') -> None: + def show(self, drawn_img: Optional[np.ndarray] = None, win_name: str = "image", wait_time: float = 0.0, continue_key=" ") -> None: """Show the drawn image. Args: @@ -442,24 +394,20 @@ class OpencvBackendVisualizer(Visualizer): continue_key (str): The key for users to continue. Defaults to the space key. """ - if self.backend == 'matplotlib': - super().show( - drawn_img=drawn_img, - win_name=win_name, - wait_time=wait_time, - continue_key=continue_key) - - elif self.backend == 'opencv': + if self.backend == "matplotlib": + super().show(drawn_img=drawn_img, win_name=win_name, wait_time=wait_time, continue_key=continue_key) + + elif self.backend == "opencv": # Keep images are shown in the same window, and the title of window # will be updated with `win_name`. if not hasattr(self, win_name): self._cv_win_name = win_name - cv2.namedWindow(winname=f'{id(self)}') - cv2.setWindowTitle(f'{id(self)}', win_name) + cv2.namedWindow(winname=f"{id(self)}") + cv2.setWindowTitle(f"{id(self)}", win_name) else: - cv2.setWindowTitle(f'{id(self)}', win_name) + cv2.setWindowTitle(f"{id(self)}", win_name) shown_img = self.get_image() if drawn_img is None else drawn_img cv2.imshow(str(id(self)), mmcv.bgr2rgb(shown_img)) cv2.waitKey(int(np.ceil(wait_time * 1000))) else: - raise ValueError(f'got unsupported backend {self.backend}') + raise ValueError(f"got unsupported backend {self.backend}") diff --git a/mmpose/visualization/simcc_vis.py b/mmpose/visualization/simcc_vis.py index 3a5b602fb5c4ffe2a46ddb2cf09a2cd4501b1664..eb138538fe63193ab4fd3d5858da1527f9d90afd 100644 --- a/mmpose/visualization/simcc_vis.py +++ b/mmpose/visualization/simcc_vis.py @@ -9,12 +9,9 @@ from torchvision.transforms import ToPILImage class SimCCVisualizer: - def draw_instance_xy_heatmap(self, - heatmap: torch.Tensor, - overlaid_image: Optional[np.ndarray], - n: int = 20, - mix: bool = True, - weight: float = 0.5): + def draw_instance_xy_heatmap( + self, heatmap: torch.Tensor, overlaid_image: Optional[np.ndarray], n: int = 20, mix: bool = True, weight: float = 0.5 + ): """Draw heatmaps of GT or prediction. 
Args: @@ -31,18 +28,16 @@ class SimCCVisualizer: xy_heatmap, K = self.split_simcc_xy(heatmap) K = K if K <= n else n blank_size = tuple(heatmap.size()[1:]) - maps = {'x': [], 'y': []} + maps = {"x": [], "y": []} for i in xy_heatmap: - x, y = self.draw_1d_heatmaps(i['x']), self.draw_1d_heatmaps(i['y']) - maps['x'].append(x) - maps['y'].append(y) + x, y = self.draw_1d_heatmaps(i["x"]), self.draw_1d_heatmaps(i["y"]) + maps["x"].append(x) + maps["y"].append(y) white = self.creat_blank(blank_size, K) map2d = self.draw_2d_heatmaps(heatmap2d) if mix: - map2d = cv.addWeighted(overlaid_image, 1 - weight, map2d, weight, - 0) - self.image_cover(white, map2d, int(blank_size[1] * 0.1), - int(blank_size[0] * 0.1)) + map2d = cv.addWeighted(overlaid_image, 1 - weight, map2d, weight, 0) + self.image_cover(white, map2d, int(blank_size[1] * 0.1), int(blank_size[0] * 0.1)) white = self.add_1d_heatmaps(maps, white, blank_size, K) return white @@ -55,7 +50,7 @@ class SimCCVisualizer: for _ in range(k): xy_dict = {} single_heatmap = heatmap[_] - xy_dict['x'], xy_dict['y'] = self.merge_maps(single_heatmap) + xy_dict["x"], xy_dict["y"] = self.merge_maps(single_heatmap) maps.append(xy_dict) return maps, k @@ -69,7 +64,7 @@ class SimCCVisualizer: """Draw one-dimensional heatmap.""" size = heatmap_1d.size() length = max(size) - np_heatmap = ToPILImage()(heatmap_1d).convert('RGB') + np_heatmap = ToPILImage()(heatmap_1d).convert("RGB") cv_img = cv.cvtColor(np.asarray(np_heatmap), cv.COLOR_RGB2BGR) if size[0] < size[1]: cv_img = cv.resize(cv_img, (length, 15)) @@ -78,59 +73,41 @@ class SimCCVisualizer: single_map = cv.applyColorMap(cv_img, cv.COLORMAP_JET) return single_map - def creat_blank(self, - size: Union[list, tuple], - K: int = 20, - interval: int = 10): + def creat_blank(self, size: Union[list, tuple], K: int = 20, interval: int = 10): """Create the background.""" - blank_height = int( - max(size[0] * 2, size[0] * 1.1 + (K + 1) * (15 + interval))) - blank_width = int( - max(size[1] * 2, size[1] * 1.1 + (K + 1) * (15 + interval))) + blank_height = int(max(size[0] * 2, size[0] * 1.1 + (K + 1) * (15 + interval))) + blank_width = int(max(size[1] * 2, size[1] * 1.1 + (K + 1) * (15 + interval))) blank = np.zeros((blank_height, blank_width, 3), np.uint8) blank.fill(255) return blank def draw_2d_heatmaps(self, heatmap_2d): """Draw a two-dimensional heatmap fused with the original image.""" - np_heatmap = ToPILImage()(heatmap_2d).convert('RGB') + np_heatmap = ToPILImage()(heatmap_2d).convert("RGB") cv_img = cv.cvtColor(np.asarray(np_heatmap), cv.COLOR_RGB2BGR) map_2d = cv.applyColorMap(cv_img, cv.COLORMAP_JET) return map_2d - def image_cover(self, background: np.ndarray, foreground: np.ndarray, - x: int, y: int): + def image_cover(self, background: np.ndarray, foreground: np.ndarray, x: int, y: int): """Paste the foreground on the background.""" fore_size = foreground.shape - background[y:y + fore_size[0], x:x + fore_size[1]] = foreground + background[y : y + fore_size[0], x : x + fore_size[1]] = foreground return background - def add_1d_heatmaps(self, - maps: dict, - background: np.ndarray, - map2d_size: Union[tuple, list], - K: int, - interval: int = 10): + def add_1d_heatmaps(self, maps: dict, background: np.ndarray, map2d_size: Union[tuple, list], K: int, interval: int = 10): """Paste one-dimensional heatmaps onto the background in turn.""" - y_startpoint, x_startpoint = [int(1.1*map2d_size[1]), - int(0.1*map2d_size[0])],\ - [int(0.1*map2d_size[1]), - int(1.1*map2d_size[0])] + y_startpoint, x_startpoint = [int(1.1 * 
map2d_size[1]), int(0.1 * map2d_size[0])], [ + int(0.1 * map2d_size[1]), + int(1.1 * map2d_size[0]), + ] x_startpoint[1] += interval * 2 y_startpoint[0] += interval * 2 add = interval + 10 for i in range(K): - self.image_cover(background, maps['x'][i], x_startpoint[0], - x_startpoint[1]) - cv.putText(background, str(i), - (x_startpoint[0] - 30, x_startpoint[1] + 10), - cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2) - self.image_cover(background, maps['y'][i], y_startpoint[0], - y_startpoint[1]) - cv.putText(background, str(i), - (y_startpoint[0], y_startpoint[1] - 5), - cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2) + self.image_cover(background, maps["x"][i], x_startpoint[0], x_startpoint[1]) + cv.putText(background, str(i), (x_startpoint[0] - 30, x_startpoint[1] + 10), cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2) + self.image_cover(background, maps["y"][i], y_startpoint[0], y_startpoint[1]) + cv.putText(background, str(i), (y_startpoint[0], y_startpoint[1] - 5), cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2) x_startpoint[1] += add y_startpoint[0] += add - return background[:x_startpoint[1] + y_startpoint[1] + - 1, :y_startpoint[0] + x_startpoint[0] + 1] + return background[: x_startpoint[1] + y_startpoint[1] + 1, : y_startpoint[0] + x_startpoint[0] + 1] diff --git a/pmpose/__init__.py b/pmpose/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..8529c68c2ce47713132c719ad0c8627f2ff5d064 --- /dev/null +++ b/pmpose/__init__.py @@ -0,0 +1,12 @@ +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. + +""" +PMPose package - Public API for pose estimation. + +This package provides a stable wrapper around MaskPose (preparing for PMPose) +with a user-friendly interface for pose estimation tasks. +""" + +from .api import PMPose + +__all__ = ["PMPose"] diff --git a/pmpose/api.py b/pmpose/api.py new file mode 100644 index 0000000000000000000000000000000000000000..b6cbe0f29b7944d3b8fb658a4e57fc36efa5b320 --- /dev/null +++ b/pmpose/api.py @@ -0,0 +1,393 @@ +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. + +""" +Public API for PMPose (MaskPose) wrapper. + +This module provides a stable, user-friendly interface for pose estimation +using the MaskPose model. It handles model initialization, inference, +and visualization while preparing for future PMPose model integration. + +Note: Current implementation uses MaskPose and returns dummy presence/visibility +values to maintain API compatibility with future PMPose model. 
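Before the implementation, a compact quickstart for the public API defined in this file, following the class's own docstring example (the image path is a placeholder, and downloading a checkpoint from `PRETRAINED_URLS` is assumed to succeed):

```python
# Quickstart for the PMPose class defined below.
from pmpose import PMPose

pose_model = PMPose(device="cuda", variant="PMPose-b", from_pretrained=True)

keypoints, presence, visibility, heatmaps = pose_model.predict(
    image="path/to/image.jpg",
    bboxes=[[100, 100, 300, 400]],  # one person, [x1, y1, x2, y2]
    return_probmaps=False,          # heatmaps stays None in this case
)
print(keypoints.shape)  # (1, K, 3): [x, y, score] per keypoint
```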
+""" + +import os +from pathlib import Path +from typing import List, Optional, Tuple, Union + +import cv2 +import numpy as np +import torch +from mmengine.structures import InstanceData + +from mmpose.apis import inference_topdown, init_model as init_pose_estimator + +from .mm_utils import run_MMPose as _internal_run_mmpose +from .posevis_lite import pose_visualization + +BMP_ROOT = Path(__file__).parent.parent + +# Pretrained model URLs +PRETRAINED_URLS = { + # "pmpose-default": "https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/MaskPose-b.pth", + # "maskpose-b": "https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/MaskPose-b.pth", + # MaskPose + "MaskPose-s": "https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/MaskPose/MaskPose-s-1.1.0.pth", + "MaskPose-b": "https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/MaskPose/MaskPose-b-1.1.0.pth", + "MaskPose-l": "https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/MaskPose/MaskPose-l-1.1.0.pth", + "MaskPose-h": "https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/MaskPose/MaskPose-h-1.1.0.pth", + # MaskPose whole body + "MaskPose-s-wb": "TBD", + "MaskPose-b-wb": "TBD", + "MaskPose-l-wb": "TBD", + "MaskPose-h-wb": "TBD", + # PMPose + "PMPose-s": "https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/PMPose/PMPose-s-1.0.0.pth", + "PMPose-b": "https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/PMPose/PMPose-b-1.0.0.pth", + "PMPose-l": "https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/PMPose/PMPose-l-1.0.0.pth", + "PMPose-h": "https://huggingface.co/vrg-prague/BBoxMaskPose/resolve/main/PMPose/PMPose-h-1.0.0.pth", +} + +# Default config paths (relative to package root) +DEFAULT_CONFIGS = { + # MaskPose + "MaskPose-s": "mmpose/configs/MaskPose/MaskPose-s-1.1.0.py", + "MaskPose-b": "mmpose/configs/MaskPose/MaskPose-b-1.1.0.py", + "MaskPose-l": "mmpose/configs/MaskPose/MaskPose-l-1.1.0.py", + "MaskPose-h": "mmpose/configs/MaskPose/MaskPose-h-1.1.0.py", + # MaskPose-wb (whole body) + "MaskPose-s-wb": "mmpose/configs/MaskPose/MaskPose-s-wb-1.1.0.py", + "MaskPose-b-wb": "mmpose/configs/MaskPose/MaskPose-b-wb-1.1.0.py", + "MaskPose-l-wb": "mmpose/configs/MaskPose/MaskPose-l-wb-1.1.0.py", + "MaskPose-h-wb": "mmpose/configs/MaskPose/MaskPose-h-wb-1.1.0.py", + # PMPose + "PMPose-s": "mmpose/configs/ProbMaskPose/PMPose-s-1.0.0.py", + "PMPose-b": "mmpose/configs/ProbMaskPose/PMPose-b-1.0.0.py", + "PMPose-l": "mmpose/configs/ProbMaskPose/PMPose-l-1.0.0.py", + "PMPose-h": "mmpose/configs/ProbMaskPose/PMPose-h-1.0.0.py", +} + + +class PMPose: + """ + Public wrapper API for PMPose (currently MaskPose) pose estimation. + + This class provides a torch.hub-like interface for pose estimation, + handling model initialization, inference, and visualization. + + Example: + >>> pose_model = PMPose(device="cuda") + >>> keypoints, presence, visibility, heatmaps = pose_model.predict( + ... image="path/to/image.jpg", + ... bboxes=[[100, 100, 300, 400]], + ... return_probmaps=False + ... ) + """ + + def __init__( + self, + device: str = "cuda", + variant: str = "PMPose-b", + from_pretrained: bool = False, + pretrained_id: Optional[str] = None, + config_path: Optional[str] = None, + ): + """ + Initialize PMPose model. + + Args: + device (str): Device for inference ('cuda' or 'cpu'). Defaults to 'cuda'. + variant (str): Model variant to use. Defaults to 'PMPose-b'. + from_pretrained (bool): Whether to load pretrained weights. Defaults to False. 
+            pretrained_id (str, optional): ID for pretrained model from PRETRAINED_URLS.
+            config_path (str, optional): Path to custom config file. Overrides variant.
+        """
+        self.device = device
+        self.variant = variant
+        self._model = None
+
+        # Determine config path
+        if config_path is not None:
+            self.config_path = config_path
+        elif variant in DEFAULT_CONFIGS:
+            self.config_path = os.path.join(BMP_ROOT, DEFAULT_CONFIGS[variant])
+        else:
+            raise ValueError(f"Unknown variant '{variant}'. Available: {list(DEFAULT_CONFIGS.keys())}")
+
+        # Determine checkpoint path
+        checkpoint_path = None
+        if from_pretrained:
+            if pretrained_id is None:
+                pretrained_id = variant  # Use variant as default pretrained_id
+            if pretrained_id not in PRETRAINED_URLS:
+                raise ValueError(f"Unknown pretrained_id '{pretrained_id}'. " f"Available: {list(PRETRAINED_URLS.keys())}")
+            checkpoint_path = PRETRAINED_URLS[pretrained_id]
+
+        # Initialize model
+        self._load_model(checkpoint_path)
+
+    def _load_model(self, checkpoint_path: Optional[str] = None):
+        """Load the pose estimation model."""
+        cfg_options = dict(model=dict(test_cfg=dict(output_heatmaps=False)))
+        self._model = init_pose_estimator(
+            self.config_path,
+            checkpoint_path,
+            device=self.device,
+            cfg_options=cfg_options,
+        )
+
+    def to(self, device: str):
+        """
+        Move model to specified device.
+
+        Args:
+            device (str): Target device ('cuda' or 'cpu').
+
+        Returns:
+            PMPose: Self for chaining.
+        """
+        self.device = device
+        if self._model is not None:
+            self._model.to(device)
+        return self
+
+    def set_device(self, device_str: str):
+        """Alias for to(). Named set_device rather than device because the
+        instance attribute self.device assigned in __init__ would shadow a
+        method of the same name and make it uncallable."""
+        return self.to(device_str)
+
+    def load_from_file(self, path: str) -> None:
+        """
+        Load model weights from a local file.
+
+        Args:
+            path (str): Path to checkpoint file.
+        """
+        if not os.path.exists(path):
+            raise FileNotFoundError(f"Checkpoint file not found: {path}")
+
+        # Reload model with new checkpoint
+        self._load_model(checkpoint_path=path)
+
+    def predict(
+        self,
+        image: Union[str, np.ndarray],
+        bboxes: Union[List, np.ndarray],
+        masks: Optional[Union[List, np.ndarray]] = None,
+        return_probmaps: bool = False,
+    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, Optional[np.ndarray]]:
+        """
+        Run pose estimation on image given bounding boxes.
+
+        Args:
+            image: Image path (str) or BGR numpy array.
+            bboxes: List/array of N bounding boxes in [x1, y1, x2, y2] format.
+            masks: Optional instance masks. Can be:
+                - List of (H, W) boolean/0-1 numpy arrays
+                - numpy array of shape (N, H, W)
+                - None (no masks)
+            return_probmaps: If True, return heatmaps. Defaults to False.
+
+        Returns:
+            Tuple containing:
+                - keypoints: (N, K, 3) array with [x, y, score] in image coordinates
+                - presence: (N, K, 1) array with presence probabilities (dummy for MaskPose)
+                - visibility: (N, K, 1) array with visibility flags (dummy for MaskPose)
+                - heatmaps: (N, K, H, W) if return_probmaps=True, else None
+        """
+        # Load image if path is provided
+        if isinstance(image, str):
+            img = cv2.imread(image)
+            if img is None:
+                raise ValueError(f"Failed to load image from {image}")
+        else:
+            img = image
+
+        # Convert bboxes to numpy array
+        if isinstance(bboxes, list):
+            bboxes = np.array(bboxes, dtype=np.float32)
+
+        # Handle empty bboxes
+        if len(bboxes) == 0:
+            num_kpts = 17  # COCO format
+            return (
+                np.zeros((0, num_kpts, 3)),
+                np.zeros((0, num_kpts, 1)),
+                np.zeros((0, num_kpts, 1)),
+                None,
+            )
+
+        # Prepare masks
+        if masks is not None:
+            if isinstance(masks, list):
+                masks = np.array(masks)
+            # Ensure masks have correct shape (N, H, W)
+            if masks.ndim == 2:
+                masks = masks[np.newaxis, ...]
+
+        # Update model config to return heatmaps if requested
+        if return_probmaps:
+            self._model.cfg.model.test_cfg.output_heatmaps = True
+
+        # Run inference using mmpose API
+        pose_results = inference_topdown(self._model, img, bboxes=bboxes, masks=masks, bbox_format="xyxy")
+
+        # Reset heatmap output setting
+        if return_probmaps:
+            self._model.cfg.model.test_cfg.output_heatmaps = False
+
+        # Extract results
+        N = len(pose_results)
+        keypoints_list = []
+        scores_list = []
+        visibilities_list = []
+        probabilities_list = []
+        heatmaps_list = [] if return_probmaps else None
+
+        K = pose_results[0].pred_instances.keypoints.shape[-2] if N > 0 else 17  # COCO keypoints
+
+        for result in pose_results:
+            pred_instances = result.pred_instances
+            kpts = pred_instances.keypoints.reshape(K, 2)  # (K, 2)
+            kpt_scores = pred_instances.keypoint_scores.reshape(K, 1)  # (K, 1)
+
+            # Extract visibility if available, otherwise use scores as a proxy
+            try:
+                kpt_visibilities = pred_instances.keypoints_visible.reshape(K, 1)  # (K, 1)
+            except AttributeError:
+                kpt_visibilities = np.copy(kpt_scores)
+
+            # Extract presence probabilities if available, otherwise use scores as a proxy
+            try:
+                kpt_presence = pred_instances.keypoints_probs.reshape(K, 1)  # (K, 1)
+            except AttributeError:
+                kpt_presence = np.copy(kpt_scores)
+
+            # Combine into (K, 3) format
+            kpts_with_scores = np.concatenate([kpts, kpt_scores], axis=1)
+            keypoints_list.append(kpts_with_scores)
+            scores_list.append(kpt_scores)
+            visibilities_list.append(kpt_visibilities)
+            probabilities_list.append(kpt_presence)
+
+            if return_probmaps and hasattr(pred_instances, "heatmaps"):
+                heatmaps_list.append(pred_instances.heatmaps)
+
+        # Stack results
+        keypoints = np.stack(keypoints_list, axis=0) if keypoints_list else np.zeros((0, K, 3))
+        # keypoint_scores = np.stack(scores_list, axis=0) if scores_list else np.zeros((0, K, 1))
+        keypoint_visibilities = np.stack(visibilities_list, axis=0) if visibilities_list else np.zeros((0, K, 1))
+        keypoint_presence = np.stack(probabilities_list, axis=0) if probabilities_list else np.zeros((0, K, 1))
+
+        # Process heatmaps if requested
+        heatmaps = None
+        if return_probmaps and heatmaps_list:
+            heatmaps = np.stack(heatmaps_list, axis=0)
+
+        return keypoints, keypoint_presence, keypoint_visibilities, heatmaps
+
+    def get_features(
+        self,
+        image: Union[str, np.ndarray],
+        bboxes: Union[List, np.ndarray],
+        masks: Optional[Union[List, np.ndarray]] = None,
+    ) -> np.ndarray:
+        """
+        Extract backbone features for given bounding boxes.
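A hedged usage sketch for `predict()` with mask conditioning, per the docstring above: masks passed as a list of `(H, W)` arrays are normalized to `(N, H, W)` internally. The frame and mask below are synthetic; a COCO-body variant (`K == 17`) and reachable pretrained weights are assumed:

```python
import numpy as np

from pmpose import PMPose

pose_model = PMPose(device="cpu", variant="MaskPose-b", from_pretrained=True)

img = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder BGR frame
bboxes = [[100, 100, 300, 400]]                # [x1, y1, x2, y2]
mask = np.zeros((480, 640), dtype=bool)
mask[100:400, 100:300] = True                  # instance mask conditioning the pose head

keypoints, presence, visibility, heatmaps = pose_model.predict(img, bboxes, masks=[mask])
assert keypoints.shape == (1, 17, 3) and heatmaps is None
```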
+ + Args: + image: Image path (str) or BGR numpy array. + bboxes: List/array of N bounding boxes in [x1, y1, x2, y2] format. + masks: Optional instance masks (same format as predict). + + Returns: + np.ndarray: Backbone features of shape (N, C, H, W). + """ + # Load image if path is provided + if isinstance(image, str): + img = cv2.imread(image) + if img is None: + raise ValueError(f"Failed to load image from {image}") + else: + img = image + + # Convert bboxes to numpy array + if isinstance(bboxes, list): + bboxes = np.array(bboxes, dtype=np.float32) + + if len(bboxes) == 0: + # Return empty features + return np.zeros((0, 768, 16, 16)) # Typical ViT-B features + + # Prepare masks + if masks is not None: + if isinstance(masks, list): + masks = np.array(masks) + if masks.ndim == 2: + masks = masks[np.newaxis, ...] + + # Run inference and extract features + # This requires accessing model internals + pose_results = inference_topdown(self._model, img, bboxes=bboxes, masks=masks, bbox_format="xyxy") + + # Extract features from results if available + # Note: This is a simplified implementation + # Real feature extraction would require model hooks + features_list = [] + for result in pose_results: + if hasattr(result, "features"): + features_list.append(result.features) + + if features_list: + return np.stack(features_list, axis=0) + else: + # Return placeholder if features not available + N = len(pose_results) + return np.zeros((N, 768, 16, 16)) + + def visualize( + self, + image: Union[str, np.ndarray], + keypoints: np.ndarray, + bboxes: Optional[np.ndarray] = None, + masks: Optional[np.ndarray] = None, + save_path: Optional[str] = None, + ) -> np.ndarray: + """ + Visualize pose estimation results on image. + + Args: + image: Image path (str) or BGR numpy array. + keypoints: (N, K, 3) array with [x, y, score]. + bboxes: Optional (N, 4) bounding boxes. + masks: Optional (N, H, W) binary masks. + save_path: Optional path to save visualization. + + Returns: + np.ndarray: Visualization image (BGR). + """ + # Load image if path is provided + if isinstance(image, str): + img = cv2.imread(image) + if img is None: + raise ValueError(f"Failed to load image from {image}") + else: + img = image.copy() + + # Visualize each person's pose + # for i, kpts in enumerate(keypoints): + keypoints_17 = keypoints[ + :, :17, : + ] # Assuming COCO format for visualization. For now, the visualization supports only 17 keypoints. + img = pose_visualization( + img, + keypoints_17, + width_multiplier=8, + differ_individuals=True, + keep_image_size=True, + ) + + # Save if requested + if save_path is not None: + cv2.imwrite(save_path, img) + + return img diff --git a/demo/mm_utils.py b/pmpose/mm_utils.py similarity index 98% rename from demo/mm_utils.py rename to pmpose/mm_utils.py index dda709ccc37b64eb8a43ed945c4827a9f1a69e4e..5797c99db5b2db1b888912bdc0ee6ecff06393d5 100644 --- a/demo/mm_utils.py +++ b/pmpose/mm_utils.py @@ -1,3 +1,5 @@ +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. + """ This module provides high-level interfaces to run MMDetection and MMPose models sequentially. Users can call run_MMDetector and run_MMPose from diff --git a/pmpose/posevis_lite.py b/pmpose/posevis_lite.py new file mode 100644 index 0000000000000000000000000000000000000000..7016ccf618565e8e25ea1c1dfddbf7a099d123ea --- /dev/null +++ b/pmpose/posevis_lite.py @@ -0,0 +1,511 @@ +# Copyright (c) authors of BBoxMaskPose (BMPv2). All rights reserved. 
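Tying the API together, a hedged end-to-end sketch from `predict()` to `visualize()` (placeholder paths; pretrained weights assumed available; only the first 17 COCO body keypoints are drawn, per the note in `visualize()`):

```python
from pmpose import PMPose

pose_model = PMPose(device="cuda", from_pretrained=True)
keypoints, presence, visibility, _ = pose_model.predict(
    image="path/to/image.jpg",
    bboxes=[[100, 100, 300, 400]],
)
vis = pose_model.visualize("path/to/image.jpg", keypoints, save_path="pose_vis.jpg")
```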
+ +import os +from typing import Any, Dict, List, Optional, Tuple, Union + +import cv2 +import numpy as np + +from bboxmaskpose.sam2.distinctipy import get_colors + +NEUTRAL_COLOR = (52, 235, 107) + +LEFT_ARM_COLOR = (216, 235, 52) +LEFT_LEG_COLOR = (235, 107, 52) +LEFT_SIDE_COLOR = (245, 188, 113) +LEFT_FACE_COLOR = (235, 52, 107) + +RIGHT_ARM_COLOR = (52, 235, 216) +RIGHT_LEG_COLOR = (52, 107, 235) +RIGHT_SIDE_COLOR = (52, 171, 235) +RIGHT_FACE_COLOR = (107, 52, 235) + +COCO_MARKERS = [ + ["nose", cv2.MARKER_CROSS, NEUTRAL_COLOR], + ["left_eye", cv2.MARKER_SQUARE, LEFT_FACE_COLOR], + ["right_eye", cv2.MARKER_SQUARE, RIGHT_FACE_COLOR], + ["left_ear", cv2.MARKER_CROSS, LEFT_FACE_COLOR], + ["right_ear", cv2.MARKER_CROSS, RIGHT_FACE_COLOR], + ["left_shoulder", cv2.MARKER_TRIANGLE_UP, LEFT_ARM_COLOR], + ["right_shoulder", cv2.MARKER_TRIANGLE_UP, RIGHT_ARM_COLOR], + ["left_elbow", cv2.MARKER_SQUARE, LEFT_ARM_COLOR], + ["right_elbow", cv2.MARKER_SQUARE, RIGHT_ARM_COLOR], + ["left_wrist", cv2.MARKER_CROSS, LEFT_ARM_COLOR], + ["right_wrist", cv2.MARKER_CROSS, RIGHT_ARM_COLOR], + ["left_hip", cv2.MARKER_TRIANGLE_UP, LEFT_LEG_COLOR], + ["right_hip", cv2.MARKER_TRIANGLE_UP, RIGHT_LEG_COLOR], + ["left_knee", cv2.MARKER_SQUARE, LEFT_LEG_COLOR], + ["right_knee", cv2.MARKER_SQUARE, RIGHT_LEG_COLOR], + ["left_ankle", cv2.MARKER_TILTED_CROSS, LEFT_LEG_COLOR], + ["right_ankle", cv2.MARKER_TILTED_CROSS, RIGHT_LEG_COLOR], +] + + +COCO_SKELETON = [ + [[16, 14], LEFT_LEG_COLOR], # Left ankle - Left knee + [[14, 12], LEFT_LEG_COLOR], # Left knee - Left hip + [[17, 15], RIGHT_LEG_COLOR], # Right ankle - Right knee + [[15, 13], RIGHT_LEG_COLOR], # Right knee - Right hip + [[12, 13], NEUTRAL_COLOR], # Left hip - Right hip + [[6, 12], LEFT_SIDE_COLOR], # Left hip - Left shoulder + [[7, 13], RIGHT_SIDE_COLOR], # Right hip - Right shoulder + [[6, 7], NEUTRAL_COLOR], # Left shoulder - Right shoulder + [[6, 8], LEFT_ARM_COLOR], # Left shoulder - Left elbow + [[7, 9], RIGHT_ARM_COLOR], # Right shoulder - Right elbow + [[8, 10], LEFT_ARM_COLOR], # Left elbow - Left wrist + [[9, 11], RIGHT_ARM_COLOR], # Right elbow - Right wrist + [[2, 3], NEUTRAL_COLOR], # Left eye - Right eye + [[1, 2], LEFT_FACE_COLOR], # Nose - Left eye + [[1, 3], RIGHT_FACE_COLOR], # Nose - Right eye + [[2, 4], LEFT_FACE_COLOR], # Left eye - Left ear + [[3, 5], RIGHT_FACE_COLOR], # Right eye - Right ear + [[4, 6], LEFT_FACE_COLOR], # Left ear - Left shoulder + [[5, 7], RIGHT_FACE_COLOR], # Right ear - Right shoulder +] + + +def _draw_line( + img: np.ndarray, + start: Tuple[float, float], + stop: Tuple[float, float], + color: Tuple[int, int, int], + line_type: str, + thickness: int = 1, +) -> np.ndarray: + """ + Draw a line segment on an image, supporting solid, dashed, or dotted styles. + + Args: + img (np.ndarray): BGR image of shape (H, W, 3). + start (tuple of float): (x, y) start coordinates. + stop (tuple of float): (x, y) end coordinates. + color (tuple of int): BGR color values. + line_type (str): One of 'solid', 'dashed', or 'doted'. + thickness (int): Line thickness in pixels. + + Returns: + np.ndarray: Image with the line drawn. 
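Note that the bone indices in `COCO_SKELETON` above are 1-based, as the inline comments imply. A hedged sketch pairing the table with `_draw_line` on random placeholder keypoints (assumes `pmpose.posevis_lite` and its `bboxmaskpose` dependency are importable):

```python
# Bone indices are 1-based, so subtract 1 when indexing a (17, ...) array.
import numpy as np

from pmpose.posevis_lite import COCO_SKELETON, _draw_line

img = np.zeros((256, 256, 3), dtype=np.uint8)
kpts = np.random.randint(10, 246, size=(17, 2)).astype(float)

for (i, j), bone_color in COCO_SKELETON:
    img = _draw_line(img, kpts[i - 1], kpts[j - 1], bone_color, "dashed", thickness=2)
```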
+    """
+    start = np.array(start)[:2]
+    stop = np.array(stop)[:2]
+    if line_type.lower() == "solid":
+        img = cv2.line(
+            img,
+            (int(start[0]), int(start[1])),
+            (int(stop[0]), int(stop[1])),
+            color=color,
+            thickness=thickness,
+            lineType=cv2.LINE_AA,
+        )
+    elif line_type.lower() == "dashed":
+        delta = stop - start
+        length = np.linalg.norm(delta)
+        frac = np.linspace(0, 1, num=max(int(length / 5), 2), endpoint=True)  # >= 2 samples so short segments still render
+        for i in range(0, len(frac) - 1, 2):
+            s = start + frac[i] * delta
+            e = start + frac[i + 1] * delta
+            img = cv2.line(
+                img,
+                (int(s[0]), int(s[1])),
+                (int(e[0]), int(e[1])),
+                color=color,
+                thickness=thickness,
+                lineType=cv2.LINE_AA,
+            )
+    elif line_type.lower() == "doted":
+        delta = stop - start
+        length = np.linalg.norm(delta)
+        frac = np.linspace(0, 1, num=max(int(length / 5), 2), endpoint=True)  # >= 2 samples so short segments still render
+        for i in range(len(frac)):
+            s = start + frac[i] * delta
+            img = cv2.circle(
+                img,
+                (int(s[0]), int(s[1])),
+                radius=max(thickness // 2, 1),
+                color=color,
+                thickness=-1,
+                lineType=cv2.LINE_AA,
+            )
+    return img
+
+
+def pose_visualization(
+    img: Union[str, np.ndarray],
+    keypoints: Union[Dict[str, Any], np.ndarray],
+    format: str = "COCO",
+    greyness: float = 1.0,
+    show_markers: bool = True,
+    show_bones: bool = True,
+    line_type: str = "solid",
+    width_multiplier: float = 1.0,
+    bbox_width_multiplier: float = 1.0,
+    show_bbox: bool = False,
+    differ_individuals: bool = False,
+    confidence_thr: float = 0.3,
+    errors: Optional[np.ndarray] = None,
+    color: Optional[Tuple[int, int, int]] = None,
+    keep_image_size: bool = False,
+    return_padding: bool = False,
+) -> Union[np.ndarray, Tuple[np.ndarray, List[int]]]:
+    """
+    Overlay pose keypoints and skeleton on an image.
+
+    Args:
+        img (str or np.ndarray): Path to image file or BGR image array.
+        keypoints (dict or np.ndarray): Either a dict with 'bbox' and 'keypoints' or
+            an array of shape (17, 2 or 3) or multiple poses stacked.
+        format (str): Keypoint format, currently only 'COCO'.
+        greyness (float): Factor for bone/marker color intensity (0.0-1.0).
+        show_markers (bool): Whether to draw keypoint markers.
+        show_bones (bool): Whether to draw skeleton bones.
+        line_type (str): One of 'solid', 'dashed', 'doted' for bone style.
+        width_multiplier (float): Line width scaling factor for bones.
+        bbox_width_multiplier (float): Line width scaling factor for bounding box.
+        show_bbox (bool): Whether to draw bounding box around keypoints.
+        differ_individuals (bool): Use distinct color per individual pose.
+        confidence_thr (float): Confidence threshold for keypoint visibility.
+        errors (np.ndarray or None): Optional array of per-keypoint errors (17, 1).
+        color (tuple or None): Override color for markers and bones.
+        keep_image_size (bool): Prevent image padding for out-of-bounds keypoints.
+        return_padding (bool): If True, also return padding offsets [top, bottom, left, right].
+
+    Returns:
+        np.ndarray or (np.ndarray, list of int): Annotated image, and optional
+            padding offsets if `return_padding` is True.
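+
+    Example (illustrative; assumes the demo image exists):
+        >>> kpts = np.zeros((17, 3))
+        >>> kpts[0] = [150, 100, 2]  # nose at (150, 100), visible
+        >>> out = pose_visualization("demo/posevis_test.jpg", kpts)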
+    """
+
+    bbox = None
+    if isinstance(keypoints, dict):
+        try:
+            bbox = np.array(keypoints["bbox"]).flatten()
+        except KeyError:
+            pass
+        keypoints = np.array(keypoints["keypoints"])
+
+    # If keypoints is a list of poses, draw them all
+    if len(keypoints) % 17 != 0 or keypoints.ndim == 3:
+
+        if color is not None:
+            if not isinstance(color, (list, tuple)):
+                color = [color for _ in keypoints]
+        else:
+            if differ_individuals:
+                color = (
+                    (np.array(get_colors(len(keypoints), exclude_colors=[(0, 1, 0), (0, 0, 0), (1, 1, 1)], rng=0)) * 255)
+                    .astype(int)
+                    .tolist()
+                )
+            else:
+                color = [None for _ in keypoints]
+
+        max_padding = [0, 0, 0, 0]
+        for keypoint, clr in zip(keypoints, color):
+            img = pose_visualization(
+                img,
+                keypoint,
+                format=format,
+                greyness=greyness,
+                show_markers=show_markers,
+                show_bones=show_bones,
+                line_type=line_type,
+                width_multiplier=width_multiplier,
+                bbox_width_multiplier=bbox_width_multiplier,
+                show_bbox=show_bbox,
+                differ_individuals=differ_individuals,
+                color=clr,
+                confidence_thr=confidence_thr,
+                keep_image_size=keep_image_size,
+                return_padding=return_padding,
+            )
+            if return_padding:
+                img, padding = img
+                max_padding = [max(max_padding[i], int(padding[i])) for i in range(4)]
+
+        if return_padding:
+            return img, max_padding
+        else:
+            return img
+
+    keypoints = np.array(keypoints).reshape(17, -1)
+    # If keypoint visibility is not provided, assume all keypoints are visible
+    if keypoints.shape[1] == 2:
+        keypoints = np.hstack([keypoints, np.ones((17, 1)) * 2])
+
+    assert keypoints.shape[1] == 3, "Keypoints should be in the format (x, y, visibility)"
+    assert keypoints.shape[0] == 17, "Keypoints should have 17 rows, one per COCO keypoint"
+
+    if errors is not None:
+        errors = np.array(errors).reshape(17, -1)
+        assert errors.shape[1] == 1, "Errors should have shape (17, 1)"
+        assert errors.shape[0] == 17, "Errors should have shape (17, 1)"
+    else:
+        errors = np.ones((17, 1)) * np.nan
+
+    # If keypoint visibility is a float between 0 and 1, it is a detection
+    # confidence: conf < confidence_thr maps to visibility 1,
+    # conf >= confidence_thr maps to visibility 2.
+    vis_is_float = np.any(np.logical_and(keypoints[:, -1] > 0, keypoints[:, -1] < 1))
+    if keypoints.shape[1] == 3 and vis_is_float:
+        lower_idx = keypoints[:, -1] < confidence_thr
+        keypoints[lower_idx, -1] = 1
+        keypoints[~lower_idx, -1] = 2
+
+    # All visibility values should be ints
+    keypoints[:, -1] = keypoints[:, -1].astype(int)
+
+    if isinstance(img, str):
+        img = cv2.imread(img)
+
+    if img is None:
+        if return_padding:
+            return None, [0, 0, 0, 0]
+        else:
+            return None
+
+    if not (keypoints[:, 2] > 0).any():
+        if return_padding:
+            return img, [0, 0, 0, 0]
+        else:
+            return img
+
+    valid_kpts = (keypoints[:, 0] > 0) & (keypoints[:, 1] > 0)
+    num_valid_kpts = np.sum(valid_kpts)
+
+    if num_valid_kpts == 0:
+        if return_padding:
+            return img, [0, 0, 0, 0]
+        else:
+            return img
+
+    min_x_kpts = np.min(keypoints[keypoints[:, 2] > 0, 0])
+    min_y_kpts = np.min(keypoints[keypoints[:, 2] > 0, 1])
+    max_x_kpts = np.max(keypoints[keypoints[:, 2] > 0, 0])
+    max_y_kpts = np.max(keypoints[keypoints[:, 2] > 0, 1])
+    if bbox is None:
+        min_x = min_x_kpts
+        min_y = min_y_kpts
+        max_x = max_x_kpts
+        max_y = max_y_kpts
+    else:
+        min_x = bbox[0]
+        min_y = bbox[1]
+        max_x = bbox[2]
+        max_y = bbox[3]
+
+    max_area = (max_x - min_x) * (max_y - min_y)
+    diagonal = np.sqrt((max_x - min_x) ** 2 + (max_y - min_y) ** 2)
+    line_width = max(int(np.sqrt(max_area) / 500 * width_multiplier), 1)
+    bbox_line_width = max(int(np.sqrt(max_area) / 500 * bbox_width_multiplier), 1)
+    marker_size = max(int(np.sqrt(max_area) / 80), 1)
+    invisible_marker_size = max(int(np.sqrt(max_area) / 100), 1)
+    marker_thickness = max(int(np.sqrt(max_area) / 100), 1)
+
+    if differ_individuals:
+        if color is not None:
+            instance_color = color
+        else:
+            instance_color = np.random.randint(0, 255, size=(3,)).tolist()
+        instance_color = tuple(instance_color)
+
+    padding = [0, 0, 0, 0]  # top, bottom, left, right; updated when padding is applied
+    # Pad image with dark gray if keypoints are outside the image
+    if not keep_image_size:
+        padding = [
+            max(0, -min_y_kpts),
+            max(0, max_y_kpts - img.shape[0]),
+            max(0, -min_x_kpts),
+            max(0, max_x_kpts - img.shape[1]),
+        ]
+        padding = [int(p) for p in padding]
+        img = cv2.copyMakeBorder(
+            img,
+            padding[0],
+            padding[1],
+            padding[2],
+            padding[3],
+            cv2.BORDER_CONSTANT,
+            value=(80, 80, 80),
+        )
+
+        # Add padding to bbox and kpts
+        value_x_to_add = max(0, -min_x_kpts)
+        value_y_to_add = max(0, -min_y_kpts)
+        keypoints[keypoints[:, 2] > 0, 0] += value_x_to_add
+        keypoints[keypoints[:, 2] > 0, 1] += value_y_to_add
+        if bbox is not None:
+            bbox[0] += value_x_to_add
+            bbox[1] += value_y_to_add
+            bbox[2] += value_x_to_add
+            bbox[3] += value_y_to_add
+
+    if show_bbox and bbox is not None:
+        pts = [
+            (bbox[0], bbox[1]),
+            (bbox[0], bbox[3]),
+            (bbox[2], bbox[3]),
+            (bbox[2], bbox[1]),
+            (bbox[0], bbox[1]),
+        ]
+        for i in range(len(pts) - 1):
+            if differ_individuals:
+                img = _draw_line(img, pts[i], pts[i + 1], instance_color, "doted", thickness=bbox_line_width)
+            else:
+                img = _draw_line(img, pts[i], pts[i + 1], (0, 255, 0), line_type, thickness=bbox_line_width)
+
+    if show_markers:
+        for kpt, marker_info, err in zip(keypoints, COCO_MARKERS, errors):
+            if kpt[0] == 0 and kpt[1] == 0:
+                continue
+
+            if kpt[2] != 2:
+                color = (140, 140, 140)
+            elif differ_individuals:
+                color = instance_color
+            else:
+                color = marker_info[2]
+
+            if kpt[2] == 1:
+                img_overlay = img.copy()
+                img_overlay = cv2.drawMarker(
+                    img_overlay,
+                    (int(kpt[0]), int(kpt[1])),
+                    color=color,
+                    markerType=marker_info[1],
+                    markerSize=invisible_marker_size,
+                    thickness=marker_thickness,
+                )
+                img = cv2.addWeighted(img_overlay, 0.4, img, 0.6, 0)
+
+            else:
+                img = cv2.drawMarker(
+                    img,
+                    (int(kpt[0]), int(kpt[1])),
+                    color=color,
+                    markerType=marker_info[1],
+                    markerSize=marker_size,
+                    thickness=marker_thickness,
+                )
+
+            if not np.isnan(err).any():
+                radius = float(err[0]) * diagonal
+                clr = (0, 0, 255) if "solid" in line_type else (0, 255, 0)
+                plus = 1 if "solid" in line_type else -1
+                img = cv2.circle(
+                    img,
+                    (int(kpt[0]), int(kpt[1])),
+                    radius=int(radius),
+                    color=clr,
+                    thickness=1,
+                    lineType=cv2.LINE_AA,
+                )
+                dx = np.sqrt(radius**2 / 2)
+                img = cv2.line(
+                    img,
+                    (int(kpt[0]), int(kpt[1])),
+                    (int(kpt[0] + plus * dx), int(kpt[1] - dx)),
+                    color=clr,
+                    thickness=1,
+                    lineType=cv2.LINE_AA,
+                )
+
+    if show_bones:
+        for bone_info in COCO_SKELETON:
+            kp1 = keypoints[bone_info[0][0] - 1, :]
+            kp2 = keypoints[bone_info[0][1] - 1, :]
+
+            if (kp1[0] == 0 and kp1[1] == 0) or (kp2[0] == 0 and kp2[1] == 0):
+                continue
+
+            dashed = kp1[2] == 1 or kp2[2] == 1
+
+            if differ_individuals:
+                color = np.array(instance_color)
+            else:
+                color = np.array(bone_info[1])
+            color = (color * greyness).astype(int).tolist()
+
+            if dashed:
+                img_overlay = img.copy()
+                img_overlay = _draw_line(img_overlay, kp1, kp2, color, line_type, thickness=line_width)
+                img = 
cv2.addWeighted(img_overlay, 0.4, img, 0.6, 0)
+
+            else:
+                img = _draw_line(img, kp1, kp2, color, line_type, thickness=line_width)
+
+    if return_padding:
+        return img, padding
+    else:
+        return img
+
+
+if __name__ == "__main__":
+    kpts = np.array(
+        [  # (x, y, visibility) per COCO keypoint
+            [344, 222, 2],
+            [356, 211, 2],
+            [330, 211, 2],
+            [372, 220, 2],
+            [309, 224, 2],
+            [413, 279, 2],
+            [274, 300, 2],
+            [444, 372, 2],
+            [261, 396, 2],
+            [398, 359, 2],
+            [316, 372, 2],
+            [407, 489, 2],
+            [185, 580, 2],
+            [0, 0, 0],
+            [0, 0, 0],
+            [0, 0, 0],
+            [0, 0, 0],
+        ]
+    )
+
+    kpts = kpts.reshape(-1, 3)
+    kpts[:, -1] = np.random.randint(1, 3, size=(17,))
+
+    img = pose_visualization("demo/posevis_test.jpg", kpts, show_markers=True, line_type="solid")
+
+    kpts2 = kpts.copy()
+    kpts2[kpts2[:, 1] > 0, :2] += 10
+    img = pose_visualization(img, kpts2, show_markers=False, line_type="doted")
+
+    os.makedirs("demo/outputs", exist_ok=True)
+    cv2.imwrite("demo/outputs/posevis_test_out.jpg", img)
diff --git a/requirements/albu.txt b/requirements/albu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f421fbbdc472527e6010cb62a7d0236cf034f24f
--- /dev/null
+++ b/requirements/albu.txt
@@ -0,0 +1 @@
+albumentations>=0.3.2 --no-binary qudida,albumentations
diff --git a/requirements/build.txt b/requirements/build.txt
new file mode 100644
index 0000000000000000000000000000000000000000..fb44aadd437e7d6aa0be1d424f2c77ed1e08c676
--- /dev/null
+++ b/requirements/build.txt
@@ -0,0 +1,3 @@
+# These must be installed before building mmpose
+numpy
+torch>=1.8
diff --git a/requirements/docs.txt b/requirements/docs.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d278090dbb0fe9979c76e85d9e46ef4254151d5a
--- /dev/null
+++ b/requirements/docs.txt
@@ -0,0 +1,8 @@
+docutils==0.16.0
+markdown
+myst-parser
+-e git+https://github.com/gaotongxiao/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
+sphinx==4.5.0
+sphinx_copybutton
+sphinx_markdown_tables
+urllib3<2.0.0
diff --git a/requirements/mminstall.txt b/requirements/mminstall.txt
new file mode 100644
index 0000000000000000000000000000000000000000..32eb00acebba1a8b0f579613063b62770177ab56
--- /dev/null
+++ b/requirements/mminstall.txt
@@ -0,0 +1,3 @@
+mmcv>=2.0.0,<2.2.0
+mmdet>=3.0.0,<3.3.0
+mmengine>=0.4.0,<1.0.0
diff --git a/requirements/optional.txt b/requirements/optional.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f2293605cf1b01dca72aad0a15c45b72ed5429a2
--- /dev/null
+++ b/requirements/optional.txt
@@ -0,0 +1 @@
+requests
diff --git a/requirements/poseval.txt b/requirements/poseval.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f4d95e1afab83a79496c607e80b43a5ec231dd6b
--- /dev/null
+++ b/requirements/poseval.txt
@@ -0,0 +1,2 @@
+poseval@git+https://github.com/svenkreiss/poseval.git
+shapely==1.8.4
diff --git a/requirements/readthedocs.txt b/requirements/readthedocs.txt
new file mode 100644
index 0000000000000000000000000000000000000000..13af2ec22dc97da169da3d511fe18577d5afd06d
--- /dev/null
+++ b/requirements/readthedocs.txt
@@ -0,0 +1,9 @@
+mmcv>=2.0.0rc4
+mmengine>=0.6.0,<1.0.0
+munkres
+regex
+scipy
+titlecase
+torch>1.6
+torchvision
+xtcocotools>=1.13
diff --git a/requirements/runtime.txt b/requirements/runtime.txt
new file mode 100644
index 0000000000000000000000000000000000000000..2a4a4cd6f3da33549f2ba1c981deebe0fea311af
--- /dev/null
+++ b/requirements/runtime.txt
@@ -0,0 +1,14 @@
+chumpy
+json_tricks
+matplotlib
+munkres
+numpy>=1.24.4
+torch>=2.1.0
+opencv-python
+pillow>=9.4.0
+scipy
+torchvision>=0.16.0
+xtcocotools>=1.12
+tqdm>=4.66.1
+hydra-core>=1.3.2
+iopath>=0.1.10
diff --git a/requirements/sam2_extras.txt b/requirements/sam2_extras.txt
new file mode 100644
index 0000000000000000000000000000000000000000..a6b57e65d2ce80bb18514b11bd9b6db7ca1979ad
--- /dev/null
+++ b/requirements/sam2_extras.txt
@@ -0,0 +1,21 @@
+matplotlib>=3.9.1
+jupyter>=1.0.0
+opencv-python>=4.7.0
+eva-decord>=0.6.1
+Flask>=3.0.3
+Flask-Cors>=5.0.0
+av>=13.0.0
+dataclasses-json>=0.6.7
+gunicorn>=23.0.0
+imagesize>=1.4.1
+pycocotools>=2.0.8
+strawberry-graphql>=0.243.0
+black==24.2.0
+usort==1.0.2
+ufmt==2.0.0b2
+fvcore>=0.1.5.post20221221
+pandas>=2.2.2
+scikit-image>=0.24.0
+tensorboard>=2.17.0
+tensordict>=0.6.0
+submitit>=1.5.1
diff --git a/requirements/sam3d.txt b/requirements/sam3d.txt
new file mode 100644
index 0000000000000000000000000000000000000000..2c5badbe97599d624c00a69080c1d8260d8f6874
--- /dev/null
+++ b/requirements/sam3d.txt
@@ -0,0 +1,39 @@
+# SAM-3D-Body dependencies (optional)
+# Install these only if you want to use 3D human mesh recovery
+#
+# Source: Requirements are based on SAM-3D-Body official installation guide
+# (https://github.com/facebookresearch/sam-3d-body/blob/main/INSTALL.md)
+# Version constraints are relaxed to minimum versions for broader compatibility
+# unless a specific version is required by SAM-3D-Body.
+
+# Core dependencies from SAM-3D-Body
+pytorch-lightning  # Training framework (not strictly needed for inference, but required by SAM-3D-Body)
+pyrender  # 3D mesh rendering
+yacs  # Configuration system
+scikit-image  # Image processing utilities
+einops  # Tensor operations
+timm  # Vision transformer models
+dill  # Serialization
+pandas  # Data handling
+rich  # Terminal formatting
+hydra-core>=1.3.0  # Configuration management (BMP also uses hydra-core>=1.3.2)
+pyrootutils  # Project root utilities
+webdataset  # Dataset loading
+networkx>=3.2.1  # Graph operations (SAM-3D-Body specifies ==3.2.1, relaxed to >=)
+roma  # Rotation utilities
+joblib  # Parallelization
+seaborn  # Visualization
+appdirs  # Directory utilities
+ffmpeg-python  # Video processing
+cython  # C extensions
+jsonlines  # JSON line format
+loguru  # Logging
+optree  # Tree operations
+fvcore  # Facebook research utilities
+huggingface-hub  # Model downloading
+
+# Note: The following need special installation commands:
+# - detectron2: pip install 'git+https://github.com/facebookresearch/detectron2.git@a1ce2f9' --no-build-isolation --no-deps
+#   (Specific commit required by SAM-3D-Body for compatibility)
+# - MoGe (optional FOV estimator): pip install git+https://github.com/microsoft/MoGe.git
+#   (Optional, improves camera calibration)
diff --git a/requirements/tests.txt b/requirements/tests.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c63bc90822fee9b8352d70c3362e4a1ab5fde68d
--- /dev/null
+++ b/requirements/tests.txt
@@ -0,0 +1,9 @@
+coverage
+flake8
+interrogate
+isort==4.3.21
+parameterized
+pytest
+pytest-runner
+xdoctest>=0.10.0
+yapf
diff --git a/sam2/configs/samurai/__init__.py b/sam2/configs/samurai/__init__.py
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/sam2/configs/samurai/sam2.1_hiera_b+.yaml b/sam2/configs/samurai/sam2.1_hiera_b+.yaml
deleted file mode 100644
index 3650edc236d02c03a5492f55e7d8cb5a946b758b..0000000000000000000000000000000000000000
--- a/sam2/configs/samurai/sam2.1_hiera_b+.yaml
+++ 
/dev/null @@ -1,125 +0,0 @@ -# @package _global_ - -# Model -model: - _target_: sam2.modeling.sam2_base.SAM2Base - image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder - scalp: 1 - trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera - embed_dim: 112 - num_heads: 2 - neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck - position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine - num_pos_feats: 256 - normalize: true - scale: null - temperature: 10000 - d_model: 256 - backbone_channel_list: [896, 448, 224, 112] - fpn_top_down_levels: [2, 3] # output level 0 and 1 directly use the backbone features - fpn_interp_model: nearest - - memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention - d_model: 256 - pos_enc_at_input: true - layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer - activation: relu - dim_feedforward: 2048 - dropout: 0.1 - pos_enc_at_attn: false - self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention - rope_theta: 10000.0 - feat_sizes: [32, 32] - embedding_dim: 256 - num_heads: 1 - downsample_rate: 1 - dropout: 0.1 - d_model: 256 - pos_enc_at_cross_attn_keys: true - pos_enc_at_cross_attn_queries: false - cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention - rope_theta: 10000.0 - feat_sizes: [32, 32] - rope_k_repeat: True - embedding_dim: 256 - num_heads: 1 - downsample_rate: 1 - dropout: 0.1 - kv_in_dim: 64 - num_layers: 4 - - memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder - out_dim: 64 - position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine - num_pos_feats: 64 - normalize: true - scale: null - temperature: 10000 - mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler - kernel_size: 3 - stride: 2 - padding: 1 - fuser: - _target_: sam2.modeling.memory_encoder.Fuser - layer: - _target_: sam2.modeling.memory_encoder.CXBlock - dim: 256 - kernel_size: 7 - padding: 3 - layer_scale_init_value: 1e-6 - use_dwconv: True # depth-wise convs - num_layers: 2 - - num_maskmem: 7 - image_size: 1024 - # apply scaled sigmoid on mask logits for memory encoder, and directly feed input mask as output mask - sigmoid_scale_for_mem_enc: 20.0 - sigmoid_bias_for_mem_enc: -10.0 - use_mask_input_as_output_without_sam: true - # Memory - directly_add_no_mem_embed: true - no_obj_embed_spatial: true - # use high-resolution feature map in the SAM mask decoder - use_high_res_features_in_sam: true - # output 3 masks on the first click on initial conditioning frames - multimask_output_in_sam: true - # SAM heads - iou_prediction_use_sigmoid: True - # cross-attend to object pointers from other frames (based on SAM output tokens) in the encoder - use_obj_ptrs_in_encoder: true - add_tpos_enc_to_obj_ptrs: true - proj_tpos_enc_in_obj_ptrs: true - use_signed_tpos_enc_to_obj_ptrs: true - only_obj_ptrs_in_the_past_for_eval: true - # object occlusion prediction - pred_obj_scores: true - pred_obj_scores_mlp: true - fixed_no_obj_ptr: true - # multimask tracking settings - multimask_output_for_tracking: true - use_multimask_token_for_obj_ptr: true - multimask_min_pt_num: 0 - multimask_max_pt_num: 1 - use_mlp_for_obj_ptr_proj: true - # Compilation flag - compile_image_encoder: False - # SAMURAI - samurai_mode: true - stable_frames_threshold: 15 - stable_ious_threshold: 0.3 - min_obj_score_logits: -1 - kf_score_weight: 0.25 - memory_bank_iou_threshold: 0.5 - memory_bank_obj_score_threshold: 0.0 - 
memory_bank_kf_score_threshold: 0.0 diff --git a/sam2/configs/samurai/sam2.1_hiera_l.yaml b/sam2/configs/samurai/sam2.1_hiera_l.yaml deleted file mode 100644 index 8458dbbc0261da04619ec902cffb03e3fb44499c..0000000000000000000000000000000000000000 --- a/sam2/configs/samurai/sam2.1_hiera_l.yaml +++ /dev/null @@ -1,129 +0,0 @@ -# @package _global_ - -# Model -model: - _target_: sam2.modeling.sam2_base.SAM2Base - image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder - scalp: 1 - trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera - embed_dim: 144 - num_heads: 2 - stages: [2, 6, 36, 4] - global_att_blocks: [23, 33, 43] - window_pos_embed_bkg_spatial_size: [7, 7] - window_spec: [8, 4, 16, 8] - neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck - position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine - num_pos_feats: 256 - normalize: true - scale: null - temperature: 10000 - d_model: 256 - backbone_channel_list: [1152, 576, 288, 144] - fpn_top_down_levels: [2, 3] # output level 0 and 1 directly use the backbone features - fpn_interp_model: nearest - - memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention - d_model: 256 - pos_enc_at_input: true - layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer - activation: relu - dim_feedforward: 2048 - dropout: 0.1 - pos_enc_at_attn: false - self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention - rope_theta: 10000.0 - feat_sizes: [32, 32] - embedding_dim: 256 - num_heads: 1 - downsample_rate: 1 - dropout: 0.1 - d_model: 256 - pos_enc_at_cross_attn_keys: true - pos_enc_at_cross_attn_queries: false - cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention - rope_theta: 10000.0 - feat_sizes: [32, 32] - rope_k_repeat: True - embedding_dim: 256 - num_heads: 1 - downsample_rate: 1 - dropout: 0.1 - kv_in_dim: 64 - num_layers: 4 - - memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder - out_dim: 64 - position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine - num_pos_feats: 64 - normalize: true - scale: null - temperature: 10000 - mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler - kernel_size: 3 - stride: 2 - padding: 1 - fuser: - _target_: sam2.modeling.memory_encoder.Fuser - layer: - _target_: sam2.modeling.memory_encoder.CXBlock - dim: 256 - kernel_size: 7 - padding: 3 - layer_scale_init_value: 1e-6 - use_dwconv: True # depth-wise convs - num_layers: 2 - - num_maskmem: 7 - image_size: 1024 - # apply scaled sigmoid on mask logits for memory encoder, and directly feed input mask as output mask - sigmoid_scale_for_mem_enc: 20.0 - sigmoid_bias_for_mem_enc: -10.0 - use_mask_input_as_output_without_sam: true - # Memory - directly_add_no_mem_embed: true - no_obj_embed_spatial: true - # use high-resolution feature map in the SAM mask decoder - use_high_res_features_in_sam: true - # output 3 masks on the first click on initial conditioning frames - multimask_output_in_sam: true - # SAM heads - iou_prediction_use_sigmoid: True - # cross-attend to object pointers from other frames (based on SAM output tokens) in the encoder - use_obj_ptrs_in_encoder: true - add_tpos_enc_to_obj_ptrs: true - proj_tpos_enc_in_obj_ptrs: true - use_signed_tpos_enc_to_obj_ptrs: true - only_obj_ptrs_in_the_past_for_eval: true - # object occlusion prediction - pred_obj_scores: true - pred_obj_scores_mlp: true - fixed_no_obj_ptr: true - # multimask tracking settings - 
multimask_output_for_tracking: true - use_multimask_token_for_obj_ptr: true - multimask_min_pt_num: 0 - multimask_max_pt_num: 1 - use_mlp_for_obj_ptr_proj: true - # Compilation flag - compile_image_encoder: False - # SAMURAI - samurai_mode: true - stable_frames_threshold: 15 - stable_ious_threshold: 0.3 - min_obj_score_logits: -1 - kf_score_weight: 0.15 - memory_bank_iou_threshold: 0.5 - memory_bank_obj_score_threshold: 0.0 - memory_bank_kf_score_threshold: 0.0 \ No newline at end of file diff --git a/sam2/configs/samurai/sam2.1_hiera_s.yaml b/sam2/configs/samurai/sam2.1_hiera_s.yaml deleted file mode 100644 index d703cf7651229858f28204f3f2f9a541a3a88040..0000000000000000000000000000000000000000 --- a/sam2/configs/samurai/sam2.1_hiera_s.yaml +++ /dev/null @@ -1,128 +0,0 @@ -# @package _global_ - -# Model -model: - _target_: sam2.modeling.sam2_base.SAM2Base - image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder - scalp: 1 - trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera - embed_dim: 96 - num_heads: 1 - stages: [1, 2, 11, 2] - global_att_blocks: [7, 10, 13] - window_pos_embed_bkg_spatial_size: [7, 7] - neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck - position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine - num_pos_feats: 256 - normalize: true - scale: null - temperature: 10000 - d_model: 256 - backbone_channel_list: [768, 384, 192, 96] - fpn_top_down_levels: [2, 3] # output level 0 and 1 directly use the backbone features - fpn_interp_model: nearest - - memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention - d_model: 256 - pos_enc_at_input: true - layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer - activation: relu - dim_feedforward: 2048 - dropout: 0.1 - pos_enc_at_attn: false - self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention - rope_theta: 10000.0 - feat_sizes: [32, 32] - embedding_dim: 256 - num_heads: 1 - downsample_rate: 1 - dropout: 0.1 - d_model: 256 - pos_enc_at_cross_attn_keys: true - pos_enc_at_cross_attn_queries: false - cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention - rope_theta: 10000.0 - feat_sizes: [32, 32] - rope_k_repeat: True - embedding_dim: 256 - num_heads: 1 - downsample_rate: 1 - dropout: 0.1 - kv_in_dim: 64 - num_layers: 4 - - memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder - out_dim: 64 - position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine - num_pos_feats: 64 - normalize: true - scale: null - temperature: 10000 - mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler - kernel_size: 3 - stride: 2 - padding: 1 - fuser: - _target_: sam2.modeling.memory_encoder.Fuser - layer: - _target_: sam2.modeling.memory_encoder.CXBlock - dim: 256 - kernel_size: 7 - padding: 3 - layer_scale_init_value: 1e-6 - use_dwconv: True # depth-wise convs - num_layers: 2 - - num_maskmem: 7 - image_size: 1024 - # apply scaled sigmoid on mask logits for memory encoder, and directly feed input mask as output mask - sigmoid_scale_for_mem_enc: 20.0 - sigmoid_bias_for_mem_enc: -10.0 - use_mask_input_as_output_without_sam: true - # Memory - directly_add_no_mem_embed: true - no_obj_embed_spatial: true - # use high-resolution feature map in the SAM mask decoder - use_high_res_features_in_sam: true - # output 3 masks on the first click on initial conditioning frames - multimask_output_in_sam: true - # SAM heads - iou_prediction_use_sigmoid: True - # 
cross-attend to object pointers from other frames (based on SAM output tokens) in the encoder - use_obj_ptrs_in_encoder: true - add_tpos_enc_to_obj_ptrs: true - proj_tpos_enc_in_obj_ptrs: true - use_signed_tpos_enc_to_obj_ptrs: true - only_obj_ptrs_in_the_past_for_eval: true - # object occlusion prediction - pred_obj_scores: true - pred_obj_scores_mlp: true - fixed_no_obj_ptr: true - # multimask tracking settings - multimask_output_for_tracking: true - use_multimask_token_for_obj_ptr: true - multimask_min_pt_num: 0 - multimask_max_pt_num: 1 - use_mlp_for_obj_ptr_proj: true - # Compilation flag - compile_image_encoder: False - # SAMURAI - samurai_mode: true - stable_frames_threshold: 15 - stable_ious_threshold: 0.3 - min_obj_score_logits: -1 - kf_score_weight: 0.25 - memory_bank_iou_threshold: 0.5 - memory_bank_obj_score_threshold: 0.0 - memory_bank_kf_score_threshold: 0.0 \ No newline at end of file diff --git a/sam2/configs/samurai/sam2.1_hiera_t.yaml b/sam2/configs/samurai/sam2.1_hiera_t.yaml deleted file mode 100644 index 43c1435134510b6fcc4251601b1840339fb8c92d..0000000000000000000000000000000000000000 --- a/sam2/configs/samurai/sam2.1_hiera_t.yaml +++ /dev/null @@ -1,130 +0,0 @@ -# @package _global_ - -# Model -model: - _target_: sam2.modeling.sam2_base.SAM2Base - image_encoder: - _target_: sam2.modeling.backbones.image_encoder.ImageEncoder - scalp: 1 - trunk: - _target_: sam2.modeling.backbones.hieradet.Hiera - embed_dim: 96 - num_heads: 1 - stages: [1, 2, 7, 2] - global_att_blocks: [5, 7, 9] - window_pos_embed_bkg_spatial_size: [7, 7] - neck: - _target_: sam2.modeling.backbones.image_encoder.FpnNeck - position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine - num_pos_feats: 256 - normalize: true - scale: null - temperature: 10000 - d_model: 256 - backbone_channel_list: [768, 384, 192, 96] - fpn_top_down_levels: [2, 3] # output level 0 and 1 directly use the backbone features - fpn_interp_model: nearest - - memory_attention: - _target_: sam2.modeling.memory_attention.MemoryAttention - d_model: 256 - pos_enc_at_input: true - layer: - _target_: sam2.modeling.memory_attention.MemoryAttentionLayer - activation: relu - dim_feedforward: 2048 - dropout: 0.1 - pos_enc_at_attn: false - self_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention - rope_theta: 10000.0 - feat_sizes: [32, 32] - embedding_dim: 256 - num_heads: 1 - downsample_rate: 1 - dropout: 0.1 - d_model: 256 - pos_enc_at_cross_attn_keys: true - pos_enc_at_cross_attn_queries: false - cross_attention: - _target_: sam2.modeling.sam.transformer.RoPEAttention - rope_theta: 10000.0 - feat_sizes: [32, 32] - rope_k_repeat: True - embedding_dim: 256 - num_heads: 1 - downsample_rate: 1 - dropout: 0.1 - kv_in_dim: 64 - num_layers: 4 - - memory_encoder: - _target_: sam2.modeling.memory_encoder.MemoryEncoder - out_dim: 64 - position_encoding: - _target_: sam2.modeling.position_encoding.PositionEmbeddingSine - num_pos_feats: 64 - normalize: true - scale: null - temperature: 10000 - mask_downsampler: - _target_: sam2.modeling.memory_encoder.MaskDownSampler - kernel_size: 3 - stride: 2 - padding: 1 - fuser: - _target_: sam2.modeling.memory_encoder.Fuser - layer: - _target_: sam2.modeling.memory_encoder.CXBlock - dim: 256 - kernel_size: 7 - padding: 3 - layer_scale_init_value: 1e-6 - use_dwconv: True # depth-wise convs - num_layers: 2 - - num_maskmem: 7 - image_size: 1024 - # apply scaled sigmoid on mask logits for memory encoder, and directly feed input mask as output mask - # SAM decoder - 
sigmoid_scale_for_mem_enc: 20.0 - sigmoid_bias_for_mem_enc: -10.0 - use_mask_input_as_output_without_sam: true - # Memory - directly_add_no_mem_embed: true - no_obj_embed_spatial: true - # use high-resolution feature map in the SAM mask decoder - use_high_res_features_in_sam: true - # output 3 masks on the first click on initial conditioning frames - multimask_output_in_sam: true - # SAM heads - iou_prediction_use_sigmoid: True - # cross-attend to object pointers from other frames (based on SAM output tokens) in the encoder - use_obj_ptrs_in_encoder: true - add_tpos_enc_to_obj_ptrs: true - proj_tpos_enc_in_obj_ptrs: true - use_signed_tpos_enc_to_obj_ptrs: true - only_obj_ptrs_in_the_past_for_eval: true - # object occlusion prediction - pred_obj_scores: true - pred_obj_scores_mlp: true - fixed_no_obj_ptr: true - # multimask tracking settings - multimask_output_for_tracking: true - use_multimask_token_for_obj_ptr: true - multimask_min_pt_num: 0 - multimask_max_pt_num: 1 - use_mlp_for_obj_ptr_proj: true - # Compilation flag - # HieraT does not currently support compilation, should always be set to False - compile_image_encoder: False - # SAMURAI - samurai_mode: true - stable_frames_threshold: 15 - stable_ious_threshold: 0.3 - min_obj_score_logits: -1 - kf_score_weight: 0.25 - memory_bank_iou_threshold: 0.5 - memory_bank_obj_score_threshold: 0.0 - memory_bank_kf_score_threshold: 0.0 \ No newline at end of file